Since its founding in 2019, the Research Organization Registry (a.k.a. ROR), the first global community-led database of open persistent identifiers for scholarly organizations, has been helping to disambiguate institutional information, promote interoperability across research systems, and make content more easily discoverable online. And with IDs for over 104,000 organizations so far, ROR is growing fast.
At Scholastica, we’re proud to be one of the first service providers to integrate with ROR, which we now support across our peer review system, digital-first production service, and open access journal hosting platform. As ROR continues to develop, we know many are interested in examples of current use cases and how all scholarly communication stakeholders can best adopt and support ROR IDs. So we caught up with ROR’s Technical Community Manager, Amanda French, to learn more.
Many thanks to Amanda for taking the time for this interview!
Q&A with Amanda French
Can you explain what PID means exactly for those unfamiliar and what the benefits are in the context of ROR? Why are ROR IDs so important?
AF: Of course! The acronym PID stands for Persistent Identifier, and that’s one of the key features of ROR. We promise that ROR IDs will persist not just for the foreseeable future but for as close to forever as we can promise, given the continuing existence of institutions and things like that.
So just like Digital Object Identifiers or DOIs for academic content, which differ from URLs because you can count on them to resolve for the long term, a ROR ID will also persist. We create persistent URLs for organizations, and you can count on ROR URLs to resolve. For instance, if an organization changes its website URL or name, that can lead to confusion if there aren’t redirects for all past URLs or if the old name is used in different places online. ROR IDs help to prevent those types of situations.
Another thing that makes ROR so important is that we offer unique identifiers, which is crucial for disambiguating organizations from one another. Many organizations have the same names or very similar names. For instance, there are two Wheaton Colleges just in the United States. And some organizations have headquarters in other countries that people need to be able to differentiate. The fact that ROR IDs are unique PIDs is obviously essential to telling organizations apart.
ROR is also unique among organizational identifiers in that we are totally open, which is key for developing a shared infrastructure. When you look up a ROR ID or go to a ROR URL, you can see all the metadata and all the information about that organization. And if you think there’s an error, you can suggest a change, and we will review it and implement it if it’s deemed correct. This careful curation is entirely free of charge, and we do it with the assistance of a curation advisory board made up of volunteers from around the world.
Openness is essential in this kind of infrastructure. In closed identifier systems, organizations have no real insight into whether the information passing between systems is correct, which can lead to challenges down the line. It’s also really important that ROR is global. We’re always working to improve our coverage of organizations around the world, in Asia, Latin America, and so forth, and we already have good coverage there.
What are the primary current and emerging use cases for ROR, and which scholarly communication stakeholders are involved?
AF: The primary use case for ROR is identifying author affiliations in research outputs. For example, a researcher at a particular university, laboratory, or other institution produces a journal article or white paper and says “I am affiliated with the University of California, Los Alamos, CERN,” or anything like that. If they only input that information as a text string, there’s the potential for them to misspell it or use a different version than the norm. As you can imagine, that can lead to confusion and dirty data down the line, which can be a big mess because it makes it difficult for systems to interchange that data. When you have a ROR ID, you can normalize and regularize the text.
That primary ROR use case enables institutions to tell what their researchers have produced a lot more easily. So Caltech can query different systems to see all their researchers’ outputs, for instance. That’s the primary use case for ROR, and it’s surprising how difficult it is to do at present. It’s obviously something institutions want and need to track. They want to be able to query databases to answer the question “what are our researchers producing?” — and they want to have confidence in the results and know they’ve gotten everything.
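To make the affiliation use case concrete, here is a minimal Python sketch. It is only an illustration, not ROR's own tooling: the records, the field names, and the ROR values are made-up placeholders. It shows how grouping research outputs by free-text affiliation scatters one institution across several spellings, while grouping by an identifier keeps them together.

```python
from collections import defaultdict

# Hypothetical research-output records. The "affiliation" strings show the kind
# of variation free text invites; the "ror" values are placeholders, NOT real ROR IDs.
OUTPUTS = [
    {"title": "Paper A", "affiliation": "Caltech", "ror": "https://ror.org/EXAMPLE-1"},
    {"title": "Paper B", "affiliation": "California Institute of Technology", "ror": "https://ror.org/EXAMPLE-1"},
    {"title": "Paper C", "affiliation": "Calfornia Inst. of Technology", "ror": "https://ror.org/EXAMPLE-1"},
    {"title": "Paper D", "affiliation": "CERN", "ror": "https://ror.org/EXAMPLE-2"},
]


def group_by(records, key):
    """Group record titles by the value stored under `key`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["title"])
    return dict(groups)


by_text = group_by(OUTPUTS, "affiliation")  # one group per distinct spelling
by_ror = group_by(OUTPUTS, "ror")           # one group per identifier

print(len(by_text), "groups when keyed on affiliation text")
print(len(by_ror), "groups when keyed on the identifier")
```

In this toy data, the three spellings of the same institution land in three separate text-keyed groups but in a single identifier-keyed group, which is the normalization benefit described above.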
The secondary use case for ROR is using ROR IDs to tell which research outputs are associated with particular funders. For example, if the Department of Energy in the U.S. wants to track all of the research outputs it has funded, right now, it’s really relying on getting researchers to report that. Like institutions, they can do some searches, but it’s hard to be confident they’re finding everything. If people misspell the name of the Department of Energy or forget to include it in their acknowledgments or whatnot, that’s a problem. But when research outputs say this research is associated with such and such funder with a ROR ID, it makes it much easier for funders to track the research outputs they’re backing.
And I think we’re just getting started on funders using ROR to verify authors. I think the idea is that they will be doing that eventually. It’s sort of like the telephone in this case: telephones are only useful when everybody has them. So, right now, not enough systems are using PIDs for the data to be truly consistent. But they’re getting there. One system that recently integrated ROR is ProposalCentral, which allows people to apply for grants. They have incorporated ROR IDs to identify funders and the institutions that authors applying for funds are associated with, so we know it’s happening there. I think funders are increasingly learning about PIDs. They’re learning to register their grants with Crossref and collect PIDs such as ROR IDs and ORCID iDs, and we’re participating in those conversations.
We have also seen several other emerging use cases for ROR. One example ROR’s team has been talking about is that some publishers are starting to use ROR IDs to manage open access deals. As many know, a newer OA model that’s come out in scholarly publishing is transformative agreements or TAs, and you’ll hear names for them like “Read and Publish” as well. In those cases, institutions agree to a contract with a publisher to pay for publishing their researchers’ work OA. So instead of tracking journal subscriptions, publishers are tracking institutional agreements. We’ve seen some publishers using ROR IDs for this, and we’re actually planning to begin a project to see how widespread it is.
The two publishers we know are doing this are Rockefeller University Press and Optica. They’re using ROR IDs in their internal systems to check when submissions are associated with an institution with a TA. We’re very interested in that because it wasn’t an anticipated use of ROR. So now we’re talking about how much we want to support it. It’s not so straightforward because, especially in the case of universities, they have different budgetary systems. For example, in some cases, medical schools are part of institutional budgets, and sometimes they’re treated as separate entities. So contracts can differ quite a lot, and it can be hard to map them to particular ROR IDs. But people are definitely doing so in at least a couple of cases, and we think perhaps more.
In what ways does including ROR IDs in article-level metadata support and enhance content registration and indexing initiatives?
AF: The other benefit of adding PIDs to metadata for publishers and authors, beyond disambiguating information, is that it increases the potential discoverability and visibility of research. In the case of Crossref DOI registration specifically, many entities use the Crossref API to find and index content. So sending richer, more standardized metadata to Crossref increases the visibility of the research, much like adding keywords to a web page. When the metadata you send to Crossref is better, it makes your metadata better across that system and every system that pulls data from it.
And there are a number of what I think of as “meta metadata projects” that rely on this, such as organizations tracking compliance with national OA mandates that can now use ROR IDs to help simplify tracking. For instance, Germany has a national OA policy, and they developed a system called the OA Monitor that relies on ROR. So that’s another emerging use case that I think is really important.
ROR is the first global, community-led registry of institutional identifiers. What are the benefits of this? And how can members of the scholarly community get involved in ROR?
AF: First, I think it’s important for people to know that just the development of ROR took three years of conversations among all kinds of stakeholders in scholarly communication. We mapped out all the features that differentiate ROR IDs from other PIDs over three years with the help of publishers, researchers, institutions, and other infrastructure players like Crossref, DataCite, and the California Digital Library. And now, we continue to involve all interested stakeholders in our community curation model and regular meetings.
We have quite a bit of quality control internally, and, as noted, we have an open process by which anyone can suggest a change to a record. We also try to solicit all kinds of feedback on our policies, metadata schemas, practices, and technical roadmap. So we try to do all that in the open and with copious amounts of community feedback.
The best way to get involved in ROR right now is to join our bi-monthly community calls, which you can register for on our Events page.
You recently had the 2023 ROR annual meeting — can you share some highlights?
AF: Last year we added at least a couple of thousand ROR records and updated about 10% of the entire registry. And ROR added two full-time staff members, one of whom is Adam Buttrick, our new Metadata Curation Lead. Adam created the community curation model we’re talking about now. Before he came, we were releasing updates to the ROR registry quarterly. And now we’re doing it at least once a month or more. So it’s a much more frequent pace of updates.
We also made some improvements to our technology. Part of our API allows you to match text strings to ROR IDs. That’s very popular, and we updated it to work even better. Something else we did that might sound like a small thing but also took quite a lot of work is we incorporated the ability to handle inactive organizations in ROR. A lot of times, an organization will close, or a company will be acquired and whatnot, and previously we didn’t have a way of saying this organization did exist but no longer exists. Now we do. Finally, another big thing early in 2022 was Crossref introducing support for ROR IDs in their schema. And we’ve seen an increase in people sending ROR IDs to Crossref since then, so that’s been huge.
So those were some of the big things we discussed at that annual meeting. We also had a hugely popular though quite technical session on how to approach the task of matching text strings for names of organizations to ROR IDs. Many organizations have databases with institution names, and they can be pretty messy right now. They may be in different formats or even languages. So how do you match those to ROR IDs? We had some fantastic presentations from people who have built machine learning and artificial intelligence tools to do that, which are, to be honest, even more sophisticated than the way you can do it with the ROR API, which is also an option.
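For readers curious what matching a text string to a ROR ID via the API might look like, here is a rough Python sketch. It assumes ROR's public REST API accepts an `affiliation` query parameter on its organizations endpoint and returns candidate matches under `items`, each with a `chosen` flag and an `organization` record; please verify the base URL and field names against ROR's current API documentation before relying on them. The sample response is a hand-written placeholder, not real API output, and no network call is made.

```python
from urllib.parse import urlencode

# Assumed endpoint; check ROR's API documentation for the current version and path.
ROR_API = "https://api.ror.org/organizations"


def affiliation_query_url(text):
    """Build the URL for an affiliation-matching request for a free-text string."""
    return ROR_API + "?" + urlencode({"affiliation": text})


def chosen_match(response):
    """Return the ID of the candidate flagged as chosen, or None if there is none.

    `response` is the parsed JSON body (a dict). The field names used here are
    assumptions about the response shape.
    """
    for item in response.get("items", []):
        if item.get("chosen"):
            return item["organization"]["id"]
    return None


# Hand-written placeholder standing in for a parsed API response.
SAMPLE_RESPONSE = {
    "items": [
        {"score": 0.6, "chosen": False, "organization": {"id": "https://ror.org/EXAMPLE-9"}},
        {"score": 1.0, "chosen": True, "organization": {"id": "https://ror.org/EXAMPLE-2"}},
    ]
}

print(affiliation_query_url("Dept. of Physics, CERN"))
print(chosen_match(SAMPLE_RESPONSE))
```

In a real integration, the URL would be fetched (for example with `urllib.request.urlopen`) and the JSON body parsed before being passed to `chosen_match`. Returning `None` when no candidate is flagged leaves room for a human to review ambiguous strings.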
We have a nice blog post with a full recap of the annual meeting events and recordings that community members can check out.
Where is ROR heading next? What are you looking forward to this year?
AF: One of the main things we’re doing in 2023 is overhauling the ROR metadata schema. So right now, we have one version of our API and one metadata schema, and we’re working to move to version two of our API and version two of our metadata schema. We’ll continue to support the existing ones for at least a year and then encourage people to move to the new versions. We conducted a round of community feedback for our initial metadata schema updates proposal, and now we’re coming up with a revised proposal summarizing and synthesizing all of the input. Then we’re going to ask for another feedback round and implement the next steps. So that’ll be most of what we work on this year.
The best way to explain the benefits of updating the metadata schema is to give an example of something we want to change. For instance, one of the key pieces of information in a ROR record is, obviously, the organization’s name. So we have a field where we store the name of the organization. Right now, that name field does not have a language code. Most names are in English, but we know that, for example, organizations in Portugal might want their information to be in Portuguese. So not only do we want to allow for that, but we also need to add a language code to the name field so we can tell what language the name is in. That’s just one of many examples. And we can’t fix that without changing the metadata schema. We’re presenting the results of that community feedback round at a call focused exclusively on our proposed metadata changes on March 16, which people can sign up for on our Events page.
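The name-field example above could be modeled roughly as follows. This is a hypothetical Python sketch of the idea only; the class and field names (`OrgName`, `value`, `lang`) are assumptions for illustration and are not taken from ROR's actual or proposed schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class OrgName:
    """A name for an organization, optionally tagged with a language code."""
    value: str
    lang: Optional[str] = None  # e.g. "pt" or "en"; None means the language is unknown


def names_in(names: List[OrgName], lang: str) -> List[str]:
    """Return the name strings tagged with the given language code."""
    return [name.value for name in names if name.lang == lang]


# Illustrative names for one organization in two languages.
NAMES = [
    OrgName("Universidade de Lisboa", lang="pt"),
    OrgName("University of Lisbon", lang="en"),
]

print(names_in(NAMES, "pt"))
```

Without the `lang` tag, a consuming system would have no reliable way to tell which of the two strings to show to a Portuguese-speaking user, which is the gap the schema change is meant to close.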
Looking to the new year, we also hope more service providers will adopt ROR as Scholastica did. One of the immediate benefits authors submitting via Scholastica’s peer review system will see is that it’s easier to enter their institutional information. The other benefits, such as improved content discoverability, are a bit more behind the scenes. But it’s like working on the electricity or plumbing in your house. You don’t see it, but believe me, it makes the experience of being in that house much better. So that’s something we hope to see more of, and as more service providers adopt ROR, it will support the quality and exchange of metadata throughout the scholarly communication ecosystem.