Most people know persistent identifiers, a.k.a. PIDs, as unique identifiers that distinguish between and denote the location of information about objects, people, or concepts online (per the NNLM website definition).
But once PIDs are born, they take on a whole new, and still somewhat secret, life of their own. PIDs work quietly in the background of the scholarly communication ecosystem to not only provide persistent records/links to information about research funding, authorship, and more but also facilitate research linking, discovery, reporting, impact, and transparency.
Who are these unsung heroes? And what do journal teams and scholars need to know about PIDs to maximize their potential?
In this blog post, we’re taking a close-up lens to a PID’s life to answer the above questions and more. You can use the quick links below to jump to specific sections based on your level of PID knowledge. Let’s get to it!
- What are persistent identifiers (a.k.a. PIDs)?
- Why should scholarly publishers, journals, and researchers care about PIDs?
- Which PIDs should scholarly communication stakeholders prioritize right now?
- Key ways journals and scholars can implement PIDs to improve research discovery, impact, and transparency
What are persistent identifiers (a.k.a. PIDs)?
First, let’s flesh out what PIDs are exactly, or “who PIDs are,” keeping with our hero story!
As briefly discussed in the introduction to this blog post, PIDs are unique identifiers composed of strings of letters and numbers that provide persistent records for and often link to information about objects, people, or concepts online. Per CERN, the initial impetus for PIDs was to solve the problem of “link rot” when URLs go dead. So long as the data and URLs associated with a PID record are kept up to date, the PID will always resolve.
PIDs can be local to individual organizations, such as the unique IDs the National Library of Medicine internally assigns to each term in its Medical Subject Headings (MeSH), or managed by third parties and open to any applicable organization interested in using them, such as Digital Object Identifiers (DOIs), which provide persistent records/links to research content. Either way, proper PIDs are standardized, always following the same data syntax. For example, DOIs always consist of a prefix assigned through the DOI registration agency and a unique suffix chosen by the publisher, separated by a forward slash.
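To make the prefix/suffix structure concrete, here is a minimal Python sketch that splits a DOI into its two parts. The pattern is a deliberately loose assumption for illustration (not an official validation rule), the function names are our own, and the example DOI is a made-up value for demonstration.

```python
import re

# Loose, illustrative DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
# This is an assumption for the sketch, not an official validation rule.
DOI_PATTERN = re.compile(r"^(10\.\d{4,9})/(\S+)$")


def split_doi(doi: str) -> tuple[str, str]:
    """Return (prefix, suffix) for a bare DOI or a doi.org URL."""
    bare = re.sub(r"^https?://(dx\.)?doi\.org/", "", doi.strip(), flags=re.IGNORECASE)
    match = DOI_PATTERN.match(bare)
    if match is None:
        raise ValueError(f"not a DOI-shaped string: {doi!r}")
    return match.group(1), match.group(2)


def doi_url(doi: str) -> str:
    """Rebuild the resolvable https://doi.org/ link from either input form."""
    prefix, suffix = split_doi(doi)
    return f"https://doi.org/{prefix}/{suffix}"


prefix, suffix = split_doi("https://doi.org/10.5555/12345678")
print(prefix, suffix)
```

The prefix identifies who registered the DOI, while the suffix is whatever unique string the publisher chose, which is why only the overall shape can be checked locally.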
Pulling back to look at the broader research paradigm, PIDs are integral pieces of metadata (i.e., data applied to a digital object that provides information about the object’s contents), which is the foundation of the “research nexus.” Crossref defines the vision for the research nexus as a “rich and reusable open network of relationships connecting research organizations, people, things, and actions; a scholarly record that the global community can build on forever, for the benefit of society.”
PIDs’ standardized and verifiable nature helps ensure information accuracy and consistency, addressing common scholarly communication challenges like name changes, misspellings, or differing naming conventions. As a result, when publishers include PIDs in metadata formatted in a machine-readable markup language such as JATS XML, the ANSI/NISO standard for journal articles, they help to foster interoperability across digital tools and systems, including archives and abstracting and indexing (A&I) databases. In this way, one might also define PIDs as champions of research linking and discovery.
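For readers who want to see what PIDs in machine-readable metadata can look like, below is a rough Python sketch that builds a JATS-style author element carrying an ORCID iD and a ROR ID. The element and attribute names follow common JATS tagging as we understand it, the function name is our own, and the ROR ID is a labeled placeholder rather than a real record, so treat this as an illustration and not a production-ready export.

```python
import xml.etree.ElementTree as ET


def contrib_with_pids(surname, given_names, orcid, ror_id, institution):
    """Build a JATS-style <contrib> element that carries an ORCID iD and a ROR ID."""
    contrib = ET.Element("contrib", {"contrib-type": "author"})

    # The author's ORCID iD, expressed as a full URL.
    contrib_id = ET.SubElement(contrib, "contrib-id", {"contrib-id-type": "orcid"})
    contrib_id.text = f"https://orcid.org/{orcid}"

    name = ET.SubElement(contrib, "name")
    ET.SubElement(name, "surname").text = surname
    ET.SubElement(name, "given-names").text = given_names

    # The affiliation, with the institution's ROR ID next to its display name.
    aff = ET.SubElement(contrib, "aff")
    wrap = ET.SubElement(aff, "institution-wrap")
    inst_id = ET.SubElement(wrap, "institution-id", {"institution-id-type": "ror"})
    inst_id.text = f"https://ror.org/{ror_id}"
    ET.SubElement(wrap, "institution").text = institution
    return contrib


element = contrib_with_pids(
    surname="Carberry",
    given_names="Josiah",
    orcid="0000-0002-1825-0097",   # ORCID's well-known fictitious test researcher
    ror_id="0abcdef12",            # placeholder, NOT a real ROR record
    institution="Example University",
)
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

Because the identifiers sit in dedicated elements instead of free text, downstream systems can match the author and institution without guessing at name spellings.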
Why should scholarly publishers, journals, and researchers care about PIDs?
The initial use cases for PIDs — to prevent “link rot” and standardize information — are certainly more than sufficient reasons to implement them. But there’s so much more to PIDs.
Building off the points in the previous sections, PIDs support research discovery and consequently increase the potential impacts of research outputs, the researchers associated with them, and their organizations. In this way, PIDs also increase the online visibility of scholars and their organizations worldwide, contributing to Diversity, Equity, and Inclusion (DEI) in scholarly communication, as discussed during Scholastica’s past webinar “Connecting mosaics of scholarly identity through metadata.”
Among the key benefits of PIDs is that they help:
- Put research into context by providing standard information around authorship, institutional affiliations, funding, and more, thereby helping to paint a more comprehensive research picture and ensure proper credit where it’s due
- Promote research trust and transparency by eliminating ambiguity around information sources and associations and supporting research reproducibility and replicability
- Streamline information flow/reduce administrative burdens by enabling publishers, funders, academic institutions, and service providers to automate metadata transmission across scholarly communication tools and systems so their teams or authors don’t have to do it manually (which can also lead to significant cost savings, as discussed by Alice Meadows, Co-Founder of the MoreBrains Cooperative)
- Improve research tracking by making it easier for research institutions and funders to monitor works from their authors and for authors to track their own impacts
- Improve research discoverability and accessibility by making it easier to link verified locations of research records and outputs, including any open access versions
- Strengthen research infrastructure by enabling shared tools/systems and individual organizations to more efficiently and effectively organize, describe, cross-reference, and associate pieces of (meta)data
Together, all these benefits make PIDs integral to realizing the vision of the FAIR data principles, which aim to make research Findable, Accessible, Interoperable, and Reusable. We cover additional ways scholarly journals can promote FAIR data here.
Which PIDs should scholarly communication stakeholders prioritize right now?
Since the dawn of “PID kind,” PIDs have faced the proverbial chicken-and-egg problem. For PIDs to become trusted standard identifiers that help promote research linking and discovery, they first need significant uptake. That generally requires one or more organizations to dedicate their resources to promoting, maintaining, and continually honing data norms and infrastructure for any given PID to ensure its sustainability.
With that said, journals and scholars should prioritize PIDs backed by standard-setting organizations, including the following for research objects, research institutions, researchers themselves, and funding records:
- International Standard Serial Number (ISSN): Among the most well-known PIDs, ISSNs are unique identifiers for serial publications, including scholarly journals and magazines, managed by The International Centre for the registration of serial publications (CIEPS).
- Digital Object Identifier (DOI): While ISSNs identify journals and other serial publications, DOIs are unique persistent records/links for individual journal articles and other research outputs. Crossref is the leading DOI registration agency for scholarly content, and DataCite is the leading DOI registration agency for data sets.
- Research Organization Registry (ROR ID): ROR IDs are unique identifiers for research institutions created and managed by ROR, the only global, community-led registry of open persistent identifiers for research organizations.
- Open Researcher and Contributor ID (ORCID iD): ORCID iDs, created and managed by ORCID, are unique identifiers for authors of/contributors to scholarly communication that link to a profile collating their institutional affiliations and works.
- Crossref funder ID (funder ID): A unique identifier for an organization that funds research in the Crossref registry.
Crossref, DataCite, ROR, and ORCID all work closely together to develop and build trusted connections between PIDs and other identifiers and promote metadata management best practices.
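Because each of the PIDs above follows its own standardized syntax, a tool can often guess which kind of identifier it has been handed just from its shape. The Python sketch below shows one way to do that. The patterns are simplified assumptions on our part (for example, treating DOIs under the 10.13039 prefix as funder IDs), and the ROR and funder values in the usage lines are placeholders, not real records.

```python
import re

# Simplified, illustrative shapes. Order matters: funder IDs are DOIs too,
# so the more specific funder pattern is checked before the general DOI one.
PID_PATTERNS = {
    "ISSN": re.compile(r"^\d{4}-\d{3}[\dX]$"),
    "ORCID iD": re.compile(r"^(https?://orcid\.org/)?\d{4}-\d{4}-\d{4}-\d{3}[\dX]$"),
    "ROR ID": re.compile(r"^(https?://ror\.org/)?0[a-hj-km-np-tv-z0-9]{6}\d{2}$"),
    "Crossref funder ID": re.compile(r"^(https?://(dx\.)?doi\.org/)?10\.13039/\S+$"),
    "DOI": re.compile(r"^(https?://(dx\.)?doi\.org/)?10\.\d{4,9}/\S+$"),
}


def guess_pid_type(value):
    """Return the name of the first PID shape that matches, or None."""
    candidate = value.strip()
    for pid_type, pattern in PID_PATTERNS.items():
        if pattern.match(candidate):
            return pid_type
    return None


for example in [
    "2049-3630",
    "0000-0002-1825-0097",
    "https://ror.org/0abcdef12",   # placeholder ROR ID
    "10.13039/000000000",          # placeholder funder ID
    "10.5555/12345678",
]:
    print(example, "->", guess_pid_type(example))
```

A shape match only says the string looks like a given PID. Confirming that it points to a real record still means checking with the relevant registry.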
Key ways journals and scholars can implement PIDs to improve research discovery, impact, and transparency
Now that we’ve covered the what, why, and which of PIDs, let’s dive into ways journal teams and researchers can leverage them to improve research discovery, impact, and transparency across the scholarly communication ecosystem.
Of course, the first step is to create and implement PIDs! To start:
Journals should join Crossref and register DOIs for all their content with machine-readable metadata, including as many PIDs as possible. Think of the metadata you add to your DOI records like strings that can be used to draw new connections between the research you publish and related works within Crossref’s database and beyond. We cover tips for iteratively expanding your article-level metadata in this blog post and more on how to harness the full discovery benefits of Crossref below.
Scholars should submit to journals that support PIDs, register DOIs for datasets when applicable, and create (free!) ORCID accounts to record and help connect their research contributions. When you register an ORCID, the default account visibility setting is “everyone,” which we recommend retaining to make your ORCID records as visible and discoverable as possible. However, you can limit the visibility of your account to “trusted organizations” if you’re more comfortable that way.
Below, we round up more specific steps journals and scholars can take from there.
Harness the full content linking and discovery potential of DOIs
As discussed above, arguably the best way to help PIDs get their footing is to include them in metadata deposits to content registration agencies like Crossref, which provides free research discovery tools and an open API that numerous A&I services use to retrieve Crossref records and include them in their search results.
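To give a sense of what retrieving a record from Crossref’s open API can involve, here is a small Python sketch. It builds the lookup URL for a DOI and pulls the PIDs out of a record. The sample record is a hand-written, hypothetical stand-in shaped like the API’s JSON as we understand it (including a placeholder funder ID), and the helper names are our own.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_ROOT = "https://api.crossref.org/works/"


def crossref_work_url(doi: str) -> str:
    """Build the REST API URL for one DOI (the DOI is percent-encoded)."""
    return API_ROOT + quote(doi, safe="")


def fetch_work(doi: str) -> dict:
    """Fetch one record. Needs network access, so it is not called in this sketch."""
    with urlopen(crossref_work_url(doi), timeout=30) as response:
        return json.load(response)["message"]


def pids_in_work(message: dict) -> dict:
    """Collect the PIDs found in a single work record."""
    return {
        "doi": message.get("DOI"),
        "orcids": [a["ORCID"] for a in message.get("author", []) if "ORCID" in a],
        "funder_ids": [f["DOI"] for f in message.get("funder", []) if "DOI" in f],
        "reference_dois": [r["DOI"] for r in message.get("reference", []) if "DOI" in r],
    }


# Hypothetical record for illustration only (not real Crossref data).
SAMPLE_MESSAGE = {
    "DOI": "10.5555/12345678",
    "author": [
        {"given": "Josiah", "family": "Carberry",
         "ORCID": "http://orcid.org/0000-0002-1825-0097"},
        {"given": "A.", "family": "Author"},
    ],
    "funder": [{"name": "Example Funder", "DOI": "10.13039/000000000"}],
    "reference": [
        {"key": "ref1", "DOI": "10.5555/87654321"},
        {"key": "ref2", "unstructured": "A reference without a DOI"},
    ],
}

summary = pids_in_work(SAMPLE_MESSAGE)
print(crossref_work_url("10.5555/12345678"))
print(summary)
```

The second author and second reference in the sample have no PIDs, so they simply drop out of the summary, which is a quick way to spot where a record could be enriched.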
Below are some of the many ways journals and scholars can harness the full discovery benefits of content registration.
Action steps for journals:
- Archive everything you publish: As discussed, the persistence of DOI URLs depends on publishers ensuring that they resolve with an active link. Of course, most scholarly journals share the aim of publishing quality content for as long as possible, but that can’t always be the case. That’s why it’s critical for journals to back up all of the content they publish in a persistent archive like Portico or CLOCKSS, both of which are dark archives, meaning the content submitted to them will only be released in the case of a “trigger event,” such as confirmation that a journal is no longer in publication. Archiving will ensure your journal content remains available in perpetuity, even if you have to cease publishing, and it enables Crossref to work with archives to ensure your DOIs continue to resolve to your content, as explained here. Journals using Scholastica’s OA hosting platform have the option to automatically integrate with Portico as explained here.
- Keep your DOI metadata current: When registering DOIs via Crossref, journals should aim to include as much rich metadata as possible and submit metadata updates for existing DOI records as needed (e.g., if article details change post publication). Doing so will maximize the metadata linking and discovery benefits of Crossref content deposits that we covered above. The best way to do this is by producing machine-readable JATS XML metadata in line with Crossref’s requirements and automating DOI registration deposits via an API integration. Software and service providers can help you here. For example, Scholastica automatically generates rich machine-readable metadata for all journals that use our digital-first production service and/or fully OA hosting platform, and our OA hosting platform includes an automatic DOI registration integration option for Crossref members.
- Make your citations open: As you’re working to enhance your Crossref deposits, one of the best steps you can take to increase your content’s discoverability, analysis, and linking potential is including open citations/references in your article-level metadata. There’s even a whole project around the benefits of this called the “Initiative for Open Citations” (I4OC). Journals that submit open citations to Crossref are also eligible to participate in Crossref Cited-by to show which publications cited their articles. Crossref also automatically adds missing DOIs to the reference metadata of journals participating in Cited-by when possible, enriching them further. Of course, per Crossref’s member obligations, journals should also include DOI links for all references in the articles they publish.
- Keep honing your metadata deposits: Journals won’t have perfect metadata overnight or likely ever, actually. It’s essential to think of metadata management and enrichment as an ongoing process and reassess your metadata regularly to identify opportunities for improvement. For Crossref members, the best way to do this is using the Participation Reports tool (or you can try out the new beta Crossref Labs Reports version) to get a bird’s-eye view of what metadata your journal is depositing with DOIs, the percentage of DOI records with each metadata field, and which metadata fields are missing.
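The “percentage of records with each metadata field” idea from the last step above is easy to prototype on your own records. The Python sketch below is not Crossref’s Participation Reports tool, just a rough, hypothetical equivalent: the field names and the four sample records are invented for illustration.

```python
def field_coverage(records, fields):
    """Return the percentage of records that have a non-empty value for each field."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: round(100 * sum(1 for record in records if record.get(field)) / total, 1)
        for field in fields
    }


# Hypothetical article records; the keys are our own naming, not a Crossref schema.
records = [
    {"doi": "10.5555/1", "license": "CC BY", "orcid": "0000-0002-1825-0097", "references": ["10.5555/9"]},
    {"doi": "10.5555/2", "license": "CC BY", "orcid": "0000-0002-1694-233X"},
    {"doi": "10.5555/3", "license": "CC BY", "orcid": ""},
    {"doi": "10.5555/4", "license": "CC BY"},
]

coverage = field_coverage(records, ["license", "orcid", "references"])
gaps = [field for field, percent in coverage.items() if percent < 100]
print(coverage)
print("Fields to improve:", gaps)
```

Running a check like this on a regular schedule gives a simple baseline for tracking whether metadata enrichment efforts are paying off.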
Action steps for authors:
- Register DOIs for gray literature: While most DOI duties fall on publishers, one area often dependent on researchers is registering DOIs for non-traditional research outputs like datasets, images, and code, which are often called gray literature. For data, you can do this directly via DataCite or by uploading your datasets to a repository that provides DOIs, like Dryad.
- Enable ORCID auto-update: One of the best ways for scholars to create new links between the DOIs for their articles and consequently increase the discoverability of their content and visibility of their research contributions is by adding all of their works and corresponding DOIs to their ORCID records. You can automate much of this process by turning on the ORCID auto-update feature to have Crossref automatically update your ORCID record with the details of any works you publish in a journal that includes ORCID iDs in the metadata they register with Crossref DOIs.
- Use DOIs to find open access versions of content: Scholars can also use DOIs as a tool to help expand access to research by checking Crossref’s metadata to look for open access versions of content associated with the metadata of the DOIs for articles they reference. There are also tools available to help you with this. For example, during Scholastica’s webinar on harnessing the discovery benefits of DOIs, Crossref Member Experience Manager Anna Tolwinska shared an example of Unpaywall using the publication information, license information, and preprint links in Crossref’s metadata to create an open browser plugin and database to help researchers find free versions of content.
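For the open access lookup described in the last step above, here is a hedged Python sketch of how a DOI might be checked against Unpaywall’s database. It assumes Unpaywall’s v2 API URL format and the `is_oa` / `best_oa_location` response fields as we understand them. The sample responses are hypothetical, the email address is a placeholder, and the helper names are our own.

```python
from urllib.parse import urlencode


def unpaywall_url(doi: str, email: str) -> str:
    """Build a lookup URL for one DOI (the API asks callers to identify themselves by email)."""
    return f"https://api.unpaywall.org/v2/{doi}?" + urlencode({"email": email})


def best_oa_url(record: dict):
    """Return the best open access link from a response, or None if there isn't one."""
    location = record.get("best_oa_location")
    if record.get("is_oa") and location:
        return location.get("url")
    return None


# Hypothetical responses for illustration only (not real Unpaywall data).
open_record = {
    "doi": "10.5555/12345678",
    "is_oa": True,
    "best_oa_location": {"url": "https://repository.example.org/12345678.pdf"},
}
closed_record = {"doi": "10.5555/87654321", "is_oa": False, "best_oa_location": None}

print(unpaywall_url("10.5555/12345678", "you@example.org"))
print(best_oa_url(open_record))
print(best_oa_url(closed_record))
```

The DOI is what makes this lookup possible: it is the one key that the publisher’s record, the repository copy, and the discovery service all share.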
In addition to improving research linking and discovery, another benefit of Crossref DOI registration for journals and authors is gaining access to valuable Crossref Event Data to track where and how their research outputs are being shared and cited online.
Promote metadata fitness to keep PIDs moving forward
The true power of PIDs as champions of information linking and discovery comes when they’re part of healthy metadata sets that can move between digital tools and systems to traverse the entire research landscape and forge the “research nexus.”
That’s why it’s so critical for journals and scholars to promote metadata fitness by helping to circulate quality machine-readable metadata (including PIDs!), starting with the metadata input into journal submission forms, which must move to final publications and discovery and tracking services from there.
At the highest level, as explained by Crossref’s head of metadata Patricia Feeney in a past Scholastica blog post, the idealistic best practices for journal metadata are that it be “clean, correct, and complete.” Below are steps journals and scholars can take to realize that vision.
Action steps for journal teams:
- Craft your metadata value proposition: A primary roadblock to promoting metadata fitness, as identified by the Metadata 2020 initiative, is that many question whether the benefits of prioritizing metadata production and management will outweigh the costs. One of the best ways journal publishers can help overcome this challenge is to identify and clearly communicate how quality metadata will bolster their organization, including reducing administrative burdens (which can again lead to cost savings!), ensuring compliance with open access mandates as applicable such as Plan S and the OSTP “Nelson Memo,” and improving the discoverability and accessibility of the research they publish, thereby increasing its value to scholars and society. A great example of this is the American Geophysical Union’s position statement on data.
- Make a metadata management plan: Once you’ve gotten all stakeholders on board to prioritize metadata, be sure to make a metadata management plan including what information you’ll collect; where, when, and how you’ll gather it; and who will be responsible for converting it into machine-readable JATS XML and depositing it into the archiving and discovery services you use (ideally in an automated way). Software and service providers like Scholastica can help you here. Be sure to also factor in metadata expansion initiatives!
- Verify PIDs when possible: It’s also critical to ensure the PID inputs you collect are accurate. Automating PID verification in entry forms is the fastest and easiest way to do this and, again, an area where software and services can help. For example, Scholastica’s peer review system integrates with ROR to automatically associate ROR IDs with organizations input into journals’ submission forms when applicable, ensuring verified inputs with no work on the part of authors.
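As one illustration of automated PID verification in an entry form, the Python sketch below checks the built-in checksum of an ORCID iD (the final character is a check character calculated with the ISO 7064 MOD 11-2 algorithm). Note that this is a local syntax check we’re swapping in for illustration, not the registry lookup a service integration would perform, and the function names are our own.

```python
import re

ORCID_SHAPE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_character(first_15_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character from the first 15 digits."""
    total = 0
    for digit in first_15_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value: str) -> bool:
    """Check the shape and checksum of an ORCID iD (bare or as an orcid.org URL)."""
    bare = re.sub(r"^https?://orcid\.org/", "", value.strip())
    if not ORCID_SHAPE.match(bare):
        return False
    digits = bare.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))   # ORCID's fictitious test researcher
print(is_valid_orcid("0000-0002-1825-0098"))   # same iD with a mistyped last digit
```

A passing checksum catches most typos at the point of entry, but it doesn’t prove the iD is registered or belongs to the person entering it. That still takes a lookup or an authenticated sign-in with ORCID.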
Action steps for scholars:
- Watch what you enter into forms: This tip may be stating the obvious — but the first key to promoting metadata fitness as a researcher is ensuring the accuracy of any metadata you input into forms (e.g., for journal submissions, funding application forms, etc.), especially PIDs! Ideally, journals and institutions will offer automatic verification for PID inputs, but this isn’t always the case. Checking your metadata inputs is part of being a good Research Information Citizen, per the definition of Simon Porter, VP of Research Futures at Digital Science.
- Keep your ORCID up-to-date: For scholars, it’s so important to keep your ORCID account current by adding all your latest publications, activities, and institutional affiliations to it and keeping your biography up to date. Adding details and links to your ORCID record creates new opportunities for individuals and scholarly tools/services to find, discover, and connect your research contributions online via that PID. As discussed above, keeping your account current doesn’t have to be an entirely manual process, either. To ensure that new works/data automatically get pushed to your ORCID record when they become available, you can enable Crossref ORCID auto-update and link your ORCID with DataCite and Publons accounts if applicable.
- Include PIDs in references: Scholars can also help foster new connections between their research and that of others by including PIDs in the references/works cited section of their articles whenever possible, especially DOIs associated with data sets. At the highest level, this will help ensure your reference links resolve for those reading your work. And if/when journals include citations/references in the machine-readable metadata they register with Crossref DOIs, it will open the door for scholars and discovery services to make more metadata connections, as Crossref’s Geoffrey Bilder discussed in this blog post.
Join PID community forums and working groups
Finally, journal publishers, editors, and scholars can support PIDs to improve research discovery, impact, and transparency by participating in PID community forums and working groups.
You can follow international initiatives like the National PID Strategies Working Group, Persistent Identification of Instruments Working Group, and National Information Standards Organization (NISO) working groups, community updates, and events. Additionally, there are many more localized PID initiatives you can take part in. For example, ROR hosts bi-monthly Community Calls and an Annual Meeting. Crossref also has an active online Community Forum and hosts regular Community Events, including member update calls.
Additionally, you can help test new PID tools, like the beta Crossref Labs Reports, and keep tabs on Crossref working groups to identify ones you may be interested in joining.
Crossref also recently helped launch a new public forum for sharing publishing best practices and initiatives in partnership with the Committee on Publication Ethics (COPE), the Directory of Open Access Journals (DOAJ), and the Open Access Scholarly Publishers Association (OASPA) called The PLACE (Publishers Learning & Exchange Community). It’s an exciting emerging venue to find relevant publishing tips and connect with key stakeholders across the scholarly community.
We hope you’ve found this deep dive into the secret life of PIDs helpful! We’ve also included links to some further reading below.
If you have questions about any of the PID best practices we covered here or additional recommendations to add, we invite you to share them in the comments section or on social media. You can find Scholastica on LinkedIn and Twitter.
Further Reading:
- Building a Sustainable Research Infrastructure
- Building the plane as we fly it: the promise of Persistent Identifiers
- Making the Case for a PID-Optimized World
- Metadata connects the global community – summary of our Community update 2023
- The PID-optimised Research Lifecycle
- Why Publishers Should Care About Persistent Identifiers