Image Credit: Fabian Grohs on Unsplash

If you care about the articles in your academic journal showing up in online searches and scholarly indexes, then having quality metadata should be one of your top priorities. Metadata is data applied to a digital object, such as a journal article, that provides information about the object’s contents. Metadata fields include anything that someone might use to organize or look for particular articles online - from basics like journal title, ISSN, or article title, to specifics like relevant keywords and authors’ ORCIDs.

Crossref, an official Digital Object Identifier (DOI) registration agency, knows metadata inside and out. They use metadata to populate the DOIs that they issue to thousands of journals with information about the content each DOI is associated with. In a previous blog post, Crossref’s member experience manager Anna Tolwinska shared the basics of the Crossref DOI application and submission process, noting the all-important rule that Crossref cannot process the details of a journal’s articles from the published text alone. Rather, they need to be given metadata in a “machine-readable” format that their computer system can “read” and use to store information about the article with its DOI. Similarly, many scholarly databases and mainstream search engines like Google and Google Scholar rely on machine-readable metadata in full or in part to index digital objects and return them in relevant searches. So it’s vital that journals not just publish articles online and assume indexes and search engines will be able to process them - you need to add rich metadata to all articles to make them indexable and discoverable!

Recently, Crossref released a beta version of a new tool called “Participation Reports” that helps their member journals know the quality of the machine-readable metadata associated with each of their Crossref DOIs. The Participation Reports tool shows what metadata is stored in each of a journal’s DOIs and what’s missing, helping journals see where they can improve their records. In this interview, Crossref’s head of metadata Patricia Feeney discusses the new tool, Crossref’s metadata requirements, and some general best practices to tell if your journal articles have rich machine-readable metadata.

Interview with Patricia Feeney

Can you briefly explain what machine-readable metadata is and why it’s so important that journals have machine-readable metadata for each of their articles?

PF: Machine-readable metadata is descriptive metadata that can be “read” by a computer. That means it’s organized, clean metadata that follows certain rules that make it easy for a machine to process. Today, many journal articles are processed using XML, which follows a certain set of rules that require you to label your titles, author names, and that sort of thing clearly so that a machine can “understand” the rules and know what each piece of data is for.

Machine-readable metadata is so important for discoverability. It is vital that there at least be basic bits of metadata about journals everywhere so that search engines and library discovery systems can import that metadata into their systems, connect readers with what they’re looking for, and also help readers discover new things to further their research. Machine-readable metadata is also important for proper citations. You want to make sure that articles are cited accurately. Someone looking at your paper needs to be able to find a clean link to what you’re citing and, at this point, it’s too much for people to do that all manually so it has to be done by machines.
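To make that concrete, the kind of labeled, machine-readable record Feeney describes might look something like the sketch below. It uses Python’s standard XML tools purely for illustration - the element names, ISSN, ORCID, and DOI are all made-up placeholders, not Crossref’s actual deposit schema.

```python
# Illustrative only: a simplified journal-article record, not Crossref's
# actual deposit schema. Element names, ISSN, ORCID, and DOI are placeholders.
import xml.etree.ElementTree as ET

article = ET.Element("journal_article")
ET.SubElement(article, "journal_title").text = "Journal of Example Studies"
ET.SubElement(article, "issn").text = "1234-5678"  # placeholder ISSN
ET.SubElement(article, "title").text = "A Study of Metadata Quality"

author = ET.SubElement(article, "author")
ET.SubElement(author, "given_name").text = "Ada"
ET.SubElement(author, "surname").text = "Lovelace"
ET.SubElement(author, "orcid").text = "https://orcid.org/0000-0000-0000-0000"  # placeholder

ET.SubElement(article, "doi").text = "10.12345/example.1"  # placeholder DOI

# Serialize to the kind of labeled text a registration agency or indexer can parse.
print(ET.tostring(article, encoding="unicode"))
```

The point is simply that every piece of information is wrapped in an explicit label, so a machine never has to guess which string is the article title and which is an author’s surname.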

How do journals submit machine-readable metadata to Crossref so your system can process it?

PF: We have our own metadata schema that all of our members have to comply with. They have to be able to create XML and send that to us following the rules in our metadata schema. At this point it’s pretty standard practice; most platforms are able to export Crossref metadata. We also have tools for people who aren’t able to generate XML themselves. They can enter their metadata manually and then those tools will generate the XML and send it to our system. But it’s all based on that XML schema, which has pretty strict rules.

I think the core metadata that publishers need to provide is basic citation information so that the metadata records we have can be identified. So the big things are title, author names, and any kind of identifiers like an ISSN needed to make a complete metadata record. We also have a lot of extra and optional metadata fields, like funding data. A lot of our members are sending that to us now. The main thing to remember overall is that you won’t have any data associated with your articles’ DOIs unless you send us clean machine-readable information - title, author names, keywords, and everything else we need to create a rich metadata record.
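One way to check which of those core fields actually reached Crossref for a given article is to look up its DOI in Crossref’s public REST API at api.crossref.org. The sketch below is illustrative: the DOI is a placeholder, and the JSON field names are written from memory of the API’s common output, so verify them against the API documentation.

```python
# Sketch: look up the metadata Crossref has on file for a DOI via the public
# REST API. The DOI is a placeholder; field names are from memory of the
# API's JSON output, so check them against the documentation.
import json
import urllib.request

doi = "10.12345/example.1"  # replace with one of your registered DOIs
url = f"https://api.crossref.org/works/{doi}"

with urllib.request.urlopen(url) as response:
    record = json.load(response)["message"]

print("Title:  ", record.get("title"))
print("Authors:", [(a.get("given"), a.get("family"), a.get("ORCID"))
                   for a in record.get("author", [])])
print("ISSN:   ", record.get("ISSN"))
print("References deposited:", record.get("reference-count"))
```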

What are metadata best practices that you recommend? What would you say are the major dos and don’ts?

PF: We have some pretty simple, idealistic best practices for journal metadata, which are that metadata should be clean, correct, and complete. That can be pretty tricky. I think for journals, in particular, it’s important to pay attention to author information - that’s where we can have a lot of issues with incorrect data. Including ORCIDs in metadata goes a long way to help clean that up. If publishers aren’t already doing that they should really consider collecting ORCIDs from authors. My other advice is to avoid cutting corners. You may say “it takes X amount of work to quality control our metadata to make sure that we don’t have any funny characters in our titles, I don’t think it’s worth it.” But journals should know that it is worth it. The metadata travels very far and it’s important to make sure that the details are correct.
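Putting “clean, correct, and complete” into practice can start with a simple pre-deposit check on your own records. The sketch below is hypothetical - the record layout and rules are made up for illustration and are not part of any Crossref tooling - but it shows the spirit of the advice: flag missing ORCIDs and suspicious characters before the metadata travels anywhere.

```python
# Hypothetical pre-deposit check in the spirit of "clean, correct, complete":
# flag missing ORCIDs and suspicious characters in titles. The record layout
# is made up for illustration, not a Crossref format.
import re

record = {
    "title": "A Study of Metadata Qualityâ€”Part 1",  # mojibake slipped in
    "authors": [
        {"given": "Ada", "surname": "Lovelace", "orcid": None},
        {"given": "Grace", "surname": "Hopper",
         "orcid": "https://orcid.org/0000-0000-0000-0000"},  # placeholder
    ],
}

problems = []

# "Funny characters": control characters or tell-tale mojibake sequences.
if re.search(r"[\x00-\x1f]|â€", record["title"]):
    problems.append("title contains suspicious characters")

for author in record["authors"]:
    if not author.get("orcid"):
        problems.append(f"no ORCID for {author['surname']}")

print(problems or "metadata looks clean")
```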

Can you explain the new Participation Reports tool? How can members use it?

PF: Our Participation Reports tackle one of the biggest challenges our members have expressed with metadata, which is being unsure whether or not the metadata they’ve submitted is complete in our system. Basically, it looks at whether you are sending us all of the metadata that you think you’re sending.

When you first enter the Reports page there is a search box that you can use to look up reports by member name or publisher name. Then, once you’re in the publisher view, you can search for a specific article title. When you pull up the publisher view you will see a summary of all of the metadata overall at your journal or journals and then you can dive deeper from there. You can see what the status of metadata for current content is, you can see exactly how many items have been registered with Crossref overall, and more.

It’s a very simple interface but it surfaces a lot of valuable information. For example, it displays whether or not a member is sending us full references and what portion of a journal’s articles have DOIs populated with reference metadata. It doesn’t give you information about the underlying quality of your metadata. For example, it won’t be able to tell if your author names are correct, but it displays how many articles have ORCIDs and that sort of thing. And that goes a long way for a lot of our members to see where they may be coming up short - especially members who work with vendors and might not have a clear picture of what they’re sending us. Or some members work with a system that is processing a high volume of articles, and it can be really hard to check every article to see, for example, if the references they are sending with articles are reaching Crossref every time. But now if they look at the Participation Report they can see things like - “I have only gotten fifty-two percent of my references populated; that tells me I need to dig a little deeper into what I am sending to Crossref.” So I think of it as a very simple but powerful tool. And it’s public, so other people can look at a publisher’s metadata.
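For members who would rather script this kind of check than click through the interface, similar coverage figures are also exposed through Crossref’s public REST API members endpoint. Treat the sketch below as a rough outline: the member ID is a placeholder, and the exact keys inside the coverage object may differ from what the API actually returns, so the Participation Reports page remains the canonical view.

```python
# Sketch: fetch coverage figures similar to a Participation Report via
# Crossref's public REST API. The member ID is a placeholder, and the shape
# of the "coverage" object may differ from what this assumes.
import json
import urllib.request

member_id = "1234"  # placeholder: your Crossref member ID
url = f"https://api.crossref.org/members/{member_id}"

with urllib.request.urlopen(url) as response:
    member = json.load(response)["message"]

print("Member:", member.get("primary-name"))
# Coverage values are fractions (e.g. 0.52 means "fifty-two percent of my
# references populated"); print whatever keys the API returns.
for field, value in sorted(member.get("coverage", {}).items()):
    if isinstance(value, float):
        print(f"{field}: {value:.0%}")
    else:
        print(f"{field}: {value}")
```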

What are the main benefits of journals using these reports to check their metadata? What do you hope will come from this new tool?

PF: I think that the real benefit to our members is that it gives them a snapshot of how they are doing in terms of sending us quality metadata. We have found in discussions with members that a lot of them want to send us better metadata but they sometimes struggle to get an overall picture of how they’re doing with their Crossref records. So this is kind of the first step towards letting publishers see that for themselves. This makes it easy for them to look at their journal articles and say things like - “so and so is supposed to be sending this information and I don’t see it here, I’m going to follow up with that.”

I think the other thing that we are hoping is that this new tool will raise awareness about the different types of metadata that can be sent to us. Our members know they need to send us basic citation metadata, but they might not know that they can send us text and data mining URLs for example. By looking at this report members can see the metadata options they are not yet using and that might spark something on their end to start sending us that new piece of metadata.

The Participation Report tool is still in the beta stage, and we would love to have more feedback on it and more feedback in general on what our members need from us to understand and improve the quality of their metadata.
