Image Credit: Ishan on Unsplash

In the scholarly communication ecosystem, metadata is one of the primary means by which information flows between different groups and systems. Metadata, which is any data applied to a digital object that provides information about the object’s contents, can be titled and organized in a variety of ways. However, in order for metadata to flow freely between groups, agreed-upon data standards, use cases, and distribution processes are essential.

While there are fairly well-established metadata norms for scholarly communication outputs, there are still many grey areas in terms of how metadata is defined, organized, and shared. Since 2016, members of the Metadata 2020 initiative have been working to address these areas. The goal of the initiative is to understand how metadata is used throughout the research lifecycle and to develop recommendations to improve metadata, including:

  • Defining key metadata terms
  • Developing shared data mapping standards and expectations
  • Establishing metadata best practices and principles
  • Highlighting the benefits of creating rich metadata and the stakeholders involved

In the interview below, Metadata 2020’s chief coordinator Laura Paglione discusses how Metadata 2020 got started, the stakeholders involved, and the projects they’re working on.

Q&A with Laura Paglione

Can you briefly overview how Metadata 2020 got started and how the different community groups involved were formed?

LP: Metadata 2020 is an initiative to advocate for richer metadata throughout the entire research ecosystem. So we’re looking at the different groups that metadata impacts, including researchers, publishers, librarians, data publishers, repositories, and service providers, to see how those groups are affected by metadata and the different roles that each group plays in developing and using metadata. We’re really focused on the benefits of having richer metadata.

The initiative was started about two years ago and, as it says in the name, the goal of Metadata 2020 is to have tangible insights by the end of 2020. So we’re right at the midpoint in the process now. A majority of the work that we’re doing is with committees made up of volunteers who are interested in effecting change. My role is to coordinate their efforts and help with the overall strategy of Metadata 2020 along with Ginny Hendricks and John Chodacki, who are essentially the founding parties of Metadata 2020.

The vision for Metadata 2020 was developed in 2016, but it was really 2017 when we started having workshops and bringing people together to unpack the different groups involved in metadata and the main benefits and challenges that they saw. We started out with an advocacy group in 2017, and that quickly grew into several community groups. There are now six community groups, each talking about how metadata affects their part of the scholarly ecosystem and working on specific outputs.

Can you give an overview of some of the main metadata challenges the community groups are working to address?

LP: Each one of the project teams came up with a set of challenges that they feel are important and that they’re now working to address. For example, one group is working on “researcher communications,” including how different groups talk to researchers about metadata, what researchers think about metadata, and how metadata affects them. There’s another group that is primarily concerned with the metadata terms that we all use and how we define them, as well as how metadata schemas are conceptualized. One challenge the groups are looking at is the fact that sometimes different terms are used to represent the same thing in different metadata schemas, which can cause confusion and make it harder for technical systems to talk to each other. Groups are also looking at the concept of richer metadata and what that means in different contexts and for different audiences. Another area being worked on is automation: we’ve been exploring what metadata automation exists and where there are opportunities.

Overall, probably the most important thing that all of the groups have been doing is gathering stories to understand how metadata is used and what the main challenges are to having richer, more unified metadata.

Can you explain what you mean by richer metadata and why having richer metadata is so important?

LP: At Metadata 2020, we recognize that the term “richer metadata” means different things in different contexts. We feel like there isn’t really one definition for what “richer” means. Richer for one audience might mean that there are valid dates on information and all of the contributors to a particular item are included. For another audience, it might mean that the information about where the content came from is very rich and deep. So it really is dependent on the context. We’ve been talking about aspects of “richer metadata” that we feel are important in core areas, including the idea of metadata being complete. Completeness is really relative depending on the situation, but essentially we mean that metadata is credible and that it includes reference metadata about the sources the content draws on, so that you can trace it back to the source. Complete metadata should also be maintained over time within the context of the item it’s being applied to. Sometimes metadata doesn’t need to be maintained for eternity and a shorter life span for the data is appropriate. So it is very context dependent, and that has to be defined by the communities that are using the metadata.

What is the scope of metadata usage that you’re working within?

LP: Everything that we’re doing is around understanding how metadata is supporting future research now and what needs to be done to improve metadata for this purpose. A core concept that we’ve been focusing on is that if metadata is good for the researcher, it’s good for everyone: it’s good for the publisher, it’s good for digital repositories, and it’s good for whoever is using it. When we talk about metadata being good for researchers, we mean that metadata should be making scholarly literature and resources more discoverable so that research can advance. So our focus is pretty narrow in that respect. It’s not about all kinds of metadata for all purposes; it’s really about the advancement of research knowledge.

We’ve been trying to make this not just be a technical problem. We’ve been talking about the business impacts that metadata has and how changes in incentives or changes in how metadata is produced can change the cost of producing and consuming metadata. We’re really focused on answering questions around what the underlying incentives are for having richer metadata and how to get those benefits.

In terms of coming up with broad best practices for metadata, what are the main challenges you see?

LP: The core understanding that we’re gaining is that nobody is against the idea of making metadata better. The roadblock is that improving metadata can be costly, and many people question the direct benefit that they or their group will get from metadata improvements. So what we’ve been looking at is how we measure the impact of good versus bad metadata to make the case for having good metadata wherever you are in the research lifecycle, whether you’re a curator of metadata, a creator of metadata, a custodian, or a consumer.

One of the ways that we’ve been trying to gain insights into this is by doing flow diagrams of where metadata goes, who is touched by different components of metadata, and at what point in the research lifecycle different groups are affected. We’re looking at where metadata affects individuals and organizations that are using it or producing it so that we can start quantifying what the true impact is of having richer metadata, what the benefit is, and what the returns could be in different contexts. That is sort of the end game of what we’re trying to achieve, and we’re right in the middle of that, so we don’t have findings just yet.

How are you planning on collecting feedback from the scholarly community?

LP: During the first half of this year we’re going to be releasing all of the findings from the work that more than 150 volunteers have led and contributed to. In some cases that will mean sharing clear results from research, and in other cases that will mean sharing what the group learned from the research and other areas they’ve identified that need to be addressed in order to come up with clear results.

Those findings are going to be released between now and mid-year, and we are going to be doing that in a variety of ways. For many of the projects, we’ll be running webinars to kick off the feedback process. We’ll also be publishing all of the project results in a way that allows people to give written feedback.