Image Credit: Mimi Thian on Unsplash

The quality of the machine-readable metadata associated with academic journal articles is virtually as important as the quality of the research itself. Online search engines and academic indexes require machine-readable metadata to ingest and interpret information about journal articles. Without rich machine-readable metadata, the potential reach and impacts of journal articles are sure to be stunted because search engines and indexes will struggle to parse the articles and return them in relevant search results.

Most journal publishers know the importance of article-level metadata. But producing machine-readable metadata for all articles can be a challenge. Many journal teams lack the technical knowledge or time to create rich machine-readable metadata for all of their articles and are missing out on search optimization and indexing opportunities as a result.

Sound familiar? You’re not alone.

At Scholastica, we believe that all journals should be able to have high-quality metadata without technical hassles or manual work. So we’re making metadata production easier for journals. Scholastica automatically produces machine-readable metadata for all of the articles published using our open access publishing platform, and we make it easy to apply metadata collected during peer review to published articles, saving journals time.

In this post, we give an overview of the role machine-readable metadata plays in article discovery and explain how Scholastica is helping open access journals produce the rich machine-readable metadata they need.

Web page metadata for all published articles

The first step to journal discoverability is having a search-optimized website. Many web search engines, like Google and Google Scholar, index content using “web crawlers,” also known as “spiders” or “bots,” which are automated programs that systematically “crawl” websites to identify and ingest new information. When web crawlers come to a journal website (or any website for that matter), they look for HTML meta tags. HTML meta tags are code tags that provide descriptive metadata to search engines in a format they can parse. In order for crawlers to be able to parse journal articles, bibliographic article information must be accessible to them in HTML meta tags.
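To make the idea of bibliographic meta tags concrete, here is a minimal Python sketch of how an article record could be turned into HTML meta tags. The `citation_*` tag names follow the convention described in Google Scholar's inclusion guidelines; the function name, the dictionary keys, and the sample article are all made up for illustration, and this is not a description of the exact tags Scholastica outputs.

```python
from html import escape


def render_citation_meta_tags(article: dict) -> str:
    """Render bibliographic metadata as HTML <meta> tags, one per line.

    `article` is a hypothetical dictionary; its keys are assumptions
    made for this sketch, not a real schema.
    """
    pairs = [
        ("citation_title", article["title"]),
        ("citation_journal_title", article["journal"]),
        ("citation_publication_date", article["published"]),
    ]
    # One citation_author tag per author, kept in byline order.
    pairs += [("citation_author", name) for name in article["authors"]]
    # The PDF link is optional in this sketch.
    if article.get("pdf_url"):
        pairs.append(("citation_pdf_url", article["pdf_url"]))
    # Escape values so quotes or ampersands cannot break the attribute.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in pairs
    )


# Placeholder data, invented for the example.
example_article = {
    "title": "An Example Article",
    "journal": "Journal of Examples",
    "published": "2020/01/15",
    "authors": ["Doe, Jane", "Roe, Richard"],
    "pdf_url": "https://journal.example.org/article/1234.pdf",
}

print(render_citation_meta_tags(example_article))
```

The resulting lines would sit inside the `<head>` of the article's web page, where a crawler can read them without having to interpret the visible page layout.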

It’s also important for crawlers to be able to quickly locate individual articles and the article-level metadata associated with them. To optimize articles for search engines, each journal article should be hosted on its own webpage (with a unique URL) that includes bibliographic article-level metadata. Some search engines like Google Scholar will only index articles if they are hosted on their own webpage, so this is very important!

Journals that use Scholastica’s open access publishing platform are able to publish on search-optimized websites without any added work. Journals get a website template structured to enable search engine indexing that includes HTML metadata for all articles. Each article is published on its own webpage within the journal website, using a search-friendly URL structure, and articles automatically include bibliographic metadata formatted in HTML meta tags. Scholastica ensures that article pages are available to web crawlers at all times and that they’re easy for search engines to parse. So all articles published using Scholastica’s open access publishing platform can be indexed by web search engines right away.
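If you're curious what "easy for search engines to parse" looks like in practice, the sketch below reads the meta tags out of an article page roughly the way a crawler might, using only Python's standard library. It is a simplified illustration, not how any particular search engine works; the class name, function name, and sample page are invented for the example.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect the name/content pairs of every <meta> tag on a page."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # A tag name can repeat (for example, one tag per author),
            # so values are stored in a list.
            self.tags.setdefault(name, []).append(content)


def extract_meta_tags(html: str) -> dict:
    collector = MetaTagCollector()
    collector.feed(html)
    return collector.tags


# A made-up article page, trimmed to the essentials.
sample_page = """<html><head>
<meta name="citation_title" content="An Example Article">
<meta name="citation_author" content="Doe, Jane">
<meta name="citation_author" content="Roe, Richard">
</head><body><h1>An Example Article</h1></body></html>"""

found = extract_meta_tags(sample_page)
print(found["citation_title"])
print(found["citation_author"])
```

A quick check like this can help a journal team confirm that the title and every author actually appear in the page's metadata, not just in the visible text.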

Machine-readable XML files with rich article-level metadata

Example of XML article-level metadata file produced by Scholastica

Other core discovery outlets for academic journals are deposit-based abstracting and indexing databases (A&Is) and public archives. If you want your journal to be discoverable in relevant A&Is, such as MEDLINE, as well as many archives, like PubMed Central, being able to submit article details to those databases in machine-readable XML is a must.

With Scholastica’s open access publishing platform, journals don’t have to worry about figuring out how to format XML article-level metadata files. Scholastica automatically generates front-matter XML files for all journals using our open access publishing platform. The files contain the core metadata required by publishing standards organizations and open access initiatives like Plan S, including:

  • Publisher
  • Journal title
  • ISSN
  • Article title
  • Author names
  • Abstract
  • Persistent Identifiers (e.g. ORCID, DOI)
  • Copyright license

Journals using Scholastica for typesetting also have the option to add funding metadata to articles. Scholastica formats all XML article files in the “Journal Article Tag Suite” (JATS) standard developed by the National Information Standards Organization (NISO), so they are ready to be ingested by major academic discovery services. XML files are easily accessible from article pages: just click “Save article as” and select “XML” as shown in the GIF above.
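For readers who want a feel for what front-matter XML contains, here is a rough Python sketch that assembles the fields listed above into JATS-style elements. The element names (`journal-meta`, `article-meta`, `contrib-group`, `permissions`, and so on) come from the JATS tag set, but the function, the dictionary keys, and all sample values are invented for illustration, and the output has not been validated against the JATS DTD. It is not a reproduction of the files Scholastica generates.

```python
import xml.etree.ElementTree as ET


def build_front_matter(meta: dict) -> ET.Element:
    """Build a minimal JATS-style <article><front> tree from a dict."""
    article = ET.Element("article")
    front = ET.SubElement(article, "front")

    # Journal-level metadata: journal title, ISSN, publisher.
    journal_meta = ET.SubElement(front, "journal-meta")
    journal_titles = ET.SubElement(journal_meta, "journal-title-group")
    ET.SubElement(journal_titles, "journal-title").text = meta["journal_title"]
    ET.SubElement(journal_meta, "issn").text = meta["issn"]
    publisher = ET.SubElement(journal_meta, "publisher")
    ET.SubElement(publisher, "publisher-name").text = meta["publisher"]

    # Article-level metadata: DOI, title, authors, license, abstract.
    article_meta = ET.SubElement(front, "article-meta")
    doi = ET.SubElement(article_meta, "article-id", {"pub-id-type": "doi"})
    doi.text = meta["doi"]
    article_titles = ET.SubElement(article_meta, "title-group")
    ET.SubElement(article_titles, "article-title").text = meta["article_title"]

    contribs = ET.SubElement(article_meta, "contrib-group")
    for author in meta["authors"]:
        contrib = ET.SubElement(contribs, "contrib", {"contrib-type": "author"})
        if author.get("orcid"):  # ORCID iD is optional per author
            orcid = ET.SubElement(
                contrib, "contrib-id", {"contrib-id-type": "orcid"}
            )
            orcid.text = author["orcid"]
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = author["surname"]
        ET.SubElement(name, "given-names").text = author["given_names"]

    permissions = ET.SubElement(article_meta, "permissions")
    license_el = ET.SubElement(permissions, "license")
    ET.SubElement(license_el, "license-p").text = meta["license"]

    abstract = ET.SubElement(article_meta, "abstract")
    ET.SubElement(abstract, "p").text = meta["abstract"]
    return article


# Placeholder values only; none of these identify a real article.
example_metadata = {
    "publisher": "Example University Press",
    "journal_title": "Journal of Examples",
    "issn": "0000-0000",
    "doi": "10.1234/example.5678",
    "article_title": "An Example Article",
    "authors": [
        {"surname": "Doe", "given_names": "Jane",
         "orcid": "https://orcid.org/0000-0002-1825-0097"},
        {"surname": "Roe", "given_names": "Richard"},
    ],
    "license": "Distributed under a CC BY 4.0 license.",
    "abstract": "A short abstract.",
}

root = build_front_matter(example_metadata)
print(ET.tostring(root, encoding="unicode"))
```

Because every field lives in a predictable, named element, an index or archive can pull out exactly the piece it needs (say, the ISSN or an author's ORCID iD) without guessing.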

Import metadata from peer review, saving you time

Import peer-reviewed article and metadata into Scholastica's publishing platform

Another aspect of metadata management that we know can be tricky for journals is moving metadata. No journal team likes having to manually apply the metadata they collected with article submissions to final published articles. At least not any that we know of!

Applying metadata to published articles is easier with Scholastica. Journals that use Scholastica for peer review and publishing can import all of their accepted articles and accompanying metadata straight from peer review into publishing. We make sure that the metadata can be easily parsed by search engines so all of your articles will be discoverable online.

Helping to make journals more discoverable: Metadata and beyond

At Scholastica, we’re committed to helping publishing programs of any size meet core journal standards sustainably. Producing rich machine-readable metadata for all journals using our open access publishing platform is just one of the ways we’re doing this. We’re also working to help journals submit article-level metadata to discovery services more easily with automated deposits to major indexes, including the Directory of Open Access Journals (DOAJ).

Journals that use Scholastica’s typesetting service can enhance their article indexing and potential usage even further with full-text XML versions of all articles. Full-text XML files include article-level metadata and the full text of the articles in a structured machine-readable format, making it possible for indexes to parse and “understand” whole articles at a deeper level. With full-text XML files, articles can be used for text and data mining, in which scripts or machine-learning tools analyze article information for purposes such as language or data analysis. For example, a scholar might employ text and data mining to compile an aggregate of articles that reference a particular subject or to analyze related data sets within different articles.
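As a simple illustration of that kind of text mining, the Python sketch below scans a handful of full-text XML documents and returns the titles of those whose body text mentions a given term. It assumes JATS-style files with a `<body>` element and an `<article-title>` in the front matter; the function name and the two tiny sample documents are invented for the example, and real mining projects would use far more sophisticated matching.

```python
import xml.etree.ElementTree as ET

TITLE_PATH = "front/article-meta/title-group/article-title"


def titles_mentioning(documents, term):
    """Return titles of articles whose body text contains `term`."""
    needle = term.lower()
    matches = []
    for xml_text in documents:
        root = ET.fromstring(xml_text)
        body = root.find("body")
        if body is None:
            continue  # front-matter-only file: nothing to search
        # itertext() walks nested elements, so words inside inline
        # markup such as <italic> are included in the search.
        body_text = " ".join(body.itertext()).lower()
        if needle in body_text:
            matches.append(root.findtext(TITLE_PATH, default="(untitled)"))
    return matches


# Two made-up, heavily trimmed documents.
corpus = [
    "<article><front><article-meta><title-group>"
    "<article-title>Soil Study</article-title>"
    "</title-group></article-meta></front>"
    "<body><sec><p>We measured <italic>nitrogen</italic> levels.</p></sec></body>"
    "</article>",
    "<article><front><article-meta><title-group>"
    "<article-title>Bird Survey</article-title>"
    "</title-group></article-meta></front>"
    "<body><p>We counted migratory birds.</p></body>"
    "</article>",
]

print(titles_mentioning(corpus, "nitrogen"))
```

This kind of search is only possible because the full text is stored in structured XML; with a PDF alone, a script would first have to reconstruct where the body text starts and ends.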

To learn more about how Scholastica is supporting sustainable open access publishing and helping journals fulfill the Plan S implementation guidelines, visit our Product Roadmap: Plan S, Core Open Access Publishing Standards & Scholastica.