The quality of the machine-readable metadata associated with academic journal articles matters nearly as much as the quality of the research itself. That’s because search engines and scholarly indexes rely on metadata to ingest and interpret information about content so they can serve it up in search results. Without rich machine-readable metadata, discovery services will have a hard time finding and parsing journal articles.
Most journal publishers know the importance of article-level metadata. But producing machine-readable metadata for articles can be a challenge. Many journal teams lack the technical skills and/or time to produce rich machine-readable metadata, and they’re missing out on search optimization and indexing opportunities as a result.
Sound familiar? You’re not alone.
At Scholastica, we believe that all journals should be able to have rich machine-readable metadata without technical hassles or manual work. So we’re making metadata production easier through smart automations. We automatically generate machine-readable metadata for all articles typeset by our digital-first production service and any articles published using our Open Access journal hosting platform. We also make it easy to move metadata between our peer review system, production service, and OA journal hosting platform, so editorial teams don’t have to spend time re-inputting the same information in different places.
In this post, we give an overview of the role of machine-readable metadata in article discovery and explain how Scholastica is helping journals produce the rich machine-readable metadata they need.
The first step to journal discoverability is having a search-optimized website. Many search engines like Google and Google Scholar index content using “web crawlers,” also known as “spiders” or “bots,” which are automated internet programs that systematically “crawl” websites to identify and ingest information about them. When crawlers come to a journal website (or any website for that matter) they look for HTML metatags. HTML metatags are code tags that provide descriptive metadata to search engines in a format they can understand. In order for crawlers to be able to parse journal articles, bibliographic information must be accessible to them in HTML metatags.
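To make the crawler's view concrete, here is a minimal sketch in Python of how a program might pull bibliographic metatags out of an article page. The `citation_*` tag names follow the Highwire Press-style convention described in Google Scholar's inclusion guidelines. The sample page, the `MetaTagCollector` class, and the `extract_meta` helper are all hypothetical and only illustrate the idea; real crawlers are far more involved.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.tags = {}  # tag name -> list of content values, in page order

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content:
            # A name can repeat (one citation_author tag per author).
            self.tags.setdefault(name, []).append(content)


# Hypothetical article page; every value is invented for illustration.
SAMPLE_PAGE = """
<html><head>
<meta name="citation_title" content="An Example Article">
<meta name="citation_author" content="Doe, Jane">
<meta name="citation_author" content="Roe, Richard">
<meta name="citation_publication_date" content="2021/07/28">
<meta name="citation_journal_title" content="Journal of Examples">
</head><body><p>Article text.</p></body></html>
"""


def extract_meta(html_text):
    """Return a dict of metatag names to their content values."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    return collector.tags


if __name__ == "__main__":
    for name, values in extract_meta(SAMPLE_PAGE).items():
        print(name, values)
```

If the metatags were missing from the page, the same program would come back with an empty result, which is roughly the situation a crawler is in when bibliographic information is not exposed in the HTML.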
It’s also important for crawlers to be able to quickly locate individual articles and the metadata associated with them. To optimize articles for search engines, all journal articles should be hosted on their own webpage (with a unique URL) that includes bibliographic article-level metadata. Some search engines like Google Scholar will only index articles if they are hosted on their own webpage, so this is very important!
Journals that use Scholastica’s OA publishing platform are able to publish on search-optimized websites without any added work. Journals get a website template structured to enable search engine indexing that includes HTML metadata for all articles. Each article is published on its own webpage within the journal website, using a search-friendly URL structure, and articles automatically include bibliographic metadata formatted in HTML metatags. Scholastica ensures that article pages are available to web crawlers at all times and that they’re easy for search engines to parse. So all articles published using Scholastica’s OA publishing platform can be indexed by search engines right away.
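For readers curious about what generating those metatags involves, the sketch below turns an article record into metatag markup in Python. It is an illustration only, not Scholastica's implementation: the `ARTICLE` record and the `build_meta_tags` function are hypothetical, and the `citation_*` names are assumed from the Highwire Press-style convention that Google Scholar documents.

```python
from html import escape

# Hypothetical article record; every value here is invented for illustration.
ARTICLE = {
    "title": 'Measuring "Impact" in Small Journals',
    "authors": ["Doe, Jane", "Roe, Richard"],
    "publication_date": "2021/07/28",
    "journal_title": "Journal of Examples",
    "volume": "12",
    "issue": "3",
}


def build_meta_tags(article):
    """Return one <meta> line per bibliographic field (one per author)."""
    pairs = [("citation_title", article["title"])]
    pairs += [("citation_author", author) for author in article["authors"]]
    for field in ("publication_date", "journal_title", "volume", "issue"):
        pairs.append((f"citation_{field}", article[field]))
    # Escape values so quotes or ampersands cannot break the attribute.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in pairs
    )


if __name__ == "__main__":
    print(build_meta_tags(ARTICLE))
```

Escaping the values matters because article titles often contain quotation marks or ampersands, which would otherwise produce malformed HTML that a crawler may fail to parse.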
Scholastica’s OA journal hosting platform also includes built-in search functionality, so metadata applied to published articles is searchable via journal websites. All Scholastica journal websites have a search bar in the upper right-hand corner that readers can use to find relevant articles and blog posts faster. Readers can type in search terms like author names, keywords, and phrases to browse for content on particular topics, and even DOIs to find specific articles. Journals can improve those search results by applying more relevant, semantically related keywords to their articles.
In addition to built-in search for Scholastica journal websites, we also give Scholastica users the ability to search across all articles hosted on our OA publishing platform via a universal “Browse Articles” search page. So editors, authors, and reviewers can explore and stumble upon content from all of the OA journals using Scholastica.
Of course, in addition to online search, the core discovery outlets for academic journals are scholarly abstracting and indexing databases (A&Is) and public archives. If you want your journal to be discoverable in relevant A&Is, such as MEDLINE, and many digital archives, like PubMed Central, being able to submit article details to those databases in machine-readable XML is a must.
Using Scholastica’s production service and/or OA publishing platform, journals don’t have to worry about figuring out how to format XML article-level metadata files. Scholastica automatically generates front-matter XML files for journals with the core metadata required by publishing standards organizations and open access initiatives like Plan S, including:
- Journal title
- Article title
- Author names
- Persistent identifiers (e.g., ORCID, DOI)
- Related article DOIs
- Copyright license
- Funding information
- Journal issue details (e.g., publication date, volume, and issue number)
Scholastica formats all XML article files in the “Journal Article Tag Suite” (JATS) standard developed by the National Information Standards Organization (NISO), so they are ready to be ingested by academic discovery services. The full-text XML of all articles typeset by our production service is also formatted to comply with PubMed Central’s (PMC) indexing criteria, with the option to set up automated PMC deposits through Scholastica.
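To give a sense of what a front-matter file contains, here is a simplified Python sketch that assembles a few of the metadata fields listed above into JATS-style XML. The element names (`front`, `journal-meta`, `article-meta`, `contrib-group`, and so on) come from the JATS tag set, but the output is deliberately incomplete and has not been validated against the JATS schema. The `SAMPLE` record and the `build_front_matter` function are hypothetical, and this is not a description of how Scholastica generates its files.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record; all values are invented for illustration.
SAMPLE = {
    "journal": "Journal of Examples",
    "doi": "10.0000/example.2021.001",
    "title": "An Example Article",
    "authors": [
        {"surname": "Doe", "given": "Jane",
         "orcid": "https://orcid.org/0000-0000-0000-0000"},
    ],
    "year": "2021",
    "volume": "12",
    "issue": "3",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}


def build_front_matter(meta):
    """Build a simplified JATS-style <article><front> element tree."""
    article = ET.Element("article")
    front = ET.SubElement(article, "front")

    journal_meta = ET.SubElement(front, "journal-meta")
    journal_titles = ET.SubElement(journal_meta, "journal-title-group")
    ET.SubElement(journal_titles, "journal-title").text = meta["journal"]

    article_meta = ET.SubElement(front, "article-meta")
    doi = ET.SubElement(article_meta, "article-id", {"pub-id-type": "doi"})
    doi.text = meta["doi"]
    titles = ET.SubElement(article_meta, "title-group")
    ET.SubElement(titles, "article-title").text = meta["title"]

    contribs = ET.SubElement(article_meta, "contrib-group")
    for author in meta["authors"]:
        contrib = ET.SubElement(contribs, "contrib", {"contrib-type": "author"})
        orcid = ET.SubElement(contrib, "contrib-id", {"contrib-id-type": "orcid"})
        orcid.text = author["orcid"]
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = author["surname"]
        ET.SubElement(name, "given-names").text = author["given"]

    pub_date = ET.SubElement(article_meta, "pub-date")
    ET.SubElement(pub_date, "year").text = meta["year"]
    ET.SubElement(article_meta, "volume").text = meta["volume"]
    ET.SubElement(article_meta, "issue").text = meta["issue"]

    permissions = ET.SubElement(article_meta, "permissions")
    license_el = ET.SubElement(permissions, "license", {"license-type": "open-access"})
    ET.SubElement(license_el, "license-p").text = meta["license"]
    return article


if __name__ == "__main__":
    print(ET.tostring(build_front_matter(SAMPLE), encoding="unicode"))
```

A production file would carry more detail (funding information, related article DOIs, full publication dates) and would be checked against the official schema before being sent to an index.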
Another aspect of metadata management that we know can be tricky for journals is moving metadata. No journal team likes having to manually apply the metadata they collected with article submissions to final published articles. At least not any that we know of!
Applying metadata to published articles is easier with Scholastica. Journals that use Scholastica for peer review can import all of their accepted articles and accompanying metadata straight from our peer review system into our production service and/or publishing platform. We make sure that the metadata can be easily parsed by search engines so all of your articles will be discoverable online.
At Scholastica, we’re committed to helping scholarly journal publishers meet the latest industry standards efficiently and sustainably. Producing rich machine-readable metadata for all journals using our production service and/or OA publishing platform is just one of the ways we’re doing this. We’re also working to help journals submit article-level metadata to discovery services more easily with automated deposits to indexes including the Directory of Open Access Journals (DOAJ).
Journals that use Scholastica’s production service also get full-text XML versions of all articles, enhancing their discovery potential further. Full-text XML files include metadata and the full text of the article in a structured machine-readable format, making it possible for indexes to understand the content at a deeper level. With full-text XML files, text and data mining of articles also becomes possible (i.e., using scripts or machine-learning tools to analyze the content). For example, a scholar might employ text and data mining to compile an aggregate of articles that reference a particular subject or to compare related data sets. Having full-text machine-readable files is becoming a must in many disciplines as more scholars incorporate (meta)data analysis into their work, so we want to ensure it’s possible for journal programs of any size.
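As a small illustration of the text and data mining scenario described above, the following Python sketch scans a set of full-text XML documents and lists the articles whose body mentions a given subject. The two sample documents are invented and heavily abbreviated, and the `articles_mentioning` function is hypothetical. The sketch assumes each file follows the JATS layout of a `front` element for metadata and a `body` element for the article text.

```python
import xml.etree.ElementTree as ET

# Two tiny, invented full-text documents in an abbreviated JATS-like layout.
SAMPLE_ARTICLES = [
    "<article><front><article-meta><title-group>"
    "<article-title>Soil Carbon in Wetlands</article-title>"
    "</title-group></article-meta></front>"
    "<body><sec><p>We measured soil carbon across twelve wetland sites.</p>"
    "<p>Soil carbon stocks varied widely.</p></sec></body></article>",
    "<article><front><article-meta><title-group>"
    "<article-title>Urban Bird Surveys</article-title>"
    "</title-group></article-meta></front>"
    "<body><sec><p>This study surveys urban bird populations.</p></sec></body>"
    "</article>",
]


def articles_mentioning(term, xml_strings):
    """Return (title, count) for each article whose body mentions the term."""
    needle = term.lower()
    hits = []
    for xml_string in xml_strings:
        root = ET.fromstring(xml_string)
        title = root.findtext(
            "front/article-meta/title-group/article-title", default=""
        )
        body = root.find("body")
        # itertext() walks every nested element, so markup does not get in the way.
        text = "".join(body.itertext()) if body is not None else ""
        count = text.lower().count(needle)
        if count:
            hits.append((title, count))
    return hits


if __name__ == "__main__":
    for title, count in articles_mentioning("soil carbon", SAMPLE_ARTICLES):
        print(f"{title}: {count} mention(s)")
```

Because the structure separates metadata from body text, the script can report each match by article title while counting only mentions in the article itself, something that is much harder to do reliably with PDFs.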
To learn more about how Scholastica is supporting sustainable Open Access publishing and helping journals meet digital publishing standards and the Plan S implementation guidelines, visit our Product Roadmap: Plan S, Core Open Access Publishing Standards & Scholastica.
This post was originally published on August 29, 2019 and updated with new feature information on July 28, 2021.