Archiving and indexing articles is paramount to expanding the reputation and reach of scholarly journals — but it’s not always easy. Many academic databases require machine-readable metadata to ingest content, which can be challenging for small publishers to produce in-house and, traditionally, expensive to outsource. And, even when archives and indexes do support manual metadata entry, filling out deposit forms can be incredibly time-consuming for journal teams and prone to error. Sound familiar?
At Scholastica, we understand these challenges, and we’re committed to making it possible for publishers of all sizes to reach the highest journal standards more sustainably. So we’re continually enhancing our peer review, production, and OA publishing software and services to help journals reach their archiving and indexing goals with less manual work, fewer technical headaches, and lower costs, including by:
- Automatically generating machine-readable metadata files for journals using our production service and/or OA publishing platform in industry-standard formats
- Regularly taking steps to enrich the metadata we produce to help journals expand the reach and impact of their articles
- Introducing integrations with major indexes and discovery services and enhancing our metadata deposits
- Supporting more required submission form metadata fields for journals using our peer review system
In this blog post, we’re rounding up everything you need to know about Scholastica’s archiving and indexing automations, including our latest metadata enhancements. Read on for the full details!
In the world of abstracting and indexing, producing article-level metadata in machine-readable formats is a must. Indexes can’t “read” article text (at least not yet!). They can only process information available in computer markup languages. The two main types of machine-readable metadata that matter for journals are:
- HTML metatags for crawler-based general and academic search engines like Google and Google Scholar
- XML metadata — the standard markup language used by academic journal archives and indexes
At a minimum, journal publishers should have HTML metatags on all their article web pages and produce front-matter XML metadata files for their articles. However, typesetting articles in full-text XML is preferable to allow for text and data mining, and it’s a publishing best practice. For example, Plan S lists producing full-text machine-readable article files in its strongly recommended technical criteria.
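To make the HTML metatag idea concrete, here is a minimal sketch of how an article page's tags could be generated. The `citation_*` names are the Highwire Press-style tags described in Google Scholar's inclusion guidelines. The function name, the dictionary keys, and the sample article are hypothetical, and this is an illustration rather than how Scholastica's platform builds its pages.

```python
from html import escape


def scholar_meta_tags(article):
    """Render Highwire Press-style <meta> tags for one article page.

    `article` is a hypothetical dict; its keys are assumptions made
    for this sketch, not a real platform schema.
    """
    pairs = [
        ("citation_title", article["title"]),
        ("citation_journal_title", article["journal"]),
        ("citation_publication_date", article["published"]),  # e.g. 2022/05/25
    ]
    # One citation_author tag per author, in byline order.
    pairs += [("citation_author", name) for name in article["authors"]]
    if article.get("pdf_url"):
        pairs.append(("citation_pdf_url", article["pdf_url"]))
    # Escape values so characters like & and " cannot break the markup.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in pairs
    )


example = {
    "title": "Soil Carbon & Cover Crops",
    "journal": "Example Journal of Agronomy",
    "published": "2022/05/25",
    "authors": ["Rivera, Ana", "Chen, Wei"],
    "pdf_url": "https://journal.example.org/article/123.pdf",
}
print(scholar_meta_tags(example))
```

The resulting tags belong in the `<head>` of the article's web page, where crawlers can read them without parsing the article text itself.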
Adding another layer of complexity for many publishers is the need for XML to be in the JATS format. JATS, which stands for “Journal Article Tag Suite,” is a specific way of marking up XML articles developed by the National Information Standards Organization (NISO), and it’s the technical standard for scholarly journals. Most academic databases prefer JATS, and some require it, including all National Library of Medicine (NLM) indexes — PubMed, PubMed Central, and MEDLINE.
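For a sense of what JATS markup looks like, the sketch below builds a small front-matter fragment with Python's standard library. The element and attribute names (`front`, `journal-meta`, `article-meta`, `article-id`, `contrib-group`, and so on) come from the JATS tag suite, but the fragment is deliberately incomplete: a file that validates against JATS or meets an index's requirements needs more elements than shown here. The function name, the sample values, and the placeholder DOI are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def jats_front(journal_title, article_title, doi, authors):
    """Build a simplified JATS-style <article> with front matter only.

    `authors` is a list of (surname, given_names) tuples. This is a
    sketch, not a complete or validated JATS document.
    """
    article = ET.Element("article", {"article-type": "research-article"})
    front = ET.SubElement(article, "front")

    # Journal-level metadata.
    journal_meta = ET.SubElement(front, "journal-meta")
    title_group = ET.SubElement(journal_meta, "journal-title-group")
    ET.SubElement(title_group, "journal-title").text = journal_title

    # Article-level metadata: identifier, title, then contributors.
    article_meta = ET.SubElement(front, "article-meta")
    ET.SubElement(article_meta, "article-id", {"pub-id-type": "doi"}).text = doi
    article_titles = ET.SubElement(article_meta, "title-group")
    ET.SubElement(article_titles, "article-title").text = article_title

    contrib_group = ET.SubElement(article_meta, "contrib-group")
    for surname, given_names in authors:
        contrib = ET.SubElement(contrib_group, "contrib", {"contrib-type": "author"})
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given_names
    return article


root = jats_front(
    "Example Journal of Agronomy",
    "Soil Carbon and Cover Crops",
    "10.5555/example.2022.001",  # placeholder DOI, not a real article
    [("Rivera", "Ana"), ("Chen", "Wei")],
)
ET.indent(root)  # pretty-print; requires Python 3.9+
print(ET.tostring(root, encoding="unicode"))
```

Because every piece of information sits in a named element, an index can pull out the title, DOI, or author list directly instead of guessing from the page layout.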
At Scholastica, we know reaching these kinds of article metadata and formatting standards can be daunting and expensive for small journal teams, so we’re introducing smart automations to make metadata production easier and more affordable. Scholastica automatically generates full-text JATS XML article files with rich metadata for all journals that use our software-based production service. And we create article-level HTML and JATS XML metadata for all journals that use our OA publishing platform. You can learn more about how we’re helping journals produce machine-readable metadata to make articles more discoverable here.
Machine-readable metadata is the foundation of digital content discovery. You can think of the metadata elements attached to articles like strings that search engines and indexes can use to draw connections between pieces of content and return increasingly sophisticated search results, such as all articles associated with a particular author ORCID iD.
At Scholastica, we’re continually taking steps to enrich and enhance the metadata we create to add even more indexing “strings.”
- Adding full-text abstracts: We now include full-text abstracts in the article-level metadata we generate for journals using Scholastica’s OA publishing platform. And we include abstracts in content registration metadata deposits for journals using our Crossref integration, in line with the “Initiative for Open Abstracts” (I4OA). Indexes and discovery services can parse full-text abstracts to draw deeper connections between articles and return more specific search results. Full-text abstracts can also be used for text and data mining to support large-scale content analysis.
- Including journal issue details in Crossref deposits: For journals that follow issue-based publishing models and use Scholastica’s OA publishing platform Crossref integration, we now include issue details (e.g., publication date, volume, and issue number) in all Crossref article-level metadata deposits to make it easier to find and draw connections between articles from particular issues.
- Supporting more required submission form metadata fields: As part of our peer review system submission form customization enhancements, we’ve also made it possible to require authors to input metadata fields, including ORCID iDs and institution details. And we’ve added the option to add custom instructions to submission form fields or sections that journals can use to explain metadata requirements. For example, if your journal wants to require all authors to input ORCID iDs but you’re concerned some may not be familiar with ORCID, you can add custom hint text to that form field with more information.
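If your journal requires ORCID iDs, it can help to know that every iD carries a built-in check character, so mistyped iDs can be caught automatically. The sketch below shows one way to validate the hyphenated form of an iD using the ISO 7064 MOD 11-2 checksum that ORCID documents for its identifiers. The function names are hypothetical, and this is not a description of how Scholastica's submission forms validate input.

```python
import re

# Hyphenated form: four groups of four characters, where the final
# character may be the check character "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if `orcid` is well formed and its check character matches."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


# 0000-0002-1825-0097 is a sample iD commonly used in ORCID documentation.
print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False: check digit does not match
```

A check like this only confirms that an iD is correctly formed. It cannot confirm that the iD belongs to the author who entered it.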
In addition to automatically producing rich machine-readable metadata for journals using our production service and/or OA publishing platform, we’re working to help journals automate deposits to major archives and indexes and get picked up by search engines. We cover some of Scholastica’s key discovery support features below.
Crossref DOI registration: One of the most common indexing requirements and overall discovery best practices that journal publishers should follow is registering Digital Object Identifiers or DOIs for all of their articles. For journals using Scholastica’s OA publishing platform, adding DOIs to articles is easy using our automated Crossref DOI registration integration. We deposit rich machine-readable metadata with all the elements noted above, including full-text abstracts, ORCID iDs, and citation information, to help you get the full discovery benefits of Crossref membership. You can learn how to turn on this integration and the ways Crossref supports content indexing and discoverability in this blog post.
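As a rough illustration of what "rich" metadata means in practice, the sketch below checks a single article record for the elements discussed in this post: abstract, authors with ORCID iDs, references, and issue details. The record layout, field names, and function name are all hypothetical. They are not Crossref's deposit schema or Scholastica's internal format, only a simple way to picture a pre-deposit completeness check.

```python
# Hypothetical field names for a simplified article record.
CORE_FIELDS = ("title", "doi", "abstract", "authors", "references")
ISSUE_FIELDS = ("publication_date", "volume", "issue")


def missing_metadata(record, issue_based=True):
    """List the fields that are absent or empty in a metadata record."""
    fields = CORE_FIELDS + (ISSUE_FIELDS if issue_based else ())
    missing = [name for name in fields if not record.get(name)]
    # Also flag each author who has no ORCID iD on file.
    for author in record.get("authors") or []:
        if not author.get("orcid"):
            missing.append(f"orcid for {author.get('name', 'unknown author')}")
    return missing


record = {
    "title": "Soil Carbon and Cover Crops",
    "doi": "10.5555/example.2022.001",  # placeholder DOI
    "abstract": "",
    "authors": [
        # Fictitious researcher and sample iD used in ORCID documentation.
        {"name": "Josiah Carberry", "orcid": "0000-0002-1825-0097"},
        {"name": "Wei Chen"},
    ],
    "references": ["Example reference entry"],
    "volume": "4",
    "issue": "2",
}
print(missing_metadata(record))
# ['abstract', 'publication_date', 'orcid for Wei Chen']
```

For a journal that publishes on a rolling basis rather than in issues, calling `missing_metadata(record, issue_based=False)` skips the issue-level fields.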
Portico digital archive: We also offer the option to integrate with Portico for journals using Scholastica’s OA publishing platform. When the integration is enabled, Scholastica deposits all new and updated articles into the archive, so Portico always has the latest version. Depositing articles into archives like Portico ensures that they will be available in perpetuity, even if your journal goes out of publication.
PubMed Central/PubMed search: Scholastica’s production service now offers a PMC integration option for journals admitted to that database, so publishers don’t have to wrestle with uploading files to the PMC server. We take care of all initial article deposits, and we even send any revisions made to published articles to PMC automatically. The new integration also makes it possible to speed up indexing in the PubMed search engine since PubMed pulls in content from PMC. Learn more about how we’re supporting journals with PMC indexing here.
The Directory of Open Access Journals (DOAJ): For journals admitted to the DOAJ using Scholastica’s OA publishing software, we also offer automatic DOAJ index deposits. To turn on that integration, follow the steps here. Scholastica will take care of the rest — automatically formatting article XML files to meet DOAJ’s standards and sending new articles to the index.
Google Scholar: We’re also working to help journals hosted on Scholastica’s OA publishing platform get indexed in Google Scholar. Our OA publishing platform includes a customizable journal website template structured to meet Google Scholar’s indexing criteria. And we automatically generate HTML metadata for all articles to make it easier for Google Scholar to find and crawl them. We cover why having your journals indexed in Google Scholar matters more than ever in this blog post.
If you publish a journal in the biomedical or life sciences and you’re looking to add articles to PubMed Central (PMC)/PubMed search, we have good news for you. The full-text JATS XML article files that Scholastica generates for journals using our production service now meet all of PMC’s technical requirements, so they’re ready for deposit with no work on your part. And we’re continuing to enhance how the XML we generate is structured so that it conforms to all of PMC’s preferred style criteria. Having Scholastica automatically typeset articles in PMC-compliant XML is helping publications like the Spartan Medical Research Journal more easily pursue PMC/PubMed search indexing.
Submitting journal content to academic archives and indexes is one of the most beneficial steps that publishers can take to improve their reputation among authors and attract more readers. However, it can also be one of the most challenging. At Scholastica, we’re working to help publishers of all sizes professionalize their journals and reach their archiving and indexing goals.
We’re constantly introducing new features, so stay tuned for more metadata enhancements and archiving and indexing integrations in the future! We’ll be updating this blog post with the latest options.
Update note: This blog post was originally published on the 7th of March 2019 and updated on the 25th of May 2022.