“When XML Marks the Spot: Machine-readable journal articles for discovery and preservation,” a free webinar co-hosted by Scholastica, UOregon Libraries, and the GWU Masters in Publishing program, offers a crash course in the benefits of XML production and use cases. Read on to learn more about the discussion. You can also register to receive the full webinar recording on-demand here!
Are you working with a campus-based journal publishing program and looking for ways to make your articles more discoverable?
Depositing content into relevant academic indexes and archives presents a treasure trove of opportunities to expand your articles' readership while ensuring long-term content preservation. Indexes and archives serve as valuable distribution channels, and inclusion in selective ones (e.g., PubMed Central) signals research quality. You may have heard about Extensible Markup Language (a.k.a. XML), one of the primary machine-readable formats academic databases use to ingest content, and wonder if that’s something you need to reach your journal indexing and archiving goals.
Scholastica recently co-hosted a webinar with the University of Oregon Libraries and the George Washington University Masters in Publishing program that can help: “When XML Marks the Spot: Machine-readable journal articles for discovery and preservation”! During the webinar, Scholastica co-founder and CEO Brian Cody and co-founder and CTO Cory Schires covered the what, why, when, and how of XML article production, including:
- The different types of XML used by indexes and archives
- How producing article-level metadata and/or full-text XML can help unlock journal discovery and archiving opportunities, with examples
- Additional benefits of XML for journal accessibility as well as publishing program and professional development
- When XML is needed and when it may not be the best use of journal resources
- Ways to produce XML, including an overview of Scholastica’s digital-first production service
This free webinar is now available to view on demand. Click here to register to receive the full recording!
Why are we talking about XML? The “When XML Marks the Spot” webinar kicked off with that question. Throughout the webinar, Brian and Cory focused on XML use cases for improving article discoverability and preservation, the tradeoffs of time and money that come with XML article production, and tips to help journal publishers weigh whether it’s right for them.
Cory started off by providing a helpful XML primer, explaining that XML is essentially information wrapped in machine-readable tags, which creates structured data that helps computers disambiguate information and “learn” about content (e.g., via specific metadata elements). He shared examples of Journal Article Tag Suite (a.k.a. JATS) XML from the National Information Standards Organization (NISO), the industry standard for journal publishing designed to support interoperability across scholarly communication systems.
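To make the idea of "information wrapped in machine-readable tags" concrete, here is a minimal sketch in Python. It is not taken from the webinar. The element names (`article-meta`, `article-id`, `article-title`, `contrib`) are real JATS elements, but the fragment is heavily simplified and the DOI, title, and author are made-up placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified JATS-style fragment. The values are placeholders,
# and this is not a complete or valid JATS article.
SAMPLE = """
<article-meta>
  <article-id pub-id-type="doi">10.1234/example.5678</article-id>
  <title-group>
    <article-title>An Example Article</article-title>
  </title-group>
  <contrib-group>
    <contrib contrib-type="author">
      <name><surname>Doe</surname><given-names>Jane</given-names></name>
    </contrib>
  </contrib-group>
</article-meta>
"""


def read_metadata(xml_text):
    """Pull a few labeled fields out of the tagged text."""
    root = ET.fromstring(xml_text.strip())
    # The tag and its attribute say exactly what each value is, so no guessing
    # is needed to tell the DOI apart from the title or an author name.
    doi = root.findtext("article-id[@pub-id-type='doi']")
    title = root.findtext("title-group/article-title")
    authors = [
        f"{name.findtext('given-names')} {name.findtext('surname')}"
        for name in root.findall("contrib-group/contrib[@contrib-type='author']/name")
    ]
    return {"doi": doi, "title": title, "authors": authors}


if __name__ == "__main__":
    print(read_metadata(SAMPLE))
```

Because each value sits inside a named tag, a program such as an index's ingest system can pick out the DOI, title, and authors directly instead of inferring them from the layout of a PDF.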
Brian went on to discuss the distinction between XML metadata, which “identifies or describes the article” (JATS 1.3), and full-text XML, which “conveys the narrative content” (JATS 1.3). He also gave an overview of some of the many “flavors” of XML that different archives and indexes require, including PMC (NLM DTD), Silverchair (SCJATS), Crossref, and DOAJ. Brian explained that many indexes and archives only require XML metadata, not full-text XML, so journals should consider their specific needs when deciding whether full-text XML production is right for them.
The clear-cut cases discussed during the webinar in which journals are sure to benefit from full-text XML article production included:
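The metadata versus full-text distinction can be illustrated with a short Python sketch. It is not from the webinar. It assumes the usual JATS layout, in which article metadata lives under `<front>`/`<article-meta>` and the narrative lives under `<body>`. Both sample documents are hypothetical and far smaller than a real article file.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata-only record: it describes the article but carries no narrative.
METADATA_ONLY = """<article>
  <front>
    <article-meta>
      <title-group><article-title>An Example Article</article-title></title-group>
    </article-meta>
  </front>
</article>"""

# Hypothetical full-text record: the same metadata plus the narrative content.
FULL_TEXT = """<article>
  <front>
    <article-meta>
      <title-group><article-title>An Example Article</article-title></title-group>
    </article-meta>
  </front>
  <body>
    <sec>
      <title>Introduction</title>
      <p>First paragraph of the narrative.</p>
      <p>Second paragraph of the narrative.</p>
    </sec>
  </body>
</article>"""


def describe(xml_text):
    """Report whether a record holds metadata, full text, or both."""
    root = ET.fromstring(xml_text)
    has_metadata = root.find("front/article-meta") is not None
    body = root.find("body")
    # Count paragraphs anywhere inside <body>; zero when there is no body.
    paragraphs = 0 if body is None else len(body.findall(".//p"))
    return {
        "has_metadata": has_metadata,
        "has_full_text": paragraphs > 0,
        "paragraphs": paragraphs,
    }


if __name__ == "__main__":
    print(describe(METADATA_ONLY))
    print(describe(FULL_TEXT))
```

A record like the first is the kind of file a metadata-only deposit involves, while an archive that requires full text would need something like the second, with every section and paragraph tagged.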
- When full-text XML is necessary for indexing (e.g., PubMed Central requires full-text XML)
- When the journal or its production service can generate other formats simultaneously with XML (e.g., PDF and HTML)
- When it’s EASY (e.g., handled by a software/service) and allows the journal to include more Persistent Identifiers (a.k.a. PIDs) in its article-level metadata without more work
- When journal team members are technical and really enjoy working in XML
Brian also shared alternate ways to create structured data for articles in cases where journals don’t have a clear need for XML or lack the time and/or budgetary resources to embark on XML production. He also provided tips for getting the most out of manual metadata submissions to discovery services, noting that both Crossref and the Directory of Open Access Journals (DOAJ) offer manual metadata submission forms.
The webinar also walked through various possible approaches to XML production ranging from DIY to working with vendors/services, and the pros and cons of each, including:
- Hand coding
- OJS (converting files to XML and editing them via OJS plugins per this LPForum 2020 case study)
- Oxygen XML Editor
- Inera’s eXtyles / eXtyles Arc
- Scholastica’s Digital-First Production Service
The webinar wrapped up with key takeaways on XML production use cases and best practices for indexing and preservation and how to determine XML needs and opportunities for journals at every stage.
Below are links to some helpful tools and references from the webinar that we encourage you to check out:
- Library Publishing Curriculum Textbook (Library Publishing Coalition, 2021)
- The Important Role of the Editor in Making Science Accessible (DAISY Consortium, 2021)
- Improving journal metadata outputs, from basics to semantics: Interview with Jabin White (Scholastica Blog, 2019)
- Journal Publishing Tag Library NISO JATS Version 1.3 (ANSI/NISO Z39.96-2021)
- JATS4R (JATS for Reuse), a working group devoted to optimizing the reusability of scholarly content (2020)
- Crossref Web deposit form
- PMC Style Checker
- PMC XML Validator
- Google Scholar Inclusion Guidelines for Webmasters
The full webinar is available for free via an on-demand Zoom recording. Click here to register for the webinar and have the recording link sent directly to your inbox.
We’d love to hear your thoughts on this webinar and invite you to share them in the comments section!
This post was originally published on the 9th of December 2022 and updated on the 20th of January 2023. This webinar aired live on the 19th of January 2023.