Image Credit: Samuel Zeller on Unsplash

When you think about the role of journal publishers in the research lifecycle, content production and dissemination are two givens. But what about content preservation?

As journals move online and articles are published in digital rather than physical formats, it’s becoming increasingly important for publishers to take steps to ensure that their articles will always be available to readers, even if a publication is lost or discontinued. The best way to ensure that journal articles will always be accessible to readers is to deposit all published articles into a long-term digital preservation service or archive.

For open access journals, in addition to ensuring content preservation, archiving can help raise awareness of published articles. Many scholars use public archives, such as the National Library of Medicine’s full-text archive PubMed Central (PMC), to search for relevant content. Depositing articles into publicly accessible archives can expand their reach, their use, and, consequently, their impact.

Today, archiving is considered a core function of journal publishers. For example, publishing standards organizations such as the Committee on Publication Ethics (COPE) and open access publishing initiatives such as Plan S include archiving in their publisher guidelines. These organizations and many others expect journal publishers to deposit all articles into at least one designated long-term archive and to also provide authors with information about where and how they can archive their articles.

In this post, we give an overview of the different archiving options available and best practices for open access journals. We also cover how to establish an archiving plan for the journals you publish as well as an archiving policy for authors.

Journal archiving options: Dark archives and public archives

There are two main archiving options that all open access journals should consider—“dark” archives and publicly accessible archives. Let’s break it down:

  • Dark archives: A dark archive is a private archive that cannot be accessed by any users. The purpose of a dark archive is to secure access to content in the event of a publication being lost or discontinued. Dark archives will only release content when there is a “trigger event” such as confirmation that a journal is no longer in publication. Commonly used dark archives include Portico and CLOCKSS.

  • Public archives: As the name suggests, public archives are openly accessible to users. Public archiving options include preprint servers, public archive databases, and institutional repositories. Some well-known public archives include SSRN, arXiv, PMC, and Deep Blue.

While many public archives are open to content from any author or publisher, some only accept research from the communities that they directly serve. For example, Deep Blue is the official institutional repository for the University of Michigan Libraries and therefore only accepts content produced by the University of Michigan research community. Such institutional archives generally follow an author-deposit model, meaning that the author is responsible for depositing articles into the archive.

When it comes to public archiving options, it’s also worth noting that while the name “preprint” has a pre-publication connotation, preprint servers can and do house many final published versions of articles. Some journals even publish via preprint servers using what’s known as a preprint overlay model. Examples of preprint servers include the social sciences preprint server SocArXiv and the STEM preprint server arXiv.

Which archiving option should you choose for open access journals: a dark archive or a publicly accessible archive? You can get the most benefit from archiving if you deposit articles into both types of archives. Dark archives are among the most secure options because they guarantee access to content in perpetuity. Public archives may also have content guarantees and, as noted, adding articles to them can help expand the reach of your publications.

Joining archives and depositing content

As you’re looking at different archiving options, it’s important to review the requirements for each. Many archives have inclusion criteria, such as only admitting journals that publish within designated subject areas or that meet specified publishing standards. For example, PMC only accepts journals in the biomedical and life sciences and requires journals to meet core publishing standards, including providing publicly available editorial board information and publication policies on the journal website. Archives with higher publishing standards, such as PMC, tend to be more trusted by scholars, improving the reputation and reach of the journals in them.

Many archives also have specific article formatting and deposit requirements to ensure that they can properly process content. In the past, we’ve written about how indexes must ingest content in machine-readable formats in order to parse it, and archives work much the same way. Most archives require journal publishers to deposit machine-readable, front-matter XML metadata files. Front-matter XML files contain the front matter of the article but do not include the article’s actual body text.
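To make the front-matter versus full-text distinction concrete, here is a minimal sketch that classifies a JATS article file by checking which top-level sections it contains. It assumes un-namespaced JATS elements, where metadata lives in front and article text lives in body; the function name jats_file_kind is hypothetical and this is not any archive's own validation tool.

```python
import xml.etree.ElementTree as ET


def jats_file_kind(xml_text: str) -> str:
    """Classify a JATS article file as front-matter-only or full-text.

    Assumption: standard un-namespaced JATS layout, where <front> holds
    the article metadata and <body> holds the article's main text.
    """
    root = ET.fromstring(xml_text)
    if root.find("front") is None:
        raise ValueError("no <front> element; this does not look like a JATS article")
    # A front-matter-only file has metadata but no <body> element.
    return "full-text" if root.find("body") is not None else "front-matter"
```

A file containing only front and article-meta would be reported as "front-matter", while one that also carries a body section would be reported as "full-text".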

For example, Portico’s journal submission guidelines state that PDF renditions of articles must be accompanied by metadata files, preferably in machine-readable XML. Metadata files submitted to Portico must include all front-matter article information, including journal title, ISSN, article ID or Digital Object Identifier (DOI), and a copyright statement. Some archives, such as PMC, require full-text machine-readable XML article file deposits. Full-text XML article files contain the complete article text in machine-readable form along with the front-matter metadata.
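As a rough illustration of what a front-matter metadata file covering those fields might look like, the sketch below builds a minimal JATS-style front section with Python's standard library. The element names are JATS tags, but the output is not validated against the JATS schema or any archive's submission rules, and the function name and sample values are hypothetical. Real production files carry much more metadata (authors, dates, license, and so on).

```python
import xml.etree.ElementTree as ET


def build_front_matter(journal_title, issn, doi, article_title, copyright_statement):
    """Return a minimal, front-matter-only JATS-style XML string.

    Sketch only: covers the fields named above (journal title, ISSN,
    DOI, copyright statement) plus an article title. Not schema-validated.
    """
    fields = {
        "journal title": journal_title,
        "ISSN": issn,
        "DOI": doi,
        "article title": article_title,
        "copyright statement": copyright_statement,
    }
    missing = [name for name, value in fields.items() if not value]
    if missing:
        raise ValueError("missing required metadata: " + ", ".join(missing))

    article = ET.Element("article")
    front = ET.SubElement(article, "front")

    # Journal-level metadata.
    journal_meta = ET.SubElement(front, "journal-meta")
    title_group = ET.SubElement(journal_meta, "journal-title-group")
    ET.SubElement(title_group, "journal-title").text = journal_title
    # JATS 1.1+ uses publication-format; older versions use pub-type="epub".
    ET.SubElement(journal_meta, "issn", {"publication-format": "electronic"}).text = issn

    # Article-level metadata.
    article_meta = ET.SubElement(front, "article-meta")
    ET.SubElement(article_meta, "article-id", {"pub-id-type": "doi"}).text = doi
    article_title_group = ET.SubElement(article_meta, "title-group")
    ET.SubElement(article_title_group, "article-title").text = article_title
    permissions = ET.SubElement(article_meta, "permissions")
    ET.SubElement(permissions, "copyright-statement").text = copyright_statement

    # No <body> element: this is a front-matter-only file.
    return ET.tostring(article, encoding="unicode")
```

For example, build_front_matter("Journal of Example Studies", "1234-5679", "10.1234/example.5678", "An Example Article", "© 2024 The Authors") returns a single article element whose front section holds those values. Platforms that generate JATS automatically do this kind of work for you.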

Producing XML article files is on the more technical side, but today software can automate much of the process. For example, at Scholastica, we produce front-matter machine-readable XML files in the JATS standard for all journals that use our open access publishing platform and full-text JATS XML files for all journals that use our digital-first production service. These files are ready to be ingested by most major archives, so journals using Scholastica don’t have to worry about formatting files.

When publishers are ready to deposit XML files into archives, they can usually do so in one of two ways: uploading article files in batches (usually via an FTP server) or setting up automatic article deposits via an API content deposit feed. API stands for “Application Programming Interface” and is essentially a channel that different software applications can use to communicate with each other.
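For the batch option, a deposit script can be quite small. The sketch below uses Python's standard ftplib module and assumes the archive offers FTP over TLS with a dedicated upload directory; the host, credentials, and the "incoming" directory name are hypothetical placeholders for whatever an archive actually assigns, and some archives use SFTP or other transfer methods that ftplib does not support. Always follow the archive's own deposit instructions.

```python
from ftplib import FTP_TLS
from pathlib import Path


def upload_batch(ftp, files, remote_dir="incoming"):
    """Upload each file in a batch to remote_dir over an open FTP session.

    `ftp` is any logged-in ftplib.FTP/FTP_TLS object (or a test double with
    cwd and storbinary methods). Returns the names of the uploaded files.
    """
    ftp.cwd(remote_dir)  # hypothetical upload directory
    uploaded = []
    for path in map(Path, files):
        with path.open("rb") as fh:
            # STOR uploads the file under the same name on the server.
            ftp.storbinary(f"STOR {path.name}", fh)
        uploaded.append(path.name)
    return uploaded


def deposit(host, user, password, files, remote_dir="incoming"):
    """Connect with FTP over TLS and upload a batch (placeholder host/credentials)."""
    with FTP_TLS(host) as ftp:
        ftp.login(user, password)
        ftp.prot_p()  # encrypt the data channel as well as the login
        return upload_batch(ftp, files, remote_dir)
```

A call such as deposit("ftp.archive.example", "journal-user", "secret", Path("batch").glob("*.xml")) would push every XML file in a local batch folder. An API deposit feed replaces this manual step by having the publishing platform send each article to the archive as it is published.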

At Scholastica, we’re also working to make depositing content into archives easier for journals. If you use Scholastica’s open access publishing platform and you have a Portico account, you can set up automatic article deposits to Portico. To integrate your journal with Portico, just click “Manage Integrations” on your Publishing Settings page > select the Portico integration > and enter your Portico login credentials. We’ll be introducing additional archive integrations in the near future—so stay tuned!

Establishing archiving policies for authors

Many research funders and academic institutions now encourage or require authors to archive their articles. As a result, authors will want to know about your journals’ archiving policies, both in terms of what archiving steps your journals are taking and what steps they can take as authors to preserve their research. You should include information on how you archive articles, as well as a self-archiving policy for authors, on the For Authors page of each of the journals you publish. Even for journals that publish under a CC BY 4.0 license, it’s best to provide authors with an explicit self-archiving policy.

You should also register a copyright policy and self-archiving policy for all of your journals via the SHERPA RoMEO publisher copyright policies and self-archiving database. Researchers use this database to check publisher and journal policies. SHERPA RoMEO is constantly updated with suggestions from its users, based on publishers’ copyright transfer agreements, open access policies, and other publisher documents available online. So before registering an archiving policy in SHERPA RoMEO, it’s a good idea to check whether you are already listed in the database by doing a quick search for your publisher name.

If you’re not yet included in SHERPA RoMEO, you can request to have a Publisher listing and Publisher Policy added to the database. Just click “Suggest” in the top navigation and complete the appropriate form.

Ensuring access to your journal articles in the long term

Archiving is becoming a core function of journal publishers, particularly those producing online-only open access content. The advent of digital publishing made journal articles more accessible than ever before, but it also made them somewhat less permanent. It’s up to publishers and authors to take steps to back up digital content to ensure that research outputs will remain available in the long term. And, with added discovery benefits, archiving can even help expand your journals’ impact.

To learn more about how Scholastica is supporting sustainable open access publishing and helping journals fulfill the Plan S implementation guidelines and core standards like long-term archiving, visit our Product Roadmap: Plan S, Core Open Access Publishing Standards & Scholastica.
