Over the last 10 years, as the Scholastica team has worked to develop modern software and services that make journal publishing more efficient and affordable, article production has stood out as one of the most labor-intensive areas of publishing in need of an overhaul.
Manually formatting PDF articles in InDesign or an equivalent takes a lot of time, and in the age of digital scholarship, PDFs alone just aren’t cutting it anymore. Journals need HTML for online readers and XML for indexes, archives, and discovery services. That’s why Scholastica developed a digital-first production service that generates the PDF, HTML, and XML article files journals need, all at the same time, from a single source.
In the interview below, Scholastica CTO and Co-Founder Cory Schires discusses the development of Scholastica’s single-source production service, launched in 2018, how it differs from other XML-based workflows, and why we’re working to optimize the production process for both humans and machines.
How did the tech team go about exploring possible single-source production approaches? What was the development process like?
CS: Talking to customers, we knew production was a hard problem (probably the hardest technical problem in our industry). So we didn’t have any illusions about this being a quick or easy project.
We started by looking at how other platforms or services attacked the production problem. What we found was a mosaic of different approaches – some technical, some manual – and most often some mixture of the two. Of course, our goal wasn’t to copy an existing design but rather to gather all the good ideas and add a few new ideas to make a superior piece of technology – something that could produce beautiful, professional articles at a fraction of the cost.
As for the development process, we built it the same way we build everything – by following a methodical, iterative process of continuous improvement. In other words, we got something minimally working and gradually improved it by feeding it more and more complex articles.
What challenges did/do you see with XML-first production approaches, and why did the Scholastica development team choose to go in another direction?
CS: It makes perfect sense to follow an XML-first production process. There are lots of existing tools for converting from XML to other formats (e.g., PDF and HTML). And, while these tools are far from perfect, they’re still a strong starting point – especially if your team lacks the technical chops to venture off the beaten path.
We ultimately chose not to place XML at the heart of our single-source production process because, to put it bluntly, XML is not human-friendly. It’s a schema for encoding metadata designed primarily to facilitate interoperability between machines. It’s just not for humans. If it were, it would look very different.
CS: Given the current industry landscape, we knew the production process could never be 100% automated. It doesn’t matter how much machine learning computer magic you throw at it, producing flawless articles will always require some degree of human intervention. But we did want to create a single-source production solution that uses the power of software to automate manual tasks as much as possible.
From the very beginning, we decided to build a software tool that would treat our (human) typesetters like the human beings that they are (and not like robots being forced to parse dense XML documents day after day). In a sense, we approached it like we do all our software. We thought about our users and prioritized compassion, even if it made the technical challenges more difficult.
From a practical standpoint, this also makes sound business sense. By having a tool that’s easier and more pleasant to use, typesetters can actually enjoy their work, which means we can spend less time worrying about training and turnover. We also, of course, continue to improve the code so it’s doing as much of the heavy lifting as possible.
How is Scholastica supporting the flow of metadata and factoring interoperability into its production service?
CS: Supporting the flow and interoperability of metadata is easy – at least conceptually. You need to make sure you have a single, comprehensive, and flawless source of truth. With this in hand, you simply transform the data into whatever other format you need (e.g., PDF or HTML). It’s sort of the opposite of the famous “garbage in, garbage out” adage. If you like, you can think of it as “perfect in, perfect out.”
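As a rough illustration of the single-source idea described here, the following Python sketch keeps one metadata record and derives two output formats from it. Everything in it is an assumption made for illustration: the `Article` class, the `to_xml` and `to_html` functions, the element names, and the placeholder DOI are hypothetical, and none of it reflects Scholastica's actual implementation or a real schema such as JATS.

```python
from dataclasses import dataclass
from html import escape
from typing import Tuple
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Article:
    """Hypothetical single source of truth for one article's metadata."""
    title: str
    authors: Tuple[str, ...]
    doi: str


def to_xml(article: Article) -> str:
    """Render the record as a small illustrative XML fragment (not real JATS)."""
    root = ET.Element("article")
    ET.SubElement(root, "title").text = article.title
    ET.SubElement(root, "doi").text = article.doi
    authors = ET.SubElement(root, "authors")
    for name in article.authors:
        ET.SubElement(authors, "author").text = name
    # ElementTree escapes special characters such as "&" when serializing.
    return ET.tostring(root, encoding="unicode")


def to_html(article: Article) -> str:
    """Render the same record as a minimal HTML fragment."""
    byline = ", ".join(escape(name) for name in article.authors)
    return (
        f"<article><h1>{escape(article.title)}</h1>"
        f'<p class="byline">{byline}</p></article>'
    )


# One record in, several formats out. The DOI below is a placeholder.
article = Article(
    title="Tables & Figures",
    authors=("A. Author", "B. Writer"),
    doi="10.0000/example",
)
xml_output = to_xml(article)
html_output = to_html(article)
```

Because both outputs are computed from the same record, a correction to the title or author list only has to be made once, which is the practical meaning of “perfect in, perfect out.”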
To support the flow of metadata, Scholastica makes it easy for journals using our peer review system to collect key metadata elements in their manuscript submission form. And, from there, we make it easy for journals using our production service and/or Open Access hosting platform to carry that metadata over to those phases of publishing. We’re also continually working to provide journals with even more enriched, interoperable XML metadata that they can easily export and send to other tools and systems, like discovery services. And we’re building ready-to-go integration options within our software as well, like our integrations with Crossref and PubMed Central. So it’s an ongoing process. We’re making it easier for journals to not only get the high-quality metadata they need but also move it where it needs to go.
Of course, the devil’s in the details. And there are many, many details to address when building a production tool.
CS: Machine learning is good at solving probabilistic problems (i.e., problems where the correct answer inherently involves some degree of guesswork). For example, machine learning can predict the likelihood that someone will open a customer outreach email. But will a real person in the world actually read the email? No algorithm can say for sure because no model can fully account for the vagaries of human behavior.
Returning to the question at hand, the process of typesetting an article often requires guesswork. For example, would this table look better on a landscape page? Did the author intend this text to be a header or a paragraph? How should this citation be parsed? And, trust me, I could keep going.
Using machine learning, we can automate some of this guesswork. While I doubt machine learning will ever have the power to automate the production process completely, we can still use it to automate more of the production process – eliminating tedious work and making things more efficient.
At Scholastica, that’s what we’re focused on: providing publishers with solutions that maximize efficiency and, in doing so, help keep costs down.