Considering global advances from the first lunar landing to preparations for humans to venture to Mars (in the seemingly not so distant future!), you’d be hard-pressed to quantify the singular and collective value of the myriad research findings that have helped, and are still helping, to make them possible. Yet, within the academic community, stakeholders must grapple every day with how to effectively evaluate and often “weigh” scholars’ contributions to inform decisions from grant allocation to tenure proceedings. And in this era of unprecedented research speed, many are understandably eager to have numerical research impact measures to more easily interpret ever-expanding digital “piles” of information and the growing number of journals publishing new research.
In an illuminating Nature review, Professor of Informatics at Indiana University Bloomington and President of the International Society for Scientometrics and Informetrics Cassidy R. Sugimoto spoke to the challenges of data-based research assessment and the perils of ignoring the flaws in once-deemed “gold standard” impact measures, including the Journal Impact Factor (JIF) and h-index. Sugimoto argued, “these measures often do more harm than good, creating what economists Margit Osterloh and Bruno Frey call a ‘taste for rankings,’ rather than a ‘taste for science’.” She pointed to established initiatives to reform research evaluation, including the Leiden Manifesto and the Declaration on Research Assessment (DORA).
Of course, such calls for community action rely on just that (widespread activities to create systemic change), and shifting entrenched norms is, well, hard. As exhibited by Sugimoto’s review, guidance for tracking research impacts still leans heavily on bibliometric systems that have been found to perpetuate inequities and reinforce Goodhart’s law: “when a measure becomes a target, it ceases to be a good measure.” The dilemma of figuring out how to use data to assess research outputs and their effects without directly or implicitly aggrandizing ranking-like approaches runs deep.
So what steps can the scholarly community take to reform research evaluation and move away from relying too heavily on finite metrics that could skew the results? And what are the roles of different stakeholders in doing so, including journal publishers?
The first of a two-part virtual NISO panel recently explored how to identify and employ more “meaningful” metrics in line with community initiatives like DORA to support rather than preclude holistic research assessment. “We, as a species, have a tendency to want to give things a grade or rate performance on a scale,” said NISO’s Executive Director Todd Carpenter, setting the tone for the panel. “There’s a fundamental question of measuring value and what value means. Is it something we can easily put a figure on and compare in the way we do prices at a grocery store?”
The resulting discussion between Rachel Borchardt, Science Librarian at American University Library; Brian Cody, Scholastica’s CEO and Co-Founder; Rebecca Kennison, Executive Director of K|N Consultants; Stacy Konkiel, Senior Data Analyst at Altmetric; and Marie McVeigh, Head of Editorial Integrity for Web of Science, yielded critical starting points for reframing conversations about research assessment.
Perhaps the most fundamental aspect of compiling and implementing more meaningful research metrics that the NISO panelists discussed is the importance of putting data into context. And, as the speakers noted, there are multiple facets of context to consider, including:
- The strengths and limitations of different metrics by discipline/subject matter (e.g., some metrics are better suited to certain types of research)
- The intended uses and overall strengths and limitations of particular data points (e.g., altmetrics are “indicators” of impact, not measures of quality and the JIF was never meant to be used to measure the impact of individual articles or scholars)
- The cultural context that a researcher is operating within and the opportunities, challenges, and biases they have experienced
- How and where a research output fits within scholars’ other professional contributions (e.g., recognizing how individual research outputs are part of broader bodies of work and also measuring the impacts of scholarly outputs that do not fit within traditional publication-based assessment systems)
As Rebecca Kennison explained, “We have the famous 3-legged stool of research, teaching, and service, and now each leg has kind of become its own bucket of data to shift analogies. Research is the biggest bucket in many ways because it has the most widgets we can count. There’s research output, grant funding, citations to published works, mentions on social media, and so on. But what’s missing is the context within which even those data points are gathered. Context is crucial to understanding the actual full range of work that’s being done, and that’s true to all disciplines.”
Stacy Konkiel added that factoring non-traditional research outputs into metrics-based assessment is crucial to recognizing the full breadth of scholars’ contributions and where their time is being spent. Konkiel noted how many scholars currently find themselves having to carve out a “second shift” for public scholarly work such as community engagement efforts or doing interviews for media outlets in addition to keeping up with the pervasive “publish or perish” culture. “They’re really having to do that engagement work on their, quote, spare time, and that to me demonstrates this continuing encroachment of work on life.”
Speaking to assessing journal quality for Web of Science, Marie McVeigh said it’s also important for the editors of journals and the scholars publishing in them to understand that indexers are and should be looking at more than just data points to evaluate publications in context. “The metrics that we produce are not the cause of our selection; they are the inheritors of that selection process, which is detailed. We do a data-driven, not metrics-driven, analysis of the value of publications to their communities as well as to the literature. And we look at journals’ adherence to publication standards […], some of that does become field-dependent in the same way that engagement with scholarship or engagement with the public becomes field-dependent in different contexts.”
Building on the need for context in research assessment, the NISO panelists also spoke to the importance of recognizing the network effects of scholarly outputs. Stacy Konkiel gave the example of researchers highlighting the work of colleagues within public engagement. “You have folks who are out there talking about ideas in their field and highlighting the work of other scholars. These are the ties and linkages between research that are difficult for tools like Altmetric to pick up on right now […] public engagement is so much more than how much people are tweeting about our research. So while Altmetric and tools like it get us part of the way, there’s still so much more to be done.”
Kennison later noted, “going back to the public scholarship part, that’s one of the things that we don’t know how to measure at all really. We don’t know how to say this particular project or this particular activity or this particular set of people working together actually transformed this neighborhood or class.”
McVeigh added the importance of also considering researchers’ career trajectories and how they contribute to scholarship within and across disciplines at different stages. “At different phases, different types of activities are expected with different levels of achievement or output or engagement. There are baseline values, and one of them is that publishing is not the only thing you do as a person that promotes scholarship. Teaching is not the only thing you do as a person that promotes scholarship. Also, those things must be done with deliberation and integrity. Those are necessary features and things that we can all agree on.”
McVeigh later spoke to the importance of considering the development stages and varying resources of scholarly journals in title assessment as well. She pointed to the Web of Science’s Emerging Sources Citation Index as an example. “Those are journals that meet our quality criteria but are not necessarily cited. We don’t use citation data as an assessment for them — because maybe a journal hasn’t had enough time to accrue citations. It may also be from a developing nation where the value of that content is certainly worth bringing into this index. How do we elevate the voices that it’s mentioned?”
Of course, putting scholarship into context requires more holistic means of communicating research impacts — an area that the NISO panelists emphasized all stakeholders can support, including research funders, institutions, publishers, discovery services, and academics. To enable more well-rounded research assessment, Rachel Borchardt said facilitating the creation of research narratives is paramount. “Metrics can only do so much. Numbers can only do so much. What we’re after is stories. It’s making your case in a way that makes sense for you as a researcher, that makes sense within your discipline, and that makes sense within your institution. That’s a lot of different things, and, ideally, they should all line up.”
Scholastica’s Co-Founder Brian Cody spoke to how journal editors seeking to demonstrate why they should receive support for a publication or highlight their editorships in tenure review can also take a narrative approach to conveying journal impacts. “On the technological side, we do see people coming to us saying ‘I need data to show the value of my publication or this article.’ And in those cases, you can get a number, but I think you also need to ask — do these numbers matter? Should you be making the case that the existence of this journal is actually a good? That’s a normative argument, not a quantitative one, so it requires shifting the paradigm. And that can sometimes be helpful and sometimes not depending on the power dynamics at play.”
In a related vein, Borchardt spoke to the need for research evaluation standards to be more inclusive. “We need to be inclusive and allow researchers to make their own case of saying this is my value, this is my impact, and this is how I want to measure it. I think the more we allow interplay between qualitative and quantitative conversations to happen rather than saying things like ‘everyone must demonstrate their Impact Factors,’ the more successful we can be.” Borchardt noted that doing this will require disciplinary and institutional leaders to come together and initiate conversations about the use and misuse of research metrics, the research values they care about, and when metrics may or may not be the best means of communicating that information.
Borchardt later added, “one of the things that kind of underpins a lot of this conversation is that our traditional tools have been built by power structures that exist in academia. The more that we can break away from traditional measures, the more we can say, it’s okay to fail, it’s okay to express research in different ways, it’s okay to publish somewhere other than a high-impact journal. Then we are not just valuing what we’ve traditionally valued, and we can start to find different ways to celebrate failure, for example.”
A theme highlighted throughout the NISO panel was that research metrics reform requires recognition of the multifaceted nature of the scholarly communication ecosystem. Marie McVeigh provided a helpful analogy saying, “I like to think of the effort of research assessment as creating a mosaic of metrics or points of information. It’s about thinking in terms of how to assemble things in a meaningful way to create a bigger picture, rather than whether any one tile is a good piece to put into the mosaic. How do you bring everything together so that you’re representing something far more complex?”
To return to Sugimoto’s article: as she powerfully conveyed, “science does not happen in a vacuum,” and research evaluation shouldn’t either.