For more than six years, a team at Harvard Law School’s Library Innovation Lab has been working on the Caselaw Access Project (CAP), an initiative to digitize 360 years’ worth of United States court cases dating from 1658 to 2018. The project was initiated to make case law freely and easily available to legal scholars and the public. Last month, the team’s labors came to fruition with the official launch of CAP. The published CAP corpus comprises 6.4 million unique cases and over 40 million pages of U.S. federal, state, and territorial case law documents from the Law School library.
CAP was funded and made possible by Harvard Law School and, in part, through a partnership with legal research and analytics startup Ravel. The new digital repository will help lower the cost of accessing historical court cases, and it opens up new opportunities for legal scholars and programmers to process large sets of legal data via the CAP API and bulk data service. The CAP API lets users browse and download cases using a few short commands, and its “bulk data” feature lets them download whole collections of content as zip files.
In the interview below, Kelly Fitzpatrick, Research Associate at Harvard University’s Berkman Klein Center for Internet & Society, discusses how CAP got started and the goals of the project.
KF: The Caselaw Access Project makes the corpus of published U.S. case law freely available online. After digitizing the collection of U.S. case law held by the Harvard Law School Library, we’re making that data available as part of the Caselaw Access Project API and bulk data service.
KF: The Caselaw Access Project was started with the goal of putting all published U.S. case law online in a freely available and usable way. Based at the Harvard Library Innovation Lab, we partnered with legal startup Ravel to support the digitization of the Harvard Law School Library case law collection. Over the past few years, we’ve had the benefit of working with an extended team, from digitization specialists to developers and designers, which I joined this September to work on building a research community around this dataset.
Between 2013 and 2018, the Library digitized over 40 million pages of U.S. court decisions. How did you process that information and what will legal scholars be able to do with it via the CAP API and bulk data service?
KF: There was a long road between starting this project and launching the CAP API and bulk data service. The collection was digitized, processed with Optical Character Recognition (OCR), and tagged with case-level metadata. This is the first time the entirety of published U.S. case law has been available at this scale. We’re hoping that the CAP API and bulk data service will facilitate new types of scholarship, like using lines of code to find answers in 360 years of U.S. legal history.
KF: It’s been great watching the research community start to develop! In the weeks since launch, we’ve had researchers from across academic disciplines express interest in working with the data. Our biggest goal here is having people start using this data to support their scholarship, build new things, or apply new methodologies to answer research questions, so seeing users start to generate this kind of activity is really something.
KF: Right now, we’re working on developing a research community around this dataset. We want to see how researchers are using Caselaw Access Project data, and we look forward to learning more as examples roll in. We’re excited to see where this goes!