Caselaw Access Project gives free access to 360 years of American court cases
A major project digitizing court cases launched Monday, making available 6.4 million American cases dating back to 1658.
The Caselaw Access Project—a partnership between Harvard Law School’s Library Innovation Lab and the legal research company Ravel Law—spent the last three years digitizing 627 reporters for a total of 40 million scanned pages. Outside of the Library of Congress, it is the most comprehensive database of its kind—totaling 200 terabytes of information.
“A project like this should be unnecessary,” Adam Ziegler, director of the Library Innovation Lab, told LawSites Blog. “But many states are still putting stuff in books first.”
The collection includes nearly all cases from American courts, including territorial courts, from the 1658 Maryland case William Stone against William Boreman through June 30, 2018. It is unclear whether the collection will be expanded in the future. The Caselaw Access Project says that it does not include “cases not designated as officially published, such as most lower court decisions; non-published trial documents such as party filings, orders, and exhibits; parallel versions of cases from regional reporters, unless those cases were designated by a court as official; [and] cases officially published in digital form, such as recent cases from Illinois and Arkansas.”
The project offers two ways to access the data: through an API and as a bulk download. The API, or application programming interface, is a conduit between the database and a user, giving remote access to the entire database.
The database information is free to the public, though per a deal with Ravel, no one can access more than 500 full-text cases a day. However, researchers can agree to different terms that lift the limit.
This cap also does not apply to “whitelisted jurisdictions”—jurisdictions that already make their new cases freely available online. Currently, that only includes Arkansas and Illinois. A user can also bulk download every case from these two jurisdictions in one fell swoop.
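For developers planning around these limits, the access rules described above can be sketched as a small client-side counter. This is only an illustration built from the figures in this article (500 full-text cases a day, with Arkansas and Illinois exempt). The class and method names are hypothetical and are not part of the Caselaw Access Project's API, and the sketch assumes the count resets each calendar day.

```python
from dataclasses import dataclass, field
from datetime import date

# Figures taken from the article; every name below is illustrative.
DAILY_FULL_TEXT_LIMIT = 500
WHITELISTED_JURISDICTIONS = {"Arkansas", "Illinois"}


@dataclass
class DailyQuota:
    """Hypothetical local tracker for full-text case requests."""

    limit: int = DAILY_FULL_TEXT_LIMIT
    used: int = 0
    day: date = field(default_factory=date.today)

    def _roll_over(self, today: date) -> None:
        # Assumption: the count starts over on each new calendar day.
        if today != self.day:
            self.day = today
            self.used = 0

    def can_fetch(self, jurisdiction: str, today: date) -> bool:
        """Return True if one more full-text case may be requested."""
        self._roll_over(today)
        if jurisdiction in WHITELISTED_JURISDICTIONS:
            return True  # whitelisted jurisdictions are not capped
        return self.used < self.limit

    def record_fetch(self, jurisdiction: str, today: date) -> None:
        """Count one full-text request against the daily cap."""
        if not self.can_fetch(jurisdiction, today):
            raise RuntimeError("daily full-text limit reached")
        if jurisdiction not in WHITELISTED_JURISDICTIONS:
            self.used += 1
```

A script would call `can_fetch` before each full-text request and `record_fetch` after it. Researchers who have agreed to the separate terms could construct the tracker with a higher `limit`.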
The data can be retrieved in HTML or XML formats.
Beyond the raw data, the website also provides tools that help a user make a free caselaw textbook; a word cloud visualization from cases issued between 1852 and 2015; and a limerick generator.
LexisNexis, which bought Ravel Law in 2017, has control over the commercial use of the resource through March 2024, according to LawSites Blog. Companies interested in using the database will need a license from LexisNexis.
As part of the launch, the Caselaw Access Project is calling for users to report any errors they find that were introduced during the digitization process.
“Some parts of our data are higher quality than others,” the site states. “Case metadata, such as the party names, docket number, citation, and date, has received human review. Case text and general head matter has been generated by machine OCR and has not received human review.”