Why scraping publicly available information online isn't a crime
Jason Tashea. Photo by Saverio Truglia.
Earlier this month, the 9th U.S. Circuit Court of Appeals at San Francisco took a stand for an open internet. A three-judge panel found that automated data collection from a public website, also called web scraping, likely does not violate the Computer Fraud and Abuse Act, the country’s main anti-hacking law.
At issue was whether hiQ Labs, a data analytics company, could continue to scrape publicly available data from LinkedIn, the Microsoft-owned professional networking site, even after LinkedIn sent it a cease-and-desist letter.
LinkedIn argued that, once hiQ Labs received the cease-and-desist letter, its scraping became “unauthorized access” (the internet’s version of trespass) under the CFAA. HiQ Labs countered that, because the data it collected was public, its actions were legal. The appellate court sided with hiQ Labs.
“The CFAA was enacted to prevent intentional intrusion onto someone else’s computer—specifically, computer hacking,” wrote Judge Marsha S. Berzon for the panel in hiQ Labs v. LinkedIn, comparing hacking to breaking and entering.
“It is likely that when a computer network generally permits public access to its data, a user’s accessing that publicly available data will not constitute access without authorization under the CFAA,” the opinion continued, affirming the trial court.
In the information age, everyone must be able to build on the public data compiled by big companies such as LinkedIn, Craigslist and Facebook. Criminalizing the scraping of public websites would cripple an open internet by curtailing access to information. This isn’t just an issue for internet startups and academic researchers but also for the legal community.
I recently wrote about the CFAA for its 35th birthday and found that foundational questions remain about the law’s anti-hacking provisions, which criminalize accessing a protected computer “without authorization” or “exceeding authorized access.” Although the statute has been revised numerous times over the past three decades, these terms have never been clarified, which has led to overreaching legal actions, such as the one initiated by LinkedIn.
Because of this lack of legislative clarity, it’s unclear whether companies that use web scraping break the law when searching public websites. Nonmalicious security researchers often find themselves in precarious legal situations when they discover and disclose a vulnerability on a website. Further, it is an open question whether sharing a password in violation of a company’s terms of service qualifies as a crime.
This is why the hiQ Labs case is important. In this narrow application, where the scraping party does not circumvent a password or other technical barrier erected by the host site, public data is free for the taking. A ruling for LinkedIn would have dealt a major blow to the way an open internet functions. While the right decision was made, it wasn’t a foregone conclusion.
In a 2017 federal case, also out of California, Craigslist sued two startups that used its public data after sending them a cease-and-desist letter. When the trial court declined to toss the case outright, the defendants were worried enough to settle with the wealthier plaintiff, indicating that uncertainty surrounding the law was chilling lawful conduct.
Because it brings needed clarity, the hiQ Labs decision is worth celebrating, even though it comes in the form of an affirmed preliminary injunction, which weighs the likely merits of the legal claims without making a definitive ruling.
This is the second win for web scrapers in as many years. Last year, a district court judge in Washington, D.C., allowed a group of researchers and journalists who conduct discrimination audits, a method common in employment and housing discrimination research, to bring a case challenging the constitutionality of the CFAA.
Their research might require violating website terms of service, including things as benign as creating multiple accounts or collecting publicly available information (sound familiar?), which could put them in conflict with the law.
Although the finding came at the preliminary motions stage, U.S. District Judge John Bates wrote in Sandvig v. Sessions: “By incorporating [terms of service] that purport to prohibit the purposes for which one accesses a website or the uses to which one can put information obtained there, the CFAA threatens to burden a great deal of expressive activity, even on publicly accessible websites—which brings the First Amendment into play.”
The case is ongoing and awaits summary judgment.
While the D.C. trial court mirrored the 9th Circuit’s narrower reading of the statute, the 1st, 5th, 7th and 11th Circuits, which together encompass about a third of the U.S. population, have all adopted a broader interpretation that could leave companies such as hiQ Labs and the researchers in D.C. in legal jeopardy.
As the D.C. case indicates, this issue isn’t just about one tech company vs. another. With more processes protected by federal and state law, such as home loans, job postings and housing ads, moving online, the legal community has reason to support the 9th Circuit’s view of the CFAA.
The work being done by researchers and journalists can set the stage for legal action, such as the U.S. Department of Housing and Urban Development’s charge against Facebook over alleged violations of the Fair Housing Act. Scraping may also be the means to a strong defense.
The harm of the current circuit split is compounded by the internet’s distributed nature. The location of a company’s servers, which a scraper may not even know, can determine whether the same conduct is a federal crime.
It’s time for either Congress or the U.S. Supreme Court to resolve this decadeslong uncertainty and side with the 9th Circuit’s growing body of case law: Scraping publicly available information, regardless of a site’s terms of service or a cease-and-desist letter, is not a crime.
An open and healthy internet demands it.
Jason Tashea is the author of the Law Scribbler column and a legal affairs writer for the ABA Journal. Follow him on Twitter @LawScribbler.