Essential insights from Hacker News discussions

Anthropic agrees to pay $1.5B to settle lawsuit with book authors

Here's a summary of the themes from the Hacker News discussion:

The Impact of Settlements on Legal Precedent

A central theme is the debate over whether Anthropic's settlement truly sets a legal precedent for AI training on copyrighted material. Many users believe that settlements, by design, avoid admitting wrongdoing and thus preclude definitive legal precedent.

  • "On a legal precedent, it does sort of open the flood gates for others." - SlowTao
  • "I’m sure this’ll be misreported and wilfully misinterpreted because of the current fractious state of the AI discourse, but given the lawsuit was to do with piracy, not the copyright-compliance of LLMs, and in any case, given they settled out of court, thus presumably admit no wrongdoing, conveniently no legal precedent is established either way." - lewdwig
  • "To be very clear on this point - this is not related to model training. It’s important in the fair use assessment to understand that the training itself is fair use, but the pirating of the books is the issue at hand here, and is what Anthropic “whoopsied” into in acquiring the training data." - aeon_ai
  • "To be even more clear - this is a settlement, it does not establish precedent, nor admit wrongdoing. This does not establish that training is fair use, nor that scanning books is fine." - gnabgib
  • "The ruling also doesn’t establish precedent, because it is a trial court ruling, which is never binding precedent, and under normal circumstances can’t even be cited as persuasive precedent, and the settlement ensures there will be no appellate ruling." - dragonwriter
  • "Judge Alsup’s ruling is not binding precedent, no." - jkaplowitz

Piracy vs. Fair Use in AI Training

A significant portion of the discussion distinguishes between the act of pirating books and the act of training AI models on copyrighted material. Many users emphasize that the settlement resolved claims over the former, not the latter, leaving broader questions about the legality of the training process itself to other cases.

  • "Maybe, though this lawsuit is different in respect to the piracy issue. Anthropic is paying the settlement because they pirated the books, not because training on copyrighted books isn’t fair use which isn’t necessarily true with the other cases." - typs
  • "It’s important in the fair use assessment to understand that the training itself is fair use, but the pirating of the books is the issue at hand here, and is what Anthropic “whoopsied” into in acquiring the training data." - aeon_ai
  • "anthropic legally purchased the books it used to train its model according to the judge. And the judge said that was fine. Anthropic also downloaded books from a pirate site and the judge said that was bad -- even though the judge also said they didn't use those books for training...." - greensoap
  • "The judge's ruling from earlier certainly seemed to me to suggest that the training was fair use." - ijk
  • "So not only had his work been consumed into this machine that is being used to threaten his day job as a court reporter, not only was that done without seeking his permission in any way, but they didn’t even pay for a single copy." - Nursie

The Financial Incentives and Consequences of Settling

Users discuss the financial motivations behind the settlement, from avoiding potentially catastrophic damages at trial to treating the payout as a cost of doing business. The settlement amount is often weighed against Anthropic's funding, revenue, and potential future profits.

  • "A trial was scheduled to begin in December to determine how much Anthropic owed for the alleged piracy, with potential damages ranging into the hundreds of billions of dollars. It has been admitted and Anthropic knew that this trial would totally bankrupt them had they said they were innocent and continued to fight the case." - rvz
  • "The first of many." - rvz
  • "If it was a sure thing, then the rights holders wouldn't have accepted a settlement deal for a measly couple billion. Both sides are happier to avoid risking losing the suit." - f33d5173
  • "A lot of times companies settle without admitting guilt. And in any case, given they settled out of court, thus presumably admit no wrongdoing, conveniently no legal precedent is established either way." - lewdwig
  • "Option 1: $183B valuation, $1.5B settlement. Option 2: near-$0 valuation, $15M purchasing cost. To an investor, that just looks like a pretty good deal, I reckon. It's just the cost of doing business - which in my opionion is exactly what is wrong with practices like these." - crote
  • "Yes, but the cat is out of the bag now. Welcome to the era of every piece of creative work coming with an EULA that you cannot train on it." - GabeIsko
  • "Maybe small compared to the money raised, but it is in fact enormous compared to the money earned. Their revenue was under $1b last year and they projected themselves as likely to make $2b this year. This payout equals their average yearly revenue of the last two years." - slg

The Role of Piracy Sites and Their Age

The discussion touches on the longevity of piracy sites such as Library Genesis (Libgen), noting that these resources have been available for well over a decade and are well known within certain communities.

  • "Is lib still around anymore. I can't find any functioning urls" - jay_kyburz
  • "There are mirrors on its' wikipedia page: https://en.wikipedia.org/wiki/Library_Genesis" - kibae
  • "pxx: libgen.help is frequently updated" - pxx
  • "I knew about library genesis by 2012. It was at least 10 TiB large by then, IIRC. With the amount of Russian language content I got the impression it was more popular in that sphere, but an impressive collection for anyone and not especially secret." - edgineer
  • "ants_everywhere: I had no idea libgen was that old, thanks!" - ants_everywhere

Analogy to Other Industries and Business Models

Comparisons are drawn to practices in other tech sectors, such as ride-sharing services and Google Books, to illustrate how companies have navigated regulatory and legal hurdles. This also leads to discussions about the ethics of disruption versus outright illegality.

  • "It was faster to just put unlicensed taxis on the streets and use investor money to pay fines and lobby for favorable legislation. In the same way, it was faster for Anthropic to load up their models with un-DRM'd PDFs and ePUBs from wherever instead of licensing them publisher by publisher." - rchaud
  • "The same could be said of grand larceny. The difference would seem to be a mix of social norms and, more notably for this conversation, very different consequences." - alpinisme
  • "And thank god they did. There was no perfectly legal channel to fix the taxi cartel. Now you don't even have to use Uber in many of these places because taxis had to compete - they otherwise never would have stopped pulling the "credit card reader is broken" scam, taking long routes on purpose..." - jimmaswell
  • "Original Silicon Valley model, and generally the engine of American innovation/growth/wealth equality for 200 years: Come up with a cool technology, build it in your garage, get people to fund it and sell it because it's a better mousetrap. New model: Still come up with a cool idea, still get it funded and sold, but the idea involves committing crime at a staggering scale (Uber, Google, AirBnB, all AI companies, long list here), and then paying your way out of the consequences later." - safety1st

The "Destructive Scanning" of Books

A specific detail that emerges is the practice of "destructively scanning" books: cutting off the spines so the loose pages can be fed through an automated scanner. This practice is discussed in terms of scalability, cost, and environmental impact.

  • "Chopping the spine off books and putting the pages in an automated scanner is not scalable." - therobots927
  • "Onavo: > Chopping the spine off books and putting the pages in an automated scanner is not scalable. That's how Google Books, the Internet Archive, and Amazon (their book preview feature) operated before ebooks were common. It's not scalable-in-a-garage but perfectly scalable for a commercial operation." - Onavo
  • "I guess companies will pay for the cheapest copies for liability and then use the pirated dumps. Or just pretend that someone lent the books to them." - debugnik
  • "To be clear, they destructively scanned millions of books which in total were worth millions of dollars. They did not destroy old, valuable books which individually were worth millions." - nl

The Concept of "Fair Use" in the Context of AI

The fundamental concept of "fair use" is a recurring point of contention. Users grapple with whether AI training, particularly at scale, aligns with traditional interpretations of fair use, given both its transformative character and the models' potential to reproduce the underlying works.

  • "The pivotal fair-use question is still being debated in other AI copyright cases. Another San Francisco judge hearing a similar ongoing lawsuit against Meta ruled shortly after Alsup's decision that using copyrighted work without permission to train AI would be unlawful in 'many circumstances.'" - rvz
  • "It’s important in the fair use assessment to understand that the training itself is fair use, but the pirating of the books is the issue at hand here, and is what Anthropic “whoopsied” into in acquiring the training data." - aeon_ai
  • "Buying used copies of books, scanning them, and training on it is fine. The idea that training models is considered fair use just because you bought the work is naive. Fair use is not a law to leave open usage as long as it doesn’t fit a given description. It’s a law that specifically allows certain usages like criticism, comment, news reporting, teaching, scholarship, or research. Training AI models for purposes other than purely academic fits into none of these." - florbnit
  • "The purpose and character of AI models is transformative, and the effect of the model on the copyrighted works used in the model is largely negligible. That's what makes the use of copyrighted works in creating them fair use." - derektank
  • "I think the jury is still out on how fair use applies to AI. Fair use was not designed for what we have now." - amradio1989

The "Human Authorship" Debate and AI-Generated Content

The discussion extends to the copyrightability of AI-generated content and the legal requirement for human authorship, referencing related court cases such as the "monkey selfie" dispute.

  • "And what about all the other stuff that LLMs spit out? Who owns that. Well at present, no one. If you train a monkey or an elephant to paint, you cant copyright that work because they aren't human, and neither is an LLM." - zer00yz
  • "This seems too cute by half, courts are generally far more common sense than that in applying the law. This is like saying using rails generate model:example results in a bunch of code that isn't yours, because the tool generated it according to your specifications." - arcticfox
  • "I think you're thinking of this case [1], it was a monkey, it wasn't a painting but a selfie. A painting would have only made the no-copyright argument stronger." - gpm
  • "The Board’s decision was later upheld by the U.S. District Court for the District of Columbia, which rejected the applicant’s contention that the AI system itself should be acknowledged as the author, with any copyrights vesting in the AI’s owner. The court further held that the CO did not act arbitrarily or capriciously in denying the application, reiterating the requirement that copyright law requires human authorship and that copyright protection does not extend to works “generated by new forms of technology operating absent any guiding human hand, as plaintiff urges here.”" - zer00yz (quoting a legal alert)

Renewed Interest in Vernor Vinge

The discussion briefly touches on the passing of sci-fi author Vernor Vinge, with several users recalling his prescient ideas, particularly from his novel "Rainbows End," which some found relevant to the AI-driven digital future under discussion.