
AI governance and copyright

Dr. Lisa Palmer · July 15, 2025 · 6 min read

Anthropic used 7 million pirated books to train an AI model. A federal judge says that is copyright infringement, and your business could be on the hook if your tools were trained the same way.

In a pivotal moment for enterprise AI strategy, U.S. District Judge William Alsup issued a decision in Bartz v. Anthropic that reshapes how companies must evaluate AI tools and govern data use.

The ruling gives AI companies a narrow but important protection: it is legal to train models on copyrighted books only if the books were legally acquired and the model is used in a transformative way. That means the AI must produce new and original content (like summaries, insights, or fresh writing), not just copy or closely imitate the original work.

Analogy: It is like a person reading several books and then writing their own article with new ideas. That is legal. But copying pages from those books word-for-word? Still illegal.

The court was clear: using pirated or illegally sourced content is still copyright infringement, even if it is just for research or internal use.

Why This Matters to Your Business

Most enterprises are not training large language models (LLMs) in-house, but nearly all use them. Whether you are deploying chatbots, AI copilots, or embedded tools in third-party platforms, both the input data and output content could expose your organization to legal and reputational risk.

Key Business Risks and Leadership Actions

Even if your company is not training AI models, this ruling still affects you. You are likely using tools built on large models, and that means you are exposed to risks tied to how those models were trained and how their outputs are used.

Here are four actions every business leader should take now to reduce legal exposure and ensure responsible AI use:

1. Require Transparency from AI Vendors

You cannot manage risk if you do not know how your tools were built. Start with visibility.

Ask your vendors: Where did your training data come from? Insist on documentation or audit trails showing that training data was lawfully obtained, not scraped or pirated. If they will not disclose it, consider it a red flag.

2. Review Your Contracts Now

Legal exposure often hides in the fine print. Make sure your contracts work in your favor.

Confirm your contracts include IP indemnification for both training inputs and generated outputs. Add language to explicitly cover future AI-related claims.

3. Guard Your Own Internal Use

AI risk is not just external. It starts inside your organization. Do not build internal datasets using scraped or pirated content, even for research. Your company is liable even if an intern, analyst, or vendor did it. Educate employees and contractors about these rules to prevent accidental violations.

4. Monitor Outputs, Not Just Inputs

You are accountable for what the AI creates, not just what it was trained on. The ruling focused on inputs, but outputs that closely mimic or reproduce protected works can also trigger copyright issues. Be especially careful in marketing, publishing, and customer-facing use cases.

Authors, Creators, and Your Brand Reputation

This is not just a data issue; it is a brand trust issue. If your AI tools mimic an author's work without credit or recreate distinctive content, you are not just risking a lawsuit. You are risking your reputation. Future rulings will further tighten the rules. The smart move now is to lead with transparency and ethics. It is not just the right thing to do; it is good business.

Strategic Implications

This ruling reinforces a core principle: AI deployment demands clear oversight and strong internal controls. Leaders must ensure:

  • AI tools are used in ways that meet legal and ethical standards
  • Vendors are vetted for data transparency and compliance
  • Your brand is shielded from legal and reputational risk tied to how AI is built and used

Looking Ahead

The trial will now determine whether Anthropic acted willfully in using pirated data. The outcome could set the tone for how damages are awarded in future AI copyright cases.

With legal scrutiny expanding, leaders should anticipate and prepare for closer examination of:

Model training sources: How and where training data was acquired. Expect growing pressure to prove that datasets were lawfully sourced and not scraped or pirated.

Content outputs: What your AI tools generate. Outputs that closely resemble copyrighted material could become legal liabilities, especially in marketing, publishing, or customer-facing content.

Contract language with vendors and platforms: Your legal protections. Courts and regulators will increasingly look at whether contracts include IP indemnification and data use disclosures. Weak language equals high risk.

Boardroom Summary

| Issue | What to Know / What to Do | Use Case Example |
| --- | --- | --- |
| Training on Copyrighted Books | Legal if content was legally acquired and used to train transformative models that generate new, original content | A vendor trains a model on purchased eBooks to generate summaries, not full text |
| Use of Pirated Content | Still copyright infringement, even for internal or research use. Risk of litigation | Anthropic downloaded 7M pirated books; faces trial and potential statutory damages |
| Vendor Accountability | Demand proof of data provenance. Require contracts with IP indemnification clauses | Your marketing team uses an AI tool: confirm the vendor's training data was licensed |
| Enterprise Exposure | You are liable for both training inputs and model outputs. Monitor use in all workflows | An LLM generates blog copy too close to a known book: your company may be liable |

FAQs

What did the federal judge decide in Bartz v. Anthropic? The judge ruled that training on lawfully acquired books can qualify as transformative fair use, but that Anthropic's downloading and retention of 7 million pirated books constituted copyright infringement.

How does this ruling affect businesses using AI? Businesses should audit AI vendors, update contracts, and monitor use to mitigate legal and reputational risks.


Dr. Lisa Palmer

CEO & Co-Founder

Lisa wrote the book on AI adoption, literally. Her Wiley-published research, the largest qualitative study of enterprise AI adoption, shapes the frameworks neurocollective uses to help organizations move past AI ambition into measurable outcomes.

Research, AI Leadership