A judge rejected key claims in a lawsuit by developers against GitHub, Microsoft, and OpenAI. It is the first decision in a series of court actions related to generative AI.
What’s new: A U.S. federal judge dismissed claims of copyright infringement and unjust enrichment in a class-action lawsuit that targeted GitHub Copilot and the OpenAI Codex language-to-code model that underpins it.
The case: In November 2022, programmer Matthew Butterick and the Joseph Saveri Law Firm filed the lawsuit in U.S. federal court. The plaintiffs claimed that GitHub Copilot had generated unauthorized copies of open-source code hosted on GitHub, which OpenAI Codex used as training data. The copies allegedly infringed on developers’ copyrights. The defendants tried repeatedly to get the lawsuit thrown out of court. In May 2023, the judge dismissed some claims, including a key argument that GitHub Copilot could generate copies of public code without proper attribution, and allowed the plaintiffs to revise their arguments.
The decision: The revised argument focused on GitHub Copilot’s duplication detection filter. When enabled, the filter detects output that matches public code on GitHub and revises it. The plaintiffs argued that the existence of this feature demonstrated GitHub Copilot’s ability to copy code in OpenAI Codex’s training set. The judge was not persuaded.
- The judge stated that the plaintiffs had not presented concrete evidence that Copilot could generate substantial copies of code. He dismissed this copyright claim with prejudice, meaning that the plaintiffs can’t refile it.
- The judge also dismissed a claim that GitHub illicitly profited from coders’ work by charging money for access to GitHub Copilot. To claim unjust enrichment under California law, plaintiffs must show that the defendant enriched itself through “mistake, fraud, coercion, or request.” The judge ruled that the plaintiffs had failed to demonstrate this.
Yes, but: The lawsuit is reduced, but it isn’t finished. A breach-of-contract claim remains. The plaintiffs aim to show that OpenAI and GitHub used open-source code without providing proper attribution and thus violated open-source licenses. In addition, the plaintiffs will refile their unjust-enrichment claim.
Behind the news: The suit against GitHub et al. is one of several underway that are testing the copyright implications of training AI systems. Getty Images, the Authors Guild, The New York Times, and other media outlets, along with a consortium of music-industry giants, have sued OpenAI and other AI companies. All these cases rest on a claim that copying works protected by copyright for the purpose of training AI models violates the law, which is precisely what the plaintiffs failed to show in the GitHub case.
Why it matters: This lawsuit specifically concerns code written by open-source developers. A verdict could determine how such code can be used and how developers can use generative AI in their work. However, the case has broader implications. (Note: We are not lawyers and we do not provide legal advice.) This dismissal is not a final verdict, but it supports the view that AI developers have a broad right to use data for training models even if that data is protected by copyright.
We’re thinking: Broadly speaking, we would like AI to be allowed to do with data, including open-source code, anything that humans can legally and ethically do, including study and learn. We hope the judge’s decision gives AI developers further clarity on how they can use training data, and we hope it establishes that it’s ethical to use code-completion tools trained on open-source code.