Can you protect AI-generated innovation?

October 10, 2023

In late August, a Washington DC court ruled that work created by artificial intelligence (AI) cannot be protected by copyright.

Judge Beryl Howell said the case – a 2018 piece of art submitted by Stephen Thaler to the Copyright Office – was fairly straightforward but it had much deeper implications for the industry.

“We are approaching new frontiers in copyright as artists put AI in their toolbox,” which will raise challenging questions for copyright law, Howell wrote in her decision.

Thaler’s case is not alone. Some writers are worried about AI companies using their works without licences, others are angry that pirated versions of their works were used to train AI, and Getty Images claims that more than 12 million of its images were stolen.

All these cases share one key aspect: AI companies are taking advantage of a technology that has raced ahead of the legal status quo and have chosen an “act first, ask for forgiveness later” style of doing business.

Since inventions and content are two important classes of intangible asset, this situation matters for the future of AI tools.

What if all the results of AI can’t be protected? Imagine investing millions in a system, only to find that the outputs are not protected by law and can be used without payment by anyone in the world. That might do more to put the brakes on AI development than any other single legal decision.

Copyright law is slowly adapting to AI technology, but it will take time and millions of dollars in litigation, costs, damages and appeals until the courts can reach a new equilibrium.

In the meantime, the existing legal settings do offer some guidance for companies hoping to protect AI-generated works or innovation.

For example, in Australia, China, the EU and UK, a work must be “original” to gain copyright protection. Although there is no express provision requiring the creator to be human, in most jurisdictions the degree of human involvement in a work can be pictured as a spectrum.

On the right-hand boundary sits any piece of work that was 100% created by a machine. Way down the other end, on the left-hand boundary, sit works created entirely by humans (do any such works truly exist? After all, a pencil is a piece of technology…). The problem today concerns anything that nudges up against the right-hand boundary.

At some point on that spectrum, a work that has too much machine input and not enough human input loses its copyright protection. However, that exact point is currently painted in a lawyer’s favourite colour – grey.

In 2022, Jason Allen submitted a stunning piece of art to the Colorado State Fair’s annual art competition. Allen claimed 100% authorship of the work, even though he used an AI tool called Midjourney. After winning first place in his category, he then tried to register the work with the US Copyright Office.

The Office said Allen needed to disclaim large parts of the work because all he did was give Midjourney prompts while the AI tool did the heavy lifting. Allen countered that it wasn’t easy to write those prompts – it took more than 400 different iterations. Adobe Photoshop was also used to clean up the image, Allen said.

But the Copyright Office was resolute. Allen’s labour wasn’t enough to constitute “human authorship” of the art. No matter how many tweaks Allen made to the prompt, it was the AI that determined the final output. So, that’s worth pondering for companies delving into AI.

Another grey aspect concerns the data companies are using to train their AI models.

Essentially, any AI needs to “learn” from enormous amounts of high-quality data so it can respond to human prompts (like Allen’s above). OpenAI’s GPT-3 model, the forerunner of the model behind ChatGPT, has 175 billion parameters and was trained on hundreds of billions of words of text, including articles, books, pamphlets, studies and much more.

The problem is that much of this data is protected under copyright law, which means using it to train an AI model requires a licence from the copyright holder. While some AI companies do ask for licences, others are sneakily using “shadow libraries” to find pirated versions of the copyrighted material instead, which the authors claim constitutes an infringement.

Such an infringement reflects the way copyright law is set up.

Broadly speaking, infringement could happen at both the input and the output stages. At the input stage, copyright infringement can occur the moment an AI company copies a pirated work (or an unlicensed version). The very act of copying the material constitutes an infringement.

On the output side, when an AI model is trained on a pirated version of a copyrighted work, like George R. R. Martin’s Game of Thrones fantasy series, it will eventually be possible for that AI to produce its own version of Game of Thrones. If that version is “substantially similar” to Martin’s, then that would constitute an infringement.

Where this gets serious is that an AI company can get into trouble even if the company itself never asks its AI model to write Game of Thrones, which would be a primary infringement.

All it would take to trigger a secondary infringement is for a member of the public to prompt the AI to write a version of Game of Thrones. At that point, the AI company itself could be held liable.

Many similar cases are presently before courts in the US and around the world, and it will be interesting to see where they land. But there are already some lessons for companies.

First, ensure all training data is under licence. Securing licences will almost certainly cost a lot more (potentially making AI models too expensive for some companies to build), but there is a non-zero chance that some infringements will result in large damages awards. Is getting ahead worth the risk of using unlicensed data?

Second, while copyright law will eventually find an equilibrium, that balance may be even harsher towards AI-generated works than it is today. Companies should always keep humans in the loop for innovation. If AI is only used to tweak the edges of an idea, then the resulting innovation can likely be protected, since the chain of ownership (inventorship) should be clear.

AI is a bit like the Wild West at the moment. But just like the Wild West, the sheriff is coming to town, so it’s best that anyone using AI models – or developing their own – keep an eye on the changing rules.

Originally published on StartupDaily and BusinessTimes