Author Topic: Authors Join the Brewing Legal Battle Over AI (Read 2140 times)

APP · « **on:** July 22, 2023, 04:56:44 AM »

Here's some good news, though I doubt anything will come of it. Yes, I'm cynical.

https://www.publishersweekly.com/pw/by-topic/digital/copyright/article/92783-authors-join-the-brewing-legal-battle-over-ai.html

Bill Hiatt · « **Reply #1 on:** July 22, 2023, 06:11:27 AM »

If nothing else, new issues need to be litigated. Otherwise, they remain up in the air, potentially for decades.

I've said this before, but I do think the copyright experts saying that the AI training is fair use miss the mark. In the cases cited, like Turnitin and Google, the purpose was indeed transformative. (As a teacher, I actually applauded the Turnitin case.) But Turnitin was checking for plagiarism, not creating new essays that competed with the existing essays. Google was facilitating certain kinds of searches, not creating new books to compete with the indexed books. AI, on the other hand, is being trained to produce products that could some day compete with the books on which they were trained. It astounds me that experts in the field would miss such an obvious difference.

I can't find my source offhand, but I also recall that not having acknowledge the use of a source makes it harder to claim fair use. Acknowledgement by itself doesn't establish fair use, but if you don't acknowledge the borrowing (outside of parody, where it's typically obvious), it looks more as if you're trying to get away with something than that you're working within the law. Have the AI companies acknowledged which works they used? If so, I'm not aware of it.

We know that the creators of the database from which AI was trained mentioned publicly accessible internet as their source. But whole texts of books (except public domain works) typically aren't on the public internet, and there's no mention of such books being purchased. So where did they come from? Illicit copying or scraping would also reduce the likelihood of a finding of fair use.

Actors and screenwriters may be able to get contractual protection, as the article suggests. But novelists aren't a unionize group, and our relationship with our distributors is not that of employees. We don't have the same kind of recourse they do.

She-la-te-da · « **Reply #2 on:** July 27, 2023, 11:56:30 PM »

The thing about "fair use" is, it has to be established through the courts, right? Just claiming fair use doesn't make it so. I think it's being disingenuous for these people to be saying training "AI" on copyrighted matter is included in the narrow fair use guidelines. Fair use was intended to allow criticism and certain educational uses, not to allow some idiot to just collect all the copyright stuff and use it how they please.

PJ Post · « **Reply #3 on:** July 28, 2023, 04:55:40 AM »

There isn't really any law that covers AI. Infringement is a stretch at best, because the AI companies aren't producing anything, but it still feels wrong. So, legally, infringement is all there is. I can't imagine it going anywhere. But it will, as Bill noted, establish the rules moving forward, which is already long overdue.

Bill Hiatt · « **Reply #4 on:** July 29, 2023, 12:33:54 AM »

While there isn't any law that covers AI directly, infringement is still infringement. If I obtained thousands of copies of ebooks and then pulled snippets from them to create a new book, that would clearly be infringement. The fact that AI training can enable AI to do such a process much more effectively doesn't make the process any less infringing.

Also, since authors generally don't post copies of their own complete works online, the general claim that training materials came from the "publicly accessible internet" seems dubious to me. Public domain works could be accessed that way, but not anything still covered by copyright.

I can't recall where I read this, and right now, I can't access my files because I'm in the process of transferring files from an old computer to a new computer, but I'm pretty sure you have to give credit to copyrighted works in order to claim fair use of them. But images, videos, and texts being used in the training process are nowhere specifically acknowledged, making a claim of fair use problematic at best.

In addition, certain kinds of uses require that the material be obtained legally. For example, 2002 legislation allows teachers to show movies in classrooms as long as such movies are connected to the curriculum (overriding the "private home viewing" restriction on most videos) as long as the copy being used for the purpose is a legal copy. The law explicitly doesn't allow the showing of bootleg videos. I suppose someone could have bought all the books used in the training process, but I can't see any evidence that they did.

PJ Post · « **Reply #5 on:** July 31, 2023, 09:57:19 PM »

That's not how AI works, and data scraping isn't infringement now anymore than it has been for the past ten years. Like I said, the way they've trained AI just feels wrong (kind of a lot wrong), but they didn't do anything illegal because there are no laws on the subject. Is it fair use? I don't even know what that means in this context, and I suspect neither do the attorneys or the USCO. But the whole thing feels like infringement. Unfortunately, the legal system doesn't care about the 'feelz'.

Bill Hiatt · « **Reply #6 on:** August 01, 2023, 06:18:40 AM »

But that's just the point. It doesn't matter how AI works--not even a little bit. We know it's trained on copyrighted material. We can suspect that some of the material has been scraped. We know that some people are already moving toward using that data to create competing products that have the potential to infringe.

Without the data used to train it, AI would be nothing, at least in this context. The exact way it processes that data isn't legally relevant. How that data was obtained and what that data is being used for are the only relevant points.

Copyright stipulations against adding copyrighted works to electronic storage (such as databases) may be one avenue by which the training process can be challenged. No one disputes that the training data comes from databases. (That the AI doesn't retain exact copies is irrelevant to the fact that the data was stored in databases in the first place. Similarly, the fact that AI doesn't literally retain copies is also irrelevant. It retains enough to be able to quote copyrighted material verbatim and replicate image elements exactly, like trademarks, for example).

Obtaining data by scraping is another vulnerable point. There's is an area in which there are are laws. They vary from one jurisdiction to another and don't necessarily make all scraping illegal--but they do prohibit some activities.

Quote

What this means is that while it isn't illegal for you to scrape and gather copyrighted material per se, if you use that information, it certainly might be. Remember that specific laws in various countries are not entirely the same on this issue.

For example, in some places you may be able to use parts of the copyrighted data you've scraped, while in others you won't be able to use any of it at all

https://www.termsfeed.com/blog/web-scraping-laws/

The highest court in the EU has ruled that what data can be scraped and what can be done with it can be limited by the site's TOS. US courts have been more liberal, but the cases I can find all refer to publicly available information, not specifically to copyrighted material. In other words, in many areas, you can't scrape an original blog post and use the material contained therein, because it's protected by copyright.

Maybe the AI developers bought all the books, got permission from all the writers of articles on the web, licensed all the images, etc. But does that seem the likeliest scenario?

Also, since the use of the material relies heavily on transformativeness as a way to obtain fair use protection, that could be an avenue of attack as well. Cases in which transformative use has won have been ones like Google and Turnitin, in which competing products are not being generated. Also, as I have mentioned elsewhere, the Supreme Court has moved to narrow the scope of transformativeness (probably a response to lower courts using it in too elastic a way.

You're right that the legal system doesn't care about the "feelz." But there is enough actual law involved here that it is quite possible the legal system will care.

PJ Post · « **Reply #7 on:** August 01, 2023, 10:29:46 PM »

Quote from: Bill Hiatt on August 01, 2023, 06:18:40 AM

But there is enough actual law involved here that it is quite possible the legal system will care.

Is there?

If we look at AI as it is rather than how it feels like it should be - I'm not seeing it.

Bill Hiatt · « **Reply #8 on:** August 02, 2023, 04:11:51 AM »

At the risk of repeating what I literally just said,

Use of copyrighted material in an unauthorized electronically accessible format is already dubious under existing law. (Converting to a new format is considered an adaptation. Right of adaptation is reserved to the rights holder.) Some exceptions have been carved out through litigation (Google, Turnitin), but none of those is comparable to enabling AI to create competing works. Google and Turnitin prevailed because they neither did that nor impaired the commercial value of the original. AI has the potential to do both. Hence, there's no reason to think existing exemptions would apply to it.

Numerous federal and state laws restrict what can be scraped from the internet and how the data can be used. Against, there are exceptions, but they are not specific to copyrighted work. Courts have ruled that publicly available facts may be scraped (like product prices for use by comparison sites). But factual info isn't the concern here.

The idea that AI training (and even AI production of competing works) can be defended on a transformativeness basis wasn't a certainty to begin with, and a recent Supreme Court decision has reined in the tendency of lower courts to rely too much on a loose standard of transformativeness. Typically, creating works that compete directly with the copyrighted works involved is not transformative, and the few rulings that strayed outside that zone are unlikely to be repeated after the Supreme Court tightened up the standards for transformativeness.

While it is true that the courts could still surprise us, there's plenty of legal information suggesting that they won't surprise us too much.

PJ Post · « **Reply #9 on:** August 02, 2023, 10:18:41 PM »

At the risk of repeating what I literally just said...

That's still not how AI works.

Nothing is being copied or stored, nothing is being transformed, nothing is being converted, and there is no competition.

Again, we have to look at what AI is rather than what it feels like it is. Does Copyright Law really cover it?

Post-Crisis D · « **Reply #10 on:** August 03, 2023, 01:40:29 AM »

Maybe it "feels" like AI is "learning" or "being trained" but what it's doing is taking data (i.e., content) and transforming it into bits and bytes and algorithms and whatnot so it can spit out content resembling the content that went in. It may not store the content in a way that looks like the content, but neither does a QR code.

Current AI may feel like it has intelligence but it's just a sophisticated computer program that mimics intelligence. It has no actual understanding of the data/content it's been fed. It's been fed massive amounts of content, much of which is protected by copyright, transformed and converted that content into data points and algorithms, and can spit out content in ways to mimic (and thus compete with) the original (often copyrighted) works that were fed into it.

That's what it is despite what some may feel like it is.

Bill Hiatt · « **Reply #11 on:** August 03, 2023, 02:14:49 AM »

Quote from: PJ Post on August 02, 2023, 10:18:41 PM

At the risk of repeating what I literally just said...

That's still not how AI works.

Nothing is being copied or stored, nothing is being transformed, nothing is being converted, and there is no competition.

Again, we have to look at what AI is rather than what it feels like it is. Does Copyright Law really cover it?

Post-Crisis D summarized the situation quite well.

It's clear AI does something with copyrighted material. It's clear that use is not authorized by the rights holder, at least in most cases. That potentially invalidates the training method, depending upon what the courts decide. One could claim perhaps the that training process is transformative. The problem is that the ultimate goal of the training process is competitive. We feed the AI images so that it can produce more images. We feed the AI text so that it can produce more text, and so on. Companies that have licensed the AI technology are starting to compete with copyrighted materials used in the training process already.

In order for the training to qualify as transformative, the end result would have to be noncompetitive. For example, studying image data to compare the relative merits of various photographic devices or studying text to develop new theories about how the human brain works. For all I know, some of that could be happening. But competitive uses are happening as well. As I've mentioned, the reason Google and Turnitin were able to claim fair use was through the argument that their use was transformative. And indeed, Google was using snippets in search results, not to create new books. Turnitin was using essay texts to check for plagiarism in other essays, not to create new essays. Both of those uses were clearly transformative. The purpose was different, demonstrable in part by the fact that no competing products were created, and the market for the originals wasn't impaired. That simply isn't true of what AI is doing.

Does copyright law mention AI? Of course not. But the emphasis on the law is on prohibited behaviors, not the method by which those behaviors are performed or the the person or thing performed those behaviors.

PJ Post · « **Reply #12 on:** August 03, 2023, 08:25:37 AM »

Let's back up.

How do you all think AI works?

Bill Hiatt · « **Reply #13 on:** August 06, 2023, 12:04:04 AM »

I don't know precisely how it works. But what's legally important is the effect it has, not the specific process.

I'm sure you know more about it than I do, and so I totally believe you when you say that it isn't storing the copyrighted works, at least not as we usually understand the term. That doesn't prevent it from spitting out chunks of copyrighted text, Getty watermarks, etc. We've already seen examples of that, and there would be far more if we took every single piece of AI output and had a way of checking it, Turnitin.com style, and had the ability to compare it to existing copyrighted works. The potential infringement there is what it produces, not how it produces it. In the same way, the training is dubious because of the materials that are used, not the process whereby they are ingested.

Let's look at a non-copyright example for a minute. If I recall correctly, there are defamation suits in the works because AI created defamatory statements about people. The statements are clearly defamatory, for example, accusing people of crimes they didn't commit. The only real legal issue is who exactly to sue--the AI designer, the person whose prompts elicited the material, etc. But that the statements are defamatory, there is no doubt. The process whereby the defamatory statements were created is absolutely, totally irrelevant in a legal context. (They are relevant to the AI programmers, who will no doubt try to fix the problem.)

As with defamation, so with copyright. If a product infringes, then it infringes, regardless of whether it was created by an unaided human, an AI, or a human selling his or her soul to Satan. It's the result that constitutes infringement, not the process whereby the result was developed.

LBL · « **Reply #14 on:** August 06, 2023, 06:12:42 AM »

Quote from: Bill Hiatt on July 29, 2023, 12:33:54 AM

If I obtained thousands of copies of ebooks and then pulled snippets from them to create a new book, that would clearly be infringement.

But, this is what we do as humans when we produce just about anything. We 'pull snippets' from a lifetime of copyrighted consumption, reconstitute it, and spit it back out as new product - which we often then copyright and sell back to other humans for a price. Does that amount to litigation-level infringement? Imagine the size of that class action.

We constantly pattern-recognize, contextualize, recontextualize. We take what we consume, mimic what we consume, and put it back out into the world for someone else to take and do the same. It's culture.

Large language models like GPT are human mind emulators, consuming, recognizing patterns, contextualizing, recontextualizing, mimicking, and spitting what's been consumed back out in whatever capacity it's capable. How is that any more infringing than any human artist producing art ever?

The line of course, is plagiarism, which has always been a no-go. But, this lawsuit stuff involving copyright and LLMs comes off as a fear-based overreach at best, and a cynical cash-grab at worst.

TimothyEllis · « **Reply #15 on:** August 06, 2023, 11:29:08 PM »

Quote from: LBL on August 06, 2023, 06:12:42 AM

Quote from: Bill Hiatt on July 29, 2023, 12:33:54 AM
If I obtained thousands of copies of ebooks and then pulled snippets from them to create a new book, that would clearly be infringement.

But, this is what we do as humans when we produce just about anything. We 'pull snippets' from a lifetime of copyrighted consumption, reconstitute it, and spit it back out as new product

No we don't.

We create something new influenced by what we've seen in the past. We don't cut and paste snippets together.

The bots are clearly using actual cut and paste of pixels, otherwise they could not possibly be putting in actual signatures and watermarks.

Bill Hiatt · « **Reply #16 on:** August 07, 2023, 12:12:43 AM »

Quote from: TimothyEllis on August 06, 2023, 11:29:08 PM

No we don't.

We create something new influenced by what we've seen in the past. We don't cut and paste snippets together.

The bots are clearly using actual cut and paste of pixels, otherwise they could not possibly be putting in actual signatures and watermarks.

These are good points.

The way a human learns may be similar in some ways to the way AI absorbs information. I'll defer to people who know the process better. But there's no question the outcome is different.

Honest human authors and artists may be inspired by other works and take ideas from them. (Ideas are not copyrightable.) But only in rare cases do they spit back word-for-word (or pixel for pixel) replicas of what they've read or seen. Their process is much more complex than taking what they've experienced and making an elaborate collage out of it in response to a prompt.

AI can work much faster. But AI is also limited by what it's been trained on. (Were this not the case, developers could easily have trained it solely on material in public domain and avoided a large part of the current mess.) Even more important, it doesn't understand what it's trained on in the same way we understand what we experience. We've already seen examples of AI not understanding copyright, or distinctions between past and present, or distinctions between fiction and nonfiction. I've already shared examples to illustrate each of these issues.

That's not to say that humans never make mistakes. But we can distinguish between facts we've learned and things we've invented ourselves. AI doesn't seem able to do that, at least not yet.

PJ Post · « **Reply #17 on:** August 07, 2023, 12:20:16 AM »

Quote from: Bill Hiatt on August 06, 2023, 12:04:04 AM

I don't know precisely how it works. But what's legally important is the effect it has, not the specific process.

The case for infringement is based on copying and transferring digital images of copyrighted data. That's Process.

I ask, not to get into the nuts and bolts of it, but to examine this alleged copying and transferring of data, because that's not how AI works. It copies nothing. It transfers nothing. It analyzes data sets and forms associations - what we call learning. How is 'blue' not the same as 'sky'? It repeats this process a kajillion times over massive data sets, not to copy any specific IP, but to figure out all of the various meanings for 'blue' and 'sky' and how to tell the difference through context. That's not infringement. Whether or not end users want to test copyright with their prompts, that's on them, but they've never needed AI to do it before.

I could be wrong, but that's my understanding of how it works. And that's why we get signatures and logos in end user output, because they were overwhelming present in the learning stage. Early on, most of the soccer images used for training had a Getty logo, so the AI assumed that Getty was somehow a part of soccer. Ask for a soccer image and the AI will give you a soccer ball, green grass, players AND a Getty logo. I would call these errors training artifacts.

So, no, AI isn't cutting and pasting pixels.

Bill Hiatt · « **Reply #18 on:** August 07, 2023, 12:47:03 AM »

I'm sure you're right about the way AI works, but it still feels like a distinction without a difference.

To analyze something, AI has to assimilate it in some way. That doesn't mean it stores it in the way we usually understand the term. But if it can spit out an exact replica later, it has done something with the original. (And since the watermark is trademarked, images containing it, if made public, become a trademark violation and open up a whole new legal problem. Again, this happens because of the result, not the process.)

By the way, I've not worked directly with Getty, but visible watermarks only appear on unlicensed copies of an image with most stock photo vendors. Reproducing the Getty watermark could be evidence of what amounts to theft of the materials used to train AI. Or are we thinking someone licensed copies of all those stock images? (I raised this argument elsewhere, but we now have something like four threads about this, so forgive me if I'm repeating something you've already seen.) If they weren't licensed, they were scraped, which would certainly violate Getty's TOS (and therefore be illegal in some areas). If they were licensed, I imagine Getty, like most purveyors of stock images, has language about not using the images in a service that competes with Getty. While it may not explicitly mentioned the use of AI training materials to ultimately create competing images, a decent lawyer could certainly make the argument that the AI developers are violating the terms of the license. Any way you look at it, there are potential legal problems there.

Whether or not AI developers intended the product to be used to create competing texts or images is more a question of who should be sued rather than whether or not there's liability in the first place. People who use AI at this point run the risk of getting material that may contain enough copyrighted stuff to constitute infringement--but AI doesn't warn them in such a case, because it doesn't understand copyright. That could be on the developers, rather than the end users. But even if it's the end users who have the liability, enough successful suits against them would certainly destroy AI's customer base rather quickly.

Let me ask you this--if the developers didn't intend AI to be used to create competing products, is there language anywhere (terms of service, perhaps) that prohibits such a use? If so, what exactly did the developers intend. Private use only? In that case, what are we worried about? But if there is no such restriction, what exactly did the developers think people would do with the texts and images they created? I'd sure hate to be a developer who had to testify on that issue in the court, because I think there's only one obvious answer--unless I'm missing something.

LBL · « **Reply #19 on:** August 07, 2023, 01:36:57 AM »

Quote from: TimothyEllis on August 06, 2023, 11:29:08 PM

We create something new influenced by what we've seen in the past. We don't cut and paste snippets together.

Figuratively speaking, what's the difference?

Quote

The bots are clearly using actual cut and paste of pixels, otherwise they could not possibly be putting in actual signatures and watermarks.

Plagiarism, via human users of a commercial AI product, is obviously infringement. But, that's not the same thing as feeding words - copyrighted or not - into a large language model to train it, and improve its ability to form associations and improve its outputs.

Humans intentionally prompting infringing outputs should be the target for litigation, not the methodology behind an LLM's learned ability to create outputs in the first place.

Bill Hiatt · « **Reply #20 on:** August 07, 2023, 06:10:31 AM »

If humans are to blame, then you're right. They should be the ones getting sued.

But I doubt any human prompted AI to create an image including the Getty watermark, for example. It appears that AI doesn't know where the line between fair use and infringement (or trademark violation) is. To me, that indicates that AI could easily generate infringing output without the human prompter realizing that's what happened.

After all, that's exactly what happened with inaccurate content. No one prompts the AI to make stuff up. It just does. So a lawyer gets sanctioned for filing court paperwork including citations of nonexistent cases. Clearly, the lawyer didn't ask the AI to fake court cases. But that's what he got, anyway.

So yes, if it could be proved by the prompts the human used that the human wanted infringing content, then the human should be responsible. But given that AI doesn't know the difference, it just as likely it would give an infringing answer to someone who didn't ask for one. Who's responsible then? I think we both know the answer to that.

TimothyEllis · « **Reply #21 on:** August 07, 2023, 12:07:53 PM »

Just as an aside, this question was just posted on Quora:

Quote

During this writer's strike, are we still allowed to talk about books/promote/recommendation as it's all confusing and seeing if it's just related to visual media?

Makes me wonder if the downturn in book sales at the moment is because people are not buying books in support of the writer's strike.

PJ Post · « **Reply #22 on:** August 07, 2023, 10:25:27 PM »

Quote from: Bill Hiatt on August 07, 2023, 12:47:03 AM

But if it can spit out an exact replica later...

It's usually quite difficult to get an exact copy of an image. It's recognizable, but that's about it. At least that's been my experience in trying to get AI to anything the same twice. It's a tremendous limitation at the moment. Lots of folks have been searching for workarounds, and even the few they've come up with take a lot of trial and error. Close enough seems to be the guiding principle - which isn't anything close to an exact copy.

Quote

Reproducing the Getty watermark could be evidence of what amounts to theft of the materials used to train AI.

Not really. Getty and other image sites allow folks to download higher resolution images with their watermarks for free, to be used in mockups and comps or however they want really as long as it isn't for commercial use - specifically and limited to, reproducing the image itself in print or digital. Perfectly legal, otherwise they wouldn't allow/promote downloads as they do.

Quote

People who use AI at this point run the risk of getting material that may contain enough copyrighted stuff to constitute infringement...

I don't believe this to be true, at least not on the art side of things. Users have to be pretty deliberate in trying to copy an existing work.

Quote

Let me ask you this--if the developers didn't intend AI to be used to create competing products, is there language anywhere (terms of service, perhaps) that prohibits such a use? If so, what exactly did the developers intend. Private use only? In that case, what are we worried about? But if there is no such restriction, what exactly did the developers think people would do with the texts and images they created? I'd sure hate to be a developer who had to testify on that issue in the court, because I think there's only one obvious answer--unless I'm missing something.

The platforms are all different. Midjourney is part of Discord, so every image is searchable on their data base for anyone to download and use, which makes copyright kind of impossible, but the Agreement allows for non-exclusive commercial use. This is why the images need to be modified/transformed with obvious human creativity to get copyright protection. Other platforms don't have these restrictions.

As to intent, I think they did it, at least in part, because they could, because they 'had' to. However, I think the endgame was literally to improve the human condition. Remember, ChatGPT started as a non-prof. I don't think disrupting the graphic design industry was part of their calculus.

Bill Hiatt · « **Reply #23 on:** August 08, 2023, 12:24:17 AM »

I haven't tried using the tools, so I'll defer to your greater expertise on that. But you remember the old idea that enough monkeys pounding away on enough typewriters will eventually produce the complete works of Shakespeare. I doubt that's literally true, but if you think about thousands of people using ChatGPT and similar products, the odds go up far more than they would for any individual user. I don't think we can rule the process out.

Quote

Not really. Getty and other image sites allow folks to download higher resolution images with their watermarks for free, to be used in mockups and comps or however they want really as long as it isn't for commercial use - specifically and limited to, reproducing the image itself in print or digital. Perfectly legal, otherwise they wouldn't allow/promote downloads as they do.

This is not what I'm seeing on the Getty site. Here are the two exceptions to needing a license:

From the comp section (clicking the download arrow)

Quote

You are welcome to use content from the Getty Images site on a complimentary basis for test or sample (composite or comp) use only, for up to 30 days following download. However, unless a license is purchased, content cannot be used in any final materials or any publicly available materials. No other rights or warranties are granted for comp use.

If I'm Getty's lawyer, the first thing I'm going to argue is that "No other rights" means what it says. Is using the images to train AI included in the comp use? Gee, no it isn't. (And I also bet the database used to train isn't purged of Getty Images every thirty days, in accordance with the comp use requirements.

From the Help Center, "Using Files for Free"

Quote

The images on Getty Images are intended for use in commercial and editorial projects. This means you need to buy a license to use the image in most projects, including personal use.

You can use an image without paying for a license with our Embed feature, which lets you use over 70 million photos on any non-commercial website or blog (if you're using it to sell a product, raise money or promote or endorse something, Embed isn't for you). Just do a search, then go to Filters to turn on the Embeddable images filter on the search results page.

If I'm Getty's attorney, I'm going to point out that embedding images on a non-commercial website is not the same as using them in a database to train AI. Moreover, since AI is a licensable product now, the training certainly would fall under "using it [the image] to sell a product" prohibition, albeit indirectly. (If it weren't being trained on copyrighted material, there wouldn't be a product to license.) As with the other language, this language grants a narrow exception to licensing. In order for AI training to be covered, it would have to be explicitly included.

Also, Getty is a party in one of the anti-AI suits. That would suggest Getty doesn't interpret its policies in the way you do.

Does that mean Getty and others will win in court? No, it doesn't. But it does mean they have arguments that could win.

Quote

As to intent, I think they did it, at least in part, because they could, because they 'had' to. However, I think the endgame was literally to improve the human condition. Remember, ChatGPT started as a non-prof. I don't think disrupting the graphic design industry was part of their calculus.

Well, they're not a non-profit anymore. But if they want to avoid disrupting the graphic design industry, all they have to do is revise their license to prohibit material produced by AI being used for commercial purposes. Problem solved.

PJ Post · « **Reply #24 on:** August 08, 2023, 10:26:57 PM »

AI isn't 'comp' use.

Getty's fine print only restricts the right to reproduce their images, because, until now, that was the only way to use them. Legal fine print is tricky. If it doesn't expressly prohibit something, then is it okay?

___

Quote

Well, they're not a non-profit anymore. But if they want to avoid disrupting the graphic design industry, all they have to do is revise their license to prohibit material produced by AI being used for commercial purposes. Problem solved.

Well, sort of, but that would also limit medical advances, engineering use, making our cars safer, and maybe even curing cancer. AI is all or nothing. We can't have some infringement be okay but not others. AI is going to wreck our labor force in the coming years while simultaneously improving the human condition in ways we can't even imagine yet. It might be fifty years from now, it might be by Christmas. How we survive it will come down to educating folks, having a responsive government and embracing the massive social change that's going to be necessary.

Bill Hiatt · « **Reply #25 on:** August 09, 2023, 12:26:41 AM »

Quote

Getty's fine print only restricts the right to reproduce their images, because, until now, that was the only way to use them. Legal fine print is tricky. If it doesn't expressly prohibit something, then is it okay?

I'm not a lawyer, but I don't think that's the way copyright law works. In any case, Getty anticipated just that argument.

Quote

No other rights or warranties are granted for comp use.

That seems to close off any argument that something which isn't expressly prohibited can be done. The only other permitted use is in the form of an embed. I'd sure hate to be a lawyer having to argue that by allowing embeds, Getty images opened the door to database use.

As far as I can tell, AI's training method is only valid under a very expansive interpretation of fair use at a time when the Supreme Court seems to be trying to rein in just such expansive interpretations.

Fair use is also problematic if the copies of works used in the training database were not obtained legally. As I've asked before, do we think the AI developers bought copies of all the books used in the training database? And if they did, did they have some way of incorporating them in the database without violating DMCA? In Copyright Clarity, Hobbs addresses the point that a fair use claim is invalid if the work is obtained illegally. She is talking about copyright in education specifically, but if educators can't claim fair use of material obtained illegally, I don't know on what basis anyone else could claim it.

Also in the educational context, fair use can't be claimed if the source of the material isn't cited. Have the AI developers released a complete list of materials used? Again, if educators can't claim fair use on unacknowledged borrowing, it's difficult to see the rationale for allowing someone else to do it.

Quote

Well, sort of, but that would also limit medical advances, engineering use, making our cars safer, and maybe even curing cancer. AI is all or nothing. We can't have some infringement be okay but not others.

I'm not at all sure that AI is all or nothing. All that needs to happen to clear the legal hurdles is that material used to train AI needs to be licensed and/or otherwise used in a
way consistent with copyright law. For instance, if medical research needs to be digested for the AI to make the medical advances you're talking about, why not just have the rights holders for the relevant studies work out an agreement for profit sharing of any breakthroughs achieved by an AI model trained on protected IP? Also, I'm not sure how fair use works for medical studies, but researchers in general seem happy to share their work, particularly with other researchers working on the same problems. (It's hard for science to advance without cooperation among scientists.) Sure, lots of different companies want to have that cancer cure, but any individual company would take a lot longer to develop one. If AI can really do the job, companies would make more in the long run by letting it come up with the cure much faster, even if that means profit sharing among several companies. Similarly, car companies all benefit from auto safety advances (and presumably fewer product liability lawsuits). Sure, one company might like to hold the patent, but companies in general would still save more money if they allowed a cooperative solution generated by AI. Better a smaller share of profits in the new advance right now than a possible larger share decades from now.

All of that can happen without having AI invade creative fields. Lots more rights holders are involved, and if some don't want to participate, they should be able to opt out. If screenwriters want to use contract negotiations to ban AI altogether, so be it. None of that stops cancer from being cured. AI is not all or nothing.

Some companies are already embracing a licensing model that involves pay creators royalties on their work, to the extent that it is incorporated in AI products.

When I found out Shutterstock was now selling AI-generated images, I almost closed my account with them. But then I discovered this: https://support.submit.shutterstock.com/s/article/Shutterstock-ai-and-Computer-Vision-Contributor-FAQ?language=en_US
Basically, Shutterstock is championing "responsible AI." Their AI is trained on images licensed from them, and the creators of those images get paid when customers license AI images based in part on theirs. Sure, since every Ai image may have several human contributors, the payment for each creator isn't going to be as big as when someone licenses one of the creators' own images. But over time, they could receive a reasonable amount of money from that arrangement--some of it from customers who wouldn't necessarily have licensed one of their unaltered images. (The AI stuff is designed for people who want a very specific image, something like X number of people arranged in a particular way in front of a forest with a certain kind of tree in it. In that example, some creators who do a lot of people photographs and some who specialize in nature photographs would all profit. Maybe none of them would have if none of them had what the customer was looking for.)

Shutterstock also indemnifies companies who use its AI images against lawsuits. It can do that because it knows it has followed copyright law to the letter.

Think of how much better off Open AI and other companies would be right now if they'd followed such a pattern themselves. As it is, they may end up spending millions on court costs for litigation that was totally avoidable--even if they win. If they lose or get a mixed result, their costs could be much higher.

I think a responsible AI model has a better change of adoption than the massive kinds of changes you must be anticipating to compensate for the millions of unemployed people, among other things. It's possible a structure to deal with issues like that might develop, but given the current state of politics, I can't see it happening in the short. I can see a licensing model developing in the short term, however.

PJ Post · « **Reply #26 on:** August 11, 2023, 01:18:36 AM »

I think the problem comes in, not with the specific training for any specialized use, but with the foundational training, the basic vocabulary required for AI to function at all.

AI isn't training on any specific IP so that it can copy it, it's training at a very granular level, words and sentences, to understand context and how communication works. For images, it also trains on techniques and media and styles and technology, again, so that it can communicate with the end user. Sky and blue and cerulean can be used interchangeably, but they can also mean completely different things. It depends on the context. And for AI to learn that, it needs to train on ridiculous numbers of data sets.

AI wasn't designed to pirate anyone's IP. That's what humans do, and they've still never needed AI to do it before.

I think most of these lawsuits are more cash grab than genuine concern for the future of the Arts.

TimothyEllis · « **Reply #27 on:** August 11, 2023, 01:27:55 AM »

Quote from: PJ Post on August 11, 2023, 01:18:36 AM

AI wasn't designed to pirate anyone's IP.

And yet, like just about everything before it, that's what people use it for.

It can be perverted to do just that, so that's what people do with it, in the name of the big buck.

The fact it wasn't designed to do it is irrelevant.

What is relevant is the programming never included preventing it being used that way.

Like everything else invented used for a different purpose than it was designed for.

Programmers saying 'My Bad! We didn't design it for that.' doesn't cut it. They need to be forced to put restraints in place, as they would have if they'd considered any moral and ethical boundaries at all in the original designs.

The stupid thing is, they built in 'woke madness' into the design, but not ethical use.

Bill Hiatt · « **Reply #28 on:** August 11, 2023, 04:06:58 AM »

Basic language training could be done without using other people's IP in the training process. AI doesn't need to be able to write novels to cure cancer. Nor does it need to be able to generate images to do so. The worthy goals that PJ is talking about are all achievable with a much different kind of training.

I'm willing to concede that the developers may not have thought about some of the wrinkles in the early stages of the process. Perhaps now that the wrinkles have developed, they will, oh, I don't know, maybe do something about them? Because no one can claim ignorance at this point. Also, copyright law makes no provision for accidental infringement. Stuff AI with protected IP and then act surprised when that AI is used for infringement? From my point of view, the infringement is baked into the training process. That defense is a little like saying, "Well, yes, we did leave all kinds of IP lying around with no protection, but we didn't expect anyone to steal it."

Yes, people have infringed in the past. That doesn't mean we should make it even easier, which seems to be one of the admittedly accidental results of AI.

Meanwhile, there's still the problem of misinformation.
https://www.techradar.com/pro/chatgpt-is-a-bad-knowledge-base-confirms-new-study
Despite training on a lot of high-quality sources, AI's output is often not high quality.

A recent Purdue study fed ChatGPT 517 questions, then checked its answers. It was wrong 52% of the time. But that isn't even the biggest problem.

Quote

ChatGPT is often treated as infallible, even though it absolutely isn’t, because of the way answers are presented. The study found that even correct answers addressed all aspects of the question 65% of the time, and users often accepted incorrect information as truth because of “comprehensive, well-articulated, and humanoid” sounding responses.

In other words, AI is good at making wrong answers sound plausible. This isn't a plus.

True, there's incorrect information all over the internet. But there are also higher quality sources as well. But programs like ChatGPT don't generally cite their sources, so it's more difficult to apply tradition criteria to distinguish good answers from bad ones. And when ChatGPT does cite sources, it sometimes makes them up, as I've noted before.

In other articles, TechRadar has also pointed out that AI has enabled hackers to write malware faster and is useful in propagating ransomware attacks. Like infringement, AI didn't invent these problems. It just made them worse.

I agree that AI does also have positive implications. But it clearly also needs to have many more safeguards built in. At the moment, it's doing more harm than good.

I am not a programmer, but I think further developments need to be made with a clearer idea of what the goals are and how to achieve those goals without as many negative side effects as currently exist. Developers now have more data to work with than they did before, so perhaps that isn't an unreasonable expectation.

For the moment, it might also be good to focus more on assistive AI rather than generative AI. (The former is less likely to result in job loss than the latter.)

News:

Author Topic: Authors Join the Brewing Legal Battle Over AI (Read 2140 times)