knitwitt@lemmy.world to Movies and TV Shows@lemmy.film • Grisham, Martin join authors suing OpenAI: “There is nothing fair about this”
1 year ago
If I took 100 of the world’s best-selling novels, wrote each individual word onto a flashcard, shuffled the entire deck, and then created an entirely new novel out of that (with completely original characters, plot threads, themes, and messages), could it be said that I produced stolen work?
What if I specifically attempted to emulate the style of the number one author on that list? What if instead of 100 novels, I used 1,000 or 10,000? What if instead of words on flashcards, I wrote down sentences? What if it were letters instead?
At some point, regardless of how the changes were derived, a transformed work must pass a threshold where, judged on content alone, it is different enough that it can no longer be considered derivative.
I imagine that the easiest way to acquire specific training data for an LLM is to download ebooks from Amazon. If a university professor pirates a textbook and then uses extracts from various pages in their lecture slides, the cost of the crime would be the cost of a single textbook. In the case of a novel, GRRM should be entitled to the cost of a set of Ice & Fire if he could prove that the original training material was pirated rather than legally purchased.
Once a copy of a book is sold, an author typically has no say in how it gets used outside of reproduction.