New York Times vs. OpenAI: The Epic Showdown Over Content Theft

December 27, 2023 Doug Erickson

Ladies and gentlemen, hold onto your hats because we've got some spicy news coming right at you! The New York Times has slapped OpenAI and Microsoft with a lawsuit, accusing them of swiping millions of its articles to fuel the marvel that is ChatGPT. That's right, the accusations are about as clear as day, and it's shaping up to be one epic showdown in the legal arena.

Now, you might be wondering, "Why the fuss?" Well, most language models, like ChatGPT, typically train in large part on data from a nonprofit project known as "Common Crawl," an open archive of crawled web pages. It's like the Google search index, but it's free for all – with a few strings attached, of course. According to Common Crawl's terms of use, if you want to use its crawled data, you've got to play nice and follow the terms of service set by the content owners. Seems fair, right?

But here's the kicker – it appears that some tech folks out there have developed a rather nonchalant attitude toward intellectual property. It's almost like a wild west mentality – "If you can index it, it's yours." Frankly, that's absurd and a real punch in the gut to content creators. It all started with Napster back in the day and extended to Google's way of handling other people's content a couple of decades ago.

Ancient History

Google's stance was basically, "If you don't want to be in the index, just let us know in your robots.txt file!" Simple, right? But Google became such a behemoth, so fast, that it started sending an avalanche of traffic to sites like the New York Times. The publishing industry, in its blissful naivety, never considered how devastatingly useless Google would be if the top publications collectively gave it the cold shoulder.

So, Google played its cards cleverly, leaning on the "fair use" doctrine to make publishers feel like "a ton of traffic for a snippet of content is a fair deal." But make no mistake, it gutted the publishing industry and left society with a whole lot of downstream issues (a topic for another day).

Fast forward to today, and the publishing industry isn't the same bumbling, helpless entity it once was. The New York Times, for instance, has transformed itself into a subscription powerhouse, competing across various domains beyond just news.

Judo Tactics

This time around, the publishers aren't rolling over; they're gearing up for an all-out brawl. And mark my words, they're going to win. OpenAI can't pull a Google and make the "fair use" argument because ChatGPT doesn't send traffic to publishers; it just hands out answers based on all the content it's devoured. The New York Times isn't having any of it, and it has brought receipts.

It's going to be a costly affair for OpenAI and Microsoft; I'd expect a settlement in the hundreds of millions of dollars, perhaps even billions. The evidence is damning; they've been caught red-handed. For instance, OpenAI allegedly nabbed Wirecutter's intellectual property and shamelessly produced results based on it, sans any links back. That's a real blow because it strips away those sweet affiliate links that fund content creation. Game over, OpenAI and Microsoft, game over.

Folks, grab your popcorn because this lawsuit is about to take us on a legal rollercoaster, and it's one for the history books!