Microsoft AI CEO: Web content (without Robots.txt file) is ‘freeware’ for AI training

Slashdot reader joshuark shared this report from Windows Central:

Microsoft may have opened a Pandora’s box with recent comments from the tech giant’s AI CEO Mustafa Suleyman. Speaking to CNBC’s Andrew Ross Sorkin earlier this week at the Aspen Ideas Festival, the CEO claimed that all content shared on the web is available to use for AI training, unless a content producer specifically says otherwise.

The whole discussion was interesting — but this particular question was very direct. The CNBC interviewer specifically said, “There are a number of authors here… and there are also a number of journalists. And it seems like a lot of the information that’s been trained on over the years has come from the web — and some of it is the open web, and some of it isn’t, and we’ve heard stories about how OpenAI has turned YouTube videos into transcripts and then trained on the transcripts.”

The question becomes, “Who should own the IP, who should extract value from the IP, and whether, to put it very bluntly, the AI companies have effectively stolen the world’s IP.” Suleyman begins his answer (at 14:40) with, “Yeah, I think — look, it’s a very fair argument.”

SULEYMAN: “I think that in terms of content that’s already on the open web, the social contract of that content since the ’90s has been that it’s fair use. Anyone can copy it, recreate it, reproduce it. That was freeware, if you like. That was the deal.

“There’s a separate category where a website or publisher or news organization has explicitly said, ‘Do not scrape or crawl me for any reason other than indexing, so that other people can find that content.’ That’s a gray area and I think that should go through the courts.”

Q: And what does it mean when you say, “It’s a gray area”?

SULEYMAN: “Well, if – so far, some people have taken that information… but that’s going to be a lawsuit, and I think that’s fair…

“You know, look, the economics of information is about to change radically, because we’re going to drive the cost of producing knowledge down to zero marginal cost. And this is just a really hard thing for people to understand — but in 15 or 20 years, we’re going to be producing new scientific cultural knowledge at almost zero marginal cost. It’s going to be broadly open source and available to everyone. And I think that’s going to be a real turning point in the history of our species. Because what are we, collectively, as an organism of human beings, other than an intellectual production machine. We produce knowledge. Our science makes us better. And so what we really want in the world, in my view, are new machines that can drive discovery and invention.”

