AI’s free web scraping days may be over, thanks to this new licensing protocol




ZDNET’s key takeaways

  • Media companies announced a new web protocol: RSL.
  • RSL aims to put publishers back in the driver’s seat.
  • The RSL Collective will attempt to set pricing for content. 

AI companies are scraping as much content as they can from websites and mining it for information. Now, several heavyweight publishers and tech companies (Reddit, Yahoo, People, O’Reilly Media, Medium, and ZDNET’s parent company, Ziff Davis) have developed a response: the Really Simple Licensing (RSL) standard.

You can think of RSL as Really Simple Syndication’s (RSS) younger, tougher brother. While RSS is about syndication, getting your words, stories, and videos out onto the wider web, RSL says: “If you’re an AI crawler gobbling up my content, you don’t just get to eat for free anymore.”

The idea behind RSL is brutally simple. Instead of the old robots.txt file — which only said, “yes, you can crawl me,” or “no, you can’t,” and which AI companies often ignore — publishers can now add something new: machine-readable licensing terms. 

Want attribution? You can demand it. Want payment every time an AI crawler ingests your work, or even every time it spits out an answer powered by your article? Yep, there’s a tag for that too.

This approach allows publishers to define whether their content is free to crawl, requires a subscription, or will cost “per inference,” that is, every time ChatGPT, Gemini, or any other model uses content to generate a reply.
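To see how little the old mechanism can express, here is a minimal sketch using Python's standard-library robots.txt parser. The robots.txt content and the crawler name "ExampleAIBot" are made up for illustration. All the file can return is a yes or a no; there is nowhere to state a price or an attribution requirement.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; "ExampleAIBot" is not a real crawler.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(user_agent: str, url: str) -> bool:
    """Return the only answer robots.txt can give: allowed or not allowed."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/story"
    for agent in ("ExampleAIBot", "SomeBrowser"):
        print(agent, "allowed" if can_crawl(agent, url) else "blocked")
```

Note that nothing here enforces the answer. A crawler that never checks the file, or ignores the result, is not stopped by it.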

What RSL offers

The key capabilities of RSL include:

  • A shared vocabulary that lets publishers define licensing and compensation terms, including free, attribution, pay-per-crawl, and pay-per-inference compensation.
  • An open protocol to automate content licensing and create internet-scale licensing ecosystems between content owners and AI companies.
  • Standardized, public catalogs of licensable content and datasets through RSS and Schema.org metadata.
  • An open protocol for encrypting digital assets to securely license non-public proprietary content, including paywalled articles, books, videos, and training datasets.
  • Support for collective licensing via the RSL Collective or any other RSL-compatible licensing server.
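As a rough illustration of how the four compensation terms in the first bullet differ, the sketch below models them in Python. This is not RSL's actual format: the class names, the `price_cents` field, and the `amount_owed_cents` function are all hypothetical, invented only to show how pay-per-crawl and pay-per-inference would bill the same usage differently.

```python
from dataclasses import dataclass
from enum import Enum


class Compensation(Enum):
    """The four compensation terms named in the list above."""
    FREE = "free"
    ATTRIBUTION = "attribution"
    PAY_PER_CRAWL = "pay-per-crawl"
    PAY_PER_INFERENCE = "pay-per-inference"


@dataclass(frozen=True)
class LicenseTerms:
    """Hypothetical container for a publisher's terms (not the RSL schema)."""
    compensation: Compensation
    price_cents: int = 0  # assumed unit price; ignored for free/attribution


def amount_owed_cents(terms: LicenseTerms, crawls: int, inferences: int) -> int:
    """Compute what an AI company would owe under the given terms."""
    if terms.compensation is Compensation.PAY_PER_CRAWL:
        return terms.price_cents * crawls
    if terms.compensation is Compensation.PAY_PER_INFERENCE:
        return terms.price_cents * inferences
    # Free and attribution-only terms carry no monetary charge.
    return 0


if __name__ == "__main__":
    per_crawl = LicenseTerms(Compensation.PAY_PER_CRAWL, price_cents=2)
    per_inference = LicenseTerms(Compensation.PAY_PER_INFERENCE, price_cents=2)
    print(amount_owed_cents(per_crawl, crawls=3, inferences=500))
    print(amount_owed_cents(per_inference, crawls=3, inferences=500))
```

With the made-up numbers above (an article crawled 3 times and used in 500 answers at 2 cents per unit), pay-per-crawl comes to 6 cents while pay-per-inference comes to 1,000 cents. That gap is why the choice of term matters to publishers.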

It’s a clever fix for a complex problem. As Tim O’Reilly, the O’Reilly Media CEO and one of the RSL initiative’s high-profile backers, said: “RSS was critical to the internet’s evolution…but today, as AI systems absorb and repurpose that same content without permission or compensation, the rules need to evolve. RSL is that evolution.” 

O’Reilly’s right. RSS helped the early web scale, whether through blogs, news syndication, or podcasts. But today’s web isn’t just competing for human eyeballs. It is now also competing to supply the training and reasoning fuel for AI models that, so far, aren’t exactly paying the bills for the sites they’re built on.

Of course, tech is one thing; business is another. That’s where the RSL Collective comes in. Modeled on music’s ASCAP and BMI, the nonprofit is essentially a rights-management clearinghouse for publishers and creators. Join for free, pool your rights, and let the Collective negotiate with AI companies to ensure you’re compensated.

As anyone in publishing knows, a lone freelancer, or most media outlets for that matter, has about as much leverage against the likes of OpenAI or Google as a soap bubble in a wind tunnel. But a collective that represents “the millions” of online creators suddenly has some bargaining power.

(Disclosure: Ziff Davis, ZDNET’s parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

How we got here

Let’s step back. For the last few years, AI has been snacking on the internet’s content buffet with zero cover charge. That approach worked when the web’s economics were primarily driven by advertising. However, those days are history. The old web ad model has left publishers gutted while generative AI companies raise billions in funding. 

So, RSL wants to bolt a licensing framework directly into the web’s plumbing. And because RSL is an open protocol, just like RSS, anyone can use it. From a giant outlet like Yahoo to a niche recipe blogger, RSL allows web publishers to spell out what they want in return when AI comes crawling.

The work of guiding RSL falls to the RSL Technical Steering Committee, which reads like a who’s who of the web’s protocol architects: Eckart Walther, co-author of RSS; RV Guha of Schema.org and RSS; Tim O’Reilly of O’Reilly Media; Stephane Koenig of Yahoo; and Simon Wistow of Fastly.

The web has always run on invisible standards such as HTTP, HTML, RSS, and robots.txt. In Web 1.0, social contracts were written into code. If RSL catches on, it may be the next layer in that lineage: the one that finally gives human creators a fighting chance in the AI economy.

And maybe, just maybe, RSL will stop the AI feast from becoming an all-you-can-eat buffet with no one left to cook.



Original Source: zdnet
