Perplexity is a fascinating AI application that I sometimes use to read the news, but it has had me worried for a while now.
It takes a Wikipedia-like approach to compiling news and information: it takes verified news sources and presents the information in a format that is simple and easy to understand, with all the key facts. It also allows you to ask questions, answers to which it generates on the fly, based, once again, on those sources.
A recent Forbes article calls Perplexity’s aggregation and generation a “cynical theft,” pointing out that:
Perplexity had taken our work, without our permission, and republished it across multiple platforms — web, video, mobile — as though it were itself a media outlet.
Google has also rolled out its AI summaries feature, which seeks to answer search queries directly by aggregating information from sources. AI summaries pose some key challenges for society and news publishers:
Firstly, both Google's AI summaries and Perplexity have issues with accuracy. Unlike Wikipedia, they have no human oversight or verification, and can pick up unreliable or unverified claims from articles and present them as facts. A user who doesn't know better, or can't easily spot the issues, may accept these as news.
For news publishers, there is the issue of their work being used as raw material to generate these summaries, without any compensation. Facts are not protected by copyright because their dissemination is an essential public service, and society benefits when facts can be re-reported. The copyright doctrine of "fair use" enables this, but it also depends on whether the copied information is part of a broader composition rather than a mere copy: basically, you need to add your own value if you're copying. Perplexity and Google's AI summaries add zero original value: no reporting, no original context, and literally nothing that makes the work theirs. It is not copyright violation, but as the Forbes article pointed out, it is plagiarism.
A reaction from news publishers is warranted: such tools are extractive and cannibalistic rather than value-additive for publishers. They reduce the need for someone to visit the source of the information, effectively stealing publishers' audiences and their means of monetization via advertising and subscriptions. This illustrates a gap in copyright law that needs to be addressed to preserve incentives for news organisations. If the sources of news die, what will Perplexity or Google's AI summaries copy news from? Without news reporting, all you'll be left with is unverified tweets and Twitter's fairly disastrous crowdsourced reader reviews.
One might argue that Wikipedia does the same thing as these AI services, aggregating information from sources and linking out to them. But Wikipedia is an encyclopedia, not a source of news: it doesn't replace a news article or cannibalise a publication's audience. Instead, it enables discovery of news sources for more context, while Perplexity hides them behind an additional tab. Wikipedia also doesn't include the complete context for a development, whereas Perplexity aggregates that context from its news sources.
Pranesh Prakash, former Research Director at the Centre for Internet and Society, likened such AI tools to a human being reading something and sharing that information in their own words. There are a couple of issues with this line of thinking. Firstly, just because something is public to read doesn't mean it is open to copying, including for training Large Language Models. There needs to be a permission check: much of generative AI is built on information taken from third parties without permission for training. Secondly, the mass accumulation of content for training, without permission or compensation, cannot be treated the same way as human learning. A power law applies here: with the ability to ingest, learn, and replicate (even if not verbatim) at scale, the impact is disproportionate and exponential.
Perplexity also positions itself as an "answer engine", similar to a search engine. This is convenient positioning. Publishers choose to allow search engines to scrape their work because search engines index content the way a library does and direct users to the appropriate website: it's a symbiotic relationship, because it doesn't replace the source of the information. In fact, publishers use tools to optimise their articles so that users can discover their content more easily; they actively help search engines enhance a user's ability to decide to go and read something on a website or an app.
Perplexity and AI summaries do exactly the opposite: they address users' needs directly, removing the desire to go to the source. The user and the "answer" engine win, but the publisher loses. This means that publishers cannot monetize the user's presence via subscription or advertising, or encourage repeat usage via newsletters and apps. It can be argued that publishers can exclude themselves from indexing by AI bots that scrape their content, but quite often AI bots refuse to honor the directives, such as those in a site's robots.txt file, that are meant to prevent such scraping. Hugging Face, a repository of AI models, has added over 1,500 new models over the past two months: even if only 10 per day are language models, it's a significant task for publishers to identify and exclude such bots daily.
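To illustrate what opting out looks like in practice, a publisher can add directives like these to its robots.txt file. The user-agent tokens below are ones the respective companies have published for their crawlers, but, as noted above, compliance is entirely voluntary, the list is illustrative rather than exhaustive, and new bots appear constantly:

```
# Block OpenAI's training crawler
User-agent: GPTBot
Disallow: /

# Opt out of Google's AI training (separate from Search indexing)
User-agent: Google-Extended
Disallow: /

# Block Perplexity's crawler
User-agent: PerplexityBot
Disallow: /

# Block Common Crawl, whose dataset is widely used for AI training
User-agent: CCBot
Disallow: /
```

Each new crawler requires its own entry, which is precisely why keeping such a list current is the ongoing burden described above.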
There is a risk that this might be seen as a big-tech-versus-publishers issue and treated the same way as the "link tax" imposed on Google and Meta in Canada and Australia, which forced them to pay publishers for links aggregated or added by users. But it isn't the same: that idea is antithetical to how the Internet functions as an interconnection of links, and to the pay-it-back approach of linking out.
Linking out adds value to publishers. Instead of targeting platforms for a service that adds value, publishers are better served focusing on the AI summaries that end up cannibalising their work.
Signing content licensing deals with AI companies is an exercise in feeding the beast that is devouring them.
(An edited version of this article was published in The Economic Times as a part of my TechNik columns)