The AI search startup Perplexity is in hot water in the wake of a Wired investigation revealing that the startup has been crawling content from websites that don’t want to be crawled.
Perplexity’s “answer engine” works by crawling large swaths of information on the web and then creating a big database (an index) of content it grabs from web pages. Instead of typing keywords into a search box, users type or speak questions into Perplexity’s web portal or mobile apps, and receive a narrative answer with citations and links to the web content it draws upon.
Websites can use the Robots Exclusion Protocol, typically a robots.txt file, to tell web crawlers which content is off-limits. Bots are supposed to honor those instructions, though compliance is voluntary. Wired, along with an independent researcher, says it has proof that Perplexity has been ignoring those instructions and scraping content from off-limits sites anyway.
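To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler might check a site's rules before fetching a page, using Python's standard `urllib.robotparser` module. The robots.txt contents, the bot name `ExampleAnswerBot`, and the `may_fetch` helper are hypothetical and purely illustrative; they do not represent any real publisher's file or Perplexity's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve (illustrative only).
# It blocks one named bot from the whole site and keeps every other
# bot out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAnswerBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the rules in robots_txt permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleAnswerBot", "https://example.com/story"),
        ("OtherBot", "https://example.com/story"),
        ("OtherBot", "https://example.com/private/x"),
    ]:
        print(agent, url, may_fetch(agent, url))
```

Nothing in this check is enforced by the site itself. A crawler that never runs it, or that runs it and ignores the answer, can still request the page, which is why compliance is described as voluntary.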
“Perplexity is not ignoring the Robot Exclusions Protocol and then lying about it,” said Perplexity cofounder and CEO Aravind Srinivas in a phone interview Friday. “I think there is a basic misunderstanding of the way this works,” Srinivas said. “We don’t just rely on our own web crawlers, we rely on third-party web crawlers as well.”
Srinivas said the mysterious web crawler that Wired identified was not owned by Perplexity, but by a third-party provider of web crawling and indexing services. Srinivas would not name the provider, citing a nondisclosure agreement. Asked if Perplexity immediately told the third-party provider to stop crawling Wired content, Srinivas was noncommittal. “It’s complicated,” he said.
Srinivas also noted that the Robots Exclusion Protocol, which was first proposed in 1994, is “not a legal framework.” He suggested that the emergence of AI requires a new kind of working relationship between content creators, or publishers, and sites like his.
Wired also claims that it was able to get the Perplexity answer engine to closely paraphrase Wired articles by prompting the tool with the headlines or substance of Wired articles. At times Perplexity even paraphrased the Wired stories incorrectly. In one case, the Perplexity “answer” falsely claimed that a California police officer had committed a crime.
Srinivas suggested that Wired used prompts designed to get the Perplexity tool to behave that way, and that normal users wouldn’t see those kinds of results. “We have never said that we have never hallucinated,” he added.
Earlier in June, Forbes accused Perplexity of stealing its content. Perplexity had released a new product called “Pages” in May that lets a user create an article or blog post based on a series of questions they’ve asked the answer engine, or based on a single prompt on a specific subject. Users can add AI-generated or uploaded images, then tweak the text or add formatting before publishing to the web. One of Perplexity’s own Pages used content from a Forbes scoop but didn’t credit the publisher. Perplexity even created an AI-voiced podcast based on the Forbes reporting, but again didn’t credit the site.
Being fastidious about citing sources has been one of Perplexity’s core principles since launch, which made the omission of citations in the Pages product even more glaring. Srinivas told Fast Company that after Forbes raised the issue, his company immediately pushed out an update to Pages that puts attributions within the text of the generated article.
Srinivas frequently says that his product will only be as good as the internet ecosystem that it draws from. “We are happy creating a less-market cap, lower-margin business, as long as we are profitable and successful, and [we] make sure that the whole internet wins,” he told the audience at Fast Company’s Most Innovative Companies Gala in May. “Perplexity would be useless if people were not able to create new content on the web.”
He has said that the company is now working on revenue-sharing agreements with selected publishers. The publishers have not been named, so there is no telling whether Conde Nast (Wired’s owner) or Forbes is involved in the initiative. The content crawling and indexing issues that Wired turned up could force the company to accelerate its plans to cut fair deals with publishers.
Despite publishers’ wariness, there’s still a lot of goodwill for Perplexity, which is taking on the unenviable task of challenging Google with a new kind of search. But it can’t afford to squander much more of it.