Amazon Web Services has started an investigation to determine whether Perplexity AI is breaking its rules, according to Wired. To be precise, the company's cloud division is looking into allegations that the service is using a crawler, hosted on its servers, that ignores the Robots Exclusion Protocol. Under this web standard, developers put a robots.txt file on a domain containing instructions on whether bots can or can't access a particular page. Complying with those instructions is voluntary, but crawlers from reputable companies have generally respected them since web developers started implementing the standard in the '90s.
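For readers unfamiliar with how a well-behaved crawler consults robots.txt, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt contents, the bot names and the `is_allowed` helper are hypothetical and chosen purely for illustration; they are not taken from Wired, Perplexity or any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks "ExampleBot" from /private/ and
# allows every other crawler everywhere. Not from any real website.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleBot", "https://example.com/private/page"),
        ("ExampleBot", "https://example.com/public/page"),
        ("OtherBot", "https://example.com/private/page"),
    ]:
        print(agent, url, is_allowed(EXAMPLE_ROBOTS_TXT, agent, url))
```

Nothing in this check is enforced by the website itself: a crawler that skips it, or ignores the answer, can still request the page, which is why compliance is described as voluntary.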
In an earlier piece, Wired reported that it discovered a virtual machine that was bypassing its website's robots.txt instructions. That machine was hosted on an Amazon Web Services server at the IP address 44.221.181.252, which Wired said is "certainly operated by Perplexity." It also reportedly visited other Condé Nast properties hundreds of times over the past three months to scrape their content. The Guardian, Forbes and The New York Times had detected it visiting their publications multiple times as well, Wired said. To confirm whether Perplexity truly was scraping its content, Wired entered headlines or short descriptions of its articles into the company's chatbot. The tool then responded with results that closely paraphrased its articles "with minimal attribution."
A recent Reuters report claimed that Perplexity isn't the only AI company that's bypassing robots.txt files to gather content used to train large language models. However, Amazon's investigation seems to be focused on Perplexity AI only. An Amazon spokesperson told Wired that its customers have to comply with robots.txt instructions when crawling websites. "AWS’s terms of service prohibit customers from using our services for any illegal activity, and our customers are responsible for complying with our terms and all applicable laws," they said.
Perplexity spokesperson Sara Platnick told Wired that the company has already responded to Amazon's inquiries and denied that its crawlers are bypassing the Robots Exclusion Protocol. "Our PerplexityBot — which runs on AWS — respects robots.txt, and we confirmed that Perplexity-controlled services are not crawling in any way that violates AWS Terms of Service," she said. Platnick admitted, however, that PerplexityBot will ignore robots.txt when a user includes a specific URL in their chatbot inquiry.
Aravind Srinivas, the CEO of Perplexity, also previously denied that his company is "ignoring the Robot Exclusions Protocol and then lying about it." Srinivas did admit to Fast Company that Perplexity uses third-party web crawlers on top of its own, and that the bot Wired identified was one of them.
This article originally appeared on Engadget at https://www.engadget.com/amazon-investigating-perplexity-ai-after-accusations-it-scrapes-websites-without-consent-133003374.html?src=rss