OpenAI Introduces GPTBot: A New Web Crawler for Data Access with Opt-Out Options

Zach Anderson, Aug 08, 2023 22:30 UTC


OpenAI has introduced a new web crawler named GPTBot, which collects data from publicly accessible websites that may be used to improve its large language models, such as GPT-4, and possibly future models such as GPT-5. The details were published on OpenAI's official documentation page and reported by the Indian Express.

GPTBot identifies itself with the following user-agent string: `Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)`. According to OpenAI, pages crawled by GPTBot are filtered to remove sources that require paywall access, are known to gather personally identifiable information (PII), or contain text that violates OpenAI's policies.
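Site operators who want to spot GPTBot in their access logs can match on this user-agent string. The Python sketch below is an illustration only: the function name `is_gptbot` is hypothetical, and matching on the `GPTBot/<version>` token rather than the exact full string is an assumption made so that a future version number would still be caught. Because the User-Agent header is supplied by the client, this identifies requests that claim to be GPTBot, not requests verified to come from OpenAI.

```python
import re

# The full GPTBot user-agent string as quoted in the article.
GPTBOT_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "GPTBot/1.0; +https://openai.com/gptbot)"
)

# Assumption: match the "GPTBot/<version>" product token instead of the whole
# string, so a hypothetical version bump (e.g. GPTBot/1.1) still matches.
_GPTBOT_TOKEN = re.compile(r"\bGPTBot/\d+(?:\.\d+)*")


def is_gptbot(user_agent: str) -> bool:
    """Return True if a User-Agent header claims to be GPTBot.

    The header is self-reported and can be spoofed, so a True result only
    means the request *claims* to come from GPTBot.
    """
    return bool(_GPTBOT_TOKEN.search(user_agent))


if __name__ == "__main__":
    print(is_gptbot(GPTBOT_UA))
    print(is_gptbot("Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"))
```

Running the script prints `True` for the GPTBot string and `False` for the ordinary browser string.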

The stated aim of GPTBot is to draw only on sources that are freely available, comply with OpenAI's policies, and are not known to collect personal information. By allowing GPTBot to access their sites, publishers contribute data that OpenAI may use for its current and future models, which could improve the accuracy and capabilities of AI chatbots.

The crawler nonetheless raises privacy and security concerns, and OpenAI has responded by letting publishers opt out. Adding the line `User-agent: GPTBot` followed, on the next line, by `Disallow: /` to a site's robots.txt file blocks GPTBot from the entire site. Publishers can also grant partial access by listing specific directories under `Allow:` and `Disallow:` directives for the GPTBot user agent.
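Before publishing robots.txt rules, a site operator can check locally that they do what is intended. The Python sketch below uses the standard library's `urllib.robotparser` to evaluate the full opt-out described above and a partial opt-out; the paths `/blog/` and `/members/`, the domain `example.com`, and the helper name `crawler_allowed` are hypothetical placeholders. It only checks what the rules say; whether a given crawler honors them is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# Full opt-out as described in the article: each directive on its own line.
FULL_OPT_OUT = """\
User-agent: GPTBot
Disallow: /
"""

# Hypothetical partial opt-out: /blog/ and /members/ are illustrative paths.
PARTIAL_OPT_OUT = """\
User-agent: GPTBot
Allow: /blog/
Disallow: /members/
"""


def crawler_allowed(robots_txt: str, url: str, agent: str = "GPTBot") -> bool:
    """Return True if the robots.txt rules permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # robotparser compares only the text before the first "/" of the agent
    # name, so pass the product token "GPTBot", not the full user-agent string.
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for rules in (FULL_OPT_OUT, PARTIAL_OPT_OUT):
        for path in ("/", "/blog/post", "/members/profile"):
            print(path, crawler_allowed(rules, "https://example.com" + path))
```

One design caveat: `urllib.robotparser` applies the first matching rule in file order, whereas RFC 9309 specifies that the longest matching path wins. For the rules in this example the two interpretations agree, but rule order can matter in more complex files.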

The introduction of GPTBot represents a step towards enhancing AI models by utilizing publicly available web data. While it offers potential benefits in terms of AI advancement, it also raises questions about privacy and the control publishers have over their data. OpenAI's decision to provide an opt-out option reflects an acknowledgment of these concerns and an effort to balance technological progress with ethical considerations.


Image source: Shutterstock
