Reference · Updated Quarterly

AI Tool Data Access Comparison 2026

Last updated: May 24, 2026 · Next update: August 24, 2026

What this page is

A side-by-side comparison of what each major consumer AI tool reads, retains, and trains on. Eight tools tracked: ChatGPT, Claude, Gemini, Copilot, Perplexity, Grok, Codex, Cursor. Almost no major tech publication answers this question in one place, for three reasons: the answers are buried in dense vendor legal documents, they change without notice, and most AI coverage is about product features rather than the data trade those features are built on.

If you arrived here from a search engine asking "is ChatGPT safe for work email," "does Claude train on my data," "what does Gemini Spark read," or "AI tool privacy comparison," this page exists for that question.

The data trade is the price of the free tier. On paid tiers, the data trade narrows but does not disappear. Read carefully before deciding what to paste, upload, or grant access to.


The data-access taxonomy

To compare tools fairly, this page measures five dimensions per tool:

  1. What data the tool reads at the moment you use it. The conversation, the file you uploaded, the URL you pasted, the calendar event, the inbox message. Surface-level inputs.
  2. What data the tool retains after the session ends. Conversation history, uploaded files, embeddings of your inputs.
  3. Whether retained data is used to train the underlying model. Training-use is a different category from retention. A tool can retain conversations for product improvement without using them for model training.
  4. Whether an opt-out from training-use exists, and what it covers. Opt-outs vary in scope. Some are global (apply to all input). Some are conversational (apply only to chats marked private). Some are tier-conditional (free tier no opt-out, paid tier opt-out available).
  5. What standing access the tool maintains when not actively in use. Background processes, scheduled checks, persistent connections to your inbox/calendar/files. This is the dimension Gemini Spark introduced into the consumer market in 2026.
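The five dimensions above can be read as the fields of one record per tool tier. The following is a minimal sketch under that assumption; the class name `DataAccessProfile`, its field names, and the `is_verified` helper are hypothetical and chosen for illustration only. The cell values are copied from the comparison table on this page, and an unverified training cell is modeled as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataAccessProfile:
    """One tool tier scored on the five dimensions (hypothetical structure)."""
    tool: str
    reads: str                          # dimension 1: read at time of use
    retains: str                        # dimension 2: kept after the session
    trains_by_default: Optional[bool]   # dimension 3: None means not yet verified
    opt_out: str                        # dimension 4: opt-out and its scope
    standing_access: str                # dimension 5: access when not in use

    def is_verified(self) -> bool:
        # A row counts as verified only when the training cell is filled in.
        return self.trains_by_default is not None


chatgpt_free = DataAccessProfile(
    tool="ChatGPT Free",
    reads="Conversation, uploaded files, web browsing context",
    retains="Conversations stored by default",
    trains_by_default=True,
    opt_out='Settings > Data Controls > "Improve the model for everyone" off',
    standing_access="None",
)

claude_free = DataAccessProfile(
    tool="Claude Free",
    reads="Conversation, uploaded files, web search context",
    retains="Conversations stored by default",
    trains_by_default=None,             # table cell reads [VERIFY FROM VENDOR TERMS]
    opt_out="[VERIFY FROM VENDOR TERMS]",
    standing_access="None",
)

for profile in (chatgpt_free, claude_free):
    print(profile.tool, "verified:", profile.is_verified())
```

Keeping retention and training-use as separate fields mirrors the point in dimension 3: a tool can retain data without training on it, so one flag cannot stand in for both.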

The comparison table

| Tool (tier) | Reads | Retains | Trains on it | Opt-out available | Standing access |
|---|---|---|---|---|---|
| ChatGPT Free | Conversation, uploaded files, web browsing context | Conversations stored by default | Yes, by default | Yes: Settings → Data Controls → "Improve the model for everyone" off | None |
| ChatGPT Plus / Pro | Same as free + larger uploads | Same | Same default + opt-out per the same setting | Same toggle as free tier | None |
| ChatGPT Team | Same as Plus | Same | No: training-use disabled by default for Team accounts | Default-off, no toggle needed | None |
| Claude Free | Conversation, uploaded files, web search context | Conversations stored by default | [VERIFY FROM VENDOR TERMS] | [VERIFY FROM VENDOR TERMS] | None |
| Claude Pro / Max | Same as free + larger limits | Same | Same posture as free | Same opt-out | None |
| Claude Team | Same as Pro | Same | No: training-use disabled by default for Team accounts | Default-off | None |
| Gemini Free | Conversation, basic file uploads, Google account context | Stored under standard Google data terms | Yes under standard Google terms unless opted out at myactivity.google.com | Yes: Google Account → Data & privacy → Gemini Apps Activity | Google account-wide signals present by default; Gemini Free does not reach into Gmail / Calendar |
| Gemini → Google AI | Same as free + Daily Brief reads Gmail and Calendar | Same | Same opt-out path | Same as free | Daily Brief reads Gmail and Calendar on schedule when enabled |
| Gemini → Google AI Ultra (Spark) | Same as Google AI + Gemini Spark reads Gmail continuously when enabled | Same | Same opt-out path; opt-out covers training but does not disable inbox reading | Training opt-out covers training; Spark's inbox-read is the feature, not optional | Spark maintains 24/7 cloud-process access to inbox while enabled, even when phone is locked |
| Copilot Free | Conversation, web search context | [VERIFY FROM VENDOR TERMS] | [VERIFY FROM VENDOR TERMS] | [VERIFY FROM VENDOR TERMS] | None |
| Copilot Pro | Same as free | Same | Same | Same | None |
| Microsoft 365 Copilot | Reads tenant Office 365 content (mail, files, calendar, Teams chats) within Microsoft Graph permissions | Per tenant data residency policy | No: Microsoft commits to not training foundation models on tenant content | Default-off for training; tenant-controlled retention | Active access to M365 content while user is authenticated |
| Perplexity Free | Query, web search context | [VERIFY FROM VENDOR TERMS] | [VERIFY FROM VENDOR TERMS] | [VERIFY FROM VENDOR TERMS] | None |
| Perplexity Pro | Same as free + file uploads | Same | Same | Same | None |
| Grok Free / Premium / Premium+ | Conversation + X profile data, posts, and engagement history by default | Stored under X / xAI data terms | Yes by default under xAI terms | [VERIFY FROM VENDOR TERMS] | None, but X profile data is read on each session |
| Codex Free | Code submitted, file context provided | Stored by default | Yes by default | Same opt-out as ChatGPT Plus | None |
| Codex on ChatGPT Plus | Same | Same | Same | Same | None |
| Cursor Free | Code files in the editor, prompts | [VERIFY FROM VENDOR TERMS] | [VERIFY FROM VENDOR TERMS] | Privacy Mode setting controls training-use | None |
| Cursor Pro / Business | Same | Same | No when Privacy Mode is enabled | Privacy Mode toggle | None |

What "training opt-out" actually covers

The single most misunderstood line item in AI tool terms is what a training-use opt-out actually does. Three patterns are common:

Pattern 1: Training opt-out covers training only, not retention. ChatGPT, Claude, Gemini, and Codex all work this way. Turning off "use my data to improve the model" stops the inputs from being added to future model training runs. It does not stop the inputs from being stored on vendor servers, processed for safety/abuse detection, or used for product analytics. Your data is still there; it is just not in the next model's training corpus.

Pattern 2: Training opt-out is the only data control offered. Most free tiers fit here. If you want stronger data controls, the answer is upgrading to a paid tier (where Team/Enterprise plans default to no-training) or not using the tool for sensitive work.

Pattern 3: No training-use happens regardless of toggle. ChatGPT Team, Claude Team, and Microsoft 365 Copilot fit here: the vendor's commercial commitment is that tenant/team content does not enter training, so the opt-out toggle is irrelevant because the answer is already no. Cursor reaches the same result only while Privacy Mode is enabled, so there the Privacy Mode setting is itself the switch and has to be on.

The watchdog read: if your work data is sensitive, the right answer is not "I will enable the opt-out." The right answer is "I will use a tier where no-training is the default."
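The rule in the watchdog read can be written as a one-line check: a tier qualifies for sensitive work only when no-training is its default, not when an opt-out merely exists. The sketch below assumes a hypothetical function name, `suitable_for_sensitive_work`, and a small sample of rows taken from the comparison table; an unverified cell (`None`) is treated as not qualifying.

```python
from typing import Dict, Optional


def suitable_for_sensitive_work(trains_by_default: Optional[bool]) -> bool:
    """True only when the tier's default is no training-use.

    True  -> trains by default; an opt-out does not change the default.
    None  -> cell not verified; treated as not suitable.
    False -> no-training is the default; qualifies.
    """
    return trains_by_default is False


# Sample rows copied from the comparison table (illustrative subset only).
sample_tiers: Dict[str, Optional[bool]] = {
    "ChatGPT Free": True,
    "ChatGPT Team": False,
    "Claude Free": None,            # [VERIFY FROM VENDOR TERMS]
    "Microsoft 365 Copilot": False,
}

qualifying = [
    name for name, trains in sample_tiers.items()
    if suitable_for_sensitive_work(trains)
]
print(qualifying)  # ['ChatGPT Team', 'Microsoft 365 Copilot']
```

Using `is False` rather than `not trains_by_default` is deliberate: it keeps an unverified cell from passing the check by accident.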


Standing access — the dimension Gemini changed

Through 2025, consumer AI tools were transactional. You opened them, you used them, you closed them. The tool read what you gave it during the session and did nothing once the session ended.

Gemini Spark, launched at Google I/O 2026, changes that. Spark is a 24/7 cloud-based agent with standing access to Gmail. It reads your inbox continuously in the background, not just when you open the Gemini app. Spark itself can be disabled, but inbox-read cannot be turned off while Spark stays on, because the inbox-read is the feature.

Microsoft 365 Copilot has had a different version of standing access since 2024: it can read tenant Office content while the user is authenticated. The user is in control because the user's authentication state controls the access. Spark's posture is different — the cloud agent runs even when the phone is locked.

This is a category shift consumers should evaluate explicitly. Standing access is not the same product as a transactional assistant, and the privacy math is different.
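One way to make the difference concrete is to ask, for each access model, which condition has to hold for the tool to read your data right now. The sketch below is a simplified model of the three postures described in this section, not a description of any vendor's implementation; the enum `AccessModel` and the function `can_read` are hypothetical names.

```python
from enum import Enum


class AccessModel(Enum):
    TRANSACTIONAL = "transactional"   # reads only inside an open session
    AUTH_BOUND = "auth_bound"         # reads while the user is authenticated
    CONTINUOUS = "continuous"         # reads whenever the feature is enabled


def can_read(
    model: AccessModel,
    *,
    session_open: bool,
    user_authenticated: bool,
    feature_enabled: bool,
) -> bool:
    """Return whether the tool can read user data under the given conditions."""
    if model is AccessModel.TRANSACTIONAL:
        return session_open
    if model is AccessModel.AUTH_BOUND:
        return user_authenticated
    # CONTINUOUS: the only gate is whether the feature is switched on.
    return feature_enabled


# Scenario: app closed, user signed out, phone locked, feature left enabled.
scenario = dict(session_open=False, user_authenticated=False, feature_enabled=True)

for model in AccessModel:
    print(model.value, can_read(model, **scenario))
```

In that scenario only the continuous model still returns `True`, which is the posture this section attributes to Spark; the authentication-bound model corresponds to the Microsoft 365 Copilot description.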


Per use case — what to use, what to avoid

Personal, low-stakes use (drafts, summaries, fun, brainstorming, casual queries): any tool's free or paid tier is fine. The data trade is real but the stakes are low.

Knowledge work under standard employment terms (no NDA, no client confidentiality, just professional work): a mid-tier paid plan such as ChatGPT Plus, Claude Pro, Perplexity Pro, Cursor Pro, or Copilot Pro. The opt-out is worth enabling. Avoid pasting full client lists, salary data, or internal financial figures regardless.

NDA work, client confidential, regulated industries (legal, healthcare, finance, defense, anything under contractual confidentiality): use Team-tier plans where no-training is the default, or use Microsoft 365 Copilot if your organization is already on M365 with appropriate data classifications. Do not use free or individual paid tiers for this category.

Code with proprietary intellectual property: Cursor Business with Privacy Mode, ChatGPT Team for Codex, or a self-hosted/local model. Avoid Codex on the free tier for proprietary code, because the default data terms include training-use.

Email and calendar integration (Daily Brief, Spark, Microsoft 365 Copilot): evaluate carefully. The productivity value is real but the standing access is a significant ask. For Spark specifically, the data-governance documentation for "what is stored, where, and how long" was not published at launch; waiting for that documentation before subscribing is a reasonable position.
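The five categories above amount to a lookup from use case to guidance. The sketch below condenses them into a small table for teams that want to encode the policy in an internal checklist or script. The names `UseCase`, `GUIDANCE`, and `guidance_for` are hypothetical, and the strings are paraphrases of this section, not vendor policy.

```python
from enum import Enum


class UseCase(Enum):
    LOW_STAKES = "personal, low-stakes use"
    KNOWLEDGE_WORK = "knowledge work under standard employment terms"
    CONFIDENTIAL = "NDA, client-confidential, or regulated work"
    PROPRIETARY_CODE = "code with proprietary intellectual property"
    INBOX_INTEGRATION = "email and calendar integration"


# Condensed from the per-use-case guidance on this page (paraphrased).
GUIDANCE = {
    UseCase.LOW_STAKES: "Any tool's free or paid tier.",
    UseCase.KNOWLEDGE_WORK: "Mid-tier paid plan with the training opt-out enabled.",
    UseCase.CONFIDENTIAL: (
        "Team-tier plan where no-training is the default, or Microsoft 365 "
        "Copilot. No free or individual paid tiers."
    ),
    UseCase.PROPRIETARY_CODE: (
        "Cursor Business with Privacy Mode, ChatGPT Team for Codex, "
        "or a self-hosted/local model."
    ),
    UseCase.INBOX_INTEGRATION: "Evaluate standing access carefully before enabling.",
}


def guidance_for(use_case: UseCase) -> str:
    """Look up the condensed guidance for a use-case category."""
    return GUIDANCE[use_case]


print(guidance_for(UseCase.CONFIDENTIAL))
```

A plain dictionary keyed by an enum keeps the mapping exhaustive and easy to audit at each quarterly refresh.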


What this page does not tell you

This page covers the consumer-product side of each tool's data terms. It does not cover:

  • API and developer-tier data terms (different defaults, different opt-outs)
  • Enterprise contracts with custom data-residency clauses (negotiated per-deal)
  • Regional regulatory overlays (EU GDPR, UK Data Protection Act, CCPA — each may grant additional rights beyond the vendor's default terms)
  • Government / regulated sector contracts with separate compliance terms

If your situation involves any of these, the consumer-product page above is the starting point, not the answer.


Data currency note

Vendor terms change without announcement. The cells on this page reflect the published terms on the date stamped at the top. Every quarterly refresh re-confirms each cell against the live vendor documentation. If a tool's terms have shifted between refreshes, the freshest reference is the vendor's own privacy policy, linked from each tool's hub page (/tools/{slug}).

OneHuman does not have advance access to vendor term changes. We learn about changes the way consumers do: on the day they happen. The 15-day pricing-sweep cadence catches most material shifts; changes to standing terms are reviewed at the quarterly refresh.



Independent. AI-assisted. Human-verified. No ads. No affiliates. No investors.