Reference · Updated Quarterly
AI Tool Data Access Comparison 2026
Last updated: May 24, 2026 · Next update: August 24, 2026
What this page is
A side-by-side comparison of what each major consumer AI tool reads, retains, and trains on. Eight tools tracked: ChatGPT, Claude, Gemini, Copilot, Perplexity, Grok, Codex, Cursor. Almost no major tech publication answers this question in one place, partly because the answers are buried in dense vendor legal documents, partly because they change without notice, and partly because most AI coverage focuses on product features rather than on the data trade those features are built on.
If you arrived here from a search engine asking "is ChatGPT safe for work email," "does Claude train on my data," "what does Gemini Spark read," or "AI tool privacy comparison," this page is built to answer that question.
The data trade is the price of the free tier. On paid tiers, the data trade narrows but does not disappear. Read carefully before deciding what to paste, upload, or grant access to.
The data-access taxonomy
To compare tools fairly, this page measures five dimensions for each tool and tier:
- What data the tool reads at the moment you use it. The conversation, the file you uploaded, the URL you pasted, the calendar event, the inbox message. Surface-level inputs.
- What data the tool retains after the session ends. Conversation history, uploaded files, embeddings of your inputs.
- Whether retained data is used to train the underlying model. Training-use is a different category from retention. A tool can retain conversations for product improvement without using them for model training.
- Whether an opt-out from training-use exists, and what it covers. Opt-outs vary in scope. Some are global (they apply to all inputs). Some are conversational (they apply only to chats marked private). Some are tier-conditional (no opt-out on the free tier, an opt-out on paid tiers).
- What standing access the tool maintains when not actively in use. Background processes, scheduled checks, persistent connections to your inbox/calendar/files. This is the dimension Gemini Spark introduced into the consumer market in 2026.
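The five dimensions can be read as one record per tool tier. The Python sketch below is a hypothetical model, not any vendor's data format: the `DataAccessProfile` class, its field names, and the `retained_but_not_trained` helper are illustrative assumptions, and the two example rows are transcribed from the comparison table on this page. It shows why retention and training-use are kept as separate fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAccessProfile:
    # Hypothetical record: one tool tier scored on this page's five dimensions.
    tool_tier: str
    reads: str                    # 1. inputs read at the moment of use
    retains_after_session: bool   # 2. is anything kept once the session ends?
    trains_by_default: bool       # 3. is retained data used for training by default?
    training_opt_out: str         # 4. what opt-out exists, and what it covers
    standing_access: str          # 5. access maintained while the tool is not in use

    def retained_but_not_trained(self) -> bool:
        # Retention and training-use are separate dimensions: a tier can keep
        # your data without feeding it into model training.
        return self.retains_after_session and not self.trains_by_default


# Two rows transcribed from the comparison table on this page.
chatgpt_free = DataAccessProfile(
    tool_tier="ChatGPT Free",
    reads="Conversation, uploaded files, web browsing context",
    retains_after_session=True,
    trains_by_default=True,
    training_opt_out='Settings → Data Controls → "Improve the model for everyone" off',
    standing_access="None",
)
chatgpt_team = DataAccessProfile(
    tool_tier="ChatGPT Team",
    reads="Same as Plus",
    retains_after_session=True,
    trains_by_default=False,
    training_opt_out="Default-off, no toggle needed",
    standing_access="None",
)

print(chatgpt_free.retained_but_not_trained())  # False
print(chatgpt_team.retained_but_not_trained())  # True
```

Keeping `retains_after_session` and `trains_by_default` as separate booleans is the point of the design: collapsing them into a single "private / not private" flag would hide exactly the tiers that store your data without training on it.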
The comparison table
| Tool (tier) | Reads | Retains | Trains on it | Opt-out available | Standing access |
|---|---|---|---|---|---|
| ChatGPT Free | Conversation, uploaded files, web browsing context | Conversations stored by default | Yes, by default | Yes — Settings → Data Controls → "Improve the model for everyone" off | None |
| ChatGPT Plus / Pro | Same as free + larger uploads | Same | Yes, by default (same as free) | Yes, same toggle as free tier | None |
| ChatGPT Team | Same as Plus | Same | No — training-use disabled by default for Team accounts | Default-off, no toggle needed | None |
| Claude Free | Conversation, uploaded files, web search context | Conversations stored by default | [VERIFY FROM VENDOR TERMS] | [VERIFY FROM VENDOR TERMS] | None |
| Claude Pro / Max | Same as free + larger limits | Same | Same posture as free | Same opt-out | None |
| Claude Team | Same as Pro | Same | No — training-use disabled by default for Team accounts | Default-off | None |
| Gemini Free | Conversation, basic file uploads, Google account context | Stored under standard Google data terms | Yes under standard Google terms unless opted out at myactivity.google.com | Yes — Google Account → Data & privacy → Gemini Apps Activity | Google account-wide signals present by default; Gemini Free does not reach into Gmail / Calendar |
| Gemini → Google AI | Same as free + Daily Brief reads Gmail and Calendar | Same | Same as free | Same opt-out path as free | Daily Brief reads Gmail and Calendar on a schedule when enabled |
| Gemini → Google AI Ultra (Spark) | Same as Google AI + Gemini Spark reads Gmail continuously when enabled | Same | Same as free | Same path as free; covers training only and does not disable inbox reading, which is Spark's core feature | Spark maintains 24/7 cloud-process access to the inbox while enabled, even when the phone is locked |
| Copilot Free | Conversation, web search context | [VERIFY FROM VENDOR TERMS] | [VERIFY FROM VENDOR TERMS] | [VERIFY FROM VENDOR TERMS] | None |
| Copilot Pro | Same as free | Same | Same | Same | None |
| Microsoft 365 Copilot | Reads tenant Office 365 content (mail, files, calendar, Teams chats) within Microsoft Graph permissions | Per tenant data-residency policy | No (Microsoft commits to not training foundation models on tenant content) | Default-off for training; tenant-controlled retention | Active access to M365 content while user is authenticated |
| Perplexity Free | Query, web search context | [VERIFY FROM VENDOR TERMS] | [VERIFY FROM VENDOR TERMS] | [VERIFY FROM VENDOR TERMS] | None |
| Perplexity Pro | Same as free + file uploads | Same | Same | Same | None |
| Grok Free / Premium / Premium+ | Conversation + X profile data, posts, and engagement history by default | Stored under X / xAI data terms | Yes by default under xAI terms | [VERIFY FROM VENDOR TERMS] | None — but X profile data is read on each session |
| Codex Free | Code submitted, file context provided | Stored by default | Yes by default | Same opt-out as ChatGPT Plus | None |
| Codex on ChatGPT Plus | Same | Same | Same | Same | None |
| Cursor Free | Code files in the editor, prompts | [VERIFY FROM VENDOR TERMS] | [VERIFY FROM VENDOR TERMS] | Privacy Mode setting controls training-use | None |
| Cursor Pro / Business | Same | Same | No when Privacy Mode is enabled | Privacy Mode toggle | None |
What "training opt-out" actually covers
The single most misunderstood line item in AI tool terms is what a training-use opt-out actually does. Three patterns are common:
Pattern 1: Training opt-out covers training only, not retention. ChatGPT, Claude, Gemini, and Codex all work this way. Turning off "use my data to improve the model" stops your inputs from being added to future model training runs. It does not stop them from being stored on vendor servers, processed for safety and abuse detection, or used for product analytics. Your data is still there; it is just not in the next model's training corpus.
Pattern 2: Training opt-out is the only data control offered. Most free tiers fit here. If you want stronger data controls, the answer is moving to a Team or Enterprise plan, which defaults to no training (individual paid tiers such as ChatGPT Plus keep the free-tier default), or not using the tool for sensitive work.
Pattern 3: No training-use happens regardless of toggle. ChatGPT Team, Claude Team, and Microsoft 365 Copilot fit here, as does Cursor once Privacy Mode is enabled. The vendor's commercial commitment is that tenant or team content does not enter training. The toggle is irrelevant because the answer is already no.
The watchdog read: if your work data is sensitive, the right answer is not "I will enable the opt-out." The right answer is "I will use a tier where no-training is the default."
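The watchdog read separates two questions that are easy to conflate: does a tier train on new inputs, and is it fit for sensitive work? In this minimal Python sketch, `TrainingPosture`, `Tier`, `trains_on_inputs`, and `fit_for_sensitive_work` are hypothetical names, and the rule encoded is this page's recommendation, not any vendor's policy. Patterns 1 and 2 both map to training that is on unless the user opts out; Pattern 3 maps to training that is off by default.

```python
from dataclasses import dataclass
from enum import Enum


class TrainingPosture(Enum):
    # Patterns 1 and 2: inputs enter training unless the user opts out.
    ON_UNLESS_OPTED_OUT = "on unless opted out"
    # Pattern 3: the vendor commits to no training for this tier.
    OFF_BY_DEFAULT = "off by default"


@dataclass(frozen=True)
class Tier:
    # Hypothetical record for illustration only.
    name: str
    posture: TrainingPosture
    user_opted_out: bool = False


def trains_on_inputs(tier: Tier) -> bool:
    """Whether new inputs may enter future training runs."""
    if tier.posture is TrainingPosture.OFF_BY_DEFAULT:
        return False  # Pattern 3: the toggle is irrelevant, the answer is already no
    return not tier.user_opted_out


def fit_for_sensitive_work(tier: Tier) -> bool:
    """This page's rule: require a no-training default, not an enabled opt-out.

    An opted-out Pattern 1 tier stops training-use, but the data is still
    stored on vendor servers and used for safety review and analytics.
    """
    return tier.posture is TrainingPosture.OFF_BY_DEFAULT


plus_opted_out = Tier("ChatGPT Plus / Pro", TrainingPosture.ON_UNLESS_OPTED_OUT, user_opted_out=True)
team = Tier("ChatGPT Team", TrainingPosture.OFF_BY_DEFAULT)

print(trains_on_inputs(plus_opted_out), fit_for_sensitive_work(plus_opted_out))  # False False
print(trains_on_inputs(team), fit_for_sensitive_work(team))  # False True
```

Both example tiers give the same answer to the first question and different answers to the second: an opted-out individual tier no longer trains on your inputs but still stores them, which is why the sketch passes only tiers whose default is no training.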
Standing access — the dimension Gemini changed
Through 2025, consumer AI tools were largely transactional. You opened them, used them, and closed them. The tool read what you gave it during the session and did nothing once the session ended.
Gemini Spark, launched at Google I/O 2026, changes that. Spark is a 24/7 cloud-based agent with standing access to Gmail. It reads your inbox continuously in the background, not just when you open the Gemini app. You can disable Spark entirely, but you cannot keep Spark and turn off its inbox reading: the inbox reading is the feature.
Microsoft 365 Copilot has had a different form of standing access since 2024: it can read tenant Office content while the user is authenticated. The user stays in control because the user's authentication state gates that access. Spark's posture is different: the cloud agent keeps running even when the phone is locked.
This is a category shift consumers should evaluate explicitly. Standing access is not the same product as a transactional assistant, and the privacy math is different.
Per use case — what to use, what to avoid
Personal, low-stakes use (drafts, summaries, fun, brainstorming, casual queries): any tool's free or paid tier is fine. The data trade is real but the stakes are low.
Knowledge work under standard employment terms (no NDA, no client confidentiality, just professional work): individual paid plans of ChatGPT, Claude, Perplexity, or Cursor, or Copilot Pro. Enable the training opt-out where one is offered. Regardless of tier, avoid pasting full client lists, salary data, or internal financial figures.
NDA work, client confidential, regulated industries (legal, healthcare, finance, defense, anything under contractual confidentiality): use Team-tier plans where no-training is the default, or use Microsoft 365 Copilot if your organization is already on M365 with appropriate data classifications. Do not use free or individual paid tiers for this category.
Code with proprietary intellectual property: Cursor Business with Privacy Mode, ChatGPT Team for Codex, or a self-hosted or local model. Avoid Codex on the free tier (or on ChatGPT Plus, which has the same defaults) for proprietary code: the default data terms include training-use.
Email and calendar integration (Daily Brief, Spark, Microsoft 365 Copilot): evaluate carefully. The productivity value is real, but the standing access is a significant ask. For Spark specifically, the data-governance documentation for "what is stored, where, and how long" was not published at launch; waiting for that documentation before subscribing is a reasonable position.
What this page does not tell you
This page covers the consumer-product side of each tool's data terms. It does not cover:
- API and developer-tier data terms (different defaults, different opt-outs)
- Enterprise contracts with custom data-residency clauses (negotiated per-deal)
- Regional regulatory overlays (EU GDPR, UK Data Protection Act, CCPA — each may grant additional rights beyond the vendor's default terms)
- Government / regulated sector contracts with separate compliance terms
If your situation involves any of these, treat this consumer-product comparison as the starting point, not the answer.
Data currency note
Vendor terms change without announcement. The cells on this page reflect the published terms as of the date stamped at the top. Every quarterly refresh re-confirms each cell against the live vendor documentation. If a tool's terms shift between refreshes, the freshest reference is the vendor's own privacy policy, linked from each tool's hub page (/tools/{slug}).
OneHuman does not have advance access to vendor term changes. We learn about changes the way consumers do, on the day they go live. The 15-day pricing-sweep cadence catches most material shifts; changes to standing terms are reviewed at each quarterly refresh.
Related reading
- Pricing comparison across the same eight tools → /reference/ai-tool-pricing-comparison-2026
- What is an AI consumer protection watchdog → /reference/what-is-an-ai-consumer-protection-watchdog
- Per-tool hub pages with latest verdicts → /tools/
Independent. AI-assisted. Human-verified. No ads. No affiliates. No investors.