Claude Code Context Window: Why 200k Tokens Isn't the Problem


Auto Compact triggers at roughly 95% context utilization, not when you hit the ceiling. That single behavior explains why so many Claude Code sessions feel broken well before the 200k token limit becomes a factor. The window is not the bottleneck. The way tokens get spent in the first thirty minutes usually is.


The core argument here is architectural, not tactical: most context degradation in Claude Code sessions happens because of what gets fed in, not because of how large the model's window is. You can have 200,000 tokens of available space and still watch a session lose coherence after a single large file read, a noisy tool call chain, or a CLAUDE.md that tries to do too much at once. The degradation pattern is predictable once you see it. The fix is almost never asking for a bigger window.


Where the Tokens Actually Go


[Figure: Token budget breakdown in a typical Claude Code session. Panels: redundant file re-reads, error output and stack traces, tool call payloads, and actual useful content.]



Run a mid-size Python project through a Claude Code session and watch what happens to context allocation in practice. The model reads a file, then re-reads it when a tool call returns an error, then re-reads a slice of it again when you ask a follow-up. By the time you have done three rounds of iteration on a single module, that file's content has been embedded in the context multiple times over, not because Claude is doing something wrong, but because that is how stateless tool calls work inside a session.
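
To make that accumulation concrete, here is a minimal sketch that tallies how much of a session's read traffic is repeated file content. The `reads` log format, the helper names, and the four-characters-per-token ratio are all assumptions for illustration; Claude Code does not expose a log in this shape, and real tokenizers will give different counts.

```python
def estimate_tokens(text: str) -> int:
    """Very rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def redundant_read_share(reads: list[tuple[str, str]]) -> float:
    """Return the fraction of read tokens spent on content already seen.

    `reads` is a hypothetical log of (path, content) pairs, one per file read.
    """
    seen: set[tuple[str, str]] = set()
    total = 0
    repeated = 0
    for path, content in reads:
        cost = estimate_tokens(content)
        total += cost
        if (path, content) in seen:
            repeated += cost  # identical content was already in context
        else:
            seen.add((path, content))
    return repeated / total if total else 0.0


# One module read three times, plus a single read of a small helper file.
module = "def handler():\n    return 42\n" * 20
reads = [("src/app.py", module)] * 3 + [("src/util.py", "x = 1\n" * 10)]
share = redundant_read_share(reads)
print(f"{share:.0%} of read tokens were repeats")
```

The sketch only counts exact repeats, so partial re-reads of a slice would go undetected; it is meant to show the shape of the problem, not to measure a real session.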


The other large consumer is error output. A stack trace from a failing pytest run can easily run 200-400 lines. If you are letting raw stderr dump into the context repeatedly across retries, you are spending tokens on noise that does not help the model and does not help you. The model does not need the full traceback on iteration three. It needed the first frame and the exception type on iteration one.
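
One way to act on this is to trim error output before it goes back into the conversation. The sketch below is a hypothetical helper, not a Claude Code feature. It assumes a plain CPython traceback (pytest's own failure report is formatted differently) and keeps the innermost frame, which is a judgment call about which frame is most useful.

```python
def summarize_traceback(stderr: str) -> str:
    """Reduce a CPython traceback to its innermost frame and exception line."""
    lines = [ln for ln in stderr.splitlines() if ln.strip()]
    if not lines:
        return ""
    exception_line = lines[-1].strip()
    # Frame headers in CPython tracebacks look like:  File "path", line N, in func
    frames = [ln.strip() for ln in lines if ln.lstrip().startswith('File "')]
    if not frames:
        return exception_line
    return f"{frames[-1]}\n{exception_line}"


# Illustrative traceback; the paths and function names are made up.
raw = '''Traceback (most recent call last):
  File "tests/test_auth.py", line 12, in test_login
    resp = client.post("/login", json=payload)
  File "src/auth/middleware.py", line 48, in authenticate
    token = decode(header)
KeyError: 'Authorization'
'''
summary = summarize_traceback(raw)
print(summary)
```

Two lines instead of several hundred carry the exception type and the location, which is usually what a retry needs.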


Tool call overhead compounds this. Each MCP tool invocation that returns a large payload, each bash call that echoes a full directory tree, each glob pattern that returns 80 file paths when you wanted 4: all of it lands in context and stays there. There is no selective eviction. Once something is in the window, it counts until Auto Compact runs or you start a new session.
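
Because nothing is evicted once it lands, the practical lever is limiting what enters in the first place. Below is a minimal sketch of a hypothetical `cap_output` wrapper that truncates oversized tool output and says how much was dropped. The function name and the line limit are assumptions for illustration, not part of Claude Code or MCP.

```python
def cap_output(text: str, max_lines: int = 20) -> str:
    """Keep at most `max_lines` lines and append a note about what was cut."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    omitted = len(lines) - max_lines
    return "\n".join(lines[:max_lines] + [f"... ({omitted} more lines omitted)"])


# Simulate a glob that returns 80 paths when only a handful were wanted.
paths = "\n".join(f"src/module_{i}.py" for i in range(80))
capped = cap_output(paths, max_lines=4)
print(capped)
```

Blind truncation can hide the one line that mattered, so narrowing the query itself is usually the better fix; a cap like this is only a backstop.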


The practical result is that a 200k token window fills with content that is 40-60% redundant long before you have actually worked through a complex task. That is not a model limitation. That is a workflow problem.


What Auto Compact Does to Session Precision


[Figure: Auto Compact at roughly 95% utilization. The model-generated summary preserves high-level intent, general task context, and the architecture overview; it loses exact function signatures, pinned library versions, rejected approaches, and specific variable names.]



Auto Compact, when enabled, summarizes the conversation history once utilization approaches the limit. The summary is generated by the model itself, which means it inherits the same strengths and weaknesses the model has when summarizing anything. Specific code details compress poorly. High-level intent compresses well. After a compaction event, Claude will remember that you were building a FastAPI authentication layer, but it may lose the specific signature of the middleware function you settled on two exchanges ago.
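
The arithmetic behind the trigger point is simple enough to sketch. The snippet assumes the 200k window and the roughly 95% threshold quoted in this article; neither value is read from Claude Code itself, and the actual trigger may differ between versions.

```python
CONTEXT_WINDOW = 200_000      # window size cited in this article
COMPACT_THRESHOLD_PCT = 95    # approximate trigger point cited in this article


def tokens_until_compact(used: int,
                         window: int = CONTEXT_WINDOW,
                         threshold_pct: int = COMPACT_THRESHOLD_PCT) -> int:
    """Return how many tokens remain before the assumed compaction trigger."""
    trigger = window * threshold_pct // 100  # integer math avoids float rounding
    return max(0, trigger - used)


headroom = tokens_until_compact(150_000)
print(f"{headroom:,} tokens of headroom before compaction")
```

Under these assumptions the trigger sits at 190,000 tokens, so the last 10,000 tokens of the window are never available to the raw conversation.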


This is where sessions feel like they degrade. The model is not broken. It is working from a summary instead of the raw exchange, and summaries drop precision. If your session was already dense with implementation specifics (exact variable names, the particular version of a library you pinned because of a known bug, the three approaches you explicitly rejected), compaction is going to sand those edges off.


There is a version of this that is completely fine. If you are doing exploratory work, debugging a general pattern, or reviewing architecture, compaction usually does not hurt you. The problem shows up in sessions where you are deep in a specific implementation and precision matters. After compaction, you will often see Claude suggest an approach you already tried and discarded. That is the summary dropping a rejection decision.


[Figure: Key facts. The 200k token window is not the bottleneck; Auto Compact fires at roughly 95% utilization, and its summaries drop precision, not just volume; degradation typically sets in within about 30 minutes and is fixed by scoping inputs, not by a larger window.]


```bash
# This kind of command is a context killer in a tight session
find . -name "*.py" | xargs grep -r "import" --include="*.py"

# Scoped version burns a fraction of the tokens for the same useful result
grep -r "from auth" src/api/ --include="*.py" -l
```

The difference between those two bash calls is not just performance. It is whether you are dumping 800 lines of import statements into context or getting back 6 filenames. In a long session, that gap is the difference between hitting Auto Compact at hour one versus hour three.


The CLAUDE.md Token Tax


CLAUDE.md is loaded at session start and sits in context for the entire session. A verbose CLAUDE.md file is not just bad style; it is a fixed token tax on every exchange. A 400-line CLAUDE.md with project history, coding conventions, team preferences, dependency notes, and a glossary section will cost somewhere in the range of 2,000-4,000 tokens depending on how it is written, and that cost is non-negotiable. It does not compress. It does not get evicted. It stays.


The pattern that actually works is treating CLAUDE.md as a routing document rather than a knowledge base. Keep it short enough to read in ninety seconds. Use it to point Claude toward the files and directories where the actual detail lives, rather than duplicating that detail in the CLAUDE.md itself. A line that says "authentication logic lives in src/auth/, with session handling decisions documented in middleware.py" is more token-efficient and more durable than pasting the relevant code directly into the configuration file.


```markdown
# Project: Meridian API

## Architecture

- FastAPI 0.115.x, Python 3.12, PostgreSQL 16
- Auth: JWT via src/auth/middleware.py (see inline comments for session decisions)
- Background jobs: Celery 5.x, config in celery_config.py

## Active Constraints

- Do not modify alembic migrations directly; generate via make migration
- Pin httpx at 0.27.x until upstream resolves redirect handling issue

## What to Read First

- src/api/routes/ for endpoint structure
- tests/integration/ for expected behavior contracts
```

That CLAUDE.md is under 120 words. It costs roughly 300 tokens. It tells Claude where to look without reproducing what it would find there. Compare that to a CLAUDE.md that pastes in three architecture diagrams described in prose, a full list of all environment variables, and a changelog going back six months. The latter is not more helpful. It is more expensive, and it crowds out context that would be better used for the actual work.


There is a related failure mode in team environments. CLAUDE.md files grow by accumulation. Someone adds a paragraph in January, someone adds a note about a new linting rule in March, and by July the file is 600 lines with nobody certain which sections are still accurate. The tokens do not care whether the content is current. They get spent regardless.
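
One way to keep that accumulation visible is a small budget check that a team could run locally or in CI. The sketch below is hypothetical: the line and token budgets are arbitrary placeholders, and the four-characters-per-token estimate is a rough heuristic rather than the tokenizer Claude actually uses.

```python
from pathlib import Path

MAX_LINES = 60     # hypothetical team budget
MAX_TOKENS = 500   # hypothetical team budget


def check_budget(text: str,
                 max_lines: int = MAX_LINES,
                 max_tokens: int = MAX_TOKENS) -> list[str]:
    """Return a list of budget violations for the given CLAUDE.md text."""
    problems = []
    line_count = len(text.splitlines())
    token_estimate = len(text) // 4  # rough heuristic, not a real tokenizer
    if line_count > max_lines:
        problems.append(f"{line_count} lines exceeds budget of {max_lines}")
    if token_estimate > max_tokens:
        problems.append(f"~{token_estimate} tokens exceeds budget of {max_tokens}")
    return problems


if __name__ == "__main__":
    path = Path("CLAUDE.md")
    if path.exists():
        for problem in check_budget(path.read_text(encoding="utf-8")):
            print(f"CLAUDE.md: {problem}")
```

A check like this does not tell anyone which sections are stale, but it forces the conversation about what to cut before the file reaches 600 lines.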


Session Scope as the Real Control Variable


The most durable fix is not configuration tuning. It is treating sessions as scoped units of work rather than persistent workspaces. A session that starts with a clear, narrow goal burns context more slowly than one that meanders through three different problem areas. That sounds obvious until you are actually working, and the thing you are debugging naturally bleeds into a related issue, which leads you to open a second set of files, which surfaces a third question, and suddenly your context is split across four different problem domains with none of them well-served.


The practical version of this is starting a new session when the problem changes, not when the old one breaks. It costs almost nothing to start fresh and re-establish context for a new task. It costs considerably more to spend the back half of a long session working with a model that is summarizing heavily and losing the specific implementation decisions that made the first half of the session worth running.


```python
# Scoping your initial context load for a session
# Instead of asking Claude to read the whole repo, target specific files

# In CLAUDE.md or as an explicit session opener:
#   "Read src/auth/middleware.py and tests/test_auth.py before we start.
#    Do not read anything else unless I ask."

# Equivalent manual approach: reference only what the current task needs
relevant_files = [
    "src/auth/middleware.py",
    "src/auth/models.py",
    "tests/integration/test_auth_flow.py",
]

# Not: every file under src/ because they might be tangentially relevant
```


The instinct to pre-load everything so Claude has full context is understandable. It is also usually wrong. A model working from three highly relevant files will outperform a model working from thirty loosely relevant ones. The noise-to-signal ratio in the context window matters more than raw coverage, and that ratio degrades fast once you start loading files speculatively.


One pattern worth watching: if you are three exchanges into a session and you find yourself prepending corrections along the lines of "this is still about the same issue we were just discussing," the session's internal coherence is already slipping. That is a signal, not a minor annoyance. Starting fresh at that point with a cleaner scoped prompt tends to recover more ground than pushing through with corrections.


The 200k token ceiling is genuinely large, large enough that most working sessions never need to care about the number itself. What they do need to care about is the shape of what fills that space, because a context window stuffed with redundant tool output, a heavy CLAUDE.md, and three rounds of full file re-reads will degrade just as reliably at 80k tokens as one that is actually pressed against the limit. The ceiling is not where the problem lives.