
Domain 5 — Context Management & Reliability

Exam weight: 15%

This domain tests your ability to manage context windows effectively, design reliable escalation patterns, preserve information provenance across multi-agent handoffs, and build resilient production systems.

What this domain tests

| Task Statement | Description |
| --- | --- |
| 5.1 | Apply context window management strategies for long documents |
| 5.2 | Design reliable escalation patterns that avoid self-reported confidence |
| 5.3 | Preserve information provenance across multi-agent handoffs |
| 5.4 | Implement graceful degradation and error resilience |
| 5.5 | Optimize cost with prompt caching |

Attention dilution — the "lost in the middle" problem

Symptom: Agent misses details from the middle of long documents or contexts.

Root cause: Transformer models give less reliable attention to content in the middle of long contexts. This is a property of the architecture, not a context window size limitation.

Critical misconception the exam tests:

❌ Wrong: "Use a model with a 200K context window to process the full document at once"
✅ Right: "Split into focused per-section passes, then run a synthesis pass"

A larger context window does NOT fix attention dilution — it just moves the diluted zone. The fix is always focused passes:

# ❌ Wrong — stuffing 200 pages into one call
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=[{"role": "user", "content": entire_200_page_document}]
)

# ✅ Right — focused section passes
section_summaries = []
for section in split_into_sections(document):
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": f"Analyze this section:\n\n{section}"}]
    )
    # Keep the text, not the raw Message object
    section_summaries.append(response.content[0].text)

# Final integration pass
final_report = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=[{"role": "user",
               "content": "Synthesize these section analyses:\n\n" + "\n\n".join(section_summaries)}]
)
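The `split_into_sections` helper above is left undefined. A minimal sketch, assuming paragraphs separated by blank lines and a character budget per section (the name and splitting rule are illustrative, not a library API):

```python
def split_into_sections(document: str, max_chars: int = 12000) -> list:
    """Split a document on blank lines, packing paragraphs into
    chunks of at most max_chars so each analysis pass stays focused."""
    paragraphs = [p for p in document.split("\n\n") if p.strip()]
    sections, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) > max_chars:
            sections.append(current)
            current = para
        else:
            current = current + "\n\n" + para if current else para
    if current:
        sections.append(current)
    return sections
```

In practice you would split on real section boundaries (headings) when the document has them; fixed-size packing is the fallback.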

Escalation patterns

Why self-reported confidence fails

LLMs are poorly calibrated — they express high confidence on questions they answer incorrectly. This means the cases that most need escalation are exactly the ones the model will most confidently say it can handle.

❌ Wrong escalation signal:
"I'm only 70% confident about this refund policy — escalating to human"

✅ Correct escalation signals (programmatic):
- Required field `policy_tier` not found in get_customer response
- Refund amount > $500 (policy threshold)
- Tool error count > 3 in this session
- Issue category in ["fraud", "legal", "executive"] (hardcoded escalation list)

Escalation architecture

def should_escalate(session_state: dict, extracted: dict) -> bool:
    # Programmatic rules — not Claude's self-assessment
    if session_state['tool_errors'] > 3:
        return True
    if extracted.get('refund_amount', 0) > 500:
        return True
    if not extracted.get('customer_verified', False):
        return True
    if extracted.get('issue_category') in ESCALATION_CATEGORIES:
        return True
    return False

Structured handoff for human escalation

When escalating to a human agent who lacks session access:

{
  "customer_id": "CUS-48291",
  "issue_summary": "Billing dispute — charged twice for March subscription",
  "root_cause": "Duplicate charge identified in order ORD-9912 and ORD-9913",
  "actions_taken": ["Verified customer identity", "Confirmed duplicate charge", "Applied $29.99 credit for ORD-9913"],
  "recommended_action": "Confirm credit applied and send confirmation email",
  "escalation_reason": "Customer requesting formal refund receipt — requires accounting team",
  "session_started": "2026-03-26T14:22:00Z"
}

The handoff must be self-contained — the human should not need to read the conversation to act.
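Self-containment can be enforced programmatically with a completeness check before the ticket is filed. A minimal sketch, with field names taken from the example above (the helper name is an assumption):

```python
REQUIRED_HANDOFF_FIELDS = [
    "customer_id", "issue_summary", "root_cause",
    "actions_taken", "recommended_action", "escalation_reason",
]

def validate_handoff(handoff: dict) -> list:
    """Return the names of required fields that are missing or empty,
    so an incomplete handoff is rejected before it reaches a human."""
    return [f for f in REQUIRED_HANDOFF_FIELDS if not handoff.get(f)]
```

An empty return value means the handoff is actionable without the conversation transcript.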

Information provenance

In multi-agent pipelines, every claim in the final output must be traceable to a source.

Coordinator → subagent context passing (with provenance):

{
  "research_findings": [
    {
      "claim": "Global AI market projected to reach $1.8T by 2030",
      "source_id": "src_001",
      "source_url": "https://...",
      "source_title": "McKinsey AI Report 2026",
      "retrieved_at": "2026-03-26",
      "page": 14
    }
  ]
}

Synthesis schema (with citations):

{
  "sections": [
    {
      "title": "Market Size",
      "content": "...",
      "citation_ids": ["src_001", "src_003"]
    }
  ]
}
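The synthesis output can then be verified mechanically: every entry in `citation_ids` must resolve to a `source_id` in the research findings. A minimal sketch (the function name is an assumption):

```python
def find_dangling_citations(findings: list, sections: list) -> set:
    """Return citation ids referenced by synthesis sections that have
    no matching source_id in the research findings."""
    known = {f["source_id"] for f in findings}
    cited = {cid for s in sections for cid in s.get("citation_ids", [])}
    return cited - known
```

A non-empty result means the synthesis step fabricated or mangled a citation and the output should be rejected.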

Prompt caching

Cache the KV state of repeated prompt prefixes to reduce cost:

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": large_system_prompt,  # 50K tokens shared across all requests
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": user_query}]
)

When caching helps most: large stable prefixes (system prompts, few-shot sets, large reference documents) reused across many requests.

Cache invalidation: any change to the prefix — even a single character — breaks the match and forces full re-processing. Dynamic values such as current dates or session IDs injected into system prompts destroy cache hit rates.
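One way to protect the hit rate is to keep the stable prefix and per-request values strictly separated. A sketch of that structure, assuming a hypothetical `build_prompt` helper (only the unchanging text carries `cache_control`; dynamic values ride in the messages):

```python
def build_prompt(stable_prompt: str, dynamic_context: str, user_query: str) -> dict:
    """Split the prompt so the large stable prefix stays byte-identical
    (and cacheable) while per-request values never touch it."""
    return {
        "system": [{
            "type": "text",
            "text": stable_prompt,  # identical bytes on every request
            "cache_control": {"type": "ephemeral"},
        }],
        "messages": [{
            "role": "user",
            # dates, session ids, and other per-request data go here,
            # after the cached prefix
            "content": f"{dynamic_context}\n\n{user_query}",
        }],
    }
```

The returned dict maps directly onto `client.messages.create(**build_prompt(...), model=..., max_tokens=...)`.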

Resilience patterns

Per-item error isolation

results = []
for doc in documents:
    try:
        result = extract(doc)
        results.append(result)
    except Exception as e:
        # Fail this document without affecting others
        results.append({
            "doc_id": doc['id'],
            "status": "failed",
            "error": str(e),
            "requires_review": True
        })
        # Continue processing — one failure doesn't stop the batch
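Per-item isolation pairs naturally with bounded retries, so transient failures (rate limits, timeouts) get another chance before landing in the failed bucket. A minimal sketch of a retry wrapper (the name and defaults are illustrative):

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Call fn(), retrying with exponential backoff on exceptions.
    Re-raises the final error so the caller's per-item handler can
    still record the failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Inside the loop above, `result = with_retries(lambda: extract(doc))` keeps the isolation behavior while absorbing transient errors.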

Rolling context summaries for long conversations

def compress_history(messages: list, threshold: int = 40) -> list:
    if len(messages) < threshold:
        return messages

    # Summarize everything except the last 20 turns, which stay verbatim
    summary = summarize(messages[:-20])
    return [
        {"role": "user", "content": f"[Conversation summary]\n{summary}"},
        {"role": "assistant", "content": "Understood. Continuing from that context."},
        *messages[-20:]
    ]

Official documentation