FAQ
Frequently asked questions
Clear answers about wallet credit, usage, subscriptions, and how Tycoon charges for work.
How many tokens is an English word?
Roughly 1.3 tokens per word in English, or equivalently about 0.75 words per token. So a 1,000-token document is about 750 words, a 100K-token context is about 75K words (a 300-page book), and a 1M-token context is about 750K words (longer than the entire Lord of the Rings trilogy). Other languages have different ratios: Chinese and Japanese are often 1-2 characters per token, Korean 2-3, and code often runs to roughly 10 tokens per line, depending on line length. Tokenizer differences (for example, a byte-level BPE vocabulary versus a SentencePiece unigram model) also shift the ratio by 10-20%.
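As a rough illustration of the rule of thumb above, here is a minimal Python sketch. The function names are made up for this example, and the 0.75 words-per-token figure is only the English average quoted above; for exact counts, use the tokenizer of the model you are calling.

```python
# Rule-of-thumb conversion between English words and tokens.
# WORDS_PER_TOKEN is the approximate average quoted above, not a measured value.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Estimate the token count of English text from its whitespace word count."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN)


def estimate_words(tokens: int) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


if __name__ == "__main__":
    for size in (1_000, 100_000, 1_000_000):
        print(f"{size:>9,} tokens is about {estimate_words(size):,} words")
```

The estimate is only good for budgeting. It will drift for code, non-English text, and tokenizers with unusual vocabularies.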
If the context window is 1M tokens, should I always use it?
No, for three reasons. (1) Cost: input is billed per token, so a 1M-token query costs about 500x a 2K-token query. (2) Latency: processing 1M tokens takes 10-30 seconds versus sub-second for short queries. (3) Quality: the "lost in the middle" problem means models don't attend equally to all parts of a long context, and details buried mid-context are recalled less reliably. For production systems, use RAG to retrieve the 5-20K relevant tokens instead of stuffing everything in. Reserve long context for ad-hoc exploration, prototyping, and tasks where the entire document genuinely needs to be reasoned over as a whole (e.g., full codebase refactors).
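To make the "retrieve only the relevant tokens" idea concrete, here is a minimal sketch of packing retrieved chunks into a fixed token budget. It assumes a retriever has already given each chunk a relevance score and a token count. The `Chunk` shape and the `select_chunks` name are hypothetical, not part of any particular RAG library or of Tycoon.

```python
from typing import List, Tuple

# Hypothetical chunk shape: (relevance score, token count, text).
Chunk = Tuple[float, int, str]


def select_chunks(chunks: List[Chunk], budget_tokens: int = 20_000) -> List[str]:
    """Greedily keep the highest-scoring chunks that still fit in the budget."""
    selected: List[str] = []
    remaining = budget_tokens
    # Visit chunks from most to least relevant.
    for _score, tokens, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        if tokens <= remaining:
            selected.append(text)
            remaining -= tokens
    return selected


if __name__ == "__main__":
    retrieved = [
        (0.9, 12_000, "design doc"),
        (0.8, 10_000, "meeting notes"),
        (0.5, 6_000, "changelog"),
    ]
    print(select_chunks(retrieved))
```

Greedy packing is a simplification. Real systems often also deduplicate chunks and reorder them before building the prompt.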
What happens when I exceed the context window?
The API returns an error. There's no silent truncation by default, so you have to manage context explicitly as you approach the limit. Common strategies: sliding window (drop the oldest messages), summarization (replace old conversation with a summary), RAG (retrieve only relevant history), or hierarchical memory (keep recent messages verbatim, summarize older ones). Most modern agent frameworks implement one or more of these; Tycoon uses a hybrid of RAG-based project memory plus recent conversation history kept verbatim.
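The sliding-window strategy is the simplest of these to sketch. The example below assumes chat messages are dicts with `role` and `content` keys and uses a crude word-based token estimate. Both are illustrative assumptions, and this is not how Tycoon or any specific framework implements it.

```python
from typing import Dict, List

Message = Dict[str, str]


def count_tokens(text: str) -> int:
    """Crude estimate (about 0.75 words per token); swap in a real tokenizer."""
    return round(len(text.split()) / 0.75)


def sliding_window(messages: List[Message], max_tokens: int) -> List[Message]:
    """Keep a leading system message plus the newest messages that fit."""
    has_system = bool(messages) and messages[0]["role"] == "system"
    system = messages[:1] if has_system else []
    history = messages[len(system):]

    budget = max_tokens - sum(count_tokens(m["content"]) for m in system)
    kept: List[Message] = []
    # Walk from newest to oldest and stop at the first message that does not fit.
    for msg in reversed(history):
        cost = count_tokens(msg["content"])
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    # Restore chronological order before returning.
    return system + kept[::-1]
```

Dropping messages loses information outright, which is why it is often combined with summarization or retrieval for older history.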
Does a longer context window make the model smarter?
Not inherently. It lets the model see more at once but doesn't improve reasoning ability. A 1M-token Gemini 2.5 Pro reasoning over a small problem isn't smarter than a 200K-token Claude Sonnet 4.5 reasoning over the same problem. What long context enables is tasks that previously required RAG or manual chunking: reading whole documents, comparing across many files, maintaining very long conversations. For short-context tasks, the practical differences between models are about reasoning capability and speed, not window size.
What is prompt caching and how does it interact with context windows?
Prompt caching lets you designate a prefix (system prompt, large document, tool definitions) as cacheable. The provider keeps it in memory for a few minutes, and subsequent requests that reuse that prefix pay 10-25% of the normal input cost for those tokens and skip most of the processing latency. This changes long-context economics dramatically: a 100K-token document that costs $0.30 fresh costs $0.03 cached, so you can afford to keep rich context always loaded. Anthropic, OpenAI, and Google all support it; Tycoon relies on it heavily to keep AI employees' system prompts and project memory always available without paying the full input price on every turn.
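The arithmetic behind that example can be sketched as follows. The $3.00 per million input tokens and the 10% cached rate are assumptions chosen to reproduce the $0.30 and $0.03 figures above. Actual prices vary by provider and model, and this sketch ignores any extra charge a provider may apply when the cache is first written.

```python
def turn_input_cost_usd(
    prefix_tokens: int,
    new_tokens: int,
    usd_per_million: float = 3.00,  # assumed list price, for illustration only
    cached_fraction: float = 0.10,  # assumed: cached tokens billed at 10%
    cached: bool = False,
) -> float:
    """Input cost of one request: a reusable prefix plus fresh tokens."""
    rate = usd_per_million / 1_000_000
    prefix_rate = rate * cached_fraction if cached else rate
    return prefix_tokens * prefix_rate + new_tokens * rate


if __name__ == "__main__":
    fresh = turn_input_cost_usd(100_000, 0)
    hit = turn_input_cost_usd(100_000, 0, cached=True)
    print(f"fresh: ${fresh:.2f}, cached: ${hit:.2f}")
```

Over a long conversation the saving compounds, because every turn after the first reuses the same prefix.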