The basic unit AI models use to process text — roughly corresponding to word parts, common words, or character sequences.
Tokens are the chunks of text that AI models read and produce. They're not quite words: common words like 'the' are one token, but longer words get broken into pieces. Roughly, 1 token ≈ 4 characters or ¾ of a word in English. When you see AI pricing like '$10 per 1M tokens' or context limits like '128K tokens', those figures count these chunks.
Think of tokens like LEGO blocks. Common words like 'the' or 'apple' are single pre-made blocks. Longer or less common words get built from smaller pieces. The word 'tokenization' might be three tokens: 'token' + 'iz' + 'ation'. AI models work with these blocks, not whole words, which gives them flexibility to handle new or rare terms.
Tokens are produced by tokenizers: algorithms that split text into units a model can process. Most modern LLMs use subword tokenization such as Byte-Pair Encoding (BPE) or SentencePiece, which balances vocabulary size against the ability to represent rare words. A tokenizer learns common character sequences from its training data: frequent patterns become single tokens, while uncommon words are broken into multiple pieces. Different models ship different tokenizers, so the same text can produce different token counts across models. Non-English languages often require more tokens per character because they are less represented in the tokenizer's training data.
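The BPE idea described above can be sketched in a few lines. This is a toy illustration, not any model's real tokenizer: it "trains" on a single string and repeatedly merges the most frequent adjacent pair of symbols, which is how frequent patterns become single tokens.

```python
# Toy sketch of Byte-Pair Encoding (BPE): repeatedly merge the most
# frequent adjacent pair of symbols until no pair repeats, or the
# merge budget runs out. Real tokenizers train on huge corpora.
from collections import Counter

def bpe_train(word, num_merges):
    """Learn merge rules from a single 'corpus' string (illustration only)."""
    symbols = list(word)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing repeats; stop merging
        merges.append((a, b))
        # Apply the new merge rule left-to-right across the sequence.
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols, merges

symbols, merges = bpe_train("aaabdaaabac", 3)
print(symbols)  # ['aaab', 'd', 'aaab', 'a', 'c']
print(merges)   # [('a', 'a'), ('aa', 'a'), ('aaa', 'b')]
```

After three merges, the frequent chunk 'aaab' has become a single symbol, exactly the way 'the' or 'apple' end up as single tokens in a real vocabulary.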
'Hello world!' = 3 tokens. A typical paragraph of 100 words ≈ 130 tokens.
'128K tokens' = roughly 300 pages of a book. GPT-4o has a 128K context window; some Claude models support up to 1M.
'$2.50 per 1M input tokens' = you pay $2.50 for about 750,000 words of input. Most conversations cost pennies.
Languages like Chinese, Japanese, and Korean use more tokens per character, making them more expensive to process.
OpenAI provides a free tokenizer tool at platform.openai.com/tokenizer, and Anthropic offers similar tooling for Claude. Programmatically, libraries like tiktoken (Python) count tokens for OpenAI models. For a rough estimate: 1 token ≈ 4 English characters ≈ 0.75 words, so a 2,000-word essay is approximately 2,700 tokens.
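The rough estimation rules above (1 token ≈ 4 characters ≈ 0.75 words) can be wrapped in two small helpers. These are heuristics for quick budgeting only; real counts must come from the model's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough English-only estimate: 1 token ~= 4 characters.
    For exact counts, use the model's tokenizer (e.g. tiktoken)."""
    return max(1, round(len(text) / 4))

def estimate_tokens_by_words(word_count: int) -> int:
    """Alternative heuristic: 1 word ~= 4/3 tokens (0.75 words per token)."""
    return round(word_count / 0.75)

print(estimate_tokens("Hello world!"))      # 3
print(estimate_tokens_by_words(2000))       # 2667 (~2,700, as in the text)
```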
LLM APIs charge per token, usually with separate rates for input and output. Costs can add up with long prompts or conversations. For example, a conversation using 10K input tokens + 2K output tokens per message at $5/1M input, $15/1M output = $0.08/message. Multiply across users and it becomes significant.
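The per-message arithmetic above is simple enough to capture in a helper. The default rates are the hypothetical $5/1M input and $15/1M output figures from the example, not any provider's actual pricing.

```python
def message_cost(input_tokens: int, output_tokens: int,
                 input_rate_per_m: float = 5.00,
                 output_rate_per_m: float = 15.00) -> float:
    """Dollar cost of one message at per-million-token rates.
    Default rates are the hypothetical figures from the example above."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000

# 10K input + 2K output per message, as in the example:
print(round(message_cost(10_000, 2_000), 2))  # 0.08
```

At $0.08 per message, 1,000 users sending 10 messages a day is $800/day, which is why token budgeting matters at scale.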
Each model has its own tokenizer, trained on different data. A tokenizer that saw more technical text will represent technical terms in fewer tokens. This is why the same prompt can cost different amounts across providers: not only does per-token pricing differ, but the token count itself can vary by 10-20% between models.
The maximum amount of text an AI model can consider at once — including your prompt, conversation history, and the response being generated.
🧩A way of representing text (or other data) as lists of numbers that capture meaning, enabling similarity search and semantic operations.
📚A neural network trained on massive text data to understand and generate human-like language.
✍️The skill of writing instructions to AI models to get the best possible output.
Our free AI course teaches you to use these ideas in real projects.
Start Free AI Course →