
What It Actually Costs to Run AI in Production
We measured a real AI support conversation: 4,363 tokens, $0.002. Here's the full cost breakdown across 7 models.
We run AI chat on hej.chat. Real conversations, real customers, real invoices. So we measured what a typical support interaction actually costs.
One question ("Do you ship to Germany?"), one knowledge base search, one answer. Total: 4,363 tokens. Cost: $0.0024. That's a quarter of a penny.
A human support agent handling the same ticket costs roughly $2. The AI is 830x cheaper. But the model you pick matters, and the token math is not what most people assume. Here's the full breakdown.
What a real conversation looks like
Most cost estimates assume a roughly even split between input and output tokens. That's wrong. Here's what our production data actually shows:
98% input, 2% output. That's not a typo. Here's where those 4,270 input tokens come from:
- System prompt: ~500 tokens (personality, instructions, response guidelines)
- Tool definitions: ~200 tokens (the searchKnowledgeBase function schema)
- Knowledge base results: ~3,400 tokens (crawled page content returned by the tool)
- User message: ~10 tokens ("Do you ship to Germany?")
The output? A clean 93-token answer: shipping costs, delivery times, customs info. Done.
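For a sense of what eats those ~200 tool-definition tokens, here's an illustrative OpenAI-style function schema for a knowledge base search tool. The field names and descriptions are assumptions for illustration, not hej.chat's actual definition:

```python
import json

# Illustrative only: an OpenAI-style function schema for a knowledge base
# search tool. Field names and descriptions are assumptions, not the
# actual hej.chat definition.
search_knowledge_base = {
    "name": "searchKnowledgeBase",
    "description": "Search the site's crawled pages for content "
                   "relevant to the user's question.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Natural-language search query, "
                               "e.g. 'shipping to Germany'",
            }
        },
        "required": ["query"],
    },
}

# The serialized schema is sent as input tokens on every single request,
# which is why it shows up as a fixed line item in the breakdown above.
print(json.dumps(search_knowledge_base, indent=2))
```

Every tool you register adds a similar fixed chunk to the input of every request, so lean schemas pay off at scale.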
This ratio has a huge practical consequence: input pricing matters far more than output pricing for chat applications. At a 46:1 input-to-output ratio, a model with cheap input tokens beats a model with cheap output tokens almost regardless of what the output tokens cost.
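The consequence is easy to verify in a few lines. Using our measured 4,270/93 split, compare two hypothetical models with mirrored pricing (the prices here are made up for illustration):

```python
# Measured token split from our production conversation.
INPUT_TOKENS = 4_270
OUTPUT_TOKENS = 93

def conversation_cost(input_per_m: float, output_per_m: float) -> float:
    """Dollar cost of one conversation, given $/1M-token prices."""
    return (INPUT_TOKENS * input_per_m + OUTPUT_TOKENS * output_per_m) / 1_000_000

# Two made-up models with mirrored pricing:
cheap_input  = conversation_cost(0.50, 3.00)  # cheap input, expensive output
cheap_output = conversation_cost(3.00, 0.50)  # expensive input, cheap output

print(f"cheap-input model:  ${cheap_input:.6f}")   # ~$0.0024
print(f"cheap-output model: ${cheap_output:.6f}")  # ~$0.0129
```

With this token split, the cheap-input model is over 5x cheaper per conversation even though both models average the same price across input and output.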
The pricing landscape
Here are the seven models most relevant for production chat, sorted cheapest to most expensive:
| Model | Provider | Input / 1M tokens | Output / 1M tokens |
|---|---|---|---|
| Gemini 2.0 Flash-Lite | Google | $0.075 | $0.30 |
| GPT-4o mini | OpenAI | $0.15 | $0.60 |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 |
| Gemini 3 Flash | Google | $0.50 | $3.00 |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 |
| GPT-4o | OpenAI | $2.50 | $10.00 |
Google's Gemini Flash lineup dominates the budget end. Flash-Lite 2.0 at 7.5 cents per million input tokens is essentially free. Even Gemini 3 Flash, which is significantly more capable, costs just $0.50 per million input tokens.
What 1,000 conversations cost
Using our real production data (4,270 input, 93 output per conversation):
| Model | Per conversation | Per 1K conversations | Per 10K conversations |
|---|---|---|---|
| Gemini 2.0 Flash-Lite | $0.00035 | $0.35 | $3.50 |
| GPT-4o mini | $0.0007 | $0.70 | $7 |
| Gemini 2.5 Flash | $0.0015 | $1.50 | $15 |
| Gemini 3 Flash | $0.0024 | $2.40 | $24 |
| Claude Haiku 4.5 | $0.0047 | $4.70 | $47 |
| Gemini 2.5 Pro | $0.0063 | $6.30 | $63 |
| GPT-4o | $0.012 | $12 | $120 |
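These per-conversation figures are straightforward to reproduce from the pricing table above (a minimal sketch; prices copied from that table):

```python
# (input $/1M, output $/1M) copied from the pricing table above.
PRICES = {
    "Gemini 2.0 Flash-Lite": (0.075, 0.30),
    "GPT-4o mini":           (0.15,  0.60),
    "Gemini 2.5 Flash":      (0.30,  2.50),
    "Gemini 3 Flash":        (0.50,  3.00),
    "Claude Haiku 4.5":      (1.00,  5.00),
    "Gemini 2.5 Pro":        (1.25, 10.00),
    "GPT-4o":                (2.50, 10.00),
}

# Measured production split: 4,270 input tokens, 93 output tokens.
INPUT_TOKENS, OUTPUT_TOKENS = 4_270, 93

for model, (inp, out) in PRICES.items():
    per_conv = (INPUT_TOKENS * inp + OUTPUT_TOKENS * out) / 1_000_000
    print(f"{model:<24} ${per_conv:.5f}/conv   ${per_conv * 1_000:.2f}/1K")
```

Swap in your own token counts to see how the ranking shifts for your workload; a longer system prompt or bigger retrieval chunks moves every number up roughly in proportion.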
We use Gemini 3 Flash at hej.chat. At $2.40 per thousand conversations, it gives us the newest model quality from Google at a price that's basically invisible on our bill. The jump from Flash 2.5 to Flash 3 is less than a dollar per thousand conversations, but the response quality difference is noticeable.
Here's another way to think about it. What does $100 of API credits actually get you?
Flash-Lite 2.0 gives you 285,000 conversations for $100. Even Gemini 3 Flash handles 41,000+ conversations for that price. GPT-4o, the most expensive model here, still delivers 8,300 conversations. These are not serious expenses for any business with a website.
Scaling to 500K conversations
At low volume, model choice barely matters. At scale, it compounds.
| Volume | Flash-Lite 2.0 | Gemini 3 Flash | Haiku 4.5 | GPT-4o |
|---|---|---|---|---|
| 10K/month | $3.50 | $24 | $47 | $120 |
| 50K/month | $17.50 | $120 | $235 | $600 |
| 100K/month | $35 | $240 | $470 | $1,200 |
| 500K/month | $175 | $1,200 | $2,350 | $6,000 |
At 100K conversations per month on Gemini 3 Flash, you're spending $240/month. For context, a single full-time support agent costs $3,000-5,000/month and handles maybe 2,000 conversations. The AI handling 100K conversations at $240 is doing the work of 50 agents at 0.15% of the cost.
Even at 500K monthly conversations (a high-traffic SaaS product), Flash-Lite keeps you under $200/month and Gemini 3 Flash costs $1,200. These are rounding errors for any company generating that kind of traffic.
A note on speed
With only 93 output tokens per response, model speed barely matters for this use case. Even the slowest model on our list generates 93 tokens in about 1.5 seconds. The fastest do it in under half a second. Both feel instant in a chat widget.
What does matter is time to first token (TTFT), the delay before the response starts streaming. Google's Flash models and GPT-4o mini lead here with sub-500ms TTFT. That means the user sees text appearing almost immediately after sending their message.
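The perceived wait decomposes into TTFT plus generation time. A quick sketch (the throughput figures here are illustrative assumptions, not measured benchmarks):

```python
def perceived_wait(ttft_s: float, tokens_per_s: float, output_tokens: int = 93) -> float:
    """Rough perceived latency: time to first token plus generation time."""
    return ttft_s + output_tokens / tokens_per_s

# Illustrative numbers: a fast model vs. a slower one on our 93-token answer.
fast = perceived_wait(ttft_s=0.4, tokens_per_s=200)  # ~0.9 s total
slow = perceived_wait(ttft_s=0.8, tokens_per_s=60)   # ~2.4 s total
print(f"fast: {fast:.2f}s   slow: {slow:.2f}s")
```

With streaming, the user starts reading at TTFT, so a sub-500ms first token matters more to perceived responsiveness than total generation time.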
For a chat application, any model in the Flash or mini tier is fast enough that users won't notice a difference.
Which model should you pick
| Use case | Recommended | Why |
|---|---|---|
| High-volume FAQ bot | Gemini 2.0 Flash-Lite | Cheapest at any scale, handles simple Q&A well |
| General website support | Gemini 3 Flash | Best quality/cost balance, newest Google model |
| E-commerce product chat | Gemini 2.5 Flash or 3 Flash | Good with product details and tool use |
| Complex multi-turn support | Gemini 2.5 Pro or GPT-4o | Stronger reasoning for nuanced conversations |
| Enterprise / compliance-heavy | Claude Haiku 4.5 or Gemini 2.5 Pro | Anthropic's safety profile or Google's enterprise support |
For most businesses adding AI chat to their website, Gemini 3 Flash is the sweet spot. It's fast, cheap, capable, and handles tool use (like searching a knowledge base) reliably. That's why we use it at hej.chat.
If cost truly is a constraint, Flash-Lite 2.0 at $3.50 per 10K conversations is hard to argue with. If you need premium reasoning, Gemini 2.5 Pro or GPT-4o are there, and even they cost less than a single team lunch per thousand conversations.
The era where AI was expensive is over. A real support conversation costs a quarter of a penny. The question isn't whether you can afford to run AI. It's what's stopping you.