Cache read pricing is now live on Lilac
By Lucas Ewing
TL;DR
Lilac now publishes cache read pricing for models that support it. When repeated input tokens are served from cache, those cached tokens are billed at a lower rate than standard input tokens.
Current cache read rates:
| Model | Input | Cache read | Output |
|---|---|---|---|
| Kimi K2.5 | $0.40 / M tokens | $0.10 / M tokens | $2.00 / M tokens |
| GLM 5.1 | $0.90 / M tokens | $0.27 / M tokens | $3.00 / M tokens |
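That works out to cache reads costing 25% of the fresh input rate on Kimi K2.5 and 30% on GLM 5.1.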
You can see the updated pricing on the homepage, Kimi K2.5 API, and GLM 5.1 API pages.
Why cache pricing matters
Many production LLM workloads repeat a large amount of context across requests:
- Agent instructions and tool schemas
- Repository or documentation context
- Long system prompts
- RAG prefixes that stay stable across a session
- Multi-turn workflows where most context is unchanged
Without cache pricing, every repeated token is billed like a fresh input token. With cache read pricing, repeated context can be billed at a lower rate on supported models, reducing the cost of long-context workloads without changing the API shape.
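To put numbers on that, here is a back-of-the-envelope comparison at the Kimi K2.5 rates above. The 40K-cached / 10K-fresh split is a hypothetical agent turn, not a measured workload:

```python
# Back-of-the-envelope request cost at Kimi K2.5 rates.
INPUT_RATE = 0.40 / 1_000_000   # $ per fresh input token
CACHE_RATE = 0.10 / 1_000_000   # $ per cached input token
OUTPUT_RATE = 2.00 / 1_000_000  # $ per output token

cached = 40_000  # stable prefix: instructions, tool schemas, repo context
fresh = 10_000   # new user message and latest turns
output = 1_000   # generated completion

without_cache = (cached + fresh) * INPUT_RATE + output * OUTPUT_RATE
with_cache = cached * CACHE_RATE + fresh * INPUT_RATE + output * OUTPUT_RATE

print(f"all tokens at input rate: ${without_cache:.4f}")  # $0.0220
print(f"with cache read rate:     ${with_cache:.4f}")     # $0.0100
```

On that split the request cost drops from $0.022 to $0.010, and the saving grows with the share of context that repeats.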
What changed
Lilac now shows three token rates for every model that supports prompt cache reads:
- Input — fresh prompt tokens
- Cache read — repeated input tokens served from cache
- Output — generated completion tokens
This keeps pricing visible before you send traffic. If a model does not support cache reads, we do not show a cache price for it.
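If you want to sanity-check a bill, the three rates combine per request as in this minimal sketch. It assumes usage follows the OpenAI convention of reporting cache hits under prompt_tokens_details.cached_tokens; the announcement doesn't pin down Lilac's exact usage schema:

```python
def request_cost(usage, input_rate, cache_rate, output_rate):
    """Estimate the dollar cost of one response from its token usage.

    Rates are dollars per token. Assumes cached tokens are counted inside
    prompt_tokens and broken out under prompt_tokens_details.cached_tokens,
    per the OpenAI convention; check Lilac's docs for the exact fields.
    """
    details = getattr(usage, "prompt_tokens_details", None)
    cached = getattr(details, "cached_tokens", 0) or 0
    fresh = usage.prompt_tokens - cached
    return fresh * input_rate + cached * cache_rate + usage.completion_tokens * output_rate

# Kimi K2.5 rates in $/token:
# request_cost(response.usage, 0.40e-6, 0.10e-6, 2.00e-6)
```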
Supported models
Cache read pricing is live for:
- Kimi K2.5 — $0.10 / M cached input tokens
- GLM 5.1 — $0.27 / M cached input tokens
Gemma 4 remains priced with standard input and output token rates.
Same OpenAI-compatible API
You still call Lilac through the same OpenAI-compatible endpoint:
```python
from openai import OpenAI

# Point the standard OpenAI SDK at Lilac's endpoint.
client = OpenAI(
    base_url="https://api.getlilac.com/v1",
    api_key="lilac_sk_...",
)

response = client.chat.completions.create(
    model="moonshotai/kimi-k2.5",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant..."},
        {"role": "user", "content": "Review this file for bugs."},
    ],
)
```
For teams running long-context coding agents, document QA, support bots, or repeated evaluation workloads, cache read pricing is a direct way to make repeated context cheaper.
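In multi-turn sessions the practical move is to keep the leading context byte-identical across requests, so repeated tokens can be served from cache. A minimal sketch, continuing the client above and assuming caching applies to the repeated prefix (the post doesn't detail the cache mechanics):

```python
# Keep the expensive prefix identical across turns; only append new messages.
history = [
    {"role": "system", "content": "You are a helpful coding assistant..."},
    {"role": "user", "content": "Review this file for bugs."},
]
first = client.chat.completions.create(model="moonshotai/kimi-k2.5", messages=history)
history.append({"role": "assistant", "content": first.choices[0].message.content})

# Follow-up turn: everything before the new user message is unchanged,
# so those input tokens are candidates for the cache read rate.
history.append({"role": "user", "content": "Now suggest a fix for the first bug."})
second = client.chat.completions.create(model="moonshotai/kimi-k2.5", messages=history)
```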
Ready to try it? Sign up at console.getlilac.com or start from the cheap inference API page.