
    Cache read pricing is now live on Lilac

    By Lucas Ewing


    TL;DR

    Lilac now publishes cache read pricing for models that support it. When repeated input tokens are served from cache, those cached tokens are billed at a lower rate than standard input tokens.

    Current cache read rates:

    Model       Input              Cache read         Output
    Kimi K2.5   $0.40 / M tokens   $0.10 / M tokens   $2.00 / M tokens
    GLM 5.1     $0.90 / M tokens   $0.27 / M tokens   $3.00 / M tokens

    You can see the updated pricing on the homepage, Kimi K2.5 API, and GLM 5.1 API pages.


    Why cache pricing matters

    Many production LLM workloads repeat a large amount of context across requests:

    • Agent instructions and tool schemas
    • Repository or documentation context
    • Long system prompts
    • RAG prefixes that stay stable across a session
    • Multi-turn workflows where most context is unchanged

    Without cache pricing, every repeated token is billed like a fresh input token. With cache read pricing, repeated context can be billed at a lower rate on supported models, reducing the cost of long-context workloads without changing the API shape.
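    As a back-of-envelope illustration (the token counts below are hypothetical, and it assumes every repeated prefix is served from cache, which is the best case), the savings scale with how much context repeats:

    INPUT_RATE = 0.40        # Kimi K2.5, $ per million fresh input tokens
    CACHE_READ_RATE = 0.10   # Kimi K2.5, $ per million cached input tokens

    prefix_tokens = 100_000  # stable system prompt + tool schemas (hypothetical)
    requests = 500           # requests reusing that prefix (hypothetical)

    repeated_tokens_m = prefix_tokens * requests / 1_000_000  # 50M repeated tokens

    print(f"billed as fresh input:  ${repeated_tokens_m * INPUT_RATE:.2f}")       # $20.00
    print(f"billed as cache reads:  ${repeated_tokens_m * CACHE_READ_RATE:.2f}")  # $5.00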

    What changed

    Lilac now shows three token rates for models that support cache reads:

    • Input — fresh prompt tokens
    • Cache read — repeated input tokens served from cache
    • Output — generated completion tokens

    This keeps pricing visible before you send traffic. If a model does not support cache reads, we do not show a cache price for it.
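    A quick way to sanity-check a request's cost from those three rates (the fresh/cached/output split below is a made-up example, and which repeated tokens are actually served from cache is determined server-side):

    def request_cost(fresh_in, cached_in, out, input_rate, cache_read_rate, output_rate):
        """Estimate cost in dollars; all rates are $ per million tokens."""
        return (fresh_in * input_rate + cached_in * cache_read_rate + out * output_rate) / 1_000_000

    # GLM 5.1 rates: $0.90 input, $0.27 cache read, $3.00 output per million tokens
    print(request_cost(fresh_in=2_000, cached_in=30_000, out=1_500,
                       input_rate=0.90, cache_read_rate=0.27, output_rate=3.00))  # 0.0144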

    Supported models

    Cache read pricing is live for:

    • Kimi K2.5 — $0.10 / M cached input tokens
    • GLM 5.1 — $0.27 / M cached input tokens

    Gemma 4 remains priced with standard input and output token rates.

    Same OpenAI-compatible API

    You still call Lilac through the same OpenAI-compatible endpoint:

    from openai import OpenAI
    
    client = OpenAI(
        base_url="https://api.getlilac.com/v1",
        api_key="lilac_sk_...",
    )
    
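    # A long, stable system prompt is exactly the kind of repeated input
    # that cache read pricing applies to on supported models.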
    response = client.chat.completions.create(
        model="moonshotai/kimi-k2.5",
        messages=[
            {"role": "system", "content": "You are a helpful coding assistant..."},
            {"role": "user", "content": "Review this file for bugs."},
        ],
    )
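
    In a multi-turn session the stable prefix simply travels with every request, so nothing about the call changes; whether repeated tokens are served from cache is handled server-side. Continuing the snippet above (the follow-up message is illustrative):

    # Follow-up turn in the same session: the system prompt and the earlier
    # exchange are resent unchanged, which is the repeated context that
    # cache read pricing applies to on supported models.
    followup = client.chat.completions.create(
        model="moonshotai/kimi-k2.5",
        messages=[
            {"role": "system", "content": "You are a helpful coding assistant..."},
            {"role": "user", "content": "Review this file for bugs."},
            {"role": "assistant", "content": response.choices[0].message.content},
            {"role": "user", "content": "Now suggest a fix for the first issue you found."},
        ],
    )

    print(followup.choices[0].message.content)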
    

    For teams running long-context coding agents, document QA, support bots, or repeated evaluation workloads, cache read pricing is a direct way to make repeated context cheaper.

    Ready to try it? Sign up at console.getlilac.com or start from the cheap inference API page.