
    Cache read pricing is now live on Lilac

    By Lucas Ewing


    TL;DR

    Lilac now publishes cache read pricing for models that support it. When repeated input tokens are served from cache, those cached tokens are billed at a lower rate than standard input tokens.

    Current cache read rates:

    Model       Input              Cache read         Output
    Kimi K2.5   $0.40 / M tokens   $0.10 / M tokens   $2.00 / M tokens
    GLM 5.1     $0.90 / M tokens   $0.27 / M tokens   $3.00 / M tokens

    You can see the updated pricing on the homepage, Kimi K2.5 API, and GLM 5.1 API pages.


    Why cache pricing matters

    Many production LLM workloads repeat a large amount of context across requests:

    • Agent instructions and tool schemas
    • Repository or documentation context
    • Long system prompts
    • RAG prefixes that stay stable across a session
    • Multi-turn workflows where most context is unchanged

    Without cache pricing, every repeated token is billed like a fresh input token. With cache read pricing, repeated context can be billed at a lower rate on supported models, reducing the cost of long-context workloads without changing the API shape.
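    As a back-of-envelope illustration (the token counts below are hypothetical, and it assumes every repeated prefix is served from cache, which is the best case), the savings scale with how much context repeats:

    INPUT_RATE = 0.40        # Kimi K2.5, $ per million fresh input tokens
    CACHE_READ_RATE = 0.10   # Kimi K2.5, $ per million cached input tokens

    prefix_tokens = 100_000  # stable system prompt + tool schemas (hypothetical)
    requests = 500           # requests reusing that prefix (hypothetical)

    repeated_tokens_m = prefix_tokens * requests / 1_000_000  # 50M repeated tokens

    print(f"billed as fresh input:  ${repeated_tokens_m * INPUT_RATE:.2f}")       # $20.00
    print(f"billed as cache reads:  ${repeated_tokens_m * CACHE_READ_RATE:.2f}")  # $5.00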

    What changed

    Lilac now shows three token rates for models that support cache reads:

    • Input — fresh prompt tokens
    • Cache read — repeated input tokens served from cache
    • Output — generated completion tokens

    This keeps pricing visible before you send traffic. If a model does not support cache reads, we do not show a cache price for it.
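    A quick way to sanity-check a request's cost from those three rates (the fresh/cached/output split below is a made-up example, and which repeated tokens are actually served from cache is determined server-side):

    def request_cost(fresh_in, cached_in, out, input_rate, cache_read_rate, output_rate):
        """Estimate cost in dollars; all rates are $ per million tokens."""
        return (fresh_in * input_rate + cached_in * cache_read_rate + out * output_rate) / 1_000_000

    # GLM 5.1 rates: $0.90 input, $0.27 cache read, $3.00 output per million tokens
    print(request_cost(fresh_in=2_000, cached_in=30_000, out=1_500,
                       input_rate=0.90, cache_read_rate=0.27, output_rate=3.00))  # 0.0144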

    Supported models

    Cache read pricing is live for:

    • Kimi K2.5 — $0.10 / M cached input tokens
    • GLM 5.1 — $0.27 / M cached input tokens

    Gemma 4 remains priced with standard input and output token rates.

    Same OpenAI-compatible API

    You still call Lilac through the same OpenAI-compatible endpoint:

    from openai import OpenAI
    
    client = OpenAI(
        base_url="https://api.getlilac.com/v1",
        api_key="lilac_sk_...",
    )
    
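    # A long, stable system prompt is exactly the kind of repeated input
    # that cache read pricing applies to on supported models.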
    response = client.chat.completions.create(
        model="moonshotai/kimi-k2.5",
        messages=[
            {"role": "system", "content": "You are a helpful coding assistant..."},
            {"role": "user", "content": "Review this file for bugs."},
        ],
    )
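
    In a multi-turn session the stable prefix simply travels with every request, so nothing about the call changes; whether repeated tokens are served from cache is handled server-side. Continuing the snippet above (the follow-up message is illustrative):

    # Follow-up turn in the same session: the system prompt and the earlier
    # exchange are resent unchanged, which is the repeated context that
    # cache read pricing applies to on supported models.
    followup = client.chat.completions.create(
        model="moonshotai/kimi-k2.5",
        messages=[
            {"role": "system", "content": "You are a helpful coding assistant..."},
            {"role": "user", "content": "Review this file for bugs."},
            {"role": "assistant", "content": response.choices[0].message.content},
            {"role": "user", "content": "Now suggest a fix for the first issue you found."},
        ],
    )

    print(followup.choices[0].message.content)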
    

    For teams running long-context coding agents, document QA, support bots, or repeated evaluation workloads, cache read pricing is a direct way to make repeated context cheaper.

    Ready to try it? Sign up at console.getlilac.com or start from the cheap inference API page.