Back to blog

    Lilac is now self-serve — plus GLM 5.1 is live and Gemma 4 is on the way

    By Lucas Ewing


    TL;DR

    Lilac Cloud is now self-serve. Head to console.getlilac.com, create an account, and you'll have an API key in under a minute. No waitlist, no forms, no sales call required. Pay per token, no contracts, no minimums.

    We're also expanding the model catalog:

    • GLM 5.1 — live now at $0.90/M input, $3.00/M output, 0.58s TTFT
    • Gemma 4 — coming soon at $0.13/M input, $0.38/M output

    No more waitlist

    Until now, getting on Lilac's shared inference endpoints meant filling out a form and waiting for us to provision an API key. That made sense when we were still validating the product with a handful of pilot customers. It does not make sense anymore.

    Teams that want to try us should be able to try us. Swap your base URL, drop in a key, and see whether the output quality and price work for your workload. If they don't, you walked away having spent nothing. If they do, you're already running on Lilac.

    What you get when you sign up at console.getlilac.com:

    • An API key, immediately
    • OpenAI-compatible endpoints for every model we host
    • Pay-per-token billing, no minimums, no commitments
    • Usage dashboards and per-model metering

    If you want to talk to a founder before signing up — or you're an enterprise looking for custom pricing and volume discounts — you can still book a call. But you don't have to.

    GLM 5.1 is live

    GLM 5.1 is Zhipu AI's latest open frontier model. It's strong on coding, reasoning, and agentic workflows, and it's already available on Lilac's shared endpoints as of today.

    Model      Input              Output             Latency
    GLM 5.1    $0.90 / M tokens   $3.00 / M tokens   0.58s TTFT

    Like everything else on Lilac, the API is OpenAI-compatible. Switching is a one-line change:

    from openai import OpenAI

    # The only change from a stock OpenAI setup: point base_url at Lilac
    # and use your Lilac API key.
    client = OpenAI(
        base_url="https://api.getlilac.com/v1",
        api_key="lilac_sk_...",
    )

    response = client.chat.completions.create(
        model="z-ai/glm-5.1",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    

    Gemma 4 — coming soon

    Gemma 4 is Google's latest open-weights model family. We're in the process of standing up capacity and expect to flip it live shortly.

    Model                    Input              Output             Latency
    Gemma 4 (coming soon)    $0.13 / M tokens   $0.38 / M tokens   —

    At $0.13/M input, Gemma 4 is by far the cheapest model we'll host. If you want to be notified the moment it goes live, email contact@getlilac.com with "Gemma 4" in the subject.
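    To see what these rates mean for a real workload, here's a back-of-envelope sketch using the published per-million-token prices above. The helper and the `gemma-4` identifier are illustrative only — the actual model id will be announced at launch, and this is not part of any Lilac SDK:

    ```python
    # Back-of-envelope cost comparison at Lilac's published pay-per-token rates.
    # Purely illustrative -- model ids and helper are placeholders.
    PRICES = {  # model -> (input $/M tokens, output $/M tokens)
        "z-ai/glm-5.1": (0.90, 3.00),
        "gemma-4": (0.13, 0.38),  # placeholder id until Gemma 4 goes live
    }

    def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
        """Return the dollar cost of a workload at pay-per-token rates."""
        in_price, out_price = PRICES[model]
        return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

    # Example: a month of 10M input tokens and 2M output tokens.
    glm_cost = estimate_cost("z-ai/glm-5.1", 10_000_000, 2_000_000)  # $15.00
    gemma_cost = estimate_cost("gemma-4", 10_000_000, 2_000_000)     # $2.06
    ```

    At that volume, the same workload runs roughly 7x cheaper on Gemma 4 — the trade-off, as always, is whether the smaller model's output quality holds up for your use case.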

    Why we can price this way

    Lilac serves inference on idle enterprise GPUs — hardware that is already powered on, already paid for, and sitting underutilized inside Kubernetes clusters we've onboarded. Our operator finds that spare capacity, schedules inference onto it, and preempts the moment the cluster's own workloads need the GPUs back.

    Because the fixed costs are already covered by the provider's primary jobs, the marginal cost of serving a token is much lower than renting dedicated capacity. We pass that through to you. For a deeper write-up, see our earlier post on how idle GPUs make cheap inference possible.


    Ready to try it? Sign up at console.getlilac.com or browse our cheap inference API page for a walk-through. Questions? contact@getlilac.com.