Lilac is now self-serve — plus GLM 5.1 is live and Gemma 4 is on the way
By Lucas Ewing
TL;DR
Lilac Cloud is now self-serve. Head to console.getlilac.com, create an account, and you'll have an API key in under a minute. No waitlist, no forms, no sales call required. Pay per token, no contracts, no minimums.
We're also expanding the model catalog:
- GLM 5.1 — live now at $0.90/M input, $3.00/M output, 0.58s TTFT
- Gemma 4 — coming soon at $0.13/M input, $0.38/M output
No more waitlist
Until now, getting on Lilac's shared inference endpoints meant filling out a form and waiting for us to provision an API key. That made sense when we were still validating the product with a handful of pilot customers. It does not make sense anymore.
Teams that want to try us should be able to try us. Swap your base URL, drop in a key, and see whether the output quality and price work for your workload. If they don't, you walk away having spent nothing. If they do, you're already running on Lilac.
What you get when you sign up at console.getlilac.com:
- An API key, immediately
- OpenAI-compatible endpoints for every model we host
- Pay-per-token billing, no minimums, no commitments
- Usage dashboards and per-model metering
If you want to talk to a founder before signing up — or you're an enterprise looking for custom pricing and volume discounts — you can still book a call. But you don't have to.
GLM 5.1 is live
GLM 5.1 is Zhipu AI's latest open frontier model. It's strong on coding, reasoning, and agentic workflows, and it's available on Lilac's shared endpoints as of today.
| Model | Input | Output | Latency |
|---|---|---|---|
| GLM 5.1 | $0.90 / M tokens | $3.00 / M tokens | 0.58s TTFT |
Like everything else on Lilac, the API is OpenAI-compatible. Switching is a one-line change:
```python
from openai import OpenAI

# Point the standard OpenAI client at Lilac's endpoint.
client = OpenAI(
    base_url="https://api.getlilac.com/v1",
    api_key="lilac_sk_...",
)

response = client.chat.completions.create(
    model="z-ai/glm-5.1",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
Gemma 4 — coming soon
Gemma 4 is Google's latest open-weights model family. We're in the process of standing up capacity and expect to flip it live shortly.
| Model | Input | Output | Latency |
|---|---|---|---|
| Gemma 4 (coming soon) | $0.13 / M tokens | $0.38 / M tokens | — |
At $0.13/M input, Gemma 4 is by far the cheapest model we'll host. If you want to be notified the moment it goes live, email contact@getlilac.com with "Gemma 4" in the subject.
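To put the per-million-token prices in perspective, here's a quick back-of-the-envelope cost estimate. The token volumes are hypothetical; plug in your own:

```python
def monthly_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate monthly spend in dollars given per-million-token prices."""
    return (input_tokens / 1e6) * input_price_per_m \
         + (output_tokens / 1e6) * output_price_per_m

# Hypothetical workload: 500M input tokens, 100M output tokens per month.
gemma4 = monthly_cost(500_000_000, 100_000_000, 0.13, 0.38)  # $103.00
glm51  = monthly_cost(500_000_000, 100_000_000, 0.90, 3.00)  # $750.00
print(f"Gemma 4: ${gemma4:.2f}/mo, GLM 5.1: ${glm51:.2f}/mo")
```

Same workload, roughly 7x apart, which is why it's worth routing simpler tasks to the cheaper model.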
Why we can price this way
Lilac serves inference on idle enterprise GPUs — hardware that is already powered on, already paid for, and sitting underutilized inside Kubernetes clusters we've onboarded. Our operator finds that spare capacity, schedules inference onto it, and preempts the moment the cluster's own workloads need the GPUs back.
Because the fixed costs are already covered by the provider's primary jobs, the marginal cost of serving a token is much lower than renting dedicated capacity. We pass that through to you. For a deeper write-up, see our earlier post on how idle GPUs make cheap inference possible.
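Preemption is handled on our side, but like any shared endpoint, requests can occasionally fail transiently under load. A client-side retry with exponential backoff is cheap insurance; this is a generic sketch, not a Lilac-specific API:

```python
import random
import time

def with_retries(fn, max_attempts=4, base_delay=0.5):
    """Call fn(), retrying on exceptions with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts; surface the last error.
            # Sleep 0.5s, 1s, 2s, ... plus a little jitter before retrying.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))

# Usage with the OpenAI-compatible client shown earlier:
# response = with_retries(lambda: client.chat.completions.create(
#     model="z-ai/glm-5.1",
#     messages=[{"role": "user", "content": "Hello!"}],
# ))
```

In production you'd likely narrow the `except` to retryable errors (timeouts, 429s, 5xx) rather than catching everything.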
Ready to try it? Sign up at console.getlilac.com or browse our cheap inference API page for a walk-through. Questions? contact@getlilac.com.