Lilac is now self-serve — plus GLM 5.1 is live and Gemma 4 is on the way
By Lucas Ewing
TL;DR
Lilac Cloud is now self-serve. Head to console.getlilac.com, create an account, and you'll have an API key in under a minute. No waitlist, no forms, no sales call required. Pay per token, no contracts, no minimums.
We're also expanding the model catalog:
- GLM 5.1 — live now at $0.90/M input, $3.00/M output, 0.58s TTFT
- Gemma 4 — coming soon at $0.13/M input, $0.38/M output
No more waitlist
Until now, getting on Lilac's shared inference endpoints meant filling out a form and waiting for us to provision an API key. That made sense when we were still validating the product with a handful of pilot customers. It does not make sense anymore.
Teams that want to try us should be able to try us. Swap your base URL, drop in a key, and see whether the output quality and price work for your workload. If they don't, you walk away having spent nothing. If they do, you're already running on Lilac.
What you get when you sign up at console.getlilac.com:
- An API key, immediately
- OpenAI-compatible endpoints for every model we host
- Pay-per-token billing, no minimums, no commitments
- Usage dashboards and per-model metering
If you want to talk to a founder before signing up — or you're an enterprise looking for custom pricing and volume discounts — you can still book a call. But you don't have to.
GLM 5.1 is live
GLM 5.1 is Zhipu AI's latest open frontier model. It's strong on coding, reasoning, and agentic workflows, and it's available on Lilac's shared endpoints as of today.
| Model | Input | Output | Latency |
|---|---|---|---|
| GLM 5.1 | $0.90 / M tokens | $3.00 / M tokens | 0.58s TTFT |
Like everything else on Lilac, the API is OpenAI-compatible. Switching is a one-line change:
```python
from openai import OpenAI

# Point the standard OpenAI client at Lilac's endpoint.
client = OpenAI(
    base_url="https://api.getlilac.com/v1",
    api_key="lilac_sk_...",
)

response = client.chat.completions.create(
    model="z-ai/glm-5.1",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
Gemma 4 — coming soon
Gemma 4 is Google's latest open-weights model family. We're in the process of standing up capacity and expect to flip it live shortly.
| Model | Input | Output | Latency |
|---|---|---|---|
| Gemma 4 (coming soon) | $0.13 / M tokens | $0.38 / M tokens | — |
At $0.13/M input, Gemma 4 is by far the cheapest model we'll host. If you want to be notified the moment it goes live, email contact@getlilac.com with "Gemma 4" in the subject.
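To put the per-million-token prices in perspective, here's a quick back-of-the-envelope cost estimate. The token volumes are hypothetical; plug in your own:

```python
def monthly_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate monthly spend in dollars given per-million-token prices."""
    return (input_tokens / 1e6) * input_price_per_m \
         + (output_tokens / 1e6) * output_price_per_m

# Hypothetical workload: 500M input tokens, 100M output tokens per month.
gemma4 = monthly_cost(500_000_000, 100_000_000, 0.13, 0.38)  # $103.00
glm51  = monthly_cost(500_000_000, 100_000_000, 0.90, 3.00)  # $750.00
print(f"Gemma 4: ${gemma4:.2f}/mo, GLM 5.1: ${glm51:.2f}/mo")
```

Same workload, roughly 7x apart, which is why it's worth routing simpler tasks to the cheaper model.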
Why we can price this way
Lilac serves inference on idle enterprise GPUs — hardware that is already powered on, already paid for, and sitting underutilized inside Kubernetes clusters we've onboarded. Our operator finds that spare capacity, schedules inference onto it, and preempts the moment the cluster's own workloads need the GPUs back.
Because the fixed costs are already covered by the provider's primary jobs, the marginal cost of serving a token is much lower than renting dedicated capacity. We pass that through to you. For a deeper write-up, see our earlier post on how idle GPUs make cheap inference possible.
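Preemption is handled on our side, but like any shared endpoint, requests can occasionally fail transiently under load. A client-side retry with exponential backoff is cheap insurance; this is a generic sketch, not a Lilac-specific API:

```python
import random
import time

def with_retries(fn, max_attempts=4, base_delay=0.5):
    """Call fn(), retrying on exceptions with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts; surface the last error.
            # Sleep 0.5s, 1s, 2s, ... plus a little jitter before retrying.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))

# Usage with the OpenAI-compatible client shown earlier:
# response = with_retries(lambda: client.chat.completions.create(
#     model="z-ai/glm-5.1",
#     messages=[{"role": "user", "content": "Hello!"}],
# ))
```

In production you'd likely narrow the `except` to retryable errors (timeouts, 429s, 5xx) rather than catching everything.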
Ready to try it? Sign up at console.getlilac.com or browse our cheap inference API page for a walk-through. Questions? contact@getlilac.com.