Kimi K2.6 is live on Lilac
Kimi K2.6 is now available on Lilac with OpenAI-compatible chat completions, 262K context, cache-read pricing, and no commitments.
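Because the endpoint speaks the OpenAI chat-completions protocol, any OpenAI-compatible client should work once it is pointed at Lilac's base URL. Here is a minimal sketch using the official `openai` Python SDK; the base URL and model identifier are assumptions for illustration, so check the Lilac docs for the exact values.

```python
from openai import OpenAI

# The base URL and model name below are illustrative assumptions;
# confirm both against the Lilac documentation before use.
client = OpenAI(
    base_url="https://api.lilac.example/v1",  # hypothetical endpoint
    api_key="YOUR_LILAC_API_KEY",
)

response = client.chat.completions.create(
    model="kimi-k2.6",  # hypothetical model identifier
    messages=[
        {"role": "user", "content": "Summarize the trade-offs of long-context prompting."},
    ],
)
print(response.choices[0].message.content)
```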
Supported Lilac models now bill repeated context at a discounted cache-read rate, making long-context and agent workloads cheaper to run.
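To see why the cache-read rate matters for agent workloads, a back-of-envelope sketch: the $0.90/M input figure is GLM 5.1's listed rate from this post, but the cache-read rate and the billing mechanics (cached and fresh prompt tokens billed separately) are assumptions for illustration only.

```python
# Rates in $/token. INPUT_RATE is GLM 5.1's listed $0.90/M input price;
# CACHE_READ_RATE is a hypothetical discount, not a published number.
INPUT_RATE = 0.90 / 1_000_000
CACHE_READ_RATE = 0.09 / 1_000_000

def request_cost(cached_tokens: int, fresh_tokens: int) -> float:
    """Input cost of one request, assuming cached prompt tokens are
    billed at the cache-read rate and fresh tokens at the full rate."""
    return cached_tokens * CACHE_READ_RATE + fresh_tokens * INPUT_RATE

# An agent loop that resends a 200K-token context for 50 turns,
# adding 1K fresh tokens per turn:
with_cache = sum(request_cost(200_000, 1_000) for _ in range(50))
without_cache = sum(request_cost(0, 201_000) for _ in range(50))
print(f"${with_cache:.2f} with cache reads vs ${without_cache:.2f} without")
```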
No more waitlist. Sign up, grab an API key, and start running inference. GLM 5.1 is live at $0.90/M input, and Gemma 4 is live at $0.11/M input.
We benchmarked our GLM 5.1 endpoint against every GLM 5.1 provider listed on OpenRouter. The result: competitive throughput at the lowest per-token price in the comparison.
Lilac serves Kimi K2.6 inference on idle enterprise GPUs with OpenAI-compatible, pay-per-token shared endpoints.
A direct comparison of GPU inference API pricing across major providers, and how idle-GPU economics let Lilac offer lower per-token rates.
The GPU shortage isn't what you think. The industry doesn't have a supply problem — it has a utilization problem masquerading as one.
Most Kubernetes clusters run GPUs at 30-50% utilization. We built a single operator to change that.