Blog

    Updates, thinking, and technical deep-dives from the Lilac team.

    We're partnering with MiniMax to bring M2.7 to Lilac

    The partnership gives Lilac users commercially licensed access to MiniMax M2.7.

    How to keep frontier open weights viable

    Why Lilac supports open-weight licensing, and why commercial rights can help more frontier models stay open.

    Kimi K2.6 is live on Lilac

    Kimi K2.6 is now available on Lilac with OpenAI-compatible chat completions, 262K context, cache-read pricing, and no commitments.

    Cache read pricing is now live on Lilac

    Supported Lilac models now show lower cache read rates for repeated context, making long-context and agent workloads cheaper to run.

    Lilac is now self-serve — plus GLM 5.1 and Gemma 4 are live

    No more waitlist. Sign up, grab an API key, and start running inference. GLM 5.1 is live at $0.90 per million input tokens, and Gemma 4 is live at $0.11 per million input tokens.

    GLM 5.1 Inference Benchmark

    We benchmarked our GLM 5.1 endpoint against every GLM 5.1 provider listed on OpenRouter. It delivered competitive throughput at the lowest per-token price in the comparison.

    How Idle GPUs Make Cheap Inference Possible

    Lilac serves Kimi K2.6 inference on idle enterprise GPUs with OpenAI-compatible, pay-per-token shared endpoints.

    GPU Inference API Pricing Compared

    A direct comparison of GPU inference API pricing across major providers, and a look at how idle-GPU economics enable Lilac to offer lower per-token rates.

    The GPU Scarcity Paradox

    The GPU shortage isn't what you think. The industry doesn't have a supply problem; it has a utilization problem masquerading as one.

    Introducing Lilac: Turn Idle GPU Capacity into Revenue

    Most Kubernetes clusters run GPUs at 30-50% utilization. We built a single operator to change that.