Serverless inference API

    Serverless inference, no cold starts.

    Shared endpoints backed by idle enterprise GPUs. Warm capacity and token pricing — no dedicated GPUs to manage.

    Get Started

    OpenAI-compatible serverless API. No contracts, token-based pricing.

    Model pricing

    Pay per token. No commitments.

    Shared warm endpoints across our open-weight model catalog — no container spin-up, no cold starts.

    Model          Status    Precision  Context  Input    Cache     Output
    MiniMax M2.7   Live now  FP8        200K     $0.30/M  $0.055/M  $1.20/M
    Kimi K2.6      Live now  INT4       262K     $0.70/M  $0.20/M   $3.50/M
    GLM 5.1        Live now  FP8        203K     $0.90/M  $0.27/M   $3.00/M
    Gemma 4 (31B)  Live now  BF16       262K     $0.11/M  -         $0.35/M
    OpenAI-compatible · Shared warm endpoints · No contracts · No minimums

    More models are coming soon and will be added as they go live.
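As a worked example of the per-token pricing above, here is a minimal sketch of a request cost estimator. The rates are copied from the pricing table, reading "/M" as per million tokens. Two things are assumptions and not stated on this page: that the Cache rate applies to the portion of input tokens served from cache, and that a model with no listed Cache rate bills those tokens at the normal Input rate. The dictionary keys are display names, not confirmed API model identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Price:
    """USD per one million tokens, copied from the pricing table."""
    input_per_m: float
    output_per_m: float
    cache_per_m: Optional[float] = None  # None where the table shows "-"


# Keys are display names from the table, not confirmed API model IDs.
PRICES = {
    "MiniMax M2.7": Price(input_per_m=0.30, output_per_m=1.20, cache_per_m=0.055),
    "Kimi K2.6": Price(input_per_m=0.70, output_per_m=3.50, cache_per_m=0.20),
    "GLM 5.1": Price(input_per_m=0.90, output_per_m=3.00, cache_per_m=0.27),
    "Gemma 4 (31B)": Price(input_per_m=0.11, output_per_m=0.35),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumption: cached_tokens is the part of input_tokens served from
    cache and is billed at the Cache rate. If no Cache rate is listed,
    those tokens are billed at the Input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    price = PRICES[model]
    cache_rate = price.cache_per_m if price.cache_per_m is not None else price.input_per_m
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * price.input_per_m
        + cached_tokens * cache_rate
        + output_tokens * price.output_per_m
    )
    return total / 1_000_000


if __name__ == "__main__":
    cost = estimate_cost("MiniMax M2.7", 1_000_000, 100_000, cached_tokens=200_000)
    print(f"${cost:.3f}")
```

Under those assumptions, 1,000,000 input tokens (200,000 of them cached) plus 100,000 output tokens on MiniMax M2.7 works out to 0.24 + 0.011 + 0.12 = about $0.371.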

    Integration

    One base URL change.

    Keep the OpenAI SDK and point it at Lilac. Your existing code just works.

    inference.py

    from openai import OpenAI

    client = OpenAI(
        # The one line to change: swap this for the Lilac base URL.
        base_url="https://api.openai.com/v1",
        api_key="sk_...",
    )

    response = client.chat.completions.create(
        model="openai/gpt-5.4",
        messages=[{"role": "user", "content": "Hello!"}],
    )

    # Same code. Same SDK. Fraction of the price.

    01  Nothing to provision, autoscale, or keep warm.

    02  Endpoints are already running; no container spin-up.

    03  Switch from OpenAI by updating one base URL.
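To make the "one base URL" point concrete, the sketch below builds (without sending) the kind of HTTP request an OpenAI-style chat completions call produces, using only the Python standard library. It assumes that "OpenAI-compatible" here means the same `/chat/completions` path, Bearer authentication, and JSON body as the OpenAI API. `LILAC_BASE_URL` and `MODEL` are placeholders, since this page does not state the real base URL or model identifiers.

```python
import json
import urllib.request

OPENAI_BASE_URL = "https://api.openai.com/v1"
# Placeholder only: the page does not state Lilac's actual base URL.
LILAC_BASE_URL = "https://lilac.example/v1"
# Placeholder only: use a model identifier from the provider's catalog.
MODEL = "your-model-id"


def build_chat_request(base_url: str, api_key: str, model: str,
                       prompt: str) -> urllib.request.Request:
    """Build, but do not send, an OpenAI-style chat completions request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


before = build_chat_request(OPENAI_BASE_URL, "sk_...", MODEL, "Hello!")
after = build_chat_request(LILAC_BASE_URL, "sk_...", MODEL, "Hello!")

# Only the host part of the URL differs; the payload is byte-for-byte the same.
print(before.full_url)
print(after.full_url)
print(before.data == after.data)
```

In practice the API key and the model identifier will usually need to match the new provider as well, so treat the base URL as the only code-level change, not the only value that changes.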

    Frequently asked questions

    Is this really serverless?

    Yes. No infrastructure to provision or maintain — just an API call.

    How do you avoid cold starts?

    Traffic is routed to shared capacity that is already running, not to freshly spun-up containers.
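One way to check the cold-start claim for yourself is to measure how long a streamed response takes to deliver its first chunk. The helper below is a generic sketch, not part of any SDK: `time_to_first_item` and `fake_stream` are hypothetical names, and the fake stream simply stands in for a real streamed response so the example runs offline.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def time_to_first_item(start_stream: Callable[[], Iterable[T]]) -> Tuple[float, T, Iterator[T]]:
    """Return (seconds until the first item, the first item, the remaining items).

    start_stream is a zero-argument callable so that the timer also covers
    whatever work is needed to open the stream, such as sending the request.
    """
    started = time.perf_counter()
    iterator = iter(start_stream())
    first = next(iterator)  # raises StopIteration if the stream is empty
    elapsed = time.perf_counter() - started
    return elapsed, first, iterator


def fake_stream(delay_s: float) -> Iterator[str]:
    """Stand-in for a streamed response: waits, then yields two chunks."""
    time.sleep(delay_s)
    yield "Hel"
    yield "lo!"


elapsed, first, rest = time_to_first_item(lambda: fake_stream(0.05))
print(f"first chunk {first!r} after {elapsed:.3f}s")
print("remaining chunks:", list(rest))
```

With the OpenAI Python SDK, the callable could wrap a `client.chat.completions.create(..., stream=True)` call. Comparing the first request after an idle period with later requests gives a rough picture of any warm-up penalty.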

    Start running inference in minutes.

    No contracts, no commitments. Swap your base URL and pay less for the same output quality.
