Cheap inference API

    Visible token pricing, no contracts, no minimums. OpenAI-compatible.

    Get Started

    Open-weight models on shared warm endpoints, priced per token. See live pricing below.

    Model pricing

    Pay per token. No commitments.

    Competitive speed vs. OpenRouter-listed providers, at the lowest price in the benchmark snapshot.

    Model          Status    Precision  Context  Input    Cache     Output
    MiniMax M2.7   Live now  FP8        200K     $0.30/M  $0.055/M  $1.20/M
    Kimi K2.6      Live now  INT4       262K     $0.70/M  $0.20/M   $3.50/M
    GLM 5.1        Live now  FP8        203K     $0.90/M  $0.27/M   $3.00/M
    Gemma 4 (31B)  Live now  BF16       262K     $0.11/M  -         $0.35/M
    OpenAI-compatible · Shared warm endpoints · No contracts · No minimums

    More models are coming soon and will be added as they go live.
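To make the per-token pricing concrete, here is a minimal cost-estimate sketch built from the table above. It rests on several assumptions that the page does not state: that "/M" means per million tokens, that the Cache column is the rate for input tokens served from a prompt cache, and that a model with no listed cache price bills cached tokens at the normal input rate. The dictionary keys are the display names from the table, not confirmed API model IDs, and `estimate_cost` is a hypothetical helper, not part of any SDK.

```python
from decimal import Decimal

# USD per million tokens, copied from the pricing table.
# None means no cache price is listed for that model.
PRICES = {
    "MiniMax M2.7": {"input": Decimal("0.30"), "cache": Decimal("0.055"), "output": Decimal("1.20")},
    "Kimi K2.6": {"input": Decimal("0.70"), "cache": Decimal("0.20"), "output": Decimal("3.50")},
    "GLM 5.1": {"input": Decimal("0.90"), "cache": Decimal("0.27"), "output": Decimal("3.00")},
    "Gemma 4 (31B)": {"input": Decimal("0.11"), "cache": None, "output": Decimal("0.35")},
}

MILLION = Decimal(1_000_000)


def estimate_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of one request.

    Assumption: cached_tokens is the part of input_tokens billed at the
    cache rate instead of the input rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    price = PRICES[model]
    # Assumption: fall back to the input rate when no cache price is listed.
    cache_rate = price["cache"] if price["cache"] is not None else price["input"]
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * price["input"]
        + cached_tokens * cache_rate
        + output_tokens * price["output"]
    )
    return total / MILLION
```

Under these assumptions, a request to MiniMax M2.7 with 2,000 input tokens and 500 output tokens comes to (2,000 × $0.30 + 500 × $1.20) / 1,000,000 = $0.0012. `Decimal` is used instead of floats so that small per-request amounts add up without rounding drift.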

    Integration

    One base URL change.

    Keep the OpenAI SDK and point it at Lilac. Your existing code just works.

    inference.py

    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.openai.com/v1",  # swap this for the Lilac base URL
        api_key="sk_...",
    )

    response = client.chat.completions.create(
        model="openai/gpt-5.4",  # swap this for a model you want to run on Lilac
        messages=[{"role": "user", "content": "Hello!"}],
    )

    # Same code. Same SDK. Fraction of the price.

    01  OpenAI-compatible: switching is a base URL change.

    02  Shared endpoints stay warm. No cold starts.

    03  No contracts or minimums. Start immediately.
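To show why a base URL change can be enough, the sketch below builds the HTTP request behind a chat completion call using only the Python standard library. It assumes that "OpenAI-compatible" means the service accepts the OpenAI Chat Completions wire format: a JSON `POST` to `{base_url}/chat/completions` with a bearer token. The helper `build_chat_request`, the host `lilac.example`, and the model name `example-model` are hypothetical placeholders; use the base URL and model IDs from your own account.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, messages):
    """Build (but do not send) an OpenAI-style chat completion request."""
    # The base URL is the only part that differs between providers.
    url = base_url.rstrip("/") + "/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values for illustration only; none of them are real.
request = build_chat_request(
    base_url="https://lilac.example/v1",
    api_key="sk_...",
    model="example-model",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

Passing `request` to `urllib.request.urlopen` would send it. The path, headers, and JSON body stay the same whichever compatible provider the base URL points at, which is what lets the OpenAI SDK be reused unchanged.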

    Frequently asked questions

    What makes Lilac cheap?

    We route inference to idle enterprise GPUs: hardware that is already powered on and paid for.

    Does cheap mean slower?

    No. In our benchmarks, speed is competitive with OpenRouter-listed providers at the same price point or lower.

    Start running inference in minutes.

    No contracts, no commitments. Swap your base URL and pay less for the same output quality.