    We're partnering with MiniMax to bring M2.7 to Lilac

    By Lucas Ewing


    TL;DR

    We are partnering with MiniMax to bring MiniMax M2.7 to Lilac.

    The goal is simple: make it easy for teams to try and deploy M2.7 commercially without managing a large model deployment themselves.

    | Model | Input | Cache read | Output | Throughput | Accuracy |
    | --- | --- | --- | --- | --- | --- |
    | MiniMax M2.7 | $0.30 / M tokens | $0.055 / M tokens | $1.20 / M tokens | Sustained 60 tok/s/user at 160 concurrency | 99.8% verifier schema accuracy |

    In the AIPerf stress run, Lilac reached a sustained 60 tok/s/user at 160-way concurrency with 100% request success at every tested concurrency level. In the MiniMax Provider Verifier run, Lilac returned 1,020/1,020 successful requests with 100% query success, 98.80% tool-call match rate, 99.88% tool-call trigger similarity, 99.80% tool-call schema accuracy, 0% error-only reasoning rate, and 100% language following.


    Why MiniMax M2.7

    MiniMax M2.7 is an open-weight model built for professional software engineering, long-horizon work, and tool-heavy agents. The official model card highlights coding, agent teams, complex tool use, and professional work benchmarks as core strengths.

    Those are exactly the workloads Lilac customers care about. Coding agents and internal automation systems often need a mix of long context, reliable tool calling, high throughput, and cost discipline. M2.7 is a strong fit for that shape of work.

    Why we are excited about the partnership

    There are two hard parts to using a model like M2.7 in production.

    The first is operational. Serving a large model well means choosing the right hardware, keeping replicas warm, tuning inference settings, measuring real latency, and handling bursty traffic without forcing every team to rent dedicated capacity.

    The second is commercial. MiniMax M2.7's public weights are available for broad non-commercial use, while commercial serving requires authorization from MiniMax. That is reasonable for a model company investing heavily in the weights, training recipe, evaluations, and release process. A direct partnership gives customers a clear path to use the model commercially through Lilac.

    Lilac sits at the intersection of those two problems. We handle the hosted endpoint, route traffic onto GPU capacity, and give developers the same Lilac API surface they already use.

    Benchmark results

    We measured the launch endpoint with streaming chat requests using approximately 60K input tokens and 500-600 output tokens per request.
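    For a rough sense of unit cost at this request shape, here is a back-of-the-envelope calculation using the listed pricing ($0.30 per million input tokens, $1.20 per million output tokens; cache reads ignored). This is an illustration, not a billing guarantee.

```python
# Approximate cost of one benchmark-shaped request at the posted prices.
# Request shape: ~60K input tokens, ~550 output tokens (midpoint of 500-600).
INPUT_PRICE = 0.30 / 1_000_000   # dollars per input token
OUTPUT_PRICE = 1.20 / 1_000_000  # dollars per output token

input_tokens = 60_000
output_tokens = 550

cost = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
print(f"${cost:.4f} per request")  # → $0.0187 per request
```

    At this shape, input tokens dominate the cost, which is why cache-read pricing matters for agent loops that resend long contexts.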

    | Metric | Result | Notes |
    | --- | --- | --- |
    | Aggregate output throughput | 15,544.8 output tok/s | 160 concurrent streaming requests across launch capacity |
    | Per-user throughput | Sustained 60 tok/s/user | 160 concurrent streaming requests |
    | TTFT | 1.2s P50 / 3.3s P90 | 160 concurrent streaming requests |
    | Request success | 100.0% | All tested concurrency stages from 2 to 160 |
    | Verifier requests | 1,020/1,020 successful | MiniMax Provider Verifier, 102 cases x 10 rounds |
    | Verifier tool-call match | 98.80% | Matched the MiniMax M2.7 reference line |
    | Verifier schema accuracy | 99.80% | Tool-call argument schema validation |
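    For readers reproducing these measurements, TTFT and per-user throughput can be derived from per-request timestamps. Here is a minimal sketch, assuming each streaming request records its send time, first-token time, last-token time, and output token count; the helper names are ours, not part of any Lilac or AIPerf tooling.

```python
from dataclasses import dataclass
from statistics import quantiles

@dataclass
class RequestTrace:
    start: float        # request sent (seconds)
    first_token: float  # first streamed token received
    last_token: float   # final token received
    output_tokens: int

def ttft_percentiles(traces):
    """Time-to-first-token P50 and P90 across a set of requests."""
    ttfts = sorted(t.first_token - t.start for t in traces)
    deciles = quantiles(ttfts, n=10)
    return deciles[4], deciles[8]  # 50th and 90th percentile cut points

def per_user_throughput(trace):
    """Decode-phase tokens per second for a single streaming request."""
    return trace.output_tokens / (trace.last_token - trace.first_token)
```

    As an illustration, a request that streams 550 output tokens over roughly 9.2 seconds of decode time sustains about 60 tok/s, matching the per-user figure in the table.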

    Provider Verifier comparison

    MiniMax publishes reference results for M2.7 in the MiniMax Provider Verifier. Their May 2026 MiniMax-M2.7 reference line is computed across 10 runs; Lilac's endpoint was tested with the same 10-round shape, with 102 cases per round.

    | Metric | Lilac endpoint | Official baseline | Difference |
    | --- | --- | --- | --- |
    | Query-Success-Rate | 100.00% | 100.00% | 0 |
    | ToolCalls-Match-Rate | 98.80% | 98.80% | 0 |
    | ToolCalls-Trigger-Similarity | 99.88% | | |
    | ToolCalls-Schema-Accuracy | 99.80% | 99.76% | +0.04% |
    | Error-Only-Reasoning-Rate | 0.00% | 0.00% | 0 |
    | Language-Following | 100.00% | 75.00% | +25% |

    API example

    from openai import OpenAI
    
    # Point the standard OpenAI client at Lilac's endpoint.
    client = OpenAI(
        base_url="https://api.getlilac.com/v1",
        api_key="lilac_sk_...",
    )
    
    response = client.chat.completions.create(
        model="minimaxai/minimax-m2.7",
        messages=[
            {"role": "user", "content": "Review this pull request for production risks."},
        ],
    )
    
    print(response.choices[0].message.content)
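    Because M2.7 is built for tool-heavy agents, the same endpoint can also be exercised with tool definitions in the OpenAI function-calling format. The sketch below shows the shape of one such definition; the tool itself (`run_ci_checks`) and its parameters are illustrative, not part of Lilac's API.

```python
# An illustrative tool definition in the OpenAI function-calling format.
# Only the surrounding schema shape is standard; the tool is hypothetical.
tools = [
    {
        "type": "function",
        "function": {
            "name": "run_ci_checks",
            "description": "Run the CI suite against a pull request branch.",
            "parameters": {
                "type": "object",
                "properties": {
                    "branch": {
                        "type": "string",
                        "description": "Branch to test.",
                    },
                },
                "required": ["branch"],
            },
        },
    }
]
```

    Passing `tools=tools` to `client.chat.completions.create(...)` lets the model emit structured tool calls; the verifier metrics above (tool-call match, trigger similarity, schema accuracy) measure how faithfully the served model produces calls that conform to schemas like this one.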
    

    Availability

    MiniMax M2.7 is available on Lilac. Public pricing is listed in the website pricing table and on the MiniMax M2.7 API page.