We're partnering with MiniMax to bring M2.7 to Lilac
By Lucas Ewing
TL;DR
We are partnering with MiniMax to bring MiniMax M2.7 to Lilac.
The goal is simple: make it easy for teams to try and deploy M2.7 commercially without managing a large model deployment themselves.
| Model | Input price | Cache read price | Output price | Throughput | Accuracy |
|---|---|---|---|---|---|
| MiniMax M2.7 | $0.30 / M tokens | $0.055 / M tokens | $1.20 / M tokens | Sustained 60 tok/s/user at 160 concurrent requests | 99.8% verifier schema accuracy |
In the AIPerf stress run, Lilac reached a sustained 60 tok/s/user at 160-way concurrency with 100% request success at every tested concurrency level. In the MiniMax Provider Verifier run, Lilac returned 1,020/1,020 successful requests with 100% query success, 98.80% tool-call match rate, 99.88% tool-call trigger similarity, 99.80% tool-call schema accuracy, 0% error-only reasoning rate, and 100% language following.
Why MiniMax M2.7
MiniMax M2.7 is an open-weight model built for professional software engineering, long-horizon work, and tool-heavy agents. The official model card highlights coding, agent teams, complex tool use, and professional work benchmarks as core strengths.
Those are exactly the workloads Lilac customers care about. Coding agents and internal automation systems often need a mix of long context, reliable tool calling, high throughput, and cost discipline. M2.7 is a strong fit for that shape of work.
Why we are excited about the partnership
There are two hard parts to using a model like M2.7 in production.
The first is operational. Serving a large model well means choosing the right hardware, keeping replicas warm, tuning inference settings, measuring real latency, and handling bursty traffic without forcing every team to rent dedicated capacity.
The second is commercial. MiniMax M2.7's public weights are available for broad non-commercial use, while commercial serving requires authorization from MiniMax. That is reasonable for a model company investing heavily in the weights, training recipe, evaluations, and release process. A direct partnership gives customers a clear path to use the model commercially through Lilac.
Lilac sits at the intersection of those two problems. We handle the hosted endpoint, route traffic onto GPU capacity, and give developers the same Lilac API surface they already use.
Benchmark results
We measured the launch endpoint with streaming chat requests using approximately 60K input tokens and 500-600 output tokens per request.
| Metric | Result | Notes |
|---|---|---|
| Aggregate output throughput | 15,544.8 output tok/s | 160 concurrent streaming requests across launch capacity |
| Per-user throughput | Sustained 60 tok/s/user | 160 concurrent streaming requests |
| TTFT | 1.2s P50 / 3.3s P90 | 160 concurrent streaming requests |
| Request success | 100.0% | All tested concurrency stages from 2 to 160 |
| Verifier requests | 1,020/1,020 successful | MiniMax Provider Verifier, 102 cases x 10 rounds |
| Verifier tool-call match | 98.80% | Matched the MiniMax M2.7 reference line |
| Verifier schema accuracy | 99.80% | Tool-call argument schema validation |
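Per-user throughput and TTFT in the table above come from timing a streaming response client-side. A minimal sketch of that style of measurement, independent of any particular SDK (the fragment counter stands in for a real tokenizer, so the rate is approximate):

```python
import time

def measure_stream(deltas):
    """Time an iterable of streamed text fragments.

    Returns time-to-first-token (TTFT) and an approximate output rate,
    counting fragments rather than true tokens.
    """
    start = time.perf_counter()
    ttft = None
    pieces = []
    for piece in deltas:
        if ttft is None:
            # First fragment arrived: record time-to-first-token.
            ttft = time.perf_counter() - start
        pieces.append(piece)
    elapsed = time.perf_counter() - start
    return {
        "ttft_s": ttft,
        "fragments": len(pieces),
        "fragments_per_s": len(pieces) / elapsed if elapsed > 0 else 0.0,
        "text": "".join(pieces),
    }
```

Against an OpenAI-compatible endpoint, the fragments would come from a streaming call, e.g. `measure_stream(c.choices[0].delta.content or "" for c in client.chat.completions.create(..., stream=True))`.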
Provider Verifier comparison
MiniMax publishes reference results for M2.7 in the MiniMax Provider Verifier. Their May 2026 MiniMax-M2.7 reference line is computed across 10 runs; Lilac's endpoint was tested with the same 10-round shape, with 102 cases per round.
| Metric | Lilac endpoint | Official baseline | Difference |
|---|---|---|---|
| Query-Success-Rate | 100.00% | 100.00% | 0 |
| ToolCalls-Match-Rate | 98.80% | 98.80% | 0 |
| ToolCalls-Trigger-Similarity | 99.88% | — | — |
| ToolCalls-Schema-Accuracy | 99.80% | 99.76% | +0.04 pts |
| Error-Only-Reasoning-Rate | 0.00% | 0.00% | 0 |
| Language-Following | 100.00% | 75.00% | +25.00 pts |
API example
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.getlilac.com/v1",
    api_key="lilac_sk_...",
)

response = client.chat.completions.create(
    model="minimaxai/minimax-m2.7",
    messages=[
        {"role": "user", "content": "Review this pull request for production risks."},
    ],
)
```
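Given the verifier's emphasis on tool calling, the same endpoint accepts OpenAI-style tool definitions. A sketch of a tool-call request payload; the `get_ci_status` tool and its schema are hypothetical, invented purely for illustration:

```python
# Hypothetical tool definition in the OpenAI function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_ci_status",  # illustrative only, not a real Lilac tool
        "description": "Fetch the CI status for a pull request.",
        "parameters": {
            "type": "object",
            "properties": {"pr_number": {"type": "integer"}},
            "required": ["pr_number"],
        },
    },
}]

request = {
    "model": "minimaxai/minimax-m2.7",
    "messages": [{"role": "user", "content": "Is CI green on PR 412?"}],
    "tools": tools,
}

# With the SDK client from the example above:
# response = client.chat.completions.create(**request)
# tool_calls = response.choices[0].message.tool_calls
```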
Availability
MiniMax M2.7 is available on Lilac. The website pricing table and MiniMax M2.7 API page show the public pricing.