We're partnering with MiniMax to bring M2.7 to Lilac
By Lucas Ewing
TL;DR
We are partnering with MiniMax to bring MiniMax M2.7 to Lilac.
The goal is simple: make it easy for teams to try and deploy M2.7 commercially without managing a large model deployment themselves.
| Model | Input price | Cache read price | Output price | Throughput | Accuracy |
|---|---|---|---|---|---|
| MiniMax M2.7 | $0.30 / M tokens | $0.055 / M tokens | $1.20 / M tokens | Sustained 60 tok/s/user at 160 concurrent requests | 99.8% verifier schema accuracy |
In the AIPerf stress run, Lilac reached a sustained 60 tok/s/user at 160-way concurrency with 100% request success at every tested concurrency level. In the MiniMax Provider Verifier run, Lilac returned 1,020/1,020 successful requests with 100% query success, 98.80% tool-call match rate, 99.88% tool-call trigger similarity, 99.80% tool-call schema accuracy, 0% error-only reasoning rate, and 100% language following.
Why MiniMax M2.7
MiniMax M2.7 is an open-weight model built for professional software engineering, long-horizon work, and tool-heavy agents. The official model card highlights coding, agent teams, complex tool use, and professional work benchmarks as core strengths.
Those are exactly the workloads Lilac customers care about. Coding agents and internal automation systems often need a mix of long context, reliable tool calling, high throughput, and cost discipline. M2.7 is a strong fit for that shape of work.
Why we are excited about the partnership
There are two hard parts to using a model like M2.7 in production.
The first is operational. Serving a large model well means choosing the right hardware, keeping replicas warm, tuning inference settings, measuring real latency, and handling bursty traffic without forcing every team to rent dedicated capacity.
The second is commercial. MiniMax M2.7's public weights are available for broad non-commercial use, while commercial serving requires authorization from MiniMax. That is reasonable for a model company investing heavily in the weights, training recipe, evaluations, and release process. A direct partnership gives customers a clear path to use the model commercially through Lilac.
Lilac sits at the intersection of those two problems. We handle the hosted endpoint, route traffic onto GPU capacity, and give developers the same Lilac API surface they already use.
Benchmark results
We measured the launch endpoint with streaming chat requests using approximately 60K input tokens and 500-600 output tokens per request.
| Metric | Result | Notes |
|---|---|---|
| Aggregate output throughput | 15,544.8 output tok/s | 160 concurrent streaming requests across launch capacity |
| Per-user throughput | Sustained 60 tok/s/user | 160 concurrent streaming requests |
| TTFT | 1.2s P50 / 3.3s P90 | 160 concurrent streaming requests |
| Request success | 100.0% | All tested concurrency stages from 2 to 160 |
| Verifier requests | 1,020/1,020 successful | MiniMax Provider Verifier, 102 cases x 10 rounds |
| Verifier tool-call match | 98.80% | Matched the MiniMax M2.7 reference line |
| Verifier schema accuracy | 99.80% | Tool-call argument schema validation |
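Per-user throughput and TTFT in the table above come from timing a streaming response client-side. A minimal sketch of that style of measurement, independent of any particular SDK (the fragment counter stands in for a real tokenizer, so the rate is approximate):

```python
import time

def measure_stream(deltas):
    """Time an iterable of streamed text fragments.

    Returns time-to-first-token (TTFT) and an approximate output rate,
    counting fragments rather than true tokens.
    """
    start = time.perf_counter()
    ttft = None
    pieces = []
    for piece in deltas:
        if ttft is None:
            # First fragment arrived: record time-to-first-token.
            ttft = time.perf_counter() - start
        pieces.append(piece)
    elapsed = time.perf_counter() - start
    return {
        "ttft_s": ttft,
        "fragments": len(pieces),
        "fragments_per_s": len(pieces) / elapsed if elapsed > 0 else 0.0,
        "text": "".join(pieces),
    }
```

Against an OpenAI-compatible endpoint, the fragments would come from a streaming call, e.g. `measure_stream(c.choices[0].delta.content or "" for c in client.chat.completions.create(..., stream=True))`.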
Provider Verifier comparison
MiniMax publishes reference results for M2.7 in the MiniMax Provider Verifier. Their May 2026 MiniMax-M2.7 reference line is computed across 10 runs; Lilac's endpoint was tested with the same 10-round shape, with 102 cases per round.
| Metric | Lilac endpoint | Official baseline | Difference |
|---|---|---|---|
| Query-Success-Rate | 100.00% | 100.00% | 0 |
| ToolCalls-Match-Rate | 98.80% | 98.80% | 0 |
| ToolCalls-Trigger-Similarity | 99.88% | — | — |
| ToolCalls-Schema-Accuracy | 99.80% | 99.76% | +0.04 pts |
| Error-Only-Reasoning-Rate | 0.00% | 0.00% | 0 |
| Language-Following | 100.00% | 75.00% | +25.00 pts |
API example
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.getlilac.com/v1",
    api_key="lilac_sk_...",
)

response = client.chat.completions.create(
    model="minimaxai/minimax-m2.7",
    messages=[
        {"role": "user", "content": "Review this pull request for production risks."},
    ],
)
```
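Given the verifier's emphasis on tool calling, the same endpoint accepts OpenAI-style tool definitions. A sketch of a tool-call request payload; the `get_ci_status` tool and its schema are hypothetical, invented purely for illustration:

```python
# Hypothetical tool definition in the OpenAI function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_ci_status",  # illustrative only, not a real Lilac tool
        "description": "Fetch the CI status for a pull request.",
        "parameters": {
            "type": "object",
            "properties": {"pr_number": {"type": "integer"}},
            "required": ["pr_number"],
        },
    },
}]

request = {
    "model": "minimaxai/minimax-m2.7",
    "messages": [{"role": "user", "content": "Is CI green on PR 412?"}],
    "tools": tools,
}

# With the SDK client from the example above:
# response = client.chat.completions.create(**request)
# tool_calls = response.choices[0].message.tool_calls
```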
Availability
MiniMax M2.7 is available on Lilac. The website pricing table and MiniMax M2.7 API page show the public pricing.