Open inference, flat rate
Flat-rate API access to the fastest open-source models. Built for developers and AI agents.
Featured: Self-serve for devs & agents

Wafer Pass
Pay first, onboard after. Billed weekly or annually; cancel anytime. Annual billing saves 20%.

Starter
For solo devs and daily agents.
$10/wk, or $416/yr billed annually. 1,000 requests per 5-hour window.

Pro (Popular)
For power users and continuous agents.
$25/wk, or $1,040/yr billed annually. 5,000 requests per 5-hour window.

No signup required: checkout first, your account is set up right after.
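As a sanity check on the numbers above, each annual price is the weekly price over 52 weeks with the 20% annual-billing discount applied. A minimal sketch (the helper name is illustrative, not part of any Wafer API):

```python
# Annual price = weekly price x 52 weeks, with the 20% annual-billing discount.
def annual_price(weekly_usd: float, discount: float = 0.20) -> float:
    """Illustrative helper: yearly cost after the annual discount."""
    return weekly_usd * 52 * (1 - discount)

print(round(annual_price(10), 2))  # Starter: 416.0  -> $416/yr
print(round(annual_price(25), 2))  # Pro:    1040.0  -> $1,040/yr
```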
Backed by
Fifty Years
Y Combinator
Liquid 2
NVIDIA Inception
Jeff Dean, Chief Scientist at Google
Woj Zaremba, Co-Founder at OpenAI
Dan Fu, Head of Kernels at Together
Charlie Songhurst, Meta Board of Directors
Arash Ferdowsi, Co-Founder at Dropbox
Kawal Gandhi, Office of the CTO at Google
Performance
AI that optimizes AI
Wafer agents autonomously profile, diagnose, and optimize inference across the entire stack. This means we can run the fastest AI on the planet on any AI hardware.
2.8x faster than base SGLang
[Chart: output throughput, Qwen3.5-397B, input/output 1600/7000. Wafer: 408.4 tok/s vs. base SGLang: 144.8 tok/s (higher is better).]
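The 2.8x headline follows directly from the two throughput figures in the benchmark:

```python
# Speedup = Wafer throughput / base SGLang throughput, using the
# tok/s figures from the Qwen3.5-397B benchmark above.
wafer_tps = 408.4
sglang_tps = 144.8

speedup = wafer_tps / sglang_tps
print(f"{speedup:.1f}x")  # 2.8x
```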
Enterprise inference optimization
Custom agents that optimize kernels for your hardware — chip companies, cloud providers, and AI labs.