
Open inference, flat rate

Flat-rate API access to the fastest open-source models. Built for developers and AI agents.

Loved by
AMD
AWS
DigitalOcean
Featured
Self-serve for devs & agents

Wafer Pass

Pay first, onboard after. Weekly or annually, cancel anytime.

Weekly or yearly billing (save 20% with yearly)
Read docs

Starter

Solo devs, daily agents

$10/wk
$416/yr billed annually

1,000 requests per 5-hour window

Get Starter

Pro

Popular

Power users, continuous agents

$25/wk
$1,040/yr billed annually

5,000 requests per 5-hour window

Get Pro

No signup required: check out first, and your account is set up right after.
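
For agents that run continuously, it can help to translate the per-window limits above into a pacing budget. The sketch below is a client-side illustration only: `WindowBudget` and `average_interval` are hypothetical helpers, not part of any Wafer SDK, and it assumes the 5-hour window behaves like a sliding window, which this page does not specify.

```python
from collections import deque

WINDOW_SECONDS = 5 * 60 * 60  # the 5-hour window from the plan descriptions

# Requests per window, taken from the Starter and Pro plans above.
PLAN_LIMITS = {"starter": 1_000, "pro": 5_000}


def average_interval(plan: str) -> float:
    """Seconds between requests if the budget is spread evenly over the window."""
    return WINDOW_SECONDS / PLAN_LIMITS[plan]


class WindowBudget:
    """Client-side sliding-window counter (assumed semantics, hypothetical helper)."""

    def __init__(self, limit: int, window: float = WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        self.sent = deque()  # timestamps of requests still inside the window

    def try_acquire(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) >= self.limit:
            return False  # budget exhausted; caller should wait
        self.sent.append(now)
        return True
```

Spread evenly, Starter works out to one request every 18 seconds and Pro to one every 3.6 seconds; bursty agents can instead call `try_acquire(time.monotonic())` before each request and back off when it returns `False`.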

Backed by
Fifty Years
Y Combinator
Liquid 2
Jeff Dean, Chief Scientist at Google
Woj Zaremba, Co-Founder at OpenAI
Dan Fu, Head of Kernels at Together
Charlie Songhurst, Meta Board of Directors
Arash Ferdowsi, Co-Founder at Dropbox
Kawal Gandhi, Office of the CTO at Google
NVIDIA Inception
Performance

AI that optimizes AI

Wafer agents autonomously profile, diagnose, and optimize inference across the entire stack. This means we can run the fastest AI on the planet on any AI hardware.

2.8x faster than base SGLang
Output throughput · Qwen3.5-397B · Input/Output: 1600 / 7000
Wafer: 408.4 tok/s
Base SGLang: 144.8 tok/s
(higher is better)
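
The headline multiplier can be checked directly from the two throughput figures above (408.4 and 144.8 tok/s). This is plain arithmetic on the published numbers, not a benchmark harness; the variable names are illustrative.

```python
# Output throughput figures quoted above, in tokens per second.
wafer_tok_s = 408.4
base_sglang_tok_s = 144.8

# Speedup is the ratio of the two throughputs.
speedup = wafer_tok_s / base_sglang_tok_s

print(f"{speedup:.1f}x faster than base SGLang")
```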

Enterprise inference optimization

Custom agents that optimize kernels for your hardware, built for chip companies, cloud providers, and AI labs.

Book a demo