Open inference, flat rate
Flat-rate API access to the fastest open-source models. Built for developers and AI agents.
Featured: Self-serve for devs & agents

Wafer Pass
Pay first, onboard after. Billed weekly or annually; cancel anytime. Annual billing saves 20%.

Starter
For solo devs and daily agents.
$10/wk, or $416/yr billed annually. 1,000 requests per 5-hour window.

Pro (Popular)
For power users and continuous agents.
$25/wk, or $1,040/yr billed annually. 5,000 requests per 5-hour window.

No signup required: checkout first, your account is set up right after.
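As a sanity check on the numbers above, each annual price is the weekly price over 52 weeks with the 20% annual-billing discount applied. A minimal sketch (the helper name is illustrative, not part of any Wafer API):

```python
# Annual price = weekly price x 52 weeks, with the 20% annual-billing discount.
def annual_price(weekly_usd: float, discount: float = 0.20) -> float:
    """Illustrative helper: yearly cost after the annual discount."""
    return weekly_usd * 52 * (1 - discount)

print(round(annual_price(10), 2))  # Starter: 416.0  -> $416/yr
print(round(annual_price(25), 2))  # Pro:    1040.0  -> $1,040/yr
```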
Backed by
Fifty Years
Y Combinator
Liquid 2
NVIDIA Inception
Jeff Dean, Chief Scientist at Google
Woj Zaremba, Co-Founder at OpenAI
Dan Fu, Head of Kernels at Together
Charlie Songhurst, Meta Board of Directors
Arash Ferdowsi, Co-Founder at Dropbox
Kawal Gandhi, Office of the CTO at Google
Performance
AI that optimizes AI
Wafer agents autonomously profile, diagnose, and optimize inference across the entire stack. This means we can run the fastest AI on the planet on any AI hardware.
2.8x faster than base SGLang
[Chart: output throughput, Qwen3.5-397B, input/output 1600/7000. Wafer: 408.4 tok/s vs. base SGLang: 144.8 tok/s (higher is better).]
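The 2.8x headline follows directly from the two throughput figures in the benchmark:

```python
# Speedup = Wafer throughput / base SGLang throughput, using the
# tok/s figures from the Qwen3.5-397B benchmark above.
wafer_tps = 408.4
sglang_tps = 144.8

speedup = wafer_tps / sglang_tps
print(f"{speedup:.1f}x")  # 2.8x
```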
Enterprise inference optimization
Custom agents that optimize kernels for your hardware — chip companies, cloud providers, and AI labs.