Product · Inference gateway

BTL Runtime

One API in front of every model provider. Lower effective AI spend and lower latency — without rewriting your app.

For teams shipping across OpenAI, Anthropic, Bedrock, Vertex, OpenRouter, and the long tail. BTL Runtime is the drop-in gateway that keeps working when provider economics change underneath you.

change one line
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.badtheorylabs.com/v1",
    api_key=os.environ["BTL_KEY"],  # assumes the key is exported as BTL_KEY
)

# same call. same shape. less spend.
client.chat.completions.create(
    model="btl-frontier",
    messages=[{"role": "user",
               "content": "ship it"}],
)

Drop-in: OpenAI-compatible
Providers: Multi-vendor routing
Billing: 1 credit = $1 spend
Proof: Ledgered + metered

How it cuts spend

A token-efficiency layer,
not just a router.

Routing to a cheaper equivalent upstream is only half of it. The runtime also sends fewer billable tokens, reuses the ones it must send, and avoids doing the same work twice. You keep the model boundary you chose; we cut the waste before the request reaches it.

01
Provider-native prompt caching
Aggressive prefix reuse on the upstreams that support it, so the stable head of a prompt stops getting re-billed on every call.
02
Conversation compression
Stale history gets compressed before it leaves the gateway. The model keeps the thread; you stop paying to resend it verbatim.
03
Retrieval chunk dedupe
Repeated context — the same docs pasted into every turn — is collapsed before the request reaches the provider.
04
Output budget shaping
When a caller forgets a max-tokens cap, the runtime applies a sane one instead of letting a runaway completion bill you.
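
To make two of the steps above concrete, here is a minimal sketch of retrieval chunk dedupe and output budget shaping written as plain request transformations. It is an illustration under stated assumptions, not the runtime's implementation: the function names, the whitespace normalization, and the `DEFAULT_MAX_TOKENS` value are all hypothetical.

```python
import hashlib
from typing import Any, Dict, List

# Hypothetical default cap; the value the runtime applies is not documented here.
DEFAULT_MAX_TOKENS = 1024


def dedupe_chunks(chunks: List[str]) -> List[str]:
    """Drop repeated context chunks, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for chunk in chunks:
        # Collapse whitespace so trivially different copies hash the same.
        normalized = " ".join(chunk.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique


def apply_output_cap(request: Dict[str, Any], cap: int = DEFAULT_MAX_TOKENS) -> Dict[str, Any]:
    """Return a copy of the request with max_tokens set when the caller omitted it."""
    shaped = dict(request)
    if shaped.get("max_tokens") is None:
        shaped["max_tokens"] = cap
    return shaped
```

A caller-supplied `max_tokens` is left untouched, so the cap in this sketch only guards requests that arrive without one.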

What teams get

Switch the base URL,
keep the product.

Best fit for teams already shipping AI products and feeling real spend or latency pressure. No lock-in to a single vendor: ask for a specific provider when you need it, and let the gateway choose when you don't.

01
Drop-in compatibility
Keep the OpenAI-compatible surface your app already speaks. Switch the base URL — not the architecture.
02
Cheaper, faster execution
Routing to equivalent upstreams, combined with caching and dedupe, pushes spend down and speeds up repeated traffic.
03
Proof it's working
Request ledgering, usage analytics, and pricing visibility make the savings something you can actually see.
04
Self-serve keys & billing
Workspace auth, scoped API keys for inference and read-only usage, credits, and top-ups — no operator backdoor required.

Customer API surface

The routes that
actually matter.

Most traffic only ever touches two of these. The rest are for keys, usage, and the catalog. No /v1/admin/* or ops-only auth in the customer path.

POST  /v1/chat/completions   primary inference
POST  /v1/responses          responses API
GET   /v1/models             public model slugs
GET   /v1/providers          provider catalog
GET   /v1/account/pricing    what you pay now
POST  /v1/api-keys           scoped keys
GET   /v1/usage/summary      spend & savings
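
For the routes outside the OpenAI SDK's surface, a small standard-library helper is enough. The sketch below assumes Bearer-token auth (a guess based on the OpenAI-compatible surface) and JSON responses; neither is confirmed on this page, and `build_request` and `fetch_json` are hypothetical helper names. Paths are relative to the base URL, which already ends in `/v1`.

```python
import json
import urllib.request

# Base URL taken from the client snippet at the top of the page.
BASE_URL = "https://api.badtheorylabs.com/v1"


def build_request(path, api_key, body=None):
    """Build a GET request, or a JSON POST when a body is supplied."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    # Assumption: the gateway accepts a Bearer token, as OpenAI-compatible APIs usually do.
    headers = {"Authorization": f"Bearer {api_key}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    method = "POST" if data is not None else "GET"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def fetch_json(request):
    """Send the request and decode the response body as JSON (assumed format)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With a real key, `fetch_json(build_request("/usage/summary", key))` would call the spend and savings route; the shape of what comes back is not documented on this page.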

Stop paying for token waste.

Tell us your stack, providers, traffic, and constraints. We'll get you a key.