Product · Inference gateway

BTL Runtime

One API in front of every model provider. Lower effective AI spend and lower latency — without rewriting your app.

For teams shipping across OpenAI, Anthropic, Bedrock, Vertex, OpenRouter, and the long tail. BTL Runtime is the drop-in gateway that keeps working when provider economics change underneath you.

Request access → · Schedule a call

change one line

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.badtheorylabs.com/v1",
    api_key=os.environ["BTL_KEY"],  # your BTL Runtime key, read from the environment
)

# same call. same shape. less spend.
client.chat.completions.create(
    model="btl-frontier",
    messages=[{"role": "user",
               "content": "ship it"}],
)

Drop-in

OpenAI-compatible

Providers

Multi-vendor routing

Billing

1 credit = $1 spend

Proof

Ledgered + metered

How it cuts spend

A token-efficiency layer,
not just a router.

Routing to a cheaper equivalent upstream is only half of it. The runtime also sends fewer billable tokens, reuses the ones it must send, and avoids doing the same work twice. You keep the model boundary you chose; we cut the waste before the request reaches it.

Provider-native prompt caching

Aggressive prefix reuse on the upstreams that support it, so the stable head of a prompt stops getting re-billed on every call.
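
To make "the stable head of a prompt" concrete, here is a minimal sketch that counts how many leading messages two consecutive requests share. It illustrates the idea only and is not the runtime's implementation; the function name `stable_prefix_len` and the sample messages are hypothetical.

```python
from typing import Dict, List

Message = Dict[str, str]


def stable_prefix_len(previous: List[Message], current: List[Message]) -> int:
    """Count the leading messages that are identical in both requests."""
    count = 0
    for before, after in zip(previous, current):
        if before != after:
            break
        count += 1
    return count


# Hypothetical two-turn conversation: the second request resends the first.
system = {"role": "system", "content": "You are a support agent."}
turn_1 = [system, {"role": "user", "content": "Where is my order?"}]
turn_2 = turn_1 + [
    {"role": "assistant", "content": "Can you share the order number?"},
    {"role": "user", "content": "It is 1042."},
]

print(stable_prefix_len(turn_1, turn_2))  # → 2
```

The longer that shared head is, the more of each request a provider-side prompt cache can potentially reuse.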

Conversation compression

Stale history gets compressed before it leaves the gateway. The model keeps the thread; you stop paying to resend it verbatim.
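
One way to picture this is to keep the most recent turns verbatim and fold everything older into a single summary message. The sketch below assumes that approach; `compress_history`, `keep_recent`, and the truncating `naive_summary` stand-in are hypothetical names, and a real summarizer would be far better than clipping text.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def naive_summary(messages: List[Message]) -> str:
    """Stand-in summarizer (assumption): first 40 characters of each message."""
    return " | ".join(f"{m['role']}: {m['content'][:40]}" for m in messages)


def compress_history(
    messages: List[Message],
    keep_recent: int = 4,
    summarize: Callable[[List[Message]], str] = naive_summary,
) -> List[Message]:
    """Keep system messages and the last `keep_recent` turns; summarize the rest."""
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    if len(dialogue) <= keep_recent:
        return list(messages)  # nothing stale enough to compress
    cut = len(dialogue) - keep_recent
    stale, recent = dialogue[:cut], dialogue[cut:]
    note = {
        "role": "system",
        "content": "Summary of earlier turns: " + summarize(stale),
    }
    return system + [note] + recent
```

The thread stays intact from the model's point of view, but the older turns cost a short summary instead of their full text.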

Retrieval chunk dedupe

Repeated context — the same docs pasted into every turn — is collapsed before the request reaches the provider.
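
As a rough illustration, exact and near-exact repeats can be collapsed by hashing a normalized form of each chunk and keeping only the first occurrence. This is a sketch under that assumption, not the gateway's actual logic; `dedupe_chunks` and the whitespace and case normalization are choices made for the example.

```python
import hashlib
from typing import List


def dedupe_chunks(chunks: List[str]) -> List[str]:
    """Drop chunks whose normalized text has already been seen, keeping order."""
    seen = set()
    unique = []
    for chunk in chunks:
        # Normalize whitespace and case so trivially different copies match.
        normalized = " ".join(chunk.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique


docs = [
    "Refund policy: 30 days.",
    "refund  policy: 30 days.",  # same text, different spacing and case
    "Shipping: 2 days.",
]
print(dedupe_chunks(docs))  # → ['Refund policy: 30 days.', 'Shipping: 2 days.']
```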

Output budget shaping

When a caller forgets a max-tokens cap, the runtime applies a sane one instead of letting a runaway completion bill you.
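
In request terms, budget shaping amounts to filling in an output cap only when the caller left it out. The sketch below assumes an OpenAI-style chat request body; the default of 1024 tokens and the `apply_output_budget` helper are hypothetical, since the page does not state the cap the runtime actually applies.

```python
from typing import Any, Dict

DEFAULT_MAX_TOKENS = 1024  # assumed value, for illustration only


def apply_output_budget(
    request: Dict[str, Any], default_cap: int = DEFAULT_MAX_TOKENS
) -> Dict[str, Any]:
    """Return a copy of the request with an output cap if none was set."""
    shaped = dict(request)
    has_cap = (
        shaped.get("max_tokens") is not None
        or shaped.get("max_completion_tokens") is not None
    )
    if not has_cap:
        shaped["max_tokens"] = default_cap
    return shaped


uncapped = {"model": "btl-frontier", "messages": [{"role": "user", "content": "ship it"}]}
capped = {**uncapped, "max_tokens": 64}

print(apply_output_budget(uncapped)["max_tokens"])  # → 1024
print(apply_output_budget(capped)["max_tokens"])    # → 64
```

A cap the caller set explicitly is left alone; only the missing case is filled in.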

What teams get

Switch the base URL,
keep the product.

Best fit for teams already shipping AI products and feeling real spend or latency pressure. You are not locked to a single vendor: ask for a specific provider when you need it, and let the gateway choose when you don't.

Request access →

Drop-in compatibility

Keep the OpenAI-compatible surface your app already speaks. Switch the base URL — not the architecture.

Cheaper, faster execution

Routing to equivalent upstreams, combined with caching and dedupe, pushes spend down and speeds up repeated traffic.

Proof it's working

Request ledgering, usage analytics, and pricing visibility make the savings something you can actually see.

Self-serve keys & billing

Workspace auth, scoped API keys for inference and read-only usage, credits, and top-ups — no operator backdoor required.

Customer API surface

The routes that
actually matter.

Most traffic only ever touches two of these. The rest are for keys, usage, and the catalog. No /v1/admin/* or ops-only auth in the customer path.

POST  /v1/chat/completions   primary inference
POST  /v1/responses          responses API
GET   /v1/models             public model slugs
GET   /v1/providers          provider catalog
GET   /v1/account/pricing    what you pay now
POST  /v1/api-keys           scoped keys
GET   /v1/usage/summary      spend & savings
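
For callers not using the OpenAI SDK, the routes above can be addressed with plain HTTP. The sketch below only builds the requests and does not send them. It assumes bearer-token auth in the `Authorization` header (the OpenAI convention) and JSON bodies; the `build_request` helper and the `sk-example` key are hypothetical, and response shapes are not shown because the page does not document them.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "https://api.badtheorylabs.com"


def build_request(
    method: str,
    path: str,
    api_key: str,
    body: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request to one of the customer routes."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {api_key}"}  # assumed auth scheme
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


usage = build_request("GET", "/v1/usage/summary", "sk-example")
chat = build_request(
    "POST",
    "/v1/chat/completions",
    "sk-example",
    {"model": "btl-frontier", "messages": [{"role": "user", "content": "ship it"}]},
)

print(usage.get_method(), usage.full_url)
print(chat.get_method(), chat.full_url)
# To actually send: urllib.request.urlopen(chat)
```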

Stop paying for token waste.

Tell us your stack, providers, traffic, and constraints. We'll get you a key.

Request access → · Schedule a call
