Open model · Code review

BTL-2 Coder 7B

A small open blade — local, inspectable, and built for code-review findings.

A LoRA adapter on the 7B Qwen2.5-Coder-7B-Instruct base model that returns structured security and correctness findings with file-level evidence and numeric confidence. Not a frontier clone. A downloadable model that ships with weights, evals, and receipts.

one finding, machine-readable
{
  "severity": "critical",
  "file": "src/users.ts",
  "line": 42,
  "title": "SQL injection through string-built query",
  "evidence": "User id is concatenated directly into the SQL string.",
  "recommendation": "Use a parameterized query.",
  "confidence": 0.96
}
Base model: Qwen2.5-Coder-7B-Instruct
Method: LoRA SFT · Unsloth · r64 / α128
Training mix: 4k API traces + 1k templates
Runtime: TypeScript agent · OpenAI-compatible

What it catches

Narrow on purpose.
Tuned for findings.

It is not a general chat model wearing a security hat. The adapter is trained to emit one thing well: a JSON array of findings, each with a severity, the file and line, concrete evidence, a recommendation, and a confidence number you can threshold on.

SQL injection
Path traversal
Authorization bypass
Missing error handling
Boundary / off-by-one logic
Related security & correctness issues
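The output contract described above can be checked before anything downstream consumes it. What follows is a minimal sketch, not the project's own validator: the field names are taken from the example finding on this page, while the type rules, the 0 to 1 confidence range, and the helper names `is_valid_finding` and `parse_findings` are assumptions made for illustration.

```python
import json

# Field names are taken from the example finding on this page. The type rules
# and the 0..1 confidence range are assumptions, not the published schema.
ASSUMED_FIELD_TYPES = {
    "severity": str,
    "file": str,
    "line": int,
    "title": str,
    "evidence": str,
    "recommendation": str,
    "confidence": (int, float),
}


def is_valid_finding(item) -> bool:
    """Return True if item has every assumed field with the assumed type."""
    if not isinstance(item, dict):
        return False
    for name, kind in ASSUMED_FIELD_TYPES.items():
        value = item.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, kind):
            return False
    return 0.0 <= item["confidence"] <= 1.0


def parse_findings(raw: str) -> list:
    """Parse model output as a JSON array and keep only well-formed findings."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of findings")
    return [item for item in data if is_valid_finding(item)]
```

Dropping malformed items rather than failing the whole batch is a design choice; a stricter caller might prefer to raise on the first invalid finding.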

Receipts

Evals, not vibes.

Measured on an NVIDIA H200 with 4-bit adapter inference under strict schema prompting. Strict schema validity climbs from 0.43 without the prompt contract to 0.95 with it, and the runtime ships that contract by default.

Eval                  JSON parse  Schema valid  Category hit  File hit
Heldout 100 strict    1.00        0.95          0.78          0.84
Heldout 30 strict v2  1.00        0.98          0.87          0.87
Seeded 15 strict      1.00        1.00          0.93          1.00
Seeded 15 strict v2   1.00        1.00

The seeded controlled benchmark reaches 0.933 precision, 0.933 recall, and 0.956 weighted severity recall. The full breakdown is in reports/EVAL_SUMMARY.md.
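For readers who want to see how numbers of this kind are typically derived, here is a rough sketch of precision, recall, and a severity-weighted recall over a seeded set. It is not the project's eval script: matching on (file, category), the `Label` type, and the weights in `ASSUMED_SEVERITY_WEIGHTS` are all assumptions, and the published figures may be computed differently.

```python
from dataclasses import dataclass

# Hypothetical weights: the page does not say how severity is weighted.
ASSUMED_SEVERITY_WEIGHTS = {"critical": 3.0, "high": 2.0, "medium": 1.0, "low": 0.5}


@dataclass(frozen=True)
class Label:
    file: str
    category: str
    severity: str


def score(expected, predicted):
    """Precision, recall, and weighted recall over (file, category) matches.

    Assumes each seeded bug has a unique (file, category) pair.
    """
    expected_keys = {(e.file, e.category) for e in expected}
    predicted_keys = {(p.file, p.category) for p in predicted}
    hits = expected_keys & predicted_keys

    precision = len(hits) / len(predicted_keys) if predicted_keys else 0.0
    recall = len(hits) / len(expected_keys) if expected_keys else 0.0

    # Weighted recall: a missed critical bug costs more than a missed low one.
    total = sum(ASSUMED_SEVERITY_WEIGHTS.get(e.severity, 1.0) for e in expected)
    found = sum(
        ASSUMED_SEVERITY_WEIGHTS.get(e.severity, 1.0)
        for e in expected
        if (e.file, e.category) in hits
    )
    weighted_recall = found / total if total else 0.0

    return {"precision": precision, "recall": recall, "weighted_recall": weighted_recall}
```

Under this scheme, weighted recall sits above plain recall whenever the misses are concentrated in lower-severity bugs.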

Not a Mythos clone

If Mythos is the locked frontier giant,
this is the small open one.

We are not claiming frontier-class capability or piggybacking on a brand. The claim is narrower and stronger: a model you can actually download, inspect, and run on your own hardware — with the training recipe and evals in the open.

01
Local
Runs on a laptop-class GPU via Ollama or llama.cpp. Your code never leaves the machine.
02
Inspectable
Open weights, open adapter config, open eval scripts. Nothing about the result is a black box.
03
Receipts
Published numbers on heldout and seeded benchmarks — format adherence, file hits, precision and recall.
04
Structured
Machine-readable findings with numeric confidence, so you can gate a CI step on severity and score.
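Gating a CI step on severity and score could look roughly like the sketch below. The severity ordering in `ASSUMED_SEVERITY_RANK`, the default thresholds, and the function names are illustrative assumptions; only "critical" appears as a severity value on this page.

```python
# Hypothetical ordering: the page only shows "critical" as a severity value.
ASSUMED_SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def blocking_findings(findings, min_severity="high", min_confidence=0.8):
    """Return findings at or above both the severity and confidence floors."""
    floor = ASSUMED_SEVERITY_RANK[min_severity]
    return [
        f
        for f in findings
        # Unknown severities rank below every floor and never block.
        if ASSUMED_SEVERITY_RANK.get(f["severity"], -1) >= floor
        and f["confidence"] >= min_confidence
    ]


def exit_code(findings, min_severity="high", min_confidence=0.8) -> int:
    """Return 1 (fail the CI step) if any finding blocks, else 0."""
    return 1 if blocking_findings(findings, min_severity, min_confidence) else 0
```

A CI job could pass the parsed findings to `exit_code` and hand the result to `sys.exit`, so a high-confidence critical finding fails the build while low-confidence noise does not.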

Quickstart

Pull it. Run it.

Load the adapter
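from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "unsloth/Qwen2.5-Coder-7B-Instruct"  # base model checkpoint
adapter = "badtheorylabs/btl-2-coder"       # LoRA adapter repo

# The tokenizer is read from the adapter repo. The base weights load first,
# then the LoRA adapter is attached on top of them.
tok = AutoTokenizer.from_pretrained(adapter)
m = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
m = PeftModel.from_pretrained(m, adapter)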
Or local, OpenAI-compatible
ollama pull qwen2.5-coder:7b

curl http://localhost:8787/v1/agents/code/runs \
  -H 'content-type: application/json' \
  -d '{
    "task": "Review for security bugs",
    "workspaceRoot": "/path/to/repo",
    "mode": "review"
  }'
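The same request can be issued from Python with the standard library. This is a sketch only: the endpoint and request body mirror the curl call above, but the assumption that the runtime replies with a JSON body, and the helper names `build_run_request` and `run_review`, are not confirmed by this page.

```python
import json
import urllib.request

# Endpoint and body fields are copied from the curl example above.
ENDPOINT = "http://localhost:8787/v1/agents/code/runs"


def build_run_request(workspace_root, task="Review for security bugs", endpoint=ENDPOINT):
    """Build the POST request without sending it."""
    body = {"task": task, "workspaceRoot": workspace_root, "mode": "review"}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def run_review(workspace_root):
    """Send the request. Assumes the runtime replies with a JSON body."""
    with urllib.request.urlopen(build_run_request(workspace_root)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before anything touches the network.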

Weights, evals, and receipts.

All of it is downloadable. None of it is a black box.