Open model · Code review

BTL-2 Coder 7B

A small open blade — local, inspectable, and built for code-review findings.

A 7B LoRA adapter on Qwen2.5-Coder that returns structured security and correctness findings with file-level evidence and numeric confidence. Not a frontier clone. A downloadable model that ships with weights, evals, and receipts.

Download weights →View on GitHub

one finding, machine-readable

{
  "severity": "critical",
  "file": "src/users.ts",
  "line": 42,
  "title": "SQL injection through
            string-built query",
  "evidence": "User id is concatenated
               directly into the SQL string.",
  "recommendation": "Use a parameterized
                     query.",
  "confidence": 0.96
}

Base model

Qwen2.5-Coder-7B-Instruct

Method

LoRA SFT · Unsloth · r64 / α128

Training mix

4k API traces + 1k templates

Runtime

TypeScript agent · OpenAI-compatible

What it catches

Narrow on purpose.
Tuned for findings.

It is not a general chat model wearing a security hat. The adapter is trained to emit one thing well: a JSON array of findings, each with a severity, the file and line, concrete evidence, a recommendation, and a confidence number you can threshold on.

SQL injection

Path traversal

Authorization bypass

Missing error handling

Boundary / off-by-one logic

Related security & correctness issues

Receipts

Evals, not vibes.

Measured on an NVIDIA H200 with 4-bit adapter inference under strict schema prompting. Strict schema validity climbs from 0.43 to 0.95 once the prompt contract is included — the runtime ships that contract by default.

Eval	JSON parse	Schema valid	Category hit	File hit
Heldout 100 strict	1.00	0.95	0.78	0.84
Heldout 30 strict v2	1.00	0.98	0.87	0.87
Seeded 15 strict	1.00	1.00	0.93	1.00
Seeded 15 strict v2	1.00	1.00	—	—

Seeded controlled benchmark reaches 0.933 precision / 0.933 recall and 0.956 weighted severity recall. Full breakdown in reports/EVAL_SUMMARY.md.

Not a Mythos clone

If Mythos is the locked frontier giant,
this is the small open one.

We are not claiming frontier-class capability or piggybacking on a brand. The claim is narrower and stronger: a model you can actually download, inspect, and run on your own hardware — with the training recipe and evals in the open.

Local

Runs on a laptop-class GPU via Ollama or llama.cpp. Your code never leaves the machine.

Inspectable

Open weights, open adapter config, open eval scripts. Nothing about the result is a black box.

Receipts

Published numbers on heldout and seeded benchmarks — format adherence, file hits, precision and recall.

Structured

Machine-readable findings with numeric confidence, so you can gate a CI step on severity and score.

Quickstart

Pull it. Run it.

Load the adapter

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "unsloth/Qwen2.5-Coder-7B-Instruct"
adapter = "badtheorylabs/btl-2-coder"

tok = AutoTokenizer.from_pretrained(adapter)
m = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
m = PeftModel.from_pretrained(m, adapter)

Or local, OpenAI-compatible

ollama pull qwen2.5-coder:7b

curl http://localhost:8787/v1/agents/code/runs \
  -H 'content-type: application/json' \
  -d '{
    "task": "Review for security bugs",
    "workspaceRoot": "/path/to/repo",
    "mode": "review"
  }'

Weights, evals, and receipts.

All of it is downloadable. None of it is a black box.

Download on Hugging Face →Talk to us