https://codeaxiom.avixosec.xyz | contact@avixosec.xyz
CodeAxiom builds AxiomCode, a coding model that turns programming requests into working, verified code changes.
The first target is AxiomCode-3B, based on Qwen2.5-Coder-3B. The model is not trained yet. This repository contains the runner, benchmark shape, training data format, Modal path, and report templates for the first measured before-and-after experiment.
Qwen2.5-Coder-3B base
↓ same benchmark
base result

verified coding-task data
↓ LoRA training on Modal
AxiomCode-3B

AxiomCode-3B
↓ same benchmark
trained result

compare report
CodeAxiom does not publish benchmark claims before runs exist. Current public numbers stay marked as planned or TBD.
| Model | Tasks | Passed | Pass rate |
|---|---|---|---|
| Qwen2.5-Coder-3B base | 20 | TBD | TBD |
| AxiomCode-3B | 20 | TBD | TBD |
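The pass rate column in the table above is passed tasks divided by total tasks, and the comparison is the difference between the two runs on the same suite. A minimal sketch of that arithmetic follows. The `SuiteResult` class and `compare` function are hypothetical names for illustration, not the repository's actual schema or the behavior of `compare_runs.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteResult:
    """Hypothetical summary of one benchmark run (assumed shape)."""
    model: str
    tasks: int
    passed: int

    @property
    def pass_rate(self) -> float:
        # Guard against an empty suite instead of dividing by zero.
        return self.passed / self.tasks if self.tasks else 0.0


def compare(base: SuiteResult, trained: SuiteResult) -> float:
    """Return the pass-rate difference (trained minus base) on the same suite."""
    if base.tasks != trained.tasks:
        raise ValueError("both runs must cover the same number of tasks")
    return trained.pass_rate - base.pass_rate
```

Refusing to compare runs of different sizes is one way to enforce the "same benchmark" rule before any delta is reported.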
- local agent runner
- YAML task format
- noop backend
- prepared-patch backend
- OpenAI-compatible model backend
- toy verifier demo
- AxiomRepair-20 suite scaffold
- AxiomTask-20 broad programming task suite scaffold
- benchmark runner and compare script
- AxiomTrace SFT export script
- LoRA config for AxiomCode-3B
- Modal training and serving entrypoints
- model card template
- public site
- CI for demo tasks
The runner is local by default. It can use a generic OpenAI-compatible endpoint for Modal, Ollama, or a local proxy when configured.
A generated answer is not enough. CodeAxiom records whether code applies, compiles, runs, and passes tests.
The first training data format keeps:
- task description
- relevant files
- baseline and verification output
- returned unified diff
- verification result
- split metadata
Only verified traces should become training data. Eval holdout traces must not be used for training.
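The fields and the two rules above can be sketched as a record plus a filter. This is an illustration under assumptions: the `Trace` class, its field names, and the `"eval_holdout"` split label are hypothetical and may not match what `export_sft.py` actually reads or writes.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """Hypothetical trace record mirroring the fields listed above."""
    task_description: str
    relevant_files: dict[str, str]  # path -> file contents
    baseline_output: str            # baseline and verification output
    diff: str                       # returned unified diff
    verified: bool                  # verification result
    split: str                      # split metadata; labels are assumed


def select_training_traces(traces: list[Trace]) -> list[Trace]:
    """Keep only verified traces that are outside the eval holdout split."""
    return [t for t in traces if t.verified and t.split != "eval_holdout"]
```

Applying both conditions in one place keeps an unverified or held-out trace from reaching the training file by accident.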
Install dependencies:
python -m pip install -r requirements.txt
Run the task that should stay failed:
python agent/run_task.py examples/tasks/toy_noop.yaml
Run the task that should pass after a prepared patch:
python agent/run_task.py examples/tasks/toy_fix.yaml
On Windows:
py -m pip install -r requirements.txt
py .\agent\run_task.py .\examples\tasks\toy_noop.yaml
py .\agent\run_task.py .\examples\tasks\toy_fix.yaml
Expected result:
toy_noop -> failed
toy_fix -> passed
Each run writes:
artifacts/runs/<run_id>.json
artifacts/logs/<run_id>.log
artifacts/patches/<run_id>.patch
artifacts/reports/<run_id>.md
artifacts/plans/<run_id>.md
artifacts/traces/<run_id>.jsonl
artifacts/tmp/<run_id>/
Generated artifacts are ignored by Git.
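For scripts that need to locate a run's outputs, the layout above can be derived from the run ID. A small sketch, assuming the default `artifacts` root; the `artifact_paths` helper is a hypothetical name and is not part of the runner.

```python
from pathlib import Path


def artifact_paths(run_id: str, root: Path = Path("artifacts")) -> dict[str, Path]:
    """Map each artifact kind to its path, following the layout listed above."""
    return {
        "run": root / "runs" / f"{run_id}.json",
        "log": root / "logs" / f"{run_id}.log",
        "patch": root / "patches" / f"{run_id}.patch",
        "report": root / "reports" / f"{run_id}.md",
        "plan": root / "plans" / f"{run_id}.md",
        "trace": root / "traces" / f"{run_id}.jsonl",
        "tmp": root / "tmp" / run_id,  # per-run scratch directory
    }
```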
A task can use the openai_compatible backend with these fields:
backend:
  name: openai_compatible
  base_url: http://127.0.0.1:11434/v1
  api_key_env: OPENAI_API_KEY
  model: qwen2.5-coder:3b
  temperature: 0
  max_tokens: 2048
The backend sends the task, relevant files, and baseline or verification output to the endpoint. It expects a unified diff only, applies it with Git, and then the runner executes the verification command. For model backends, the runner can retry with verification output when an attempt fails.
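The retry behavior described above can be sketched as a loop that feeds the failed verification output back into the next prompt. This is a simplified illustration, not the runner's code: `run_with_retries`, the two callables, the prompt wording, and the default of three attempts are all assumptions.

```python
from typing import Callable, Optional


def run_with_retries(
    generate_diff: Callable[[str], str],
    apply_and_verify: Callable[[str], tuple[bool, str]],
    task_prompt: str,
    max_attempts: int = 3,  # assumed default, not taken from the runner
) -> Optional[str]:
    """Return the first diff that passes verification, or None if none does."""
    feedback = ""
    for _ in range(max_attempts):
        prompt = task_prompt
        if feedback:
            # On a retry, include the previous attempt's verification output.
            prompt += f"\n\nPrevious verification output:\n{feedback}"
        diff = generate_diff(prompt)
        passed, output = apply_and_verify(diff)
        if passed:
            return diff
        feedback = output
    return None
```

Here `generate_diff` stands in for the call to the OpenAI-compatible endpoint and `apply_and_verify` for applying the diff with Git and running the verification command. Passing them as callables keeps the loop testable without a model or a repository.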
Run a benchmark suite:
python eval/scripts/run_benchmark.py eval/suites/axiomrepair_20.yaml --artifacts artifacts
python eval/scripts/run_benchmark.py eval/suites/axiomtask_20.yaml --artifacts artifacts
Compare two benchmark outputs:
python eval/scripts/compare_runs.py artifacts/reports/base.json artifacts/reports/axiomcode.json
Export verified positive traces:
python training/scripts/export_sft.py --results artifacts/runs --output training/data/train.jsonl
The first training phase uses public, licensed, and project-owned verified coding-task data. Private user code is not training data by default.
Benchmark data must stay separate from training data. Public scores need a contamination note.
See docs/data-policy.md and docs/contamination-policy.md.
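One simple check that supports the separation rule is to look for training task descriptions that also appear in a benchmark suite. The sketch below is an assumption about how such a check could look, not the procedure in `docs/contamination-policy.md`. It only catches exact matches after case and whitespace normalization, so paraphrased duplicates would need a stronger method.

```python
import hashlib
from typing import Iterable


def fingerprint(text: str) -> str:
    """Hash a task description after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_overlap(train_texts: Iterable[str], eval_texts: Iterable[str]) -> set[str]:
    """Return training texts whose fingerprint also appears in the eval set."""
    eval_prints = {fingerprint(t) for t in eval_texts}
    return {t for t in train_texts if fingerprint(t) in eval_prints}
```

An empty result from `find_overlap` is necessary but not sufficient evidence of a clean split, which is why a written contamination note is still needed alongside public scores.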
| Path | Contents |
|---|---|
| ONE_PAGER.md | short project summary |
| docs/model-plan.md | AxiomCode naming and target |
| docs/benchmark-protocol.md | AxiomRepair and reporting protocol |
| docs/modal-runbook.md | Modal compute path |
| docs/first-training-runbook.md | first measured run steps |
| docs/training-roadmap.md | training stages |
| docs/architecture.md | system shape |
| docs/benchmark-plan.md | evaluation plan |
| docs/compute-plan.md | compute policy |
| docs/funding-brief.md | partner brief |
| eval/suites/axiomrepair_20.yaml | first suite scaffold |
| eval/reports/base_vs_axiomcode_template.md | report template |
| training/configs/axiomcode_3b_lora.yaml | LoRA config |
| models/cards/AxiomCode-3B.md | model card draft |
| agent/README.md | runner notes |
| examples/tasks/ | demo task files |
| site/ | static public site |
CodeAxiom is not a finished model. AxiomCode-3B is the first target checkpoint.
No benchmark score is claimed before evaluation.
MIT