CodeAxiom

https://codeaxiom.avixosec.xyz | contact@avixosec.xyz

CodeAxiom builds AxiomCode, a coding model that turns programming requests into working, verified code changes.

The first target is AxiomCode-3B, based on Qwen2.5-Coder-3B. The model has not been trained yet. This repository contains the runner, benchmark scaffold, training data format, Modal training path, and report templates for the first measured before-and-after experiment.

Core experiment

Qwen2.5-Coder-3B base
        ↓ same benchmark
base result

verified coding-task data
        ↓ LoRA training on Modal
AxiomCode-3B

AxiomCode-3B
        ↓ same benchmark
trained result

compare report

CodeAxiom does not publish benchmark claims before runs exist. Current public numbers stay marked as planned or TBD.

Model                  Tasks  Passed  Pass rate
Qwen2.5-Coder-3B base  20     TBD     TBD
AxiomCode-3B           20     TBD     TBD

What exists now

local agent runner
YAML task format
noop backend
prepared-patch backend
OpenAI-compatible model backend
toy verifier demo
AxiomRepair-20 suite scaffold
AxiomTask-20 broad programming task suite scaffold
benchmark runner and compare script
AxiomTrace SFT export script
LoRA config for AxiomCode-3B
Modal training and serving entrypoints
model card template
public site
CI for demo tasks

The runner is local by default. It can use a generic OpenAI-compatible endpoint for Modal, Ollama, or a local proxy when configured.

Why executable verification

A generated answer is not enough. CodeAxiom records whether code applies, compiles, runs, and passes tests.

The first training data format keeps:

task description
relevant files
baseline and verification output
returned unified diff
verification result
split metadata

Only verified traces should become training data. Eval holdout traces must not be used for training.
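
The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the actual export script: the `Trace` class, its field names, and the split labels "train" and "eval_holdout" are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    # Hypothetical field names mirroring the list above; the real
    # trace schema in this repository may differ.
    task_description: str
    relevant_files: list[str]
    baseline_output: str
    verification_output: str
    diff: str
    verified: bool
    split: str  # assumed labels: "train" or "eval_holdout"


def select_training_traces(traces: list[Trace]) -> list[Trace]:
    """Keep only traces that passed verification and are not held out for eval."""
    return [t for t in traces if t.verified and t.split == "train"]
```

A failed trace or a trace tagged as eval holdout is dropped even if every other field is complete.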

Local verifier demo

Install dependencies:

python -m pip install -r requirements.txt

Run the task that should remain failing:

python agent/run_task.py examples/tasks/toy_noop.yaml

Run the task that should pass after a prepared patch:

python agent/run_task.py examples/tasks/toy_fix.yaml

On Windows:

py -m pip install -r requirements.txt
py .\agent\run_task.py .\examples\tasks\toy_noop.yaml
py .\agent\run_task.py .\examples\tasks\toy_fix.yaml

Expected result:

toy_noop -> failed
toy_fix -> passed

Each run writes:

artifacts/runs/<run_id>.json
artifacts/logs/<run_id>.log
artifacts/patches/<run_id>.patch
artifacts/reports/<run_id>.md
artifacts/plans/<run_id>.md
artifacts/traces/<run_id>.jsonl
artifacts/tmp/<run_id>/

Generated artifacts are ignored by Git.
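
For scripts that post-process a run, the layout above can be resolved from a run ID. The helper below is a hypothetical sketch; `artifact_paths` and `ARTIFACT_LAYOUT` are not names from this repository, and only the directory and file patterns are taken from the list above.

```python
from pathlib import Path

# Subdirectory -> file name pattern, copied from the artifact list above.
# "tmp" is a per-run directory rather than a single file.
ARTIFACT_LAYOUT = {
    "runs": "{run_id}.json",
    "logs": "{run_id}.log",
    "patches": "{run_id}.patch",
    "reports": "{run_id}.md",
    "plans": "{run_id}.md",
    "traces": "{run_id}.jsonl",
    "tmp": "{run_id}",
}


def artifact_paths(root: str, run_id: str) -> dict[str, Path]:
    """Return the expected artifact location for each kind, without touching disk."""
    base = Path(root)
    return {
        kind: base / kind / pattern.format(run_id=run_id)
        for kind, pattern in ARTIFACT_LAYOUT.items()
    }
```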

Model backend

A task can use the openai_compatible backend with these fields:

backend:
  name: openai_compatible
  base_url: http://127.0.0.1:11434/v1
  api_key_env: OPENAI_API_KEY
  model: qwen2.5-coder:3b
  temperature: 0
  max_tokens: 2048

The backend sends the task, relevant files, and baseline or verification output to the endpoint. It expects the response to contain only a unified diff, which it applies with Git before the runner executes the verification command. For model backends, the runner can retry with the verification output as feedback when an attempt fails.
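
The request, apply, verify, and retry cycle described here can be sketched as follows. This is an illustration under stated assumptions, not the runner's code: `request_diff` and `apply_and_verify` are hypothetical callables standing in for the model call and the Git apply plus verification step, and the attempt limit and feedback message are made up for the example.

```python
from typing import Callable, Optional, Tuple


def looks_like_unified_diff(text: str) -> bool:
    """Cheap shape check: file headers and at least one hunk marker."""
    lines = text.splitlines()
    has_old = any(line.startswith("--- ") for line in lines)
    has_new = any(line.startswith("+++ ") for line in lines)
    has_hunk = any(line.startswith("@@ ") for line in lines)
    return has_old and has_new and has_hunk


def run_with_retries(
    request_diff: Callable[[Optional[str]], str],
    apply_and_verify: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 2,  # assumed default, not taken from the runner
) -> Tuple[bool, int]:
    """Ask for a diff, verify it, and feed failures back into the next attempt.

    Returns (passed, attempts_used).
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        diff = request_diff(feedback)
        if not looks_like_unified_diff(diff):
            feedback = "Response was not a unified diff."
            continue
        passed, output = apply_and_verify(diff)
        if passed:
            return True, attempt
        # The verification output becomes the context for the next attempt.
        feedback = output
    return False, max_attempts
```

Passing the two steps in as callables keeps the loop testable without a live endpoint or a Git checkout.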

Benchmark and training path

Run a benchmark suite:

python eval/scripts/run_benchmark.py eval/suites/axiomrepair_20.yaml --artifacts artifacts
python eval/scripts/run_benchmark.py eval/suites/axiomtask_20.yaml --artifacts artifacts

Compare two benchmark outputs:

python eval/scripts/compare_runs.py artifacts/reports/base.json artifacts/reports/axiomcode.json
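
As a rough idea of what a comparison computes, the sketch below derives pass rates and their difference from two report dictionaries. The report shape, with `tasks` and `passed` counts, is an assumption for illustration; the JSON written by run_benchmark.py may be structured differently.

```python
def pass_rate(report: dict) -> float:
    """Fraction of tasks passed; 0.0 for an empty suite."""
    tasks = report["tasks"]
    return report["passed"] / tasks if tasks else 0.0


def compare(base: dict, trained: dict) -> dict:
    """Compare two runs of the same suite.

    Raises ValueError if the task counts differ, since a before/after
    comparison only makes sense on the same benchmark.
    """
    if base["tasks"] != trained["tasks"]:
        raise ValueError("reports cover a different number of tasks")
    base_rate = pass_rate(base)
    trained_rate = pass_rate(trained)
    return {
        "base_pass_rate": base_rate,
        "trained_pass_rate": trained_rate,
        "delta": trained_rate - base_rate,
    }
```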

Export verified positive traces:

python training/scripts/export_sft.py --results artifacts/runs --output training/data/train.jsonl

Data policy

The first training phase uses public, licensed, and project-owned verified coding-task data. Private user code is not training data by default.

Benchmark data must stay separate from training data. Public scores need a contamination note.

See docs/data-policy.md and docs/contamination-policy.md.
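
One simple way to check the separation between benchmark and training data is to look for shared task IDs or matching task descriptions. The sketch below is a hypothetical check, not the project's contamination procedure: the `id` and `description` keys are assumed, and exact matching on normalized text will miss paraphrased duplicates.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash of the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_contamination(train_tasks: list[dict], eval_tasks: list[dict]) -> list[str]:
    """Return IDs of training tasks that share an ID or description with an eval task."""
    eval_ids = {t["id"] for t in eval_tasks}
    eval_prints = {fingerprint(t["description"]) for t in eval_tasks}
    return [
        t["id"]
        for t in train_tasks
        if t["id"] in eval_ids or fingerprint(t["description"]) in eval_prints
    ]
```

An empty result does not prove the sets are clean; it only rules out the two overlaps this check looks for.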

Files worth reading

ONE_PAGER.md                               short project summary
docs/model-plan.md                         AxiomCode naming and target
docs/benchmark-protocol.md                 AxiomRepair and reporting protocol
docs/modal-runbook.md                      Modal compute path
docs/first-training-runbook.md             first measured run steps
docs/training-roadmap.md                   training stages
docs/architecture.md                       system shape
docs/benchmark-plan.md                     evaluation plan
docs/compute-plan.md                       compute policy
docs/funding-brief.md                      partner brief
eval/suites/axiomrepair_20.yaml            first suite scaffold
eval/reports/base_vs_axiomcode_template.md report template
training/configs/axiomcode_3b_lora.yaml    LoRA config
models/cards/AxiomCode-3B.md               model card draft
agent/README.md                            runner notes
examples/tasks/                            demo task files
site/                                      static public site

Scope

CodeAxiom is not a finished model. AxiomCode-3B is the first target checkpoint.

No benchmark score is claimed before evaluation.

Contact

contact@avixosec.xyz

License

MIT
