Skip to content

AvixoSec/codeaxiom

Repository files navigation

CodeAxiom

https://codeaxiom.avixosec.xyz | contact@avixosec.xyz

CodeAxiom builds AxiomCode, a coding model that turns programming requests into working, verified code changes.

The first target is AxiomCode-3B, based on Qwen2.5-Coder-3B. The model is not trained yet. This repository contains the runner, benchmark shape, training data format, Modal path, and report templates for the first measured before and after experiment.

Core experiment

Qwen2.5-Coder-3B base
        ↓ same benchmark
base result

verified coding-task data
        ↓ LoRA training on Modal
AxiomCode-3B

AxiomCode-3B
        ↓ same benchmark
trained result

compare report

CodeAxiom does not publish benchmark claims before runs exist. Current public numbers stay marked as planned or TBD.

Model Tasks Passed Pass rate
Qwen2.5-Coder-3B base 20 TBD TBD
AxiomCode-3B 20 TBD TBD

What exists now

  • local agent runner
  • YAML task format
  • noop backend
  • prepared-patch backend
  • OpenAI-compatible model backend
  • toy verifier demo
  • AxiomRepair-20 suite scaffold
  • AxiomTask-20 broad programming task suite scaffold
  • benchmark runner and compare script
  • AxiomTrace SFT export script
  • LoRA config for AxiomCode-3B
  • Modal training and serving entrypoints
  • model card template
  • public site
  • CI for demo tasks

The runner is local by default. It can use a generic OpenAI-compatible endpoint for Modal, Ollama, or a local proxy when configured.

Why executable verification

A generated answer is not enough. CodeAxiom records whether code applies, compiles, runs, and passes tests.

The first training data format keeps:

  • task description
  • relevant files
  • baseline and verification output
  • returned unified diff
  • verification result
  • split metadata

Only verified traces should become training data. Eval holdout traces must not be used for training.

Local verifier demo

Install dependencies:

python -m pip install -r requirements.txt

Run the task that should stay failed:

python agent/run_task.py examples/tasks/toy_noop.yaml

Run the task that should pass after a prepared patch:

python agent/run_task.py examples/tasks/toy_fix.yaml

On Windows:

py -m pip install -r requirements.txt
py .\agent\run_task.py .\examples\tasks\toy_noop.yaml
py .\agent\run_task.py .\examples\tasks\toy_fix.yaml

Expected result:

toy_noop -> failed
toy_fix -> passed

Each run writes:

artifacts/runs/<run_id>.json
artifacts/logs/<run_id>.log
artifacts/patches/<run_id>.patch
artifacts/reports/<run_id>.md
artifacts/plans/<run_id>.md
artifacts/traces/<run_id>.jsonl
artifacts/tmp/<run_id>/

Generated artifacts are ignored by Git.

Model backend

A task can use the openai_compatible backend with these fields:

backend:
  name: openai_compatible
  base_url: http://127.0.0.1:11434/v1
  api_key_env: OPENAI_API_KEY
  model: qwen2.5-coder:3b
  temperature: 0
  max_tokens: 2048

The backend sends the task, relevant files, and baseline or verification output to the endpoint. It expects a unified diff only, applies it with Git, and then the runner executes the verification command. For model backends, the runner can retry with verification output when an attempt fails.

Benchmark and training path

Run a benchmark suite:

python eval/scripts/run_benchmark.py eval/suites/axiomrepair_20.yaml --artifacts artifacts
python eval/scripts/run_benchmark.py eval/suites/axiomtask_20.yaml --artifacts artifacts

Compare two benchmark outputs:

python eval/scripts/compare_runs.py artifacts/reports/base.json artifacts/reports/axiomcode.json

Export verified positive traces:

python training/scripts/export_sft.py --results artifacts/runs --output training/data/train.jsonl

Data policy

The first training phase uses public, licensed, and project-owned verified coding-task data. Private user code is not training data by default.

Benchmark data must stay separate from training data. Public scores need a contamination note.

See docs/data-policy.md and docs/contamination-policy.md.

Files worth reading

ONE_PAGER.md                               short project summary
docs/model-plan.md                         AxiomCode naming and target
docs/benchmark-protocol.md                 AxiomRepair and reporting protocol
docs/modal-runbook.md                      Modal compute path
docs/first-training-runbook.md             first measured run steps
docs/training-roadmap.md                   training stages
docs/architecture.md                       system shape
docs/benchmark-plan.md                     evaluation plan
docs/compute-plan.md                       compute policy
docs/funding-brief.md                      partner brief
eval/suites/axiomrepair_20.yaml            first suite scaffold
eval/reports/base_vs_axiomcode_template.md report template
training/configs/axiomcode_3b_lora.yaml    LoRA config
models/cards/AxiomCode-3B.md               model card draft
agent/README.md                            runner notes
examples/tasks/                            demo task files
site/                                      static public site

Scope

CodeAxiom is not a finished model. AxiomCode-3B is the first target checkpoint.

No benchmark score is claimed before evaluation.

Contact

contact@avixosec.xyz

License

MIT

About

Verified coding-agent workbench for real repository tasks: patches, tests, logs, artifacts, and metrics.

Resources

License

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors