continue-profiling-agent (cpa) is a Linux continuous profiling agent built
to keep performance evidence available after an incident has already happened.
It continuously records ultra-low-overhead profiling history on production
hosts, so engineers can inspect the exact time window later instead of waiting
for the issue to reproduce.
- Ultra-low overhead: with the highly optimized libgunwinder, CPA can continuously record whole-machine, per-second flamegraphs for all processes at very low and often unnoticeable cost.
- Always-on evidence: CPA keeps recent second-level profiling history on the host and rotates it by policy.
- No application intrusion: CPA does not modify, inject into, or restart user processes, so profiling can stay resident without changing application behavior.
- Practical incident workflow: cpa show can export flamegraph data or open an interactive TUI to jump to the affected wall-clock time.
CPA has two primary commands:
- cpa monitor: collect and persist profile data
- cpa show: inspect stored data as a flamegraph or in the embedded Rust TUI
- continuous on-CPU profiling through BPF or perf backends
- off-CPU collection
- probe-triggered stack capture
- persistent store rotation under store_dir
- flamegraph export through cpa show
- embedded Rust TUI through cpa show --use_cui
CPA has four major layers:
- CLI layer
  - cpa monitor parses runtime and filtering options.
  - cpa show reads stored profile data and renders either a flamegraph or the embedded TUI.
- Profiling runtime
  - The runtime is organized around workers for capture, unwind, storage rotation, local statistics, and debug paths.
- BPF and perf capture backends
  - BPF capture uses CO-RE programs and host-side loaders under bpf/.
  - Perf capture remains available as the alternate sampling backend.
- Storage and viewer
  - cpa monitor writes a directory containing conf, strmap, idsmap, stack.bin, and related metadata.
  - cpa show and the Rust cpa_show library consume that directory.
See the detailed architecture guide in docs/en/architecture.md.
For the on-disk format and backend fallback rules, see:
- Chinese: Store Format
- Chinese: Backends and Capability Checks
- Chinese: BPF Third-Party Dependencies
- English: Store Format
- English: Backends and Capability Checks
- English: BPF Third-Party Dependencies
- English: Technical Deep Dive
Do not run CPA on upstream Linux kernels from 5.19 through 6.3, including
the 6.1 LTS line. A kernel BPF copy_from_user_nofault() bug in that
range can deadlock the host. See
docs/en/kernel-compatibility.md
before installing CPA on production machines.
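As a rough pre-install guard, the version range above can be checked against the running kernel. This is a minimal sketch: cpa_kernel_in_affected_range is a hypothetical helper, not part of CPA, and it only compares the major.minor prefix of the release string. It cannot tell whether a distribution kernel carries the fix or the bug through backports, so docs/en/kernel-compatibility.md remains the authority.

```shell
# Hypothetical helper (not part of CPA): succeed when a kernel release string
# has a major.minor version inside the 5.19 through 6.3 range named above.
cpa_kernel_in_affected_range() {
    local release="$1" major minor
    case "$release" in
        *.*) ;;
        *) return 2 ;;              # no major.minor to inspect
    esac
    major="${release%%.*}"
    minor="${release#*.}"
    minor="${minor%%[!0-9]*}"       # keep only the leading digits
    case "$major" in ''|*[!0-9]*) return 2 ;; esac
    [ -n "$minor" ] || return 2
    if [ "$major" -eq 5 ] && [ "$minor" -ge 19 ]; then return 0; fi
    if [ "$major" -eq 6 ] && [ "$minor" -le 3 ]; then return 0; fi
    return 1
}

# Warn before installing when the running kernel falls in the range.
if cpa_kernel_in_affected_range "$(uname -r)"; then
    echo "kernel $(uname -r) is in the 5.19-6.3 range; read docs/en/kernel-compatibility.md first" >&2
fi
```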
Install CPA from the latest release and start the systemd service:
curl -fsSL https://raw.githubusercontent.com/volcengine/continue-profiling-agent/refs/heads/main/tools/install_cpa.sh | sudo bash
Check that the service is running:
sudo systemctl status cpa.service
cpa version
CPA stores profiling data under /var/log/cpa by default. Print the available
time range from a stored directory:
cpa show --read /var/log/cpa/cpa_YYMMDD --show_range 1
Export a flamegraph profile:
cpa show --read /var/log/cpa/cpa_YYMMDD --output_prof cpa.prof
Open the embedded Rust TUI:
cpa show --read /var/log/cpa/cpa_YYMMDD --use_cui
Uninstall CPA while preserving profiling data:
curl -fsSL https://raw.githubusercontent.com/volcengine/continue-profiling-agent/refs/heads/main/tools/install_cpa.sh | sudo bash -s -- --uninstall
See docs/en/usage.md for more examples.
After a GitHub release is published, install the portable Linux x86_64 package with:
curl -fsSL https://raw.githubusercontent.com/volcengine/continue-profiling-agent/refs/heads/main/tools/install_cpa.sh | sudo bash
To install a specific release tag:
curl -fsSL https://raw.githubusercontent.com/volcengine/continue-profiling-agent/refs/heads/main/tools/install_cpa.sh | sudo bash -s -- --version v1.0.0
For a locally built systemd-managed installation, use the deployment helper:
sudo tools/deploy_cpa.sh --binary build/bin/cpa
On hosts without /sys/kernel/btf/vmlinux, generate a matching detached BTF
with pahole and pass it explicitly:
sudo mkdir -p /etc/cpa
sudo pahole --btf_encode_detached=/etc/cpa/vmlinux.btf \
/usr/lib/debug/boot/vmlinux-$(uname -r)
sudo tools/deploy_cpa.sh --binary build/bin/cpa --btf /etc/cpa/vmlinux.btf
The helper creates /var/log/cpa, writes /etc/cpa/cpa.conf, installs a
cpa.service systemd unit, and starts CPA at 49 Hz by default. See
docs/en/deploy.md for the full deployment guide.
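Whether the detached-BTF step is needed can be decided with a small check before calling the deployment helper. The sketch below assumes the two paths used in this section; cpa_pick_btf is a hypothetical helper that is not shipped with CPA, and it only tests readability, not whether the BTF matches the running kernel.

```shell
# Hypothetical helper (not part of CPA): report which BTF source is usable.
# $1 overrides the kernel-provided BTF path, $2 the detached fallback path.
cpa_pick_btf() {
    local kernel_btf="${1:-/sys/kernel/btf/vmlinux}"
    local detached_btf="${2:-/etc/cpa/vmlinux.btf}"
    if [ -r "$kernel_btf" ]; then
        echo "kernel:$kernel_btf"        # no --btf argument needed
    elif [ -r "$detached_btf" ]; then
        echo "detached:$detached_btf"    # pass this file with --btf
    else
        echo "missing"
        return 1
    fi
}

# Print the decision; suggest the pahole step when neither file is readable.
cpa_pick_btf || echo "no BTF found: generate a detached BTF with pahole first" >&2
```

A "detached:" result means the path after the colon is the one to hand to tools/deploy_cpa.sh through --btf.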
Prerequisites:
- Linux with eBPF CO-RE support
- cmake >= 3.10
- clang, llvm-strip, llvm-objdump
- python3
- cargo
- make
- development libraries for elf, dw, zstd, crypto, and iberty
- the libs/libgunwinder submodule initialized from GitHub
Build from source:
git submodule update --init --recursive
cmake -S . -B build
cmake --build build -j
The main executable is generated at build/bin/cpa.
To generate the portable single-file distribution artifact through SOPacker, run:
cmake --build build -j --target cpa_portable
That target produces build/bin/cpa_portable. It packages the dynamically
linked cpa executable and its dependent shared libraries; it is not a static
link of LGPL components. To reuse an existing local SOPacker
checkout, point CMake at it with
-DCPA_BPF_SOPACKER_DIR=/path/to/sopacker.
The portable artifact is a self-extracting script intended for distribution to
hosts that may not have libgunwinder installed. It keeps CPA dynamically
linked, so libgunwinder.so can still be replaced when validating a different
LGPL build or deploying a patched unwinder.
To replace the bundled libgunwinder.so in a cpa_portable artifact:
# Extract once and run a cheap command.
./cpa_portable version
# The generated script records its extraction directory near the top.
tmpdir=$(sed -n 's/^tempdir=//p' ./cpa_portable | head -n1)
# Replace the unpacked shared object. The replacement must be ABI-compatible
# and should use the same SONAME, libgunwinder.so.
cp /path/to/libgunwinder.so "$tmpdir/libgunwinder.so"
# Run the portable artifact again; SOPacker reuses the existing extraction
# directory when the unpacked cpa binary still matches the embedded checksum.
./cpa_portable version
For one-off testing, an explicit preload also works and avoids changing the temporary directory:
LD_PRELOAD=/path/to/libgunwinder.so ./cpa_portable version
If /tmp is cleaned or the portable artifact changes, repeat the extraction and
replacement steps.
See docs/en/build.md for distro packages, LLVM/CMake notes, and the complete build and test guide.
libgunwinder includes cfi_bench, a focused microbenchmark for the DWARF CFI
frame evaluator. The benchmark below was measured on 2026-06-01 on an
Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30GHz, Linux 5.15.152, GCC 8.3.0.
Each row is the median of three runs pinned
to CPU0:
taskset -c 0 libs/libgunwinder/bin/cfi_bench \
  --frames 20000000 --set-size <N> --warmup 10000

| Working set | CFI frames/s | Avg ns/frame | P50 ns | P99 ns | 16-frame samples/s | 32-frame samples/s |
|---|---|---|---|---|---|---|
| 100 | 11,998,307 | 83.35 | 70.11 | 167.27 | 749,894 | 374,947 |
| 1,000 | 4,227,484 | 236.55 | 228.41 | 327.33 | 264,218 | 132,109 |
| 10,000 | 1,353,664 | 738.74 | 737.73 | 862.48 | 84,604 | 42,302 |
The sample-rate columns are theoretical values calculated as CFI frames per second divided by average stack depth. End-to-end CPA throughput also includes sampling, queueing, symbol formatting, store writes, and cold ELF/CFI loads.
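The derivation of the two sample-rate columns can be reproduced with integer arithmetic. This is a minimal sketch: samples_per_sec is a hypothetical helper name, and rounding to the nearest integer is an assumption that happens to match the table rows.

```shell
# Theoretical samples per second: CFI frames per second divided by the stack
# depth, rounded to the nearest integer (assumed rounding mode).
samples_per_sec() {
    local frames="$1" depth="$2"
    echo $(( ($frames + $depth / 2) / $depth ))
}

samples_per_sec 11998307 16   # working set 100, 16 frames: 749894
samples_per_sec 4227484 32    # working set 1,000, 32 frames: 132109
```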
Shared generic options:
- --help, -h: print help.
- --verbose, -v: enable verbose CLI logging.
- --config, -C <FILE>: override options from a config file using {arg_name}: {arg_val} entries.
- --btf_path, -b <PATH>: override the custom BTF path used by the BPF backend. Startup preflight rejects unreadable or invalid BTF objects.
- --duration, -d <SEC>: stop cpa monitor after the given number of seconds.
cpa monitor options:
- --store_dir, -s <DIR>: root directory for continuous CPA stores.
- --backend <bpf|perf>: select the sampling backend. perf only supports continuous on-CPU profiling; plain continuous on-CPU requests may fall back to perf when BPF is unavailable.
- --freq, -F <HZ>: sampling frequency.
- --record_interval, -r <SEC>: store rotation and query granularity.
- --persistent_day, -P <DAYS>: retain only the latest N days of continuous stores.
- --oneshot: write a single flamegraph profile instead of rotating store directories. Requires the BPF backend.
- --output_prof, -o <PATH>: output path for one-shot mode. Default: cpa.prof.
- --pid, -p <PID>: capture one target pid. Requires the BPF backend.
- --comm, -n <NAME>: capture tasks matching one comm/group-comm name. Requires the BPF backend.
- --kernel_stack, -K: capture kernel-space stacks only. Requires the BPF backend.
- --offcpu, -u: collect off-CPU samples. Only valid on the BPF backend.
- --probe <SPEC>: capture stacks only when a probe fires, using bpftrace-style syntax such as kprobe:try_to_free_pages.
- --disable_sym, -S: disable symbol parsing and keep raw addresses where applicable.
- --include_full_path: keep full file paths in rendered frames where available.
- --strip_name_disable: disable Go symbol-name stripping.
- --record_env_name, -R <LIST>: record these env keys into metadata so cpa show can filter on them.
- --parse_env_values, -V <LIST>: only unwind user stacks for processes whose recorded env values match this list.
- --max_queue_size, -m <N>: maximum in-memory stack event queue length before backpressure.
- --stack_size <BYTES>: BPF backend user-stack capture buffer size. Must be 4K-aligned and within [4096, 65536]. The perf backend does not support custom stack-size semantics.
- --max_cache_size_mb <MB>: restart monitor when symbol/debug cache usage exceeds this limit.
- --max_store_size_mb <MB>: restart monitor and trim old stores when store usage exceeds this limit.
- --log_print_cycles <N>: print local runtime statistics every N timer cycles.
- --bench: print per-stat-interval final-DWARF-path benchmark statistics, including measured sample count, actual unwind rate, average/min/max latency, and fixed latency buckets. FP_BETTER samples are excluded. Cold symbol/CFI loading can appear in the first few intervals.
- --debug_option <PID,FREQ,PATH>: debug capture override in {pid},{sample_freq},{dump_path} form.
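The --stack_size constraint (4K-aligned, within [4096, 65536]) can be checked in a wrapper script before cpa monitor is started. The following is a sketch of that rule only; cpa_valid_stack_size is a hypothetical helper, and CPA's own argument validation may differ in detail.

```shell
# Hypothetical pre-check for --stack_size: succeed only for a plain decimal
# value that is a multiple of 4096 and lies within [4096, 65536].
cpa_valid_stack_size() {
    local bytes="$1"
    case "$bytes" in
        ''|*[!0-9]*|0*) return 1 ;;     # empty, non-numeric, zero, or leading zero
    esac
    [ "${#bytes}" -le 5 ] || return 1   # six or more digits already exceeds 65536
    [ "$bytes" -ge 4096 ] && [ "$bytes" -le 65536 ] && [ $(( $bytes % 4096 )) -eq 0 ]
}

for size in 4096 16384 5000 131072; do
    if cpa_valid_stack_size "$size"; then
        echo "$size ok"
    else
        echo "$size rejected"
    fi
done
```

With the four sample values, the loop accepts 4096 and 16384 and rejects 5000 (not aligned) and 131072 (out of range).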
cpa show options:
- --read, -r <DIR>: input CPA profile directory.
- --starttime, -B <HH:MM:SS>: absolute start time in the stored record timeline.
- --endtime, -E <HH:MM:SS>: absolute end time in the stored record timeline.
- --output_num, -n <N>: export N consecutive records from the selected point. Must be a positive integer.
- --output_prof, -o <PATH>: flamegraph output path. Without explicit time options, export starts from the first matching record. If omitted, CPA generates cpa_<time>_<n>.prof.
- --show_range, -p: print the available record time range and exit.
- --use_cui, -G: open the embedded Rust cpa_show terminal UI.
- --use_cache, -u: reuse files under decompressed/ instead of re-decompressing them.
- --split_path <DIR>: export the selected time range as raw split files into this directory.
- --show_thread_name, -S: include thread names in flamegraph output.
- --no_pid, -P: omit pid suffixes from metadata labels.
- --no_env, -V: omit env labels from metadata labels.
- --show_raw, -R: render raw metadata entries instead of formatted CPA labels.
- --target_pid <PID>: filter to one pid.
- --target_comm <NAME>: filter to one process group comm.
- --target_env <VALUE>: filter to one recorded env value.
- --target_cgroup_id <ID>: filter to one cgroup ID.
- --target_cpu <CPUSET>: filter to CPUs in a standard CPU-set expression such as 1-3,5,7-9.
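To see which CPUs an expression such as 1-3,5,7-9 selects before passing it to --target_cpu, the expression can be expanded in a few lines of shell. This sketch shows the common comma-and-range list form only; expand_cpuset and is_cpu_index are hypothetical helpers, and CPA's own parser may accept or reject inputs differently.

```shell
# Succeed when the argument is a plain decimal CPU index without leading zeros.
is_cpu_index() {
    case "$1" in
        ''|*[!0-9]*|0?*) return 1 ;;
    esac
    return 0
}

# Print the CPUs named by a CPU-set expression, space-separated and in the
# order written, or fail on malformed input.
expand_cpuset() {
    local rest="$1" part lo hi cpu out=""
    case "$rest" in ''|*,) return 1 ;; esac      # empty or trailing comma
    while [ -n "$rest" ]; do
        part="${rest%%,*}"                        # next comma-separated item
        case "$rest" in
            *,*) rest="${rest#*,}" ;;
            *)   rest="" ;;
        esac
        case "$part" in
            *-*) lo="${part%%-*}"; hi="${part#*-}" ;;   # range "lo-hi"
            *)   lo="$part"; hi="$part" ;;              # single CPU
        esac
        is_cpu_index "$lo" && is_cpu_index "$hi" || return 1
        [ "$lo" -le "$hi" ] || return 1
        cpu="$lo"
        while [ "$cpu" -le "$hi" ]; do
            out="$out${out:+ }$cpu"
            cpu=$(($cpu + 1))
        done
    done
    echo "$out"
}

expand_cpuset "1-3,5,7-9"   # prints: 1 2 3 5 7 8 9
```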
English:
- Architecture
- Technical Deep Dive
- Store Format
- Backends and Capability Checks
- BPF Third-Party Dependencies
- Usage Guide
- Deployment Guide
- Build Guide
- Test Coverage Guide
- Development Guide
- Contributing
Chinese:
Component docs:
The repository contains Python-based integration tests under tests/.
pytest -q tests/cpa
See docs/en/testing.md for suite coverage details.
Many tests require:
- a built build/bin/cpa
- root privileges
- a kernel and toolchain that can load the bundled fixture modules
- src/: user-space CLI, cpa monitor, cpa show, and the cpa_show Rust integration
- bpf/: BPF programs, host-side loaders, skeleton generation, libbpf/bpftool submodules, and build helpers
- tests/: unit and integration tests
- docs/: English and Chinese project documentation
- cpa_show is built as an embedded Rust library for cpa show --use_cui; it is not shipped as a standalone executable in this repository.
- libgunwinder is tracked as the libs/libgunwinder submodule. The CMake build invokes its Makefile, copies libgunwinder.so next to build/bin/cpa, and links cpa dynamically with an $ORIGIN runtime search path.
- The default storage directory is /var/log/cpa.
- The default one-shot output name is cpa.prof.
This repository is licensed under the Apache License 2.0. See LICENSE.

