chatter is the modern toolchain for the TalkBank
CHAT transcript format:
validate, convert, normalize, and explore .cha files through a desktop app,
a command-line tool, and an editor language server.
If you work with CHAT transcripts (CHILDES, AphasiaBank, FluencyBank, and the rest of TalkBank) and just want to use chatter, you do not need Rust or any build tools:
- Desktop app (no terminal needed). Download Chatter for macOS, Windows, or Linux from the latest release, open a .cha file, and read its validation results. The app keeps itself up to date: it checks for a new version on launch and offers to install it. Recommended if you do not use a command line.
- Command line (chatter). Install with one command, then validate a single file or a whole folder:

      # macOS / Linux (Windows: see Installation for the PowerShell one-liner)
      curl --proto '=https' --tlsv1.2 -LsSf https://github.com/TalkBank/chatter/releases/latest/download/chatter-installer.sh | sh

      chatter validate myfile.cha       # check one transcript
      chatter validate path/to/folder   # check an entire corpus
      chatter --help                    # list every command
      chatter update                    # update to the newest release

New to chatter? The User Guide walks through validating, fixing, and converting transcripts step by step, and the Installation section below has full per-platform details.
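When a corpus is checked in CI, it can help to know exactly which files failed. The following is a minimal sketch, assuming that `chatter validate FILE` exits with a non-zero status when a transcript has errors (this document does not state the exit-code behavior). `validate_each` and the `CHATTER_BIN` override are hypothetical names introduced here for illustration; they are not part of chatter.

```shell
#!/bin/sh
# Validate each .cha file under a directory separately and count failures.
# ASSUMPTION: `chatter validate FILE` exits non-zero when the file has errors.
# CHATTER_BIN is a hypothetical override so the sketch can be tried without
# chatter installed (for example CHATTER_BIN=true).

validate_each() {
    root=$1
    failures=0
    list=$(mktemp)

    # Collect the transcript paths first so the loop below runs in this shell
    # (not a pipeline subshell) and the counter survives.
    # Limitation: file names containing newlines are not handled.
    find "$root" -type f -name '*.cha' > "$list"

    while IFS= read -r file; do
        # Send validator output to stderr so stdout carries only the count;
        # </dev/null keeps the validator from consuming the path list.
        if ! "${CHATTER_BIN:-chatter}" validate "$file" >&2 </dev/null; then
            echo "FAILED: $file" >&2
            failures=$((failures + 1))
        fi
    done < "$list"

    rm -f "$list"
    echo "$failures"
}
```

A CI step could then run something like `[ "$(validate_each path/to/folder)" -eq 0 ]` and fail the job otherwise. For an ordinary check, `chatter validate path/to/folder` already covers a whole corpus in one call.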
CHAT (Codes for the Human Analysis of Transcripts) is the 40-year-old
transcription format behind every TalkBank corpus.
chatter is the canonical home of the CHAT-format authority and the chatter
tool family, modernizing the CLAN-era workflow with typed Rust infrastructure:
a strict parser, structured validation diagnostics, programmable transforms,
and an LSP-backed editor experience.
Early release (v0.1.0). chatter is usable today and the CHAT-format core (parser, data model, validation, alignment) is mature, but no surface is API-stable yet: expect changes before 1.0. See the support tiers page for the surface-by-surface stability matrix.
The Rust crates behind the CHAT-format core. The foundation crates are preview-quality; some surfaces are intentionally held back as experimental or internal until their stability contracts are settled.
| Crate | Role | Stability |
|---|---|---|
| talkbank-model | CHAT data model, validation rules, error codes, tier-alignment primitives | Preview |
| talkbank-parser | Tree-sitter-backed primary parser (incremental, LSP-friendly) | Preview |
| talkbank-parser-re2c | Independent re2c-based oracle parser that cross-checks tree-sitter on every wild-corpus file | Preview |
| talkbank-parser-tests | Shared parser test harness + reference corpus golden tests | Internal |
| talkbank-transform | CHAT to JSON, normalization, transcript-merge, redaction, transform pipelines | Preview |
| talkbank-derive | Derive macros for the data model | Preview |
| talkbank-cache | SQLite-backed pass/fail cache for validation + roundtrip results | Preview |
| talkbank-llm | OpenAI-compatible HTTP judgment provider (the only network-dependent crate; backs holistic speaker-id) | Experimental |
| send2clan | Rust bindings for "open in CLAN" (macOS Apple Events, Windows WM_APP) | Experimental |
| Component | What it is | Stability |
|---|---|---|
| chatter, the chatter binary | The flagship CLI: validate, normalize, convert (JSON / XML), lint, watch, and the experimental merge / speaker-id reconciliation commands | Preview |
| talkbank-lsp, the talkbank-lsp binary | Language Server Protocol implementation; powers real-time validation, hover, go-to-definition, cross-tier alignment in any LSP-aware editor | Preview |
| apps/chatter-desktop/ | Tauri-based desktop validation app, a TUI parity target for researchers who don't use a terminal | Preview |
| Path | Role |
|---|---|
| grammar/ | The tree-sitter-talkbank grammar, the source of truth for CHAT syntax. Generates the parser used by talkbank-parser. |
| spec/ | The CHAT format specification (constructs, errors, symbol registry); generates the test corpus that gates every grammar/parser change |
| schema/ | JSON Schema for the typed CHAT AST emitted by chatter to-json |
| corpus/reference/ | The reference corpus: every file MUST pass parser equivalence and roundtrip validation on every commit |
| test-fixtures/ | Minimal .cha fixtures used by integration tests |
| Where | What |
|---|---|
| book/ | Comprehensive mdBook: user guides for the chatter CLI, plus CHAT format documentation, architecture, and contributor guides |
| docs/strategy/ | Strategic planning docs (distribution + signing) |
| docs/proposals/ | Format-extension proposals under review |
| CONTRIBUTING.md | How to set up a fresh checkout and contribute |
Prebuilt binaries for macOS (Apple Silicon and Intel), Linux, and
Windows are attached to every GitHub
Release. The chatter
CLI installs via the release's shell (macOS/Linux) or PowerShell
(Windows) installer script, or you can download the archive for your
platform directly from the release page.
- macOS / Linux (CLI): run the installer one-liner from the latest release.
- Windows (CLI): run the PowerShell installer from the latest release. The downloaded binary is not yet code-signed, so SmartScreen may warn; choose "More info" then "Run anyway".
- Desktop app: download the installer for your platform from the release page. The macOS .dmg is signed and notarized; the Windows installer is not yet signed (same SmartScreen note as above).
- From crates.io: not yet published (see the support tiers page).
To build the CLI from source instead, see the developer quick start below.
Prerequisites: the Rust toolchain pinned by rust-toolchain.toml (currently 1.96.0; rustup installs it automatically) and SQLite dev headers (macOS bundled; Linux needs libsqlite3-dev).

    # Clone
    git clone git@github.com:TalkBank/chatter.git
    cd chatter

    # Build the whole workspace
    cargo build --workspace --all-targets --locked

    # Run the full workspace test suite
    cargo test --workspace --locked

    # Try the chatter binary
    ./target/debug/chatter --help
    ./target/debug/chatter validate path/to/file.cha

Build the docs locally:

    just book-install-tools
    just book
    # Output at book/build/html/

See CONTRIBUTING.md for the full development setup and the coding conventions.
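Before building from source, the prerequisites listed for the quick start can be checked with a short script. This is a sketch under stated assumptions: `missing_tools` and `have_sqlite_headers` are hypothetical helper names, and the SQLite check assumes pkg-config is installed and that the headers ship with a sqlite3.pc file, as they do with libsqlite3-dev on Debian-family Linux. On macOS, where the headers are bundled, the pkg-config check may not apply.

```shell
#!/bin/sh
# Print the names of the given commands that are not found on PATH.
# Prints an empty line when everything is present.
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Drop the leading space added by the first append.
    echo "${missing# }"
}

# Succeed when pkg-config can locate the SQLite development files.
# ASSUMPTION: pkg-config is available and SQLite provides sqlite3.pc.
have_sqlite_headers() {
    command -v pkg-config >/dev/null 2>&1 && pkg-config --exists sqlite3
}
```

For this repository the call would be along the lines of `missing_tools git rustup cargo just`, followed by `have_sqlite_headers || echo "SQLite dev headers not found" >&2` on Linux.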

    ┌────────────────────────────────────────────────┐
    │ spec/ → grammar/ → talkbank-parser             │  CHAT format
    │ (source of truth) (tree-sitter, LSP-friendly)  │  authority
    └─────────────────────────┬──────────────────────┘
                              ▼
    ┌────────────────────────────────────────────────┐
    │ talkbank-model + talkbank-derive               │  Data model
    │ (validation rules, error codes, alignment)     │  + lints
    └─────────────────────────┬──────────────────────┘
                              ▼
    ┌────────────────────────────────────────────────┐
    │ talkbank-transform + talkbank-cache            │  Pipelines
    │ (CHAT↔JSON, normalize, merge, redact)          │
    └─────────────────────────┬──────────────────────┘
                              ▼
       ┌────────────┬─────────┴───────────────┐
       ▼            ▼                         ▼
    chatter   talkbank-lsp             chatter-desktop
    (chatter) (lsp binary)               (Tauri app)

Detailed architecture documentation: the architecture overview.
- Rust (pinned to a specific stable in rust-toolchain.toml, currently 1.96.0; edition 2024; no MSRV declared until crates.io publication)
- Node 20+ for the desktop app's web front end
- SQLite dev headers (macOS bundled; Linux libsqlite3-dev)
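
The Node 20+ requirement can be checked from a script by parsing the output of `node --version`, which prints a string such as v20.11.1. A small sketch follows; `node_version_ok` is a hypothetical helper introduced for illustration, not part of chatter.

```shell
#!/bin/sh
# Succeed when a version string such as "v20.11.1" has a major version >= 20.
node_version_ok() {
    version=${1#v}          # drop the leading "v" printed by `node --version`
    major=${version%%.*}    # keep everything before the first dot
    case $major in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$major" -ge 20 ]
}
```

Typical use would be `node_version_ok "$(node --version)" || echo "Node 20+ is required for the desktop app" >&2`. The Rust toolchain needs no equivalent check, since rustup installs the version pinned in rust-toolchain.toml automatically.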
Dual-licensed under your choice of:
Unless explicitly stated otherwise, any contribution intentionally submitted for inclusion in this work shall be dual-licensed as above, without any additional terms or conditions.
- batchalign3: the upstream neural ML pipeline (ASR, forced alignment, neural morphotag); the two projects coordinate at the CHAT-format boundary
- TalkBank: the parent project; CHAT format is the data substrate for every TalkBank corpus
- CHILDES, AphasiaBank, FluencyBank, and the other TalkBank banks