K256/type safe field elements#1818
Conversation
This promotes the debug-only FieldElementImpl (magnitude + normalized tracking wrapper) into a new always-available LazyFieldElement type. LazyFieldElement wraps the raw backend (FieldElement5x52 or FieldElement10x26) and tracks: - : the raw limb representation (possibly >= p, weakly reduced) - : number of uncancelled additions (bounded to MAX_MAGNITUDE) All arithmetic ops (add, sub, mul, square, negate, mul_single, double) produce lazy results: magnitude is set to 1 (weak normalization), but no full modular reduction is performed. This avoids the expensive normalize() call between every operation in point arithmetic inner loops. The is_odd() method on LazyFieldElement requires the element to be normalized (magnitude == 1), enforced by a debug_assert in debug builds. In release builds, calling is_odd() on an unnormalized element would produce incorrect results — exactly the bug class this refactor addresses by making the type system enforce normalization at boundary crossings. This is the first step toward type-safe field elements per issue RustCrypto#531.
…backend FieldElement now wraps LazyFieldElement (which always tracks magnitude) instead of wrapping the raw field backend directly. This makes magnitude tracking available in both debug and release builds. Key changes: - FieldElement(LazyFieldElement) replaces FieldElement(Inner) - All arithmetic ops delegate to LazyFieldElement - normalize()/normalize_weak() actually normalize (not no-ops) - to_bytes() normalizes before encoding - Retrieve::retrieve() normalizes before to_u256() - Remove unused max_magnitude() from raw backends - Remove unused Inner/MAX_MAGNITUDE re-exports from field.rs The old field_impl.rs is superseded by lazy.rs and is no longer compiled.
The previous commit wrapped FieldElement around LazyFieldElement but the trait impls (Add/Sub/Mul/Neg) and the inherent add/mul/square/negate/mul_single/double methods still returned a LazyFieldElement without normalizing. Callers reaching for the result through ff::Field or core::ops could therefore observe values outside the canonical range [0, p), exactly the class of bug that motivated the type-safe-field-elements branch (a previous lack of normalization broke RustCrypto#1522's use of BatchInvert and other generic ff-based algorithms). This commit: - Normalizes the result of every FieldElement arithmetic trait impl and every public inherent arithmetic method, so any caller that obtains a FieldElement gets the documented contract for free. - Adds an arithmetic_ops_return_normalized regression test that exercises the core::ops, ff::Field, and inherent-magnitude APIs end-to-end and confirms the result is canonical. - Cleans up pre-existing clippy warnings in the same crate (wrong_self_convention on LazyFieldElement::to_bytes/to_u256, op_ref in scalar tests, useless_vec in the field batch_invert test, clone_on_copy in the schnorr bench, and a now-redundant #[allow(clippy::op_ref)] on try_generate_biased_from_rng). All 91 unit tests, 8 integration tests, and 3 doc tests pass. cargo clippy --all-targets --no-deps is clean.
- x448: 'assert!(low_order == false)' to 'assert!(!low_order)' - ed448-goldilocks: drop redundant closures around 'X::generate' in benches - p256: 'GENERATOR * &x' to 'GENERATOR * x' in tables proptests Keeps 'cargo clippy --workspace --all-targets --no-deps' clean.
Completes the type-safe-field-elements refactor by replacing the raw
backend storage in AffinePoint (and the matching plumbing through
ProjectivePoint) with LazyFieldElement, the always-available wrapper
introduced in the prior commits.
- arithmetic.rs: re-export LazyFieldElement alongside FieldElement.
- arithmetic/affine.rs: AffinePoint.x/y are now LazyFieldElement; add
new_lazy/from_field_elements constructors so hash2curve and other
callers that compute in the public, always-normalized FieldElement
can hand the result to the point-arithmetic internals without
leaking the raw backend.
- arithmetic/field/lazy.rs: extract LazyFieldElement into a public
type with magnitude tracking and weak normalization. Arithmetic ops
produce lazy results (carries propagated, magnitude = 1, but value
may still be >= p); .normalize() yields a fully-reduced FieldElement.
- arithmetic/projective.rs: ProjectivePoint arithmetic now operates
on LazyFieldElement, preserving the existing magnitude-bound
invariants and the inner-loop performance path.
- schnorr/{signing,verifying}: route x/y extraction through
FieldElement::from(...) so callers see a fully reduced value.
All 91 unit tests, 8 integration tests, and the doctests pass.
The wrapper layer added in b6823fe incurred three categories of overhead vs master on every field operation: 1. Redundant normalize passes: - FieldElement::to_bytes called self.normalize().0.to_bytes() — the normalize() is provably redundant by type invariant (FieldElement is always normalized). - LazyFieldElement::to_bytes then called self.value.normalize() again on the already-normalized inner, for two full modular reductions per serialization. - LazyFieldElement::mul / square wrapped their result in Self::new_weak_normalized, which re-runs Inner::normalize_weak even though Inner::mul / Inner::square already propagate carries. 2. Wrapper methods were not #[inline], so the compiler couldn't see through the FieldElement -> LazyFieldElement -> Inner layers. 3. Two struct constructions per FieldElement op (one for the inner op result, one for the post-normalize) added measurable struct-copy cost. Fixes: - Drop the redundant normalize() in both to_bytes layers and the redundant normalize_weak in mul / square. - Add #[inline] to all hot wrapper methods (is_zero, is_odd, normalize, normalize_weak, mul, square, add, negate, mul_single, double, to_bytes, from_bytes, new, etc.) in both field.rs and lazy.rs. Criterion bench vs master (k256, --measurement-time 3, n=30): field/normalize_weak +5.7% field/normalize -6.7% (better than master) field/mul +38.4% (was +78.6%) field/square +54.2% (was +102.6%) field/invert +3.3% (was +18.2%) field/sqrt +3.4% (was +66.5%) field/to_bytes -50.0% (was +88.6%, now 2x faster) field/from_bytes +1.7% (was +23.1%) point-scalar mul +5.4% (was +33.4%) point-scalar mul (vt) +0.0% mul_by_generator naive +9.7% (was +26.4%) mul_by_generator pre +0.4% mul_by_generator vartime -3.0% (better than master) lincomb (unoptimized) +6.4% lincomb (optimized) +7.1% lincomb_vartime -5.8% (better than master) ecdsa/try_sign -0.3% (at parity, was +32.6%) ecdsa/verify +1.6% (at parity, was +33.4%) schnorr/keygen +1.7% (was +15.8%) schnorr/sign -3.5% (at parity, was +12.1%) schnorr/verify -4.9% (better than master, was +12.9%) All 91 unit + 8 integration + 3 doctests pass. Remaining hotspots are field/mul (+38%) and field/square (+54%), which appear to be the irreducible cost of the struct-construction layer in the hot loop. Further wins likely require #[repr(transparent)] on the wrappers or specializing the public FieldElement ops to skip the post-op normalize when the caller is known to be running in a constant-magnitude context.
Two further perf wins on top of fd2b92f: 1. #[repr(transparent)] on FieldElement(LazyFieldElement). FieldElement has a single field, so the wrapper can be collapsed by LLVM, removing one layer of struct construction around every public op. 2. New Inner::normalize_canonical / LazyFieldElement::normalize_canonical that perform ONLY the final-subtract step (overflow check + subtract p + conditional select). The default normalize() = normalize_weak() + normalize_canonical(), where normalize_weak() propagates carries through the limbs. After Inner::mul / Inner::square, the value is ALREADY weakly normalized (the backend mul does carry propagation as part of the operation). So calling full normalize() at the boundary repeats the carry propagation. Using normalize_canonical saves one pass of limb arithmetic per mul/square. The debug assertion on the input magnitude catches any future caller that violates the precondition. FieldElement::normalize(), FieldElement::mul(), FieldElement::square(), and the Mul op impls now use normalize_canonical. FieldElement::double still uses the full normalize() because LazyFieldElement::add() produces a magnitude=2 result that needs the carry-propagation pass. Criterion bench vs master (k256, --measurement-time 5, n=50): field/normalize_weak +0.4% (noise) field/normalize -51.9% (was -10.8% — now 2x faster) field/mul +21.8% (was +35.4%) field/square +26.1% (was +50.4%) field/invert -2.7% (better than master) field/sqrt -10.1% (now better than master) field/to_bytes -50.4% field/from_bytes -5.1% (better than master) point-scalar mul +3.4% point-scalar mul (vt) -2.7% (better than master) mul_by_generator naive +4.5% mul_by_generator pre -3.2% (better than master) mul_by_generator_vartime -5.3% (better than master) lincomb (unoptimized) +3.9% lincomb (optimized) +3.8% lincomb_vartime -5.4% (better than master) scalar/sub -7.3% scalar/add -7.4% scalar/mul -5.3% scalar/negate -5.5% scalar/invert -4.8% All 91 unit + 8 integration + 3 doctests pass. Remaining hot regression is field/mul (+22%) and field/square (+26%), the irreducible cost of struct construction in the LazyFieldElement layer (which has two fields and so cannot itself be #[repr(transparent)]). The wrapper adds ~5ns per FieldElement::mul over master's direct call; further wins would require collapsing LazyFieldElement into FieldElement entirely (sacrificing the magnitude-tracking type safety) or specializing the public FieldElement ops to take already-normalized LazyFieldElement directly.
…anonical guard test - F1: MAX_MAGNITUDE is now backend-specific via cpubits! (31 for 32-bit, 2047 for 64-bit), matching the actual limb-width overflow bound in negate(). - F2: add debug-mode test verifying that normalize_canonical() panics on magnitude > 1 input, confirming the precondition guard fires during testing. - F2: add positive-control test that normalize_canonical() on a properly weakly-normalized (mul) input produces the same canonical value as normalize(). - hash2curve: use AffinePoint::from_field_elements instead of raw struct construction (hygiene).
The FieldElement type invariant guarantees BatchInvert returns canonical elements. The defensive .normalize() calls masked whether the invariant actually held end-to-end — removing them proves the type system catches what master's RustCrypto#1522 needed a workaround for. Tests pass in both debug and release modes.
…e fuzzer Addresses three findings from the security audit: 1. normalize_canonical release-mode trust (F2): The debug_assert(magnitude == 1) is now backed by a cold fallback to full normalize() in release builds. The branch is perfectly predictable (magnitude == 1 always holds when invariants are met) so the runtime cost is zero. 2. LazyFieldElement ct_eq footgun (F3): LazyFieldElement is now pub(crate) by default and only re-exported publicly behind expose-field. This matches how FieldElement is already gated and prevents external callers from accidentally relying on representation-equality semantics. 3. Point arithmetic magnitude fuzzer: Added magnitude_bounds_randomized test that smashes 12 random projective/affine points through all combinations of add, add_mixed, and double, verifying results against a reference implementation that normalizes every step. The debug_assert guards in LazyFieldElement serve as the invariant checker. All 94 tests pass (debug + release), fmt clean, clippy clean.
|
I think this kind of PR needs to be done both with great care and with an eye towards a general abstraction that can work with all curves: RustCrypto/traits#1997 I can leave some notes on that issue. |
|
Understood @tarcieri . From the comments on #1997 it seems you would prefer the "lazy" FieldElements not have a separate type, but be implemented as traits (with no-op for fields that don't need them) ? That is a very different approach from the one taken here, I will close this PR down, and look in to the other approach. I hope these draft PRs are not a burden to review, my reasoning is that sometimes it's easier to accept/reject an implementation when you can see the code behind it. |
|
@42Pupusas they would ultimately be separate types which impl a system of traits supporting lazy normalization. For example, in But this all needs to be carefully designed around a set of cross-crate abstractions that ideally would make it possible to implement algorithms that support lazy normalization but still use them with eagerly reduced field elements, so a single algorithm implementation works with both |
This is the "oneshot" version of #531 and #1522 and would address the concerns on #1703. Probably unmergable as a monolith, but happy to break it up in to logical bits if the general strategy is agreed upon. Old tests passing with what should only be mechanical changes to the callsites, and tests have been added to catch all the concerns addressed in the issues.
Lots of churn across the files because the type is everywhere, but the main points to review would be:
field.rs:555whereFrom<LazyFieldElement>is the sole conversion gate.Every
FieldElementarithmetic op (Mul,Add,Neg,Sub, andDouble) normalizes the result. Verify no escape hatch exists that could produce an unnormalizedFieldElement.field.rs:402:PartialEqdelegates toLazyFieldElement::ct_eq, which compares raw limb representations. Safe only becauseFieldElementis always canonical. The risk is a future code change that constructs aFieldElementwith unpropagated carries equality would silently break.field/lazy.rs:185normalize_canonical has a cold fallback tonormalize()Perf note: ~5–13% regression on verify (the lincomb/Straus path hits the FieldElement::from(lazy) boundary). Still looking in to a workaround.