Skip to content

K256/type safe field elements#1818

Closed
42Pupusas wants to merge 11 commits into
RustCrypto:masterfrom
42Pupusas:k256/type-safe-field-elements
Closed

K256/type safe field elements#1818
42Pupusas wants to merge 11 commits into
RustCrypto:masterfrom
42Pupusas:k256/type-safe-field-elements

Conversation

@42Pupusas

Copy link
Copy Markdown
Contributor

This is the "oneshot" version of #531 and #1522 and would address the concerns on #1703. Probably unmergable as a monolith, but happy to break it up in to logical bits if the general strategy is agreed upon. Old tests passing with what should only be mechanical changes to the callsites, and tests have been added to catch all the concerns addressed in the issues.

Lots of churn across the files because the type is everywhere, but the main points to review would be:

  • field.rs:555 where From<LazyFieldElement> is the sole conversion gate.

  • Every FieldElement arithmetic op (Mul , Add, Neg, Sub, and Double) normalizes the result. Verify no escape hatch exists that could produce an unnormalized FieldElement.

  • field.rs:402 : PartialEq delegates to LazyFieldElement::ct_eq, which compares raw limb representations. Safe only because FieldElement is always canonical. The risk is a future code change that constructs a FieldElement with unpropagated carries equality would silently break.

  • field/lazy.rs:185 normalize_canonical has a cold fallback to normalize()

  • Perf note: ~5–13% regression on verify (the lincomb/Straus path hits the FieldElement::from(lazy) boundary). Still looking in to a workaround.

42Pupusas added 11 commits June 18, 2026 17:15
This promotes the debug-only FieldElementImpl (magnitude + normalized
tracking wrapper) into a new always-available LazyFieldElement type.

LazyFieldElement wraps the raw backend (FieldElement5x52 or FieldElement10x26)
and tracks:
- : the raw limb representation (possibly >= p, weakly reduced)
- : number of uncancelled additions (bounded to MAX_MAGNITUDE)

All arithmetic ops (add, sub, mul, square, negate, mul_single, double)
produce lazy results: magnitude is set to 1 (weak normalization), but no
full modular reduction is performed. This avoids the expensive normalize()
call between every operation in point arithmetic inner loops.

The is_odd() method on LazyFieldElement requires the element to be
normalized (magnitude == 1), enforced by a debug_assert in debug builds.
In release builds, calling is_odd() on an unnormalized element would
produce incorrect results — exactly the bug class this refactor addresses
by making the type system enforce normalization at boundary crossings.

This is the first step toward type-safe field elements per issue RustCrypto#531.
…backend

FieldElement now wraps LazyFieldElement (which always tracks magnitude)
instead of wrapping the raw field backend directly. This makes magnitude
tracking available in both debug and release builds.

Key changes:
- FieldElement(LazyFieldElement) replaces FieldElement(Inner)
- All arithmetic ops delegate to LazyFieldElement
- normalize()/normalize_weak() actually normalize (not no-ops)
- to_bytes() normalizes before encoding
- Retrieve::retrieve() normalizes before to_u256()
- Remove unused max_magnitude() from raw backends
- Remove unused Inner/MAX_MAGNITUDE re-exports from field.rs

The old field_impl.rs is superseded by lazy.rs and is no longer compiled.
The previous commit wrapped FieldElement around LazyFieldElement but
the trait impls (Add/Sub/Mul/Neg) and the inherent
add/mul/square/negate/mul_single/double methods still returned a
LazyFieldElement without normalizing. Callers reaching for the result
through ff::Field or core::ops could therefore observe values outside
the canonical range [0, p), exactly the class of bug that motivated
the type-safe-field-elements branch (a previous lack of normalization
broke RustCrypto#1522's use of BatchInvert and other generic ff-based algorithms).

This commit:

- Normalizes the result of every FieldElement arithmetic trait impl
  and every public inherent arithmetic method, so any caller that
  obtains a FieldElement gets the documented contract for free.

- Adds an arithmetic_ops_return_normalized regression test that
  exercises the core::ops, ff::Field, and inherent-magnitude APIs
  end-to-end and confirms the result is canonical.

- Cleans up pre-existing clippy warnings in the same crate
  (wrong_self_convention on LazyFieldElement::to_bytes/to_u256,
  op_ref in scalar tests, useless_vec in the field batch_invert
  test, clone_on_copy in the schnorr bench, and a now-redundant
  #[allow(clippy::op_ref)] on try_generate_biased_from_rng).

All 91 unit tests, 8 integration tests, and 3 doc tests pass.
cargo clippy --all-targets --no-deps is clean.
- x448: 'assert!(low_order == false)' to 'assert!(!low_order)'
- ed448-goldilocks: drop redundant closures around 'X::generate' in benches
- p256: 'GENERATOR * &x' to 'GENERATOR * x' in tables proptests

Keeps 'cargo clippy --workspace --all-targets --no-deps' clean.
Completes the type-safe-field-elements refactor by replacing the raw
backend storage in AffinePoint (and the matching plumbing through
ProjectivePoint) with LazyFieldElement, the always-available wrapper
introduced in the prior commits.

- arithmetic.rs: re-export LazyFieldElement alongside FieldElement.
- arithmetic/affine.rs: AffinePoint.x/y are now LazyFieldElement; add
  new_lazy/from_field_elements constructors so hash2curve and other
  callers that compute in the public, always-normalized FieldElement
  can hand the result to the point-arithmetic internals without
  leaking the raw backend.
- arithmetic/field/lazy.rs: extract LazyFieldElement into a public
  type with magnitude tracking and weak normalization. Arithmetic ops
  produce lazy results (carries propagated, magnitude = 1, but value
  may still be >= p); .normalize() yields a fully-reduced FieldElement.
- arithmetic/projective.rs: ProjectivePoint arithmetic now operates
  on LazyFieldElement, preserving the existing magnitude-bound
  invariants and the inner-loop performance path.
- schnorr/{signing,verifying}: route x/y extraction through
  FieldElement::from(...) so callers see a fully reduced value.

All 91 unit tests, 8 integration tests, and the doctests pass.
The wrapper layer added in b6823fe incurred three categories of overhead
vs master on every field operation:

1. Redundant normalize passes:
   - FieldElement::to_bytes called self.normalize().0.to_bytes() — the
     normalize() is provably redundant by type invariant (FieldElement is
     always normalized).
   - LazyFieldElement::to_bytes then called self.value.normalize() again
     on the already-normalized inner, for two full modular reductions per
     serialization.
   - LazyFieldElement::mul / square wrapped their result in
     Self::new_weak_normalized, which re-runs Inner::normalize_weak even
     though Inner::mul / Inner::square already propagate carries.

2. Wrapper methods were not #[inline], so the compiler couldn't see
   through the FieldElement -> LazyFieldElement -> Inner layers.

3. Two struct constructions per FieldElement op (one for the inner op
   result, one for the post-normalize) added measurable struct-copy cost.

Fixes:
- Drop the redundant normalize() in both to_bytes layers and the
  redundant normalize_weak in mul / square.
- Add #[inline] to all hot wrapper methods (is_zero, is_odd, normalize,
  normalize_weak, mul, square, add, negate, mul_single, double, to_bytes,
  from_bytes, new, etc.) in both field.rs and lazy.rs.

Criterion bench vs master (k256, --measurement-time 3, n=30):

  field/normalize_weak      +5.7%
  field/normalize           -6.7%   (better than master)
  field/mul                +38.4%   (was +78.6%)
  field/square             +54.2%   (was +102.6%)
  field/invert              +3.3%   (was +18.2%)
  field/sqrt                +3.4%   (was +66.5%)
  field/to_bytes          -50.0%   (was +88.6%, now 2x faster)
  field/from_bytes          +1.7%   (was +23.1%)
  point-scalar mul          +5.4%   (was +33.4%)
  point-scalar mul (vt)     +0.0%
  mul_by_generator naive    +9.7%   (was +26.4%)
  mul_by_generator pre      +0.4%
  mul_by_generator vartime -3.0%   (better than master)
  lincomb (unoptimized)     +6.4%
  lincomb (optimized)       +7.1%
  lincomb_vartime          -5.8%   (better than master)
  ecdsa/try_sign           -0.3%   (at parity, was +32.6%)
  ecdsa/verify              +1.6%   (at parity, was +33.4%)
  schnorr/keygen            +1.7%   (was +15.8%)
  schnorr/sign             -3.5%   (at parity, was +12.1%)
  schnorr/verify           -4.9%   (better than master, was +12.9%)

All 91 unit + 8 integration + 3 doctests pass.

Remaining hotspots are field/mul (+38%) and field/square (+54%), which
appear to be the irreducible cost of the struct-construction layer in
the hot loop. Further wins likely require #[repr(transparent)] on the
wrappers or specializing the public FieldElement ops to skip the
post-op normalize when the caller is known to be running in a
constant-magnitude context.
Two further perf wins on top of fd2b92f:

1. #[repr(transparent)] on FieldElement(LazyFieldElement). FieldElement
   has a single field, so the wrapper can be collapsed by LLVM, removing
   one layer of struct construction around every public op.

2. New Inner::normalize_canonical / LazyFieldElement::normalize_canonical
   that perform ONLY the final-subtract step (overflow check + subtract
   p + conditional select). The default normalize() = normalize_weak()
   + normalize_canonical(), where normalize_weak() propagates carries
   through the limbs.

   After Inner::mul / Inner::square, the value is ALREADY weakly
   normalized (the backend mul does carry propagation as part of the
   operation). So calling full normalize() at the boundary repeats the
   carry propagation. Using normalize_canonical saves one pass of limb
   arithmetic per mul/square.

   The debug assertion on the input magnitude catches any future caller
   that violates the precondition.

   FieldElement::normalize(), FieldElement::mul(), FieldElement::square(),
   and the Mul op impls now use normalize_canonical. FieldElement::double
   still uses the full normalize() because LazyFieldElement::add()
   produces a magnitude=2 result that needs the carry-propagation pass.

Criterion bench vs master (k256, --measurement-time 5, n=50):

  field/normalize_weak      +0.4%   (noise)
  field/normalize         -51.9%   (was -10.8% — now 2x faster)
  field/mul               +21.8%   (was +35.4%)
  field/square            +26.1%   (was +50.4%)
  field/invert             -2.7%   (better than master)
  field/sqrt              -10.1%   (now better than master)
  field/to_bytes          -50.4%
  field/from_bytes         -5.1%   (better than master)
  point-scalar mul         +3.4%
  point-scalar mul (vt)    -2.7%   (better than master)
  mul_by_generator naive   +4.5%
  mul_by_generator pre     -3.2%   (better than master)
  mul_by_generator_vartime -5.3%   (better than master)
  lincomb (unoptimized)    +3.9%
  lincomb (optimized)      +3.8%
  lincomb_vartime          -5.4%   (better than master)
  scalar/sub               -7.3%
  scalar/add               -7.4%
  scalar/mul               -5.3%
  scalar/negate            -5.5%
  scalar/invert            -4.8%

All 91 unit + 8 integration + 3 doctests pass.

Remaining hot regression is field/mul (+22%) and field/square (+26%),
the irreducible cost of struct construction in the LazyFieldElement
layer (which has two fields and so cannot itself be #[repr(transparent)]).
The wrapper adds ~5ns per FieldElement::mul over master's direct call;
further wins would require collapsing LazyFieldElement into FieldElement
entirely (sacrificing the magnitude-tracking type safety) or
specializing the public FieldElement ops to take already-normalized
LazyFieldElement directly.
…anonical guard test

- F1: MAX_MAGNITUDE is now backend-specific via cpubits!
  (31 for 32-bit, 2047 for 64-bit), matching the actual
  limb-width overflow bound in negate().
- F2: add debug-mode test verifying that normalize_canonical()
  panics on magnitude > 1 input, confirming the precondition
  guard fires during testing.
- F2: add positive-control test that normalize_canonical() on
  a properly weakly-normalized (mul) input produces the same
  canonical value as normalize().
- hash2curve: use AffinePoint::from_field_elements instead of
  raw struct construction (hygiene).
The FieldElement type invariant guarantees BatchInvert returns canonical
elements. The defensive .normalize() calls masked whether the invariant
actually held end-to-end — removing them proves the type system catches
what master's RustCrypto#1522 needed a workaround for.

Tests pass in both debug and release modes.
…e fuzzer

Addresses three findings from the security audit:

1. normalize_canonical release-mode trust (F2):
   The debug_assert(magnitude == 1) is now backed by a cold fallback
   to full normalize() in release builds.  The branch is perfectly
   predictable (magnitude == 1 always holds when invariants are met)
   so the runtime cost is zero.

2. LazyFieldElement ct_eq footgun (F3):
   LazyFieldElement is now pub(crate) by default and only re-exported
   publicly behind expose-field.  This matches how FieldElement is
   already gated and prevents external callers from accidentally
   relying on representation-equality semantics.

3. Point arithmetic magnitude fuzzer:
   Added magnitude_bounds_randomized test that smashes 12 random
   projective/affine points through all combinations of add,
   add_mixed, and double, verifying results against a reference
   implementation that normalizes every step.  The debug_assert
   guards in LazyFieldElement serve as the invariant checker.

All 94 tests pass (debug + release), fmt clean, clippy clean.
@tarcieri

Copy link
Copy Markdown
Member

I think this kind of PR needs to be done both with great care and with an eye towards a general abstraction that can work with all curves: RustCrypto/traits#1997

I can leave some notes on that issue.

@42Pupusas

Copy link
Copy Markdown
Contributor Author

Understood @tarcieri .

From the comments on #1997 it seems you would prefer the "lazy" FieldElements not have a separate type, but be implemented as traits (with no-op for fields that don't need them) ?

That is a very different approach from the one taken here, I will close this PR down, and look in to the other approach.

I hope these draft PRs are not a burden to review, my reasoning is that sometimes it's easier to accept/reject an implementation when you can see the code behind it.

@42Pupusas 42Pupusas closed this Jun 22, 2026
@tarcieri

Copy link
Copy Markdown
Member

@42Pupusas they would ultimately be separate types which impl a system of traits supporting lazy normalization.

For example, in p521 it's called LooseFieldElement.

But this all needs to be carefully designed around a set of cross-crate abstractions that ideally would make it possible to implement algorithms that support lazy normalization but still use them with eagerly reduced field elements, so a single algorithm implementation works with both

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants