Delete more leftover UTF-16 code from compiler #128887

Merged
MichalStrehovsky merged 2 commits into dotnet:main from MichalStrehovsky:utf8del
Jun 3, 2026
@MichalStrehovsky

Member

No description provided.

Copilot AI review requested due to automatic review settings June 2, 2026 09:41

Copilot AI left a comment

Contributor

Pull request overview

This PR updates NativeAOT’s compiler name mangling to further move string-literal mangling onto Utf8String (removing more UTF-16-centric code paths), and adds a NativeAOT smoke test to validate correct handling of isolated UTF-16 surrogate literals.

Changes:

  • Add a new NativeAOT smoke test (MiscTests) that validates string literals containing isolated surrogates are preserved as single char code units.
  • Change NameMangler.GetMangledStringName(string) to return Utf8String and update call sites (e.g., NodeFactory.ConstantUtf8String) accordingly.
  • Adjust NativeAotNameMangler string-literal hashing to be primarily UTF-8-based, with a special case for literals whose UTF-8 encoding contains U+FFFD replacement bytes (for example, literals with isolated surrogates, which UTF-8 cannot represent).
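
To illustrate why the replacement-byte special case matters, here is a minimal sketch, not the code from this PR. The class name `LiteralHashSketch`, both method names, and the choice of SHA-256 are assumptions made for illustration. It relies on the documented behavior of `Encoding.UTF8`, which substitutes U+FFFD (bytes `EF BF BD`) for any surrogate it cannot encode.

```csharp
using System;
using System.Buffers.Binary;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not the actual NativeAotNameMangler implementation.
public static class LiteralHashSketch
{
    // UTF-8 encoding of U+FFFD, the replacement character.
    private static readonly byte[] s_replacementBytes = { 0xEF, 0xBF, 0xBD };

    // True when the UTF-8 form contains U+FFFD, either because the literal
    // had an isolated surrogate or because it really contained U+FFFD.
    public static bool Utf8ContainsReplacement(string literal)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(literal);
        return utf8.AsSpan().IndexOf(s_replacementBytes) >= 0;
    }

    // Hash the UTF-8 bytes when they are lossless; otherwise fall back to
    // hashing the raw UTF-16 code units so distinct literals stay distinct.
    public static byte[] HashLiteral(string literal)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(literal);
        if (utf8.AsSpan().IndexOf(s_replacementBytes) < 0)
            return SHA256.HashData(utf8);

        // Write each code unit in little-endian order (an assumption of this
        // sketch) so the result does not depend on the host byte order.
        byte[] utf16 = new byte[literal.Length * sizeof(char)];
        for (int i = 0; i < literal.Length; i++)
            BinaryPrimitives.WriteUInt16LittleEndian(utf16.AsSpan(i * 2, 2), literal[i]);
        return SHA256.HashData(utf16);
    }
}
```

In this sketch, `"\uD800"` and `"\uDBFF"` both encode to the same three UTF-8 bytes, so hashing the UTF-8 form alone would give two different literals the same name. The fallback keeps their hashes distinct.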

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/tests/nativeaot/SmokeTests/UnitTests/UnitTests.csproj | Adds the new MiscTests.cs compile item to the NativeAOT smoke tests project. |
| src/tests/nativeaot/SmokeTests/UnitTests/MiscTests.cs | New test validating isolated surrogate string literal preservation. |
| src/tests/nativeaot/SmokeTests/UnitTests/Main.cs | Wires MiscTests.Run into the smoke test runner. |
| src/coreclr/tools/Common/Compiler/NativeAotNameMangler.cs | Switches string-literal mangling cache/value to Utf8String, removes old UTF-16 helper, adds replacement-char detection + hashing adjustments. |
| src/coreclr/tools/Common/Compiler/NameMangler.cs | Updates the abstract API to return Utf8String for mangled string names. |
| src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/NodeFactory.cs | Updates UTF-8 string symbol naming to use the new Utf8String-returning mangler API. |

Comments suppressed due to low confidence (1)

src/coreclr/tools/Common/Compiler/NativeAotNameMangler.cs:632

  • The surrogate-special-case hash now uses MemoryMarshal.AsBytes(literal.AsSpan()), which hashes the in-memory UTF-16 byte order. The removed GetBytesFromString previously forced little-endian order, so this change can produce different mangled names on big-endian hosts. If big-endian builds are still supported, preserve a stable (little-endian) byte order when hashing UTF-16 data.
            lock (this)
            {
                _mangledStringLiterals.TryAdd(literal, mangledName);
            }
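
The byte-order concern in the suppressed comment can be sketched as follows. This is an illustrative comparison, not the removed `GetBytesFromString` helper, whose exact shape is not shown here. The class and method names are hypothetical.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration only.
public static class Utf16BytesSketch
{
    // Reinterprets the chars as bytes in whatever order the host uses.
    // On a big-endian host the two bytes of each code unit are swapped
    // relative to a little-endian host.
    public static byte[] ToHostOrderBytes(ReadOnlySpan<char> chars)
        => MemoryMarshal.AsBytes(chars).ToArray();

    // Writes each UTF-16 code unit low byte first, regardless of host.
    public static byte[] ToLittleEndianBytes(ReadOnlySpan<char> chars)
    {
        byte[] result = new byte[chars.Length * sizeof(char)];
        for (int i = 0; i < chars.Length; i++)
            BinaryPrimitives.WriteUInt16LittleEndian(result.AsSpan(i * 2, 2), chars[i]);
        return result;
    }
}
```

Hashing the output of `ToLittleEndianBytes` would give the same input to the hash on every host, while `ToHostOrderBytes` matches it only when `BitConverter.IsLittleEndian` is true. Whether that difference matters depends on whether big-endian hosts are supported for the compiler, which the comment leaves open.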

Comment thread src/coreclr/tools/Common/Compiler/NativeAotNameMangler.cs
@MichalStrehovsky MichalStrehovsky merged commit 5341027 into dotnet:main Jun 3, 2026
117 of 119 checks passed
@MichalStrehovsky MichalStrehovsky deleted the utf8del branch June 3, 2026 00:08
@dotnet-milestone-bot dotnet-milestone-bot Bot added this to the 11.0-preview6 milestone Jun 3, 2026