Fix global-buffer-overflow in Base64Decode on bytes >= 0x80#7200
Fix global-buffer-overflow in Base64Decode on bytes >= 0x80#7200groeneai wants to merge 1 commit into
Conversation
|
Thank you for your contribution @groeneai! We will review the pull request and get back to you soon. |
There was a problem hiding this comment.
Pull request overview
Fixes a signed-char indexing bug in azure-core Base64 decoding that could trigger an out-of-bounds read when decoding bytes with the high bit set (>= 0x80). The change ensures bytes are treated as unsigned char before indexing the 256-entry reverse lookup table, and extends unit tests to cover these malformed inputs.
Changes:
- Cast Base64 input bytes to
unsigned charbefore indexingBase64DecodeArrayin both the 4-byte decode helper and the trailing-group decode path. - Extend
Base64.InvalidDecodeto include high-bit bytes in each position for the tail path and in the loop path, verifying cleanstd::runtime_errorbehavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| sdk/core/azure-core/src/base64.cpp | Prevents negative indexing into Base64DecodeArray by reading input bytes as unsigned before lookup. |
| sdk/core/azure-core/test/ut/base64_test.cpp | Adds invalid-decode coverage for bytes >= 0x80 across both decode paths to guard against regressions and ASan-detected OOB reads. |
Base64DecodeArray is an int8_t[256] table indexed by the input byte; invalid bytes map to a -1 sentinel that the existing `< 0` checks reject. Two defects allowed a malformed Base64 string to misbehave before those checks ran: 1. Out-of-bounds read. The input bytes were read through `const char*`, which is signed on most platforms, so any byte >= 0x80 sign-extended to a negative index and read out of bounds of the global table (AddressSanitizer: global-buffer-overflow, READ of size 1). Fixed by reading each input byte as unsigned char before indexing, so the index is always in [0, 255]. 2. Undefined left shift of a negative value. After an invalid byte mapped to the -1 sentinel, the decoder executed `i0 <<= 18` (and the i1/i2 shifts) before checking for the sentinel. Left-shifting a negative value is undefined behavior prior to C++20; builds with -fsanitize=undefined -fno-sanitize-recover=all abort on it. Fixed by rejecting the -1 sentinel before any shift, in both the 4-byte helper int32_t Base64Decode(const char*) and the trailing-group path of std::vector<uint8_t> Base64Decode(const std::string&). Malformed input now throws "Unexpected character in Base64 encoded string" cleanly on every code path. Strengthen Base64.InvalidDecode to exercise an invalid byte (high-bit 0x80, 0xC0, 0xFF and low-value 0x01, 0x7F) in every position of both the single-group tail path and the multi-group loop path. Pre-fix these inputs read out of bounds under ASan and abort with "left shift of negative value" under UBSan; post-fix they throw cleanly, and valid Base64 still decodes unchanged.
460bed4 to
0a3457c
Compare
…efined shift The vendored Azure SDK Base64Decode mishandles invalid Base64 input in two ways, both reachable in ClickHouse from azureBlobStorage() with a malformed account_key (StorageSharedKeyCredential -> SharedKeyPolicy::GetSignature -> Convert::Base64Decode(GetAccountKey())). It surfaced under the stress-test query fuzzer and recurs across unrelated PRs; it is a latent SDK bug, not PR-specific. 1. Out-of-bounds read. Input bytes were read through 'const char*' and used to index Base64DecodeArray (int8_t[256]). On platforms where char is signed, any byte >= 0x80 sign-extended to a negative index and read out of bounds of the global table (AddressSanitizer: global-buffer-overflow, READ of size 1). 2. Undefined left shift. After an invalid byte mapped to the table's -1 sentinel, the decoder shifted it (i0 <<= 18, etc.) before checking for the sentinel. Left-shifting a negative value is undefined behavior prior to C++20; builds with -fsanitize=undefined -fno-sanitize-recover=all abort on it. Bump contrib/azure to the fix commit, which reads each input byte as unsigned char before indexing and rejects the -1 sentinel before any shift, so malformed input throws cleanly on every code path. A strengthened Base64.InvalidDecode test exercises an invalid byte in every position of both decode paths. The same fix is submitted to the vendored fork (ClickHouse/azure-sdk-for-cpp#36) and to Microsoft upstream (Azure/azure-sdk-for-cpp#7200).
|
@groeneai please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.
Contributor License AgreementContribution License AgreementThis Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
|
Bug
sdk/core/azure-core/src/base64.cppmishandles invalid Base64 input in two ways, both before the existing validity check runs:const char*and used to indexBase64DecodeArray, anint8_t[256]reverse-lookup table.charis signed on most platforms, so any byte>= 0x80sign-extends to a negative index and reads out of bounds of the global table. AddressSanitizer reportsglobal-buffer-overflow, READ of size 1.-1sentinel, but the decoder shifts the looked-up value (i0 <<= 18, etc.) before checking for the sentinel. Left-shifting a negative value is undefined behavior prior to C++20 (the standard this library targets);-fsanitize=undefinedreportsleft shift of negative value.Both defects are present in two places:
int32_t Base64Decode(const char*)std::vector<uint8_t> Base64Decode(const std::string&)Fix
unsigned charbefore indexing, so the index is always in[0, 255].-1sentinel before any shift, in both decode paths.Malformed input now throws
std::runtime_error("Unexpected character in Base64 encoded string")cleanly on every code path, instead of reading out of bounds or shifting a negative value. Valid Base64 decodes unchanged.Test
Extended
Base64.InvalidDecodeto exercise an invalid byte (high-bit0x80/0xC0/0xFFand low-value0x01/0x7F) in every position of both the single-group tail path and the multi-group loop path.Validated with AddressSanitizer + UndefinedBehaviorSanitizer (clang,
-fno-sanitize-recover=all): pre-fix the inputs read out of bounds and abort on the shift under C++17; post-fix they throw cleanly under C++17/20/23 and all existing valid/invalid decode cases still pass.