
I'm using the StringZilla string library, which is supposed to be close to how std::string operates. It has a flag called SZ_USE_MISALIGNED_LOADS, which defaults to on for x86 and looks like this:

/**
 *  @brief  A misaligned load can be - trying to fetch eight consecutive bytes from an address
 *          that is not divisible by eight. On x86 enabled by default. On ARM it's not.
 *
 *  Most platforms support it, but there is no industry standard way to check for those.
 *  This value will mostly affect the performance of the serial (SWAR) backend.
 */
#ifndef SZ_USE_MISALIGNED_LOADS
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
#define SZ_USE_MISALIGNED_LOADS (1) // true or false
#else
#define SZ_USE_MISALIGNED_LOADS (0) // true or false
#endif
#endif

And so I understand that on x86_64 (my platform) unaligned loads are supported. I turned on UBSAN, and I get a warning for unaligned reads. It happens when I construct a string with a char*. The StringZilla class creates a string view from my char* and then does:

void init(string_view other) noexcept(false) {
    // "other" is the string_view I passed in the constructor
    sz_ptr_t start; // after allocating memory, start == 0x7bfff5900029
    if (!_with_alloc(
            [&](sz_alloc_type &alloc) { return (start = sz_string_init_length(&string_, other.size(), &alloc)); }))
        throw std::bad_alloc();
    sz_copy(start, (sz_cptr_t)other.data(), other.size());
}

Then inside sz_copy it looks like this:

SZ_PUBLIC void sz_copy_serial(sz_ptr_t target, sz_cptr_t source, sz_size_t length) {
    // The most typical implementation of `memcpy` suffers from Undefined Behavior:
    //
    //   for (char const *end = source + length; source < end; source++) *target++ = *source;
    //
    // As NULL pointer arithmetic is undefined for calls like `memcpy(NULL, NULL, 0)`.
    // That's mitigated in C2y with the N3322 proposal, but our solution uses a design, that has no such issues.
    // https://developers.redhat.com/articles/2024/12/11/making-memcpynull-null-0-well-defined
#if SZ_USE_MISALIGNED_LOADS
    while (length >= 8) *(sz_u64_t *)target = *(sz_u64_t const *)source, target += 8, source += 8, length -= 8;
#endif
    while (length--) *(target++) = *(source++);
}

The very first copy, i.e.:

*(sz_u64_t *)target = *(sz_u64_t const *)source

"target" has the address which I assume is the result of the allocation, which was 0x7bfff5900029. UBSAN says here:

runtime error: store to misaligned address 0x7bfff5900029 for type 'sz_u64_t' (aka 'unsigned long'), which requires 8 byte alignment

My questions are:

  1. If x86_64 supports unaligned reads, and this feature is enabled by default, why does UBSAN say it's undefined behaviour? Should I disable it manually? If I disable it, the copies end up being done byte by byte in a loop.
  2. Why is the allocation to store the string even at address 0x7bfff5900029? That sounds like a ridiculous address in terms of alignment to have received from an allocation function. Usually they're 8 or 16 byte-aligned.
  3. This custom function is implemented because memcpy(NULL) is UB? Wouldn't it be better to just check for NULL and use memcpy, which is guaranteed to be much better than any manual loop?
  • As an aside, the original IBM PC had an 8088 processor with an 8-bit data bus. So "unaligned access" was not a thing when doing it a byte at a time. All its descendants are, more or less, forced to handle this feature for compatibility with older programs. Commented Nov 3 at 9:14

2 Answers


If x86_64 supports unaligned reads, and this feature is enabled by default, why does UBSAN say it's undefined behaviour. Should I disable it manually? If I disable it the copies end up being done byte for byte in a loop.

UBSAN says it's undefined behaviour because it IS undefined behaviour. The C standard says unaligned memory accesses are UB, period. Now, in reality some platforms would crash on misaligned accesses, some would read the data properly, some might read gibberish. x86 happens to read the data properly, so this library uses that knowledge to speed up its execution. That doesn't make it any less UB as defined by the standard, so UBSAN is correct here.

You could ask the library to not do that, or you could ask UBSAN to not check for unaligned memory accesses if you don't care about them (-fno-sanitize=alignment I think), or you could ask UBSAN to ignore that specific place (via UBSAN_OPTIONS=suppressions=... environment variable), or you could patch the library to insert __attribute__((no_sanitize("alignment"))).

Why is the allocation to store the string even at address 0x7bfff5900029? That sounds like a ridiculous address in terms of alignment to have received from an allocation function. Usually they're 8 or 16 byte-aligned.

Because that is probably not coming from any allocation function. sz_string_t has a small string optimization, and its internal data storage is at offset 9 (for 64 bit little endian target).

This custom function is implemented because memcpy(NULL) is UB? Wouldn't it be better just check for NULL and use memcpy, which is guaranteed to be much better than any manual loop?

Rather, check for length == 0. I would agree with you on this point. It is rather ironic that the authors cite UB as the reason for a custom implementation which itself contains UB.


5 Comments

"sz_string_t has a small string optimization, and its internal data storage is at offset 9 (for 64 bit little endian target)." - How did you know this?
I've looked at the sz_string_init_length definition and then at the sz_string_t definition
Oh interesting. UBSAN is also reporting UB at the line after it, the byte-by-byte copy (while (length--) *(target++) = *(source++);), with error: runtime error: unsigned integer overflow: 0 - 1 cannot be represented in type 'sz_size_t' (aka 'unsigned long')
That is actually not UB (unsigned integer overflow is clearly defined by the standard, unlike signed integer overflow). Though I can see why it makes sense to watch out for such overflows. The UBSAN authors realize this is a somewhat common idiom, so they provide a separate suppression key for it (see clang.llvm.org/docs/…)
Yes, I realised I can't have any of the unsigned-overflow UBSAN checks on, otherwise the log gets polluted by too much output, from Boost to SDL to pretty much everything. So I have to turn them all off. Shame, it could have been handy to catch some bugs.

If x86_64 supports unaligned reads, and this feature is enabled by default, why does UBSAN say it's undefined behaviour. [...]

It is not enabled by default in the compiler. GCC assumes all loads and stores are aligned, even when the target architecture supports misaligned access, so this is undefined behaviour. To use misaligned reads, you must use compiler extensions:

using misaligned_sz_u64_t [[gnu::aligned(1)]] = sz_u64_t;
// Or in C23
typedef sz_u64_t misaligned_sz_u64_t [[gnu::aligned(1)]];
// Or in C89
typedef sz_u64_t misaligned_sz_u64_t __attribute__((__aligned__(1)));


    while (length >= 8) *(misaligned_sz_u64_t *)target = *(misaligned_sz_u64_t const *)source, target += 8, source += 8, length -= 8;

And UBSan no longer complains. This is a bug in the string library.

If I disable it the copies end up being done byte for byte in a loop.

Looking at the code generation for just the single byte loop, the compiler already moves 8 bytes at a time until the length becomes less than 8. This is an optimisation known as vectorisation.

Why is the allocation to store the string even at address 0x7bfff5900029? That sounds like a ridiculous address in terms of alignment to have received from an allocation function. Usually they're 8 or 16 byte-aligned.

Standard C++ allocation functions (std::malloc, operator new) return memory aligned to at least alignof(std::max_align_t); usually they are aligned slightly more. What is probably happening is that there is some preceding data (either data used by the library, like a size or capacity field, or this is being called to copy into the middle of a string with preceding data).

This custom function is implemented because memcpy(NULL) is UB? Wouldn't it be better just check for NULL and use memcpy, which is guaranteed to be much better than any manual loop?

It won't 'always' be better, especially since you may know more about the data than a generic library function does. For example, if you know your pointers are always going to be aligned to 64 bytes and your length is always going to be a multiple of 64, you could write a slightly more efficient loop.

However, in this case, memcpy is most likely going to be better, and it could be written like this for the same effect:

inline SZ_PUBLIC void sz_copy_serial(sz_ptr_t target, sz_cptr_t source, sz_size_t length) {
    if (length) [[likely]]
        memcpy(target, source, length);
}

This answer assumes GCC is the compiler used, but Clang also makes similar assumptions and also supports [[gnu::aligned(N)]]. Clang-cl and MSVC can use the __unaligned extension, and other compilers may have similar extensions. Misaligned writes are not standard C++ and compilers can (and do) assume they never happen.

1 Comment

I like and dislike your [[gnu::aligned(1)]] idea at the same time. On one hand, it does the right thing UB-wise. It's the same resulting code without relying on UB code being compiled a certain way by the compiler (even though there are countless examples of production code making exactly the same assumption). But on the other hand, it is now "we are relying on compiler-extension code being compiled a certain way". If for some unfathomable reason this misaligned code happens to get compiled on a wrong platform, you'd get a silent performance regression instead of a loud crash.
