Mutex Implementations and Memory Fences in C


Asked 7 months ago

Modified 7 months ago

Viewed 104 times

I have been writing my own x86 32-bit operating system for the past month or so. My system uses just one core.

Anyway, I have been reading a lot about memory fences, CPU optimizations, and compiler optimizations. From my understanding, the CPU can optimize/reorder reads and writes at runtime without affecting the overall behavior of a program. However, when interrupts occur at random times, and different threads share memory, these optimizations may lead to undesired behavior.

I have read that it can be smart to use memory fences in interrupt handlers to force reads/writes to complete before switching contexts.

(Please correct me if anything I have said above is incorrect)

Now, I have been tasked with writing my own mutex implementation. Aside from the CPU hardware optimizations, I want to make sure the compiler does not reorder operations from inside the lock/unlock sections to outside said sections.

I read online that the compiler knows not to reorder operations past __sync_synchronize() memory fence calls. This makes sense to me.

However, my lock and unlock implementations are in their own source file. When I call lock from another file, how does the compiler know its implementation contains a memory fence? Is there some sort of __attribute__ I write in my mutex header file? Should my mutex functions all just be static inline in the header file itself?

asked May 4 at 7:27

c.abate


1

This is a total non-issue on a single-core machine. The CPU always maintains the illusion of running instructions one at a time, from the perspective of code running on this core. Including interrupt handlers. iret is serializing, but perhaps you need something in case user-space is JITting machine-code into a buffer and about to jump to it. On x86, the usual acquire/release you'd need for the kernel itself wouldn't be sufficient to make stores on one core visible to code-fetch on another without a serializing instruction somewhere, but probably iret is ok; that's something to check.

– Peter Cordes

2025-05-04 07:42:51 +00:00
Commented May 4 at 7:42
"When I call lock from another file, how does the compiler know its implementation contains a memory fence?" It has to assume every opaque function call (i.e. one defined in a separate file and compiled without -flto) might contain a full memory barrier or might do nothing, and that it potentially reads and writes any or every global variable. That part is a duplicate of "How does a mutex lock and unlock functions prevents CPU reordering?" and maybe "How compiler like GCC implement acquire/release semantics for std::mutex".

– Peter Cordes

2025-05-04 07:49:42 +00:00
Commented May 4 at 7:49
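
The point about opaque calls can be illustrated with a minimal sketch, assuming GCC or Clang and the legacy __sync builtins that the question mentions. The names mutex_t, mutex_lock, mutex_unlock, shared_counter and increment_shared are hypothetical, and everything is shown in one listing only for brevity; in a real build the part marked mutex.c would be its own translation unit, so the caller sees nothing but plain prototypes.

```c
/* ---- mutex.h (hypothetical): plain prototypes, no attributes ---- */
typedef struct { volatile int locked; } mutex_t;
void mutex_lock(mutex_t *m);
void mutex_unlock(mutex_t *m);

/* ---- caller.c (hypothetical) ---- */
mutex_t counter_lock;   /* zero-initialized, i.e. unlocked */
int shared_counter;     /* external linkage: any opaque call may touch it */

int increment_shared(void)
{
    mutex_lock(&counter_lock);
    /* When mutex_lock() lives in another file (and LTO is off), the
       compiler cannot see its body. It must assume the call may read or
       write shared_counter, so this access stays between the two calls. */
    int value = ++shared_counter;
    mutex_unlock(&counter_lock);
    return value;
}

/* ---- mutex.c (hypothetical): a separate file in a real build ---- */
void mutex_lock(mutex_t *m)
{
    /* Atomic exchange with acquire semantics; loops while the lock is
       already held. A real kernel would block or yield here instead. */
    while (__sync_lock_test_and_set(&m->locked, 1))
        ;
}

void mutex_unlock(mutex_t *m)
{
    /* Writes 0 with release semantics. */
    __sync_lock_release(&m->locked);
}
```

Nothing in the header tells the compiler about the barrier; the ordering comes from the call being opaque. Moving the functions into the header as static inline would also work, because the builtins themselves are then visible at the call site.
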
Single-core makes things easier for sure, but compiler optimizations could still be an issue when interoperating with interrupt handlers. A bit of fencing is still needed, e.g. if you protect your interrupt-driven ring buffer operations by placing them inside a cli/sti section, you'd want to make sure those operations can't be moved outside of that section by the compiler.

– Andrey Turkin

2025-05-04 11:33:28 +00:00
Commented May 4 at 11:33
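
A rough sketch of the cli/sti case described above, assuming GCC or Clang inline asm syntax. All names here (irq_disable, irq_enable, ring_push, ring_pop, RING_SIZE, KERNEL_BUILD) are hypothetical. The "memory" clobber on the asm statements is what keeps the compiler from moving the buffer accesses out of the protected section.

```c
#include <stdbool.h>
#include <stdint.h>

/* KERNEL_BUILD is an assumed macro. Without it the wrappers are pure
   compiler barriers, so the sketch also compiles and runs as a normal
   user-space program, where cli/sti would fault. */
#ifdef KERNEL_BUILD
static inline void irq_disable(void) { __asm__ volatile("cli" ::: "memory"); }
static inline void irq_enable(void)  { __asm__ volatile("sti" ::: "memory"); }
#else
static inline void irq_disable(void) { __asm__ volatile("" ::: "memory"); }
static inline void irq_enable(void)  { __asm__ volatile("" ::: "memory"); }
#endif

#define RING_SIZE 8u   /* power of two, so indices wrap with a mask */

uint8_t  ring_data[RING_SIZE];
unsigned ring_head;    /* next slot to write (interrupt handler side) */
unsigned ring_tail;    /* next slot to read (thread side) */

/* Meant to be called from the interrupt handler, interrupts already off. */
bool ring_push(uint8_t byte)
{
    if (ring_head - ring_tail == RING_SIZE)
        return false;                          /* buffer full */
    ring_data[ring_head++ & (RING_SIZE - 1)] = byte;
    return true;
}

/* Called from thread code. The clobbers keep every buffer access
   between irq_disable() and irq_enable(). */
bool ring_pop(uint8_t *out)
{
    bool ok = false;
    irq_disable();
    if (ring_head != ring_tail) {
        *out = ring_data[ring_tail++ & (RING_SIZE - 1)];
        ok = true;
    }
    irq_enable();
    return ok;
}
```

This sketch re-enables interrupts unconditionally; code that can be entered with interrupts already disabled would need to save and restore the interrupt flag instead.
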
1

Yes, exactly. The cardinal rule of out-of-order exec is "don't break single-threaded code". For an OS doing context switches, it's really "don't break single-core code", e.g. later loads will reload earlier stores by snooping the store buffer; the CPU doesn't care about software threads. sysret to a different program counter than the one that did syscall isn't special for the CPU, it's still all one instruction stream. So the only thing you have to worry about is compile-time reordering (as Andrey mentioned), which you can block with e.g. asm("" ::: "memory"), a compiler barrier that emits zero asm instructions.

– Peter Cordes

2025-05-04 16:22:20 +00:00
Commented May 4 at 16:22
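
To make the compile-time-only barrier concrete, here is a small sketch, again assuming GCC or Clang. The names compiler_barrier, post_message, handle_interrupt, message and message_ready are hypothetical, and handle_interrupt is an ordinary function standing in for a real interrupt handler on the same core.

```c
#include <stdint.h>

/* Emits no instructions. It only forbids the compiler from moving
   memory accesses across this point or keeping values cached in
   registers over it. */
#define compiler_barrier() __asm__ volatile("" ::: "memory")

uint32_t message;        /* payload written by thread code */
int      message_ready;  /* flag checked by the interrupt handler */

/* Thread code: publish the payload first, then the flag. */
void post_message(uint32_t value)
{
    message = value;
    compiler_barrier();  /* payload store is emitted before the flag store */
    message_ready = 1;
    compiler_barrier();  /* flag store is not delayed past this point */
}

/* Stand-in for an interrupt handler. Returns 1 and copies the payload
   if a message was pending, otherwise returns 0. */
int handle_interrupt(uint32_t *out)
{
    if (!message_ready)
        return 0;
    compiler_barrier();  /* payload is read only after the flag was seen */
    *out = message;
    message_ready = 0;
    return 1;
}
```

On a single core this is sufficient because the CPU keeps its own instruction stream consistent, as the comment explains; on a multi-core system the same handoff would also need hardware-level acquire/release ordering.
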
1

See my answer at Store/Load ordering around function call. In short: no, you don't need attributes or inlining or anything like that. The language guarantees the semantics of barriers even when they're inside a called function, and the compiler must provide that. As such, the compiler must default to not reordering memory access around a function call, unless it can see inside the called function and prove that it does not contain a barrier.

– Nate Eldredge

2025-05-08 14:45:20 +00:00
Commented May 8 at 14:45
