I have been writing my own x86 32-bit operating system for the past month or so. My system uses just one core.
Anyway, I have been reading a lot about memory fences, CPU optimizations, and compiler optimizations. From my understanding, the CPU can optimize/reorder reads and writes at runtime without affecting the overall behavior of a program. However, when interrupts occur at random times, and different threads share memory, these optimizations may lead to undesired behavior.
I have read that it can be smart to use memory fences in interrupt handlers to force reads/writes to complete before switching contexts.
(Please correct me if anything I have said above is incorrect)
Now, I have been tasked with writing my own mutex implementation. Aside from the CPU hardware optimizations, I want to make sure the compiler does not reorder operations from inside the lock/unlock sections to outside said sections.
I read online that the compiler knows not to reorder operations past __sync_synchronize() memory fence calls. This makes sense to me.
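To make the fence placement concrete, here is a minimal sketch assuming GCC or Clang. The names in_critical, shared_total, and add_to_total are hypothetical and only serve the illustration. __sync_synchronize() is a full barrier: the compiler will not move memory accesses across it, and it also emits a hardware fence instruction.

```c
/* Hypothetical shared state, for illustration only. */
static int in_critical;
static int shared_total;

/* Adds n to shared_total with full fences on both sides of the update. */
static void add_to_total(int n)
{
    in_critical = 1;
    __sync_synchronize();   /* accesses below cannot be moved above this point */
    shared_total += n;
    __sync_synchronize();   /* accesses above cannot be moved below this point */
    in_critical = 0;
}
```

This only demonstrates ordering. It is not a lock by itself, since nothing stops a second thread from entering while in_critical is set.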
However, my lock and unlock implementations are in their own source file. When I call lock from another file, how does the compiler know its implementation contains a memory fence? Is there some sort of __attribute__ I write in my mutex header file? Should my mutex functions all just be static inline in the header file itself?
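One possible shape for the header-only option is sketched below, assuming GCC or Clang and their __atomic builtins. The type mutex_t and the functions mutex_try_lock, mutex_lock, and mutex_unlock are hypothetical names, not an existing API. The acquire and release orderings tell the compiler not to move the protected accesses out of the locked region, even after inlining.

```c
#include <stdbool.h>

/* Hypothetical minimal mutex: 0 = unlocked, 1 = locked. */
typedef struct {
    int locked;
} mutex_t;

/* Try to take the lock once; returns true on success. */
static inline bool mutex_try_lock(mutex_t *m)
{
    /* Atomic exchange with acquire ordering: later accesses cannot be
     * reordered before it. Returns the previous value of the lock word. */
    return __atomic_exchange_n(&m->locked, 1, __ATOMIC_ACQUIRE) == 0;
}

/* Spin until the lock is taken. */
static inline void mutex_lock(mutex_t *m)
{
    while (!mutex_try_lock(m)) {
        /* Assumption: a real kernel would yield to the scheduler here
         * instead of spinning until the next timer interrupt. */
    }
}

/* Release the lock. */
static inline void mutex_unlock(mutex_t *m)
{
    /* Store with release ordering: earlier accesses cannot be
     * reordered after it. */
    __atomic_store_n(&m->locked, 0, __ATOMIC_RELEASE);
}
```

If the functions stay in their own source file instead, no attribute should be needed: without link-time optimization the compiler cannot see into the call and has to assume it may read or write any global, so it will not move accesses to such memory across the call.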
iret is serializing, but perhaps you need something in case user-space is JITting machine-code into a buffer and about to jump to it. On x86, the usual acquire/release you'd need for the kernel itself wouldn't be sufficient to make stores on one core visible to code-fetch on another without a serializing instruction somewhere, but probably iret is ok; that's something to check.
[...] -flto) might contain a full memory barrier or might do nothing. And that it potentially reads+writes any/every global var. That part is a duplicate of How does a mutex lock and unlock functions prevents CPU reordering? and maybe How compiler like GCC implement acquire/release semantics for std::mutex
[...] cli/sti section, you'd want to make sure those operations can't be moved outside of that section by a compiler.
[...] sysret to a different program counter than the one that did syscall isn't special for the CPU, it's still all one instruction stream. (So the only thing you have to worry about is compile-time reordering (as Andrey mentioned), e.g. with asm("" ::: "memory") - zero asm instructions.)
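The compile-time-only barrier mentioned in the last comment could look roughly like this, assuming GCC or Clang extended asm. The names barrier, slot_value, slot_full, produce, and consume are hypothetical. The sketch models a one-slot mailbox on a single core, where produce might run in an interrupt handler and consume in a thread.

```c
/* Compiler-only barrier: emits zero instructions, but the compiler must
 * assume all memory may have been read or written at this point. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical one-slot mailbox. */
static int slot_value;
static int slot_full;

/* Producer side: write the payload, then set the flag. */
static inline void produce(int v)
{
    slot_value = v;
    barrier();          /* payload store stays before the flag store */
    slot_full = 1;
}

/* Consumer side: returns 1 and fills *out if a value was waiting. */
static inline int consume(int *out)
{
    if (!slot_full)
        return 0;
    barrier();          /* payload load stays after the flag check */
    *out = slot_value;
    barrier();          /* payload load stays before the flag is cleared */
    slot_full = 0;
    return 1;
}
```

This relies on the single-core assumption from the question. With more than one core, the accesses themselves would need to be atomic with acquire/release ordering, not just separated by a compiler barrier.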