4,046 questions
5
votes
1
answer
89
views
What is the performance effect (on x64) of __atomic_fetch_add that ignores its result?
My code is
...
fragment1 // compares several regions in D1$ to D1$/D3$
__atomic_fetch_add(&lock,-1,__ATOMIC_ACQ_REL); // stmt A
fragment2 // moves several regions from D1$/D3$ to D1$
...
4
votes
0
answers
168
views
Is it impossible that the acquire load returns `1` when the loops in other threads exit?
Consider this example:
#include <atomic>
#include <cassert>
#include <thread>
int main() {
std::atomic<int> strong = {3};
std::atomic<int> weak = {1};
auto t1 ...
2
votes
0
answers
122
views
Is it possible that the assertion can fail with memory_order::relaxed to transfer pointers?
Consider this example:
#include <iostream>
#include <atomic>
#include <thread>
#include <cassert>
int main(){
std::atomic<int> val = 1;
std::atomic<std::atomic<...
0
votes
2
answers
140
views
compare_exchange_strong failed to update the expected value
I am trying to implement a lock-free multiple-producer-single-consumer ring buffer in C++. Here is the full definition and the test code.
#include <iostream>
#include <memory>
#include <...
3
votes
1
answer
151
views
Do methods on the ECMAScript Atomics object enforce that all prior shared memory operations are completed first?
The ECMAScript Language Specification states:
Atomics are carved in stone: Program transformations must not cause any Shared Data Block events whose [[Order]] is seq-cst to be removed from the is-...
1
vote
2
answers
204
views
Does this execution violate the observable behavior if ignoring the OOTA?
Consider this example:
#include <iostream>
#include <thread>
#include <atomic>
int main(){
std::atomic<int> val = 0;
std::atomic<bool> flag = false;
auto t1 = std::...
1
vote
1
answer
196
views
How to benchmark atomic<int> vs atomic<size_t>?
I have a bounded queue with a small size that definitely fits in int, so I want to use atomic<int> instead of atomic<size_t> for the index/counter; since int is smaller, it should be faster.
...
0
votes
1
answer
331
views
Can external IO operations be considered as if seq_cst operations in the reasoning of multithreaded programs?
Consider this example:
// thread A:
start_transaction();
update_mysql();
commit_transaction(); // remove "key" from mysql tables
remove_redis_cache("key");
// thread B:
std::...
2
votes
1
answer
146
views
Why is std::atomic<uint64_t>{}.is_lock_free() true when targeting x86 (32-bit platform) in Visual Studio? [duplicate]
I noticed that std::atomic<uint64_t>{}.is_lock_free() returns true even if I switch the target platform to x86 in Visual Studio. I also checked the disassembly of a uint64_t assignment, as shown below.
...
7
votes
1
answer
152
views
Rename temporary file after closing its file descriptor in Python
I want to atomically write files in Python. pathlib and tempfile should be used.
I have
import os
from pathlib import Path
import tempfile
def atomic_write(f: Path, data: bytes) -> None:
with ...
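A minimal sketch of the pattern this question describes (write to a temporary file, close its descriptor, then rename it over the target), assuming a filesystem where a rename within one directory is atomic. The function name `atomic_write_sketch` and the use of `tempfile.mkstemp` are illustrative assumptions and are not taken from the asker's truncated code.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_sketch(target: Path, data: bytes) -> None:
    """Hypothetical helper: write data to a temp file, then rename it over target."""
    # Create the temp file in the target's own directory so the final
    # rename never has to cross a filesystem boundary.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name + ".")
    tmp_path = Path(tmp_name)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Ask the OS to push the bytes to disk before the rename.
            os.fsync(tmp.fileno())
        # The descriptor is closed here; now swap the file into place.
        os.replace(tmp_path, target)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        tmp_path.unlink(missing_ok=True)
        raise
```

Readers of `target` see either the old contents or the new contents, never a partially written file, because the rename is the only step that touches the final path.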
Advice
0
votes
4
replies
141
views
Using lockless atomic operations instead of a mutex
I recently had an interview where I was asked to show how to prevent race conditions in C or C++ between two threads operating on shared data. I used a mutex as follows:
pthread_mutex_t mutex;
int ...
2
votes
1
answer
90
views
Django transaction.atomic() on single operation prevents race conditions?
Why do I need to use atomic() when I have only one DB operation inside the atomic block? My AI assistant tells me that it prevents race conditions, but I don't use select_for_update() inside. It says that db ...
5
votes
0
answers
230
views
Why does this data race have some consistent invariants with writers updating one of three atomic<int> variables?
I have the following program. The relevant info is:
There are 3 variables atomic<int> x,y,z accessed by all threads.
3 writer threads: Each thread reads all 3 values x, y, z and updates exactly 1 ...
-1
votes
0
answers
223
views
+50
Do the RMW operations on `cnt` still not avoid an inconsistent status for this multiple-producer single-consumer implementation?
Looking at this implementation of multiple-producer single-consumer, which was the implementation in Rust's standard library; however, its memory order model is derived from C++. So, it should be ...
0
votes
1
answer
164
views
Does an implementation that reorders evaluation in a single thread violate [intro.execution] p8?
[intro.execution] p8 says:
Given any two evaluations A and B, if A is sequenced before B (or, equivalently, B is sequenced after A), then the execution of A shall precede the execution of B.
...
3
votes
1
answer
246
views
Which memory ordering to use in lockless linked-list stack pop() implementation?
I stumbled into an interesting issue: when it comes to intrusive lockless stacks (singly linked lists), there seems to be a consensus on what push() should look like. All internet/AI searches (and ...
3
votes
1
answer
184
views
Size of the state table in std::atomic wait implementation
While looking through implementations of the std::atomic<T>::wait, I've found that most of them used a simple hash table for mapping the state for each atomic location.
libcxx
static constexpr ...
0
votes
1
answer
178
views
Is this a conforming observable behavior in the abstract machine's sense, where the load reads a value that is not currently produced
Consider this example:
#include <atomic>
#include <iostream>
#include <chrono>
#include <thread>
#include <cassert>
int main(){
std::atomic<int> val = {0};
...
10
votes
1
answer
290
views
Why does the memory order need to be Acquire in a single consumer linked-list queue when comparing pointer values?
This is a multi-producer single-consumer implementation translated from Rust; for this language-lawyer question, I rewrote it in C++.
template<class T>
struct Node{
std::atomic<Node*> ...
2
votes
0
answers
93
views
Too big a latency of ping-pong between two IPC processes on Sapphire Rapids Xeon with plain loads and stores, instruction order makes a big difference
I am running simple Ping/Pong between two processes A, B with shared memory:
shm_A and shm_B are in separate cache lines. Allocated with separate calls to shm_open, so probably in different pages, ...
7
votes
1
answer
108
views
Can a channel's Drop omit Acquire ordering, as in the Rust Atomics and Locks book?
In Rust Atomics and Locks chapter 5 (available online for free), this example implementation of a one-time channel is presented:
pub struct Channel<T> {
pub message: UnsafeCell<...
1
vote
1
answer
189
views
Is sequentially consistent memory ordering strictly necessary in this readers-writers lock using only load/store, not RMW?
Consider this outline of a simple multi-threaded application. It has one writer thread, and ten reader threads.
#include <atomic>
#include <thread>
const int Num_readers{...
6
votes
0
answers
451
views
How to formally prove that a statement after a spin loop isn't executed unless another thread exchanged first, with relaxed atomic exchange + store
Consider this example:
#include <atomic>
#include <thread>
#include <cassert>
int main(){
std::atomic<int> v = 0;
std::atomic<bool> flag = false;
std::thread ...
3
votes
1
answer
265
views
Is reordering really a useful concept for multithread program reasoning?
Consider this typical example:
// Thread 1:
r1 = y.load(std::memory_order_relaxed); // A
x.store(r1, std::memory_order_relaxed); // B
// Thread 2:
r2 = x.load(std::memory_order_relaxed); // C
y.store(...
2
votes
2
answers
271
views
Can I infer the execution relationship between two evaluations across two threads in this way?
Consider this example:
std::atomic<bool> flag = false;
int arr[2] = {};
// thread 1:
arr[0] = 1; // A
flag.store(true,std::memory_order::relaxed); // B
// thread 2:
while(!flag.load(std::...
5
votes
0
answers
202
views
Why is std::atomic<T> larger than T itself for user-defined structs on MSVC but not on GCC/Clang?
I was checking the size of std::atomic compared to T on different platforms (Windows/MSVC, Linux/GCC, Android/Clang).
For intrinsic types (like int, int64_t, etc.), the size of std::atomic matches the ...
6
votes
2
answers
321
views
Is it a conforming observable behavior that a later acquired time point is less than an earlier acquired one?
#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>
int main() {
std::atomic<int> flag = {0};
auto t1 = std::thread([&]() {
...
0
votes
0
answers
92
views
How does a failed spinlock CAS affect out-of-order speculation and RMW reordering on weak memory architectures?
I’m trying to understand how speculative execution interacts with weak memory models (ARM/Power) in the context of a spinlock implemented with a plain CAS. Example:
// Spinlock acquisition attempt
if (...
1
vote
1
answer
149
views
If `std::atomic_thread_fence(std::memory_order_acquire);` doesn't have an "associated atomic operation"... how does the fence get anchored, and to what?
An acquire-like load... will keep everything (both stores and loads) BELOW the load/fence.
But this doesn't mean that everything ABOVE/before the acquire-load will not move below...
This means that ...
-2
votes
1
answer
147
views
Is there a data visibility issue here?
class Sample
{
int a = 0;
public void Run()
{
// Main thread. Assuming this is a Chromium task runner.
auto currentRunner = GetCurrentDefault();
somePooledRunner->PostTask(
[...
0
votes
0
answers
135
views
I still don't quite understand the difference between memory_order_acq_rel and memory_order_seq_cst?
I have read some Q&As about these two operations, but I still don't understand.
acquire-release-versus-sequentially-consistent-memory-order
Can I understand the difference between memory_order_acq_rel and ...
2
votes
0
answers
116
views
Is it possible on any real hardware, for the updated value of an atomic integer to become visible earlier via an indirect path than via a direct path?
Is it possible on any real hardware in the real world, for the updated value of an atomic integer written by one thread to become visible to another thread earlier via an indirect path, where a third ...
2
votes
1
answer
131
views
Is it possible to use non-paired Acquire/Release memory orders?
I’ve spent several hours studying memory orderings, but I still have some contradictions in my head. One of them concerns the Acquire/Release memory orders.
Currently, my understanding is:
No ...
0
votes
1
answer
143
views
Why does Android SystemProperties use memory barriers like this?
I was reading the implementation of Android's system properties, and I am confused about why the barriers are used this way.
I am looking at bionic/libc/system_properties/system_properties.cpp with ...
1
vote
0
answers
177
views
Is accessing 4-byte boundary around variable undefined behaviour? (needed for futex wait on a byte)
Since C++20, the standard library has std::atomic<uint8_t>::wait and std::atomic<uint8_t>::notify_one/all. However, these are not suitable for me, as they lack advanced features (e.g. ...
0
votes
1
answer
97
views
When an atomic variable becomes visible to a thread other than the writing thread, is it also immediately globally visible?
Suppose I have three threads. If x was written by thread2 and x is visible to thread1, do I have the guarantee that the latest value of x is also visible to thread3? In other words, can the new value ...
2
votes
2
answers
224
views
Can the hardware reorder an atomic load followed by an atomic store, if the store is conditional on the load?
Can the hardware reorder an atomic load followed by an atomic store, if the store is conditional on the load? It would be highly unintuitive if this could happen, because if thread1 speculatively due ...
2
votes
1
answer
95
views
Is there a seq_cst sequence between different parts of an atomic object when atomic operations with different sizes mixed?
Updated:
I already know that this is UB in ISO C; I apologize for the vague statement I made earlier.
This question originates from my previous question
Can atomic operations of different sizes be ...
3
votes
1
answer
112
views
Atomics.wait - `while(true)` or recursive function no output on stdout
I am trying to get comfortable with Atomics in Node.js.
For that, I created a very simple test with 2 worker threads:
one that waits for a notify, and one that notifies the other.
main.js
const { Worker } = ...
2
votes
1
answer
205
views
Can atomic operations of different sizes be mixed?
For the same memory address, if I use atomic operations of different widths to operate on it (assuming the memory is aligned), for example (assuming the hardware supports 128-bit atomic operations):
#...
2
votes
1
answer
156
views
Interlocked.* code section guard with minimal inter-core interference?
In order to guard a code section against repeat or concurrent execution we can use Interlocked functionality. Guarding against repeat execution is necessary for things like Dispose(), and guarding ...
0
votes
2
answers
227
views
Why can't an acquire barrier stop reordering around a branch?
I was testing the behavior of the control dependencies in LINUX KERNEL MEMORY BARRIERS, and had a problem with the location of the fence.
I was testing this on AArch64 on a Qualcomm Snapdragon 835, ...
1
vote
0
answers
112
views
How to Portably Use std::atomic Inside a Union Across Platforms (MSVC/Clang on Windows/macOS/Linux)?
I'm working on a cross-platform data structure and trying to define a compact union-based layout that allows atomic access to a 64-bit word, while also optionally accessing the lower 32-bit fields.
I ...
10
votes
1
answer
903
views
Is a C++ TriviallyCopyable class effectively a C struct?
When coding with std::atomic, CAS, etc., I always struggle to remember the definition of a C++ class being "TriviallyCopyable".
Now that I am gradually switching to the C world, I accidentally found ...
6
votes
0
answers
247
views
Why load and exchange an std::atomic<bool>?
In P2300, the "1.4. Asynchronous Windows socket recv" example uses a pattern to mark completion (of setting the cancellation callback) that looks like this:
if (ready.load(std::...
2
votes
1
answer
117
views
Strange behaviour of atomicCAS when used as a mutex
I'm trying to learn CUDA programming, and recently I have been working on the lectures in this course: https://people.maths.ox.ac.uk/~gilesm/cuda/lecs/lec3.pdf, where they discussed the atomicCAS ...
2
votes
0
answers
95
views
Can I use load(Acquire) + read data + compare_exchange_weak(Relaxed, Acquire) in a concurrent ring buffer?
I've been studying several implementations of SPMC (single producer, multiple consumer) ring buffers. In many of them, I find the memory orderings to be quite conservative—often stronger than what ...
2
votes
1
answer
152
views
This MPSC Queue (Multi Producer Single Consumer Queue) keeps on waiting in the consumer side sometimes although I have used CAS operations
I have added a CAS operation to the enqueue function. Since I ...
5
votes
1
answer
116
views
minimum required atomic instructions to support C++11 concurrency libraries
I'm implementing a multi-core system consisting of several custom/specialty CPUs. Those CPUs need to be able to support the C++11 concurrency libraries (thread/mutex etc.).
I'm not sure what kind of ...
1
vote
1
answer
200
views
Cross-platform 128-bit atomic support: std::atomic vs std::atomic_ref on Clang/MSVC (macOS ARM64, Windows x64, Linux)
Background
I'm building a cross-platform atomic abstraction layer to support 64-bit and 128-bit atomic operations for the following types:
int64_t, uint64_t
__int128 (on Clang platforms)
A custom ...