Newest 'atomic' Questions

5 votes

1 answer

89 views

What is the performance effect (on x64) of __atomic_fetch_add that ignores its result?

My code is ... fragment1 // compares several regions in D1$ to D1$/D3$ __atomic_fetch_add(&lock,-1,__ATOMIC_ACQ_REL); // stmt A fragment2 // moves several regions from D1$/D3$ to D1$ ...

Henry Rich

91

asked yesterday

4 votes

0 answers

168 views

Is it impossible that the acquire load returns `1` when the loops in other threads exit?

Consider this example: #include <atomic> #include <cassert> #include <thread> int main() { std::atomic<int> strong = {3}; std::atomic<int> weak = {1}; auto t1 ...

xmh0511

7,628

asked 2 days ago

2 votes

0 answers

122 views

Is it possible that the assertion can fail with memory_order::relaxed to transfer pointers?

Consider this example: #include <iostream> #include <atomic> #include <thread> #include <cassert> int main(){ std::atomic<int> val = 1; std::atomic<std::atomic&...

xmh0511

7,628

asked Nov 26 at 9:51

0 votes

2 answers

140 views

compare_exchange_strong failed to update the expected value

I am trying to implement a lock-free multiple-producer-single-consumer ring buffer in C++. Here is the full definition and the test code. #include <iostream> #include <memory> #include <...

God_of_Thunder

783

asked Nov 18 at 4:51

3 votes

1 answer

151 views

Do methods on the ECMAScript Atomics object enforce that all prior shared memory operations are completed first?

The ECMAScript Language Specification states: Atomics are carved in stone: Program transformations must not cause any Shared Data Block events whose [[Order]] is seq-cst to be removed from the is-...

James Page

31

asked Nov 13 at 0:34

1 vote

2 answers

204 views

Does this execution violate the observable behavior if ignoring the OOTA?

Consider this example: #include <iostream> #include <thread> #include <atomic> int main(){ std::atomic<int> val = 0; std::atomic<bool> flag = false; auto t1 = std::...

xmh0511

7,628

asked Nov 11 at 5:52

1 vote

1 answer

196 views

How to benchmark atomic<int> vs atomic<size_t>?

I have a small bounded queue whose size definitely fits in an int. So I want to use atomic<int> instead of atomic<size_t> for the index/counter; since int is smaller, it should be faster. ...

Huy Le

1,989

asked Nov 11 at 3:12

0 votes

1 answer

331 views

Can external IO operations be considered as if seq_cst operations in the reasoning of multithreaded programs?

Consider this example: // thread A: start_transaction(); update_mysql(); commit_transaction(); // remove "key" from mysql tables remove_redis_cache("key"); // thread B: std::...

xmh0511

7,628

asked Nov 7 at 6:32

2 votes

1 answer

146 views

Why is std::atomic<uint64_t>{}.is_lock_free() true when targeting x86 (32-bit platform) in Visual Studio? [duplicate]

I noticed std::atomic<uint64_t>{}.is_lock_free() returns true even if I switch the target platform to x86 in Visual Studio. I also checked the disassembly of a uint64_t assignment like below. ...

Tinggo

1,167

asked Nov 4 at 10:01

7 votes

1 answer

152 views

Rename temporary file after closing its file descriptor in Python

I want to atomically write files in Python. pathlib and tempfile should be used. I have import os from pathlib import Path import tempfile def atomic_write(f: Path, data: bytes) -> None: with ...

os_user

100

asked Nov 3 at 0:56

Advice

0 votes

4 replies

141 views

Using lockless atomic operations instead of a mutex

I recently had an interview where I was asked to show how to prevent race conditions in C or C++ between two threads operating on shared data. I used a mutex as follows: pthread_mutex_t mutex; int ...

Engineer999

4,159

asked Nov 1 at 22:24
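
As a rough illustration of the mutex-versus-atomics question above, the sketch below replaces a lock around a single shared counter with an atomic read-modify-write. It assumes the shared data is one integer; the names `shared_counter`, `add_many`, and `run_two_threads` are hypothetical and do not come from the question. Shared state that spans several variables would generally still need a lock or a more careful design.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared counter; illustrative only, not taken from the question.
std::atomic<int> shared_counter{0};

// Performs `iterations` atomic increments. fetch_add is one indivisible
// read-modify-write, so no mutex is needed to protect this single variable.
void add_many(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        shared_counter.fetch_add(1, std::memory_order_relaxed);
    }
}

// Runs two incrementing threads and returns the final count.
int run_two_threads(int iterations) {
    assert(iterations >= 0);
    shared_counter.store(0, std::memory_order_relaxed);
    std::thread a(add_many, iterations);
    std::thread b(add_many, iterations);
    // join() synchronizes with the end of each thread, so the load below
    // observes every increment even though the operations are relaxed.
    a.join();
    b.join();
    return shared_counter.load(std::memory_order_relaxed);
}
```

Relaxed ordering is enough here only because the counter is the sole shared data; if the counter were used to publish other memory, release/acquire ordering would be needed.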

2 votes

1 answer

90 views

Django transaction.atomic() on single operation prevents race conditions?

Why do I need to use atomic() when I have only one DB operation inside the atomic block? My AI assistant tells me that it prevents race conditions, but I don't use select_for_update() inside. It tells me that db ...

Alex

66

asked Oct 29 at 9:59

5 votes

0 answers

230 views

Why does this data race have some consistent invariants with writers updating one of three atomic<int> variables?

I have the following program. The relevant info is: There are 3 variables atomic<int> x,y,z accessed by all threads. 3 writer threads: Each thread reads all 3 values x,y,z, and updates exactly 1 ...

Huy Le

1,989

asked Oct 29 at 8:47

-1 votes

0 answers

223 views

+50

Do the RMW operations on `cnt` still not avoid an inconsistent status for this multiple-producer single-consumer implementation?

Looking at this implementation of multiple-producer single-consumer, which was the implementation in Rust's standard library; however, its memory order model is derived from C++. So, it should be ...

xmh0511

7,628

asked Oct 29 at 8:30

0 votes

1 answer

164 views

Does an implementation that reorders evaluation in a single thread violate [intro.execution] p8?

[intro.execution] p8 says: Given any two evaluations A and B, if A is sequenced before B (or, equivalently, B is sequenced after A), then the execution of A shall precede the execution of B. ...

xmh0511

7,628

asked Oct 28 at 3:48

3 votes

1 answer

246 views

Which memory ordering to use in lockless linked-list stack pop() implementation?

I stumbled into an interesting issue -- when it comes to intrusive lockless stacks (single-linked lists) it seems there is a consensus on what push() should look like. All internet/AI searches (and ...

C.M.

3,457

asked Oct 20 at 22:31

3 votes

1 answer

184 views

Size of the state table in std::atomic wait implementation

While looking through implementations of the std::atomic<T>::wait, I've found that most of them used a simple hash table for mapping the state for each atomic location. libcxx static constexpr ...

ross1573

73

asked Oct 20 at 17:32

0 votes

1 answer

178 views

Is this a conforming observable behavior in the abstract machine's sense, where the load reads a value that is not currently produced

Consider this example: #include <atomic> #include <iostream> #include <chrono> #include <thread> #include <cassert> int main(){ std::atomic<int> val = {0}; ...

xmh0511

7,628

asked Oct 18 at 23:51

10 votes

1 answer

290 views

Why does the memory order need to be Acquire in a single consumer linked-list queue when comparing pointer values?

This is a multi-producer single-consumer implementation translated from Rust, for the language-lawyer question, rewriting it in C++ template<class T> struct Node{ std::atomic<Node*> ...

xmh0511

7,628

asked Oct 17 at 15:56

2 votes

0 answers

93 views

Too big a latency of ping-pong between two IPC processes on Sapphire Rapids Xeon with plain loads and stores, instruction order makes a big difference

I am running simple Ping/Pong between two processes A, B with shared memory: shm_A and shm_B are in separate cache lines. Allocated with separate calls to shm_open, so probably in different pages, ...

Samuel Hapak

7,284

asked Oct 16 at 7:52

7 votes

1 answer

108 views

Can a channel's Drop omit Acquire ordering, as in the Rust Atomics and Locks book?

In Rust Atomics and Locks chapter 5 (available online for free), this example implementation of a one-time channel is presented: pub struct Channel<T> { pub message: UnsafeCell<...

tux3

7,431

asked Oct 5 at 23:42

1 vote

1 answer

189 views

Is sequentially consistent memory ordering strictly necessary in this readers-writers lock using only load/store, not RMW?

Consider this outline of a simple multi-threaded application. It has one writer thread, and ten reader threads. #include <atomic> #include <thread> const int Num_readers{...

WaltK

802

asked Oct 1 at 0:00

6 votes

0 answers

451 views

How to formally prove that a statement after a spin loop isn't executed unless another thread exchanged first, with relaxed atomic exchange + store

Consider this example: #include <atomic> #include <thread> #include <cassert> int main(){ std::atomic<int> v = 0; std::atomic<bool> flag = false; std::thread ...

xmh0511

7,628

asked Sep 22 at 5:32

3 votes

1 answer

265 views

Is reordering really a useful concept for multithread program reasoning?

Consider this typical example: // Thread 1: r1 = y.load(std::memory_order_relaxed); // A x.store(r1, std::memory_order_relaxed); // B // Thread 2: r2 = x.load(std::memory_order_relaxed); // C y.store(...

xmh0511

7,628

asked Sep 17 at 6:19

2 votes

2 answers

271 views

Can I infer the execution relationship between two evaluations across two threads in this way?

Consider this example: std::atomic<bool> flag = false; int arr[2] = {}; // thread 1: arr[0] = 1; // A flag.store(true,std::memory_order::relaxed); // B // thread 2: while(!flag.load(std::...

xmh0511

7,628

asked Sep 12 at 9:03
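
For contrast with the relaxed store in the excerpt above, here is a minimal sketch of the same message-passing shape using a release store and an acquire load. In this form the C++ memory model guarantees that the reader sees `arr[0] == 1` once it observes the flag. The wrapper `run_message_passing` and the variable `seen` are hypothetical additions for illustration, and the truncated reader side of the original is filled in as an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> flag{false};
int arr[2] = {};

// Runs one writer and one reader; returns the value the reader saw in arr[0].
int run_message_passing() {
    flag.store(false, std::memory_order_relaxed);
    arr[0] = 0;
    int seen = -1;

    std::thread writer([] {
        arr[0] = 1;                                   // A: plain write
        flag.store(true, std::memory_order_release);  // B: publishes A
    });
    std::thread reader([&seen] {
        // C: the acquire load that reads true synchronizes with B
        while (!flag.load(std::memory_order_acquire)) {
        }
        seen = arr[0];  // D: A happens-before D, so this reads 1
    });

    writer.join();
    reader.join();
    assert(seen == 1);
    return seen;
}
```

With both operations relaxed, as in the excerpt, no synchronizes-with edge exists between B and C, so the plain accesses to `arr[0]` would form a data race.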

5 votes

0 answers

202 views

Why is std::atomic<T> larger than T itself for user-defined structs on MSVC but not on GCC/Clang?

I was checking the size of std::atomic compared to T on different platforms (Windows/MSVC, Linux/GCC, Android/Clang). For intrinsic types (like int, int64_t, etc.), the size of std::atomic matches the ...

Abhishek

251

asked Sep 6 at 19:34

6 votes

2 answers

321 views

Is it a conforming observable behavior that a later acquired time point is less than an earlier acquired one?

#include <atomic> #include <chrono> #include <iostream> #include <thread> int main() { std::atomic<int> flag = {0}; auto t1 = std::thread([&]() { ...

xmh0511

7,628

asked Sep 6 at 1:58

0 votes

0 answers

92 views

How does a failed spinlock CAS affect out-of-order speculation and RMW reordering on weak memory architectures?

I’m trying to understand how speculative execution interacts with weak memory models (ARM/Power) in the context of a spinlock implemented with a plain CAS. Example: // Spinlock acquisition attempt if (...

Delark

1,385

asked Aug 28 at 15:52

1 vote

1 answer

149 views

If `std::atomic_thread_fence(std::memory_order_acquire);` doesn't have an "associated atomic operation"... how does the fence get anchored, and to what?

An acquire-like load... will keep everything (both stores and loads) BELOW the load/fence. But this doesn't mean that everything ABOVE/before the acquire-load will not move below... This means that ...

Delark

1,385

asked Aug 25 at 0:20

-2 votes

1 answer

147 views

Is there a data visibility issue here?

class Sample { int a = 0; public void Run() { // main thread.Assuming this is chromium task runner. auto currentRunner = GetCurrentDefault(); somePooledRunner->PostTask( [...

breaker00

195

asked Aug 22 at 4:37

0 votes

0 answers

135 views

I still don't quite understand the difference between memory_order_acq_rel and memory_order_seq_cst?

I have read some Q&As about these two memory orders, but I still don't understand. acquire-release-versus-sequentially-consistent-memory-order Can I understand the difference between memory_order_acq_rel and ...

breaker00

195

asked Aug 22 at 2:05

2 votes

0 answers

116 views

Is it possible on any real hardware, for the updated value of an atomic integer to become visible earlier via an indirect path than via a direct path?

Is it possible on any real hardware in the real world, for the updated value of an atomic integer written by one thread to become visible to another thread earlier via an indirect path, where a third ...

Qwert Yuiop

362

asked Aug 19 at 21:10

2 votes

1 answer

131 views

Is it possible to use non-paired Acquire/Release memory orders?

I’ve spent several hours studying memory orderings, but I still have some contradictions in my head. One of them concerns the Acquire/Release memory orders. Currently, my understanding is: No ...

Eugene Usachev

63

asked Aug 17 at 19:23

0 votes

1 answer

143 views

Why does Android SystemProperties use memory barriers like this?

I was reading the implementation of Android's system properties, and I am confused about why the barriers are used this way. I am looking at bionic/libc/system_properties/system_properties.cpp with ...

Kymdon

13

asked Aug 17 at 14:16

1 vote

0 answers

177 views

Is accessing 4-byte boundary around variable undefined behaviour? (needed for futex wait on a byte)

Since C++20, the standard library has std::atomic<uint8_t>::wait and std::atomic<uint8_t>::notify_one/all. However, these are not suitable for me, as they lack advanced features (e.g. ...

sedor

326

asked Aug 16 at 17:02

0 votes

1 answer

97 views

When an atomic variable becomes visible to a thread other than the writing thread, is it also immediately globally visible?

Suppose I have three threads. If x was written by thread2 and x is visible to thread1, do I have the guarantee that the latest value of x is also visible to thread3? In other words, can the new value ...

Qwert Yuiop

362

asked Aug 15 at 21:06

2 votes

2 answers

224 views

Can the hardware reorder an atomic load followed by an atomic store, if the store is conditional on the load?

Can the hardware reorder an atomic load followed by an atomic store, if the store is conditional on the load? It would be highly unintuitive if this could happen, because if thread1 speculatively due ...

Qwert Yuiop

362

asked Aug 15 at 20:54

2 votes

1 answer

95 views

Is there a seq_cst sequence between different parts of an atomic object when atomic operations with different sizes mixed?

Updated: I already know that this is UB in ISO C; I apologize for the vague statement I made earlier. This question originates from my previous question Can atomic operations of different sizes be ...

untitled

563

asked Aug 15 at 16:22

3 votes

1 answer

112 views

Atomics.wait - `while(true)` or recursive function no output on stdout

I am trying to get comfortable with Atomics in Node.js. For that I created a very simple test with 2 worker threads: one that waits for a notify, and one that notifies the other. main.js const { Worker } = ...

Marc

4,049

asked Aug 13 at 17:32

2 votes

1 answer

205 views

Can atomic operations of different sizes be mixed?

For the same memory address, if I use atomic operations of different widths to operate on it (assuming the memory is aligned), for example (assuming the hardware supports 128-bit atomic operations): #...

untitled

563

asked Aug 10 at 9:05

2 votes

1 answer

156 views

Interlocked.* code section guard with minimal inter-core interference?

In order to guard a code section against repeat or concurrent execution we can use Interlocked functionality. Guarding against repeat execution is necessary for things like Dispose(), and guarding ...

DarthGizka

4,868

asked Aug 9 at 12:16
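
The run-once guard described in the excerpt above can be sketched in C++ with `std::atomic<bool>` standing in for the .NET `Interlocked` calls; this is an analog under that assumption, not the asker's code. The `Resource` class and its members are hypothetical. The load before the exchange is one common way to limit inter-core traffic, since repeat callers only read the cache line instead of requesting it for writing.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical resource whose cleanup body must run at most once.
class Resource {
public:
    // Returns true only for the single caller that performs the cleanup.
    bool dispose() {
        // Cheap check first: once disposed, later callers take this path
        // and never issue a read-modify-write on the shared flag.
        if (disposed_.load(std::memory_order_acquire)) {
            return false;
        }
        // exchange returns the previous value, so exactly one caller
        // observes false here even when several race past the check above.
        if (disposed_.exchange(true, std::memory_order_acq_rel)) {
            return false;
        }
        ++cleanup_runs_;  // stands in for the real cleanup work
        return true;
    }

    // Not synchronized: call only after all dispose() callers have finished.
    int cleanup_runs() const { return cleanup_runs_; }

private:
    std::atomic<bool> disposed_{false};
    int cleanup_runs_ = 0;
};
```

A guard against concurrent (rather than repeat) execution would additionally reset the flag with a release store when the section finishes.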

0 votes

2 answers

227 views

Why can't an acquire barrier stop reordering around a branch?

I was testing the behavior of the control dependencies in LINUX KERNEL MEMORY BARRIERS, and had a problem with the location of the fence. I was testing this on AArch64 on a Qualcomm Snapdragon 835, ...

Kymdon

13

asked Aug 7 at 5:22

1 vote

0 answers

112 views

How to Portably Use std::atomic Inside a Union Across Platforms (MSVC/Clang on Windows/macOS/Linux)?

I'm working on a cross-platform data structure and trying to define a compact union-based layout that allows atomic access to a 64-bit word, while also optionally accessing the lower 32-bit fields. I ...

Abhishek

251

asked Aug 4 at 9:59

10 votes

1 answer

903 views

Is CPP TrivialCopyable class effectively a C struct?

When coding with std::atomic, CAS, etc., I always struggle to memorize the definition of a CPP class being "TriviallyCopyable". Now that I am gradually switching to the C world, I accidentally found ...

PkDrew

2,301

asked Aug 1 at 2:49

6 votes

0 answers

247 views

Why load and exchange an std::atomic<bool>?

In P2300, the "1.4. Asynchronous Windows socket recv" example uses a pattern to mark completion (of setting the cancellation callback) that looks like this: if (ready.load(std::...

Mircea Baja

565

asked Jul 28 at 10:25

2 votes

1 answer

117 views

Strange behaviour of atomicCAS when used as a mutex

I'm trying to learn CUDA programming, and recently I have been working on the lectures in this course: https://people.maths.ox.ac.uk/~gilesm/cuda/lecs/lec3.pdf, where they discussed the atomicCAS ...

Dang Manh Truong

710

asked Jul 28 at 9:53

2 votes

0 answers

95 views

Can I use load(Acquire) + read data + compare_exchange_weak(Relaxed, Acquire) in a concurrent ring buffer?

I've been studying several implementations of SPMC (single producer, multiple consumer) ring buffers. In many of them, I find the memory orderings to be quite conservative—often stronger than what ...

Eugene Usachev

63

asked Jul 25 at 16:50

2 votes

1 answer

152 views

This MPSC Queue (Multi Producer Single Consumer Queue) keeps on waiting in the consumer side sometimes although I have used CAS operations

This MPSC Queue (Multi Producer Single Consumer Queue) keeps on waiting in the consumer side sometimes although I have used CAS operations. I have added CAS operation for the enqueue function. Since I ...

Dinushan Vishwajith

21

asked Jul 24 at 23:54

5 votes

1 answer

116 views

minimum required atomic instructions to support C++11 concurrency libraries

I'm implementing a multi core system consisting of several custom/specialty CPUs. Those CPUs need to be able to support the C++11 concurrency libraries (thread/mutex etc.). I'm not sure what kind of ...

dsula

267

asked Jul 17 at 18:20

1 vote

1 answer

200 views

Cross-platform 128-bit atomic support: std::atomic vs std::atomic_ref on Clang/MSVC (macOS ARM64, Windows x64, Linux)

Background I'm building a cross-platform atomic abstraction layer to support 64-bit and 128-bit atomic operations for the following types: int64_t, uint64_t __int128 (on Clang platforms) A custom ...

Abhishek

251

asked Jul 17 at 14:03
