
Consider this example:

#include <atomic>
#include <cassert>
#include <chrono>
#include <iostream>
#include <thread>

int main() {
  std::atomic<int> val{0};
  auto t1 = std::thread([&]() {
    auto r = val.load(std::memory_order::relaxed); // #1
    assert(r == 1);
  });
  auto t2 = std::thread([&]() {
    std::this_thread::sleep_for(std::chrono::milliseconds(6000)); // #2
    val.store(1, std::memory_order::relaxed); // #4
  });
  t1.join();
  auto now1 = std::chrono::high_resolution_clock::now();
  std::cout << "t1 completed\n";
  t2.join();
  auto now2 = std::chrono::high_resolution_clock::now();
  std::chrono::duration<double> duration = now2 - now1;
  // duration >= 6 seconds
}

From the abstract machine's perspective, is it possible to observe that #1 loads 1 and "t1 completed" is printed immediately (i.e., t1 completes quickly), while #4 is executed by t2 six seconds after #2? In other words, can #1 read a value that t2 has not yet produced at that point?

[intro.races] p10 says:

The value of an atomic object M, as determined by evaluation B, is the value stored by some unspecified side effect A that modifies M, where B does not happen before A.

From the perspective of memory order, #1 and #4 are unordered (in terms of happens-before), so the value stored by #4 is a possible value read by #1. However, from the perspective of execution, #4 is executed by thread t2 only after at least six seconds; that is, #4 has not yet been executed by t2 at the point when t1 completes. The abstract machine does not care about the timeline, only about ordering; yet under the described outcome, #1 reads a value that t2 has not produced by the time t1 completes. So I wonder: is the described outcome a conforming observable behavior purely from the abstract machine's perspective?

  • Thread exit is not an observable side effect, so it's unclear what you mean by "observe a thread exits quickly". In fact, in your example t1 doesn't perform any observable side effects at all, so in principle, the whole thread execution can be optimized away under the as-if rule. You are asking for an answer in terms of the abstract machine, but you aren't formulating your question in terms meaningful for the abstract machine. Commented Oct 19 at 2:36
  • @IgorTandetnik "as-if" only applies to implementations, not to the abstract machine itself, so the abstract machine does not have the concept "optimized away". Anyway, I have updated my example to make the intention more explicit. Commented Oct 19 at 2:56
  • Writes to std::cout are synchronized with each other. If std::cout << r is printed first, then it can only print 0, since in this case val.load happens-before val.store. If std::cout << r is printed second (unlikely in practice because of the delay, but theoretically possible), then the load and the store are unsynchronized and either 0 or 1 can be printed. Commented Oct 19 at 3:23
  • @IgorTandetnik Ok, the std::cout introduces the wrong intention for the original example. I updated the example. Commented Oct 19 at 3:55

1 Answer


This program doesn't check that t1 exits quickly, so the as-if rule allows basically having it both ways: reading the store value from later, but also not actually waiting for it.

One hypothetical mechanism would be value-prediction for the load, only confirmed much later. Real CPUs don't do inter-thread speculation, but the abstract machine doesn't forbid it in most cases, apart from the prohibition of out-of-thin-air values (which this is not).

Or a machine with a 1 Hz clock so 6000 ms is only a handful of cycles. (Although then you probably couldn't say that t1 exited quickly. That assertion has no formal implications. I guess you could check now() after t1.join() and again after t2.join(), and check that the interval was close to 6 seconds. That wouldn't AFAIK introduce anything that would stop the abstract machine from doing this; now() doesn't sync with anything.)


Comments

I would argue that, if #1 loads 1, that means t2 has executed #4 to produce the side effect (this is required by [intro.races] p10), which contradicts the assumption that #4 won't be executed until after six seconds. How can this point be explained?
We don't need to; the abstract machine doesn't forbid some things that aren't realistically possible, so only big as-if transformations could make them plausible. But in this case, see my answer: value-prediction for loads. This is something real CS research has explored, but no real-world commercial CPU has ever implemented. Of course real CPUs would need to confirm the speculation before they could retire the load, which would give the value actually currently in memory (or coherent cache) for that location. But the abstract machine doesn't forbid extended speculation.
Yes, "value-prediction for loads" can make this result possible. However, I would like to discuss it purely from the abstract machine's perspective. [intro.races] p10 requires that there exist a side effect corresponding to the value of the load. Or is the correct interpretation of [intro.races] p10 that the side effect doesn't need to exist at the point when the load reads the value, only that the corresponding side effect eventually exists by the end of the program?
The load and store are "fenced in" by the same events (thread creation and join), and are unsynchronized wrt. each other. They're both floating around and you don't know which one actually happened first until you check the result of the load. The side effect does exist in the lifetime of the program, and "B does not happen before A" holds because they're unsynchronized. I'm reading that only as a formal statement of happens-before synchronization, not about wall-clock time, so there's no contradiction.
So, the correct reading of [intro.races] p10 is that the corresponding side effect only needs to exist in the lifetime of the program, rather than at the point when the load reads the value, right? So the argument in that answer might be wrong?
Yes, that's how I'm interpreting it. I don't think the argument made in that answer about stores having to have been executed before loads for their values to be seen is supported by the standard, because of things like value-prediction which allows later confirmation. (Not that a plausible mechanism has to exist for an execution which the standard doesn't forbid. It might only be plausible by aggressive as-if transformations. But value-prediction is an example of a theoretical computer-architecture thing which C++ doesn't want to forbid.)
«Not that a plausible mechanism has to exist for an execution which the standard doesn't forbid.» Of course, this question is only about the abstract machine, so it is not necessary to provide a plausible mechanism for every possible execution; still, a realistic existing mechanism gives a more powerful proof of how the wording should be interpreted. Could you please answer the mentioned question? I think that would be helpful here.
Also, the confusion in this question arises from the wrong answers to that question.
Let me combine this answer of yours with my understanding. First, the basic requirement for a side effect to be visible to a load is that it must be executed somewhere in the lifetime of the program. For non-atomic objects, the extra requirement is that the side effect must happen before the load, which excludes "happens-after" and "concurrent/unordered"; for atomic objects, the extra requirement is that the side effect must not happen after the load, which excludes only "happens-after"...
...as you said in the comment, "execution" only appears in the definition of sequenced-before, so (inter-thread) "happens-before" says nothing about "executed before" and is therefore not equivalent to "executed before". In this question, the side effect and the load are on the atomic object val, and #1 does not happen before #4 (satisfying "B does not happen before A" in [intro.races] p10), so the only requirement is that #4 is executed sometime in the lifetime of the program; this does not require the side effect to be executed before the load for its value to be seen...
...even for non-atomic objects, whose side effect must happen before a load to be visible, this doesn't require that the side effect be executed before the load (I refer to inter-thread happens-before, because sequenced-before naturally defines "executed before"). Is my understanding above right?
Yeah, looks correct to me. A side-effect can be potentially-concurrent or happen-before the load for it to be possibly seen, and that's all. The extra requirement for non-atomic loads is of course just to avoid data-race UB. The abstract machine doesn't have very concrete notions of "executed"; it seems to work better to think across threads in a similar way to quantum mechanics probability clouds (e.g. for where an electron might be near an atomic nucleus); between happens-before edges, memory ops from the PoV of other threads have an unknown order and progress until something observes it.
Well, so back to the standard wording itself: except for sequenced-before, neither (inter-thread) happens-before nor [intro.races] p10 cares about "execution order", so we shouldn't even talk about "execution order" in these cases, because they are not defined around "execution order" and say nothing about it. Mapping to a concrete implementation, this means the side effect and the load can be executed in any order, as long as the side effect can be visible to the load; for example, an earlier load can read a side effect that is executed later...
...BTW, to be pedantic, I would like to ask why "is executed in the lifetime of the program" is a plausible reading of the wording. I guess it is because (inter-thread) happens-before (and its opposites, "happens-after" and "concurrent") is not defined in terms of "executed before"/"execution order" and specifies nothing about execution order, while side effects are produced by expressions that are executed by a thread; together these imply that the side effect only needs to exist, that is, the expression is executed in the lifetime of the program, right?
I'd say a happens-after means something is executed after, as well as not being visible. But it's not very meaningful to talk about execution without visibility in the abstract machine, only in real machines where there are concrete implementations details we can look at (e.g. with a debugger) and say something has executed from a PoV outside the program. Anyway, the tightest bound you can probably put on a store execution for it to be visible to a load is "executed not strictly after the load". So during the portion of the lifetime that doesn't happen-after the load.
But this is all useless pedantry; as we've discussed before, "execution" isn't how the standard defines anything except within a single thread. In terms of real CPUs, it's visibility that matters for memory-ordering, not execution of stores. (Loads take their value when they execute, stores just write into the store buffer to be committed after they're known non-speculative.) So "execute" isn't all that relevant for correctness even in normal machines.
Not sure what "execution" refers to in your comment: "execution order" or "is executed by a thread"? I would argue that "happens before" specifies nothing about "execution order", so how the executions of two operations are ordered doesn't matter for visibility. But "is executed" does matter, I think; if an expression is never executed, its side effect cannot exist, let alone be visible to loads.
Regarding «I'd say a happens-after means something is executed after, as well as not being visible»: since we want to avoid talking about "execution order" with "happens before", we should also avoid it with "happens after", because they are equivalent (i.e., A happens before B is the same as B happens after A). As you mentioned many times, "execution order" only appears in the definition of sequenced-before; the definition of "happens-before" (or happens-after) specifies nothing about "execution order".
Even some non-memory things like std::chrono::now() are ordered by happens-before IIRC, so it does have to interact with execution order to some degree. I meant inter-thread-happens-before implies executes-before, at least for memory operations and ones like now() where timing/order can be visible. Within a thread, sequenced-before doesn't imply that for most operations.
