Branch prediction and multithreading

Question

Suppose we have a simple if statement like this:

if (something)
   // do_something
else
   // do_else

Suppose this if-else statement is executed in parallel by different threads, and each thread gets a different result that stays constant for the thread's whole life. For example, in thread 1 the condition always evaluates to false, in thread 2 to true, in thread 3 also always to true, and so on.

Does branch prediction take the execution context of each thread into account when building its statistics? If it doesn't (I don't think that's the case, but it's difficult to check by testing), the CPU will see the condition as following a random pattern and won't be able to predict it at all.

Define thread. The CPU obviously does not know about OS threads. But most CPUs these days do know about hardware threads.
– Kris Vandermotten, Commented Sep 18, 2016 at 12:50
Branch prediction is a processor implementation detail that operates at nanosecond resolution. Thread execution operates at millisecond resolution. Those 6 orders of magnitude difference make the issue irrelevant.
– Hans Passant, Commented Sep 18, 2016 at 14:01
The research and design of branch prediction based on multicore heterogeneous - Abstract: Aiming at those problem that it was difficult to improve the processor performance only by improving the single core frequency, as well as superscalar pipeline stall when process a branch instruction, the architecture of heterogeneous multi-core processor which used B-Cache structure and C-Core processor controller was introduced in this paper. The new architecture avoided the pipeline flushed due to branch miss-predict, and improve overall efficiency of Multi-Core processor. — samus
– samus, Commented May 30, 2018 at 16:54

Surt · Accepted Answer · 2016-11-08 17:59:41Z

If we ignore SMT (e.g. Hyper-Threading), most architectures have one branch predictor per hardware thread, which without SMT means one per core. It's tightly coupled with the fetch unit of the individual core. A few (AMD?) store some branch prediction information in the L1/L2 I-cache, but that is mostly the target of the next fetch.

So if you don't run your code on an SMT core, you are in heaven: the branch will be predicted correctly essentially 100% of the time, at the cost of a few instructions.

If you run your code on an SMT core, where threads with opposite outcomes can share one predictor, you will often find your life is hell, with 50+% mispredictions.

You can solve the problem easily, though; you just have to write more code. Check your condition earlier, and then call a version of your code that contains only do_something or only do_else.

If you have a loop that calls the function containing your branch, you can do something like:

if (something) do_something_loop(); else do_else_loop();

void do_something_loop() { for (auto x : myVec) do_something(x); }
void do_else_loop()      { for (auto x : myVec) do_else(x); }

This has the disadvantage that you need to maintain two nearly identical branches of code.
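A fuller version of the hoisted-loop idea might look like the sketch below. The bodies of do_something and do_else, the int element type, and the process wrapper are placeholders made up for illustration; only the shape (test the condition once, then run a loop that has no per-element test of it) comes from the answer.

```cpp
#include <vector>

// Hypothetical stand-ins for the real per-element work.
inline int do_something(int x) { return x + 1; }
inline int do_else(int x) { return x - 1; }

// Each loop contains no test of `something`, so the only branch left
// in the hot path is the loop's own back edge.
int do_something_loop(const std::vector<int>& myVec) {
    int sum = 0;
    for (int x : myVec) sum += do_something(x);
    return sum;
}

int do_else_loop(const std::vector<int>& myVec) {
    int sum = 0;
    for (int x : myVec) sum += do_else(x);
    return sum;
}

// The condition is evaluated once per call instead of once per element.
int process(const std::vector<int>& myVec, bool something) {
    return something ? do_something_loop(myVec) : do_else_loop(myVec);
}
```

Each thread would call process with its own constant value of something, so the data-dependent branch is taken once per call rather than once per iteration.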

Or you can put your branch in a function, brancher(), and make it a template function. Because the condition is then a compile-time constant, the magic of dead code elimination should leave no branch on it inside the loops.

C++ concept code (a sketch, not compilable as is):

template<bool b_something>
void brancher() {
  // do things
  if (b_something) {  // compile-time constant, so the dead arm is removed
    // do_something
  } else {
    // do_else
  }
  // do more things
}

void branch_user() {
  if (something) {
    for (auto x : myVec)
      brancher<true>();
  } else {
    for (auto x : myVec)
      brancher<false>();
  }
}

Now you only have to maintain the two branches of the outer function, which hopefully is less work.
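For reference, here is a compilable sketch of the same template idea. It assumes C++17 and uses if constexpr, which is not in the answer above; with a plain if, the optimizer would normally remove the dead arm as well. The arithmetic, the int parameter, and the return values are invented purely so the example does something observable.

```cpp
#include <vector>

// b_something is a template parameter, so each instantiation contains
// only one arm of the if; the other arm is discarded at compile time.
template <bool b_something>
int brancher(int x) {
    int result = x * 2;        // "do things" (placeholder work)
    if constexpr (b_something) {
        result += 1;           // do_something (placeholder)
    } else {
        result -= 1;           // do_else (placeholder)
    }
    return result;             // "do more things" would go here
}

// The runtime condition is tested once, outside the loops.
int branch_user(const std::vector<int>& myVec, bool something) {
    int sum = 0;
    if (something) {
        for (int x : myVec) sum += brancher<true>(x);
    } else {
        for (int x : myVec) sum += brancher<false>(x);
    }
    return sum;
}
```

The shared code lives once in brancher, and only the two short loops in branch_user are duplicated.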

It's never too late to accept an answer. I guess I didn't understand your answer back then, but I was revisiting my old questions and found this. Sorry for the delay.
