If we ignore SMT (f.ex. hyper-threading) most architectures have a branch predictor per hardware thread.
Its tightly coupled with the fetch unit of the individual core. A few (AMD?) store some branch prediction information in L1/L2 I-cache but mostly target for next fetch.
So if you don't run your code on a SMT you are in heaven and will get a 100% predicted every time at the cost of a few instructions.
If you run your code on a SMT you will often find your life is hell, with 50+% mispredict.
Now you can solve your problem easily you just have to use more code, check your condition earlier and call a branch of your code with do_something or do_else in it.
If you have a loop that calls your function where you have your branch you can do something like:
if (something)
do_something_loop();
else
do_else_loop();
void do_something_loop() {
for (auto x : myVec)
do_something;
}
This has the disadvantage that you need to maintain 2 nearly equal branches of code.
Or you can have your branch in a function call branch_me() which you can make a template function and due to the magic of dead code elimination you should not get any branches in the loops.
C++ Concept code.
template<bool b_something>
void brancher() {
// do things
if (b_something)
// do_something
else
// do_else
}
// do more things
}
void branch_user() {
if (something) {
for (auto x : myVec)
brancher<true>();
} else {
for (auto x : myVec)
brancher<false>();
}
}
Now you only have to maintain the 2 branches of the outer function which hopefully is less work.