I need to do something like this in the fastest way possible (O(1) would be perfect):

for (int j = 0; j < V; ++j)
        {
            if(!visited[j]) required[j]=0;
        }

I came up with this solution:

for (int j = 0; j < V; ++j)
        {
             required[j]=visited[j]&required[j];
        }

Which made the program run 3 times faster but I believe there is an even better way to do this. Am I right?

Btw. required and visited are dynamically allocated arrays

bool *required;
bool *visited;
required = new bool[V];
visited = new bool[V];
  • I doubt you could do it in O(1). Also be sure you are compiling with optimizations on. Commented Oct 3, 2015 at 20:43
  • You can do it in O(1) by not doing it at all: compute values of required[j] lazily, when they're actually needed. Commented Oct 3, 2015 at 20:45
  • The "fastest way" has little to do with big-O complexity, and more with the kind of low-level optimization you were already doing. Do you want more of that? Perhaps using SIMD intrinsics? Commented Oct 3, 2015 at 20:46
  • Do you have to use an array of bool? Something like a bitset (or boost::dynamic_bitset) would be much faster, because one instruction can handle 32 or 64 booleans at once. Commented Oct 3, 2015 at 20:53

2 Answers


In the case where you're working with an array of simple objects, you are most likely best served by the functionality provided by the C++ standard library. Structures like std::valarray and std::vector are recognized and optimized very effectively by all modern compilers.

Much debate exists as to how far you can rely on your compiler, but one thing is guaranteed: your compiler was built alongside the standard library, and relying on it for basic functionality (such as your problem) is generally a safe bet.

Never be afraid to run your own timing tests and race your compiler! It's a fun exercise, and one that is increasingly difficult to win.

Construct a valarray (highly optimized in C++11 and later):

std::valarray<bool> valRequired(required, V);
std::valarray<bool> valVisited(visited, V);
valRequired &= valVisited;

Alternatively, you could do it in one line using std::transform (note that it takes iterators/pointers, not element values):

std::transform(required, required + V, visited, required, [](bool r, bool v){ return r & v; });

Edit: while fewer lines is not in itself faster, your compiler will likely vectorize this operation.

I also tested their timing:

int main(int argc, const char * argv[]) {
    auto clock = std::chrono::high_resolution_clock{};
    {
        bool visited[5] = {1,0,1,0,0};
        bool required[5] = {1,1,1,0,1};

        auto start = clock.now();
        for (int i = 0; i < 5; ++i) {
            required[i] &= visited[i];
        }
        auto end = clock.now();
        std::cout << "1: " << (end - start).count() << std::endl;
    }

    {
        bool visited[5] = {1,0,1,0,0};
        bool required[5] = {1,1,1,0,1};

        auto start = clock.now();
        for (int i = 0; i < 5; ++i) {
            required[i] = visited[i] & required[i];
        }
        auto end = clock.now();
        std::cout << "2: " << (end - start).count() << std::endl;
    }

    {
        bool visited[5] = {1,0,1,0,0};
        bool required[5] = {1,1,1,0,1};

        auto start = clock.now();
        std::transform(required, required + 5, visited, required, [](bool r, bool v){ return r & v; });
        auto end = clock.now();
        std::cout << "3: " << (end - start).count() << std::endl;
    }

    {
        bool visited[5] = {1,0,1,0,0};
        bool required[5] = {1,1,1,0,1};
        std::valarray<bool> valVisited(visited, 5);
        std::valarray<bool> valrequired(required, 5);

        auto start = clock.now();
        valrequired &= valVisited;
        auto end = clock.now();
        std::cout << "4: " << (end - start).count() << std::endl;
    }
}

Output:

1: 102
2: 55
3: 47
4: 45
Program ended with exit code: 0

Comments

@RussSchultz and harold You're both right, it certainly isn't magic. In fact, compilers have very non-magical insight into the standard library precisely because it is a standard. Using it provides consistency and a guarantee that the implementation will be accessible and familiar to your compiler, unlike hand-optimizing your code the way you might have in the '90s. Yes, you're both right: there are cases where one can foresee the compiler taking "the long route", such as with an unfamiliar object structure, and one should handle those cases appropriately.
As @harold pointed out, this is not one of those cases. However, I'll revise and note this in my answer. Let me know if you see any more room for improvement in my answer.
No, the compiler could only have insight into the interface of the STL, not the implementation (because the STL can be replaced, i.e. newlib vs. uclib vs. a vendor-supplied one). The compiler can't (or shouldn't) make any assumptions as to what's happening inside the calls to the STL, though I know some compilers for embedded products turn strcpy, strlen, memcpy, memset, etc. into inline assembly rather than calls into libc.
That being said, use the STL first, then look for optimizations once your code works. Nothing slows a project down worse than getting stuck trying to write the exact optimal solution for every operation.
@RussSchultz Interesting, it was my understanding that compilers provide their own implementation of the STL? After reading a couple of pieces I see that it's the ABI that is bundled with them. Thanks for the insight!

Along the lines of @AlanStokes's suggestion, use packed binary data and combine it with the AVX-512 instruction _mm512_and_epi64, 512 bits at a time. Be prepared for your hair to get messed up.

