I need to do something like this in the fastest way possible (O(1) would be perfect):

for (int j = 0; j < V; ++j)
        {
            if(!visited[j]) required[j]=0;
        }

I came up with this solution:

for (int j = 0; j < V; ++j)
        {
             required[j]=visited[j]&required[j];
        }

Which made the program run 3 times faster but I believe there is an even better way to do this. Am I right?

Btw. required and visited are dynamically allocated arrays

bool *required;
bool *visited;
required = new bool[V];
visited = new bool[V];
  • I doubt you could do it in O(1). Also be sure you are compiling with optimizations on. Commented Oct 3, 2015 at 20:43
  • You can do it in O(1) by not doing it at all: compute values of required[j] lazily, when they're actually needed. Commented Oct 3, 2015 at 20:45
  • The "fastest way" has little to do with big-O complexity, and more with the kind of low-level optimization you were already doing. Do you want more of that? Perhaps using SIMD intrinsics? Commented Oct 3, 2015 at 20:46
  • Do you have to use an array of bool? Something like a bitset (or boost::dynamic_bitset) would be much faster, because one instruction can handle 32 or 64 booleans at once. Commented Oct 3, 2015 at 20:53

2 Answers


In the case where you're working with an array of simple objects, you are most likely best served by the functionality provided by the C++ standard library. Structures like std::valarray and std::vector are recognized and optimized very effectively by all modern compilers.

Much debate exists as to how far you can rely on your compiler, but one thing is guaranteed: your compiler was built alongside the standard library, and relying on it for basic functionality (such as your problem) is generally a safe bet.

Never be afraid to run your own timing tests and race your compiler! It's a fun exercise, and one that is increasingly difficult to win.

Construct a valarray (highly optimized in C++11 and later):

std::valarray<bool> valRequired(required, V);
std::valarray<bool> valVisited(visited, V);
valRequired &= valVisited;

Alternatively, you could do it in one line using std::transform (note that it takes iterators/pointers, not element values):

std::transform(required, required + V, visited, required, [](bool r, bool v){ return r & v; });

Edit: while fewer lines is not in itself faster, your compiler will likely vectorize this operation.

I also tested their timing:

int main(int argc, const char * argv[]) {
    auto clock = std::chrono::high_resolution_clock{};
    {
        bool visited[5] = {1,0,1,0,0};
        bool required[5] = {1,1,1,0,1};

        auto start = clock.now();
        for (int i = 0; i < 5; ++i) {
            required[i] &= visited[i];
        }
        auto end = clock.now();
        std::cout << "1: " << (end - start).count() << std::endl;
    }

    {
        bool visited[5] = {1,0,1,0,0};
        bool required[5] = {1,1,1,0,1};

        auto start = clock.now();
        for (int i = 0; i < 5; ++i) {
            required[i] = visited[i] & required[i];
        }
        auto end = clock.now();
        std::cout << "2: " << (end - start).count() << std::endl;
    }

    {
        bool visited[5] = {1,0,1,0,0};
        bool required[5] = {1,1,1,0,1};

        auto start = clock.now();
        std::transform(required, required + 5, visited, required, [](bool r, bool v){ return r & v; });
        auto end = clock.now();
        std::cout << "3: " << (end - start).count() << std::endl;
    }

    {
        bool visited[5] = {1,0,1,0,0};
        bool required[5] = {1,1,1,0,1};
        std::valarray<bool> valVisited(visited, 5);
        std::valarray<bool> valrequired(required, 5);

        auto start = clock.now();
        valrequired &= valVisited;
        auto end = clock.now();
        std::cout << "4: " << (end - start).count() << std::endl;
    }
}

Output:

1: 102
2: 55
3: 47
4: 45
Program ended with exit code: 0

Comments

@RussSchultz and harold You're both right, it certainly isn't magic. In fact, compilers have very non-magical insight into the standard library precisely because it is a standard. Using it provides consistency and a guarantee that the implementation will be accessible and familiar to your compiler, unlike hand-optimizing your code the way you might have in the '90s. Yes, you're both right: there are cases where one can foresee the compiler taking "the long route", such as with an unfamiliar object structure, and one should handle those cases appropriately.
As @harold pointed out, this is not one of those cases. However, I'll revise and note this in my answer. Let me know if you see any more room for improvement in my answer.
No, the compiler could only have insight into the interface of the STL, not the implementation (because the STL can be replaced, i.e. newlib vs. uclib vs. a vendor-supplied one). The compiler can't (or shouldn't) make any assumptions as to what's happening inside the calls to the STL, though I know some compilers for embedded products turn strcpy, strlen, memcpy, memset, etc. into inline assembly rather than calls into libc.
That being said, use the STL first, then look for optimizations once your code works. Nothing slows a project down worse than getting stuck trying to write the exact optimal solution for every operation.
@RussSchultz Interesting, it was my understanding that compilers provide their own implementation of the STL? After reading a couple of pieces I see that it's the ABI that is bundled with them. Thanks for the insight!

Along the lines of @AlanStokes's suggestion, use packed binary data and combine it with the AVX-512 instruction _mm512_and_epi64, 512 bits at a time. Be prepared for your hair to get messed up.

