What is the fastest implementation for accessing and changing a long array of boolean?

Question

I want to implement a very long boolean array (as a binary genome) and access some intervals to check if that interval is all true or not, and in addition I want to change some intervals value,

For example, I can create 4 representations:

boolean binaryGenome1[10e6]={false};
vector<bool> binaryGenome2; binaryGenome2.resize(10e6);
vector<char> binaryGenome3; binaryGenome3.resize(10e6);
bitset<10e6> binaryGenome4;

and access this way:

inline bool checkBinGenome(long long start , long long end){
  for(long long i = start; i < end+1 ; i++)
    if(binaryGenome[i] == false)
        return false;
  return true;
}
inline void changeBinGenome(long long start , long long end){
  for(long long i = start; i < end+1 ; i++)
    binaryGenome[i] = true;
}

vector<char> and normal boolean array (ass stores every boolean in a byte) both seem to be a poor choice as I need to be efficient in space. But what are the differences between vector<bool> and bitset?

Somewhere else I read that vector has some overhead as you can choose it's size and compile time - "overhead" for what - accessing? And how much is that overhead?

As I want to access array elements many times using CheckBinGenome() and changeBinGenome(), what is the fastest implementation?

You haven't done enough homework. std::vector<bool> is special - it is a space-efficient representation. Also look up std::bitset, which is also a space-efficient representation of an array of bool, except its size is fixed at compile time. — Peter
– Peter, Commented Sep 24, 2016 at 6:47
@Peter std::vector<bool> is considered a bad specialization, std::bitset is way better probably. — πάντα ῥεῖ
– πάντα ῥεῖ, Commented Sep 24, 2016 at 6:54
@ πάντα ῥεῖ - std::vector<bool> is considered a bad specialisation, because it behaves differently from other standard containers in quite a few ways. However, if you work within its limits (in particular, don't expect it to play like other standard containers) it is still useful. — Peter
– Peter, Commented Sep 24, 2016 at 7:22

vivek · Accepted Answer · 2016-09-24 06:46:02Z

2

Use std::bitset It's the best.

answered Sep 24, 2016 at 6:46

vivek

5,2496 gold badges27 silver badges34 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

ameerosein Over a year ago

are accessing functions ran in constant time? i mean operator[] and .test

ameerosein Over a year ago

and what are the advantages of std::bitset in comparison with vecor<bool> ? thanks.

Toby Speight Over a year ago

Could you please explain what characteristics make it the "best"? Is there some objective measure you've used, or is this just an opinion? AFAICS, the only meaningful difference between bitset and vector<bool> (yes, the bastard non-container specialization) is that the vector is resizable. Is there wording that ensures bitset must be more space-efficient than bool[]?

Community · Accepted Answer · 2020-06-20 09:12:55Z

If the length of the data is known at compile time, consider std::array<bool> or std::bitset. The latter is likely to be more space-efficient (you'll have to measure whether the associated extra work in access times outweighs the speed gain from reducing cache pressure - that will depend on your workload).

If your array's length is not fixed, then you'll need a std::vector<bool> or std::vector<char>; there's also boost::dynamic_bitset but I've never used that.

If you will be changing large regions at once, as your sample implies, it may well be worth constructing your own representation and manipulating the underlying storage directly, rather than one bit at a time through the iterators. For example, if you use an array of char as the underlying representation, then setting a large range to 0 or 1 is mostly a memset() or std::fill() call, with computation only for the values at the start and end of the range. I'd start with a simple implementation and a good set of unit tests before trying anything like that.

It is (at least theoretically) possible that your Standard Library has specialized versions of algorithms for the iterators of std::vector<bool>, std::array<bool> and/or std::bitset that do exactly the above, or you may be able to write and contribute such specializations. That's a better path if possible - the world may thank you, and you'll have shared some of the maintenance responsibility.

Important note

If using std::array<bool>, you do need to be aware that, unlike other std::array<> instantiations, it does not implement the standard container semantics. That's not to say it shouldn't be used, but make sure you understand its foibles!

missing in your answer: what are differences between vector<bool> and bitset? more than fixed or no fixed length at compile time. can you explain differences in speed? why bitset is faster?

Frederick Zhang · Accepted Answer · 2016-09-24 07:21:49Z

E.g., checking whether all the elements are true

I am really NOT sure whether this will give us more overheads than speedup or not. Actually I think that nowadays CPU can do this quite fast, are you really experiencing a poor performance? (or is this just a skeleton of your real problem?)

#include <omp.h>
#include <iostream>
#include <cstring>

using namespace std;

#define N 10000000
bool binaryGenome[N];

int main() {
    memset(binaryGenome, true, sizeof(bool) * N);
    int shouldBreak = 0;
    bool result = true;
    cout << result << endl;
    binaryGenome[9999995] = false;

    bool go = true;
    uint give = 0;
#pragma omp parallel
    {
        uint start, stop;

#pragma omp critical
        {
            start = give;
            give += N / omp_get_num_threads();
            stop = give;

            if (omp_get_thread_num() == omp_get_num_threads() - 1)
                stop = N;
        }

        while (start < stop && go) {
            if (!binaryGenome[start]) {
                cout << start << endl;
                go = false;
                result = false;
            }
            ++start;
        }
    }

    cout << result << endl;
}

Collectives™ on Stack Overflow

What is the fastest implementation for accessing and changing a long array of boolean?

3 Answers 3

3 Comments

Important note

1 Comment

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

3 Comments

Important note

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related