-2

I want to implement a very long boolean array (as a binary genome) and access some intervals to check if that interval is all true or not, and in addition I want to change some intervals value,

For example, I can create 4 representations:

boolean binaryGenome1[10e6]={false};
vector<bool> binaryGenome2; binaryGenome2.resize(10e6);
vector<char> binaryGenome3; binaryGenome3.resize(10e6);
bitset<10e6> binaryGenome4;

and access this way:

inline bool checkBinGenome(long long start , long long end){
  for(long long i = start; i < end+1 ; i++)
    if(binaryGenome[i] == false)
        return false;
  return true;
}
inline void changeBinGenome(long long start , long long end){
  for(long long i = start; i < end+1 ; i++)
    binaryGenome[i] = true;
}

vector<char> and normal boolean array (ass stores every boolean in a byte) both seem to be a poor choice as I need to be efficient in space. But what are the differences between vector<bool> and bitset?

Somewhere else I read that vector has some overhead as you can choose it's size and compile time - "overhead" for what - accessing? And how much is that overhead?

As I want to access array elements many times using CheckBinGenome() and changeBinGenome(), what is the fastest implementation?

7
  • 3
    You haven't done enough homework. std::vector<bool> is special - it is a space-efficient representation. Also look up std::bitset, which is also a space-efficient representation of an array of bool, except its size is fixed at compile time. Commented Sep 24, 2016 at 6:47
  • See stackoverflow.com/questions/3806469/bit-array-in-c Commented Sep 24, 2016 at 6:47
  • @Peter std::vector<bool> is considered a bad specialization, std::bitset is way better probably. Commented Sep 24, 2016 at 6:54
  • what about parallelizing the process? Commented Sep 24, 2016 at 7:17
  • 1
    @ πάντα ῥεῖ - std::vector<bool> is considered a bad specialisation, because it behaves differently from other standard containers in quite a few ways. However, if you work within its limits (in particular, don't expect it to play like other standard containers) it is still useful. Commented Sep 24, 2016 at 7:22

3 Answers 3

2

Use std::bitset It's the best.

Sign up to request clarification or add additional context in comments.

3 Comments

are accessing functions ran in constant time? i mean operator[] and .test
and what are the advantages of std::bitset in comparison with vecor<bool> ? thanks.
Could you please explain what characteristics make it the "best"? Is there some objective measure you've used, or is this just an opinion? AFAICS, the only meaningful difference between bitset and vector<bool> (yes, the bastard non-container specialization) is that the vector is resizable. Is there wording that ensures bitset must be more space-efficient than bool[]?
1

If the length of the data is known at compile time, consider std::array<bool> or std::bitset. The latter is likely to be more space-efficient (you'll have to measure whether the associated extra work in access times outweighs the speed gain from reducing cache pressure - that will depend on your workload).

If your array's length is not fixed, then you'll need a std::vector<bool> or std::vector<char>; there's also boost::dynamic_bitset but I've never used that.

If you will be changing large regions at once, as your sample implies, it may well be worth constructing your own representation and manipulating the underlying storage directly, rather than one bit at a time through the iterators. For example, if you use an array of char as the underlying representation, then setting a large range to 0 or 1 is mostly a memset() or std::fill() call, with computation only for the values at the start and end of the range. I'd start with a simple implementation and a good set of unit tests before trying anything like that.

It is (at least theoretically) possible that your Standard Library has specialized versions of algorithms for the iterators of std::vector<bool>, std::array<bool> and/or std::bitset that do exactly the above, or you may be able to write and contribute such specializations. That's a better path if possible - the world may thank you, and you'll have shared some of the maintenance responsibility.

Important note

If using std::array<bool>, you do need to be aware that, unlike other std::array<> instantiations, it does not implement the standard container semantics. That's not to say it shouldn't be used, but make sure you understand its foibles!

1 Comment

missing in your answer: what are differences between vector<bool> and bitset? more than fixed or no fixed length at compile time. can you explain differences in speed? why bitset is faster?
0

E.g., checking whether all the elements are true

I am really NOT sure whether this will give us more overheads than speedup or not. Actually I think that nowadays CPU can do this quite fast, are you really experiencing a poor performance? (or is this just a skeleton of your real problem?)

#include <omp.h>
#include <iostream>
#include <cstring>

using namespace std;

#define N 10000000
bool binaryGenome[N];

int main() {
    memset(binaryGenome, true, sizeof(bool) * N);
    int shouldBreak = 0;
    bool result = true;
    cout << result << endl;
    binaryGenome[9999995] = false;

    bool go = true;
    uint give = 0;
#pragma omp parallel
    {
        uint start, stop;

#pragma omp critical
        {
            start = give;
            give += N / omp_get_num_threads();
            stop = give;

            if (omp_get_thread_num() == omp_get_num_threads() - 1)
                stop = N;
        }

        while (start < stop && go) {
            if (!binaryGenome[start]) {
                cout << start << endl;
                go = false;
                result = false;
            }
            ++start;
        }
    }

    cout << result << endl;
}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.