12

I have got here two programs with me, both are doing exactly the same task. They are just setting an boolean array / vector to the value true. The program using vector takes 27 seconds to run whereas the program involving array with 5 times greater size takes less than 1 s. I would like to know the exact reason as to why there is such a major difference ? Are vectors really that inefficient ?

Program using vectors

#include <iostream>
#include <vector>
#include <ctime>

using namespace std;

int main(){
 const int size = 2000;
 time_t start, end;
 time(&start);
 vector<bool> v(size);
 for(int i = 0; i < size; i++){
  for(int j = 0; j < size; j++){
   v[i] = true;
  }
 }
 time(&end);
 cout<<difftime(end, start)<<" seconds."<<endl;
}

Runtime - 27 seconds

Program using Array

#include <iostream>
#include <ctime>

using namespace std;

int main(){
 const int size = 10000; // 5 times more size
 time_t start, end;
 time(&start);
 bool v[size];
 for(int i = 0; i < size; i++){
  for(int j = 0; j < size; j++){
   v[i] = true;
  }
 }
 time(&end);
 cout<<difftime(end, start)<<" seconds."<<endl;
}

Runtime - < 1 seconds

Platform - Visual Studio 2008 OS - Windows Vista 32 bit SP 1 Processor Intel(R) Pentium(R) Dual CPU T2370 @ 1.73GHz Memory (RAM) 1.00 GB

Thanks

Amare

3
  • 5
    std::vector<bool> is not a container. Read this: gotw.ca/publications/mill09.htm Commented Dec 19, 2009 at 8:14
  • Important note: Even though you arrive at the correct conclusion, you're not performing a proper comparison. You perform N^2 iterations of the innermost loop (the v[i] = true statement), but N is 2000 in one test and 10000 in the other, so you're really doing 25 times as much work, not 5 times as much, aside from the difference between a vector and a plain array. This actually makes the difference even more pronounced. Commented Dec 19, 2009 at 16:38
  • 1
    @user235022 Have you meant v[j] = true; instead of v[i] = true? Otherwise it should be very simple for compiler to optimize internal loop out, since your actions do not depend on the loop variable. Commented Nov 30, 2011 at 21:47

4 Answers 4

41

You're using std::vector of bool and that's not what you think it is!

vector of bool is a bastard child template specialization that should never have existed and actually stores 1 bool in each bit. Accesses into it are more complicated due to masking and shifting logic, so will definitely be somewhat slower.

Click here for some info on vector of bool.

Also, you may be running an unoptimized build (almost certainly given the times you listed, 27 seconds is outrageous for 4 million iterations). The Standard Template Library relies very heavily on the optimizer to do things like inline function calls and elide temporaries. Lack of this optimization causes an especially heavy performance degradation for vector of bool because it has to return a proxy object when you index into it, because you can't take the address of a bit, so operator [] can't return a reference.

Click here for more info on proxied containers (The last half is about vector of bool)

In addition many STL implementations have helpful debugging bits that are not part of the standard which help you catch bugs, but really drag performance down. You'll want to make sure those are disabled in your optimized build.

Once you turn on the optimizer, have the right settings (i.e. no STL debugging turned on), and are actually testing the same thing in both loops, you will see almost no difference.

I had to make your loop much larger to test on my machine, but here's two builds of your vector of bool loop on my machine, showing the impact of optimizer flags on STL code

$ g++ main.cpp 
$ ./a.out 
17 seconds.
$ g++ -O2 main.cpp 
$ ./a.out 
1 seconds.
Sign up to request clarification or add additional context in comments.

4 Comments

Ya i also think so. I run the same scenario and it took almost same time.
VC2005+ in particular has bound checking and iterator validation checks for debug builds for all STL objects.
Thanks don.neufeld, your explanation as well as the link are really helpful. Good to learn something new :-)
You should also try out a std::deque<bool>, it does not suffer from the template specialization error that was made with the vector.
2

Have a look at this

Using arrays or std::vectors in C++, what’s the performance gap?

Comments

1

std::vector<bool> is optimized for memory consumption rather than performance.

You may trick it by using std::vector<int>. You should not have performance drawbacks then.

4 Comments

Fixed your post to use code formatting. The angle brackets disappeared without it
Instead of vector<int> I'd suggest vector<char> (or unsigned char, or if the compiler supports it std::uint8_t). No reason to use more space than you need. But definitely not vector<bool>.
A reason to use more space is better speed on most 32-bit architectures.
The speed difference between int and char probably isn't worth using 4x more memory for (especially since cache misses hurt much more).
1

Other answers are very good, but you could easily answer it yourself by this method.

ADDED: In response to comments, let me show you what I mean. I'm running VC On Windows, but this works on any language/OS. I took your first program and increased size to 20000 so it would run long enough. Then while it was running, I took several stackshots. They all look like this:

std::vector<bool,std::allocator<bool> >::begin() line 93 + 25 bytes
std::vector<bool,std::allocator<bool> >::operator[]() line 132 + 37 bytes
main() line 24 + 12 bytes
mainCRTStartup() line 206 + 25 bytes
KERNEL32! 7c817077()

So what that says is that it is spending essentially all of it's time in the indexing operation on line 24, and the reason it's spending that time is that the [] operator is calling the begin operator. In more detail:

main() line 24 + 12 bytes

is this code:

for(int j = 0; j < size; j++){
==> v[i] = true;
}

that calls:

std::vector<bool,std::allocator<bool> >::operator[]() line 132 + 37 bytes

which is this code (that I reformatted a little bit):

reference operator[](size_type _P){
==> return (*(begin() + _P));
}

that calls:

std::vector<bool,std::allocator<bool> >::begin() line 93 + 25 bytes

which is doing this (in more detail):

92:       iterator begin()
93:           {return (_First); }
00402890   push        ebp
00402891   mov         ebp,esp
00402893   sub         esp,44h
00402896   push        ebx
00402897   push        esi
00402898   push        edi
00402899   push        ecx
0040289A   lea         edi,[ebp-44h]
0040289D   mov         ecx,11h
004028A2   mov         eax,0CCCCCCCCh
004028A7   rep stos    dword ptr [edi]
004028A9   pop         ecx    <===============
004028AA   mov         dword ptr [ebp-4],ecx
004028AD   mov         eax,dword ptr [ebp-4]
004028B0   mov         eax,dword ptr [eax+4]
004028B3   pop         edi
004028B4   pop         esi
004028B5   pop         ebx
004028B6   mov         esp,ebp
004028B8   pop         ebp
004028B9   ret

What it is doing is writing 68 bytes of 0xCC on the stack (for some debug reason) as part of getting the begin address of the vector, as part of computing the address of v[i], prior to doing the assignment.

The fraction of time it spends doing this is near 100%, because it was doing it on every one of the several samples that were taken. Could you have guessed that that's what it was spending nearly all of its time doing? I couldn't have.

This is, of course, a Debug build. If you switch to a Release build, but turn on debugging information, all these functions get inlined and optimized away, so it goes like 30 times faster, and again the stackshots tell exactly what it's doing.

So - people can tell you what it might be doing, but this shows how to find out for yourself what it is really doing.

On your environment it will undoubtedly be different.

3 Comments

yes indeed. Rather than understanding the properties of standard library datastructures, point him to information on how to profile your code in another OS than he's actually using. And if you've ever tried to debug or profile or otherwise read standard library containers, you'll know that it's not exactly easily readable. Profiling might tell you which lines of code causes the slowdown, but it may not answer the question of what's going on.
@jalf: Come on. It's independent of OS, and profiling as it's commonly understood may not tell you what's going on, but stackshots will tell you exactly what's going on as long as there's source code of the libraries.
... It's the old saw about giving someone a fish versus teaching them to fish.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.