Alternative to Vector≪Bool≫

Alternative to vector<bool>

Use std::deque if you don't need the array, yes.

Otherwise use an alternative vector that doesn't specialize on bool, such as the one in Boost Container.

Why isn't vector<bool> a STL container?

For space-optimization reasons, the C++ standard (as far back as C++98) explicitly calls out vector<bool> as a special standard container where each bool uses only one bit of space rather than one byte as a normal bool would (implementing a kind of "dynamic bitset"). In exchange for this optimization it doesn't offer all the capabilities and interface of a normal standard container.

In this case, since you can't take the address of a bit within a byte, things such as operator[] can't return a bool& but instead return a proxy object that allows to manipulate the particular bit in question. Since this proxy object is not a bool&, you can't assign its address to a bool* like you could with the result of such an operator call on a "normal" container. In turn this means that bool *pb =&v[0]; isn't valid code.

On the other hand deque doesn't have any such specialization called out so each bool takes a byte and you can take the address of the value return from operator[].

Finally note that the MS standard library implementation is (arguably) suboptimal in that it uses a small chunk size for deques, which means that using deque as a substitute isn't always the right answer.

How to prevent specialization of std::vector<bool>

You simply cannot have templated code behave regularly for T equal to bool if your data is represented by std::vector<bool> because this is not a container. As pointed out by @Mark Ransom, you could use std::vector<char> instead, e.g. through a trait like this

template<typename T> struct vector_trait { typedef std::vector<T> type; };
template<> struct vector_trait<bool> { typedef std::vector<char> type; };

and then use typename vector_trait<T>::type wherever you currently use std::vector<T>. The disadvantage here is that you need to use casts to convert from char to bool.

An alternative as suggested in your own answer is to write a wrapper with implicit conversion and constructor

template<typename T>
class wrapper
{
public:
wrapper() : value_(T()) {}
/* explicit */ wrapper(T const& t): value_(t) {}
/* explicit */ operator T() { return value_; }
private:
T value_;
};

and use std::vector< wrapper<bool> > everywhere without ever having to cast. However, there are also disadvantages to this because standard conversion sequences containing real bool parameters behave differently than the user-defined conversions with wrapper<bool> (the compiler can at most use 1 user-defined conversion, and as many standard conversions as necessary). This means that template code with function overloading can subtly break. You could uncomment the explicit keywords in the code above but that introduces the verbosity again.

Is there a standard way to replace a C-style bool array?

You can use std::unique_ptr<T[]> and std::make_unique:

int main()
{
int somenumber = 6;
// somenumber is set to some value here

auto isBitXSet = std::make_unique<bool[]>(somenumber);
// initialisation of isBitXSet.

legacyFunction(somenumber, isBitXSet.get());

return 0;
}

Alternatively, you can "trick" std::vector by creating your own bool wrapper:

struct my_bool { bool _b; };
std::vector<my_bool> v; // will not use `vector<bool>` specialization

If you know the size of your array at compile-time, consider using std::array.

Is the use of std::vector<bool> objects in C++ acceptable, or should I use an alternative?

There's nothing wrong with vector<bool>, except that it isn't equivalent to a vector<T> were T is the integer type equivalent to bool. This only shows in performance (CPUs only access bytes at a time, where in a vector<bool> every element is stored in one bit) and memory access (a reference to a first element of a vector<bool> is not equivalent to an array like with any other vector<T>.

It is part of the Standard, unfortunately: see section 23.3.7 (C++0x FDIS).

If std::vector<bool> was rewritten to use the standard vector implementation, how would that break old software?

Firstly, that would be an ABI break. A major reason why changing anything from the standard library is difficult.

Secondly, anything using flip would break:

#include <vector>

int main() {
std::vector<bool> vec { true };
vec[0].flip(); // can't be done with regular bool
}

Thirdly, there would probably be other problems due to overload resolution someone probably relies on.


Side note: You can always use boost::container::vector instead, which is not specialized for bool.

Ignore a template spezialization and explicitly use the unspezialized template (std::vector<bool>)

The answer is NO, because there is no guarantee an unspecialized template even exists.

In particular, it's quite reasonable for a compiler to have a specializations for all built-in types. So you typically see a compile-time dispatch: vector<T> forwards to __VectorImplBuiltIn<T> whenever T is a built-in type, and __VectorImplBuiltIn<T> in turn is specialized for each individual case. So vector<float> may use AVX to copy 4 floats at a time, and vector<bool> is packed.

Now there is no portable way to name that specific implementation class (it's literally an implementation detail) and that unspecialized __VectorImplBuiltIn<T> wouldn't even have a generic implementation because the compiler vendor would obviously know all built-in types.

In this scheme, the "normal" vector<T> would expand to __VectorImplClassType<T>, and that could be unusuable for T==bool because bool doesn't have a constructor.

So a perfectly reasonable scheme where std::vector<T> starts out by distinguishing types with and without constructor will make your idea impossible. And therefore it's no surprise that the ISO C++ Standard doesn't allow what you want.

Is using a vector of boolean values slower than a dynamic bitset?

A great deal here depends on how many Boolean values you're working with.

Both bitset and vector<bool> normally use a packed representation where a Boolean is stored as only a single bit.

On one hand, that imposes some overhead in the form of bit manipulation to access a single value.

On the other hand, that also means many more of your Booleans will fit in your cache.

If you're using a lot of Booleans (e.g., implementing a sieve of Eratosthenes) fitting more of them in the cache will almost always end up a net gain. The reduction in memory use will gain you a lot more than the bit manipulation loses.

Most of the arguments against std::vector<bool> come back to the fact that it is not a standard container (i.e., it does not meet the requirements for a container). IMO, this is mostly a question of expectations -- since it says vector, many people expect it to be a container (other types of vectors are), and they often react negatively to the fact that vector<bool> isn't a container.

If you're using the vector in a way that really requires it to be a container, then you probably want to use some other combination -- either deque<bool> or vector<char> can work fine. Think before you do that though -- there's a lot of (lousy, IMO) advice that vector<bool> should be avoided in general, with little or no explanation of why it should be avoided at all, or under what circumstances it makes a real difference to you.

Yes, there are situations where something else will work better. If you're in one of those situations, using something else is clearly a good idea. But, be sure you're really in one of those situations first. Anybody who tells you (for example) that "Herb says you should use vector<char>" without a lot of explanation about the tradeoffs involved should not be trusted.

Let's give a real example. Since it was mentioned in the comments, let's consider the Sieve of Eratosthenes:

#include <vector>
#include <iostream>
#include <iterator>
#include <chrono>

unsigned long primes = 0;

template <class bool_t>
unsigned long sieve(unsigned max) {
std::vector<bool_t> sieve(max, false);
sieve[0] = sieve[1] = true;

for (int i = 2; i < max; i++) {
if (!sieve[i]) {
++primes;
for (int temp = 2 * i; temp < max; temp += i)
sieve[temp] = true;
}
}
return primes;
}

// Warning: auto return type will fail with older compilers
// Fine with g++ 5.1 and VC++ 2015 though.
//
template <class F>
auto timer(F f, int max) {
auto start = std::chrono::high_resolution_clock::now();
primes += f(max);
auto stop = std::chrono::high_resolution_clock::now();

return stop - start;
}

int main() {
using namespace std::chrono;

unsigned number = 100000000;

auto using_bool = timer(sieve<bool>, number);
auto using_char = timer(sieve<char>, number);

std::cout << "ignore: " << primes << "\n";
std::cout << "Time using bool: " << duration_cast<milliseconds>(using_bool).count() << "\n";
std::cout << "Time using char: " << duration_cast<milliseconds>(using_char).count() << "\n";
}

We've used a large enough array that we can expect a large portion of it to occupy main memory. I've also gone to a little pain to ensure that the only thing that changes between one invocation and the other is the use of a vector<char> vs. vector<bool>. Here are some results. First with VC++ 2015:

ignore: 34568730
Time using bool: 2623
Time using char: 3108

...then the time using g++ 5.1:

ignore: 34568730
Time using bool: 2359
Time using char: 3116

Obviously, the vector<bool> wins in both cases--by around 15% with VC++, and over 30% with gcc. Also note that in this case, I've chosen the size to show vector<char> in quite favorable light. If, for example, I reduce number from 100000000 to 10000000, the time differential becomes much larger:

ignore: 3987474
Time using bool: 72
Time using char: 249

Although I haven't done a lot of work to confirm, I'd guess that in this case, the version using vector<bool> is saving enough space that the array fits entirely in the cache, while the vector<char> is large enough to overflow the cache, and involve a great deal of main memory access.



Related Topics



Leave a reply



Submit