Why Would I Prefer Using Vector to Deque

Elements in a deque are not contiguous in memory, whereas vector elements are guaranteed to be. So if you need to interact with a plain C library that expects a contiguous array, or if you care (a lot) about spatial locality, you might prefer vector. In addition, since a deque carries some extra bookkeeping, its other operations are probably (slightly) more expensive than the equivalent vector operations. On the other hand, using many or large instances of vector may lead to unnecessary heap fragmentation (slowing down calls to new), because each vector demands a single contiguous block.
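As a minimal sketch of the C-interop point (sum_array here is just a stand-in for any C-style function that takes a pointer and a length):

#include <cstddef>
#include <iostream>
#include <vector>

// Stand-in for a plain C API that expects a contiguous array.
long sum_array(const int* data, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}

int main() {
    std::vector<int> v{1, 2, 3, 4};
    // std::vector guarantees contiguous storage, so v.data() can be
    // passed directly. std::deque offers no such guarantee; its elements
    // would first have to be copied into a temporary buffer.
    std::cout << sum_array(v.data(), v.size()) << '\n';  // prints 10
}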

Also, as pointed out elsewhere on StackOverflow, there is more good discussion in GotW #54: http://www.gotw.ca/gotw/054.htm

Why prefer std::vector over std::deque?

Why do most people use vector instead of deque?

Because this is what they have been taught.

vector and deque serve slightly different purposes. Both can be used as a simple container of objects, if that's all you need. When learning to program in C++, that is all most people need -- a bucket to drop stuff into, get stuff out of, and walk over.

When StackOverflow is asked a question like "which container should I use by default?", the answer is almost invariably vector. The question is generally asked in the context of learning to program in C++, and at the point where a programmer asks such a question, they don't yet know what they don't know -- and there is a lot they don't yet know. So we (StackOverflow) need a container that fits almost every need for better or worse, can be used in almost any context, and doesn't require the programmer to have asked all the right questions before landing on something approximating the correct answer.

Furthermore, the Standard specifically recommends vector. vector isn't best for all uses, and in fact deque is better than vector for many common uses -- but it's not so much better for a learning programmer that we should depart from the Standard's advice to newbie C++ programmers, so StackOverflow lands on vector.

After having learned the basics of the syntax and, shall we say, the strategies behind programming in C++, programmers split into two branches: those who care to learn more and write better programs, and those who don't. Those who don't will stick with vector forever. I think many programmers fall into this camp.

The rarer programmers who try to move beyond this phase start asking other questions -- questions like the one you've asked here. They know there is a lot they don't yet know, and they want to start discovering what those things are. They will quickly (or less quickly) discover that when choosing between vector and deque, some questions they didn't think to ask before are:

  1. Do I need the memory to be contiguous?
  2. Do I need to avoid lots of reallocations?
  3. Do I need to keep valid iterators after insertions? (see the sketch after this list)
  4. Do I need my collection to be compatible with some ancient C-like function?
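As a small illustration of point 3 (strictly, of the closely related reference stability): pushing onto either end of a deque never moves existing elements, so pointers and references to them stay valid (its iterators, however, are invalidated), while a vector's push_back may reallocate and invalidate everything. A minimal sketch:

#include <deque>
#include <iostream>

int main() {
    std::deque<int> deq{1, 2, 3};
    int* p = &deq.front();      // pointer to an existing element

    for (int i = 0; i < 1000; ++i)
        deq.push_back(i);       // existing elements are never moved

    std::cout << *p << '\n';    // safe: prints 1

    // The equivalent with std::vector would be undefined behaviour:
    // push_back may reallocate, leaving the pointer dangling.
}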

Then they really start thinking about the code they are writing, discover yet more stuff they don't know, and the beat goes on...

Why is std::vector so much more popular than std::deque?

I can't speak for anybody else, but I can for myself.

When I first read about std::deque, I thought it was cool enough that for a while, I treated it not only as the default container, but as nearly the only container I used.

Then somebody asked about why, and I expounded at length on its virtues and why it was the best container to use for practically everything, and how it was much more versatile than std::vector.

Fortunately, the guy questioning my decision on it was persuasive enough that I did some testing. Testing showed that in nearly every case, std::deque was slower than std::vector -- often by a substantial factor (e.g., around 2). In fact, of the code I'd written using std::deque, just replacing std::deque with std::vector gave a speedup in all but a handful of cases.

I have used std::deque in a few cases since then, but definitely don't treat it as the default any more. The simple fact is that in the usual implementation it's noticeably slower than std::vector for most purposes.

I should add, however, that I'm reasonably certain that with the right implementation, it could be nearly equivalent to std::vector in virtually all cases. Most use a representation that's undoubtedly great from an asymptotic viewpoint, but doesn't work out quite so wonderfully (for many purposes) in the real world.

Vector vs Deque operator[]

A std::vector<T> is a flat array of T elements, while a std::deque<T> is an array of equal-sized arrays of T. The complexity of index access is O(1) in both cases, but std::deque<T> needs to do a lot more work to determine which element to access, and there is also an extra indirection. Likewise, iterators over a std::deque<T> need to perform multiple checks (although algorithms can largely optimize this away by recognizing the segmented structure, making the overhead rather small). Thus, if you need to use the subscript operator often, std::deque<T> may be a bad choice, and the cost of moving elements in a std::vector<T> when inserting/deleting at the front may well be offset by its cheaper element access.

Just to explain roughly what std::deque<T> needs to do when the subscript operator is used, it can be thought of as performing these operations:

T& deque<T>::operator[](size_t n) {  // conceptual sketch, not actual library code
    n += this->first_element;                 // offset of the first element within its block
    size_t subarray = n / size_of_subarrays;  // which fixed-size block holds the element
    size_t index    = n % size_of_subarrays;  // position within that block
    return this->subarrays[subarray][index];  // note the extra indirection
}

The division/modulus operations are unlikely to be too expensive, as size_of_subarrays is almost certainly chosen to be a power of two, i.e., they basically amount to bit operations.
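A minimal illustration, assuming a hypothetical block size of 512 elements (real values vary by implementation and element size):

#include <cassert>
#include <cstddef>

int main() {
    constexpr std::size_t kBlock = 512;  // assumed power-of-two block size

    std::size_t n = 1234;
    // Division and modulus by a power of two reduce to a shift and a mask:
    assert(n / kBlock == n >> 9);              // 9 == log2(512)
    assert(n % kBlock == (n & (kBlock - 1)));
}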

Why is an STL deque not implemented as just a circular vector?

As cppreference writes:

As opposed to std::vector, the elements of a deque are not stored contiguously: typical implementations use a sequence of individually allocated fixed-size arrays.

This means that the large internal reallocations std::vector occasionally performs are not needed by std::deque. When space runs out, only a small fixed-size array is added. (The reverse happens when erasing leaves excess space: individual blocks are freed.)

Here is a small test:

#include <vector>
#include <deque>
#include <string>
#include <iostream>
#include <chrono>

using namespace std;

int main()
{
    {
        const auto start = chrono::high_resolution_clock::now();

        vector<string> v;
        for (size_t i = 0; i < 9999999; ++i)
            v.push_back(string("hello"));

        cout << chrono::duration_cast<chrono::milliseconds>(chrono::high_resolution_clock::now() - start).count() << endl;
    }

    {
        const auto start = chrono::high_resolution_clock::now();

        deque<string> v;
        for (size_t i = 0; i < 9999999; ++i)
            v.push_back(string("hello"));

        cout << chrono::duration_cast<chrono::milliseconds>(chrono::high_resolution_clock::now() - start).count() << endl;
    }

    return 0;
}

On my machine, it shows deque is roughly twice as fast as vector for this case:

$ ./a.out 
301
164

STL Containers - difference between vector, list and deque

Use deque if you need efficient insertion/removal at the beginning and end of the sequence and random access; use list if you need efficient insertion anywhere, at the sacrifice of random access. Iterators and references to list elements are very stable under almost any mutation of the container, while deque has very peculiar iterator and reference invalidation rules (so check them out carefully).

Also, list is a node-based container, while a deque uses chunks of contiguous memory, so memory locality may have performance effects that cannot be captured by asymptotic complexity estimates.
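A small sketch of that iterator stability for list (the values are arbitrary):

#include <iostream>
#include <iterator>
#include <list>

int main() {
    std::list<int> lst{10, 20, 30};
    auto it = std::next(lst.begin());   // iterator to the element 20

    // Insertions anywhere in a list leave existing iterators intact,
    // because every node keeps its own allocation.
    lst.push_front(5);
    lst.insert(lst.end(), 40);

    std::cout << *it << '\n';           // still prints 20

    // With a vector, an insertion can reallocate the buffer and
    // invalidate every outstanding iterator.
}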

deque can serve as a replacement for vector almost everywhere and should probably have been considered the "default" container in C++ (on account of its more flexible memory requirements); the only reason to prefer vector is when you must have a guaranteed contiguous memory layout of your sequence.

Why are deques used as the underlying container for stacks by default, when vectors would do the trick?

I don't see any reason to prefer deques.

A reason to prefer deque that applies to the stack use case is that an individual push_back on a deque has worst-case constant complexity, whereas an individual push_back on a vector is linear in the worst case (vector only achieves amortised constant complexity over a sequence of push_backs). This was particularly significant prior to C++11, when a reallocating vector had to copy its elements, which could be very expensive -- consider the case where the elements themselves are long strings.
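For what it's worth, the default can be overridden: std::stack takes the underlying container as a template parameter, so a vector-backed stack is a one-liner. A minimal sketch:

#include <iostream>
#include <stack>
#include <vector>

int main() {
    std::stack<int> s1;                    // defaults to std::deque<int>
    std::stack<int, std::vector<int>> s2;  // explicitly vector-backed

    s1.push(1);
    s2.push(2);
    std::cout << s1.top() + s2.top() << '\n';  // prints 3
}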

Another reason to prefer deques is that they release memory as they shrink. Vectors don't. Hence, if you have a stack that temporarily grows large, then shrinks and remains small for the rest of the execution, then an underlying vector would be wasting a lot of memory.
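A small sketch of that difference; note that vector's shrink_to_fit (C++11) is only a non-binding request, while a typical deque implementation frees emptied blocks as it shrinks:

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v(1000000);
    v.resize(10);                       // size shrinks, capacity does not
    std::cout << v.capacity() << '\n';  // still around 1000000

    v.shrink_to_fit();                  // non-binding request to release memory
    std::cout << v.capacity() << '\n';  // typically 10 now
}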

Historically, when the STL was designed and thus when the default was chosen, there were also issues with very large vectors, because the size of the address space did not exceed (significantly, or at all) the amount of memory -- this was before 64-bit processing was common. The consequence of the limited address space was that memory fragmentation could make it expensive or impossible to allocate the large contiguous block a big vector requires. Furthermore, the very way a vector grows -- allocating a larger buffer and deallocating the old one -- is a pattern that causes such fragmentation.

Why is deque using so much more RAM than vector in C++?

It all depends on the internal implementation of deque (I won't speak about vector since it is relatively straightforward).

Fact is, deque has completely different guarantees than vector (the most important one being that it supports O(1) insertion at both ends while vector only supports O(1) insertion at the back). This in turn means the internal structures managed by deque have to be more complex than vector.

To allow that, a typical deque implementation will split its memory into several non-contiguous blocks. But each individual memory block carries a fixed overhead to allow the memory management to work (e.g., whatever the size of the block, the system may need another 16 or 32 bytes of bookkeeping on top of it). Since, contrary to a vector, a deque requires many small, independent blocks, that overhead stacks up, which can explain the difference you see. Also note that those individual memory blocks need to be managed (maybe in separate structures?), which probably means some (or a lot of) additional overhead too.
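As a back-of-the-envelope illustration (the numbers are assumptions, not measurements of any particular allocator):

#include <cstddef>
#include <iostream>

int main() {
    // Hypothetical figures: one million small blocks, each carrying
    // 32 bytes of allocator bookkeeping on top of its payload.
    const std::size_t blocks    = 1000000;
    const std::size_t per_block = 32;
    std::cout << "overhead ~ "
              << blocks * per_block / (1024 * 1024)
              << " MiB\n";  // roughly 30 MiB of pure bookkeeping
}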

As for a way to solve your problem, you could try what @BasileStarynkevitch suggested in the comments. That will indeed reduce your memory usage, but it will only get you so far, because at some point you'll still run out of memory. And what if you try to run your program on a machine that only has 256 MB of RAM? Any other solution whose goal is to reduce your memory footprint while still trying to keep all your data in memory will suffer from the same problem.

A proper solution when handling large datasets like yours is to adapt your algorithms and data structures so they can handle small partitions of the whole dataset at a time, loading and saving those partitions as needed to make room for the others. Unfortunately, since that probably means disk access, it also means a big drop in performance -- but hey, you can't eat the cake and have it too.
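A rough sketch of that partition-at-a-time approach (the file name, partition size, and process_partition step are all hypothetical):

#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

int main() {
    std::ifstream in("huge_dataset.txt");   // assumed input file
    const std::size_t kPartition = 100000;  // records per partition (tunable)

    std::vector<std::string> partition;
    partition.reserve(kPartition);

    std::string line;
    while (std::getline(in, line)) {
        partition.push_back(line);
        if (partition.size() == kPartition) {
            // process_partition(partition);  // hypothetical processing step
            partition.clear();                // make room for the next chunk
        }
    }
    // if (!partition.empty()) process_partition(partition);  // leftovers
}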


