How Is C++ Std::Vector Implemented

What's the best c implementation of the C++ vector?

EDIT

This answer is from a million years ago, but at some point, I actually implemented a macro-based, efficient, type-safe vector work-alike in C that covers all the typical features and needs. You can find it here:

https://github.com/eteran/c-vector

Original answer below.


What about a vector are you looking to replicate? I mean in the end, it all boils down to something like this:

int *create_vector(size_t n) {
return malloc(n * sizeof(int));
}

void delete_vector(int *v) {
free(v);
}

int *resize_vector(int *v, size_t n) {
return realloc(v, n * sizeof(int));
/* returns NULL on failure here */
}

You could wrap this all up in a struct, so it "knows its size" too, but you'd have to do it for every type (macros here?), but that seems a little uneccessary... Perhaps something like this:

typedef struct {
size_t size;
int *data;
} int_vector;

int_vector *create_vector(size_t n) {
int_vector *p = malloc(sizeof(int_vector));
if(p) {
p->data = malloc(n * sizeof(int));
p->size = n;
}
return p;
}

void delete_vector(int_vector *v) {
if(v) {
free(v->data);
free(v);
}
}

size_t resize_vector(int_vector *v, size_t n) {
if(v) {
int *p = realloc(v->data, n * sizeof(int));
if(p) {
v->data = p;
v->size = n;
}
return v->size;
}
return 0;
}

int get_vector(int_vector *v, size_t n) {
if(v && n < v->size) {
return v->data[n];
}
/* return some error value, i'm doing -1 here,
* std::vector would throw an exception if using at()
* or have UB if using [] */
return -1;
}

void set_vector(int_vector *v, size_t n, int x) {
if(v) {
if(n >= v->size) {
resize_vector(v, n);
}
v->data[n] = x;
}
}

After which, you could do:

int_vector *v = create_vector(10);
set_vector(v, 0, 123);

I dunno, it just doesn't seem worth the effort.

How is vector implemented in C++

it is a simple templated class which wraps a native array. It does not use malloc/realloc. Instead, it uses the passed allocator (which by default is std::allocator).

Resizing is done by allocating a new array and copy constructing each element in the new array from the old one (this way it is safe for non-POD objects). To avoid frequent allocations, often they follow a non-linear growth pattern.

UPDATE: in C++11, the elements will be moved instead of copy constructed if it is possible for the stored type.

In addition to this, it will need to store the current "size" and "capacity". Size is how many elements are actually in the vector. Capacity is how many could be in the vector.

So as a starting point a vector will need to look somewhat like this:

template <class T, class A = std::allocator<T> >
class vector {
public:
// public member functions
private:
T* data_;
typename A::size_type capacity_;
typename A::size_type size_;
A allocator_;
};

The other common implementation is to store pointers to the different parts of the array. This cheapens the cost of end() (which no longer needs an addition) ever so slightly at the expense of a marginally more expensive size() call (which now needs a subtraction). In which case it could look like this:

template <class T, class A = std::allocator<T> >
class vector {
public:
// public member functions
private:
T* data_; // points to first element
T* end_capacity_; // points to one past internal storage
T* end_; // points to one past last element
A allocator_;
};

I believe gcc's libstdc++ uses the latter approach, but both approaches are equally valid and conforming.

NOTE: This is ignoring a common optimization where the empty base class optimization is used for the allocator. I think that is a quality of implementation detail, and not a matter of correctness.

Does the std::vector implementation use an internal array or linked list or other?

From the C++11 standard, in the "sequence containers" library section (emphasis mine):

[23.3.6.1 Class template vector overview][vector.overview]

A vector is a sequence container that supports (amortized) constant time insert and erase operations at the
end; insert and erase in the middle take linear time. Storage management is handled automatically, though
hints can be given to improve efficiency.

This does not defeat the purpose of dynamic size -- part of the point of vector is that not only is it very fast to access a single element, but scanning over the vector has very good memory locality because everything is tightly packed together. In practice, having good memory locality is very important because it greatly reduces cache misses, which has a large impact on runtime. This is a major advantage of vector over list in many situations, particularly those where you need to iterate over the entire container more often than you need to add or remove elements.

How is std::vector::swap implemented?

A simplified, minimal vector implementation might have something like the following members to manage the data in the vector:

template <typename T>
class simple_vector
{
public:
// ...

static
void swap(simple_vector<T>& x, simple_vector<T>& y);

private:
T* elements; // a pointer to the block of memory holding the elements in the vector
size_t count; // number of 'active' elements
size_t allocated; // number of elements allocated (not all of which are necessarily used at the moment)
};

A swap() operation would just swap the 'guts' of each simplified_vector, leaving all of the dynamically allocated buffers (and the elements contained in them) in place. Only the pointers to those dynamic allocations get move around:

template <typename T>
void simple_vector<T>::swap(simple_vector<T>& x, simple_vector<T>& y)
{
T* tmp_elements = x.elements;
size_t tmp_count = x.count;
size_t tmp_allocated = x.allocated;

x.elements = y.elements;
x.count = y.count;
x.allocated = y.allocated;

y.elements = tmp_elements;
y.count = tmp_count;
y.allocated = tmp_allocated;
}

Note that the actual std::vector implementation might use techniques that aren't exactly the same (such as move constructing a temporary) as this simple example, but I think it conveys the general concept.

how c++ vector works

Actually, the implementation of vector is visible, since it's a template, so you can look into that for details:

iterator erase(const_iterator _Where)
{ // erase element at where
if (_Where._Mycont != this
|| _Where._Myptr < _Myfirst || _Mylast <= _Where._Myptr)
_DEBUG_ERROR("vector erase iterator outside range");
_STDEXT unchecked_copy(_Where._Myptr + 1, _Mylast, _Where._Myptr);
_Destroy(_Mylast - 1, _Mylast);
_Orphan_range(_Where._Myptr, _Mylast);
--_Mylast;
return (iterator(_Where._Myptr, this));
}

Basically, the line

unchecked_copy(_Where._Myptr + 1, _Mylast, _Where._Myptr);

does exactly what you thought - copies the following elements over (or moves them in C++11 as bames53 pointed out).

To answer your second question, no, the capacity cannot decrease on its own.

The complexities of the algorithms in std can be found at http://www.cplusplus.com/reference/stl/ and the implementation, as previously stated, is visible.

How are STL List and Vector implemented?

Hash table or binary tree? Why?

std::vector, as the name itself suggests, is implemented with a normal dynamically-allocated array, that is reallocated when its capacity is exhausted (usually doubling its size or something like that).

std::list instead is (usually1) implemented with a doubly-linked list.

The binary tree you mentioned is the usual implementation of std::map; the hash table instead is generally used for the unordered_map container (available in the upcoming C++0x standard).



  1. "Usually" because the standard do not mandate a particular implementation, but specifies the asymptotic complexity of its methods, and such constraints are met easily with a doubly-linked list.
    On the other hand, for std::vector the "contiguous space" requirement is enforced by the standard (from C++03 onwards), so it must be some form of dynamically allocated array.

How is std::vector insert implemented? C++

As a general rule, the standard containers separate allocation
from initialization (as should any containers you write too).
The standard containers use a very complex mechanism in order to
allow customization of both allocation and initialization, but
in the default case, it boils down to using the
operator new/operator delete functions to allocate the memory,
placement new to initialize it, and an explicit call to the
destructor to destroy the objects. In other words, insteaad of
the sequence:

p = new T[n];
// ...
delete [] p;

it uses:

p = operator new( n * sizeof( T ) );
for ( int i = 0; i != n; ++ i ) {
new ( p + i ) T( otherValue );
}

// ...
for ( int i = 0; i != n; ++ i ) {
p->~T();
}
operator delete( p );

(This is a radical simplification, to show the basic concept.
In practice, it will be more complex, for reasons of exception
safety, for example.)



Related Topics



Leave a reply



Submit