Using vector<char> as a Buffer Without Initializing It on resize()

Using vector<char> as a buffer without initializing it on resize()

There's nothing in the standard library that meets your requirements, and nothing I know of in Boost either.

There are three reasonable options I can think of:

  • Stick with std::vector for now, leave a comment in the code and come back to it if this ever causes a bottleneck in your application.
  • Use a custom allocator with empty construct/destroy methods - and hope your optimiser will be smart enough to remove any calls to them.
  • Create a wrapper around a dynamically allocated array, implementing only the minimal functionality that you require.
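The third option could look roughly like the sketch below. The class name raw_buffer, its reset() member, and the helper fill_and_read_last are hypothetical and chosen only for illustration; the sketch assumes the caller never needs the old contents preserved when the size changes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical minimal buffer: owns a char array whose contents start out
// uninitialized. It offers only what a raw I/O buffer typically needs.
class raw_buffer {
public:
    raw_buffer() = default;

    // `new char[n]` default-initializes the chars, which leaves them
    // indeterminate, so no time is spent zeroing the memory.
    explicit raw_buffer(std::size_t n)
        : data_(n ? new char[n] : nullptr), size_(n) {}

    // Replace the storage with a fresh uninitialized block of n chars.
    // Unlike std::vector::resize, the old contents are NOT preserved.
    void reset(std::size_t n) {
        data_.reset(n ? new char[n] : nullptr);
        size_ = n;
    }

    char*       data()       noexcept { return data_.get(); }
    const char* data() const noexcept { return data_.get(); }
    std::size_t size() const noexcept { return size_; }

private:
    std::unique_ptr<char[]> data_;
    std::size_t size_ = 0;
};

// Example use: fill the whole buffer, then read the last char back.
// Requires n > 0.
inline char fill_and_read_last(std::size_t n, char value) {
    assert(n > 0);
    raw_buffer buf(n);
    for (std::size_t i = 0; i < buf.size(); ++i) buf.data()[i] = value;
    return buf.data()[buf.size() - 1];
}
```

Every char must be written before it is read; the wrapper does nothing to enforce that.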

Is this behavior of vector::resize(size_type n) under C++11 and Boost.Container correct?

Not an answer, but a lengthy addendum to Howard's: I use an allocator adapter that basically works the same as Howard's allocator, but is safer since

  1. it only interposes on value-initialization and not all initializations,
  2. it correctly default-initializes.
#include <memory>
#include <new>
#include <type_traits>
#include <utility>

// Allocator adaptor that interposes construct() calls to
// convert value initialization into default initialization.
template <typename T, typename A=std::allocator<T>>
class default_init_allocator : public A {
    typedef std::allocator_traits<A> a_t;
public:
    template <typename U> struct rebind {
        using other =
            default_init_allocator<
                U, typename a_t::template rebind_alloc<U>
            >;
    };

    using A::A;

    // No arguments: default-initialize instead of value-initialize.
    template <typename U>
    void construct(U* ptr)
        noexcept(std::is_nothrow_default_constructible<U>::value) {
        ::new(static_cast<void*>(ptr)) U;
    }
    // Any other construction is forwarded to the wrapped allocator.
    template <typename U, typename...Args>
    void construct(U* ptr, Args&&... args) {
        a_t::construct(static_cast<A&>(*this),
                       ptr, std::forward<Args>(args)...);
    }
};
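As a rough usage sketch for the adaptor above (its definition is repeated so the snippet compiles on its own), a vector instantiated with it can grow without zero-filling the new elements. The alias raw_char_vector and the helper make_filled are hypothetical names, not part of the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// The adaptor from above, repeated so that this snippet is self-contained.
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
    typedef std::allocator_traits<A> a_t;
public:
    template <typename U> struct rebind {
        using other =
            default_init_allocator<U, typename a_t::template rebind_alloc<U>>;
    };
    using A::A;

    template <typename U>
    void construct(U* ptr)
        noexcept(std::is_nothrow_default_constructible<U>::value) {
        ::new (static_cast<void*>(ptr)) U;  // default-init: no zeroing for char
    }
    template <typename U, typename... Args>
    void construct(U* ptr, Args&&... args) {
        a_t::construct(static_cast<A&>(*this), ptr, std::forward<Args>(args)...);
    }
};

// Hypothetical alias for a char vector that skips value-initialization.
using raw_char_vector = std::vector<char, default_init_allocator<char>>;

// Hypothetical helper: grows the buffer to n chars without zero-filling,
// then overwrites every char before anything reads it.
inline raw_char_vector make_filled(std::size_t n, char value) {
    raw_char_vector buf;
    buf.resize(n);                  // new chars are left indeterminate
    for (char& c : buf) c = value;  // stand-in for read()/recv() filling it
    return buf;
}
```

Constructions that pass a value, such as raw_char_vector(3, 'q'), still go through the second construct overload and behave as usual.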

Is it good practice to use std::vector as a simple buffer?

  1. Sure, this'll work fine. The one thing you need to worry about is ensuring that the buffer is correctly aligned, if your class relies on a particular alignment; in this case you may want to use a vector of the datatype itself (like float).
  2. No, reserve is not necessary here; resize will automatically grow the capacity as necessary, in exactly the same way.
  3. Before C++03, technically not (but in practice yes). Since C++03, yes.

Incidentally, though, memcpy_s isn't the idiomatic approach here. Use std::copy instead. Keep in mind that a pointer is an iterator.

Starting in C++17, std::byte is the idiomatic unit of opaquely typed storage such as you are using here. char will still work, of course, but allows unsafe usages (as char!) which byte does not.
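A minimal sketch of the std::copy approach with a std::byte buffer, assuming a C++17 compiler. The function name to_buffer is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copies n bytes from a raw source into a vector-backed buffer using
// std::copy; the plain pointers act as the input iterators.
inline std::vector<std::byte> to_buffer(const std::byte* src, std::size_t n) {
    std::vector<std::byte> buffer(n);  // sized up front; no reserve() needed
    std::copy(src, src + n, buffer.begin());
    return buffer;
}
```

Because std::byte has no arithmetic or character semantics, reading a value out as a number needs an explicit std::to_integer call.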

char* buffer in Vector implementation

The explanation is right there in the answer:

You are going to have a hard time making this work if the buffer is of type T. Every time you expand the buffer all the elements in the buffer will be initialized with T constructor. For int this is not a problem. But if T has a non trivial constructor then you are going to pay a heavy price initializing elements that may never be used.

When you use

char* buffer_;

all the unused elements of buffer_ will contain uninitialized data but that's ok. You don't pay the price of initializing each object with a non-trivial constructor that you must pay when you use

T* buffer_;

@FrançoisAndrieux brings up another valid point. If T is not default constructible, you won't be able to use new T[capacity] to allocate memory.

Regarding your comment, an array of char objects can be used to hold any object. You just have to allocate the appropriate number of char objects. Instead of capacity number of T objects, you'll have to allocate capacity*sizeof(T) number of char objects.
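To make the capacity*sizeof(T) idea concrete, here is a rough sketch of a fixed-capacity container that keeps its elements in a char array and constructs them with placement new. The names raw_storage_stack and point are hypothetical, the sketch assumes C++17 (for std::launder), and it does not handle over-aligned types.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Hypothetical fixed-capacity container illustrating char-based storage.
// T needs no default constructor: elements are built only in emplace_back.
template <typename T>
class raw_storage_stack {
public:
    explicit raw_storage_stack(std::size_t capacity)
        : buffer_(new char[capacity * sizeof(T)]), capacity_(capacity) {}

    raw_storage_stack(const raw_storage_stack&) = delete;
    raw_storage_stack& operator=(const raw_storage_stack&) = delete;

    ~raw_storage_stack() {
        // Destroy only the elements that were actually constructed.
        while (size_ > 0) (*this)[--size_].~T();
    }

    template <typename... Args>
    T& emplace_back(Args&&... args) {
        assert(size_ < capacity_);
        // Placement new constructs a T inside the raw char storage.
        T* p = ::new (static_cast<void*>(slot(size_)))
            T(std::forward<Args>(args)...);
        ++size_;
        return *p;
    }

    T& operator[](std::size_t i) {
        // std::launder yields a usable pointer to the T living in the slot.
        return *std::launder(reinterpret_cast<T*>(slot(i)));
    }

    std::size_t size() const noexcept { return size_; }

private:
    char* slot(std::size_t i) noexcept {
        return buffer_.get() + i * sizeof(T);
    }

    std::unique_ptr<char[]> buffer_;
    std::size_t capacity_;
    std::size_t size_ = 0;
};

// Example element type with no default constructor.
struct point {
    point(int x_, int y_) : x(x_), y(y_) {}
    int x, y;
};
```

Allocating the storage runs no point constructor at all, and new point[capacity] would not even compile for this type.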

What is the advantage of using vector<char> as input buffer over char array?

A vector<char> is essentially just a managed character array.

So you can write:

{
vector<char> buf(4096);
...
int result = recv(fd, &buf[received_so_far], buf.size() - received_so_far);
...
}

The vector "knows" its size, so you can use buf.size() everywhere and never have to worry about overrunning your buffer. You can also change the size in the declaration and have it take effect everywhere without any messy #defines.

This use of buf will allocate the underlying array on the heap, and it will free it automatically when buf goes out of scope, no matter how that happens (e.g. exceptions or early return). So you get the nice semantics of stack allocation while still keeping large objects on the heap.

You can use buf.swap() to "hand ownership" of the underlying character array to another vector<char> very efficiently. (This is a good idea for network traffic... Modern networks are fast. The last thing you want to do is to create yet another copy of every byte you receive from the network.) And you still do not have to worry about explicitly freeing the memory.

Those are the big advantages that come to mind off the top of my head for this particular application.
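The swap() hand-off mentioned above might be sketched as follows. The message type and the take_payload function are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical consumer that takes over a filled receive buffer.
struct message {
    std::vector<char> payload;
};

// Hands the contents of `buf` to a message by swapping the vectors'
// internal pointers; no bytes are copied. `buf` is left empty.
inline message take_payload(std::vector<char>& buf) {
    message m;
    m.payload.swap(buf);
    return m;
}
```

After the call, the message owns the same heap block the buffer used to own, and that block is freed when the message is destroyed.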

Surprisingly inefficient custom allocator for vector<char>

Updated

This is a complete rewrite. There was an error in the original post/my answer which made me benchmark the same allocator twice. Oops.

Well, I can see huge differences in performance. I have made the following test bed, which takes several precautions to ensure crucial stuff isn't completely optimized out. I then verified (with -O0 -fno-inline) that the allocator's construct and destroy calls are made the expected number of times:

#include <cstdlib>
#include <memory>
#include <vector>

template<typename T>
struct MyAllocator : public std::allocator<T> {
    typedef std::allocator<T> Alloc;
    MyAllocator() = default;
    template<class U> MyAllocator(const MyAllocator<U>&) {}
    //void destroy(typename Alloc::pointer p) {} // pre-C++11
    //void construct(typename Alloc::pointer p, typename Alloc::const_reference val) {} // pre-C++11
    // Deliberate no-ops: elements are neither constructed nor destroyed.
    template<class U> void destroy(U* p) {}
    template<class U, class... Args> void construct(U* p, Args&&... args) {}
    // rebind must name the allocator for U, not the current instantiation.
    template<typename U> struct rebind { typedef MyAllocator<U> other; };
};

int main()
{
    typedef char T;
#ifdef OWN_ALLOCATOR
    std::vector<T, MyAllocator<T> > v;
#else
    std::vector<T> v;
#endif
    volatile unsigned long long x = 0; // volatile sink so the reads are not optimized away
    v.reserve(1000000); // or more. Make sure there is always enough allocated memory
    for (auto i = 0ul; i < (1ul << 18); i++) {
        v.resize(1000000);
        x += v[rand() % v.size()]; //._x;
        v.clear(); // or v.resize(0);
    }
}

The timing difference is marked:

g++ -g -O3 -std=c++0x -I ~/custom/boost/ test.cpp -o test 

real 0m9.300s
user 0m9.289s
sys 0m0.000s

g++ -g -O3 -std=c++0x -DOWN_ALLOCATOR -I ~/custom/boost/ test.cpp -o test

real 0m0.004s
user 0m0.000s
sys 0m0.000s

I can only assume that what you are seeing is related to the standard library optimizing allocator operations for char (it being a POD type).

The timings get even farther apart when you use

struct NonTrivial
{
NonTrivial() { _x = 42; }
virtual ~NonTrivial() {}
char _x;
};

typedef NonTrivial T;

In this case, the default allocator takes in excess of 2 minutes (it was still running when I stopped waiting), whereas the 'dummy' MyAllocator spends ~0.006s. (Note that this invokes undefined behaviour by referencing elements that haven't been properly initialized.)

C++ vector that *doesn't* initialize its members?

For default and value initialization of structs with user-provided default constructors which don't explicitly initialize anything, no initialization is performed on unsigned char members:

struct uninitialized_char {
unsigned char m;
uninitialized_char() {}
};

// just to be safe
static_assert(1 == sizeof(uninitialized_char), "");

std::vector<uninitialized_char> v(4 * (1<<20));

GetMyDataFromC(reinterpret_cast<unsigned char*>(&v[0]), v.size());

I think this is even legal under the strict aliasing rules.

When I compared the construction time for v vs. a vector<unsigned char> I got ~8 µs vs ~12 ms. More than 1000x faster. Compiler was clang 3.2 with libc++ and flags: -std=c++11 -Os -fcatch-undefined-behavior -ftrapv -pedantic -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-missing-prototypes

C++11 has a helper for uninitialized storage, std::aligned_storage, though it requires a compile-time size.


Here's an added example, to compare total usage (times in nanoseconds):

VERSION=1 (vector<unsigned char>):

clang++ -std=c++14 -stdlib=libc++ main.cpp -DVERSION=1 -ftrapv -Weverything -Wno-c++98-compat -Wno-sign-conversion -Wno-sign-compare -Os && ./a.out

initialization+first use: 16,425,554
array initialization: 12,228,039
first use: 4,197,515
second use: 4,404,043

VERSION=2 (vector<uninitialized_char>):

clang++ -std=c++14 -stdlib=libc++ main.cpp -DVERSION=2 -ftrapv -Weverything -Wno-c++98-compat -Wno-sign-conversion -Wno-sign-compare -Os && ./a.out

initialization+first use: 7,523,216
array initialization: 12,782
first use: 7,510,434
second use: 4,155,241


#include <chrono>
#include <iostream>
#include <locale>
#include <vector>

struct uninitialized_char {
    unsigned char c;
    uninitialized_char() {}
};

// Touches every byte of the buffer once.
void foo(unsigned char *c, int size) {
    for (int i = 0; i < size; ++i) {
        c[i] = '\0';
    }
}

int main() {
    auto start = std::chrono::steady_clock::now();

#if VERSION==1
    using element_type = unsigned char;
#elif VERSION==2
    using element_type = uninitialized_char;
#endif

    std::vector<element_type> v(4 * (1<<20));

    auto end = std::chrono::steady_clock::now();

    foo(reinterpret_cast<unsigned char*>(v.data()), v.size());

    auto end2 = std::chrono::steady_clock::now();

    foo(reinterpret_cast<unsigned char*>(v.data()), v.size());

    auto end3 = std::chrono::steady_clock::now();

    // Use the user's locale so the counts print with digit separators.
    std::cout.imbue(std::locale(""));
    std::cout << "initialization+first use: " << std::chrono::nanoseconds(end2-start).count() << '\n';
    std::cout << "array initialization: " << std::chrono::nanoseconds(end-start).count() << '\n';
    std::cout << "first use: " << std::chrono::nanoseconds(end2-end).count() << '\n';
    std::cout << "second use: " << std::chrono::nanoseconds(end3-end2).count() << '\n';
}

I'm using clang svn-3.6.0 r218006

Load data into std::vector<char> efficiently

Is there any way to tell the v2 vector that its internal memory buffer is loaded with data?

No.

The behaviour of your second example is undefined.

This would be useful when you want to use a std::vector to hold data that is sourced from a file stream.

You can read a file into vector like this:

std::vector<char> v3(count);
ifs.read(v3.data(), count);

Or like this:

using It = std::istreambuf_iterator<char>;
std::vector<char> v4(It{ifs}, It{});
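Both approaches can be wrapped into small helpers, sketched below against a generic std::istream so they work with an std::ifstream or an in-memory stream alike. The function names read_exact and read_all are hypothetical; the resize to gcount() is an added precaution for streams shorter than count.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Reads up to `count` bytes into a pre-sized vector, then shrinks the
// vector to what was actually read, in case the stream is shorter.
inline std::vector<char> read_exact(std::istream& in, std::size_t count) {
    std::vector<char> v(count);
    in.read(v.data(), static_cast<std::streamsize>(count));
    v.resize(static_cast<std::size_t>(in.gcount()));
    return v;
}

// Reads everything that remains in the stream.
inline std::vector<char> read_all(std::istream& in) {
    using It = std::istreambuf_iterator<char>;
    return std::vector<char>(It{in}, It{});
}
```

The first form still value-initializes the count chars before read() overwrites them; the second grows the vector as it goes because the total size is not known in advance.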

