Should C++ Programmers Avoid Memset?

Should C++ programmers avoid memset()?

The issue is not so much using memset() on the built-in types as using it on class (i.e. non-POD) types. Doing so will almost always do the wrong thing and frequently do the fatal thing - it may, for example, trample over a virtual function table pointer.

What is the advantage of using memset() in C

This applies to both memset() and memcpy():

  1. Less Code: As you have already mentioned, it's shorter - fewer lines of code.
  2. More Readable: Shorter usually makes it more readable as well. (memset() is more readable than that loop)
  3. It can be faster: It can sometimes allow more aggressive compiler optimizations, so it may be faster.
  4. Misalignment: In some cases, when you're dealing with misaligned data on a processor that doesn't support misaligned accesses, memset() and memcpy() may be the only clean solution.

To expand on the 3rd point, memset() can be heavily optimized by the compiler using SIMD and such. If you write a loop instead, the compiler will first need to "figure out" what it does before it can attempt to optimize it.

The basic idea here is that memset() and similar library functions, in some sense, "tell" the compiler your intent.


As mentioned by @Oli in the comments, there are some downsides. I'll expand on them here:

  1. You need to make sure that memset() actually does what you want. The standard doesn't guarantee that the zero value of every datatype is represented by all-zero bytes in memory (null pointers and floating-point zero are the usual examples, even though virtually all modern platforms do represent them that way).
  2. For non-zero data, memset() is restricted to a single-byte fill value. So you can't use memset() if you want to set an array of ints to something other than zero (or 0x01010101 or something...).
  3. Although rare, there are some corner cases, where it's actually possible to beat the compiler in performance with your own loop.*

*I'll give one example of beating the compiler, from my own experience:

Although memset() and memcpy() are usually compiler intrinsics with special handling by the compiler, they are still generic functions. They say nothing about the datatype, including the alignment of the data.

So in a few (albeit rare) cases, the compiler isn't able to determine the alignment of the memory region, and thus must produce extra code to handle misalignment. Whereas if you, the programmer, are 100% sure of the alignment, using a loop might actually be faster.

A common example is when using SSE/AVX intrinsics (such as copying a 16/32-byte aligned array of floats). If the compiler can't determine the 16/32-byte alignment, it will need to use unaligned loads/stores and/or extra handling code. If you simply write a loop using SSE/AVX aligned load/store intrinsics, you can probably do better.

float *ptrA = ...  //  some unknown source, guaranteed to be 32-byte aligned
float *ptrB = ... // some unknown source, guaranteed to be 32-byte aligned
int length = ... // some unknown source, guaranteed to be multiple of 8

// memcpy() - The compiler can't read comments. It doesn't know the data is
// 32-byte aligned. So it may generate unnecessary misalignment handling code.
memcpy(ptrA, ptrB, length * sizeof(float));

// This loop could potentially be faster because it "uses" the fact that
// the pointers are aligned. The compiler can also further optimize this.
for (int c = 0; c < length; c += 8){
    _mm256_store_ps(ptrA + c, _mm256_load_ps(ptrB + c));
}

Is memset() more efficient than for loop in C?

Almost certainly, memset will be much faster than that loop. Note how you treat one character at a time, while those functions are optimized to set several bytes at a time, even using MMX and SSE instructions when available.

I think the paradigmatic example of these optimizations, which usually go unnoticed, is the GNU C library's strlen function. One would think it has at least O(n) performance, but it actually has O(n/4) or O(n/8) performance depending on the architecture (yes, I know, in big-O terms that's the same, but you actually get an eighth of the time). How? Tricky, but nicely done - see the glibc strlen source.

Faster way to zero memory than with memset?

x86 is a rather broad range of devices.

For a totally generic x86 target, an assembly block with "rep stosd" could blast out zeros to memory 32 bits at a time. Try to make sure the bulk of this work is DWORD aligned.

For chips with mmx, an assembly loop with movq could hit 64bits at a time.

You might be able to get a C/C++ compiler to emit a 64-bit write with a pointer to a long long or __m64. The target must be 8-byte aligned for the best performance.

For chips with SSE, movaps is fast, but only if the address is 16-byte aligned, so use byte stores until aligned, and then complete your clear with a loop of movaps.

Win32 has ZeroMemory(), but I forget whether that's a macro for memset or an actual 'good' implementation.

Do compilers optimize memset in destructor?

The problem for optimizers here is that your memset isn't writing to a member at all. Yes, key will cease to exist, but not so key.data. That memory will be returned to std::allocator. And std::allocator will very likely read adjacent memory to determine the memory block from which key.data came. Typical implementations store such data in the header of allocated blocks, i.e. at negative offsets. It's not unlikely that the header will be updated to reflect the block is free, or to coalesce the free block with other free blocks.

This may even be inlined, so the optimizer sees one function doing a memset and then the header access. It would be unreasonable to expect that the optimizer can figure out the memset is harmless. For all it knows, the allocator may be keeping a pool of zeroed blocks.

memset() or value initialization to zero out a struct?

Those two constructs are very different in their meaning. The first one uses the memset function, which is intended to set a buffer of memory to a certain value. The second one value-initializes an object. Let me explain it with a bit of code:

Let's assume you have a structure whose members are only of POD types ("Plain Old Data" - see What are POD types in C++?)

struct POD_OnlyStruct
{
    int a;
    char b;
};

POD_OnlyStruct t = {}; // OK

POD_OnlyStruct t;
memset(&t, 0, sizeof t); // OK as well

In this case, writing POD_OnlyStruct t = {} or POD_OnlyStruct t; memset(&t, 0, sizeof t) doesn't make much difference; the only difference is that the padding bytes are also set to zero when memset is used. Since you don't normally have access to those bytes, there's no difference for you.

On the other hand, since you've tagged your question as C++, let's try another example, with member types different from POD:

struct TestStruct
{
    int a;
    std::string b;
};

TestStruct t = {}; // OK

{
    TestStruct t1;
    memset(&t1, 0, sizeof t1); // ruins member 'b' of our struct
} // Application crashes here

In this case, using an expression like TestStruct t = {} is good, while using memset on it will lead to a crash. Here's what happens if you use memset: an object of type TestStruct is created, which in turn creates an object of type std::string, since it's a member of our structure. Next, memset sets the memory where the object b was located to a certain value, say zero. Now, once our TestStruct object goes out of scope, it is going to be destroyed, and when the turn comes to its member std::string b, you'll see a crash, as all of that object's internal structures were ruined by the memset.

So, the reality is, those things are very different, and although you sometimes need to memset a whole structure to zeroes in certain cases, it's always important to make sure you understand what you're doing, and not make a mistake as in our second example.

My vote - use memset on objects only if it is required, and use value initialization x = {} in all other cases.

Why is memset causing problem despite being used on built-in types?

This doesn't have anything to do with memset. Your code is going outside the boundaries of your array, plain and simple.

In your input case you have n = 4 and k = 4, so occurrence_count is 4 elements long (its valid indexes go from 0 to 3 inclusive). Then, you do

    cin >> input_array[i];
    ++occurence_count[input_array[i]];

Given that the last value is 4, you are ultimately doing ++occurence_count[4], which goes out of the boundaries of your array. This is undefined behavior, which in your case manifests as incrementing memory not belonging to that array, which most probably doesn't start out at 0 and messes up the later check.

The problem is not seen in your second code snippet, as you make occurence_count 5010 elements big, and it is zero-initialized by default as it is a global variable.

Now, if you are going to count occurrences of array values, of course sizing the occurrences array to be as big as the number of elements is wrong - that's the count of numbers you are going to read (and it would be fine to size input_array), not the maximum value you can read. Given that the array element values are from 1 to 5000, the occurrences array must be either sized 5001 (keeping values as they are), or sized 5000 (decrementing the values you read by 1 to index such array).

(In general, be careful as all indexes in the problem text are 1 based, while indexes in C++ are 0 based; you risk off-by-one errors if you reason with the problem indexes and then use them as C indexes, unless you over-size arrays by one and ignore the 0-th element).


Finally, some extra remarks:

  • if you compile with enough warnings enabled or with a recent enough compiler, it'll rightly complain that memset is not declared or that it is being implicitly declared (with an incorrect prototype, BTW); you should #include <string.h> (or <cstring> in C++) to use memset;
  • as @Nicol Bolas went to great lengths to explain in his answer, you are using a VLA (variable-length array) when declaring a local array with a size known only at runtime (int occurence_count[n]).

    VLAs aren't standard C++, so they aren't well specified; several compilers don't support them, and, in general, they are problematic in a number of ways (mostly, you shouldn't really allocate an unknown amount of data on the stack, which is generally small);

    You should probably avoid them in favor of std::vector or, given that the problem provides you an upper limit of both colors and elements (5000), just static arrays.

Can `memset` function call be removed by compiler?

"compiler has no right to assume that whatever happens inside it, will have no side effects."

That's correct. But if the compiler in fact knows what actually happens inside it and can determine that it really has no side effects, then no assumption is needed.

This is how almost all compiler optimizations work. The code says "X". The compiler determines that if "Y" is true, then it can replace code "X" with code "Z" and there will be no detectable difference. It determines "Y" is true, and then it replaces "X" with "Z".

For example:

void func()
{
    int j = 2;
    foo();
    if (j == 2) bar();
    else baz();
}

The compiler can optimize this to foo(); bar();. The compiler can see that foo cannot legally modify the value of j. If foo() somehow magically figures out where j is on the stack and modifies it, then the optimization will change the behavior of the code, but that's the programmer's fault for using "magic".

void func()
{
    int j = 2;
    foo(&j);
    if (j == 2) bar();
    else baz();
}

Now it can't because foo can legally modify the value of j without any magic. (Assuming the compiler can't look inside foo, which in some cases it can.)

If you do "magic", then the compiler can make optimizations that break your code. Stick to the rules and don't use magic.

In the example you linked to, the code relies on the compiler bothering to put a particular value in a variable that is never accessed and immediately ceases to exist. The compiler is not required to do anything that has no effect on the operation of your code.

The only way that could affect the code is if it peeked at unallocated portions of the stack or relied on new stack allocations having the values they previously had. Requiring the compiler to support that would make a huge number of optimizations impossible, including keeping local variables in registers.


