Which Is Faster/Preferred: Memset or for Loop to Zero Out an Array of Doubles

Note that memset takes the number of bytes, not the number of elements, because it is an old C function:

memset(d, 0, sizeof(double)*length);

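memset can be faster because it is usually a hand-tuned library routine (often written in assembly), whereas std::fill is a template function that is specified as a simple loop. In practice, optimizing compilers often turn a std::fill of zeros over a built-in type into the same code as memset, so measure before assuming a difference.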

But for type safety and more readable code I would recommend std::fill(). It is the C++ way of doing things. Consider memset only if profiling shows that a performance optimization is needed at this place in the code.
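As a minimal sketch of the two approaches side by side (the function names zero_with_fill and zero_with_memset are made up for illustration, and the memset version assumes that all-bits-zero represents 0.0 on your platform):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Type-safe: takes an element count and assigns 0.0 to each element.
void zero_with_fill(double* d, std::size_t length)
{
    assert(d != nullptr || length == 0);
    std::fill(d, d + length, 0.0);
}

// Byte-oriented: takes a byte count, so the element count must be
// multiplied by sizeof(double). Assumes all-bits-zero means 0.0.
void zero_with_memset(double* d, std::size_t length)
{
    assert(d != nullptr || length == 0);
    if (length != 0)
        std::memset(d, 0, sizeof(double) * length);
}
```

Both leave the array holding zeros on typical platforms; only the std::fill version keeps working unchanged if the element type or the fill value changes.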

Memset or for loop to zero multiple same length arrays

Your method 1 (the for loop) is bad for the cache.

Because arr1, arr2 and arr3 may not be anywhere near each other in memory, and very likely will not all be in the cache together, you could have frequent cache misses, and the CPU will have to constantly fetch new pieces from memory just to set them to zero.

By doing a sequence of memset operations, you write all of arr1 in one sequential pass, which is the access pattern the cache handles best. Then you do the same for arr2, then arr3, and so on.

That, and because memset may well have assembly tricks and optimizations to make it faster, I would definitely prefer option 2 over option 1.
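The two options being compared can be sketched roughly as follows. The original question's code is not shown here, so the element type (double) and the function names are assumptions; arr1, arr2 and arr3 are the names used above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Option 1: a single loop that touches all three arrays on every iteration.
void zero_interleaved(double* arr1, double* arr2, double* arr3, std::size_t n)
{
    assert(arr1 && arr2 && arr3);
    for (std::size_t i = 0; i < n; ++i) {
        arr1[i] = 0.0;
        arr2[i] = 0.0;
        arr3[i] = 0.0;
    }
}

// Option 2: one sequential pass per array.
// Assumes all-bits-zero represents 0.0 on this platform.
void zero_sequential(double* arr1, double* arr2, double* arr3, std::size_t n)
{
    assert(arr1 && arr2 && arr3);
    std::memset(arr1, 0, n * sizeof *arr1);
    std::memset(arr2, 0, n * sizeof *arr2);
    std::memset(arr3, 0, n * sizeof *arr3);
}
```

Which one wins in practice depends on array size, compiler and hardware, so it is worth timing both on the real workload.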

What is the fastest way to zero an existing array?

Fastest? Probably yes.
Buggy? Almost certainly!

It mostly depends on the implementation, platform and ... what type the array contains.

In C++ when a variable is defined its constructor is called. When an array is defined, all the array's elements' constructors are called.

Wiping out the memory can be considered "good" only when the array type is known to have an initial state that can be represented by all-bits-zero and its default constructor doesn't perform any action.

This is in general true for built-in types, but it can be false for other types.

The safest way is to assign each element from a value-initialized temporary.

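#include <cstddef>

// Assigns a value-initialized T to each of the N elements starting at v.
// N cannot be deduced from a pointer, so call it as reset<double, 10>(arr).
template<class T, std::size_t N>
void reset(T* v)
{
    for (std::size_t i = 0; i < N; ++i)
        v[i] = T();
}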

Note that, if T is char, an optimizing compiler will typically translate the instantiated function into the same code as memset, so it should run at the same speed.
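As a sketch of why element-wise assignment is the safe choice, here is a hypothetical variant (reset_array is an illustrative name, not from the answer above) that takes the array by reference so N is deduced. It works for a class type such as std::string, where a memset would corrupt the objects.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical variant of reset(): N is deduced from the array argument.
template<class T, std::size_t N>
void reset_array(T (&v)[N])
{
    for (std::size_t i = 0; i < N; ++i)
        v[i] = T();   // 0 for built-in types, an empty string for std::string
}
```

Calling reset_array on a double[3] leaves three zeros; calling it on a std::string[2] leaves two valid, empty strings.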

is memset(ary,0,length) a portable way of inputting zero in double array

If you are in a C99 environment, you get no guarantee whatsoever. The representation of floating point numbers is defined in § 5.2.4.2.2, but that is only the logical, mathematical representation. That section does not even mention how floating point numbers are stored in terms of bytes. Instead, it says in a footnote:

The floating-point model is intended to clarify the description of each floating-point characteristic and does not require the floating-point arithmetic of the implementation to be identical.

Further, § 6.2.6.1 says:

The representations of all types are unspecified except as stated in this subclause.

And in the rest of that subclause, floating point types are not mentioned.

In summary, there is no guarantee that a 0.0 is represented as all-bits-zero.
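If you want to rely on the all-bits-zero assumption anyway, you can at least check it. The sketch below is in C++ rather than C99, and the names double_is_iec559 and zero_double_is_all_bits_zero are made up for illustration. On implementations that follow IEEE 754 (IEC 60559), positive zero is stored as all-bits-zero.

```cpp
#include <cassert>
#include <cstring>
#include <limits>

// True if the implementation claims IEC 60559 (IEEE 754) conformance for double.
constexpr bool double_is_iec559 = std::numeric_limits<double>::is_iec559;

// Runtime check: does 0.0 have an all-zero object representation here?
bool zero_double_is_all_bits_zero()
{
    const double zero = 0.0;
    unsigned char zeros[sizeof zero] = {};
    return std::memcmp(&zero, zeros, sizeof zero) == 0;
}
```

A static_assert on double_is_iec559 next to the memset call turns the silent assumption into a compile-time error on platforms where it does not hold.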

C increment pointer vs for() loop performance

Here is the generated assembly for both the code snippets

f2:                          ;; pointer based approach
test edx, edx
jle .L5
sub edx, 1
movsx esi, sil
movsx rdx, edx
add rdx, 1
jmp memset
.L5:
ret

f3: ;; loop based approach
test edx, edx
jle .L8
sub edx, 1
movsx esi, sil
add rdx, 1
jmp memset
.L8:
ret

I understand that shorter assembly does not mean faster code. However, the compiler does generate a few extra instructions for the pointer based version. The difference in the number of instructions was even larger when I tried the same with clang. If anything, the pointer based version will be a little slower, not faster.

Note that both of these are calling memset and the code prior to that is merely checking and setting up the registers for that call to memset. You can go ahead and make the memset call yourself.

memset(buf, bufval, buflen);

This generates the following assembly:

f1:                          ;; memset based approach
movsx rdx, edx
jmp memset

Coming back to the original question: which version is faster? It cannot be emphasised enough that modern compilers are smart. Micro-optimizations like these rarely, if ever, provide performance benefits. Writing idiomatic code, where it is easier for the compiler to understand the intent, will usually give you performance that is at least as good.

Here is a link to godbolt if you want to play with the assembly output: https://godbolt.org/g/NxHS5F
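The source for f1, f2 and f3 is not reproduced above. Judging only from the register usage in the assembly (a pointer, a char-sized value and an int length), the three functions may have looked roughly like this. Treat the signatures and loop bodies as a guess, not as the original code.

```cpp
#include <cassert>
#include <cstring>

// f1: call memset directly.
void f1(char* buf, char bufval, int buflen)
{
    assert(buflen >= 0);
    std::memset(buf, bufval, buflen);
}

// f2: pointer based approach (guessed shape).
void f2(char* buf, char bufval, int buflen)
{
    for (int i = 0; i < buflen; ++i)
        *buf++ = bufval;
}

// f3: index based loop (guessed shape).
void f3(char* buf, char bufval, int buflen)
{
    for (int i = 0; i < buflen; ++i)
        buf[i] = bufval;
}
```

All three fill the buffer with the same bytes; the point of the assembly listing is that an optimizing compiler can recognise both loops and replace them with a call to memset.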

memcpy vs for loop - What's the proper way to copy an array from a pointer?

Memcpy will probably be faster, but it's more likely you will make a mistake using it.
It may depend on how smart your optimizing compiler is.

Your code is incorrect though. It should be:

memcpy(myGlobalArray, nums, 10 * sizeof(int) );
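For comparison, here is a small sketch of the memcpy call next to std::copy, which takes element counts instead of byte counts and so avoids the most common mistake. The names myGlobalArray and nums come from the question; the wrapper function names and the fixed size of 10 ints are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>

int myGlobalArray[10];

// Byte count: number of elements times the element size.
void copy_with_memcpy(const int* nums)
{
    assert(nums != nullptr);
    std::memcpy(myGlobalArray, nums, 10 * sizeof(int));
}

// Element count: no sizeof arithmetic to get wrong.
void copy_with_std_copy(const int* nums)
{
    assert(nums != nullptr);
    std::copy(nums, nums + 10, myGlobalArray);
}
```

For a trivially copyable type such as int, compilers commonly lower std::copy to the same memory-copy routine, so the safer form usually costs nothing.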

Memcpy takes the same time as memset

The point is that malloc and calloc on most platforms don't allocate memory; they allocate address space.

malloc etc work by:

  • if the request can be fulfilled by the freelist, carve a chunk out of it

    • in case of calloc: the equivalent of memset(ptr, 0, size) is issued
  • if not: ask the OS to extend the address space.

For systems with demand paging (COW) (an MMU could help here), the second option boils down to:

  • create enough page table entries for the request, and fill them with a (COW) reference to /dev/zero
  • add these PTEs to the address space of the process

This will consume no physical memory, except for the page tables.

  • Once the new memory is referenced for read, the read will come from /dev/zero. The /dev/zero device is a very special device, in this case mapped to every page of the new memory.
  • but, if the new page is written, the COW logic kicks in (via a page fault):

    • physical memory is allocated
    • the /dev/zero page is copied to the new page
    • the new page is detached from the mother page
    • and the calling process can finally do the update which started all this

memset() or value initialization to zero out a struct?

Those two constructs are very different in their meaning. The first one uses the memset function, which is intended to set a buffer of memory to a certain value. The second one initializes an object. Let me explain it with a bit of code:

Let's assume you have a structure that has members only of POD types ("Plain Old Data" - see What are POD types in C++?)

struct POD_OnlyStruct
{
    int a;
    char b;
};

POD_OnlyStruct t = {}; // OK

POD_OnlyStruct t;
memset(&t, 0, sizeof t); // OK as well

In this case, writing POD_OnlyStruct t = {} or POD_OnlyStruct t; memset(&t, 0, sizeof t) doesn't make much difference. The only difference is that memset also sets the padding bytes to zero. Since you don't normally have access to those bytes, there's no difference for you.

On the other hand, since you've tagged your question as C++, let's try another example, with member types different from POD:

struct TestStruct
{
    int a;
    std::string b;
};

TestStruct t = {}; // OK

{
    TestStruct t1;
    memset(&t1, 0, sizeof t1); // ruins member 'b' of our struct
} // Undefined behavior: the application typically crashes here

In this case, using an expression like TestStruct t = {} is good, while using memset on it is undefined behavior and will typically lead to a crash. Here's what happens if you use memset: an object of type TestStruct is created, which also creates an object of type std::string, since it's a member of our structure. Next, memset sets the memory where the object b is located to a certain value, say zero. Now, once our TestStruct object goes out of scope, it is going to be destroyed, and when the turn comes to its member std::string b you are likely to see a crash, as all of that object's internal structures were ruined by the memset.

So, the reality is that those things are very different. Although you sometimes need to memset a whole structure to zeroes, it's always important to make sure you understand what you're doing, and not make a mistake as in our second example.

My vote: use memset on objects only if it is required, and use value initialization x = {} in all other cases.
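One way to keep the second mistake from compiling at all is to put the memset behind a type check. This is a sketch, not a standard facility: zero_bytes is a hypothetical helper, it needs C++11 for <type_traits>, and it still assumes that all-bits-zero is a meaningful value for every member.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical helper: only accepts types whose bytes may be overwritten directly.
template<class T>
void zero_bytes(T& obj)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memset is only safe on trivially copyable types");
    std::memset(&obj, 0, sizeof obj);
}

struct POD_OnlyStruct
{
    int a;
    char b;
};

struct TestStruct
{
    int a;
    std::string b;
};
```

With this in place, zero_bytes on a POD_OnlyStruct compiles and zeroes it, while zero_bytes on a TestStruct is rejected at compile time, leaving TestStruct t = {} as the way to reset it.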

Assigning a zero to all array elements in C

If it's just a small array (like three elements), it probably won't make much difference whether you use mem* functions, or a loop, or three distinct assignments. In fact, the latter case may even be faster as you're not suffering the cost of a function call:

myArray[0] = myArray[1] = myArray[2] = 0;

But, even if one is faster, the difference would probably not be worth worrying about. I tend to optimise for readability first then, if needed, optimise for space/storage later.

If it was a choice between memcpy and memset, I'd choose the latter (assuming, as seems to be the case, that the all-zero bit pattern actually represented 0.0 in your implementation) for two reasons:

  • it doesn't require storage of a zeroed array; and
  • the former will get you into trouble if you change the size of one array and forget the other.

And, for what it's worth, your memset solution doesn't need to have the multiplication. As long as myArray is a real array in scope (not a pointer, such as an array passed as a function parameter), you can get the size of the entire array and just do:

memset (myArray, 0, sizeof (myArray));
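The sizeof shortcut only works while the compiler can still see the array type. A sketch of the difference, written in C++ with illustrative function names and an assumed element type of double:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The parameter is a reference to a real array, so sizeof covers all N elements.
template<std::size_t N>
void zero_array(double (&myArray)[N])
{
    std::memset(myArray, 0, sizeof(myArray));
}

// Through a pointer, sizeof(p) would only be the size of the pointer itself,
// so the element count has to be passed in and multiplied explicitly.
void zero_through_pointer(double* p, std::size_t count)
{
    assert(p != nullptr || count == 0);
    if (count != 0)
        std::memset(p, 0, count * sizeof *p);
}
```

Both versions assume, as above, that the all-zero bit pattern represents 0.0 in your implementation.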
