Efficiency of std::copy vs memcpy
A reasonably decent implementation will have std::copy compile to a call to memmove in the situations where this is possible (i.e. the element type is a POD).
If your implementation of std::vector doesn't have contiguous storage (the C++03 standard requires it), memmove might be faster than std::copy, but probably not by much. I would start worrying only when you have measurements to show it is indeed an issue.
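As a rough illustration of the equivalence described above, here is a minimal sketch that performs the same copy once with std::copy and once with memmove. The helper names `copy_with_algorithm` and `copy_with_memmove` are made up for this example, and it assumes a C++11 compiler; whether std::copy really lowers to memmove depends on your standard library and optimization level.

```cpp
#include <algorithm>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper: copy with std::copy. For trivially copyable T a
// library is allowed to implement this as a single memmove.
template <typename T>
std::vector<T> copy_with_algorithm(const std::vector<T>& src) {
    std::vector<T> dst(src.size());
    std::copy(src.begin(), src.end(), dst.begin());
    return dst;
}

// Hypothetical helper: the same copy spelled out with memmove. This is only
// valid because the static_assert restricts T to trivially copyable types.
template <typename T>
std::vector<T> copy_with_memmove(const std::vector<T>& src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memmove is only valid for trivially copyable types");
    std::vector<T> dst(src.size());
    if (!src.empty()) {
        std::memmove(dst.data(), src.data(), src.size() * sizeof(T));
    }
    return dst;
}
```

Both helpers should produce identical vectors for a type such as int; only std::copy remains correct if the element type later gains a non-trivial copy constructor.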
Why is `std::copy` 5x (!) slower than `memcpy` for reading one int from a char buffer, in my test program?
Those are not the results I get:
> g++ -O3 XX.cpp
> ./a.out
cast: 5 ms
memcpy: 4 ms
std::copy: 3 ms
(counter: 1264720400)
Hardware: 2GHz Intel Core i7
Memory: 8G 1333 MHz DDR3
OS: Mac OS X 10.7.5
Compiler: i686-apple-darwin11-llvm-g++-4.2 (GCC) 4.2.1
On a Linux box I get different results:
> g++ -std=c++0x -O3 XX.cpp
> ./a.out
cast: 3 ms
memcpy: 4 ms
std::copy: 21 ms
(counter: 731359744)
Hardware: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Memory: 61363780 kB
OS: Linux ip-10-58-154-83 3.2.0-29-virtual #46-Ubuntu SMP
Compiler: g++ (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
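The test program itself is not shown here, so the following is only a guess at the kind of operation being timed: reading one int out of a char buffer with memcpy and with std::copy. The function names `read_int_memcpy` and `read_int_copy` are invented for this sketch.

```cpp
#include <algorithm>
#include <cstring>

// Read one int from the start of a char buffer via memcpy.
// The buffer must hold at least sizeof(int) bytes.
int read_int_memcpy(const char* buf) {
    int value;
    std::memcpy(&value, buf, sizeof value);
    return value;
}

// The same read via std::copy, writing the bytes into the int's storage.
int read_int_copy(const char* buf) {
    int value;
    std::copy(buf, buf + sizeof value, reinterpret_cast<char*>(&value));
    return value;
}
```

Both functions return the same value; any timing difference between them comes down to how well a particular compiler and library optimize each form, which may explain why the two machines above disagree.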
memcpy vs for loop - What's the proper way to copy an array from a pointer?
memcpy will probably be faster, but you are more likely to make a mistake using it.
It may depend on how smart your optimizing compiler is.
Your code is incorrect though. It should be:
memcpy(myGlobalArray, nums, 10 * sizeof(int) );
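To show where the mistake usually creeps in, here is a small sketch contrasting the two calls. It reuses the names `myGlobalArray` and `nums` from the line above; the wrapper functions and the `count` parameter are assumptions added for illustration, and the caller is expected to pass a count of at most 10.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

int myGlobalArray[10];

// memcpy takes a size in bytes, so the element count must be scaled by
// sizeof(int). Forgetting the scaling copies only part of the array.
void copy_with_memcpy(const int* nums, std::size_t count) {
    std::memcpy(myGlobalArray, nums, count * sizeof(int));
}

// std::copy works on elements, so the count is used directly and the
// element type is checked by the compiler.
void copy_with_std_copy(const int* nums, std::size_t count) {
    std::copy(nums, nums + count, myGlobalArray);
}
```

The std::copy form has no byte arithmetic to get wrong, which is the main argument for preferring it when performance is equivalent.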
In what cases should I use memcpy over standard operators in C++?
Efficiency should not be your concern.
Write clean maintainable code.
It bothers me that so many answers indicate that memcpy() is inefficient. It is designed to be the most efficient way to copy blocks of memory (for C programs).
So I wrote the following as a test:
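#include <algorithm>  // std::copy
#include <cstring>    // memcpy

extern float a[3];
extern float b[3];
extern void base();   // defined in another translation unit

int main()
{
    base();
#if defined(M1)
    // Element-by-element copy of b into a
    a[0] = b[0];
    a[1] = b[1];
    a[2] = b[2];
#elif defined(M2)
    // memcpy of b into a
    memcpy(a, b, 3*sizeof(float));
#elif defined(M3)
    // Note: as written this copies a into b, the reverse of M1 and M2
    std::copy(&a[0], &a[3], &b[0]);
#endif
    base();
}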
Then, to compare the code produced:
g++ -O3 -S xr.cpp -o s0.s
g++ -O3 -S xr.cpp -o s1.s -DM1
g++ -O3 -S xr.cpp -o s2.s -DM2
g++ -O3 -S xr.cpp -o s3.s -DM3
echo "=======" > D
diff s0.s s1.s >> D
echo "=======" >> D
diff s0.s s2.s >> D
echo "=======" >> D
diff s0.s s3.s >> D
This resulted in (comments added by hand):
======= // Copy by hand
10a11,18
> movq _a@GOTPCREL(%rip), %rcx
> movq _b@GOTPCREL(%rip), %rdx
> movl (%rdx), %eax
> movl %eax, (%rcx)
> movl 4(%rdx), %eax
> movl %eax, 4(%rcx)
> movl 8(%rdx), %eax
> movl %eax, 8(%rcx)
======= // memcpy()
10a11,16
> movq _a@GOTPCREL(%rip), %rcx
> movq _b@GOTPCREL(%rip), %rdx
> movq (%rdx), %rax
> movq %rax, (%rcx)
> movl 8(%rdx), %eax
> movl %eax, 8(%rcx)
======= // std::copy()
10a11,14
> movq _a@GOTPCREL(%rip), %rsi
> movl $12, %edx
> movq _b@GOTPCREL(%rip), %rdi
> call _memmove
Added timing results for running the above inside a loop of 1000000000 iterations.
g++ -c -O3 -DM1 X.cpp
g++ -O3 X.o base.o -o m1
g++ -c -O3 -DM2 X.cpp
g++ -O3 X.o base.o -o m2
g++ -c -O3 -DM3 X.cpp
g++ -O3 X.o base.o -o m3
time ./m1
real 0m2.486s
user 0m2.478s
sys 0m0.005s
time ./m2
real 0m1.859s
user 0m1.853s
sys 0m0.004s
time ./m3
real 0m1.858s
user 0m1.851s
sys 0m0.006s
Performance of memmove compared to memcpy twice?
From http://en.cppreference.com/w/cpp/string/byte/memmove
Despite being specified "as if" a temporary buffer is used, actual implementations of this function do not incur the overhead of double copying or extra memory. For small count, it may load up and write out registers; for larger blocks, a common approach (glibc and bsd libc) is to copy bytes forwards from the beginning of the buffer if the destination starts before the source, and backwards from the end otherwise, with a fall back to std::memcpy when there is no overlap at all.
Therefore the overhead in all likelihood is a couple of conditional branches. Hardly worth worrying about for large blocks.
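For a concrete case where the overlap handling matters, consider shifting data within a single buffer. The sketch below uses a made-up helper, `shift_right`, that moves the contents of a string toward its end in place; source and destination overlap, so memmove is required and memcpy would be undefined behaviour.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: shift the contents of s right by `shift` bytes in
// place, keeping the length unchanged. The first `shift` bytes are left as
// they were. Source and destination ranges overlap, so memmove is used.
std::string shift_right(std::string s, std::size_t shift) {
    if (shift < s.size()) {
        std::memmove(&s[shift], &s[0], s.size() - shift);
    }
    return s;
}
```

For example, `shift_right("abcdef", 2)` yields "ababcd": the bytes "abcd" are moved two positions to the right as if through a temporary buffer.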
However, it is worth remembering that std::memcpy is a 'magic' function: before C++20 added std::bit_cast, it was the only legal way to reinterpret the bytes of an object as a dissimilar type.
In C++, this is illegal (undefined behaviour):
union {
float a;
int b;
} u;
u.a = 10.0;
int x = u.b;
This is legal:
float a = 10.0;
int b;
std::memcpy(std::addressof(b), std::addressof(a), sizeof(b));
and does what you'd expect the union to do if you were a C programmer.
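Wrapped up as reusable functions, the legal version might look like the following sketch. The names `float_bits` and `bits_to_float` are invented here, and the example assumes that float is a 32-bit IEEE 754 type, which is common but not guaranteed by the standard.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: return the object representation of a float as a
// 32-bit unsigned integer, without violating aliasing rules.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Hypothetical helper: the inverse conversion.
float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

On an IEEE 754 platform, `float_bits(10.0f)` is 0x41200000. Compilers typically turn these fixed-size memcpy calls into a plain register move, so the safe form usually costs nothing.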
Is there still a performance advantage to redefining standard functions like memcpy?
Functions like memcpy belong to the standard library and are almost certainly implemented in assembler, not in C.
If you redefine them, your version will most likely be slower. If you want to optimize copying code of your own, declare the pointers as restrict to tell the compiler that the buffers do not overlap; memcpy already makes that assumption, whereas memmove must also handle overlapping buffers.
The engineers who wrote the standard C library for a given architecture will almost certainly have used the fastest available assembler routines for moving memory.
EDIT:
Taking into account the remarks from some comments: any code generation that preserves the semantics of copying (including replacing memcpy with mov instructions or other code) is allowed.
For copying algorithms (including the one that newlib uses), you can check this article. Quote from the article:
Special situations: If you know all about the data you're copying as well as the environment in which memcpy runs, you may be able to create a specialized version that runs very fast
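One simple form of "knowing all about the data" is knowing its size at compile time. The sketch below is not taken from the quoted article; `copy_fixed` is a hypothetical helper that passes a constant byte count to memcpy, which gives the compiler the chance to replace the call with a few inline moves, as happened for the 3-float memcpy in the assembly listing earlier on this page.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copy between two arrays whose length N is a
// compile-time constant, so the byte count given to memcpy is constant too.
template <typename T, std::size_t N>
void copy_fixed(T (&dst)[N], const T (&src)[N]) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    std::memcpy(dst, src, N * sizeof(T));
}
```

Whether the call is actually inlined depends on the compiler and flags, so inspect the generated assembly or measure before relying on it.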