Why is valarray so slow?
I suspect that the reason c = a*b
is so much slower than performing the operations an element at a time is that the
template<class T> valarray<T> operator*
(const valarray<T>&, const valarray<T>&);
operator must allocate memory to put the result into, then returns that by value.
Even if a "swaptimization" is used to perform the copy, that function still has the overhead of
- allocating the new block for the resulting
valarray
- initializing the new
valarray
(it's possible that this might be optimized away) - putting the results into the new
valarray
- paging in the memory for the new
valarray
as it is initialized or set with result values - deallocating the old
valarray
that gets replaced by the result
Why is valarray so slow on Visual Studio 2015?
Am I doing everything right?
You're doing everything right. The problem is in the Visual Studio std::valarray
implementation.
Why is
std::valarray
not working as needed on Visual Studio 2015?
Just open the implementation of any valarray
operator, for example operator+
. You will see something like (after macro expansion):
template<class _Ty> inline
valarray<_Ty> operator+(const valarray<_Ty>& _Left,
const valarray<_Ty>& _Right)
{
valarray<TYPE> _Ans(_Left.size());
for (size_t _Idx = 0; _Idx < _Ans.size(); ++_Idx)
_Ans[_Idx] = _Left[_Idx] + _Right[_Idx];
return (_Ans)
}
As you can see, a new object is created in which the result of the operation is copied. There really is no optimization. I do not know why, but it is a fact. It looks like in Visual Studio, std::valarray
was added for compatibility only.
For comparison, consider the GNU implementation. As you can see, each operator returns the template class _Expr which contains only the operation, but does not contain data. The real computation is performed in the assignment operator and more specifically in the __valarray_copy function. Thus, until you perform assignment, all actions are performed on the proxy object _Expr
. Only once operator=
is called, is the operation stored in _Expr
performed in a single loop. This is the reason why you get such good results with g++.
How can I solve the problem?
You need to find a suitable std::valarray
implementation on the internet or you can write your own. You can use the GNU implementation as an example.
Why is valarray so slow?
I suspect that the reason c = a*b
is so much slower than performing the operations an element at a time is that the
template<class T> valarray<T> operator*
(const valarray<T>&, const valarray<T>&);
operator must allocate memory to put the result into, then returns that by value.
Even if a "swaptimization" is used to perform the copy, that function still has the overhead of
- allocating the new block for the resulting
valarray
- initializing the new
valarray
(it's possible that this might be optimized away) - putting the results into the new
valarray
- paging in the memory for the new
valarray
as it is initialized or set with result values - deallocating the old
valarray
that gets replaced by the result
Why is there no std::data() overload for std::valarray?
As 1201ProgramAlarm stated in the comments, the proposal to add std::data
does not make any mention of std::valarray
. Unless someone can point out why &(a[0])
can't be used to obtain the valarray
's data pointer, the simple answer is that valarray
was either forgotten or ignored in the proposal.
Using std::valarray in numerical simulation
First of all, when you write or change code, always start with a first working version, and ensure the code is compiling between each steps. That would make isolating miscompiling code much easier.
Why Am I telling you this? It's because there are parts of your code that has never compiled correctly. No matter before the introduction of valarray or after.
For example, this:
for (auto& particle : particles) {
for (const auto& p: particle) {
p.position = 1.0
p.velocity = 2.0
p.force_new = 3.0
p.force_old = 4.0
}
}
A single particle is not an iterable type, and there is no semicolon at the end of the lines.
Just take it step by step, and ensure the code is compiling between each steps.
Second, I don't think valarray is what you're looking for. Unless you want each particle to have a dynamic number of dimension for each property, which would be very surprising.
I'd suggest you to introduce a vec type that would have the component you need to do your simplation. Such vec type can be found in libraries such as glm
provide a vec2
class that has operators such as +-/*
and more.
Even without a library, a simple vector type can be created.
Here's an example of your code using vectors (in the matematical sense) instead of std::valarray
:
struct Particle{
glm::vec2 position;
glm::vec2 velocity;
glm::vec2 force_new;
glm::vec2 force_old;
void update_position();
};
void Particle::update_position() {
position += 0.1*(velocity + force_new);
force_old = force_new;
}
void compute_position(std::vector<Particle>& particles) {
for (auto& particle: particles) {
particle.update_position();
}
}
void print_data(const std::vector<Particle>& particles) {
for (const auto& particle : particles) {
std::cout << particle.position.x << ", " << particle.position.y << " ";
std::cout << particle.velocity.x << ", " << particle.velocity.y << " ";
std::cout << particle.force_new.x << ", " << particle.force_new.y << " ";
std::cout << std::endl;
}
}
void init_data(std::vector<Particle>& particles) {
for (auto& particle : particles) {
particle.position = {1, 2};
particle.velocity = {2, 24};
particle.force_old = {1, 5};
particle.force_new = {-4, 2};
}
}
int main() {
const unsigned int n = 10;
std::vector<Particle> particles(n);
init_data(particles);
compute_position(particles);
print_data(particles);
return 0;
}
Live example with custom (imcomplete) vec2 type
How to use std::valarray in if statement without redundant computation?
If you want to skip elemets in v[i]
use a nested loop
using vfloat = std::valarray<float>;
std::vector<vfloat> v = {vfloat{0.f, 0.f}, vfloat{1.f, 0.f}, vfloat{2.f, 0.f}};
for(size_t i = 0; i < v.size(); ++i)
for(size_t j = 0; j < v[i].size(); ++j)
if (v[i][j] != 0)
v[i][j] = v[i][j] * v[i][j] * v[i][j]; // Time-consuming computation.
or apply
:
using vfloat = std::valarray<float>;
std::vector<vfloat> v = {vfloat{0.f, 0.f}, vfloat{1.f, 0.f}, vfloat{2.f, 0.f}};
for(size_t i = 0; i < v.size(); ++i)
v[i] = v[i].apply([](float n) -> float {
if (n == 0) return 0;
return n * n * n; // Time-consuming computation.
});
If you want to skip elements in v
using vfloat = std::valarray<float>;
std::vector<vfloat> v = {vfloat{0.f, 0.f}, vfloat{1.f, 0.f}, vfloat{2.f, 0.f}};
for(size_t i = 0; i < v.size(); ++i)
if ((v[i] != 0).max() != 0)
v[i] = v[i] * v[i] * v[i]; // Time-consuming computation.
Related Topics
Fast Implementation of Trigonometric Functions for C++
Implementing the Derivative in C/C++
Win32 Setforegroundwindow Unreliable
Properties File Library for C (Or C++)
Why Vector Access Operators Are Not Specified as Noexcept
Is It Valid for a Lambda To, Essentially, Close Over Itself
Redefining or Changing MACro Value
Compiling and Running a C++ Program with Vim
Matching an Overloaded Function to Its Polymorphic Argument
C++ Virtual Function Being Hidden
C++ Frontend Only Compiler (Convert C++ to C)
Search 25 000 Words Within a Text
Qthread Emits Finished() Signal But Isrunning() Returns True and Isfinished() Returns False
Assigning a String of Characters to a Char Array
Which Sorting Algorithm Is Used by Stl's List::Sort()
What Are Differences Between Std, Tr1 and Boost (As Namespaces And/Or Libraries)