Why Is Valarray So Slow

Why is valarray so slow?

I suspect that the reason c = a*b is so much slower than performing the operations an element at a time is that the

template<class T> valarray<T> operator*
(const valarray<T>&, const valarray<T>&);

operator must allocate memory to put the result into, then returns that by value.

Even if a "swaptimization" is used to perform the copy, that function still has the overhead of

  • allocating the new block for the resulting valarray
  • initializing the new valarray (it's possible that this might be optimized away)
  • putting the results into the new valarray
  • paging in the memory for the new valarray as it is initialized or set with result values
  • deallocating the old valarray that gets replaced by the result

Why is valarray so slow on Visual Studio 2015?

Am I doing everything right?

You're doing everything right. The problem is in the Visual Studio std::valarray implementation.

Why is std::valarray not working as needed on Visual Studio 2015?

Just open the implementation of any valarray operator, for example operator+. You will see something like (after macro expansion):

   template<class _Ty> inline
valarray<_Ty> operator+(const valarray<_Ty>& _Left,
const valarray<_Ty>& _Right)
{
valarray<TYPE> _Ans(_Left.size());
for (size_t _Idx = 0; _Idx < _Ans.size(); ++_Idx)
_Ans[_Idx] = _Left[_Idx] + _Right[_Idx];
return (_Ans)
}

As you can see, a new object is created in which the result of the operation is copied. There really is no optimization. I do not know why, but it is a fact. It looks like in Visual Studio, std::valarray was added for compatibility only.

For comparison, consider the GNU implementation. As you can see, each operator returns the template class _Expr which contains only the operation, but does not contain data. The real computation is performed in the assignment operator and more specifically in the __valarray_copy function. Thus, until you perform assignment, all actions are performed on the proxy object _Expr. Only once operator= is called, is the operation stored in _Expr performed in a single loop. This is the reason why you get such good results with g++.

How can I solve the problem?

You need to find a suitable std::valarray implementation on the internet or you can write your own. You can use the GNU implementation as an example.

Why is valarray so slow?

I suspect that the reason c = a*b is so much slower than performing the operations an element at a time is that the

template<class T> valarray<T> operator*
(const valarray<T>&, const valarray<T>&);

operator must allocate memory to put the result into, then returns that by value.

Even if a "swaptimization" is used to perform the copy, that function still has the overhead of

  • allocating the new block for the resulting valarray
  • initializing the new valarray (it's possible that this might be optimized away)
  • putting the results into the new valarray
  • paging in the memory for the new valarray as it is initialized or set with result values
  • deallocating the old valarray that gets replaced by the result

Why is there no std::data() overload for std::valarray?

As 1201ProgramAlarm stated in the comments, the proposal to add std::data does not make any mention of std::valarray. Unless someone can point out why &(a[0]) can't be used to obtain the valarray's data pointer, the simple answer is that valarray was either forgotten or ignored in the proposal.

Using std::valarray in numerical simulation

First of all, when you write or change code, always start with a first working version, and ensure the code is compiling between each steps. That would make isolating miscompiling code much easier.

Why Am I telling you this? It's because there are parts of your code that has never compiled correctly. No matter before the introduction of valarray or after.

For example, this:

for (auto& particle : particles) {
for (const auto& p: particle) {
p.position = 1.0
p.velocity = 2.0
p.force_new = 3.0
p.force_old = 4.0
}
}

A single particle is not an iterable type, and there is no semicolon at the end of the lines.

Just take it step by step, and ensure the code is compiling between each steps.


Second, I don't think valarray is what you're looking for. Unless you want each particle to have a dynamic number of dimension for each property, which would be very surprising.

I'd suggest you to introduce a vec type that would have the component you need to do your simplation. Such vec type can be found in libraries such as glm provide a vec2 class that has operators such as +-/* and more.

Even without a library, a simple vector type can be created.

Here's an example of your code using vectors (in the matematical sense) instead of std::valarray:

struct Particle{
glm::vec2 position;
glm::vec2 velocity;
glm::vec2 force_new;
glm::vec2 force_old;
void update_position();
};

void Particle::update_position() {
position += 0.1*(velocity + force_new);
force_old = force_new;
}

void compute_position(std::vector<Particle>& particles) {
for (auto& particle: particles) {
particle.update_position();
}
}

void print_data(const std::vector<Particle>& particles) {
for (const auto& particle : particles) {
std::cout << particle.position.x << ", " << particle.position.y << " ";
std::cout << particle.velocity.x << ", " << particle.velocity.y << " ";
std::cout << particle.force_new.x << ", " << particle.force_new.y << " ";
std::cout << std::endl;
}
}

void init_data(std::vector<Particle>& particles) {
for (auto& particle : particles) {
particle.position = {1, 2};
particle.velocity = {2, 24};
particle.force_old = {1, 5};
particle.force_new = {-4, 2};
}
}

int main() {
const unsigned int n = 10;
std::vector<Particle> particles(n);
init_data(particles);
compute_position(particles);
print_data(particles);
return 0;
}

Live example with custom (imcomplete) vec2 type

How to use std::valarray in if statement without redundant computation?

If you want to skip elemets in v[i] use a nested loop

using vfloat = std::valarray<float>;
std::vector<vfloat> v = {vfloat{0.f, 0.f}, vfloat{1.f, 0.f}, vfloat{2.f, 0.f}};
for(size_t i = 0; i < v.size(); ++i)
for(size_t j = 0; j < v[i].size(); ++j)
if (v[i][j] != 0)
v[i][j] = v[i][j] * v[i][j] * v[i][j]; // Time-consuming computation.

or apply:

using vfloat = std::valarray<float>;
std::vector<vfloat> v = {vfloat{0.f, 0.f}, vfloat{1.f, 0.f}, vfloat{2.f, 0.f}};
for(size_t i = 0; i < v.size(); ++i)
v[i] = v[i].apply([](float n) -> float {
if (n == 0) return 0;
return n * n * n; // Time-consuming computation.
});

If you want to skip elements in v

using vfloat = std::valarray<float>;
std::vector<vfloat> v = {vfloat{0.f, 0.f}, vfloat{1.f, 0.f}, vfloat{2.f, 0.f}};
for(size_t i = 0; i < v.size(); ++i)
if ((v[i] != 0).max() != 0)
v[i] = v[i] * v[i] * v[i]; // Time-consuming computation.


Related Topics



Leave a reply



Submit