Performance Issue for Vector::Size() in a Loop in C++

Performance issue for vector::size() in a loop in C++

In theory, it is called each time, since a for loop:

for (initialization; condition; increment)
    body;

is expanded to something like

{
    initialization;
    while (condition)
    {
        body;
        increment;
    }
}

(notice the outer curly braces: they put the initialization in its own inner scope)

In practice, if the compiler understands that a piece of your condition is invariant for the whole duration of the loop and has no side effects, it can be smart enough to move it out. This is routinely done with strlen and similar functions the compiler knows well, in loops where their argument isn't written to.

However, this last condition isn't always trivial to prove. In general, it's easy if the container is local to the function and never passed to external functions; if the container is not local (e.g. it's passed by reference, even a const one) and the loop body contains calls to other functions, the compiler often has to assume that such functions may alter it, blocking the hoisting of the length calculation.

Doing that optimization by hand is worthwhile only if you know that part of your condition is expensive to evaluate, and size() usually isn't: it generally boils down to a pointer subtraction, which is almost surely inlined.
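
For illustration, a minimal sketch of that hand hoisting (the function and variable names here are mine, not from the question):

#include <cstddef>
#include <vector>

long long sum(const std::vector<int>& vec)
{
    long long total = 0;
    const std::size_t n = vec.size();  // hoisted by hand: evaluated exactly once
    for (std::size_t i = 0; i < n; ++i)
        total += vec[i];
    return total;
}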


Edit: as others said, in general with containers it's better to use iterators, but for vectors it's not so important, because random access to elements via operator[] is guaranteed to be O(1). In practice, with vectors it's a pointer addition (vector base + index) and a dereference, versus the pointer increment (previous element + 1) and dereference of iterators. Since the target addresses are the same, I don't think you can gain anything from iterators in terms of cache locality; and even if you could, unless you're walking big arrays in tight loops you shouldn't even notice that kind of improvement.

For lists and other containers, instead, using iterators rather than random access can be really important, since random access may mean walking the list from the beginning on every access, while incrementing an iterator is just a pointer dereference.
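
To illustrate the contrast, here is a sketch using std::list (which has no operator[], so "random access" is emulated with std::next; the function names are mine):

#include <cstddef>
#include <iterator>
#include <list>

int sum_quadratic(const std::list<int>& lst)
{
    int total = 0;
    // Emulated random access: std::next walks from begin() on every
    // iteration, so the whole loop is O(n^2).
    for (std::size_t i = 0; i < lst.size(); ++i)
        total += *std::next(lst.begin(), static_cast<std::ptrdiff_t>(i));
    return total;
}

int sum_linear(const std::list<int>& lst)
{
    int total = 0;
    // Incrementing the iterator is a single pointer chase: O(n) overall.
    for (std::list<int>::const_iterator it = lst.begin(); it != lst.end(); ++it)
        total += *it;
    return total;
}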

performance of std::vector c++ size() inside loop in member function

Do not get into the habit of doing things like that.

The cases where the optimization you make in (2):

  • is safe to do,
  • makes a noticeable difference, and
  • is something your compiler cannot figure out on its own

are few and far between.

If it were just the latter two points, I would just advise that you're worrying about something unimportant. However, that first point is the real killer: you do not want to get in the habit of giving yourself extra chances to make mistakes. It's far, far easier to accelerate slow, correct code than it is to debug fast, buggy code.

Now, that said, I'll try answering your question. The definitions of the functions SomeNotConstFunction and SomeConstFunction are (presumably) in the same translation unit. So if these functions really do not modify the vector, the compiler can figure it out, and it will only "call" size once.

However, the compiler does not have access to the definition of SomeExternalFunction, and so must assume that every call to that function has the potential of modifying your vector. The presence of that function in your loop guarantees that size() is "called" every time.

I put "called" in quotes, however, because it is such a trivial function that it almost certainly gets inlined. Also, the function is ridiculously cheap -- two memory lookups (both nearly guaranteed to be cache hits), and either a subtraction and a right shift, or maybe even a specialized single instruction that does both.

Even if SomeExternalFunction does absolutely nothing, it's quite possible that "calling" size every time would still only be a small-to-negligible fraction of the running time of your loop.
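
Putting the pieces together, here is a sketch of the situation as I read it from the question (the names SomeConstFunction and SomeExternalFunction come from the question; the bodies and everything else are my guesses at its shape):

#include <cstddef>
#include <vector>

// Visible in this translation unit: the optimizer can see it never
// modifies the vector.
static void SomeConstFunction(const std::vector<int>& v)
{
    (void)v;  // reads only
}

// Only declared here; defined in another translation unit. The optimizer
// must assume it could resize vec (e.g. through a global pointer).
void SomeExternalFunction(int x);

void Loop(std::vector<int>& vec)
{
    for (std::size_t i = 0; i < vec.size(); ++i) {
        SomeConstFunction(vec);        // visible and const: no barrier
        SomeExternalFunction(vec[i]);  // opaque: size() re-read each pass
    }
}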

Edit: In response to the edit....

what exactly is the time cost incurred by size() when the array really might change

It is the difference you see when you time the two different versions of the code. If you're doing very low-level optimizations like that, you can't get answers through "pure reason" -- you must empirically test the results.

And if you really are doing such low level optimizations (and you can guarantee that the vector won't resize), you should probably be more worried about the fact the compiler doesn't know the base pointer of the array is constant, rather than it not knowing the size is constant.
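
A hedged sketch of caching both, assuming you really can guarantee that no resize or reallocation happens in the loop body (the function name is mine):

#include <cstddef>
#include <vector>

void Scale(std::vector<float>& vec, float factor)
{
    float* const data = vec.data();    // base pointer, now provably invariant
    const std::size_t n = vec.size();  // size, likewise
    for (std::size_t i = 0; i < n; ++i)
        data[i] *= factor;             // plain array indexing: easy to vectorize
}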

If SomeExternalFunction really is external to the compilation unit, then you have pretty much no chance of the compiler vectorizing the loop, no matter what you do. (I suppose it might be possible at link time, though....) And it's also unlikely to be "trivial" because it requires function call overhead -- at least if "trivial" means the same thing to you as to me. (again, I don't know how good link time optimizations are....)

If you really can guarantee that some operations will not resize the vector, you might consider refining your class's API (or at least its protected or private parts) to include functions that self-evidently won't resize the vector.
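
One possible shape for such an API (entirely hypothetical class and function names):

#include <cstddef>
#include <vector>

class Signal {
public:
    void Attenuate(double factor)
    {
        // Hand the elements over as pointer + length: AttenuateRange
        // self-evidently cannot resize a vector it never sees.
        AttenuateRange(samples_.data(), samples_.size(), factor);
    }

private:
    static void AttenuateRange(double* p, std::size_t n, double factor)
    {
        for (std::size_t i = 0; i < n; ++i)
            p[i] *= factor;
    }

    std::vector<double> samples_;
};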

Is it expensive to compute vector size in for loops, each iteration?

buildings.size() will likely be inlined by the compiler to directly access the private size field on the vector<T> class. So you shouldn't separate the call to size. This kind of micro-optimization is something you don't want to worry about anyway (unless you're in some really tight loop identified as a bottleneck by profiling).

For loop, is it faster to check the size of a vector outside the loop?

It is not important at all. An optimizing compiler will remove those two or three lines; they do nothing.

Seriously. If the compiler can deduce that a container is not changed in a loop, it will do the optimization that you did manually. To help the compiler apply the optimization, you can even declare the container const (example for a vector):

const std::vector<int> myVec = {0,1,2,3};
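
With the container const, size() is provably invariant, so in a loop like the following (a trivial continuation of the example) the compiler is free to hoist it:

int sum = 0;
for (std::size_t i = 0; i < myVec.size(); ++i)  // size() hoisted: myVec is const
    sum += myVec[i];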

Proper Way to size check a std::vector inside a loop

The first version will often be faster even with modern compilers. It is difficult for the optimizer to prove that the size does not change due to aliasing with the location written to in the loop body and so in many cases the second version will have to recalculate the size on each loop iteration.

I measured this in Visual Studio 2013 Release and found a performance difference for both 32 and 64 bit code. Both versions are handily beaten by std::fill(). These measurements are averages over 1000 runs with 10 million element vectors (increasing the number of elements to a billion somewhat reduces the performance difference as memory access becomes more of a bottleneck).

Method               Time (x86)   Time (x64)
uncached for loop    1.00         1.00
cached for loop      0.70         0.98
std::fill()          0.42         0.57

(All times are relative to the uncached for loop on the same architecture.)

The baseline cached size for loop code:

const auto size = vec.size();
for (vector<int>::size_type i = 0; i < size; ++i) {
    vec[i] = val;
}

Compiles to this loop body (x86 Release):

00B612C0  mov  ecx,dword ptr [esi]
00B612C2  mov  dword ptr [ecx+eax*4],edi
00B612C5  inc  eax
00B612C6  cmp  eax,edx
00B612C8  jb   forCachedSize+20h (0B612C0h)

Whereas the version that does not cache the vector's size:

for (vector<int>::size_type i = 0; i < vec.size(); ++i) {
    vec[i] = val;
}

Compiles to this, which recomputes vec.size() every time through the loop:

00B612F0  mov  dword ptr [edx+eax*4],edi
00B612F3  inc  eax
00B612F4  mov  ecx,dword ptr [esi+4]   <-- load vec.end()
00B612F7  mov  edx,dword ptr [esi]     <-- load vec.begin()
00B612F9  sub  ecx,edx                 <-- ecx = vec.end() - vec.begin()
00B612FB  sar  ecx,2                   <-- ecx = (vec.end() - vec.begin()) / sizeof(int)
00B612FE  cmp  eax,ecx
00B61300  jb   forComputedSize+20h (0B612F0h)
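
For completeness, the std::fill() variant from the table above is a single call:

#include <algorithm>

std::fill(vec.begin(), vec.end(), val);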

What is the best way to improve the performance of iterating through a std::vector?

That might well be faster, but since you're using C++, an alternative might be to use an iterator instead:

// declare 'it' to be an appropriate iterator type for my_vector
for (it = my_vector.begin(); it != my_vector.end(); ++it)
    doSomething(*it);

You will want to be careful here, though, that you're not inadvertently constructing temporary copies of each item in my_vector.

If you're using C++11, another option is to use range-for:

for (const auto &thing : my_vector)
    doSomething(thing);

Note that the use of const and & mean that we avoid copying each item in my_vector.

Performance of vector::size() : is it as fast as reading a variable?

Interesting question.

So, what's going to happen? Well, if you debug with gdb you'll see something like three member variables (the names are not accurate):

  • _M_begin: pointer to the first element of the dynamic array
  • _M_end: pointer one past the last element of the dynamic array
  • _M_capacity: pointer one past the last element that could be stored in the dynamic array

The implementation of vector<T,Alloc>::size() is thus usually reduced to:

return _M_end - _M_begin;  // Note: _Mylast - _Myfirst in VC 2008
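
A toy sketch of that layout (member names are illustrative, matching the list above; real implementations are more involved):

#include <cstddef>

template <typename T>
struct VectorSketch {
    T* _M_begin;     // first element
    T* _M_end;       // one past the last element
    T* _M_capacity;  // one past the end of the allocation

    std::size_t size() const
    {
        // Pointer subtraction divides by sizeof(T) automatically, so this
        // compiles to a subtraction plus, for sizeof(T) > 1, a shift.
        return static_cast<std::size_t>(_M_end - _M_begin);
    }
};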

Now, there are two things to consider regarding the actual optimizations possible:

  • will this function be inlined? Probably: I am no compiler writer, but it's a good bet, since the overhead of a function call would dwarf the actual work here, and since it's templated we have all the code available in the translation unit
  • will the result be cached (i.e. kept in a sort of unnamed local variable)? It could well be, but you won't know unless you disassemble the generated code

In other words:

  • If you store the size yourself, there is a good chance it will be as fast as the compiler could get it.
  • If you do not, it will depend on whether the compiler can establish that nothing else is modifying the vector; if not, it cannot cache the variable, and will need to perform memory reads (L1) every time.

It's a micro-optimization. In general, it will be unnoticeable, either because the performance does not matter or because the compiler will perform it regardless. In a critical loop where the compiler does not apply the optimization, it can be a significant improvement.

error in for loop, when using with vector size

size() returns an unsigned type. With unsigned types, 0 - 1 "underflows" to the maximum value. Since i is less than the maximum possible value of the type it is converted to, the for loop is entered.
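
A minimal example of the trap (hypothetical, but typical of the code that triggers this error):

#include <cstddef>
#include <vector>

int main()
{
    std::vector<int> v;  // empty: v.size() == 0
    // v.size() - 1 underflows to a huge unsigned value, so the condition
    // holds and the body runs with i == 0 on an empty vector: undefined
    // behavior. Write i + 1 < v.size() instead.
    for (std::size_t i = 0; i < v.size() - 1; ++i)
        v[i] = 0;
}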

Erasing indices of std::vector inside a for loop based on vector size

Each time you erase() an element from a container, its size() decrements, and the indexes of the remaining elements are decremented as well. But you are incrementing your loop counter unconditionally, so every time you erase an element, you skip the next element that had followed it!

Also, you are passing your vector by-value, so you are operating on a copy of the vector, and the caller will not see any changes in the original vector.

The correct approach would be to either:

  1. increment your index variable inside of the loop body only when an element is NOT erased. Leave the variable as-is when you DO erase an element:

    void FilterContours( std::vector<std::vector<cv::Point>> &contours )
    {
        std::size_t i = 0;
        while ( i < contours.size() ) {
            if ( contours[i].size() < 5 ) {
                contours.erase(contours.begin() + i);
                continue;
            }

            //Other filtering...

            ++i;
        }
    }
  2. use iterators instead of indexes:

    void FilterContours( std::vector<std::vector<cv::Point>> &contours )
    {
        auto it = contours.begin();
        while ( it != contours.end() ) {
            if ( it->size() < 5 ) {
                it = contours.erase(it);
                continue;
            }

            //Other filtering...

            ++it;
        }
    }
  3. use the erase-remove idiom:

    void FilterContours( std::vector<std::vector<cv::Point>> &contours )
    {
        contours.erase(
            std::remove_if(
                contours.begin(),
                contours.end(),
                [](const std::vector<cv::Point> &v)
                {
                    if (v.size() < 5) return true;
                    //Other filtering...
                    return false;
                }
            ),
            contours.end()
        );
    }
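
As a side note beyond the original answers: since C++20 the erase-remove idiom can be written as a single call to std::erase_if (declared in <vector>):

std::erase_if(contours, [](const std::vector<cv::Point> &v) {
    return v.size() < 5;  // plus any other filtering
});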

