Performance Hit from C++ Style Casts

If the C++-style cast can be conceptually replaced by a C-style cast, there will be no overhead. If it cannot, as in the case of dynamic_cast, for which there is no C equivalent, you have to pay the cost one way or another.

As an example, the following code:

int x;
float f = 123.456f;

x = (int) f;
x = static_cast<int>(f);

generates identical code for both casts with VC++:

00401041   fld         dword ptr [ebp-8]
00401044   call        __ftol (0040110c)
00401049   mov         dword ptr [ebp-4],eax

The only C++ cast that can throw is dynamic_cast when casting to a reference. To avoid this, cast to a pointer, which will return 0 if the cast fails.

Which cast is faster: static_cast<int>() or int()?

There should be no difference at all if you compare int() to equivalent functionality of static_cast<int>().

Using VC2008:

double d = 10.5;
013A13EE   fld         qword ptr [__real@4025000000000000 (13A5840h)]
013A13F4   fstp        qword ptr [d]
int x = int(d);
013A13F7   fld         qword ptr [d]
013A13FA   call        @ILT+215(__ftol2_sse) (13A10DCh)
013A13FF   mov         dword ptr [x],eax
int y = static_cast<int>(d);
013A1402   fld         qword ptr [d]
013A1405   call        @ILT+215(__ftol2_sse) (13A10DCh)
013A140A   mov         dword ptr [y],eax

Obviously, it is 100% the same!

A conversion that a C-style cast can handle, but C++ casts cannot

A cast to an inaccessible base class can only be expressed as a C-style cast (in one of its syntactic variants). In that context it behaves like a static_cast, which may adjust the address, except that static_cast itself can't reach the private base.

Example:

#include <iostream>
using namespace std;

struct Base
{
    int x = 42;
};

struct Oh_my
    : private Base
{
    virtual ~Oh_my() {}
};

auto main() -> int
{
    Oh_my o;
    cout << "C cast: " << ((Base&)o).x << endl;
    cout << "reinterpret_cast: " << reinterpret_cast<Base&>(o).x << endl;
}

Output with MinGW g++ on Windows 7:


C cast: 42
reinterpret_cast: 4935184

But since this is undefined behavior, the last output operation could just as well crash.

How much of a performance hit will I take from casting when trying to make this code mistake-proof?

How about having another method with a where clause in the CrudModel class:

public IEnumerable<T> GetAll<T>(Func<T, bool> whereClause) where T : DomainBase
{
    var items = Repository.Query<T>();
    return items.Where(whereClause);
}

And call it with any predicate; for instance (model being a CrudModel instance and Order some DomainBase-derived entity, names here just for illustration):

model.GetAll<Order>(o => sampledict.ContainsKey(o.Id));

It doesn't seem right to complicate things by cramming all the logic into a single GetAll method, and since CrudModel appears to be generic, it is better to have a generic method that accepts any condition.

Is it OK to use a C-style cast for built-in types?

I would not, for the following reasons:

  • Casts are ugly and should be ugly and stand out in your code, and be findable using grep and similar tools.
  • "Always use C++ casts" is a simple rule that is much more likely to be remembered and followed than, "Use C++ casts on user-defined types, but it's OK to use C-style casts on built-in types."
  • C++ style casts provide more information to other developers about why the cast is necessary.
  • C-style casts may let you do conversions you didn't intend: if you have an interface that takes an int* and you were using C-style casts to pass it a const int*, and the interface changes to take a long*, your code using C-style casts will continue to compile, even though it no longer does what you wanted.

Does typecasting consume extra CPU cycles?

I would like to say that "converting between types" is what we should be looking at, not whether there is a cast or not. For example

int a = 10;
float b = a;

will be the same as :

int a = 10;
float b = (float)a;

This also applies to changing the size of a type, e.g.

char c = 'a';
int b = c;

this will extend c from a single byte [byte in the C sense, not necessarily 8 bits] to an int, which may add an extra instruction (or extra clock cycle(s) to the instruction used) beyond the data movement itself.

Note that sometimes these conversions aren't at all obvious. On x86-64, a typical example is using int instead of unsigned int for indices in arrays. Since pointers are 64-bit, the index needs to be converted to 64-bit. In the case of an unsigned, that's trivial - just use the 64-bit version of the register the value is already in, since a 32-bit load operation will zero-fill the top part of the register. But if you have an int, it could be negative. So the compiler will have to use the "sign extend this to 64 bits" instruction. This is typically not an issue where the index is calculated based on a fixed loop and all values are positive, but if you call a function where it is not clear if the parameter is positive or negative, the compiler will definitely have to extend the value. Likewise if a function returns a value that is used as an index.

However, any reasonably competent compiler will not mindlessly add instructions to convert something from its own type to itself. With optimization turned off it may, but even minimal optimization should see that "we're converting from type X to type X, that doesn't mean anything, let's remove it".

So, in short, the above example does not add any extra penalty, but there are certainly cases where converting data from one type to another does add extra instructions and/or clock cycles to the code.

Cast performance from size_t to double

For your original questions:

  1. The code is slow because it involves the conversion from integer to
    floating-point types. That's why it is easily sped up when you also use
    an integer type for the sum variables, because it no longer requires
    a float conversion.
  2. The difference is the result of several
    factors. For example, it depends on how efficiently the platform can
    perform an int->float conversion. Furthermore, this conversion
    could also disturb processor-internal optimizations in the program
    flow and prediction engine, caches, ... and the internal
    parallelizing features of the processor can also have a huge influence on
    such calculations.

For the additional questions:

  • "Surprisingly int is faster than uint_fast32_t"? What are
    sizeof(size_t) and sizeof(int) on your platform? One guess is that both are
    probably 64-bit, and therefore a cast to 32-bit not only can give you
    calculation errors but also incurs a different-size-cast
    penalty.

In general, try to avoid visible and hidden casts as much as possible if they aren't really necessary. For example, try to find out what real data type is hidden behind size_t in your environment (gcc) and use that one for the loop variable.
In your example the square of the uints cannot be a float data type, so there is no reason to use double here. Stick to integer types to achieve maximum performance.

Performance of dynamic_cast?

Firstly, you need to measure the performance over a lot more than just a few iterations, as your results will be dominated by the resolution of the timer. Try e.g. 1 million+, in order to build up a representative picture. Also, this result is meaningless unless you compare it against something, i.e. doing the equivalent but without the dynamic casting.

Secondly, you need to ensure the compiler isn't giving you false results by optimising away multiple dynamic casts on the same pointer (so use a loop, but use a different input pointer each time).

Dynamic casting will be slower, because it needs to access the RTTI (run-time type information) table for the object, and check that the cast is valid. Then, in order to use it properly, you will need to add error-handling code that checks whether the returned pointer is NULL. All of this takes up cycles.

I know you didn't want to talk about this, but "a design where dynamic_cast is used a lot" is probably an indicator that you're doing something wrong...
