C++ Performance Challenge: Integer to Std::String Conversion

#include <string>

const char digit_pairs[201] = {
"00010203040506070809"
"10111213141516171819"
"20212223242526272829"
"30313233343536373839"
"40414243444546474849"
"50515253545556575859"
"60616263646566676869"
"70717273747576777879"
"80818283848586878889"
"90919293949596979899"
};


std::string& itostr(int n, std::string& s)
{
    if (n == 0)
    {
        s = "0";
        return s;
    }

    int sign = -(n < 0);
    unsigned int val = (n ^ sign) - sign;

    int size;
    if (val >= 10000)
    {
        if (val >= 10000000)
        {
            if (val >= 1000000000)
                size = 10;
            else if (val >= 100000000)
                size = 9;
            else
                size = 8;
        }
        else
        {
            if (val >= 1000000)
                size = 7;
            else if (val >= 100000)
                size = 6;
            else
                size = 5;
        }
    }
    else
    {
        if (val >= 100)
        {
            if (val >= 1000)
                size = 4;
            else
                size = 3;
        }
        else
        {
            if (val >= 10)
                size = 2;
            else
                size = 1;
        }
    }
    size -= sign;
    s.resize(size);
    char* c = &s[0];
    if (sign)
        *c = '-';

    c += size - 1;
    while (val >= 100)
    {
        int pos = val % 100;
        val /= 100;
        *(short*)(c - 1) = *(short*)(digit_pairs + 2 * pos);
        c -= 2;
    }
    while (val > 0)
    {
        *c-- = '0' + (val % 10);
        val /= 10;
    }
    return s;
}

std::string& itostr(unsigned val, std::string& s)
{
    if (val == 0)
    {
        s = "0";
        return s;
    }

    int size;
    if (val >= 10000)
    {
        if (val >= 10000000)
        {
            if (val >= 1000000000)
                size = 10;
            else if (val >= 100000000)
                size = 9;
            else
                size = 8;
        }
        else
        {
            if (val >= 1000000)
                size = 7;
            else if (val >= 100000)
                size = 6;
            else
                size = 5;
        }
    }
    else
    {
        if (val >= 100)
        {
            if (val >= 1000)
                size = 4;
            else
                size = 3;
        }
        else
        {
            if (val >= 10)
                size = 2;
            else
                size = 1;
        }
    }

    s.resize(size);
    char* c = &s[size - 1];
    while (val >= 100)
    {
        int pos = val % 100;
        val /= 100;
        *(short*)(c - 1) = *(short*)(digit_pairs + 2 * pos);
        c -= 2;
    }
    while (val > 0)
    {
        *c-- = '0' + (val % 10);
        val /= 10;
    }
    return s;
}

This will blow up on systems that disallow unaligned memory accesses (the first misaligned *(short*) store would trigger a hardware fault), but should work very nicely otherwise.
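
If portability matters, one possible workaround (my sketch, not part of the original answer; the helper name is made up) is to copy the two digit characters with std::memcpy instead of the type-punned short store; most compilers turn a fixed two-byte memcpy back into a single store on platforms where that is safe:

#include <cstring>

// Sketch: portable replacement for the *(short*) stores in the loops above.
inline void copy_digit_pair(char* dst, const char* src)
{
    std::memcpy(dst, src, 2);
}

// e.g. inside the conversion loops:
//     copy_digit_pair(c - 1, digit_pairs + 2 * pos);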

One important thing to do is to minimize the use of std::string. (Ironic, I know.) In Visual Studio, for example, most calls to methods of std::string are not inlined, even if you specify /Ob2 in the compiler options. So even something as trivial as a call to std::string::clear(), which you might expect to be very fast, can take 100 clock ticks when linking the CRT as a static library, and as much as 300 clock ticks when linking it as a DLL.

For the same reason, returning by reference is better because it avoids an assignment, a constructor and a destructor.
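
For example (a sketch using the itostr above; the wrapper function is just for illustration), the same std::string can be reused across calls so its capacity is allocated once:

#include <string>

// Usage sketch: reuse one std::string buffer for many conversions; returning
// by reference means no temporary strings are constructed per call.
void print_many()
{
    std::string buf;
    for (int i = 0; i < 1000000; ++i)
    {
        itostr(i, buf);
        // ... use buf ...
    }
}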

Faster conversion to string

You could try using template specialization in addition to the code you gave, to optimize the most common cases (this is how, for example, vector<bool> can be more space-efficient).

For example:

template<>
string toString<char>(char sAttrValue) {
    ...
}

template<>
string toString<int>(int sAttrValue) {
    ...
}

And so on for the other types, each one using a conversion method specific and optimized for that type. Any type that doesn't match one of these specialized templates will fall back to the default stringstream method.
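
A rough sketch of what that could look like (names and bodies are illustrative, not from the question; assumes C++11 for std::to_string): a stringstream-based primary template with a faster specialization for int.

#include <sstream>
#include <string>

// Primary template: generic but comparatively slow stringstream conversion.
template<typename T>
std::string toString(T value)
{
    std::ostringstream oss;
    oss << value;
    return oss.str();
}

// Specialization for int: bypasses the stream machinery entirely.
template<>
std::string toString<int>(int value)
{
    return std::to_string(value);   // C++11
}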

Converting an int to std::string

You can use std::to_string in C++11

int i = 3;
std::string str = std::to_string(i);

integer to string

I think the "recursive decomposition" is

N = T * 10 + O

This leads to a recursive algorithm

string(N) = let T = idiv(N, 10),
            let O = mod(N, 10),
            concat(T ? string(T) : empty_string, todigit(O))

Now you just have to write that up in C++, figuring out the little details along the way.
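
One way that write-up could look (my sketch, not part of the answer; the function name is made up, and negative numbers would need an extra sign check on top of this):

#include <string>

// Recursive decomposition: N = T * 10 + O with T = N / 10 and O = N % 10.
// Convert the leading part T first, then append the digit for O.
std::string to_string_rec(unsigned n)
{
    unsigned t = n / 10;
    char o = static_cast<char>('0' + n % 10);
    return (t ? to_string_rec(t) : std::string()) + o;
}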

Then if you want to see more efficient ways, read

  • C++ performance challenge: integer to std::string conversion

Integer to string conversion issues

I'm attempting to convert Integer to string with the following code:

CryptoPP::Integer i("12345678900987654321");

std::ostrstream oss;
oss << i;
std::string s(oss.str());
LOGDEBUG(oss.str()); // Pumps log to console and log file

The output appears to have extra garbage data:

12345678900987654321.ÍÍÍÍÍÍÍÍÍÍÍýýýý««««««««îþîþ

I can't reproduce this with Crypto++ 5.6.2 on Visual Studio 2010. The corrupted output is likely the result of some other issue, not a bug in Crypto++. If you haven't done so already, I'd suggest trying to reproduce this in a minimal program just using CryptoPP::Integer and std::cout, and none of your other application code, to eliminate all other possible problems. If it's not working in a trivial stand-alone test (which would be surprising), there could be problems with the way the library was built (e.g. maybe it was built with a different C++ runtime or compiler version from what your application is using). If your stand-alone test passes, you can add in other string operations, logging code etc. until you find the culprit.

I do notice, though, that you're using std::ostrstream, which is deprecated. You may want to use std::ostringstream instead. This Stack Overflow answer to the question "Why was std::strstream deprecated?" may be of interest, and it may even be the case that the issues mentioned in that answer are causing your problems here.
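
For example, a minimal sketch of the same conversion with std::ostringstream (assuming CryptoPP::Integer and its operator<< as used in the question; the wrapper function name is made up):

#include <sstream>
#include <string>

// std::ostringstream owns and grows its own buffer, avoiding the freeze()/
// manual-memory pitfalls of the deprecated std::ostrstream.
std::string integer_to_string(const CryptoPP::Integer& i)
{
    std::ostringstream oss;
    oss << i;
    return oss.str();
}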

Additionally, I cannot get precision or scientific notation working.
The following will output the same results:

std::cout.precision(5); // Does nothing with CryptoPP::Integer
std::cout << "Dec: " << std::setprecision(1) << std::dec << i << std::endl;
std::cout << "Sci: " << std::setprecision(5) << std::scientific << i << std::endl;

std::setprecision and std::scientific modify floating-point input/output, so this wouldn't work with regular integer types like int or long long either (though I can see that, especially with arbitrary-length integers like CryptoPP::Integer, being able to output in scientific notation with a specified precision would make sense).

Even if C++ didn't restrict these manipulators to floating-point types, Crypto++'s implementation would still need to heed those flags itself. Looking at the Crypto++ implementation of std::ostream& operator<<(std::ostream& out, const Integer &a), the only iostream flags it recognizes are std::ios::oct and std::ios::hex (for octal and hex format numbers respectively).

If you want scientific notation, you'll have to format the output yourself (or use a different library).
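
A rough sketch of formatting it yourself (my code, not Crypto++'s; it assumes the number's decimal digits are already in a std::string, e.g. obtained via an ostringstream as above, with no sign and no leading zeros, and it truncates rather than rounds):

#include <string>

// Format a non-empty string of decimal digits as scientific notation with
// `precision` digits after the decimal point, e.g. "12345" -> "1.23e+4".
std::string to_scientific(const std::string& digits, int precision)
{
    std::string mantissa(1, digits[0]);
    if (precision > 0)
    {
        mantissa += '.';
        for (int k = 1; k <= precision; ++k)
            mantissa += k < (int)digits.size() ? digits[k] : '0';
    }
    return mantissa + "e+" + std::to_string(digits.size() - 1);
}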

On top of all of this, sufficiently large numbers break the entire thing.

CryptoPP::Integer i("12345");

// Calculate i^16
for (int x = 0; x < 16; x++)
{
i *= i;
}

std::cout << i << std::endl; // Will never finish

That will actually calculate i^(2^16) = i^65536, not i^16, because on each loop you're multiplying i with its new intermediate value, not with its original value. The actual result with this code would be 268,140 digits long, so I expect it's just taking Crypto++ a long time to produce that output.

Here is the code adjusted to produce the correct result:

CryptoPP::Integer i("12345");
CryptoPP::Integer i_to_16(1);

// Calculate i^16
for (int x = 0; x < 16; x++)
{
i_to_16 *= i;
}

std::cout << i_to_16 << std::endl;

What's a concise C++ idiom for conversion from int|double to string?

Using the std::to_string family of functions:

std::string s = std::to_string(3.1416);

If you don't have the required C++11, another option is boost::lexical_cast.

std::string s = boost::lexical_cast<std::string>(3.1416);

Why was integer to string conversion not explicitly included in C++ until now?

In hindsight, it was an oversight. However, without knowing details about the development history of C++, I’d venture a guess that this oversight has good reasons, rooted in theory. See, a conversion from number to string and vice versa is far from trivial, and it doesn’t fit the normal definition of a “cast” very well (in reality, it requires a parser / formatter), even though most other languages do provide such a cast.

Added to that is the fact that C++’s support for string types is rather … pedestrian. C doesn’t even have a real, dedicated type for it; it uses char arrays instead. C++ goes slightly further but stops well short of proper built-in string support. This can be seen in many aspects, from the fact that a string literal is still a null-terminated char array, to the broad consensus that std::string has a bloated, poorly designed interface. And don’t forget that std::string doesn’t even represent a string! It represents an array of bytes! This is an important distinction, and the reason for it is simply that std::string is completely encoding-agnostic.

Ah, but C++ actually does support proper encoding, and proper parsing and formatting. It simply doesn’t provide it for strings – it provides it for streams.

And there we have it. C++ has no proper string type. Instead, it has input/output streams.
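
Which is why the classic pre-C++11 idiom for this conversion goes through a stream; a minimal sketch (the function name is just for illustration):

#include <sstream>
#include <string>

// The stream, not the string, carries the formatting logic (base, width,
// locale), so "int to string" is expressed as "write the int to a stream".
std::string int_to_string(int n)
{
    std::ostringstream oss;
    oss << n;
    return oss.str();
}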

How to convert unsigned/signed integer/long to C string in an elegant and efficient way?

Just use sprintf, or itoa (non-portable):

#include <cstdio>   // sprintf

char* append_num(char* buf, int n)
{
    return buf + sprintf(buf, "%d", n);
}
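
Usage could look something like this (my sketch, not from the answer; the function name is made up):

#include <string>

// Append a couple of numbers into one stack buffer, then wrap the result
// in a std::string only at the end.
std::string example()
{
    char buf[64];
    char* p = buf;
    p = append_num(p, 42);
    *p++ = ',';
    p = append_num(p, -7);
    return std::string(buf, p);   // "42,-7"
}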

Some std::to_string implementations actually use sprintf internally and copy the result into a new std::string.

Here's something that could be considered well optimized. The difference from a regular itoa is that it does half as many integer divisions, which are not cheap instructions on most CPUs.

static int log10_1(unsigned int num)
{
    int ret;
    static_assert(sizeof(num) == 4, "expected 32-bit unsigned int");
    // extend this logic for 64-bit numbers
    if (num >= 10000)
    {
        if (num >= 1000000)
        {
            if (num >= 100000000)
                ret = (num >= 1000000000) ? 10 : 9;
            else
                ret = (num >= 10000000) ? 8 : 7;
        }
        else
            ret = (num >= 100000) ? 6 : 5;
    }
    else
    {
        if (num >= 100)
            ret = num >= 1000 ? 4 : 3;
        else
            ret = num >= 10 ? 2 : 1;
    }
    return ret;
}

// write string representation of number `n` into buf and return pointer to terminating null
char* to_str(char* buf, unsigned int n)
{
    static const char dig_[] =
        "0001020304050607080910111213141516171819"
        "2021222324252627282930313233343536373839"
        "4041424344454647484950515253545556575859"
        "6061626364656667686970717273747576777879"
        "8081828384858687888990919293949596979899";
    int len = log10_1(n);
    char* p = buf + len;
    *p-- = 0;
    while (n >= 100)
    {
        unsigned int x = (n % 100) * 2;
        n /= 100;
        *p-- = dig_[x + 1];
        *p-- = dig_[x];
    }
    if (n >= 10)
    {
        unsigned int x = n * 2;
        *p-- = dig_[x + 1];
        *p-- = dig_[x];
    }
    else
        *p-- = (char)('0' + n);
    return buf + len;
}

// write string representation of number `n` into buf and return pointer to terminating null
#include <climits>   // INT_MIN
#include <cstring>   // memcpy

char* to_str(char* buf, int n)
{
    unsigned int l;
    if (n < 0)
    {
        *buf++ = '-';
        if (n == INT_MIN)
        {
            static_assert(sizeof(n) == 4, "expected 32-bit int");
            memcpy(buf, "2147483648", 11);  // 11 bytes: copy the terminating null too
            return buf + 10;
        }
        l = (unsigned int)(-n);
    }
    else
        l = (unsigned int)n;
    return to_str(buf, l);
}
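
Usage is then something like this (my sketch; the wrapper name is made up): convert into a small stack buffer and, only if needed, wrap it in a std::string at the end.

#include <string>

// A 32-bit int needs at most 12 bytes: sign + 10 digits + terminating null.
std::string to_std_string(int n)
{
    char buf[12];
    char* end = to_str(buf, n);        // `end` points at the terminating null
    return std::string(buf, end);
}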

to_str is more than twice as fast as cdhowie's foo and about 6 times as fast as sprintf. Compare the times:

foo time: 745 ms
to_str time: 327 ms
sprintf time: 1989 ms

There is already a good Stack Overflow page on optimized to_string functions: C++ performance challenge: integer to std::string conversion. The fastest algorithm there is essentially identical to mine.


