C++ Most Efficient Way to Convert String to Int (Faster Than Atoi)

How can I convert a std::string to int?

In C++11 there are some nice new conversion functions from std::string to numeric types.

So instead of

atoi( str.c_str() )

you can use

std::stoi( str )

where str is your number as std::string.

There are versions for all flavours of numbers:
long stol(string), float stof(string), double stod(string),...
see http://en.cppreference.com/w/cpp/string/basic_string/stol
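Unlike atoi, std::stoi reports errors by throwing. A minimal sketch of a checked wrapper (the helper name to_int is mine, not part of the standard library):

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Checked wrapper around std::stoi: returns false instead of throwing.
bool to_int(const std::string& str, int& out) {
    try {
        std::size_t pos = 0;
        out = std::stoi(str, &pos);   // throws instead of silently returning 0
        return pos == str.size();     // reject trailing junk like "12ab"
    } catch (const std::invalid_argument&) {
        return false;                 // no digits at all
    } catch (const std::out_of_range&) {
        return false;                 // value doesn't fit in int
    }
}
```

std::stoi also takes an optional base argument (like strtol), and the pos out-parameter tells you how much of the string was consumed.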

What is the difference between sscanf or atoi to convert a string to an integer?

You have 3 choices:

  1. atoi

This is probably the fastest if you're using it in performance-critical code, but it does no error reporting. If the string does not begin with an integer, it returns 0. If the string contains junk after the integer, it converts the initial part and ignores the rest. If the number is too big to fit in int, the behaviour is undefined.


  2. sscanf

Some error reporting, and you have a lot of flexibility for what type to store (signed/unsigned versions of char/short/int/long/long long/size_t/ptrdiff_t/intmax_t).

The return value is the number of conversions that succeed, so scanning for "%d" will return 0 if the string does not begin with an integer. You can use "%d%n" to store the index of the first character after the integer in another variable, and thereby check whether the entire string was converted or there's junk afterwards. However, like atoi, behaviour on integer overflow is undefined.
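For example, the "%d%n" trick described above can be wrapped like this (parse_full_int is a hypothetical helper name):

```cpp
#include <cstdio>

// Returns true if the whole of `s` is one integer, storing it in `out`.
bool parse_full_int(const char* s, int* out) {
    int consumed = 0;
    // %n records how many characters have been consumed so far; it does
    // not count toward sscanf's return value.
    if (std::sscanf(s, "%d%n", out, &consumed) != 1)
        return false;                // no integer at the start
    return s[consumed] == '\0';      // reject trailing junk
}
```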


  3. strtol and family

Robust error reporting, provided you set errno to 0 before making the call. Return values are specified on overflow and errno will be set. You can choose any number base from 2 to 36, or specify 0 as the base to auto-interpret leading 0x and 0 as hex and octal, respectively. Choices of type to convert to are signed/unsigned versions of long/long long/intmax_t.

If you need a smaller type you can always store the result in a temporary long or unsigned long variable and check for overflow yourself.

Since these functions take a pointer to pointer argument, you also get a pointer to the first character following the converted integer, for free, so you can tell if the entire string was an integer or parse subsequent data in the string if needed.


Personally, I would recommend the strtol family for most purposes. If you're doing something quick-and-dirty, atoi might meet your needs.
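A sketch of the strtol recipe above, narrowing the result to int with explicit overflow checks (str_to_int is my name for it, not a library function):

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Parse an int with full error reporting, via strtol.
bool str_to_int(const char* s, int* out) {
    char* end = nullptr;
    errno = 0;                        // must be cleared before the call
    long v = std::strtol(s, &end, 10);
    if (end == s || *end != '\0')     // no digits, or trailing junk
        return false;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return false;                 // overflow for long, or doesn't fit int
    *out = static_cast<int>(v);
    return true;
}
```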

As an aside, sometimes I find I need to parse numbers where leading whitespace, sign, etc. are not supposed to be accepted. In this case it's pretty damn easy to roll your own for loop, e.g.:

for (x = 0; (unsigned)*s - '0' < 10; s++)
    x = 10 * x + (*s - '0');

Or you can use (for robustness):

if (isdigit(*s))
    x = strtol(s, &s, 10);
else
    /* error */

Most insanely fastest way to convert 9 char digits into an int or unsigned int

Yes, SIMD is possible, as mentioned in comments. You can take advantage of it to parse the HH, MM, and SS parts of the string at the same time.

Since you have a 100% fixed format with leading 0s where necessary, this is easier than How to implement atoi using SIMD? - Place-values are fixed and we don't need any compare / bit-scan or pcmpistri to look up a shuffle control mask or scale-factor. Also SIMD string to unsigned int parsing in C# performance improvement has some good ideas, like tweaking the place-value multipliers to avoid a step at the end (TODO, do that here.)

9 decimal digits breaks down into two dwords and one leftover byte that's probably best to grab separately.

Assuming you care about throughput (the ability to overlap this with surrounding code, or to run it in a loop over independent elements) more than about critical-path latency (the cycles from the input pointer and data being ready in memory to the nanoseconds integer being ready), SSSE3 SIMD should be very good on modern x86. (SSE4.1 is useful if you want to unpack the hours, minutes, and seconds into contiguous uint32_t elements, e.g. in a struct.) It might be competitive on latency, too, vs. scalar.

Fun fact: clang auto-vectorizes your convert2 / convert3 functions, widening to 8x dword in a YMM register for vpmulld (2 uops), then a chain of shuffle/add.

The strategy is to use pmaddubsw and pmaddwd to multiply-and-add pairs horizontally, in a way that gets each digit multiplied by its place value: e.g. pairs with place values 10 and 1, then 100 and 1 for the pairs of two-digit integers that result. Then extract to scalar for the last pair: multiply the most-significant part by 100 * 100, and add it to the least-significant part. I'm pretty sure overflow is impossible at any step for inputs that are actually '0'..'9'. This runs and compiles to the asm I expected, but I didn't verify the numeric results.

#include <immintrin.h>
#include <stdint.h>

typedef struct {            // for output into memory
    alignas(16) unsigned hours;
    unsigned minutes, seconds, nanos;
} hmsn;

void str2hmsn(hmsn *out, const char str[15])  // HHMMSSXXXXXXXXX 15 total, with 9-digit nanoseconds.
{   // 15 not including the terminating 0 (if any) which we don't read
    //hmsn retval;
    __m128i digs = _mm_loadu_si128((const __m128i*)str);
    digs = _mm_sub_epi8( digs, _mm_set1_epi8('0') );
    __m128i hms_x_words = _mm_maddubs_epi16( digs, _mm_set1_epi16( 10U + (1U<<8) ));  // SSSE3 pairs of digits => 10s, 1s places.

    __m128i hms_unpacked = _mm_cvtepu16_epi32(hms_x_words);  // SSE4.1 hours, minutes, seconds unpack from uint16_t to uint32_t
    //_mm_storeu_si128((__m128i*)&retval, hms_unpacked);  // store first 3 struct members; last to be written separately
    _mm_storeu_si128((__m128i*)out, hms_unpacked);
    // or scalar extract with _mm_cvtsi128_si64 (movq) and shift / movzx

    __m128i xwords = _mm_bsrli_si128(hms_x_words, 6);  // would like to schedule this sooner, so oldest-uop-first starts this critical-path shuffle ahead of pmovzx
    // 8 bytes of data, lined up in low 2 dwords, rather than split across high 3
    // could have got here with an 8-byte load that starts here, if we didn't want to get the H,M,S integers cheaply.

    __m128i xdwords = _mm_madd_epi16(xwords, _mm_setr_epi16(100, 1, 100, 1, 0,0,0,0));  // low/high uint32 chunks, discard the 9th x digit.
    uint64_t pair32 = _mm_cvtsi128_si64(xdwords);
    uint32_t msd = 100*100 * (uint32_t)pair32;  // most-significant dword was at the lower address (in printing order), so it's the low half on little-endian x86.  Encourage compilers to use 32-bit operand-size for imul.
    uint32_t first8_x = msd + (uint32_t)(pair32 >> 32);
    uint32_t nanos = first8_x * 10 + ((unsigned char)str[14] - '0');  // total * 10 + lowest digit
    out->nanos = nanos;
    //retval.nanos = nanos;
    //return retval;

    // returning the struct by value encourages compilers in the wrong direction
    // into not doing separate stores, even when inlining into a function that assigns the whole struct to a pointed-to output
}
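Since the numeric results weren't verified, a trivial scalar parser for the same fixed HHMMSSXXXXXXXXX layout can serve as a cross-check in a unit test. This is my own sketch (names chosen to avoid clashing with the hmsn struct above), not part of the original answer:

```cpp
#include <cstdint>

// Scalar reference for the fixed HHMMSSXXXXXXXXX layout, useful for
// cross-checking the SIMD version's output on sample inputs.
struct Hmsn { unsigned hours, minutes, seconds, nanos; };

static unsigned digits(const char* p, int n) {
    unsigned v = 0;
    for (int i = 0; i < n; ++i)
        v = v * 10 + unsigned(p[i] - '0');  // fixed format: every byte is '0'..'9'
    return v;
}

Hmsn scalar_hmsn(const char str[15]) {
    return { digits(str, 2), digits(str + 2, 2),
             digits(str + 4, 2), digits(str + 6, 9) };
}
```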

On Godbolt with a test loop that uses asm("" ::"m"(sink): "memory" ) to make the compiler redo the work in a loop. Or a std::atomic_thread_fence(acq_rel) hack that gets MSVC to not optimize away the loop either. On my i7-6700k with GCC 11.1, x86-64 GNU/Linux, energy_performance_preference = performance, I got this to run at one iteration per 5 cycles.

IDK why it doesn't run at one per 4c; I tweaked GCC options to avoid the JCC erratum slowdown without padding, and to have the loop in hopefully 4 uop cache lines. (6 uops, 1 uop ended by a 32B boundary, 6 uops, 2 uops ended by the dec/jnz). Perf counters say the front-end was "ok", and uops_dispatched_port shows all 4 ALU ports at less than 4 uops per iteration, highest being port0 at 3.34.
Manually padding the early instructions gets it down to 3 total lines, of 3, 6, 6 uops but still no improvement from 5c per iter, so I guess the front-end really is ok.

LLVM-MCA seems very ambitious in projecting 3c per iter, apparently based on a wrong model of Skylake with a "dispatch" (front-end rename I think) width of 6. Even with -mcpu=haswell with a proper 4-wide model it projects 4.5c. (I used asm("# LLVM-MCA-BEGIN") etc. macros on Godbolt and included an LLVM-MCA output window for the test loop.) It doesn't have fully accurate uop->port mapping, apparently not knowing about slow-LEA running only on port 1, but IDK if that's significant.

Throughput may be limited by the ability to find instruction-level parallelism and overlap across several iterations, as in Understanding the impact of lfence on a loop with two long dependency chains, for increasing lengths.

The test loop is:

#include <stdlib.h>
#ifndef __cplusplus
#include <stdalign.h>
#endif
#include <stdint.h>

#if 1 && defined(__GNUC__)
#define LLVM_MCA_BEGIN asm("# LLVM-MCA-BEGIN")
#define LLVM_MCA_END asm("# LLVM-MCA-END")
#else
#define LLVM_MCA_BEGIN
#define LLVM_MCA_END
#endif

#if defined(__cplusplus)
#include <atomic>
using std::atomic_thread_fence, std::memory_order_acq_rel;
#else
#include <stdatomic.h>
#endif

unsigned testloop(const char str[15]) {
    hmsn sink;
    for (int i = 0; i < 1000000000; i++) {
        LLVM_MCA_BEGIN;
        str2hmsn(&sink, str);
        // compiler memory barrier
        // force materializing the result, and forget about the input string being the same
#ifdef __GNUC__
        asm volatile("" ::"m"(sink): "memory");
#else
        //#warning happens to be enough with current MSVC
        atomic_thread_fence(memory_order_acq_rel);  // strongest barrier that doesn't require any asm instructions on x86; MSVC defeats signal_fence.
#endif
    }
    LLVM_MCA_END;
    volatile unsigned dummy = sink.hours + sink.nanos;  // make sure both halves are really used, else MSVC optimizes.
    return dummy;
}

int main(int argc, char *argv[])
{
    // performance isn't data-dependent, so just use a handy string.
    // alignas(16) static char str[] = "235959123456789";
    uintptr_t p = (uintptr_t)argv[0];
    p &= -16;
    return testloop((char*)p);  // argv[0] apparently has a cache-line split within 16 bytes on my system, worsening from 5c throughput to 6.12c
}

I compiled as follows, to squeeze the loop in so it ends before the 32-byte boundary it's almost hitting. Note that -march=haswell allows it to use AVX encodings, saving an instruction or two.

$ g++ -fno-omit-frame-pointer -fno-stack-protector -falign-loops=16 -O3 -march=haswell foo.c -masm=intel
$ objdump -drwC -Mintel a.out | less

...
0000000000001190 <testloop(char const*)>:
1190: 55 push rbp
1191: b9 00 ca 9a 3b mov ecx,0x3b9aca00
1196: 48 89 e5 mov rbp,rsp
1199: c5 f9 6f 25 6f 0e 00 00 vmovdqa xmm4,XMMWORD PTR [rip+0xe6f] # 2010 <_IO_stdin_used+0x10>
11a1: c5 f9 6f 15 77 0e 00 00 vmovdqa xmm2,XMMWORD PTR [rip+0xe77] # 2020 <_IO_stdin_used+0x20> # vector constants hoisted
11a9: c5 f9 6f 0d 7f 0e 00 00 vmovdqa xmm1,XMMWORD PTR [rip+0xe7f] # 2030 <_IO_stdin_used+0x30>
11b1: 66 66 2e 0f 1f 84 00 00 00 00 00 data16 cs nop WORD PTR [rax+rax*1+0x0]
11bc: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
### Top of loop is 16-byte aligned here, instead of ending up with 8 byte default
11c0: c5 d9 fc 07 vpaddb xmm0,xmm4,XMMWORD PTR [rdi]
11c4: c4 e2 79 04 c2 vpmaddubsw xmm0,xmm0,xmm2
11c9: c4 e2 79 33 d8 vpmovzxwd xmm3,xmm0
11ce: c5 f9 73 d8 06 vpsrldq xmm0,xmm0,0x6
11d3: c5 f9 f5 c1 vpmaddwd xmm0,xmm0,xmm1
11d7: c5 f9 7f 5d f0 vmovdqa XMMWORD PTR [rbp-0x10],xmm3
11dc: c4 e1 f9 7e c0 vmovq rax,xmm0
11e1: 69 d0 10 27 00 00 imul edx,eax,0x2710
11e7: 48 c1 e8 20 shr rax,0x20
11eb: 01 d0 add eax,edx
11ed: 8d 14 80 lea edx,[rax+rax*4]
11f0: 0f b6 47 0e movzx eax,BYTE PTR [rdi+0xe]
11f4: 8d 44 50 d0 lea eax,[rax+rdx*2-0x30]
11f8: 89 45 fc mov DWORD PTR [rbp-0x4],eax
11fb: ff c9 dec ecx
11fd: 75 c1 jne 11c0 <testloop(char const*)+0x30>
# loop ends 1 byte before it would be a problem for the JCC erratum workaround
11ff: 8b 45 fc mov eax,DWORD PTR [rbp-0x4]

So GCC made the asm I had planned by hand before writing the intrinsics this way, using as few instructions as possible to optimize for throughput. (Clang favours latency in this loop, using a separate add instead of a 3-component LEA).

This is faster than any of the scalar versions that parse just X, and it's parsing HH, MM, and SS as well. Clang's auto-vectorization of convert3 may give it a run for its money in that department, but strangely clang doesn't do that when inlining.

GCC's scalar convert3 takes 8 cycles per iteration. clang's scalar convert3 in a loop takes 7, running at 4.0 fused-domain uops/clock, maxing out the front-end bandwidth and saturating port 1 with one imul uop per cycle. (This is reloading each byte with movzx and storing the scalar result to a stack local every iteration. But not touching the HHMMSS bytes.)

$ taskset -c 3 perf stat --all-user -etask-clock,context-switches,cpu-migrations,page-faults,cycles,instructions,uops_issued.any,uops_executed.thread,idq.mite_uops,idq_uops_not_delivered.cycles_fe_was_ok -r1 ./a.out

Performance counter stats for './a.out':

1,221.82 msec task-clock # 1.000 CPUs utilized
0 context-switches # 0.000 /sec
0 cpu-migrations # 0.000 /sec
105 page-faults # 85.937 /sec
5,079,784,301 cycles # 4.158 GHz
16,002,910,115 instructions # 3.15 insn per cycle
15,004,354,053 uops_issued.any # 12.280 G/sec
18,003,922,693 uops_executed.thread # 14.735 G/sec
1,484,567 idq.mite_uops # 1.215 M/sec
5,079,431,697 idq_uops_not_delivered.cycles_fe_was_ok # 4.157 G/sec

1.222107519 seconds time elapsed

1.221794000 seconds user
0.000000000 seconds sys

Note that this is for 1G iterations, so 5.08G cycles means 5.08 cycles per iteration average throughput.

Removing the extra work to produce the HHMMSS part of the output (the vpsrldq, vpmovzxwd, and vmovdqa store), so only the 9-digit integer is produced, it runs at 4.0 cycles per iteration on Skylake. Or 3.5 without the scalar store at the end. (I edited GCC's asm output to comment out that instruction, so I know it's still doing all the work.)

The fact that there's some kind of back-end bottleneck here (rather than front-end) is probably a good thing for overlapping this with independent work.

How to convert a string to integer in C?

There is strtol, which is better IMO. I have also taken a liking to strtonum, so use it if you have it (but remember it's not portable):

long long
strtonum(const char *nptr, long long minval, long long maxval,
const char **errstr);

You might also be interested in strtoumax and strtoimax which are standard functions in C99. For example you could say:

errno = 0;
uintmax_t num = strtoumax(s, NULL, 10);
if (num == UINTMAX_MAX && errno == ERANGE)
    /* Could not convert. */

Anyway, stay away from atoi:

The call atoi(str) shall be equivalent to:

(int) strtol(str, (char **)NULL, 10)

except that the handling of errors may differ. If the value cannot be represented, the behavior is undefined.

Easiest way to convert int to string in C++

C++11 introduces std::stoi (and variants for each numeric type) and std::to_string, the counterparts of the C atoi and itoa, but expressed in terms of std::string.

#include <string> 

std::string s = std::to_string(42);

is therefore the shortest way I can think of. You can even omit naming the type, using the auto keyword:

auto s = std::to_string(42);

Note: see [string.conversions] (21.5 in n3242)

How to input millions of integers quite fast in C++?

A few ideas:

  1. Read integers using std::scanf, not std::istream. The latter is known to be slower for multiple reasons, even with a std::ios::sync_with_stdio(false) call.
  2. Read the file by mapping it into memory.
  3. Parse integers faster than scanf and strtol.

Example:

#include <cstdio>

int main() {
    int n, m, a[1600000];
    if(2 != std::scanf("%d %d", &n, &m))
        throw;
    for(int i = 0; i < n; ++i)
        if(1 != std::scanf("%d", a + i))
            throw;
}

You can also unroll that scanf loop to read multiple integers in one call. E.g.:

#include <cstdio>

constexpr int step = 64;
char const fmt[step * 3] =
"%d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d "
"%d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d "
"%d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d "
"%d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d"
;
int main() {
    int a[1600000];
    int n, m;
    if(2 != std::scanf("%d %d", &n, &m))
        throw;

    for(int i = 0; i < n; i += step) {
        int expected = step < n - i ? step : n - i;
        int* b = a + i;
        int read = std::scanf(fmt + 3 * (step - expected),
            b + 0x00, b + 0x01, b + 0x02, b + 0x03, b + 0x04, b + 0x05, b + 0x06, b + 0x07,
            b + 0x08, b + 0x09, b + 0x0a, b + 0x0b, b + 0x0c, b + 0x0d, b + 0x0e, b + 0x0f,
            b + 0x10, b + 0x11, b + 0x12, b + 0x13, b + 0x14, b + 0x15, b + 0x16, b + 0x17,
            b + 0x18, b + 0x19, b + 0x1a, b + 0x1b, b + 0x1c, b + 0x1d, b + 0x1e, b + 0x1f,
            b + 0x20, b + 0x21, b + 0x22, b + 0x23, b + 0x24, b + 0x25, b + 0x26, b + 0x27,
            b + 0x28, b + 0x29, b + 0x2a, b + 0x2b, b + 0x2c, b + 0x2d, b + 0x2e, b + 0x2f,
            b + 0x30, b + 0x31, b + 0x32, b + 0x33, b + 0x34, b + 0x35, b + 0x36, b + 0x37,
            b + 0x38, b + 0x39, b + 0x3a, b + 0x3b, b + 0x3c, b + 0x3d, b + 0x3e, b + 0x3f);
        if(read != expected)
            throw;
    }
}

Another option is to parse integers manually (mapping file into memory would help here and there are much faster algorithms for parsing integers than this and standard atoi/strtol, see Fastware - Andrei Alexandrescu):

#include <cctype>
#include <cstdio>

int main() {
    int n, m, a[1600000];
    if(2 != std::scanf("%d %d", &n, &m))
        throw;

    for(int i = 0; i < n; ++i) {
        int r = std::getchar();
        while(std::isspace(r))
            r = std::getchar();
        bool neg = false;
        if('-' == r) {
            neg = true;
            r = std::getchar();
        }
        r -= '0';
        for(;;) {
            int s = std::getchar();
            if(!std::isdigit(s))
                break;
            r = r * 10 + (s - '0');
        }
        a[i] = neg ? -r : r;
    }
}

Yet another is to map the file into memory and parse it faster:

#include <cctype>
#include <boost/iostreams/device/mapped_file.hpp>

inline int find_and_parse_int(char const*& begin, char const* end) {
    while(begin != end && std::isspace(*begin))
        ++begin;
    if(begin == end)
        throw;
    bool neg = *begin == '-';
    begin += neg;
    int r = 0;
    do {
        unsigned c = *begin - '0';
        if(c >= 10)
            break;
        r = r * 10 + static_cast<int>(c);
    } while(++begin != end);
    return neg ? -r : r;
}

int main() {
    boost::iostreams::mapped_file f("random-1600000.txt", boost::iostreams::mapped_file::readonly);
    char const* begin = f.const_data();
    char const* end = begin + f.size();
    int n = find_and_parse_int(begin, end);
    int m = find_and_parse_int(begin, end);

    int a[1600000];
    for(int i = 0; i < n; ++i)
        a[i] = find_and_parse_int(begin, end);
}

Benchmark source code.

Note that the results may differ considerably across different versions of compilers and standard libraries:

  • CentOS release 6.10, g++-6.3.0, Intel Core i7-4790 CPU @ 3.60GHz
---- Best times ----
seconds, percent, method
0.167985515, 100.0, getchar
0.147258495, 87.7, scanf
0.137161991, 81.7, iostream
0.118859546, 70.8, scanf-multi
0.034033769, 20.3, mmap-parse-faster
  • Ubuntu 18.04.2 LTS, g++-8.2.0, Intel Core i7-7700K CPU @ 4.20GHz
---- Best times ----
seconds, percent, method
0.133155952, 100.0, iostream
0.102128208, 76.7, scanf
0.082469185, 61.9, scanf-multi
0.048661004, 36.5, getchar
0.025320109, 19.0, mmap-parse-faster

What's the fastest way to convert hex to integer in C++?

Proposed solutions that run faster than the OP's if-else:

  • Unordered Map Lookup Table

Provided that your input strings are always hex numbers you could define a lookup table as an unordered_map:

#include <unordered_map>

std::unordered_map<char, int> table {
    {'0', 0}, {'1', 1}, {'2', 2},
    {'3', 3}, {'4', 4}, {'5', 5},
    {'6', 6}, {'7', 7}, {'8', 8},
    {'9', 9}, {'a', 10}, {'A', 10},
    {'b', 11}, {'B', 11}, {'c', 12},
    {'C', 12}, {'d', 13}, {'D', 13},
    {'e', 14}, {'E', 14}, {'f', 15},
    {'F', 15}, {'x', 0}, {'X', 0}};

int hextoint(char number) {
    return table[number];   // the map is keyed by char, so no cast is needed
}
  • Lookup Table as user constexpr literal (C++14)

Or if you want something faster than an unordered_map, you can use the new C++14 facilities for user-defined literal types and define your table as a literal type built at compile time:

#include <cstddef>

struct Table {
    long long tab[128];
    constexpr Table() : tab {} {
        tab['1'] = 1;
        tab['2'] = 2;
        tab['3'] = 3;
        tab['4'] = 4;
        tab['5'] = 5;
        tab['6'] = 6;
        tab['7'] = 7;
        tab['8'] = 8;
        tab['9'] = 9;
        tab['a'] = 10;
        tab['A'] = 10;
        tab['b'] = 11;
        tab['B'] = 11;
        tab['c'] = 12;
        tab['C'] = 12;
        tab['d'] = 13;
        tab['D'] = 13;
        tab['e'] = 14;
        tab['E'] = 14;
        tab['f'] = 15;
        tab['F'] = 15;
    }
    constexpr long long operator[](char const idx) const { return tab[(std::size_t)idx]; }
} constexpr table;

constexpr int hextoint(char number) {
    return table[number];
}


Benchmarks:

I ran benchmarks with the code written by Nikos Athanasiou that was posted recently on isocpp.org as a proposed method for C++ micro-benchmarking.

The algorithms that were compared are:

1. OP's original if-else:

long long hextoint3(char number) {
    if(number == '0') return 0;
    if(number == '1') return 1;
    if(number == '2') return 2;
    if(number == '3') return 3;
    if(number == '4') return 4;
    if(number == '5') return 5;
    if(number == '6') return 6;
    if(number == '7') return 7;
    if(number == '8') return 8;
    if(number == '9') return 9;
    if(number == 'a' || number == 'A') return 10;
    if(number == 'b' || number == 'B') return 11;
    if(number == 'c' || number == 'C') return 12;
    if(number == 'd' || number == 'D') return 13;
    if(number == 'e' || number == 'E') return 14;
    if(number == 'f' || number == 'F') return 15;
    return 0;
}

2. Compact if-else, proposed by Christophe:

long long hextoint(char number) {
    if (number >= '0' && number <= '9') return number - '0';
    else if (number >= 'a' && number <= 'f') return number - 'a' + 0x0a;
    else if (number >= 'A' && number <= 'F') return number - 'A' + 0x0a;
    else return 0;
}

3. Corrected ternary-operator version that also handles capital-letter inputs, proposed by g24l:

long long hextoint(char in) {
    int const x = in;
    return (x <= 57) ? x - 48 : (x <= 70) ? (x - 65) + 0x0a : (x - 97) + 0x0a;
}

4. Lookup Table (unordered_map):

long long hextoint(char number) {
    return table[(std::size_t)number];
}

where table is the unordered map shown previously.

5. Lookup Table (user constexpr literal):

long long hextoint(char number) {
    return table[(std::size_t)number];
}

Where table is user defined literal as shown above.

Experimental Settings

I defined a function that transforms an input hex string to an integer:

long long hexstrtoint(std::string const &str, long long (*f)(char)) {
    long long ret = 0;
    for(int j(1), i(str.size() - 1); i >= 0; --i, j *= 16) {
        ret += j * f(str[i]);
    }
    return ret;
}

I also defined a function that populates a vector of strings with random hex strings:

std::vector<std::string>
populate_vec(int const N) {
    random_device rd;
    mt19937 eng{ rd() };
    uniform_int_distribution<long long> distr(0, std::numeric_limits<long long>::max() - 1);
    std::vector<std::string> out(N);
    for(int i(0); i < N; ++i) {
        out[i] = int_to_hex(distr(eng));
    }
    return out;
}

I created vectors populated with 50000, 100000, 150000, 200000 and 250000 random hex strings respectively. Then for each algorithm I ran 100 experiments and averaged the time results.

Compiler was GCC version 5.2 with optimization option -O3.

Results:

(Benchmark result plots for each input size; images not reproduced here.)

Discussion

From the results we can conclude that, for these experimental settings, the proposed table method outperforms all the others. The if-else method is by far the worst, whereas the unordered_map, although it beats the if-else method, is still significantly slower than the other proposed methods.


Edit:

Results for method proposed by stgatilov, with bitwise operations:

long long hextoint(char x) {
    int b = uint8_t(x);
    int maskLetter = ('9' - b) >> 31;
    int maskSmall = ('Z' - b) >> 31;
    int offset = '0' + (maskLetter & int('A' - '0' - 10)) + (maskSmall & int('a' - 'A'));
    return b - offset;
}

(Benchmark plot not reproduced here.)

Edit:

I also tested the original code from g24l against the table method:

long long hextoint(char in) {
    long long const x = in;
    return x < 58 ? x - 48 : x - 87;
}

Note that this method doesn't handle capital letters A, B, C, D, E and F.

Results:

(Benchmark plot not reproduced here.)

Still, the table method is faster.


