How to Convert a Value from Host Byte Order to Little Endian

How do I convert a value from host byte order to little endian?

Something like the following:

unsigned short swaps(unsigned short val)
{
    return ((val & 0xff) << 8) | ((val & 0xff00) >> 8);
}

/* host to little endian */

/* hard-coded here for illustration; normally set by the build configuration */
#define PLATFORM_IS_BIG_ENDIAN 1

#if PLATFORM_IS_LITTLE_ENDIAN
unsigned short htoles(unsigned short val)
{
    /* no-op on a little endian platform */
    return val;
}
#elif PLATFORM_IS_BIG_ENDIAN
unsigned short htoles(unsigned short val)
{
    /* need to swap bytes on a big endian platform */
    return swaps(val);
}
#else
#include <arpa/inet.h> /* htons() */
unsigned short htoles(unsigned short val)
{
    /* the platform hasn't been properly configured for the */
    /* preprocessor to know if it's little or big endian */

    /* use potentially less-performant, but always works option: */
    /* htons() yields big endian, and swapping that gives little endian */

    return swaps(htons(val));
}
#endif

If you have a system that's properly configured (such that the preprocessor knows whether the target is little or big endian), you get an 'optimized' version of htoles(). Otherwise you get the potentially non-optimized version that depends on htons(). In any case, you get something that works.

Nothing too tricky and more or less portable.

Of course, you can further improve the optimization possibilities by implementing this with inline or as macros as you see fit.
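As a rough sketch of the inline variant, the following assumes a compiler that predefines __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__ and __ORDER_BIG_ENDIAN__ (GCC and Clang do). The names swap16 and htoles_inline are made up for this illustration, and the last branch is a fallback for compilers that predefine none of these macros:

```c
#include <stdint.h>
#include <string.h>

/* Swap the two bytes of a 16-bit value. */
static inline uint16_t swap16(uint16_t val)
{
    return (uint16_t)((val << 8) | (val >> 8));
}

/* Host to little endian, 16 bits (hypothetical name). */
static inline uint16_t htoles_inline(uint16_t val)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return val;                 /* already little endian */
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return swap16(val);         /* big endian host: swap the bytes */
#else
    /* Unknown byte order: lay the bytes out low byte first and copy
       them back into an integer, which works on any host. */
    uint8_t bytes[2];
    uint16_t out;
    bytes[0] = (uint8_t)(val & 0xff);
    bytes[1] = (uint8_t)(val >> 8);
    memcpy(&out, bytes, sizeof out);
    return out;
#endif
}
```

Whichever branch is compiled, the in-memory representation of the returned value has the low byte first.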

You might want to look at something like the "Portable Open Source Harness (POSH)" for an actual implementation that defines the endianness for various compilers. Note that getting to the library requires going through a pseudo-authentication page (though you don't need to register or give any personal details): http://hookatooka.com/poshlib/

Is there an architecture-independent method to create a little-endian byte stream from a value in C?

You can always serialize a uint64_t value to an array of uint8_t in little-endian order as simply

uint64_t source = ...;
uint8_t target[8];

target[0] = source;
target[1] = source >> 8;
target[2] = source >> 16;
target[3] = source >> 24;
target[4] = source >> 32;
target[5] = source >> 40;
target[6] = source >> 48;
target[7] = source >> 56;

or

for (size_t i = 0; i < sizeof (uint64_t); i++) {
    target[i] = source >> (i * 8);
}

and this will work anywhere uint64_t and uint8_t exist.

Notice that this assumes that the source value is unsigned. Bit-shifting negative signed values will cause all sorts of headaches and you just don't want to do that.


Deserialization is a bit more complex if you read one byte at a time, in order:

uint8_t source[8] = ...;
uint64_t target = 0;

for (size_t i = 0; i < sizeof (uint64_t); i++) {
    target |= (uint64_t)source[i] << (i * 8);
}

The cast to (uint64_t) is absolutely necessary, because the operands of << undergo integer promotion, so a uint8_t would be converted to a signed int. "Funny" things happen when you shift a set bit into the sign bit of a signed int, and shifting by the width of int or more is undefined behavior.
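Put together, the two loops above can be wrapped into a pair of functions. This is only a minimal sketch; the names store_le64 and load_le64 are invented here and do not come from any library:

```c
#include <stddef.h>
#include <stdint.h>

/* Write source into target[0..7], least significant byte first. */
void store_le64(uint64_t source, uint8_t *target)
{
    for (size_t i = 0; i < sizeof source; i++) {
        target[i] = (uint8_t)(source >> (i * 8));
    }
}

/* Rebuild the number from 8 bytes stored least significant byte first. */
uint64_t load_le64(const uint8_t *source)
{
    uint64_t target = 0;
    for (size_t i = 0; i < sizeof target; i++) {
        /* cast first, so the shift happens in 64-bit unsigned arithmetic */
        target |= (uint64_t)source[i] << (i * 8);
    }
    return target;
}
```

Storing a value and loading it back yields the original number regardless of the host's byte order, because only shifts on the numeric value are involved.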


If you write this into a function

#include <inttypes.h>

void serialize(uint64_t source, uint8_t *target) {
    target[0] = source;
    target[1] = source >> 8;
    target[2] = source >> 16;
    target[3] = source >> 24;
    target[4] = source >> 32;
    target[5] = source >> 40;
    target[6] = source >> 48;
    target[7] = source >> 56;
}

and compile for x86-64 using GCC 11 and -O3, the function will be compiled to

serialize:
movq %rdi, (%rsi)
ret

which just moves the 64-bit value of source into target array as is. If you reverse the indices (7 ... 0; big-endian), GCC will be clever enough to recognize that too and will compile it (with -O3) to

serialize:
bswap %rdi
movq %rdi, (%rsi)
ret
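For reference, the reversed-index (big-endian) variant mentioned above would look roughly like this. The name serialize_be is made up for the example; whether a given compiler reduces it to a single byte swap depends on the compiler and flags:

```c
#include <stdint.h>

/* Write source into target[0..7], most significant byte first. */
void serialize_be(uint64_t source, uint8_t *target)
{
    target[7] = (uint8_t)source;
    target[6] = (uint8_t)(source >> 8);
    target[5] = (uint8_t)(source >> 16);
    target[4] = (uint8_t)(source >> 24);
    target[3] = (uint8_t)(source >> 32);
    target[2] = (uint8_t)(source >> 40);
    target[1] = (uint8_t)(source >> 48);
    target[0] = (uint8_t)(source >> 56);
}
```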

Converting network byte order (big endian) to little endian

The documentation specifically says that ntohl's netlong parameter is a 32-bit value:

netlong [in]

A 32-bit number in TCP/IP network byte order.


I have read that long and int are not the same i.e. long is not
guaranteed to be 32 bits or the same size of an integer INT_MAX.

You're right -- in Standard C++ a long is not guaranteed to be any particular size, except that it must be at least 32 bits.

However, since we're talking about the endian conversion functions, we're talking about platform specifics here. We need to drill down into what a long is under Windows, and under Windows a long is 32 bits:

LONG

A 32-bit signed integer. The range is –2147483648 through 2147483647
decimal. This type is declared in WinNT.h as follows:

typedef long LONG;
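If you would rather not depend on the width of long at all, you can keep the value in a fixed-width type. The sketch below assumes a POSIX system, where ntohl() is declared in arpa/inet.h and takes and returns uint32_t; the helper name read_net32 is invented for this example:

```c
#include <arpa/inet.h>  /* ntohl(); on Windows the declaration lives in winsock2.h */
#include <stdint.h>
#include <string.h>

/* Read a 32-bit field in network byte order (big endian) from a raw
   buffer and return it as a number in host byte order. */
uint32_t read_net32(const unsigned char *buf)
{
    uint32_t net;
    memcpy(&net, buf, sizeof net);  /* copy instead of casting the pointer */
    return ntohl(net);
}
```

On a little-endian host the returned value is stored in little-endian order, which is the conversion asked about.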

Convert String in Host byte order (Little endian) to Network byte order (Big endian)

You can use pack / unpack functions to convert endianness:

/**
 * Convert $endian hex string to specified $format
 *
 * @param string $endian Endian HEX string
 * @param string $format Endian format: 'N' - big endian, 'V' - little endian
 *
 * @return string
 */
function formatEndian($endian, $format = 'N') {
    $endian = intval($endian, 16);      // parse the hex string into an integer
    $endian = pack('L', $endian);       // pack the integer into a binary string (unsigned long, machine byte order)
    $endian = unpack($format, $endian); // read the binary string back in the specified endian format

    return sprintf("%08x", $endian[1]); // return the result as a hex string (zero-padded to 8 digits)
}

$endian = '18000000';
$big = formatEndian($endian, 'N'); // string "00000018"
$little = formatEndian($endian, 'V'); // string "18000000"

To learn more about pack format take a look at http://www.php.net/manual/en/function.pack.php
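For comparison, the same reversal can be done on the hex string directly, without going through the machine's byte order at all. The following C sketch is an illustration only, not a translation of the PHP function above; reverse_hex_bytes is an invented name, and it assumes the string holds an even number of hex digits:

```c
#include <stddef.h>
#include <string.h>

/* Reverse the order of the two-digit byte groups of a hex string in place.
   Returns 0 on success, -1 if the string has an odd number of digits. */
int reverse_hex_bytes(char *hex)
{
    size_t len = strlen(hex);
    if (len % 2 != 0) {
        return -1;
    }
    if (len < 4) {
        return 0;   /* zero or one byte: nothing to reverse */
    }
    /* i walks forward and j backward, one byte (two digits) at a time */
    for (size_t i = 0, j = len - 2; i < j; i += 2, j -= 2) {
        char hi = hex[i];
        char lo = hex[i + 1];
        hex[i] = hex[j];
        hex[i + 1] = hex[j + 1];
        hex[j] = hi;
        hex[j + 1] = lo;
    }
    return 0;
}
```

Applied to "18000000" this yields "00000018", and applying it a second time restores the original string.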

How do you write (portably) reverse network byte order?

Warning: This only works on unsigned integers, because right-shifting a negative signed value is implementation-defined and can lead to vulnerabilities (https://stackoverflow.com/a/7522498/395029).

C already provides an abstraction over the host's endianness: the number† or int†.

Producing output in a given endianness can be done portably by not trying to be clever: simply interpret the numbers as numbers and use bit shifts to extract each byte:

uint32_t value;
uint8_t lolo = (value >> 0) & 0xFF;
uint8_t lohi = (value >> 8) & 0xFF;
uint8_t hilo = (value >> 16) & 0xFF;
uint8_t hihi = (value >> 24) & 0xFF;

Then you just write the bytes in whatever order you desire.

When you are taking byte sequences with some endianness as input, you can reconstruct them in the host's endianness by again constructing numbers with bit operations:

uint32_t value = ((uint32_t)hihi << 24)
               | ((uint32_t)hilo << 16)
               | ((uint32_t)lohi << 8)
               | ((uint32_t)lolo << 0);

† Only the representations of numbers as byte sequences have endianness; numbers (i.e. quantities) don't.
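To make the idea concrete, here is a small sketch that turns a 4-byte field in network byte order into a little-endian one by going through the number. The helper names get_be32, put_le32 and net_to_le32 are invented for this example:

```c
#include <stdint.h>

/* Build the number from 4 bytes stored most significant byte first. */
uint32_t get_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24)
         | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8)
         | ((uint32_t)in[3] << 0);
}

/* Write the number as 4 bytes, least significant byte first. */
void put_le32(uint8_t out[4], uint32_t value)
{
    out[0] = (uint8_t)((value >> 0) & 0xFF);
    out[1] = (uint8_t)((value >> 8) & 0xFF);
    out[2] = (uint8_t)((value >> 16) & 0xFF);
    out[3] = (uint8_t)((value >> 24) & 0xFF);
}

/* Network byte order in, little-endian byte order out. */
void net_to_le32(const uint8_t net[4], uint8_t le[4])
{
    put_le32(le, get_be32(net));
}
```

Nothing here inspects the host's byte order, so the code behaves identically on little- and big-endian machines.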

How do I convert between big-endian and little-endian values in C++?

If you're using Visual C++, include intrin.h and call the following functions:

For 16 bit numbers:

unsigned short _byteswap_ushort(unsigned short value);

For 32 bit numbers:

unsigned long _byteswap_ulong(unsigned long value);

For 64 bit numbers:

unsigned __int64 _byteswap_uint64(unsigned __int64 value);

8 bit numbers (chars) don't need to be converted.

Although these are only defined for unsigned values, they work for signed integers as well.

For floats and doubles it's more difficult than with plain integers, as these may or may not be in the host machine's byte order. You can get little-endian floats on big-endian machines and vice versa.
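One common way to handle a float is to copy its bits into an integer of the same size and then treat that integer like any other. The sketch below assumes a 32-bit IEEE 754 float whose byte order matches the integer byte order of the host, which, as noted above, is not guaranteed everywhere. The names put_le_float and get_le_float are made up for this example:

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes a 32-bit float");

/* Write the bit pattern of f as 4 bytes, least significant byte first. */
void put_le_float(uint8_t out[4], float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret the bits without pointer casts */
    out[0] = (uint8_t)(bits >> 0);
    out[1] = (uint8_t)(bits >> 8);
    out[2] = (uint8_t)(bits >> 16);
    out[3] = (uint8_t)(bits >> 24);
}

/* Rebuild the float from 4 bytes stored least significant byte first. */
float get_le_float(const uint8_t in[4])
{
    uint32_t bits = ((uint32_t)in[0] << 0)
                  | ((uint32_t)in[1] << 8)
                  | ((uint32_t)in[2] << 16)
                  | ((uint32_t)in[3] << 24);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```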

Other compilers have similar intrinsics as well.

In GCC, for example, you can directly call some builtins, as documented in the GCC manual:

uint32_t __builtin_bswap32 (uint32_t x)
uint64_t __builtin_bswap64 (uint64_t x)

(There is no need to include anything.) As far as I know, glibc's byteswap.h declares equivalent functions (bswap_16, bswap_32, bswap_64) in a less GCC-centric way as well.

A 16-bit swap is just a bit rotate by 8.

Calling the intrinsics instead of rolling your own gives you the best performance and code density, by the way.
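A thin wrapper can pick the right intrinsic per compiler and fall back to plain shifts elsewhere. This is a sketch under a few assumptions: the wrapper names bswap32_portable and bswap16_portable are invented, _MSC_VER and __GNUC__ are used for compiler detection (Clang defines __GNUC__ too), and stdlib.h is included for the Visual C++ branch because that is the header Microsoft's documentation lists for _byteswap_ulong:

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <stdlib.h>   /* _byteswap_ulong */
#endif

/* Reverse the byte order of a 32-bit value. */
static inline uint32_t bswap32_portable(uint32_t x)
{
#if defined(_MSC_VER)
    return (uint32_t)_byteswap_ulong(x);
#elif defined(__GNUC__)
    return __builtin_bswap32(x);
#else
    /* generic fallback: move each byte to its mirrored position */
    return ((x & 0x000000FFu) << 24)
         | ((x & 0x0000FF00u) << 8)
         | ((x & 0x00FF0000u) >> 8)
         | ((x & 0xFF000000u) >> 24);
#endif
}

/* For 16 bits, swapping the bytes is the same as rotating by 8 bits. */
static inline uint16_t bswap16_portable(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}
```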

inet_netof() and returned host-byte order

0x00C0A832 is, as you said, 0.192.168.50.

It's in host byte order, and the correct value.

The phrasing in the documentation is a bit vague ("returns the network number part of the Internet Address in"), but if I check an implementation, there's a right shift, so everything seems to be fine.
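You can check this yourself with a few lines of C. The sketch assumes a glibc-style system where inet_netof() is available as a BSD extension; the wrapper name network_number and the host part of the sample address are made up for the example:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* glibc: expose the BSD function inet_netof() */
#endif
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/* Return the classful network number of a dotted-quad address in host
   byte order, or 0 if the string cannot be parsed. */
uint32_t network_number(const char *dotted_quad)
{
    struct in_addr addr;
    if (inet_pton(AF_INET, dotted_quad, &addr) != 1) {
        return 0;
    }
    return (uint32_t)inet_netof(addr);
}
```

For an address in 192.168.50.0/24, such as 192.168.50.7, this gives the 0x00C0A832 discussed above.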


