How to Change Endianness Mid-Execution on Arm (Android/Linux)

Is it possible to change endianness mid-execution on ARM (Android/Linux)?

It is (on ARMv7 and older hardware at least) certainly possible, but your sentiment is entirely correct - anyone else, please, please, for the sake of sanity, forward-compatibility and angry kernel developers, don't do this in Linux/Android - use REV, REV16, REVSH or VREV on data as appropriate.

The SETEND instruction, introduced in ARMv6, allows switching the endianness of the current execution state at any privilege level. From ARMv8, however, it is deprecated, disabled by default, and likely to disappear entirely in future. Supporting mixed endianness in hardware is optional in ARMv8.

Despite being a terrible idea, it's apparently commonplace enough in Android apps currently in the wild (among possible other uses, it's supposedly the fastest way to implement strcmp() on ARM11, and maybe also Cortex-A8) that SETEND emulation for 32-bit tasks has recently had to be added to the arm64 kernel, so chances are your tools should at least be aware of it, too.
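As a minimal sketch of the "swap the data, not the CPU state" approach recommended above, the helper below reads a big-endian 32-bit value from a byte buffer using shifts only. The name load_be32 is hypothetical. Compilers targeting ARMv6 and later can often reduce this pattern to a load plus REV, but that depends on the compiler and its flags.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit big-endian value from memory.
 * Works regardless of host byte order and of pointer alignment. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```

Because the byte order is spelled out in the shifts, the same source gives the same result on little-endian and big-endian hosts, with no need for SETEND.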

How do I convert between big-endian and little-endian values in C++?

If you're using Visual C++, include intrin.h and call the following functions:

For 16 bit numbers:

unsigned short _byteswap_ushort(unsigned short value);

For 32 bit numbers:

unsigned long _byteswap_ulong(unsigned long value);

For 64 bit numbers:

unsigned __int64 _byteswap_uint64(unsigned __int64 value);

8 bit numbers (chars) don't need to be converted.

Although these are only defined for unsigned values, they work for signed integers as well.

Floats and doubles are more difficult than plain integers, as they may or may not be in the host machine's byte order. You can get little-endian floats on big-endian machines and vice versa.
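One way to sidestep the problem is to decode the float from its bytes explicitly. The sketch below is not from the answer; it assumes a 32-bit IEEE 754 float and a host that stores floats in the same byte order as its integers. The name float_from_be_bytes is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode a big-endian IEEE 754 single from a buffer.
 * Assumes sizeof(float) == 4 and that float and uint32_t share the host
 * byte order. memcpy avoids casting between float* and uint32_t*. */
static float float_from_be_bytes(const unsigned char *p)
{
    uint32_t bits = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                    ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Assembling the bits as an integer first means the swapped pattern never passes through a floating-point register as an invalid or NaN value.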

Other compilers have similar intrinsics as well.

In GCC, for example, you can call these builtins directly:

uint32_t __builtin_bswap32 (uint32_t x)
uint64_t __builtin_bswap64 (uint64_t x)

(no header needs to be included). As far as I know, glibc's byteswap.h declares equivalent functions in a less GCC-centric way as well.

A 16-bit swap is just a rotate by 8 bits.

Calling the intrinsics instead of rolling your own gives you the best performance and code density, by the way.
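A minimal sketch of how the intrinsics above might be wrapped behind one portable function, with a plain shift-based fallback. The names swap16 and swap32 are hypothetical, and the compiler detection macros are an assumption about your build setup.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <stdlib.h>   /* _byteswap_ulong is also declared here */
#endif

/* A 16-bit byte swap is a rotate by 8 bits. */
static uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

static uint32_t swap32(uint32_t x)
{
#if defined(_MSC_VER)
    return _byteswap_ulong(x);
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(x);
#else
    /* Portable fallback for compilers without a byte-swap intrinsic. */
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
#endif
}
```

For example, swap32(0x12345678) yields 0x78563412 and swap16(0x1234) yields 0x3412 on any host.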

Data conversion for ARM platform (from x86/x64)

Endianness only matters for register <-> memory operations.

In a register there is no endianness. If you put

int nIntVal = 0x12345678;

in your code, it will have the same effect on a machine of any endianness.

All IEEE 754 formats (float, double) have the same bit layout on every architecture, and x86/x64 and modern little-endian ARM store them in the same byte order, so this does not matter here.

You only have to care about endianness in two cases:

a) You write integers to files that have to be transferable between the two architectures.
Solution: Use the hton*, ntoh* family of converters, use a non-binary file format (e.g. XML) or a standardised file format (e.g. SQLite).

b) You cast integer pointers.

int a = 0x12345678;
char b = a;             /* low-order byte of the value */
char c = *(char *)&a;   /* first byte in memory */
if (b == c) {
    // You are working on little endian
}

The latter code, by the way, is a handy way of testing your endianness at runtime.
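The same runtime test can be wrapped in a function. This is a sketch with a hypothetical name, is_little_endian; it uses unsigned char so that the comparison does not depend on whether plain char is signed.

```c
/* Returns 1 on a little-endian host, 0 otherwise.
 * Inspects the first byte in memory of an integer whose value is 1. */
static int is_little_endian(void)
{
    const unsigned int probe = 1u;
    return *(const unsigned char *)&probe == 1u;
}
```

On a little-endian machine the low-order byte comes first in memory, so the first byte of the probe is 1; on a big-endian machine it is 0.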

Arrays and the like: if you use the write or fwrite families of calls to transfer them, you will have no problems unless they contain integers; in that case, see above.

int64_t: see above. You only need to care if you have to store them in binary files or cast pointers.
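For case (a), a minimal sketch of using htonl and ntohl to keep a 32-bit integer in a fixed (big-endian) order inside a file or buffer. It assumes a POSIX system, where these functions come from arpa/inet.h; put_u32 and get_u32 are hypothetical names.

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Store a host-order value into a buffer in network (big-endian) order. */
static void put_u32(unsigned char *out, uint32_t host_value)
{
    uint32_t wire = htonl(host_value);
    memcpy(out, &wire, sizeof wire);
}

/* Read a network-order value back into host order. */
static uint32_t get_u32(const unsigned char *in)
{
    uint32_t wire;
    memcpy(&wire, in, sizeof wire);
    return ntohl(wire);
}
```

A buffer filled by put_u32 can be written with fwrite on x86 and read back with get_u32 on ARM (or the other way round) without either side knowing the other's byte order.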

Macro definition to determine big endian or little endian machine?

Code supporting arbitrary byte orders, ready to be put into a file called order32.h:

#ifndef ORDER32_H
#define ORDER32_H

#include <limits.h>
#include <stdint.h>

#if CHAR_BIT != 8
#error "unsupported char size"
#endif

enum
{
    O32_LITTLE_ENDIAN = 0x03020100ul,
    O32_BIG_ENDIAN = 0x00010203ul,
    O32_PDP_ENDIAN = 0x01000302ul,      /* DEC PDP-11 (aka ENDIAN_LITTLE_WORD) */
    O32_HONEYWELL_ENDIAN = 0x02030001ul /* Honeywell 316 (aka ENDIAN_BIG_WORD) */
};

static const union { unsigned char bytes[4]; uint32_t value; } o32_host_order =
    { { 0, 1, 2, 3 } };

#define O32_HOST_ORDER (o32_host_order.value)

#endif

You would check for little endian systems via

O32_HOST_ORDER == O32_LITTLE_ENDIAN
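A self-contained sketch of how such a check might be used at runtime. It repeats the union trick rather than including order32.h, and host_order_name and host_order_probe are hypothetical names. Note that the value is read from an object, so the comparison is evaluated by the compiler or at runtime and cannot be used in an #if directive; for preprocessor checks, GCC and Clang predefine __BYTE_ORDER__.

```c
#include <stdint.h>

/* Same idea as o32_host_order: bytes 0..3 viewed as one 32-bit value. */
static const union { unsigned char bytes[4]; uint32_t value; }
    host_order_probe = { { 0, 1, 2, 3 } };

/* Map the probe value to a human-readable byte-order name. */
static const char *host_order_name(void)
{
    switch (host_order_probe.value) {
    case 0x03020100ul: return "little";
    case 0x00010203ul: return "big";
    case 0x01000302ul: return "pdp";
    case 0x02030001ul: return "honeywell";
    default:           return "unknown";
    }
}
```

On x86 and on a default BeagleBone Black configuration this would return "little".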

Compute clock cycle count on ARM Cortex-a8 BeagleBone Black

As there can be many obstacles along the way, below is a complete guide to building that kernel module and user-space application.

Toolchain

First of all, you need to download and install two toolchains:

  1. Toolchain for building kernel (and kernel modules): bare-metal (EABI) toolchain
  2. Toolchain for building user-space application: GNU/Linux toolchain

I recommend using the Linaro ARM toolchains, as they are free, reliable and well optimized for ARM. Choose the desired toolchains in the "Linaro Toolchain" section. The BeagleBone Black is little-endian by default (like most ARMv7 systems), so download the following two archives:

  1. linaro-toolchain-binaries (little-endian) Bare Metal
  2. linaro-toolchain-binaries (little-endian) Linux

Once downloaded, extract those archives into the /opt directory.

Kernel sources

First of all, you need to find out exactly which kernel sources were used to build the kernel that is flashed to your board. You can try to figure that out from your board revision. Alternatively, build your own kernel and flash it to your board, so that you know exactly which kernel version is in use.

Either way, you need to download the kernel sources that correspond to the kernel on your board. Those sources will be used later to build the kernel module. If the kernel version does not match, you will get a "magic mismatch" error or something similar when loading the module.

I will use stable kernel sources from kernel.org just for reference (they should be sufficient at least to build the module).

Build kernel

Run the following commands in your terminal to configure the shell environment (bare-metal toolchain) for building the kernel:

$ export PATH=/opt/gcc-linaro-5.1-2015.08-x86_64_arm-eabi/bin:$PATH
$ export CROSS_COMPILE=arm-eabi-
$ export ARCH=arm

Configure the kernel using the defconfig for your board (from arch/arm/configs/). I will use omap2plus_defconfig as an example:

$ make omap2plus_defconfig

Now either build the whole kernel:

$ make -j4

or prepare only the kernel files needed for building an external module:

$ make prepare
$ make modules_prepare

In the second case the module will not have a dependency list, and you will probably need to use the "force" option when loading it. So the preferred option is to build the whole kernel.

Kernel module

NOTE: the code used below is taken from another answer.

First you need to enable user-space access to the ARM performance counter. This can only be done in kernel space. Here are the module code and Makefile you can use to do so:

perfcnt_enable.c:

#include <linux/module.h>

static int __init perfcnt_enable_init(void)
{
    /* Enable user-mode access to the performance counter */
    asm ("mcr p15, 0, %0, C9, C14, 0\n\t" :: "r"(1));

    /* Disable counter overflow interrupts (just in case) */
    asm ("mcr p15, 0, %0, C9, C14, 2\n\t" :: "r"(0x8000000f));

    pr_debug("### perfcnt_enable module is loaded\n");
    return 0;
}

static void __exit perfcnt_enable_exit(void)
{
}

module_init(perfcnt_enable_init);
module_exit(perfcnt_enable_exit);

MODULE_AUTHOR("Sam Protsenko");
MODULE_DESCRIPTION("Module for enabling performance counter on ARMv7");
MODULE_LICENSE("GPL");

Makefile:

ifneq ($(KERNELRELEASE),)

# kbuild part of makefile

CFLAGS_perfcnt_enable.o := -DDEBUG
obj-m := perfcnt_enable.o

else

# normal makefile

KDIR ?= /lib/modules/$(shell uname -r)/build

module:
	$(MAKE) -C $(KDIR) M=$(PWD) modules

clean:
	$(MAKE) -C $(KDIR) M=$(PWD) clean

.PHONY: module clean

endif

Build kernel module

Using the shell environment configured in the previous step, export one more environment variable:

$ export KDIR=/path/to/your/kernel/sources/dir

Now just run:

$ make

The module is built (perfcnt_enable.ko file).

User-space application

Once the ARM performance counter is enabled in kernel space (by the kernel module), you can read its value from a user-space application. Here is an example of such an application.

perfcnt_test.c:

#include <stdint.h>   /* int32_t */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static unsigned int get_cyclecount(void)
{
    unsigned int value;

    /* Read CCNT Register */
    asm volatile ("mrc p15, 0, %0, c9, c13, 0\t\n": "=r"(value));

    return value;
}

static void init_perfcounters(int32_t do_reset, int32_t enable_divider)
{
    /* In general enable all counters (including cycle counter) */
    int32_t value = 1;

    /* Perform reset */
    if (do_reset) {
        value |= 2; /* reset all counters to zero */
        value |= 4; /* reset cycle counter to zero */
    }

    if (enable_divider)
        value |= 8; /* enable "by 64" divider for CCNT */

    value |= 16; /* enable export of events (PMCR bit 4) */

    /* Program the performance-counter control-register */
    asm volatile ("mcr p15, 0, %0, c9, c12, 0\t\n" :: "r"(value));

    /* Enable all counters */
    asm volatile ("mcr p15, 0, %0, c9, c12, 1\t\n" :: "r"(0x8000000f));

    /* Clear overflows */
    asm volatile ("mcr p15, 0, %0, c9, c12, 3\t\n" :: "r"(0x8000000f));
}

int main(void)
{
    unsigned int overhead;
    unsigned int t;

    /* Init counters */
    init_perfcounters(1, 0);

    /* Measure the counting overhead */
    overhead = get_cyclecount();
    overhead = get_cyclecount() - overhead;

    /* Measure ticks for some operation */
    t = get_cyclecount();
    sleep(1);
    t = get_cyclecount() - t;

    /* The counts are unsigned, so print them with %u */
    printf("function took exactly %u cycles (including function call)\n",
           t - overhead);

    return EXIT_SUCCESS;
}

Makefile:

CC = gcc
APP = perfcnt_test
SOURCES = perfcnt_test.c
CFLAGS = -Wall -O2 -static

default:
	$(CROSS_COMPILE)$(CC) $(CFLAGS) $(SOURCES) -o $(APP)

clean:
	-rm -f $(APP)

.PHONY: default clean

Note that I added the -static option in case you are using Android or a similar system. If your distro has a regular libc, you can probably remove that flag to reduce the size of the resulting binary.
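The magic numbers in init_perfcounters and the subtraction in main can be made more explicit. The sketch below is plain C that runs on any host: it only computes the control value and the elapsed count, and does not touch the coprocessor. The PMCR_* names, pmcr_value and elapsed_cycles are hypothetical; the bit positions are the ones used in perfcnt_test.c above.

```c
#include <stdint.h>

/* Control-register bits as used by init_perfcounters() above. */
#define PMCR_E (1u << 0)  /* enable all counters */
#define PMCR_P (1u << 1)  /* reset event counters to zero */
#define PMCR_C (1u << 2)  /* reset cycle counter to zero */
#define PMCR_D (1u << 3)  /* CCNT counts once every 64 cycles */
#define PMCR_X (1u << 4)  /* export events */

/* Build the value that init_perfcounters() writes to the control register. */
static uint32_t pmcr_value(int do_reset, int enable_divider)
{
    uint32_t value = PMCR_E | PMCR_X;
    if (do_reset)
        value |= PMCR_P | PMCR_C;
    if (enable_divider)
        value |= PMCR_D;
    return value;
}

/* Elapsed CPU cycles between two CCNT reads. Unsigned subtraction stays
 * correct across a single 32-bit wraparound of the counter. */
static uint64_t elapsed_cycles(uint32_t start, uint32_t end, int divider_enabled)
{
    uint32_t ticks = end - start;
    return divider_enabled ? (uint64_t)ticks * 64u : ticks;
}
```

With the divider off, the 32-bit counter wraps after 2^32 cycles (a few seconds at typical Cortex-A8 clock rates), so longer measurements would need the divider or overflow handling.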

Build user-space application

Prepare shell environment (Linux toolchain):

$ export PATH=/opt/gcc-linaro-5.1-2015.08-x86_64_arm-linux-gnueabihf/bin:$PATH
$ export CROSS_COMPILE=arm-linux-gnueabihf-

Build the application:

$ make

The output binary is perfcnt_test.

Testing

  1. Upload both the kernel module and the user-space application to your board.
  2. Load the module:

    # insmod perfcnt_enable.ko
  3. Run the application:

    # ./perfcnt_test

