Using Hardware Performance Counters in Linux

You can use Perfctr or PAPI if you want to count hardware events for a specific part of a program from inside the program itself (without starting any third-party tool).

Perfctr quickstart: http://www.ale.csce.kyushu-u.ac.jp/~satoshi/how_to_use_perfctr.htm

PAPI homepage: http://icl.cs.utk.edu/papi/

PerfSuite good doc: http://perfsuite.ncsa.illinois.edu/publications/LJ135/x27.html

If you can measure the program externally, modern Linux provides the perf command.

perf wiki: https://perf.wiki.kernel.org/index.php/Main_Page

Best way to test a custom kernel for hardware performance counters

I found a solution: KVM with the QEMU emulator.

To use the PMU, I changed this parameter in the VM definition (XML format):

<cpu mode='host-passthrough'/>

Or you can add this option on the QEMU command line:

-cpu host

I partly followed this page for building the kernel on QEMU, and this page for the counters.

Find out how many hardware performance counters a CPU has

Regarding Intel processors, you can:

  • look in chapter 18 of this Intel documentation manual, although it is not easy to read.

  • use the cpuid instruction. This requires writing assembly code to set the parameters correctly and read the results.

  • download, compile, and install the PAPI library and run papi_avail | more. The result on my laptop is:

PAPI Version : 5.1.1.0
Vendor string and code : GenuineIntel (1)
Model string and code : Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz (42)
CPU Revision : 7.000000
CPUID Info : Family: 6 Model: 42 Stepping: 7
CPU Max Megahertz : 2494
CPU Min Megahertz : 2494
Hdw Threads per core : 2
Cores per Socket : 2
NUMA Nodes : 1
CPUs per Node : 4
Total CPUs : 4
Running in a VM : no
Number Hardware Counters : 11
Max Multiplex Counters : 64

Monitoring performance counters during execution of a specific function

I could only find the implementation of the toggle events feature in the /perf/core_toggle repo, which is maintained by the feature's developer. You can probably compile that code and experiment with the feature yourself. You can find examples on how to use it here. However, I don't think it has been accepted into the mainline Linux kernel for any version.

If you want to count one or more events, there are alternatives that are easy to use but require adding a few lines of code to your codebase. You can use the perf interface programmatically, or use third-party tools that offer such APIs, such as PAPI and LIKWID.
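Using the perf interface programmatically means calling the perf_event_open syscall yourself. Below is a minimal sketch of counting one event around a single function call, modeled on the example in the perf_event_open man page. The helper names perf_open, count_event_during and busy_loop are made up for this illustration, and the sketch counts user-space events only (exclude_kernel = 1), which usually needs fewer privileges.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open, so go through syscall(2). */
int perf_open(uint32_t type, uint64_t config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = type;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = 1;          /* start stopped, enable explicitly later */
    attr.exclude_kernel = 1;    /* count user-space events only */
    attr.exclude_hv = 1;
    /* pid = 0, cpu = -1: the calling thread, on whichever CPU it runs */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Hypothetical helper: count one event while fn(arg) runs.
 * Returns 0 and stores the count in *count, or -1 if the counter cannot
 * be opened or read (no PMU, restrictive perf_event_paranoid, sandbox). */
int count_event_during(uint32_t type, uint64_t config,
                       void (*fn)(void *), void *arg, uint64_t *count)
{
    assert(fn != NULL && count != NULL);
    int fd = perf_open(type, config);
    if (fd < 0)
        return -1;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn(arg);                    /* the code region being measured */
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    ssize_t n = read(fd, count, sizeof(*count));
    close(fd);
    return n == (ssize_t)sizeof(*count) ? 0 : -1;
}

/* Example workload (hypothetical): sums the first *arg integers. */
void busy_loop(void *arg)
{
    volatile uint64_t sum = 0;
    uint64_t n = *(uint64_t *)arg;
    for (uint64_t i = 0; i < n; i++)
        sum += i;
}
```

For example, count_event_during(PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS, busy_loop, &n, &count) counts the instructions retired by busy_loop. A return value of -1 often just means the environment does not allow hardware counters.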

system call hardware performance counters ubuntu

There is a small wrapper library, https://github.com/castl/easyperf, for perf_event_open in counting mode, the same mode used by perf stat (whose output you quoted).

You can set up hardware event counting (with in-kernel counting enabled: the PERFMON_EVENTSEL_OS flag in easyperf), read the current counter values with perf_read_all from the wrapper, run the function (syscall) you want to profile, and then read the counter values again. The difference between the old and new values is an estimate of the target function's cost. See this test, where the target function is foo:

https://github.com/castl/easyperf/blob/master/test.c

You can't measure calls that are too short, because reading hardware counters through perf_event_open takes several read syscalls. So either run several similar syscalls (a loop of 100 or 1000), or use syscalls that do more work, or measure the overhead of reading the counters (measure an empty 'foo' function to get the overhead, then measure your short target function, and compare the two).
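The loop-and-subtract idea can be sketched independently of how the counter is read. In the sketch below the counter reader is passed in as a function pointer, so it could be backed by easyperf, perf_event_open, or anything else. All names (measure_loop, estimate_cost_per_call, sim_read, sim_work) are hypothetical, and the simulated counter exists only to make the arithmetic visible: each read costs 7 events and each call of sim_work costs 5.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t (*read_counter_fn)(void);
typedef void (*target_fn)(void);

/* Baseline target: does nothing, so only the measurement overhead is counted. */
void empty_target(void) {}

/* Counter delta across `iters` calls of fn, including the read overhead. */
uint64_t measure_loop(read_counter_fn read_counter, target_fn fn, unsigned iters)
{
    uint64_t before = read_counter();
    for (unsigned i = 0; i < iters; i++)
        fn();
    return read_counter() - before;
}

/* Estimated events per call of fn: measure the same loop with an empty
 * target, subtract that baseline, then average over the iterations. */
double estimate_cost_per_call(read_counter_fn read_counter, target_fn fn,
                              unsigned iters)
{
    assert(iters > 0);
    uint64_t baseline = measure_loop(read_counter, empty_target, iters);
    uint64_t total = measure_loop(read_counter, fn, iters);
    if (total < baseline)
        return 0.0;             /* noise exceeded the signal */
    return (double)(total - baseline) / iters;
}

/* Simulated counter, for demonstration only. */
static uint64_t sim_events;
uint64_t sim_read(void) { sim_events += 7; return sim_events; }
void sim_work(void) { sim_events += 5; }
```

With the simulated counter, estimate_cost_per_call(sim_read, sim_work, 100) recovers the 5 events per call, because the 7-event read overhead appears in both measurements and cancels out. With a real counter the result is noisy, so larger iteration counts help.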

ARMv7 instructions to access performance counters directly from assembly language

There are some examples of using the PMU performance counters directly on ARM, for example:

armv7: http://neocontra.blogspot.com/2013/05/user-mode-performance-counters-for.html

armv8: http://zhiyisun.github.io/2016/03/02/How-to-Use-Performance-Monitor-Unit-(PMU)-of-64-bit-ARMv8-A-in-Linux.html

So the first thing is to create a kernel module to enable user-mode access to PMU counters. Below is the code to set PMU register PMUSERENR_EL0 to enable user-mode access.

/*Enable user-mode access to counters. */
asm volatile("msr pmuserenr_el0, %0" : : "r"((u64)ARMV8_PMUSERENR_EN_EL0|ARMV8_PMUSERENR_ER|ARMV8_PMUSERENR_CR));

/* Performance Monitors Count Enable Set register: here bits 30:0 (event counters) stay disabled and bit 31 (cycle counter) is enabled. Other event counters can also be enabled here. */
asm volatile("msr pmcntenset_el0, %0" : : "r" (ARMV8_PMCNTENSET_EL0_ENABLE));

/* Enable counters */
u64 val=0;
asm volatile("mrs %0, pmcr_el0" : "=r" (val));
asm volatile("msr pmcr_el0, %0" : : "r" (val|ARMV8_PMCR_E));
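The snippet above uses several ARMV8_* constants without showing their definitions. The sketch below gives values that match the register layouts in the Arm architecture documentation, together with a user-space read of the cycle counter that only works once a kernel module has run code like the above. The helpers pmuserenr_value, pmcntenset_mask and read_cycle_counter are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define ARMV8_PMCR_E                (1u << 0)   /* PMCR_EL0.E: enable counters */
#define ARMV8_PMUSERENR_EN_EL0      (1u << 0)   /* PMUSERENR_EL0.EN: EL0 access */
#define ARMV8_PMUSERENR_CR          (1u << 2)   /* EL0 may read the cycle counter */
#define ARMV8_PMUSERENR_ER          (1u << 3)   /* EL0 may read event counters */
#define ARMV8_PMCNTENSET_EL0_ENABLE (1u << 31)  /* PMCNTENSET_EL0.C: cycle counter */

/* Value the kernel-side code writes to PMUSERENR_EL0. */
uint64_t pmuserenr_value(void)
{
    return ARMV8_PMUSERENR_EN_EL0 | ARMV8_PMUSERENR_ER | ARMV8_PMUSERENR_CR;
}

/* Hypothetical helper: build a PMCNTENSET_EL0 value that enables the first
 * n_event_counters event counters and, optionally, the cycle counter. */
uint32_t pmcntenset_mask(int cycle_counter, unsigned n_event_counters)
{
    assert(n_event_counters <= 31);
    uint32_t mask = (uint32_t)((1ull << n_event_counters) - 1);
    if (cycle_counter)
        mask |= ARMV8_PMCNTENSET_EL0_ENABLE;
    return mask;
}

#if defined(__aarch64__)
/* User-space read of the cycle counter. This traps (the process is killed)
 * unless user-mode access has been enabled from kernel mode first. */
static inline uint64_t read_cycle_counter(void)
{
    uint64_t value;
    __asm__ __volatile__("mrs %0, pmccntr_el0" : "=r"(value));
    return value;
}
#endif
```

Reading read_cycle_counter() before and after a code region and subtracting gives elapsed cycles, assuming the thread is not migrated to another core in between.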

But performance counters are a privileged part of the system, and by default they are only accessible from kernel mode. You can't just use assembly instructions in user-space code to access them; the only result you will get is SIGSEGV or some other variant of permission denied. To enable access from user space, some work has to be done in kernel mode. This can be done by any existing PMU driver, such as perf or oprofile (an older PMU access tool), or by a custom kernel module that enables user-space access to the PMU registers. But to compile your own module you still need most of the kernel development infrastructure for your kernel (I expect that the standard Chromebook kernel ships without the kernel includes and "kbuild" needed to build modules, and that this kernel may not accept unsigned modules in its standard configuration).

What can you do:

  • Use another machine, something more recent than your outdated Chromebook. Your project may have machines available for remote access, or you can buy a small, popular ARM single-board computer running Linux (like a Raspberry Pi 3/4). Such a popular board will have a more recent ARM CPU core, and it will run Ubuntu or Debian.
  • Check the oprofile subsystem; it may be enabled in your kernel. The oprofile tools are older than perf but can access PMU counters too.
  • Recompile the Linux kernel with the perf_events subsystem enabled. You only need a correct kernel that boots on your Chromebook, and any compiler to rebuild perf out-of-tree from https://mirrors.edge.kernel.org/pub/linux/kernel/tools/perf/ (any version of perf will do). Or use the perf_event_open syscall directly.
  • Check for the /lib/modules/`uname -r`/build directory. If it exists, you can try to build a custom kernel module that enables direct user-space access.
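The last check in the list can also be done from C rather than the shell. This is a small sketch using uname(2) and access(2); the function names kernel_build_dir and have_kernel_build_dir are made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>
#include <unistd.h>

/* Writes "/lib/modules/<running kernel release>/build" into buf.
 * Returns 0 on success, -1 if uname fails or the buffer is too small. */
int kernel_build_dir(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    struct utsname u;
    if (uname(&u) != 0)
        return -1;
    int n = snprintf(buf, len, "/lib/modules/%s/build", u.release);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Returns 1 if the build directory exists and can be entered, else 0. */
int have_kernel_build_dir(void)
{
    char path[512];
    if (kernel_build_dir(path, sizeof(path)) != 0)
        return 0;
    return access(path, X_OK) == 0;
}
```

A result of 1 only shows that the directory is present. Whether a module built against it will load still depends on the kernel's module signing and configuration.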

TRM on pmcr_el0 and other PMU registers:

  • https://developer.arm.com/documentation/100442/0100/debug-registers/aarch64-pmu-registers/pmcr-el0--performance-monitors-control-register--el0
  • https://developer.arm.com/docs/ddi0595/h/aarch64-system-registers/pmcr_el0
  • https://developer.arm.com/docs/ddi0595/h/aarch32-system-registers/pmccntr
  • https://developer.arm.com/documentation/ddi0535/c/performance-monitoring-unit

And an overview: https://people.inf.ethz.ch/markusp/teaching/263-2300-ETH-spring14/slides/08-perfcounters.pdf


