ARMv7 instructions to access performance counters directly from assembly language
There are some examples of using the PMU performance counters directly on ARM, for example:
armv7: http://neocontra.blogspot.com/2013/05/user-mode-performance-counters-for.html
armv8: http://zhiyisun.github.io/2016/03/02/How-to-Use-Performance-Monitor-Unit-(PMU)-of-64-bit-ARMv8-A-in-Linux.html
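So the first thing is to create a kernel module that enables user-mode access to the PMU counters. Below is the code that sets the PMU register PMUSERENR_EL0 to enable user-mode access. The PMU registers are per-CPU, so this code has to run on every core (for example via on_each_cpu()).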
/*Enable user-mode access to counters. */
asm volatile("msr pmuserenr_el0, %0" : : "r"((u64)ARMV8_PMUSERENR_EN_EL0|ARMV8_PMUSERENR_ER|ARMV8_PMUSERENR_CR));
/* Performance Monitors Count Enable Set register: bit 31 (C) enables the cycle counter, bits 30:0 enable event counters 0-30 (writing 0 to a bit has no effect). Other event counters can also be enabled here. */
asm volatile("msr pmcntenset_el0, %0" : : "r" (ARMV8_PMCNTENSET_EL0_ENABLE));
/* Enable counters */
u64 val=0;
asm volatile("mrs %0, pmcr_el0" : "=r" (val));
asm volatile("msr pmcr_el0, %0" : : "r" (val|ARMV8_PMCR_E));
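The fragment above relies on ARMV8_* constants that it does not define. The sketch below gives one plausible set of definitions based on the register bit layouts in the Arm Architecture Reference Manual (the macro names are the fragment's, the values are my reading of the spec, not copied from the linked blog), plus a hypothetical user-space helper, read_cycle_counter(), that reads the cycle counter once the kernel side has run on every CPU. The ARMv7 branch uses the CP15 encoding of PMCCNTR (c9, c13, 0) to match the question title; treat both branches as untested sketches.

```c
#include <stdint.h>

/* Bit assignments from the Arm ARM PMU register descriptions (ARMv8-A).
 * Names follow the kernel-module fragment; values are assumptions checked
 * against the spec, not taken from the original blog post. */
#define ARMV8_PMCR_E                (1u << 0)  /* PMCR_EL0.E: enable all counters */
#define ARMV8_PMCR_P                (1u << 1)  /* PMCR_EL0.P: reset event counters */
#define ARMV8_PMCR_C                (1u << 2)  /* PMCR_EL0.C: reset cycle counter */
#define ARMV8_PMUSERENR_EN_EL0      (1u << 0)  /* PMUSERENR_EL0.EN: EL0 access to the PMU */
#define ARMV8_PMUSERENR_CR          (1u << 2)  /* PMUSERENR_EL0.CR: EL0 read of PMCCNTR_EL0 */
#define ARMV8_PMUSERENR_ER          (1u << 3)  /* PMUSERENR_EL0.ER: EL0 read of event counters */
#define ARMV8_PMCNTENSET_EL0_ENABLE (1u << 31) /* PMCNTENSET_EL0.C: enable the cycle counter */

/* Hypothetical helper: read the cycle counter from user space. This only works
 * after the kernel module has enabled user access on every CPU; otherwise the
 * instruction traps and the process is killed by a signal. */
static inline uint64_t read_cycle_counter(void)
{
#if defined(__aarch64__)
    uint64_t val;
    __asm__ volatile("mrs %0, pmccntr_el0" : "=r"(val));
    return val;
#elif defined(__arm__)
    uint32_t val; /* ARMv7: PMCCNTR is 32 bits wide, read through CP15 c9, c13, 0 */
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(val));
    return val;
#else
    return 0; /* not an Arm target: nothing to read */
#endif
}
```

With these values, the PMUSERENR_EL0 write in the fragment stores 0xD and the PMCNTENSET_EL0 write stores 0x80000000.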
But performance counters are a privileged part of the system: by default they are only accessible from kernel mode. You can't just use these assembly instructions in user-space code; the only result you will get is SIGSEGV or another variant of permission denied. To enable access from user space, some work has to be done in kernel mode. That can be any existing PMU driver, perf or oprofile (an older PMU access tool), or a custom kernel module that enables user-space access to the PMU registers. But to compile your module you still need most of the kernel development infrastructure for your kernel (I expect that the standard Chromebook kernel does not ship the kernel headers and kbuild files needed to build modules, and that in its standard configuration it may not accept unsigned modules).
What can you do:
- Use another machine, something more recent than your outdated Chromebook. Your project may have some machines available for remote access, or you can buy a small, popular ARM single-board computer with Linux (like the Raspberry Pi 3/4). Such a board has a more recent ARM CPU core and runs Ubuntu or Debian.
- Check the oprofile subsystem; it may be enabled in your kernel. The oprofile tools are older than perf but can access PMU counters too.
- Recompile the Linux kernel with the perf_events subsystem enabled. You only need a correctly configured kernel that boots on your Chromebook, plus any compiler to rebuild perf out-of-tree from https://mirrors.edge.kernel.org/pub/linux/kernel/tools/perf/ (any version of perf will do). Or use the perf_event_open syscall directly.
- Check for the /lib/modules/`uname -r`/build directory. If it exists, you can try to build a custom kernel module that enables direct user-space access.
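For the perf_event_open route, a minimal user-space sketch (Linux, any architecture whose PMU is supported by perf_events, ARM included) that counts CPU cycles spent in a loop could look like this. It follows the pattern of the perf_event_open(2) man page; the helper name count_loop_cycles is hypothetical. The open fails if the kernel lacks perf_events, the PMU is not exposed (common in VMs and containers), or /proc/sys/kernel/perf_event_paranoid forbids it, and the helper then returns -1.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc has no wrapper for this syscall, so invoke it through syscall(2).
 * pid = 0, cpu = -1: measure the calling thread on whatever CPU it runs on. */
static int perf_open(struct perf_event_attr *attr)
{
    return (int)syscall(SYS_perf_event_open, attr, (pid_t)0, -1, -1, 0UL);
}

/* Hypothetical helper: count user-space CPU cycles spent in a simple loop.
 * Returns the count, or -1 if the kernel refuses the hardware event. */
long long count_loop_cycles(long iterations)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;       /* created stopped; enabled explicitly below */
    attr.exclude_kernel = 1; /* count user mode only */
    attr.exclude_hv = 1;

    int fd = perf_open(&attr);
    if (fd == -1)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    volatile long sink = 0; /* volatile keeps the loop from being optimized away */
    for (long i = 0; i < iterations; i++)
        sink += i;

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    long long count;
    ssize_t n = read(fd, &count, sizeof(count)); /* default read_format: one u64 */
    close(fd);
    return n == (ssize_t)sizeof(count) ? count : -1;
}
```

No kernel module is involved: the kernel's own PMU driver programs the counter, and exclude_kernel = 1 keeps the request acceptable at the default paranoid levels.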
Documentation (TRM and Arm ARM) on pmcr_el0 and other PMU registers: https://developer.arm.com/documentation/100442/0100/debug-registers/aarch64-pmu-registers/pmcr-el0--performance-monitors-control-register--el0 https://developer.arm.com/docs/ddi0595/h/aarch64-system-registers/pmcr_el0 https://developer.arm.com/docs/ddi0595/h/aarch32-system-registers/pmccntr https://developer.arm.com/documentation/ddi0535/c/performance-monitoring-unit and an overview: https://people.inf.ethz.ch/markusp/teaching/263-2300-ETH-spring14/slides/08-perfcounters.pdf
How to read PMC(Performance Monitoring Counter) of x86 intel processor
I know there is the perf tool to get a list of statistics for a program. But what I am trying to do is read the performance counters directly, without using the perf tool.
If you do not want to use the perf tool, you can try the oprofile tool, Intel VTune, https://github.com/RRZE-HPC/likwid or https://github.com/opcm/pcm. Or you can use the perf_event_open syscall, which is how the perf tool works (you can study or modify the perf tool sources from https://mirrors.edge.kernel.org/pub/linux/kernel/tools/perf/; the perf tool version does not have to match the kernel version).
If you want to access MSRs as root, use modprobe msr (a standard kernel module, already compiled for your kernel in Ubuntu) and the wrmsr and rdmsr tools (the msr-tools Debian/Ubuntu package, by Intel), as on slide 27 of the "Performance Monitoring" presentation by Chris Dahnken (Intel SSG EMEA HPCTC).
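The value you pass to wrmsr for a general-purpose counter is built from the IA32_PERFEVTSELx bit fields described in the Intel SDM: event select in bits 7:0, unit mask in bits 15:8, USR (bit 16), OS (bit 17) and EN (bit 22). A small encoder makes the hex value less magic; perfevtsel is a hypothetical helper, and event 0xC0 with umask 0x00 is the architectural "instructions retired" event. On processors with architectural perfmon version 2 or later the counter may also have to be enabled in IA32_PERF_GLOBAL_CTRL (MSR 0x38F).

```c
#include <stdint.h>

/* MSR addresses for the first general-purpose counter (Intel SDM, architectural perfmon). */
#define IA32_PMC0             0xC1u
#define IA32_PERFEVTSEL0      0x186u
#define IA32_PERF_GLOBAL_CTRL 0x38Fu

/* Bit fields of IA32_PERFEVTSELx. */
#define PERFEVTSEL_USR (1ull << 16) /* count while in ring 3 */
#define PERFEVTSEL_OS  (1ull << 17) /* count while in ring 0 */
#define PERFEVTSEL_EN  (1ull << 22) /* enable the counter */

/* Hypothetical helper: build an IA32_PERFEVTSELx value for an event/umask pair. */
static uint64_t perfevtsel(uint8_t event, uint8_t umask, int user, int kernel)
{
    uint64_t v = (uint64_t)event | ((uint64_t)umask << 8) | PERFEVTSEL_EN;
    if (user)
        v |= PERFEVTSEL_USR;
    if (kernel)
        v |= PERFEVTSEL_OS;
    return v;
}
```

For example, perfevtsel(0xC0, 0x00, 1, 1) yields 0x4300C0, which you would write to MSR 0x186 with wrmsr and later read back from PMC0 (MSR 0xC1) with rdmsr, as long as no perf session or NMI watchdog is using that counter.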
I don't understand why you want to work with performance counters without the perf tool. If you want counter readings from inside your program, for example before and after some loops, you can call the perf_event_open syscall directly (with specific ioctls). (Or try perf stat together with the same PERF_EVENT_IOC_* ioctls, or look into the perf + JIT integration.)
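As a sketch of the "before and after some loops" case, two events can be opened as one group so that a single read() on the leader returns both values (PERF_FORMAT_GROUP), and the PERF_EVENT_IOC_* ioctls with PERF_IOC_FLAG_GROUP start and stop them together. This follows perf_event_open(2); struct group_read, open_counter and measure_loop are hypothetical names, and where no PMU is accessible the helper returns -1.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Layout returned by read() with PERF_FORMAT_GROUP and no other read_format flags. */
struct group_read {
    uint64_t nr;        /* number of events in the group */
    uint64_t values[2]; /* leader first, then members in creation order */
};

static int open_counter(uint64_t config, int group_fd)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = config;
    attr.disabled = (group_fd == -1); /* only the leader starts disabled */
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    attr.read_format = PERF_FORMAT_GROUP;
    return (int)syscall(SYS_perf_event_open, &attr, (pid_t)0, -1, group_fd, 0UL);
}

/* Hypothetical helper: cycles and instructions of a loop, read in one go.
 * Returns 0 and fills both outputs, or -1 on failure. */
int measure_loop(long iterations, uint64_t *cycles, uint64_t *instructions)
{
    int leader = open_counter(PERF_COUNT_HW_CPU_CYCLES, -1);
    if (leader == -1)
        return -1;
    int member = open_counter(PERF_COUNT_HW_INSTRUCTIONS, leader);
    if (member == -1) {
        close(leader);
        return -1;
    }

    ioctl(leader, PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP);
    ioctl(leader, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);

    volatile long sink = 0; /* the code being measured */
    for (long i = 0; i < iterations; i++)
        sink += i;

    ioctl(leader, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);

    struct group_read data;
    ssize_t n = read(leader, &data, sizeof(data));
    close(member);
    close(leader);
    if (n != (ssize_t)sizeof(data) || data.nr != 2)
        return -1;
    *cycles = data.values[0];
    *instructions = data.values[1];
    return 0;
}
```

Because both counters belong to one group, the kernel schedules them on the PMU together, so the cycles/instructions ratio refers to the same interval.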
Or you can use the existing kernel module that exports MSR access to the root user, msr.ko, together with msr-tools (https://01.org/msr-tools). Or see this msr+pmc example: https://technicalandstuff.wordpress.com/2015/05/15/using-intels-pcm-in-linux-and-inside-c/ + https://software.intel.com/en-us/articles/intel-performance-counter-monitor (https://github.com/opcm/pcm)
There are also some examples of performance counter usage in https://github.com/RRZE-HPC/likwid.
You can also use the PAPI library to access counters from your code; it handles most of the perf_event_open details for you. http://icl.cs.utk.edu/projects/papi/wiki/PAPITopics:Getting_Started
First question: I downloaded this code https://github.com/softdevteam/user_rdpmc ... "insmod" the .ko file, the system hangs.
The repository's "Stars" rating is too low and the code is too old (2016) to justify a real investigation of the hang. Direct access to the PMCs may interfere with the NMI watchdog (disable it with echo 0 > /proc/sys/kernel/nmi_watchdog as root) or with another perf session. It is safer to use the perf_event_open syscall.
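If the point of user_rdpmc is a cheap RDPMC read from user space, perf_event_open can provide that without a custom module: after mmap()ing the event, the kernel publishes the hardware counter index in struct perf_event_mmap_page, and user space reads it with RDPMC under the seqlock protocol described in perf_event_open(2). The sketch below is x86-only, assumes /sys/bus/event_source/devices/cpu/rdpmc allows user RDPMC for mmap'd events (its usual default, 1), and uses hypothetical function names; it returns -1 wherever this is not possible.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#if defined(__x86_64__) || defined(__i386__)
static inline uint64_t rdpmc(uint32_t counter)
{
    uint32_t lo, hi;
    __asm__ volatile("rdpmc" : "=a"(lo), "=d"(hi) : "c"(counter));
    return ((uint64_t)hi << 32) | lo;
}
#endif

/* Read the event with RDPMC using the seqlock loop from perf_event_open(2).
 * Returns -1 if the kernel does not allow user-space RDPMC for this event. */
static int64_t read_via_rdpmc(volatile struct perf_event_mmap_page *pc)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t seq;
    int64_t count;
    do {
        seq = pc->lock;
        __sync_synchronize();
        uint32_t idx = pc->index; /* 0 means the event is not on a counter now */
        if (!pc->cap_user_rdpmc || idx == 0)
            return -1;
        uint16_t width = pc->pmc_width;
        /* sign-extend the width-bit raw counter value, as the man page does */
        int64_t pmc = (int64_t)(rdpmc(idx - 1) << (64 - width)) >> (64 - width);
        count = pc->offset + pmc;
        __sync_synchronize();
    } while (pc->lock != seq);
    return count;
#else
    (void)pc;
    return -1;
#endif
}

/* Hypothetical helper: user-space instructions retired by a loop, via RDPMC. */
int64_t rdpmc_loop_instructions(long iterations)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    int fd = (int)syscall(SYS_perf_event_open, &attr, (pid_t)0, -1, -1, 0UL);
    if (fd == -1)
        return -1;

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0); /* control page only */
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    volatile struct perf_event_mmap_page *pc = p;

    int64_t before = read_via_rdpmc(pc);
    volatile long sink = 0;
    for (long i = 0; i < iterations; i++)
        sink += i;
    int64_t after = read_via_rdpmc(pc);

    munmap(p, page);
    close(fd);
    return (before < 0 || after < 0) ? -1 : after - before;
}
```

The kernel still owns the counter here, so this coexists with the NMI watchdog and other perf sessions instead of fighting them.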
Second question ... discovered that the core.c file under the linux-5.5.3/arch/x86/events/intel directory actually sets up and reads the performance counters
This file is part of the perf_event_open syscall implementation (the perf_events subsystem of the kernel, https://github.com/torvalds/linux/tree/master/kernel/events + https://github.com/torvalds/linux/tree/master/arch/x86/events).
To use this code, use the perf tool or the perf_event_open syscall.
You should not try to compile the perf_events subsystem as a separate module: it is already compiled into your kernel (the Intel/AMD-specific parts may partially be built as .ko modules), and the subsystem itself does not support being built as a module:
https://github.com/torvalds/linux/tree/master/kernel/events
Makefile: obj-y := core.o ring_buffer.o callchain.o
How can I make my Ubuntu kernel use all the files linked to core.c from kernel.org and build the .ko file?
Your Ubuntu kernel already has all the perf_events subsystem files compiled: some are linked into the kernel image and others are already installed as .ko files, like intel-rapl-perf.ko. You can check this with:
$ grep _PERF_ /boot/config-`uname -r`
$ ls -l /lib/modules/`uname -r`/kernel/arch/x86/events/intel
How to Configure and Sample Intel Performance Counters In-Process
It seems the best way (for Linux, at least) is to use the msr device node, /dev/cpu/N/msr, provided by the msr kernel module.
You simply open the device node, seek to the address of the required MSR, and read or write 8 bytes.
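A minimal sketch of that approach, assuming the msr module is loaded (modprobe msr) and the program runs as root: the file offset is the MSR address and every access transfers 8 bytes, so pread/pwrite do the seek and the transfer in one call. The helpers take a path so they can also be exercised against an ordinary file; msr_read_path, msr_write_path and msr_read_cpu are hypothetical names.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64 /* MSR addresses such as 0xC0010000 need a 64-bit off_t */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read 8 bytes at offset `msr`; for /dev/cpu/N/msr this is an rdmsr. */
int msr_read_path(const char *path, uint32_t msr, uint64_t *value)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, value, sizeof(*value), (off_t)msr);
    close(fd);
    return n == (ssize_t)sizeof(*value) ? 0 : -1;
}

/* Write 8 bytes at offset `msr`; for /dev/cpu/N/msr this is a wrmsr. */
int msr_write_path(const char *path, uint32_t msr, uint64_t value)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pwrite(fd, &value, sizeof(value), (off_t)msr);
    close(fd);
    return n == (ssize_t)sizeof(value) ? 0 : -1;
}

/* Convenience wrapper for one logical CPU's MSR device node. */
int msr_read_cpu(int cpu, uint32_t msr, uint64_t *value)
{
    char path[64];
    snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
    return msr_read_path(path, msr, value);
}
```

For example, msr_read_cpu(0, 0x38F, &v) would read IA32_PERF_GLOBAL_CTRL on CPU 0 of an Intel processor; each CPU has its own node, so per-core counters must be read through the matching N.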
OpenBSD is harder, since (at the time of writing) there is no user-space proxy to the MSRs. So you would need to write a kernel module or implement a sysctl by hand.
Hardware performance counter APIs for Windows
You can use the RDPMC instruction or the __readpmc MSVC compiler intrinsic, which generates the same instruction.
However, Windows prohibits user-mode applications from executing this instruction by setting CR4.PCE to 0. Presumably this is done because the meaning of each counter is determined by MSRs, which are only accessible in kernel mode. In other words, unless you're kernel-mode code (e.g. a device driver), you will get a "privileged instruction" exception if you attempt to execute this instruction.
If you're writing a user-mode application, your only option (as @Christopher mentioned in the comments) is to write a kernel-mode driver that executes this instruction for you (you'll pay a user-to-kernel call penalty) and to enable test signing on your machine so that your presumably self-signed "driver" can be loaded. This means you can't easily distribute the app, but it will work for in-house tuning.