Maximum sampling frequency supported by perf
The Linux kernel keeps track of how long perf's non-maskable interrupt (NMI) handler takes to run. If the sample duration exceeds a configurable threshold (perf_cpu_time_max_percent), the kernel lowers the sample rate. This prevents the system from hanging because it spends all of its time handling samples. When this happens, you will see a message like the following in the kernel log:
perf samples too long (2506 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
You can disable this throttling mechanism by setting perf_cpu_time_max_percent to 0:
sysctl -w kernel.perf_cpu_time_max_percent=0
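To check whether throttling has already kicked in, one option is to scan the kernel log for the message quoted above and compare it with the current sysctl value. The following is a minimal sketch, not an official tool: the regular expression is modelled only on the single log line shown above (other kernel versions may word the message differently), and the helper names parse_throttle_line and read_sysctl are hypothetical.

```python
import re

# Assumption: the pattern is derived from the one log line quoted above.
# Other kernel versions may phrase this message differently.
THROTTLE_RE = re.compile(
    r"perf samples too long \((\d+) > (\d+)\), "
    r"lowering kernel\.perf_event_max_sample_rate to (\d+)"
)


def parse_throttle_line(line):
    """Return (measured, limit, new_rate) as ints, or None if the line does not match."""
    match = THROTTLE_RE.search(line)
    if match is None:
        return None
    measured, limit, new_rate = (int(group) for group in match.groups())
    return measured, limit, new_rate


def read_sysctl(name):
    """Read an integer sysctl, e.g. 'kernel.perf_event_max_sample_rate', via /proc/sys."""
    path = "/proc/sys/" + name.replace(".", "/")
    with open(path) as handle:
        return int(handle.read().strip())


if __name__ == "__main__":
    sample = ("perf samples too long (2506 > 2500), "
              "lowering kernel.perf_event_max_sample_rate to 50000")
    print(parse_throttle_line(sample))
```

In practice you would feed each line of the dmesg output to parse_throttle_line and, on a Linux machine, call read_sysctl("kernel.perf_event_max_sample_rate") to see the limit currently in effect.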
Useful links:
- Documentation for the sysctl files perf_cpu_time_max_percent
- Linux kernel sources
- See also this patch which introduces this throttling mechanism
How can I sample at constant rate with perf_event_open?
I have solved the issue.
To be sure that you are really sampling at a constant period with the perf_event_open system call, you need to:
- Specify the event sampling period with the sample_period field of the perf_event_attr struct. As documented in the perf wiki tutorial, make sure the value fits in 32 bits; otherwise the system will silently truncate it.
- Do not specify the sampling period with the sample_freq field of the perf_event_attr struct. If you do, the kernel will dynamically change the sampling period to achieve the desired frequency, as documented in the perf wiki tutorial.
- Check the kernel log with dmesg: you should not see any perf-related message. Otherwise, as suggested by Arnabjyoti Kalita, the resulting sampling rate may be too close to the maximum set in the kernel variable perf_event_max_sample_rate. In that case the best approach is to modify the kernel variable with sudo sysctl -w kernel.perf_cpu_time_max_percent=0, so that the kernel will not modify the sampling period, even though this may cause the computer to get stuck.
  The kernel implements this throttling to avoid becoming stuck when too many PMIs (Performance Monitoring Interrupts) take too much computation time. These interrupts are hooked by the perf_event subsystem and are triggered when the counter of the sampled event overflows the specified sampling period.
  For further information about the perf_event kernel variables, consult the documentation in the kernel source tree at linux-VERSION/Documentation/sysctl/kernel.txt.
- Set the CPU frequency to be constant with sudo cpufreq-set -g powersave -c $((i)), where i is the CPU number, so that your measurements are not affected by frequency scaling.
- Compute the differences between the values of the sampled event in consecutive samples and verify that each difference is (within a small margin of error) equal to the specified sampling period.
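The last check in the list above can be sketched as follows. This is only an illustration under stated assumptions: it presumes you have already extracted the raw value of the sampled event from each sample into a list, and the function names and the 1% default tolerance are hypothetical choices, not part of perf.

```python
def period_deviations(counter_values, sample_period):
    """Relative deviation of each inter-sample delta from the requested period."""
    deltas = [later - earlier
              for earlier, later in zip(counter_values, counter_values[1:])]
    return [abs(delta - sample_period) / sample_period for delta in deltas]


def is_constant_period(counter_values, sample_period, tolerance=0.01):
    """True if every delta is within `tolerance` (a fraction) of sample_period."""
    if len(counter_values) < 2:
        raise ValueError("need at least two samples to compute a delta")
    deviations = period_deviations(counter_values, sample_period)
    return all(deviation <= tolerance for deviation in deviations)


if __name__ == "__main__":
    # Made-up counter readings for a requested period of 100000 events.
    readings = [100000, 200003, 299998, 400001]
    print(is_constant_period(readings, 100000))
```

If the kernel had changed the period behind your back (for example because sample_freq was used), the deltas would drift well outside the tolerance and the check would fail.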
I asked this question because I believed that getting a different number of samples was caused by the kernel modifying the sampling period.
It is not. (It can be, but not in my case.)
The reason I get a different number of samples is that a monitored thread may take a different amount of time to execute from one run to the next.
For those 'longer' runs I do indeed obtain a larger time value.
This happens because the process may occasionally suffer some cache misses or a different number of mispredicted branches.
Some further work in this direction could be:
- Using other time events to sample the process and monitoring the differences
- Using other counters to detect the cause of the longer execution time of certain runs
If someone finds out new information about this, please comment on or edit the answer, or contact me.
perf_event_open and PERF_COUNT_HW_INSTRUCTIONS
The perf_event_open manpage (http://man7.org/linux/man-pages/man2/perf_event_open.2.html) says about PERF_COUNT_HW_INSTRUCTIONS:
PERF_COUNT_HW_INSTRUCTIONS
Retired instructions. Be careful, these can be affected by various issues, most notably hardware interrupt counts.
I think this means that PERF_COUNT_HW_INSTRUCTIONS can be used (and it is supported almost everywhere), but the exact values it reports for a given code fragment may differ slightly between runs because of noise from interrupts or other logic.
So it is safe to use the events PERF_COUNT_HW_INSTRUCTIONS and PERF_COUNT_HW_CPU_CYCLES on most CPUs. The perf_events subsystem in the Linux kernel will map PERF_COUNT_HW_CPU_CYCLES to whichever raw event is most suitable for the CPU and PMU currently in use.
Depending on your goals, you should try to collect some statistics on the PERF_COUNT_HW_INSTRUCTIONS values for your code fragment. You can also check the stability of this counter with several runs of perf stat on a simple program:
perf stat -e cycles:u,instructions:u /bin/echo 123
perf stat -e cycles:u,instructions:u /bin/echo 123
perf stat -e cycles:u,instructions:u /bin/echo 123
Or use the built-in repeat option of perf stat:
perf stat --repeat 10 -e cycles:u,instructions:u /bin/echo 123
I get a variation of about +-10 instruction events (less than 0.1%) out of roughly 200 thousand instructions executed in total, so this counter is very stable. For cycles I get a variation of about 5%, so it is really the cycles event that deserves the "be careful" warning.
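To put a number on "stable", you can compute the relative spread of the counts collected over several runs. The sketch below assumes you have copied the counts from the perf stat output by hand; the values in the example are invented for illustration and are not the measurements reported above, and relative_spread is a hypothetical helper name.

```python
from statistics import mean


def relative_spread(values):
    """Return (max - min) / mean for a list of counter readings."""
    if not values:
        raise ValueError("need at least one reading")
    return (max(values) - min(values)) / mean(values)


if __name__ == "__main__":
    # Invented readings, only to show the calculation.
    instructions = [200000, 200010, 199990]
    cycles = [310000, 325000, 318000]
    print("instructions spread: {:.4%}".format(relative_spread(instructions)))
    print("cycles spread: {:.4%}".format(relative_spread(cycles)))
```

A spread well below one percent suggests the counter is reproducible enough for comparing code fragments, while a spread of several percent means single measurements should not be trusted.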
perf_event_open always returns -1
From the lack of anything relevant in dmesg and sysfs, it should hopefully now be apparent that the PMU isn't being described to the kernel. Thus perf events doesn't know anything about the hardware event you're asking for, so it's little surprise that it fails to open it. What you need to do is make sure the kernel does know about the PMU, so that the driver picks it up - said driver should already be built-in via CONFIG_HW_PERF_EVENTS, which is on by default with CONFIG_PERF_EVENTS and doesn't look to be disabled in your config, but it might be worth double-checking.
It looks like the PMU is described in the devicetree in their 3.18 kernel, so my best guess is that your board might be booting using the legacy boardfile rather than FDT. I don't know much about Raspberry Pi specifics, but judging by this fairly exhaustive article (I'd say skip directly to section 3.1), it seems relatively straightforward to reconfigure the bootloader to use FDT.
What is the default behavior of perf record?
The default event is cycles, as can be seen by running perf script after perf record. There, you can also see that the default sampling behavior is time-based, since the number of cycles is not constant. The default frequency is 4000 Hz, which can be seen in the source code and checked by comparing the file size or number of samples to a recording where -F 4000 was specified.
The perf wiki says that the rate is 1000 Hz, but this is not true anymore for kernels newer than 3.4.