How to Change perf_event_open Max Sample Rate

Maximum sampling frequency supported by perf

The Linux kernel tracks how long perf's non-maskable interrupt (NMI) handler takes to run. If the sample duration exceeds a configurable threshold (perf_cpu_time_max_percent), the kernel lowers the sample rate. This prevents the system from hanging because it spends all of its time handling samples. When this happens, you will see a message like the following in the kernel log:

perf samples too long (2506 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
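To spot this throttling after a measurement run, the kernel log can be scanned for the notice. The following is a minimal sketch, assuming the message has exactly the wording quoted above (other kernel versions may phrase it differently); the function name find_throttle_events is hypothetical.

```python
import re

# Pattern modelled on the message quoted above; the wording is an assumption
# and may not match every kernel version.
THROTTLE_RE = re.compile(
    r"perf samples too long \((\d+) > (\d+)\), "
    r"lowering kernel\.perf_event_max_sample_rate to (\d+)"
)

def find_throttle_events(log_text):
    """Return one (measured, allowed, new_max_rate) tuple per throttling notice."""
    return [tuple(int(g) for g in m.groups())
            for m in THROTTLE_RE.finditer(log_text)]

if __name__ == "__main__":
    sample = ("perf samples too long (2506 > 2500), "
              "lowering kernel.perf_event_max_sample_rate to 50000")
    print(find_throttle_events(sample))
```

In practice the input would be the text printed by dmesg; an empty result means no notice with this wording was found.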

You can disable this throttling mechanism by setting perf_cpu_time_max_percent to 0:

sysctl -w kernel.perf_cpu_time_max_percent=0
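Before and after changing the setting, it can help to look at the current values. Here is a small sketch that reads the two sysctls through their files under /proc/sys/kernel; the helper name read_perf_sysctls and its base parameter are illustrative, not part of any perf tooling.

```python
from pathlib import Path

def read_perf_sysctls(base="/proc/sys/kernel"):
    """Read the perf throttling sysctls as integers; a missing file maps to None."""
    names = ("perf_cpu_time_max_percent", "perf_event_max_sample_rate")
    values = {}
    for name in names:
        path = Path(base) / name
        values[name] = int(path.read_text().strip()) if path.is_file() else None
    return values

if __name__ == "__main__":
    for name, value in read_perf_sysctls().items():
        print(f"kernel.{name} = {value}")
```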

Useful links:

  • Documentation for the sysctl files perf_cpu_time_max_percent
  • Linux kernel sources
  • See also this patch which introduces this throttling mechanism

How can I sample at constant rate with perf_event_open?

I have solved the issue.

To be sure that you are really sampling at a constant period with the perf_event_open system call, you need to:

  • Specify the event sampling period with the sample_period field of the perf_event_attr struct. As documented in the perf wiki tutorial, make sure that this number does not exceed what can be represented in 32 bits; otherwise the system will silently truncate it.
  • Do not specify the sampling period with the sample_freq field of the perf_event_attr struct. If you do, the kernel will dynamically change the sampling period to achieve the desired frequency, as documented in the perf wiki tutorial.
  • Check the kernel log with dmesg: you should not see any perf-related message. Otherwise, as suggested by Arnabjyoti Kalita, the resulting sampling rate may be too close to the maximum set in the kernel variable perf_event_max_sample_rate. In that case the best approach is to run sudo sysctl -w kernel.perf_cpu_time_max_percent=0, so that the kernel no longer modifies the sampling period, even though this may cause the computer to hang.
    The kernel implements this throttling to avoid getting stuck when too many PMIs (Performance Monitoring Interrupts) take up too much computation time. These interrupts are hooked by the perf_event functionality and are triggered when the counter of the sampled event overflows after the specified sampling period.
    For further information about the perf_event kernel variables, consult the documentation you can find in the kernel source code at linux-VERSION/Documentation/sysctl/kernel.txt
  • Keep the CPU frequency constant with sudo cpufreq-set -g powersave -c $((i)), where i is the CPU number, so that your measurements are not affected by frequency scaling.
  • Compute the differences between the values of the sampled event in consecutive samples and verify that each one is (within a small margin of error) equal to the specified sampling period.
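The last check in the list can be sketched as follows. This is a minimal illustration, assuming the counter value of the sampled event has already been collected for each sample into a list; the function name, the tolerance parameter, and the readings are hypothetical.

```python
def check_constant_period(counter_values, period, tolerance=0.01):
    """Return indices of consecutive-sample deltas that deviate from `period`
    by more than `tolerance` (a fraction of the period)."""
    deltas = [b - a for a, b in zip(counter_values, counter_values[1:])]
    return [i for i, d in enumerate(deltas)
            if abs(d - period) > tolerance * period]

if __name__ == "__main__":
    # Made-up counter readings for a sampling period of 100000 events.
    readings = [0, 100_000, 200_003, 299_998, 450_000]
    print(check_constant_period(readings, 100_000))
```

An empty list means every delta stayed within the margin; any returned index points at a pair of samples worth inspecting.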

I asked this question because I believed that getting a different number of samples was related to the kernel modifying the sampling period.

It is not. (It may be in general, but not in my case.)

The reason I get a different number of samples is that any monitored thread may take a different amount of time to execute from one run to the next.
For those 'longer' lines I do indeed obtain a larger time value.
This is because the process may occasionally incur cache misses or a different number of mispredicted branches.

Some further work in this direction could be:

  • Using other time events to sample the process and monitoring the differences
  • Using other counters to detect what causes the longer execution time of certain runs

If someone finds out new information about this, please comment on or modify the answer, or contact me.

perf_event_open and PERF_COUNT_HW_INSTRUCTIONS

The perf_event_open manpage (http://man7.org/linux/man-pages/man2/perf_event_open.2.html) says about PERF_COUNT_HW_INSTRUCTIONS:

PERF_COUNT_HW_INSTRUCTIONS Retired instructions. Be careful, these can be affected by various issues, most notably hardware interrupt counts.

I think this means that PERF_COUNT_HW_INSTRUCTIONS can be used (and it is supported almost everywhere). But the exact value of PERF_COUNT_HW_INSTRUCTIONS for a given code fragment may differ slightly between runs because of noise from interrupts or other effects.

So it is safe to use the PERF_COUNT_HW_INSTRUCTIONS and PERF_COUNT_HW_CPU_CYCLES events on most CPUs. The perf_events subsystem in the Linux kernel will map PERF_COUNT_HW_CPU_CYCLES to whichever raw event best suits the CPU in use and its PMU.

Depending on your goals, you should collect some statistics on the PERF_COUNT_HW_INSTRUCTIONS values for your code fragment. You can also check the stability of this counter with several runs of perf stat on a simple program:

perf stat -e cycles:u,instructions:u /bin/echo 123
perf stat -e cycles:u,instructions:u /bin/echo 123
perf stat -e cycles:u,instructions:u /bin/echo 123

Or use the built-in repeat option of perf stat:

perf stat --repeat 10 -e cycles:u,instructions:u /bin/echo 123

I see a variation of ±10 instruction events (less than 0.1%) out of about 200 thousand instructions executed, so this counter is very stable. For cycles I see about 5% variation, so it is the cycles event that should carry the "be careful" warning.
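If you record the counts from several runs yourself, the run-to-run variation can be summarised with a few lines. This is only a sketch: the relative_spread helper is hypothetical, and the numbers in the demo are made up for illustration, not the measurements quoted above.

```python
from statistics import mean

def relative_spread(values):
    """Return (max - min) / mean, a crude measure of run-to-run variation."""
    return (max(values) - min(values)) / mean(values)

if __name__ == "__main__":
    # Made-up per-run counts, used only to exercise the helper.
    instructions = [200_010, 200_000, 199_995, 200_004]
    cycles = [310_000, 298_000, 305_000, 301_500]
    print(f"instructions: {relative_spread(instructions):.4%}")
    print(f"cycles:       {relative_spread(cycles):.4%}")
```

A spread well below 1% for instructions next to a spread of several percent for cycles would match the pattern described above.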

perf_event_open always returns -1

From the lack of anything relevant in dmesg and sysfs, it should hopefully now be apparent that the PMU isn't being described to the kernel. Thus perf events doesn't know anything about the hardware event you're asking for, so it's little surprise that it fails to open it. What you need to do is make sure the kernel does know about the PMU, so that the driver picks it up - said driver should already be built-in via CONFIG_HW_PERF_EVENTS, which is on by default with CONFIG_PERF_EVENTS and doesn't look to be disabled in your config, but it might be worth double-checking.

It looks like the PMU is described in the devicetree in their 3.18 kernel, so my best guess is that your board might be booting using the legacy boardfile rather than FDT. I don't know much about Raspberry Pi specifics, but judging by this fairly exhaustive article (I'd say skip directly to section 3.1), it seems relatively straightforward to reconfigure the bootloader to use FDT.
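One quick way to see which PMUs the kernel has registered is to list the entries under /sys/bus/event_source/devices. The sketch below does just that; the list_pmus name and its base parameter are illustrative, and which entry corresponds to the hardware PMU depends on the platform.

```python
from pathlib import Path

def list_pmus(base="/sys/bus/event_source/devices"):
    """Return the sorted names of PMUs registered with perf events, or [] if
    the sysfs directory is missing."""
    root = Path(base)
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir())

if __name__ == "__main__":
    pmus = list_pmus()
    print("registered PMUs:", ", ".join(pmus) if pmus else "(none found)")
```

If only software entries show up, that is consistent with the hardware PMU not being described to the kernel.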

What is the default behavior of perf record?

The default event is cycles, as can be seen by running perf script after perf record. There, you can also see that the default sampling behavior is time-based, since the number of cycles is not constant. The default frequency is 4000 Hz, which can be seen in the source code and checked by comparing the file size or number of samples to a recording where -F 4000 was specified.

The perf wiki says that the rate is 1000 Hz, but this is not true anymore for kernels newer than 3.4.
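Another rough way to check the effective frequency is to compute it from the sample timestamps. This is a sketch under the assumption that you have already extracted the timestamps (in seconds) of one busy thread, for example from perf script output; the function name is hypothetical. A thread that is idle for part of the recording will show a lower rate, since cycles are only counted while it runs.

```python
def average_sample_rate(timestamps):
    """Return the average number of samples per second over the recording."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    span = max(timestamps) - min(timestamps)
    if span <= 0:
        raise ValueError("timestamps must span a positive interval")
    # n samples delimit n - 1 intervals.
    return (len(timestamps) - 1) / span

if __name__ == "__main__":
    # Synthetic timestamps spaced 250 microseconds apart.
    stamps = [i / 4000 for i in range(4001)]
    print(f"{average_sample_rate(stamps):.1f} Hz")
```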


