Can Perf-Stat Results Be Generated from a Perf.Data File

perf.data to text or csv

You should use perf record -e <event-name> ... to sample events every 1ms. It seems you are trying to read the perf.data file and organize it into human-readable data. You should use perf report if you are not aware of it. The perf report command reads the perf.data file and generates a concise execution profile. The below link should help you -

Sample analysis with perf report

You can modify the perf report output to your requirements. You can also use perf report -F to specify multiple columns in csv format.

However, in addition, perf stat does have a mechanism to collect information in a csv format using the perf stat -x command.

Edit #1:

(I am using Linux-Kernel 4.14.3 for evaluation.)

Since you want the number of events per sample taken, there are couple of things to be noted. To count the number of events per sample, you will need to know the sampling period. The sampling period gives you the number of events after which the performance counter will overflow and the kernel will record a sample. So essentially, in your case,

sampling period = number of events per sample

Now there are two ways of specifying this sampling period. Either you specify it or you do not specify it.

If while doing a perf record, you specify the sampling period.. something like this :-

perf record -e <some_event_name> -c 1000 ...

Here -c 1000 means that the sampling period is 1000. In this case, you purposefully force the system to record 1000 events per sample because the sampling period is fixed by you.

On the other hand, if you do not specify the sampling period, the system will try to record events at a default frequency of 1000 samples/sec. This means that the system will automatically change the sampling period, if need be, to maintain the frequency of 1000 samples/sec. In such a case, to determine the sampling period, you need to observe the perf.data file.

Specifically, you need to open the perf.data file using the command :

perf script -D

The output will very well look like this :-

0 0 0x108 [0x38]: PERF_RECORD_FORK(1:1):(0:0)

0x140 [0x30]: event: 3
.
. ... raw event: size 48 bytes
. 0000: 03 00 00 00 00 00 30 00 01 00 00 00 01 00 00 00 ......0.........
. 0010: 73 79 73 74 65 6d 64 00 00 00 00 00 00 00 00 00 systemd.........
. 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

0 0 0x140 [0x30]: PERF_RECORD_COMM: systemd:1/1

0x170 [0x38]: event: 7
.
. ... raw event: size 56 bytes
. 0000: 07 00 00 00 00 00 38 00 02 00 00 00 00 00 00 00 ......8.........
. 0010: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
. 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
. 0030: 00 00 00 00 00 00 00 00 ........

You can see different types of records like PERF_RECORD_FORK and PERF_RECORD_COMM and even PERF_RECORD_MMAP. You need to specifically look out for records that begin with PERF_RECORD_SAMPLE inside the file. Like this:

14 173826354106096 0x10d40 [0x28]: PERF_RECORD_SAMPLE(IP, 0x1): 28179/28179: 0xffffffffa500d3b5 period: 3000 addr: 0
... thread: perf:28179
...... dso: [kernel.kallsyms]
perf 28179 [014] 173826.354106: cache-misses: ffffffffa500d3b5 [unknown] ([kernel.kallsyms])

As you can see, in this case the period is 3000 i.e. number of events collected between the previous sampling event and this sampling event is 3000. (i.e. number of events per sample is 3000) Note that, as I mentioned above this period might be tuned. So you need to collect all of the PERF_RECORD_SAMPLE records from the perf.data file.

Perf Stat vs Perf Record

First of all, your test case of using sleep and page-faults is not an ideal test case. There should be no page fault events during the sleep duration, you you can't really expect anything interesting. For the sake of easier reasoning I suggest to use the ref-cycles (hardware) event and a busy workload such as awk 'BEGIN { while(1){} }'.

Question 1: It is my understanding that perf stat gets a "summary" of
counts but when used with the -I option gets the counts at the
specified millisecond interval. With this option does it sum up the
counts over the interval or get the average over the interval, or
something else entirely? I assume it is summed up.

Yes. The values are just summed up. You can confirm that by testing:

$ perf stat -e ref-cycles -I 1000 timeout 10s awk 'BEGIN { while(1){} }'
# time counts unit events
1.000105072 2,563,666,664 ref-cycles
2.000267991 2,577,462,550 ref-cycles
3.000415395 2,577,211,936 ref-cycles
4.000543311 2,577,240,458 ref-cycles
5.000702131 2,577,525,002 ref-cycles
6.000857663 2,577,156,088 ref-cycles

[ ... snip ... ]
[ Note that it may not be as nicely consistent on all systems due dynamic frequency scaling ]

$ perf stat -e ref-cycles -I 3000 timeout 10s awk 'BEGIN { while(1){} }'
# time counts unit events
3.000107921 7,736,108,718 ref-cycles
6.000265186 7,732,065,900 ref-cycles
9.000372029 7,728,302,192 ref-cycles

Question 2: Why doesn't perf stat -e <event1> -I 1000 sleep 5 give
about the same counts as if I summed up the counts over each second
for the following command perf record -e <event1> -F 1000 sleep 5?

perf stat -I is in milliseconds, whereas perf record -F is in HZ (1/s), so the corresponding command to perf stat -I 1000 is perf record -F 1. In fact with our more stable event/workload, this looks better:

$ perf stat -e ref-cycles -I 1000 timeout 10s awk 'BEGIN { while(1){} }'
# time counts unit events
1.000089518 2,578,694,534 ref-cycles
2.000203872 2,579,866,250 ref-cycles
3.000294300 2,579,857,852 ref-cycles
4.000390273 2,579,964,842 ref-cycles
5.000488375 2,577,955,536 ref-cycles
6.000587028 2,577,176,316 ref-cycles
7.000688250 2,577,334,786 ref-cycles
8.000785388 2,577,581,500 ref-cycles
9.000876466 2,577,511,326 ref-cycles
10.000977965 2,577,344,692 ref-cycles
10.001195845 466,674 ref-cycles

$ perf record -e ref-cycles -F 1 timeout 10s awk 'BEGIN { while(1){} }'
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.008 MB perf.data (17 samples) ]

$ perf script -F time,period
3369070.273722: 1
3369070.273755: 1
3369070.273911: 3757
3369070.273916: 3015133
3369070.274486: 1
3369070.274556: 1
3369070.274657: 1778
3369070.274662: 2196921
3369070.275523: 47192985748
3369072.663696: 2578692405
3369073.663547: 2579122382
3369074.663609: 2580015300
3369075.664085: 2579873741
3369076.664433: 2578638211
3369077.664379: 2578378119
3369078.664175: 2578166440
3369079.663896: 2579238122

So you see, eventually the results are stable also for perf record -F. Unfortunately the documentation of perf record is very poor. You can learn what the settings -c and -F mean by looking at the documentation of the underlying system call man perf_event_open:

sample_period, sample_freq A "sampling" event is one that
generates an overflow notification every N events, where N is given by
sample_period. A sampling event has sample_period > 0. When
an overflow occurs, requested data is recorded in the mmap buffer.
The sample_type field controls what data is recorded on each
overflow.

sample_freq can be used if you wish to use frequency rather than
period. In this case, you set the freq flag. The kernel will
adjust the sampling period to try and achieve the desired rate.
The
rate of adjustment is a timer tick.

So while perf stat uses an internal timer to read the value of the counter every -i milliseconds, perf record sets an event overflow counter to take a sample every -c events. That means it takes a sample every N events (e.g. every N page-fault or cycles). With -F, it it tries to regulate this overflow value to achieve the desired frequency. It tries different values and tunes it up/down accordingly. This eventually works for counters with a stable rate, but will get erratic results for dynamic events.

Understanding the Frequency printed by Linux Perf Stat

It's just cycles / task-clock.

So it can be lower than you'd expect with perf stat --all-user or cycles:u that means cycles only counts in user-space (not interrupts or system calls), but task-clock is from the kernel's software accounting of how long the thread(s) of this process were scheduled onto CPU core(s) for.

This means it's a weighted average of the actual CPU core clock on the core(s) your thread(s) were running on.

(Related: what's the meaning of cycles annotation in perf stat but that answer suggests it should be close to your CPU's rated frequency or max turbo, which is incorrect for short processes which don't get the core ramped up to max. Or when only counting user-space cycles.)



Related Topics



Leave a reply



Submit