Can Perf-Stat Results Be Generated from a Perf.Data File

perf.data to text or csv

You should use perf record -e <event-name> ... to sample events every 1ms. It seems you are trying to read the perf.data file and organize it into human-readable data. You should use perf report if you are not aware of it. The perf report command reads the perf.data file and generates a concise execution profile. The below link should help you -

Sample analysis with perf report

You can modify the perf report output to your requirements. You can also use perf report -F to specify multiple columns in csv format.

However, in addition, perf stat does have a mechanism to collect information in a csv format using the perf stat -x command.

Edit #1:

(I am using Linux-Kernel 4.14.3 for evaluation.)

Since you want the number of events per sample taken, there are couple of things to be noted. To count the number of events per sample, you will need to know the sampling period. The sampling period gives you the number of events after which the performance counter will overflow and the kernel will record a sample. So essentially, in your case,

sampling period = number of events per sample

Now there are two ways of specifying this sampling period. Either you specify it or you do not specify it.

If while doing a perf record, you specify the sampling period.. something like this :-

perf record -e <some_event_name> -c 1000 ...

Here -c 1000 means that the sampling period is 1000. In this case, you purposefully force the system to record 1000 events per sample because the sampling period is fixed by you.

On the other hand, if you do not specify the sampling period, the system will try to record events at a default frequency of 1000 samples/sec. This means that the system will automatically change the sampling period, if need be, to maintain the frequency of 1000 samples/sec. In such a case, to determine the sampling period, you need to observe the perf.data file.

Specifically, you need to open the perf.data file using the command :

perf script -D

The output will very well look like this :-

0 0 0x108 [0x38]: PERF_RECORD_FORK(1:1):(0:0)

0x140 [0x30]: event: 3
.
. ... raw event: size 48 bytes
.  0000:  03 00 00 00 00 00 30 00 01 00 00 00 01 00 00 00  ......0.........
.  0010:  73 79 73 74 65 6d 64 00 00 00 00 00 00 00 00 00  systemd.........
.  0020:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

0 0 0x140 [0x30]: PERF_RECORD_COMM: systemd:1/1

0x170 [0x38]: event: 7
.
. ... raw event: size 56 bytes
.  0000:  07 00 00 00 00 00 38 00 02 00 00 00 00 00 00 00  ......8.........
.  0010:  02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
.  0020:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
.  0030:  00 00 00 00 00 00 00 00                          ........

You can see different types of records like PERF_RECORD_FORK and PERF_RECORD_COMM and even PERF_RECORD_MMAP. You need to specifically look out for records that begin with PERF_RECORD_SAMPLE inside the file. Like this:

14 173826354106096 0x10d40 [0x28]: PERF_RECORD_SAMPLE(IP, 0x1): 28179/28179: 0xffffffffa500d3b5 period: 3000 addr: 0
 ... thread: perf:28179
 ...... dso: [kernel.kallsyms]
            perf 28179 [014] 173826.354106: cache-misses:  ffffffffa500d3b5 [unknown] ([kernel.kallsyms])

As you can see, in this case the period is 3000 i.e. number of events collected between the previous sampling event and this sampling event is 3000. (i.e. number of events per sample is 3000) Note that, as I mentioned above this period might be tuned. So you need to collect all of the PERF_RECORD_SAMPLE records from the perf.data file.

Perf Stat vs Perf Record

First of all, your test case of using sleep and page-faults is not an ideal test case. There should be no page fault events during the sleep duration, you you can't really expect anything interesting. For the sake of easier reasoning I suggest to use the ref-cycles (hardware) event and a busy workload such as awk 'BEGIN { while(1){} }'.

Question 1: It is my understanding that perf stat gets a "summary" of
counts but when used with the -I option gets the counts at the
specified millisecond interval. With this option does it sum up the
counts over the interval or get the average over the interval, or
something else entirely? I assume it is summed up.

Yes. The values are just summed up. You can confirm that by testing:

$ perf stat -e ref-cycles -I 1000 timeout 10s awk 'BEGIN { while(1){} }'
#           time             counts unit events
 1.000105072      2,563,666,664      ref-cycles                                                  
 2.000267991      2,577,462,550      ref-cycles                                                  
 3.000415395      2,577,211,936      ref-cycles                                                  
 4.000543311      2,577,240,458      ref-cycles                                                  
 5.000702131      2,577,525,002      ref-cycles                                                  
 6.000857663      2,577,156,088      ref-cycles                                                  

[ ... snip ... ]
[ Note that it may not be as nicely consistent on all systems due dynamic frequency scaling ]

$ perf stat -e ref-cycles -I 3000 timeout 10s awk 'BEGIN { while(1){} }' 
#           time             counts unit events
 3.000107921      7,736,108,718      ref-cycles                                                  
 6.000265186      7,732,065,900      ref-cycles                                                  
 9.000372029      7,728,302,192      ref-cycles

Question 2: Why doesn't perf stat -e <event1> -I 1000 sleep 5 give
about the same counts as if I summed up the counts over each second
for the following command perf record -e <event1> -F 1000 sleep 5?

perf stat -I is in milliseconds, whereas perf record -F is in HZ (1/s), so the corresponding command to perf stat -I 1000 is perf record -F 1. In fact with our more stable event/workload, this looks better:

$ perf stat -e ref-cycles -I 1000 timeout 10s awk 'BEGIN { while(1){} }'
#           time             counts unit events
 1.000089518      2,578,694,534      ref-cycles                                                  
 2.000203872      2,579,866,250      ref-cycles                                                  
 3.000294300      2,579,857,852      ref-cycles                                                  
 4.000390273      2,579,964,842      ref-cycles                                                  
 5.000488375      2,577,955,536      ref-cycles                                                  
 6.000587028      2,577,176,316      ref-cycles                                                  
 7.000688250      2,577,334,786      ref-cycles                                                  
 8.000785388      2,577,581,500      ref-cycles                                                  
 9.000876466      2,577,511,326      ref-cycles                                                  
10.000977965      2,577,344,692      ref-cycles                                                  
10.001195845            466,674      ref-cycles    

$ perf record -e ref-cycles -F 1 timeout 10s awk 'BEGIN { while(1){} }'
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.008 MB perf.data (17 samples) ]

$ perf script -F time,period        
3369070.273722:          1 
3369070.273755:          1 
3369070.273911:       3757 
3369070.273916:    3015133 
3369070.274486:          1 
3369070.274556:          1 
3369070.274657:       1778 
3369070.274662:    2196921 
3369070.275523: 47192985748 
3369072.663696: 2578692405 
3369073.663547: 2579122382 
3369074.663609: 2580015300 
3369075.664085: 2579873741 
3369076.664433: 2578638211 
3369077.664379: 2578378119 
3369078.664175: 2578166440 
3369079.663896: 2579238122

So you see, eventually the results are stable also for perf record -F. Unfortunately the documentation of perf record is very poor. You can learn what the settings -c and -F mean by looking at the documentation of the underlying system call man perf_event_open:

sample_period, sample_freq A "sampling" event is one that
generates an overflow notification every N events, where N is given by
sample_period. A sampling event has sample_period > 0. When
an overflow occurs, requested data is recorded in the mmap buffer.
The sample_type field controls what data is recorded on each
overflow.

sample_freq can be used if you wish to use frequency rather than
period. In this case, you set the freq flag. The kernel will
adjust the sampling period to try and achieve the desired rate. The
rate of adjustment is a timer tick.

So while perf stat uses an internal timer to read the value of the counter every -i milliseconds, perf record sets an event overflow counter to take a sample every -c events. That means it takes a sample every N events (e.g. every N page-fault or cycles). With -F, it it tries to regulate this overflow value to achieve the desired frequency. It tries different values and tunes it up/down accordingly. This eventually works for counters with a stable rate, but will get erratic results for dynamic events.

Understanding the Frequency printed by Linux Perf Stat

It's just cycles / task-clock.

So it can be lower than you'd expect with perf stat --all-user or cycles:u that means cycles only counts in user-space (not interrupts or system calls), but task-clock is from the kernel's software accounting of how long the thread(s) of this process were scheduled onto CPU core(s) for.

This means it's a weighted average of the actual CPU core clock on the core(s) your thread(s) were running on.

(Related: what's the meaning of cycles annotation in perf stat but that answer suggests it should be close to your CPU's rated frequency or max turbo, which is incorrect for short processes which don't get the core ramped up to max. Or when only counting user-space cycles.)

Can Perf-Stat Results Be Generated from a Perf.Data File

perf.data to text or csv

Perf Stat vs Perf Record

Understanding the Frequency printed by Linux Perf Stat

Related Topics

Leave a reply