Using perf stat to profile both process and system-wide events simultaneously
The system-wide option -a is an attribute of the perf command itself rather than an attribute of an event to be profiled, so you can't collect event counts per-process and system-wide using a single perf command.
However, you can run multiple perf processes simultaneously with overlapping events. For example, if you want to profile a specific process and the entire system, you can launch two perf stat commands, one for each purpose. Internally, the perf_event subsystem schedules all the enabled events on each core. If the specified events cannot all be scheduled at the same time, either because the constraints of some events conflict or because the number of hardware events exceeds the number of performance monitoring counter (PMC) registers, multiplexing is used. Each event from each perf stat command requires a dedicated PMC.
On modern Intel processors, you can have 8 general-purpose PMCs per logical core instead of just 4 if you disable hyperthreading (or if HT is not supported by the processor model). This may help eliminate or reduce multiplexing.
Otherwise, I don't think there is an easy way to go about this.
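As a rough illustration of the counter arithmetic described above, the following sketch checks whether the combined events of the two perf stat commands fit into the general-purpose PMCs of one logical core. The function name and its parameters are hypothetical, and the model is deliberately simplified: it assumes every event needs exactly one general-purpose PMC and ignores fixed-function counters and per-event scheduling constraints, which the kernel also takes into account.

```c
/*
 * Simplified model (an assumption, not the kernel's scheduler):
 * every event from either perf stat command occupies one PMC.
 * Returns 1 if multiplexing would be needed, 0 otherwise.
 */
int needs_multiplexing(unsigned per_process_events,
                       unsigned system_wide_events,
                       unsigned pmcs_per_core)
{
    return per_process_events + system_wide_events > pmcs_per_core;
}
```

For example, three events in each command would not fit into 4 PMCs (the function returns 1), but would fit into the 8 PMCs available with hyperthreading disabled (it returns 0).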
What does perf stat output mean?
[root@root test]# perf stat -a -e "r81d0","r82d0" -v ./a
r81d0: 71800964 1269047979 1269006431
r82d0: 26655201 1284214869 1284214869
This is the output of the verbose option (-v), as implemented in the kernel's tools/perf/builtin-stat.c:
    /*
     * Read out the results of a single counter:
     * aggregate counts across CPUs in system-wide mode
     */
    static int read_counter_aggr(struct perf_evsel *counter)
    ...
        if (verbose) {
            fprintf(output, "%s: %" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
                    perf_evsel__name(counter), count[0], count[1], count[2]);
        }
count comes from struct perf_counts_values, defined at http://lxr.free-electrons.com/source/tools/perf/util/evsel.h?v=3.18#L12 as an array of three uint64_t values named val, ena, and run.
Three count
values are filled by kernel and are read from fd, opened with perf_event_open()
syscall. There is related part of man perf_event_open
: http://man7.org/linux/man-pages/man2/perf_event_open.2.html
read_format
This field specifies the format of the data returned by
read(2) on a perf_event_open() file descriptor.
PERF_FORMAT_TOTAL_TIME_ENABLED
Adds the 64-bit time_enabled field. This can be used
to calculate estimated totals if the PMU is
overcommitted and multiplexing is happening.
PERF_FORMAT_TOTAL_TIME_RUNNING
Adds the 64-bit time_running field. This can be used
to calculate estimated totals if the PMU is
overcommitted and multiplexing is happening. ...
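With both flags set (and without PERF_FORMAT_GROUP or PERF_FORMAT_ID), a read(2) on the descriptor returns three consecutive 64-bit values: the count, time_enabled, and time_running. A minimal sketch of reading them follows. The struct and function names are made up for illustration, and fd is assumed to be a descriptor that was already opened with perf_event_open() using those two flags.

```c
#include <stdint.h>
#include <unistd.h>

/* Field names follow perf's val/ena/run convention (illustrative struct). */
struct counts {
    uint64_t val;  /* raw event count */
    uint64_t ena;  /* time_enabled */
    uint64_t run;  /* time_running */
};

/*
 * Read one non-group counter.
 * Assumes fd was opened with PERF_FORMAT_TOTAL_TIME_ENABLED |
 * PERF_FORMAT_TOTAL_TIME_RUNNING and no other read_format bits.
 * Returns 0 on success, -1 on error or short read.
 */
int read_counts(int fd, struct counts *out)
{
    ssize_t n = read(fd, out, sizeof(*out));
    return n == (ssize_t)sizeof(*out) ? 0 : -1;
}
```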
perf stat enables both TIME flags if scale is true:
    if (scale)
        attr->read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
                            PERF_FORMAT_TOTAL_TIME_RUNNING;
So the first value (val) is the raw event count, the second (ena) is the time during which the event was enabled, and the third (run) is the time during which the event was actually being counted on a hardware counter. This matters when you ask perf to stat a large number of events that cannot all be monitored at once (hardware usually has only about 5-7 performance counters). In that case the in-kernel perf code runs subsets of the requested events for parts of the execution and rotates the subsets several times. With the ena and run values, perf can estimate how inaccurate the event count is when multiplexing occurred.
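The usual way to turn a multiplexed reading into an estimate for the whole run is to scale the raw count by ena/run. The helper below is a sketch of that arithmetic under that assumption, not a copy of perf's code; the name scale_count is hypothetical.

```c
#include <stdint.h>

/*
 * Estimate the count the event would have reached had it been
 * scheduled for the entire time it was enabled.
 */
uint64_t scale_count(uint64_t val, uint64_t ena, uint64_t run)
{
    if (run == 0)
        return 0;      /* the event was never scheduled: no estimate */
    if (run >= ena)
        return val;    /* counted the whole time: no scaling needed */
    /* Round to the nearest integer. */
    return (uint64_t)((double)val * (double)ena / (double)run + 0.5);
}
```

For r81d0 in the verbose output above, ena and run differ by roughly 0.003%, so the scaled estimate is almost identical to the raw count.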
Performance counter stats for './a':
71,800,964 r81d0 [100.00%]
26,655,201 r82d0
In your case the two events were scheduled at the same time without any need for multiplexing, so your ena and run values are close. The print_aggr function sums the values across CPUs and prints their ratio:
    val += counter->counts->cpu[cpu].val;
    ena += counter->counts->cpu[cpu].ena;
    run += counter->counts->cpu[cpu].run;
print_noise produces output only when the -r N option is used to rerun the task N times and gather statistics (man: "--repeat=<n>: repeat command and print average + stddev (max: 100)"):
    print_noise(counter, 1.0);
And this is the code that prints [100.00%]:
    if (run != ena)
        fprintf(output, " (%.2f%%)",
                100.0 * run / ena);
The percentage is not printed when the run and ena times are equal, which is the case for your r82d0 event. Your r81d0 event has slightly different run and ena values, so [100.00%] is printed on that one line only.
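The check and the formatting can be reproduced in a few lines. This sketch uses a hypothetical helper, format_run_pct, that mirrors the quoted condition and the %.2f format, and leaves the buffer empty when nothing would be printed.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format 100 * run / ena with two decimals into buf.
 * Writes an empty string and returns 0 when run == ena (or ena == 0),
 * mirroring the "if (run != ena)" check in the quoted snippet.
 * Otherwise returns the value returned by snprintf.
 */
int format_run_pct(char *buf, size_t len, uint64_t ena, uint64_t run)
{
    if (len == 0)
        return 0;
    buf[0] = '\0';
    if (run == ena || ena == 0)
        return 0;
    return snprintf(buf, len, "%.2f", 100.0 * (double)run / (double)ena);
}
```

With the verbose values for r81d0 (ena = 1269047979, run = 1269006431) the ratio is about 99.997%, which %.2f rounds to 100.00. With ena = 100 and run = 53 the result is 53.00.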
I know that perf stat -d can be inaccurate because it requests too many events; multiplexing then occurs, and the reported percentage is not 100% but something like 53%. That means "this event was counted during only 53% of the program's runtime, in some arbitrary parts of it"; if your program has several distinct computing stages, events with a low run/ena ratio will be less accurate.
Using perf probe to monitor performance stats during a particular function
I think the instructions you are following have not yet been merged into the mainline Linux kernel. As a consequence, perf tells you that the events are not supported: it doesn't know the "toggle" mechanism mentioned on this page.
I can see two workarounds:
- If you have access to the source code you want to profile, you can call the perf_event_open system call directly from that code to start and stop counting on function entry and exit.
- Clone jolsa's repository:
git clone https://kernel.googlesource.com/pub/scm/linux/kernel/git/jolsa/perf
switch to the core_toggle branch:
git checkout remotes/origin/perf/core_toggle
and then compile and run the kernel with this support.
Regarding the second workaround, I am not at all familiar with kernel versions and development, and I think this solution may be quite complex to use and maintain. Maybe you should ask on the perf users mailing list whether there are any plans to integrate the toggle feature into the mainline kernel.
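For the first workaround, a minimal sketch might look like the following. It counts user-space instructions retired while a function runs in the calling thread. The helper names (perf_open, count_instructions_during, sample_work) are hypothetical, and the choice of PERF_COUNT_HW_INSTRUCTIONS is only an assumption for illustration; a raw event would use PERF_TYPE_RAW with the event code in config instead. Opening the counter can fail, for example in a virtual machine without PMU access or with a restrictive perf_event_paranoid setting, so the sketch returns -1 in that case.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open, so go through syscall(2). */
static int perf_open(struct perf_event_attr *attr, pid_t pid, int cpu)
{
    return (int)syscall(__NR_perf_event_open, attr, pid, cpu, -1, 0UL);
}

/*
 * Count user-space instructions retired while fn() runs.
 * Returns the count, or -1 if the counter could not be opened or read.
 */
long long count_instructions_during(void (*fn)(void))
{
    struct perf_event_attr attr;
    uint64_t count = 0;
    int fd;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;  /* illustrative event choice */
    attr.disabled = 1;        /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;

    fd = perf_open(&attr, 0, -1);  /* pid 0, cpu -1: this thread, any CPU */
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);   /* function entry */
    fn();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);  /* function exit */

    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        close(fd);
        return -1;
    }
    close(fd);
    return (long long)count;
}

/* Hypothetical workload standing in for the function of interest. */
void sample_work(void)
{
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 1000000; i++)
        sum += i;
}
```

Calling count_instructions_during(sample_work) then yields either an instruction count for that one call or -1 when the counter is unavailable.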