Collecting the Data for a Partiulcar Process from Pmu for Every 1 Milli Second

perf_event_open - how to monitoring multiple events

That's a bit tricky.

We create first counter as usual. Additionally, we pass PERF_FORMAT_GROUP and PERF_FORMAT_ID to be able to work with multiple counters simultaneously. This counter will be our group leader.

struct perf_event_attr pea;
int fd1, fd2;
uint64_t id1, id2;

memset(&pea, 0, sizeof(struct perf_event_attr));
pea.type = PERF_TYPE_HARDWARE;
pea.size = sizeof(struct perf_event_attr);
pea.config = PERF_COUNT_HW_CPU_CYCLES;
pea.disabled = 1;
pea.exclude_kernel = 1;
pea.exclude_hv = 1;
pea.read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID;
fd1 = syscall(__NR_perf_event_open, &pea, 0, -1, -1, 0);

Next, we retrieve identifier for the first counter:

ioctl(fd1, PERF_EVENT_IOC_ID, &id1);

Second (and all further counters) are created in the same fashion with only one exception: we pass fd1 value as group leader argument:

memset(&pea, 0, sizeof(struct perf_event_attr));
pea.type = PERF_TYPE_SOFTWARE;
pea.size = sizeof(struct perf_event_attr);
pea.config = PERF_COUNT_SW_PAGE_FAULTS;
pea.disabled = 1;
pea.exclude_kernel = 1;
pea.exclude_hv = 1;
pea.read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID;
fd2 = syscall(__NR_perf_event_open, &pea, 0, -1, fd1, 0); // <-- here
ioctl(fd2, PERF_EVENT_IOC_ID, &id2);

Next we need to declare a data structure to read multiple counters at once. You have to declare different set of fields depending on what flags you pass to perf_event_open. Manual page mentions all possible fields. In our case, we passed PERF_FORMAT_ID flag which adds id field. This will allow us to distinguish between different counters.

struct read_format {
uint64_t nr;
struct {
uint64_t value;
uint64_t id;
} values[/*2*/];
};

Now we call standard profiling ioctls:

ioctl(fd1, PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP);
ioctl(fd1, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
do_something();
ioctl(fd1, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);

Finally, we read the counters from group leader file descriptor. Both counters are returned in single read_format structure that we declared:

char buf[4096];
struct read_format* rf = (struct read_format*) buf;
uint64_t val1, val2;

read(fd1, buf, sizeof(buf));
for (i = 0; i < rf->nr; i++) {
if (rf->values[i].id == id1) {
val1 = rf->values[i].value;
} else if (rf->values[i].id == id2) {
val2 = rf->values[i].value;
}
}
printf("cpu cycles: %"PRIu64"\n", val1);
printf("page faults: %"PRIu64"\n", val2);

Below is the full program listing:

#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/perf_event.h>
#include <linux/hw_breakpoint.h>
#include <asm/unistd.h>
#include <errno.h>
#include <stdint.h>
#include <inttypes.h>

struct read_format {
uint64_t nr;
struct {
uint64_t value;
uint64_t id;
} values[];
};

void do_something() {
int i;
char* ptr;

ptr = malloc(100*1024*1024);
for (i = 0; i < 100*1024*1024; i++) {
ptr[i] = (char) (i & 0xff); // pagefault
}
free(ptr);
}

int main(int argc, char* argv[]) {
struct perf_event_attr pea;
int fd1, fd2;
uint64_t id1, id2;
uint64_t val1, val2;
char buf[4096];
struct read_format* rf = (struct read_format*) buf;
int i;

memset(&pea, 0, sizeof(struct perf_event_attr));
pea.type = PERF_TYPE_HARDWARE;
pea.size = sizeof(struct perf_event_attr);
pea.config = PERF_COUNT_HW_CPU_CYCLES;
pea.disabled = 1;
pea.exclude_kernel = 1;
pea.exclude_hv = 1;
pea.read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID;
fd1 = syscall(__NR_perf_event_open, &pea, 0, -1, -1, 0);
ioctl(fd1, PERF_EVENT_IOC_ID, &id1);

memset(&pea, 0, sizeof(struct perf_event_attr));
pea.type = PERF_TYPE_SOFTWARE;
pea.size = sizeof(struct perf_event_attr);
pea.config = PERF_COUNT_SW_PAGE_FAULTS;
pea.disabled = 1;
pea.exclude_kernel = 1;
pea.exclude_hv = 1;
pea.read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID;
fd2 = syscall(__NR_perf_event_open, &pea, 0, -1, fd1 /*!!!*/, 0);
ioctl(fd2, PERF_EVENT_IOC_ID, &id2);

ioctl(fd1, PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP);
ioctl(fd1, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
do_something();
ioctl(fd1, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);

read(fd1, buf, sizeof(buf));
for (i = 0; i < rf->nr; i++) {
if (rf->values[i].id == id1) {
val1 = rf->values[i].value;
} else if (rf->values[i].id == id2) {
val2 = rf->values[i].value;
}
}

printf("cpu cycles: %"PRIu64"\n", val1);
printf("page faults: %"PRIu64"\n", val2);

return 0;
}

perf.data to text or csv

You should use perf record -e <event-name> ... to sample events every 1ms. It seems you are trying to read the perf.data file and organize it into human-readable data. You should use perf report if you are not aware of it. The perf report command reads the perf.data file and generates a concise execution profile. The below link should help you -

Sample analysis with perf report

You can modify the perf report output to your requirements. You can also use perf report -F to specify multiple columns in csv format.

However, in addition, perf stat does have a mechanism to collect information in a csv format using the perf stat -x command.

Edit #1:

(I am using Linux-Kernel 4.14.3 for evaluation.)

Since you want the number of events per sample taken, there are couple of things to be noted. To count the number of events per sample, you will need to know the sampling period. The sampling period gives you the number of events after which the performance counter will overflow and the kernel will record a sample. So essentially, in your case,

sampling period = number of events per sample

Now there are two ways of specifying this sampling period. Either you specify it or you do not specify it.

If while doing a perf record, you specify the sampling period.. something like this :-

perf record -e <some_event_name> -c 1000 ...

Here -c 1000 means that the sampling period is 1000. In this case, you purposefully force the system to record 1000 events per sample because the sampling period is fixed by you.

On the other hand, if you do not specify the sampling period, the system will try to record events at a default frequency of 1000 samples/sec. This means that the system will automatically change the sampling period, if need be, to maintain the frequency of 1000 samples/sec. In such a case, to determine the sampling period, you need to observe the perf.data file.

Specifically, you need to open the perf.data file using the command :

perf script -D

The output will very well look like this :-

0 0 0x108 [0x38]: PERF_RECORD_FORK(1:1):(0:0)

0x140 [0x30]: event: 3
.
. ... raw event: size 48 bytes
. 0000: 03 00 00 00 00 00 30 00 01 00 00 00 01 00 00 00 ......0.........
. 0010: 73 79 73 74 65 6d 64 00 00 00 00 00 00 00 00 00 systemd.........
. 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

0 0 0x140 [0x30]: PERF_RECORD_COMM: systemd:1/1

0x170 [0x38]: event: 7
.
. ... raw event: size 56 bytes
. 0000: 07 00 00 00 00 00 38 00 02 00 00 00 00 00 00 00 ......8.........
. 0010: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
. 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
. 0030: 00 00 00 00 00 00 00 00 ........

You can see different types of records like PERF_RECORD_FORK and PERF_RECORD_COMM and even PERF_RECORD_MMAP. You need to specifically look out for records that begin with PERF_RECORD_SAMPLE inside the file. Like this:

14 173826354106096 0x10d40 [0x28]: PERF_RECORD_SAMPLE(IP, 0x1): 28179/28179: 0xffffffffa500d3b5 period: 3000 addr: 0
... thread: perf:28179
...... dso: [kernel.kallsyms]
perf 28179 [014] 173826.354106: cache-misses: ffffffffa500d3b5 [unknown] ([kernel.kallsyms])

As you can see, in this case the period is 3000 i.e. number of events collected between the previous sampling event and this sampling event is 3000. (i.e. number of events per sample is 3000) Note that, as I mentioned above this period might be tuned. So you need to collect all of the PERF_RECORD_SAMPLE records from the perf.data file.



Related Topics



Leave a reply



Submit