Use perf inside a Docker container without --privileged

After some research, the problem turned out not to be the perf_event_paranoid setting but the fact that the perf_event_open syscall is blocked by Docker's default seccomp profile:
https://docs.docker.com/engine/security/seccomp/ "Docker v17.06: Seccomp security profiles for Docker"

Significant syscalls blocked by the default profile

perf_event_open: Tracing/profiling syscall, which could leak a lot of information on the host.

My first workaround is a script that downloads the official seccomp profile (https://github.com/moby/moby/blob/master/profiles/seccomp/default.json) and adds perf_event_open to the list of allowed syscalls.

I then start the container with --security-opt seccomp=my-seccomp.json
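
The script described above could look roughly like the following sketch. It assumes curl and jq are installed. The raw.githubusercontent.com URL is an assumed raw-file counterpart of the GitHub link above, and the function name allow_perf_event_open and the placeholder image name in the usage comments are made up for illustration. The function appends a new allow rule and leaves the existing entries untouched.

```shell
#!/bin/sh
# allow_perf_event_open IN OUT
# Copy the seccomp profile IN to OUT with one extra rule appended that
# allows perf_event_open. (Hypothetical helper name; requires jq.)
allow_perf_event_open() {
    jq '.syscalls += [{"names": ["perf_event_open"], "action": "SCMP_ACT_ALLOW"}]' \
        "$1" > "$2"
}

# Possible usage (assumed raw URL, placeholder image name):
#   curl -fsSL https://raw.githubusercontent.com/moby/moby/master/profiles/seccomp/default.json -o default.json
#   allow_perf_event_open default.json my-seccomp.json
#   docker run --security-opt seccomp=my-seccomp.json my-image perf stat ls
```

Generating the file at build time rather than committing a patched copy means upstream changes to the default profile are picked up on the next download.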

Can I run Docker-in-Docker without using the --privileged flag?

Unfortunately, no. You must use the --privileged flag to run Docker in Docker; the official announcement states that this is one of the many purposes of the --privileged flag.

Basically, running Docker requires more access to the host system's devices than a container gets without --privileged.

Docker Alpine and perf not getting along in docker container

The problem is that Docker by default blocks a list of system calls, including perf_event_open, which perf relies heavily on.

Official docker reference: https://docs.docker.com/engine/security/seccomp/

Solution:

  • Download Docker's default seccomp (secure computing mode) profile. It's a JSON file.
  • Find "perf_event_open" (it appears only once) and delete it.
  • Add a new entry in syscalls section:

    { "names": [ "perf_event_open" ], "action": "SCMP_ACT_ALLOW" },

  • Add the following to your command to run the container:
    --security-opt seccomp=path/to/default.json

That did it for me.

How to use perf tool with docker running stress-ng?

Carrying on from comments by @osgx,

By default, the perf stat command monitors not only all the threads of the target process but also its child processes and their threads.

The problem here is that by running perf stat on the docker run stress-ng command, you are not monitoring the actual stress-ng process. The processes running inside the container are not started by the docker client but by the docker-containerd-shim process (which is a grandchild of the dockerd process).

This becomes evident if you run stress-ng inside a container and look at the process tree.

docker run -ti --name=stress-ng --rm polinux/stress-ng --cpu 2 --timeout 100

ps -elf | grep docker

0 S ubuntu 26379 114001 0 80 0 - 119787 futex_ 12:33 pts/3 00:00:00 docker run -ti --name=stress-ng --rm polinux/stress-ng --cpu 2 --timeout 10000
4 S root 26431 118477 0 80 0 - 2227 - 12:33 ? 00:00:00 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/72a8c2787390669ff4eeae6f343ab4f9f60434f39aae66b1a778e78b7e5e45d8 -address /var/run/docker/containerd/docker-containerd.sock -containerd-binary /usr/bin/docker-containerd -runtime-root /var/run/docker/runtime-runc
0 S ubuntu 26610 26592 0 80 0 - 3236 pipe_w 12:34 pts/6 00:00:00 grep --color=auto docker
4 S root 118453 1 3 80 0 - 283916 - May02 ? 01:01:57 /usr/bin/dockerd -H fd://
4 S root 118477 118453 4 80 0 - 457853 - May02 ? 01:14:36 docker-containerd --config /var/run/docker/containerd/containerd.toml

----------------------------------------------------------------------

ps -elf | grep stress-ng

0 S ubuntu 26379 114001 0 80 0 - 119787 futex_ 12:33 pts/3 00:00:00 docker run -ti --name=stress-ng --rm polinux/stress-ng --cpu 2 --timeout 10000
4 S root 26455 26431 0 80 0 - 16621 - 12:33 pts/0 00:00:00 /usr/bin/stress-ng --cpu 2 --timeout 10000
1 R root 26517 26455 99 80 0 - 16781 - 12:33 pts/0 00:01:08 /usr/bin/stress-ng --cpu 2 --timeout 10000
1 R root 26518 26455 99 80 0 - 16781 - 12:33 pts/0 00:01:08 /usr/bin/stress-ng --cpu 2 --timeout 10000
0 S ubuntu 26645 26592 0 80 0 - 3236 pipe_w 12:35 pts/6 00:00:00 grep --color=auto stress-ng

The PPID of the first stress-ng process is 26431, which is not the docker run command, but actually the docker-containerd-shim process. Monitoring the docker run command will never reflect correct values, because the docker client is completely detached from the process of starting the stress-ng commands.

  • One way to get around this problem would be to attach the perf stat command to the PIDs of the stress-ng processes that are started by the docker runtime.

For example, in the case above, once the docker run command has started, you can immediately run:

perf stat -p 26455,26517,26518

Performance counter stats for process id '26455,26517,26518':

148171.516145 task-clock (msec) # 1.939 CPUs utilized
49 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
67 page-faults # 0.000 K/sec

You may want to increase the --timeout a little so that the command runs longer, since you are now starting perf stat after stress-ng. You also have to account for the small part of the run that is not measured at the beginning.

  • The other way would be to run perf stat inside the docker container, something like docker run perf stat ..., but for that you would have to grant extra privileges to your container, since the perf_event_open system call is blocked by default in docker. You can read this answer here.
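
The first approach above (attaching perf stat to the container's processes from the host) can be scripted instead of copying PIDs by hand. The sketch below is one way to do it, not the original author's method: descendants_csv and perf_stat_container are hypothetical helper names, and it assumes the container was started with --name=stress-ng as in the example and that you run it on the host with enough rights to use perf.

```shell
#!/bin/sh
# descendants_csv ROOT
# Reads "PID PPID" pairs on stdin (the output format of `ps -e -o pid=,ppid=`)
# and prints ROOT plus all of its descendants as a comma-separated list.
descendants_csv() {
    awk -v root="$1" '
        { parent[$1] = $2; order[NR] = $1 }
        END {
            keep[root] = 1
            # Repeat until a pass finds nothing new, so input order does not matter.
            do {
                changed = 0
                for (i = 1; i <= NR; i++) {
                    pid = order[i]
                    if (!(pid in keep) && (parent[pid] in keep)) {
                        keep[pid] = 1
                        changed = 1
                    }
                }
            } while (changed)
            out = root
            for (i = 1; i <= NR; i++)
                if (order[i] != root && (order[i] in keep))
                    out = out "," order[i]
            print out
        }'
}

# perf_stat_container NAME
# Attach perf stat to the main process of container NAME and its children.
perf_stat_container() {
    main_pid=$(docker inspect --format '{{.State.Pid}}' "$1") || return 1
    perf stat -p "$(ps -e -o pid=,ppid= | descendants_csv "$main_pid")"
}

# Possible usage, once the container is running:
#   perf_stat_container stress-ng
```

With the process listing shown earlier, the main container process would be 26455, so the helper would build the same 26455,26517,26518 list that was typed manually.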

Slow performance using /dev/random in docker desktop WSL2

Before applying any of these solutions, check whether a lack of entropy is really your problem. To do that, run these commands (on your docker host and in your container):

cat /proc/sys/kernel/random/entropy_avail

It should return a number greater than 1000.

dd if=/dev/random of=/dev/null bs=1024 count=1 iflag=fullblock

It should return quickly! (Sources: haveged and rng-tools)
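
The first check can be wrapped in a small helper so it is easy to run on both the host and the container. This is only a sketch: check_entropy is a hypothetical name, the 1000 default is the rule of thumb from the text above, and the optional file argument exists only so the function can be tried against a sample file.

```shell
#!/bin/sh
# check_entropy [FILE] [THRESHOLD]
# Report whether the available entropy is above THRESHOLD (default 1000,
# the rule of thumb quoted above). FILE defaults to the kernel's counter.
check_entropy() {
    file=${1:-/proc/sys/kernel/random/entropy_avail}
    threshold=${2:-1000}
    avail=$(cat "$file") || return 2
    if [ "$avail" -gt "$threshold" ]; then
        echo "ok: entropy_avail=$avail"
    else
        echo "low: entropy_avail=$avail"
        return 1
    fi
}

# Possible usage on the host or inside the container:
#   check_entropy
```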

Solutions:

For Windows users (those of you who run Docker Desktop for Windows):

  1. Keep using the WSL1 engine with Docker Desktop.
  2. If the previous solution is not possible, execute this:

docker pull harbur/haveged

docker run --privileged -d harbur/haveged

Explanation: This runs a docker container that executes the haveged daemon as its CMD. That process, together with the --privileged flag, feeds your host's /dev/random with entropy and avoids the blocking issue.

For Linux users (those running Linux as docker host):

  1. Mount your host's /dev/urandom as a volume at your container's /dev/random. This tricks your container: whenever it reads /dev/random, it is actually reading your host's /dev/urandom, which never blocks by design. People may argue that this is insecure, but that is outside the scope of this question.

  2. Install software on your docker host that replenishes the entropy pool, such as haveged, or rng-tools if you have a hardware TRNG.
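
The volume mapping from option 1 could be expressed as a small wrapper around docker run. This is a sketch under assumptions: run_with_urandom is a hypothetical name, the DOCKER variable is an illustrative override (not a Docker feature) that lets you swap in another client or do a dry run, and the alpine image in the usage comment is just an example.

```shell
#!/bin/sh
# run_with_urandom IMAGE [COMMAND...]
# Start a throwaway container whose /dev/random is the host's /dev/urandom.
# DOCKER is a hypothetical override for the client binary (default: docker).
run_with_urandom() {
    "${DOCKER:-docker}" run --rm -v /dev/urandom:/dev/random "$@"
}

# Possible usage (example image), repeating the read test from above:
#   run_with_urandom alpine dd if=/dev/random of=/dev/null bs=1024 count=1
```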

Final thoughts and conclusions:

  1. /dev/random and /dev/urandom in a docker container point to /dev/random and /dev/urandom of the docker host. I don't have any documentation that backs this up, except these: Missing Entropy and How docker handles /dev/(u)random request, plus the experimental fact that if I access the WSL2 docker-desktop distro (using wsl -d docker-desktop) and run the dd command described previously, I can see the entropy drop both in the host and in the container (and vice versa). This is why solutions such as deploying the haveged container or installing haveged on the docker host work.

  2. According to the haveged link, that software is deprecated because its logic is now included in Linux kernels from v5.6 onward. This could mean that if your docker host runs Linux kernel 5.6 or later, you won't need to do any of this, because /dev/random will never block.

  3. I tried to install haveged in the WSL2 docker distro (docker-desktop), but that distro does not allow you to run apt-get.
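
To see whether point 2 applies to your docker host, you can compare the running kernel release against 5.6. The helper below is a minimal sketch: kernel_at_least is a hypothetical name, it only compares the major and minor numbers, and the optional second argument exists so a release string can be tested without running uname.

```shell
#!/bin/sh
# kernel_at_least MAJOR.MINOR [RELEASE]
# Succeeds if RELEASE (default: `uname -r`) is at least MAJOR.MINOR.
# Only the first two version components are compared.
kernel_at_least() {
    want_major=${1%%.*}
    want_minor=${1#*.}
    release=${2:-$(uname -r)}
    have_major=${release%%.*}
    rest=${release#*.}
    have_minor=${rest%%[!0-9]*}   # drop everything after the minor number
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# Possible usage on the docker host (or inside wsl -d docker-desktop):
#   if kernel_at_least 5.6; then echo "kernel is 5.6 or newer"; fi
```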


