Run Command in Dockerfile Produces Different Result Than Manually Running Same Commands Inside Container

First of all, a little bit of background: the platform detection script which runs during the build uses the uname(1) utility (and thus the uname(2) system call) to identify the hardware it runs on:

root@6e4b69adfd4c:/gcc-4.8.5# grep 'uname -m' config.guess 
UNAME_MACHINE=`(uname -m) 2>/dev/null` || UNAME_MACHINE=unknown

On your 64-bit machine, uname -m returns x86_64. However, there is a system call which allows this result to be overridden: personality(2). When a process calls personality(2), it and its subsequent forks (children) start seeing faked results when calling uname(2). In other words, it is possible to ask the kernel to report fake hardware information through uname(2).
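
The effect is easy to see from a shell, assuming the linux32 wrapper from util-linux is available; it calls personality(PER_LINUX32) before executing its argument:

$ uname -m
x86_64
$ linux32 uname -m
i686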

The base image you use (jnickborys/i386-ubuntu:12.04) contains 32-bit binaries and defines the entrypoint /usr/bin/linux32, which calls personality(PER_LINUX32) to ask the kernel to pretend that it runs on 32-bit hardware and to return i686 in uname(2) (this may be checked using docker inspect and strace respectively). This makes it possible to pretend that the containerized process runs in a 32-bit environment.
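
For instance, the entrypoint can be verified like this (the image name is taken from the question; the output reflects what is described above):

$ docker inspect --format '{{.Config.Entrypoint}}' jnickborys/i386-ubuntu:12.04
[/usr/bin/linux32]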

What is the difference between executing the build in a RUN directive and manually in the container?

When you execute the build in a RUN directive, Docker does not use the entrypoint to run the commands. It uses what is specified in the SHELL directive instead (the default is /bin/sh -c). This means that the personality of the shell running the build is not altered, so it (and its child processes) sees the real hardware information, x86_64. As a result, config.guess detects the x86_64-unknown-linux-gnu build system type in a 32-bit environment and the build fails.

When you run the build manually in the container (e.g. after starting it with docker run -it jnickborys/i386-ubuntu:12.04 and then performing the same steps as in the Dockerfile), the entrypoint is called, so the personality is altered and the kernel starts reporting that it runs on 32-bit hardware (i686). You therefore get the i686-pc-linux-gnu build system type, and the build runs correctly.
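
This is easy to reproduce (image name from the question; the results follow from the explanation above):

$ docker run --rm jnickborys/i386-ubuntu:12.04 uname -m
i686

whereas a RUN uname -m step in the Dockerfile prints x86_64 in the build output, because the entrypoint is bypassed.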

How to fix this? That depends on what you want. If your goal is to build gcc for a 64-bit environment, just use a 64-bit base image. If you want to build for a 32-bit environment, one option is to alter the SHELL used for the RUN directives before those RUNs:

SHELL ["/usr/bin/linux32", "/bin/sh", "-c"]

This will make Docker execute the RUN directives with the altered personality, so the build system type will be detected correctly (i686-pc-linux-gnu) and the build will succeed. If required, you may change the SHELL back to /bin/sh -c after the build.
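
A minimal sketch of how that could look in the Dockerfile (the configure/make invocation is illustrative, not taken from your build):

FROM jnickborys/i386-ubuntu:12.04
SHELL ["/usr/bin/linux32", "/bin/sh", "-c"]
# config.guess now detects i686-pc-linux-gnu
RUN ./configure && make && make install
# optionally restore the default shell for later steps
SHELL ["/bin/sh", "-c"]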

Multiple RUN vs. single chained RUN in Dockerfile, which is better?

When possible, I always merge commands that create files with the commands that delete those same files into a single RUN line. This is because each RUN line adds a layer to the image; the output is quite literally the filesystem changes that you could view with docker diff on the temporary container it creates. If you delete a file that was created in a different layer, all the union filesystem does is register the change in a new layer; the file still exists in the previous layer and is still shipped over the network and stored on disk. So if you download source code, extract it, compile it into a binary, and then delete the tgz and source files at the end, you really want this all done in a single layer to reduce the image size.
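
For example, a sketch of the download-build-cleanup pattern kept in one layer (the URL, paths, and build commands are placeholders):

RUN wget -O /tmp/app.tgz https://example.com/app.tgz \
 && mkdir /tmp/app && tar -xzf /tmp/app.tgz -C /tmp/app --strip-components=1 \
 && make -C /tmp/app && cp /tmp/app/app /usr/local/bin/ \
 && rm -rf /tmp/app /tmp/app.tgz   # cleanup lands in the same layer, so the tgz and sources are never shipped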

Next, I personally split up layers based on their potential for reuse in other images and their expected caching behaviour. If I have 4 images that all share the same base image (e.g. debian), I may pull a collection of utilities common to most of those images into the first RUN command so the other images benefit from caching.

Order in the Dockerfile is important when looking at image cache reuse. I look at any components that will update very rarely, possibly only when the base image updates, and put those high up in the Dockerfile. Towards the end of the Dockerfile, I include any commands that run quickly and may change frequently, e.g. adding a user with a host-specific UID or creating folders and changing permissions. If the container includes interpreted code (e.g. JavaScript) that is being actively developed, that gets added as late as possible so that a rebuild only reruns that single change.

In each of these groups of changes, I consolidate as best I can to minimize layers. So if there are 4 different source code folders, those get placed inside a single folder so they can be added with a single command. Any package installs from something like apt-get are merged into a single RUN when possible to minimize the package manager overhead (updating and cleaning up).
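
A condensed sketch of that layout (package names, paths, and the UID are placeholders):

FROM debian
# rarely changes: shared utilities, merged into one RUN (update + install + cleanup)
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl ca-certificates \
 && rm -rf /var/lib/apt/lists/*
# changes more often: application code gathered under a single folder
COPY app/ /opt/app/
# quick, frequently changing steps go last
RUN useradd -u 1000 appuser && chown -R appuser /opt/app
USER appuser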


Update for multi-stage builds:

I worry much less about reducing image size in the non-final stages of a multi-stage build. Since these stages aren't tagged and shipped to other nodes, you can maximize the likelihood of a cache hit by splitting each command into a separate RUN line.

However, this isn't a perfect solution to squashing layers, since all you copy between stages are the files, not the rest of the image metadata such as environment variable settings, the entrypoint, and the command. And when you install packages in a Linux distribution, the libraries and other dependencies may be scattered throughout the filesystem, making it difficult to copy all the dependencies across.

Because of this, I use multi-stage builds as a replacement for building binaries on a CI/CD server, so that my CI/CD server only needs the tooling to run docker build, rather than needing a JDK, nodejs, go, and other compile tools installed.
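
A minimal multi-stage sketch of that workflow, using Go as an example (the image tags and paths are illustrative):

# build stage: needs the Go toolchain, but its layers are never shipped
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
RUN go build -o /out/app .

# final stage: only the compiled binary is copied across
FROM debian:bookworm-slim
COPY --from=build /out/app /usr/local/bin/app
CMD ["app"]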

Difference between RUN and CMD in a Dockerfile

RUN is an image build step; the state of the container after a RUN command is committed to the image. A Dockerfile can have many RUN steps that layer on top of one another to build the image.

CMD is the command the container executes by default when you launch the built image. A Dockerfile will only use the final CMD defined. The CMD can be overridden when starting a container with docker run $image $other_command.

ENTRYPOINT is also closely related to CMD and can modify the way a container is started from an image.
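
A small sketch of the difference (the package and the image tag myimage are illustrative):

FROM debian
# RUN executes at build time; the installed package is baked into the image
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
# CMD only records the default command; it runs when a container starts
CMD ["curl", "--version"]

After docker build -t myimage ., the default can be kept or overridden:

$ docker run myimage                  # runs the CMD: curl --version
$ docker run myimage curl --help      # overrides the CMD for this container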

Run a script in Dockerfile

RUN and ENTRYPOINT are two different ways to execute a script.

RUN creates an intermediate container, runs the script, and freezes the new state of that container in a new intermediate image. The script won't be run again after that: your final image is supposed to reflect the result of that script.

ENTRYPOINT means your image (which has not executed the script yet) will create a container that runs that script when it starts.

In both cases, the script needs to be added to the image (with COPY or ADD), and a RUN chmod +x /bootstrap.sh is a good idea.

It should also start with a shebang line (like #!/bin/sh).

Considering your script (bootstrap.sh: a couple of git config --global commands), it would be best to RUN that script once in your Dockerfile, making sure to run it as the right user (the global git config file is $HOME/.gitconfig, which by default is the /root one).

Add to your Dockerfile:

RUN /bootstrap.sh

Then, when running a container, check the content of /root/.gitconfig to confirm the script was run.
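
Putting it together, the relevant Dockerfile fragment could look like this (the script name matches the example above; everything runs as root by default, so the result lands in /root/.gitconfig):

COPY bootstrap.sh /bootstrap.sh
RUN chmod +x /bootstrap.sh && /bootstrap.sh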

Running command with make gives different result from running it directly in shell

You are running the same command, but in different shells. Your interactive shell is probably bash, whereas the shell make uses is /bin/sh, which is (often) a plain POSIX standard shell.

The special handling of ~ in an argument is a shell feature: it's not embedded in programs like docker or ssh. And it's not defined in POSIX; it's an additional feature that some shells, like bash, provide.

On my system:

bash$ echo foo=~
foo=/home/me

bash$ /bin/sh

$ echo foo=~
foo=~

To be portable you should use the full pathname or $HOME instead (remember that in a make recipe you have to double the $ to escape it from make: $$HOME).
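
For example, in a make recipe (the target and image names are made up; note the doubled $$ and the tab indentation):

run:
	docker run -v $$HOME/data:/data myimage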

yum dependency resolution behaves differently in docker build vs docker run

The reason is that the package manager relies on the information provided by the kernel (via uname(2)) to decide which versions of packages (i.e. for which target architecture) it should install. Though your base image contains an i386 environment, you still run the build on an x86_64 kernel, so things become a bit tricky.

When you run the container using docker run, it goes through the entrypoint linux32 - a small program which asks the kernel to pretend that it runs on i386 hardware. However, when you run docker build, the entrypoint is not used for RUN directives, so yum sees that it runs on an x86_64 kernel, hence the mess with platforms. See the first answer above for a more detailed explanation; the issue is pretty similar.

To build your image correctly (installing only i386 packages), run yum and other architecture-sensitive commands in RUNs under linux32, e.g.:

RUN linux32 yum update -y
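
Alternatively, as with the gcc build above, you can switch the build shell once so that every subsequent RUN gets the 32-bit personality (assuming linux32 lives at /usr/bin/linux32 in the base image; the package names are placeholders):

SHELL ["/usr/bin/linux32", "/bin/sh", "-c"]
RUN yum update -y && yum install -y gcc make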

Is the docker run -it command related to a Docker image or a Docker container?

As it says in the documentation for docker run:

Docker runs processes in isolated containers. A container is a process which runs on a host. The host may be local or remote. When an operator executes docker run, the container process that runs is isolated in that it has its own file system, its own networking, and its own isolated process tree separate from the host.

After the process is finished, the container will be shut down.

As for your question whether it "refers to an image or a container": you give an image as the argument, Docker creates a container from that image, and then runs the process in that container.

The lifecycle of a Docker container is:

  • docker run --name x imagename -> create container x from image imagename and run it
  • docker exec x ls -> execute the command ls in the running container x
  • docker stop x -> stop container x (it is still visible in docker container ls -a)
  • docker start x -> restart container x
  • docker stop x -> stop container x again
  • docker rm x -> remove container x (now even docker container ls -a won't show it)
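
A concrete session with -it (the image and container names are illustrative):

$ docker run -it --name demo ubuntu bash    # -i keeps STDIN open, -t allocates a TTY: an interactive shell
root@abc123:/# exit                         # leaving the shell stops the container
$ docker start demo                         # restart the same container
$ docker exec -it demo bash                 # open another interactive shell in the running container
$ docker stop demo && docker rm demo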

