unshare --pid /bin/bash - fork cannot allocate memory
The error occurs because the PID 1 process of the new namespace has exited.
After bash starts, it forks several sub-processes to do its work. If you run unshare without -f, bash has the same PID as the current "unshare" process, because unshare simply execs it. The "unshare" process calls the unshare system call, which creates a new PID namespace, but the "unshare" process itself is not moved into that namespace. This is the intended behavior of the Linux kernel: when process A creates a new PID namespace, A itself is not put into it; only the sub-processes of A are. So when you run:
unshare -p /bin/bash
The unshare process execs /bin/bash, and /bin/bash forks several sub-processes. The first sub-process of bash becomes PID 1 of the new namespace, and it exits as soon as it has done its job. So PID 1 of the new namespace exits.
The PID 1 process has a special role: it becomes the parent of all orphaned processes. If PID 1 in the root namespace exits, the kernel panics. If PID 1 in a sub-namespace exits, the Linux kernel calls the disable_pid_allocation function, which clears the PIDNS_HASH_ADDING flag in that namespace. When the kernel creates a new process, it calls the alloc_pid function to allocate a PID in the namespace, and if the PIDNS_HASH_ADDING flag is not set, alloc_pid returns -ENOMEM. That is why you get the "Cannot allocate memory" error.
You can resolve this issue by using the '-f' option:
unshare -fp /bin/bash
If you run unshare with the '-f' option, unshare forks a new process after it creates the new PID namespace and runs /bin/bash in that new process. The new process is PID 1 of the new PID namespace. Bash still forks several sub-processes to do its work, but because bash itself is PID 1 of the new namespace, those sub-processes can exit without any problem.
unshare command doesn't create new PID namespace
Solution
You should add the --fork and --mount-proc switches to unshare, as stated in the man page:
-f, --fork
Fork the specified program as a child process of unshare rather than running it directly. This is useful
when creating a new PID namespace. Note that when unshare is waiting for the child process, then it
ignores SIGINT and SIGTERM and does not forward any signals to the child. It is necessary to send
signals to the child process.
Explanation (from man pid_namespaces)
a process's PID namespace membership is determined when the process is created and cannot be changed thereafter.
What unshare actually does when you supply --pid is point /proc/[PID]/ns/pid_for_children of the current process at the new PID namespace, causing children subsequently created by this process to be placed in a different PID namespace (its children, not itself! This is important).
So, when you supply --fork to unshare, it will fork your program (in this case busybox sh) as a child process of unshare and place it in the new PID namespace.
Why do I need --mount-proc?
Try running unshare with only --pid and --fork and let's see what happens.
wendel@gentoo-grill ~ λ sudo unshare --pid --fork busybox sh
/home/wendel # echo $$
1
/home/wendel # ps
PID USER TIME COMMAND
12443 root 0:00 unshare --pid --fork busybox sh
12444 root 0:00 busybox sh
24370 root 0:00 {ps} busybox sh
.
.
. // bunch more
From echo $$ we can see that the PID is actually 1, so we know that we must be in the new PID namespace, but when we run ps we see other processes as if we were still in the parent PID namespace.
This is because /proc is a special filesystem called procfs that the kernel creates in memory. From the man page:
A /proc filesystem shows (in the /proc/[pid] directories) only processes visible in the PID namespace of the process that performed the mount, even if the /proc filesystem is viewed from processes in other namespaces.
So, in order for tools such as ps to work correctly, we need to re-mount /proc using a process in the new namespace.
But, assuming that your process is in the root mount namespace, re-mounting /proc will mess up many things for other processes in the same mount namespace, because now they can't see anything (in /proc). So you should also put your process in a new mount namespace.
The good thing is that unshare has --mount-proc.
--mount-proc[=mountpoint]
Just before running the program, mount the proc filesystem at mountpoint (default is /proc). This is useful when creating a new PID namespace. It also implies creating a new mount namespace since the /proc mount would otherwise mess up existing programs on the system. The new proc filesystem is explicitly mounted as private (with MS_PRIVATE|MS_REC).
Let's verify that --mount-proc also puts your process in a new mount namespace.
bash outside:
wendel@gentoo-grill ~ λ ls -go /proc/$$/ns/{user,mnt,pid}
lrwxrwxrwx 1 0 Aug 9 10:05 /proc/17011/ns/mnt -> 'mnt:[4026531840]'
lrwxrwxrwx 1 0 Aug 9 10:10 /proc/17011/ns/pid -> 'pid:[4026531836]'
lrwxrwxrwx 1 0 Aug 9 10:10 /proc/17011/ns/user -> 'user:[4026531837]'
busybox:
wendel@gentoo-grill ~ λ doas ls -go /proc/16436/ns/{user,mnt,pid}
lrwxrwxrwx 1 0 Aug 9 10:05 /proc/16436/ns/mnt -> 'mnt:[4026533479]'
lrwxrwxrwx 1 0 Aug 9 10:04 /proc/16436/ns/pid -> 'pid:[4026533481]'
lrwxrwxrwx 1 0 Aug 9 10:17 /proc/16436/ns/user -> 'user:[4026531837]'
Notice that their user namespace is the same but mount and pid aren't.
Note: You can see that I cited a lot from man pages. If you want to learn more about Linux namespaces (or anything Unix, really), the first thing to do is read the man page for each namespace. They are well written and really informative.
unshare user namespace, fork, map uid then execvp failing
Pretty sure you've already found the answer, but this is a minimal sample I could come up with:
// gcc -Wall -std=c11
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sched.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <stdarg.h>

// write a formatted string to a file such as /proc/self/uid_map
void write_to_file(const char *which, const char *format, ...) {
    FILE *fu = fopen(which, "w");
    if (fu == NULL) {
        perror(which);
        exit(1);
    }
    va_list args;
    va_start(args, format);
    int written = vfprintf(fu, format, args);
    va_end(args);
    // stdio buffers the data, so a rejected write may only show up in fclose
    if (written < 0 || fclose(fu) != 0) {
        perror("cannot write");
        exit(1);
    }
}

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        exit(1);
    }
    // array of strings, terminated with NULL entry
    char **cmd_and_args = calloc(argc, sizeof(char *));
    if (cmd_and_args == NULL) {
        perror("calloc");
        exit(1);
    }
    for (int i = 1; i < argc; i++) {
        cmd_and_args[i - 1] = argv[i];
    }
    uid_t uid = getuid();
    gid_t gid = getgid();
    // first unshare
    if (0 != unshare(CLONE_NEWUSER)) {
        fprintf(stderr, "%s\n", "USER unshare has failed");
        exit(1);
    }
    // remap uid
    write_to_file("/proc/self/uid_map", "0 %u 1", (unsigned) uid);
    // deny setgroups (see user_namespaces(7))
    write_to_file("/proc/self/setgroups", "deny");
    // remap gid
    write_to_file("/proc/self/gid_map", "0 %u 1", (unsigned) gid);
    // exec the command
    if (execvp(cmd_and_args[0], cmd_and_args) < 0) {
        perror("cannot execvp");
        exit(1);
    }
    // unreachable
    free(cmd_and_args);
    return 0;
}
Trying to awk a file, but cannot allocate sufficient memory. Any alternatives or adjustments?
It sounds like your grep command might not be able to deal with files as large as 2.4 GB because a 32-bit pointer can't address them.
Try running
split --line-bytes=2GB file1.pileup
This will split your file into pieces of at most 2 GB each, without breaking lines, which you should be able to process as you'd like.
bash: fork: Cannot allocate memory
I also faced this issue with my Ubuntu 14.04 desktop.
free -m
Even this basic command failed with a "Cannot allocate memory" error.
On investigating, I found that the system was using all of its memory for caching and was not freeing it.
This is called cache ballooning, and I solved it by clearing the cache.
grantpt report error after unshare
Since I've had the same issue I have also looked into this. Here are my findings:
grantpt(3) tries to ensure that the slave pseudo terminal has its group set to the special tty group (or whatever TTY_GROUP is when compiling glibc):
  static int tty_gid = -1;
  if (__glibc_unlikely (tty_gid == -1))
    {
      char *grtmpbuf;
      struct group grbuf;
      size_t grbuflen = __sysconf (_SC_GETGR_R_SIZE_MAX);
      struct group *p;
      /* Get the group ID of the special `tty' group. */
      if (grbuflen == (size_t) -1L)
        /* `sysconf' does not support _SC_GETGR_R_SIZE_MAX.
           Try a moderate value. */
        grbuflen = 1024;
      grtmpbuf = (char *) __alloca (grbuflen);
      __getgrnam_r (TTY_GROUP, &grbuf, grtmpbuf, grbuflen, &p);
      if (p != NULL)
        tty_gid = p->gr_gid;
    }
  gid_t gid = tty_gid == -1 ? __getgid () : tty_gid;
  /* Make sure the group of the device is that special group. */
  if (st.st_gid != gid)
    {
      if (__chown (buf, uid, gid) < 0)
        goto helper;
    }
See https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/grantpt.c;h=c04c85d450f9296efa506121bcee022afda3e2dd;hb=HEAD#l137.
On my system, the tty group is 5. However, that group isn't mapped into your user namespace, so the chown(2) fails because GID 5 doesn't exist there. glibc then falls back to executing the pt_chown helper, which also fails. I haven't looked into the details of why it fails, but I assume it's because it's setuid nobody unless you mapped the root user to your user namespace. Here's strace output that shows the failing operation:
[pid 30] chown("/dev/pts/36", 1000, 5) = -1 EINVAL (Invalid argument)
This gives you a couple of methods to work around the problem:
- Map the required groups (i.e. tty), which may not be possible without CAP_SYS_ADMIN in the binary that opens the user namespace.
- Use subuids and subgids together with newuidmap(1) and newgidmap(1) to make these groups available (this might work, but I haven't tested it).
- Make changes that avoid the failure of the chown(2) call, e.g. by using a mount namespace and changing the GID of the tty group in /etc/group to your user's GID.
- Avoid the chown(2) call, e.g. by making the st.st_gid != gid check false; this should be possible by deleting the tty group from your target mount namespace's /etc/group. Of course, that may cause other problems.