How to Disable Socket Creation for a Linux Process, for Sandboxing

How to disable socket creation for a Linux process, for sandboxing?

ptrace seems to be the most obvious tool, but aside from that:

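util-linux (formerly util-linux-ng) has a command, unshare, which uses the kernel's clone/unshare interfaces. If you run the new process through unshare -n (or create it with clone(CLONE_NEWNET)), it gets a fresh network namespace that contains only a loopback interface, initially down, so any sockets it creates cannot reach the host's network. Creating a network namespace requires CAP_SYS_ADMIN unless you also create a user namespace (for example, unshare -r -n). That doesn't solve the kernel resource issue, because socket() itself still succeeds, but it does sandbox the process.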
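A rough syscall-level sketch of what unshare -n does for a child process follows. Assumptions not in the answer: the function names are hypothetical; it adds CLONE_NEWUSER so that it also works without root where unprivileged user namespaces are enabled; and it uses a UDP connect() to the documentation address 192.0.2.1 only to show that the child has no route out.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <sched.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the errno from connecting a UDP socket to 192.0.2.1 (TEST-NET-1),
   or 0 if connect() succeeded. A UDP connect() sends no packets; it only
   performs a route lookup, so with no routes at all it fails with ENETUNREACH. */
static int udp_connect_errno(void)
{
    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_port = htons(9) };
    int fd, rc, err;

    fd = socket(AF_INET, SOCK_DGRAM, 0);   /* socket() itself is still allowed */
    if (fd < 0)
        return errno;
    inet_pton(AF_INET, "192.0.2.1", &addr.sin_addr);
    rc = connect(fd, (struct sockaddr *)&addr, sizeof addr);
    err = (rc < 0) ? errno : 0;
    close(fd);
    return err;
}

/* Forks a child, moves it into a new network namespace and reports:
    1  the child had no route out (isolated),
    0  the child could still route packets,
   -1  the namespaces could not be created here. */
int probe_private_netns(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Assumption: CLONE_NEWUSER lets an unprivileged process do this;
           as root, CLONE_NEWNET alone is enough (like unshare -n). */
        if (unshare(CLONE_NEWUSER | CLONE_NEWNET) != 0)
            _exit(2);
        _exit(udp_connect_errno() == ENETUNREACH ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0:  return 1;
    case 1:  return 0;
    default: return -1;
    }
}
```

probe_private_netns() returns 1 when the child was isolated and -1 when the kernel or a container runtime refuses to create the namespaces.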

The Linux kernel also supports seccomp. Its original "strict" mode, entered with prctl(PR_SET_SECCOMP, 1) (that is, SECCOMP_MODE_STRICT), prevents the process (well, the thread, really) from calling any syscalls other than read, write, _exit (but not exit_group) and sigreturn; any other syscall kills it with SIGKILL. It's a pretty effective sandbox but difficult to use with unmodified code.
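To make strict mode concrete, here is a minimal sketch (the function name is hypothetical) that enters it in a forked child and then calls socket(). glibc's _exit() uses exit_group(2), which strict mode does not allow, so the child has to invoke the raw exit syscall. Strict mode also cannot be entered while a seccomp filter is already active, as inside many containers, so the sketch reports that case separately.

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the strict-mode child was killed by SIGKILL when it called
   socket() (the expected outcome), 0 if socket() was allowed, and -1 if
   strict mode could not be entered. */
int probe_strict_seccomp(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            _exit(2);                 /* e.g. a filter is already installed */
        /* Only read, write, exit and sigreturn are permitted from here on. */
        (void)socket(AF_INET, SOCK_STREAM, 0);
        /* Not reached if strict mode works: use the plain exit syscall,
           because exit_group would itself be fatal. */
        syscall(SYS_exit, 1);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL)
        return 1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 1)
        return 0;
    return -1;
}
```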

You can define an SELinux domain that disallows socket/bind/etc. calls and perform a dynamic transition into that type. This (obviously) requires a system with an actively enforcing SELinux policy. (Similar restrictions may be possible with AppArmor and TOMOYO, but I'm not very familiar with either of them.)

Sandboxing in Linux

Along with the other suggestions, you might find this useful:

http://www.eelis.net/geordi/

I found it through codepad.org's about page: http://codepad.org/about

Run an untrusted C program in a sandbox in Linux that prevents it from opening files, forking, etc.?

I have used Systrace to sandbox untrusted programs both interactively and in automatic mode. It has a ptrace()-based backend, which allows its use on a Linux system without special privileges, as well as a far faster and more powerful backend that requires patching the kernel.

It is also possible to create a sandbox on Unix-like systems using chroot(1), although that is not quite as easy or secure. Linux Containers and FreeBSD jails are a better alternative to chroot. Another alternative on Linux is to use a security framework like SELinux or AppArmor, which is what I would propose for production systems.

We would be able to help you more if you told us exactly what you want to do.

EDIT:

Systrace would work for your case, but I think that something based on the Linux Security Modules (LSM) framework, like AppArmor or SELinux, is a more standard, and thus preferred, alternative, depending on your distribution.

EDIT 2:

While chroot(1) is available on most (all?) Unix-like systems, it has quite a few issues:

It can be broken out of. If you are going to actually compile or run untrusted C programs on your system, you are especially vulnerable to this issue. And if your students are anything like mine, someone WILL try to break out of the jail.
You have to create a full independent filesystem hierarchy with everything that is necessary for your task. You do not have to have a compiler in the chroot, but anything that is required to run the compiled programs should be included. While there are utilities that help with this, it's still not trivial.
You have to maintain the chroot. Since it is independent, the chroot files will not be updated along with your distribution. You will have to either recreate the chroot regularly or include the necessary update tools in it, which would essentially require that it be a full-blown Linux distribution. You will also have to keep system and user data (passwords, input files, etc.) synchronized with the host system.
chroot() only protects the filesystem. It does not prevent a malicious program from opening network sockets or a badly-written one from sucking up every available resource.

The resource usage problem is common to all of these alternatives. Filesystem quotas will prevent programs from filling the disk. Proper ulimit settings (setrlimit() in C) can protect against memory overuse and fork bombs, and put a stop to CPU hogs. nice(1) can lower the priority of those programs so that the computer remains usable for more important tasks.
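The setrlimit() and nice() advice can be sketched as a function that the parent runs in the child after fork() and before exec(). The struct, the function names and any values you plug in are hypothetical; pick limits for your own workload. RLIMIT_NPROC counts all processes of the child's real UID, so it only works as a fork-bomb guard if the untrusted program runs under a dedicated, non-root UID.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical limit set for one untrusted run. */
struct sandbox_limits {
    rlim_t cpu_seconds;    /* RLIMIT_CPU: SIGXCPU at the soft limit, SIGKILL at the hard one */
    rlim_t address_space;  /* RLIMIT_AS: bytes of virtual memory */
    rlim_t max_processes;  /* RLIMIT_NPROC: processes per real UID (fork bombs) */
    rlim_t max_file_size;  /* RLIMIT_FSIZE: largest file the process may write */
    int nice_increment;    /* added to the nice value; higher means lower priority */
};

static int set_hard_limit(int resource, rlim_t value)
{
    /* Soft == hard, so an unprivileged child cannot raise it again. */
    struct rlimit rl = { .rlim_cur = value, .rlim_max = value };
    return setrlimit(resource, &rl);
}

/* Applies the limits to the calling process. Returns 0 on success, -1 on
   the first failure. */
int apply_sandbox_limits(const struct sandbox_limits *lim)
{
    if (set_hard_limit(RLIMIT_CPU, lim->cpu_seconds) != 0 ||
        set_hard_limit(RLIMIT_AS, lim->address_space) != 0 ||
        set_hard_limit(RLIMIT_NPROC, lim->max_processes) != 0 ||
        set_hard_limit(RLIMIT_FSIZE, lim->max_file_size) != 0)
        return -1;

    /* nice() may legitimately return -1, so errno is the only error signal. */
    errno = 0;
    if (nice(lim->nice_increment) == -1 && errno != 0)
        return -1;
    return 0;
}
```

For example, `struct sandbox_limits lim = { 10, (rlim_t)256 << 20, 32, (rlim_t)16 << 20, 10 };` followed by `apply_sandbox_limits(&lim)` in the child would cap it at ten seconds of CPU time and 256 MiB of address space.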

Prevent process from opening new file descriptor on Linux but allow receiving file descriptors via sockets

What you have here is exactly the use case of seccomp.

Using seccomp, you can filter syscalls in different ways. What you want to do in this situation is, right after fork(), to install a seccomp filter that disallows the use of open(2), openat(2), socket(2) (and more).
To accomplish this, you can do the following:

First, create a seccomp context using seccomp_init(3) with the default behavior of SCMP_ACT_ALLOW.
Then add a rule to the context using seccomp_rule_add(3) for each syscall that you want to deny. You can use SCMP_ACT_KILL to kill the process if the syscall is attempted, SCMP_ACT_ERRNO(val) to make the syscall fail returning the specified errno value, or any other action value defined in the manual page.
Load the context using seccomp_load(3) to make it effective.

Before continuing, NOTE that a blacklist approach like this one is in general weaker than a whitelist approach. It allows any syscall that is not explicitly disallowed, which could let a program bypass the filter. If you believe that the child process could be maliciously trying to evade the filter, or if you already know which syscalls the child will need, a whitelist approach is better, and you should do the opposite of the above: create a filter with the default action SCMP_ACT_KILL and allow the needed syscalls with SCMP_ACT_ALLOW. In terms of code the difference is minimal (the whitelist is probably longer, but the steps are the same).

Here's an example of the above (I'm doing exit(-1) in case of error just for simplicity's sake):

#include <stdlib.h>
#include <seccomp.h>   // libseccomp; link with -lseccomp

static void secure(void) {
    int err;
    scmp_filter_ctx ctx;

    int blacklist[] = {
        SCMP_SYS(open),
        SCMP_SYS(openat),
        SCMP_SYS(creat),
        SCMP_SYS(socket),
        SCMP_SYS(open_by_handle_at),
        // ... possibly more ...
    };

    // Create a new seccomp context, allowing every syscall by default.
    ctx = seccomp_init(SCMP_ACT_ALLOW);
    if (ctx == NULL)
        exit(-1);

    /* Now add a filter for each syscall that you want to disallow.
       In this case, we'll use SCMP_ACT_KILL to kill the process if it
       attempts to execute the specified syscall. */

    for (unsigned i = 0; i < sizeof(blacklist) / sizeof(blacklist[0]); i++) {
        err = seccomp_rule_add(ctx, SCMP_ACT_KILL, blacklist[i], 0);
        if (err)
            exit(-1);
    }

    // Load the context making it effective.
    err = seccomp_load(ctx);
    // The loaded filter stays in the kernel; the context itself can be freed.
    seccomp_release(ctx);
    if (err)
        exit(-1);
}

Now, in your program, you can call the above function to apply the seccomp filter right after the fork(), like this:

pid_t child_pid = fork();
if (child_pid == -1)
    exit(-1);

if (child_pid == 0) {
    secure();

    // Child code here...

    exit(0);
} else {
    // Parent code here...
}

A few important notes on seccomp:

A seccomp filter, once applied, cannot be removed or altered by the process.
If fork(2) or clone(2) are allowed by the filter, any child processes will be constrained by the same filter.
If execve(2) is allowed, the existing filter will be preserved across a call to execve(2).
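If the prctl(2) or seccomp(2) syscall is allowed, the process is able to apply further filters. These can only add restrictions: every installed filter is evaluated, and the most restrictive result wins.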
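The same notes apply to filters installed without libseccomp. Where libseccomp is not available, the blacklist idea can be written directly as a classic BPF program. The sketch below (hypothetical function names, x86-64 or AArch64 only) makes socket(2) and socketpair(2) fail with EPERM, the equivalent of SCMP_ACT_ERRNO(EPERM), and allows everything else. It also shows why a blacklist is fragile: it must reject other ABIs (32-bit and x32 entry points), on 32-bit x86 it would also have to cover the socketcall(2) multiplexer, and newer interfaces such as io_uring can create sockets without calling socket(2) at all.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

#if defined(__x86_64__)
#define SANDBOX_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SANDBOX_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86-64 and AArch64"
#endif

#define RET_ERRNO(e) BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | ((e) & SECCOMP_RET_DATA))

/* Installs a filter that makes socket() and socketpair() fail with EPERM.
   Returns 0 on success, -1 on failure. */
int deny_socket_creation(void)
{
    struct sock_filter filter[] = {
        /* Kill the process if the syscall comes through another ABI. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SANDBOX_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
#if defined(__x86_64__)
        /* x32 syscalls share AUDIT_ARCH_X86_64 but set bit 30 of the number. */
        BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, 0x40000000, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
#endif
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_socket, 1, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_socketpair, 0, 1),
        RET_ERRNO(EPERM),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };

    /* Required for unprivileged processes; libseccomp sets it by default. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) == 0 ? 0 : -1;
}

/* Returns 1 if socket() is refused with EPERM, 0 otherwise. */
int socket_is_denied(void)
{
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd >= 0) {
        close(fd);
        return 0;
    }
    return errno == EPERM;
}
```

Called in the child right after fork(), like secure() above, deny_socket_creation() leaves inherited descriptors (and descriptors received over a Unix socket) usable, while new socket() calls return -1 with errno set to EPERM.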

Socket locks up when killing a process ran with elevated permissions on Linux

You said it: "the IPC socket".

I guess that's not a TCP socket. If zeromq is creating a System V IPC object as root, the user cannot reuse it, and that would explain the permission error: IPC objects are not destroyed when the process dies, and they have user ownership and permissions.

You can list the existing IPC objects with the ipcs command and remove them with ipcrm.
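The persistence is easy to demonstrate with a shared-memory segment. This sketch (hypothetical helper names) creates a segment with owner-only 0600 permissions, which another user would be refused access to, and removes it with shmctl(IPC_RMID), which is what ipcrm -m does for you. Without that explicit removal, the segment would outlive the process that created it.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Creates a private segment readable and writable only by its owner.
   Returns the segment id, or -1 on error. */
int create_segment(size_t size)
{
    return shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
}

/* Programmatic equivalent of `ipcrm -m <id>`. */
int remove_segment(int shmid)
{
    return shmctl(shmid, IPC_RMID, NULL);
}

/* Returns the owner UID recorded for the segment, or -1 if it is gone. */
long segment_owner(int shmid)
{
    struct shmid_ds ds;

    if (shmctl(shmid, IPC_STAT, &ds) != 0)
        return -1;
    return (long)ds.shm_perm.uid;
}
```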

Oh, and take care not to delete IPC objects that are unrelated to your work.

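If my guess is wrong, you can use strace to see which system call is actually failing and find the real culprit. (ZeroMQ's ipc:// transport, for instance, is built on Unix domain sockets, which live as files in the filesystem rather than as System V IPC objects.)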

Is socket creation/deletion a very expensive process?

Creating a socket is cheap: it only allocates kernel state and sends nothing on the network. Connecting it is what actually establishes the connection, and that is roughly as expensive as the underlying protocol's setup; for TCP, that means the three-way handshake. Keeping connections alive costs mainly memory and file descriptors. Network connections are a resource limited by the operating system (for example, the number of open file descriptors per process, or the ephemeral ports available for outgoing connections).
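The difference between the two costs can be illustrated with a short sketch (hypothetical helper names): socket() followed by close() only allocates and frees kernel state, while connect() on a TCP socket performs the three-way handshake, here against a listener bound to an ephemeral port on the loopback interface.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* socket() + close(): purely local, no packets are sent. */
int create_and_close(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    return close(fd);
}

/* connect() on a TCP socket runs the three-way handshake. Returns 0 on success. */
int connect_over_loopback(void)
{
    int rc = -1;
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    int client = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_port = 0 };
    socklen_t len = sizeof addr;

    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (listener < 0 || client < 0)
        goto out;
    /* Port 0 lets the kernel pick a free port; getsockname() reports it. */
    if (bind(listener, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(listener, 1) != 0 ||
        getsockname(listener, (struct sockaddr *)&addr, &len) != 0)
        goto out;
    /* The kernel completes the handshake even before accept() is called. */
    rc = connect(client, (struct sockaddr *)&addr, sizeof addr);
out:
    if (client >= 0)
        close(client);
    if (listener >= 0)
        close(listener);
    return rc;
}
```

On a real network the handshake costs at least one round trip, which is why keeping a connected socket open and reusing it usually pays off.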

If you use a thread-per-connection model, creating the additional threads costs resources too.

I found a related question, "Network Programming: to maintain sockets or not?" on Stack Overflow, and a useful article, "Boost socket performance on Linux".

I think they will be helpful to you.

What is the safest way to run an executable on Linux?

Geordi uses a combination of chroot and interception of syscalls to compile and then sandbox arbitrary code.

On MacOS, how to sandbox a daemon process?

Chromium still uses the deprecated sandbox_init(), because, they say, Apple never provided a suitable replacement. See seatbelt.cc.

But I suspect the non-deprecated way to do this is to use codesign to embed a plist of entitlements into the binary. There isn't much information online about doing this, though; see "Mac OS app, sandbox with command line tool?" and "How to sandbox a command line tool?"

You could also use Xcode to create a command-line tool project, enable sandboxing on it, and see what it does.

faking a filesystem / virtual filesystem

Either a chroot jail or a higher-order security mechanism such as SELinux can be used to restrict access to specific resources.

How can Linux ptrace be unsafe or contain a race condition?

The major problem is that many syscall arguments, like filenames, are passed to the kernel as userspace pointers. Any task that runs concurrently and has write access to the memory the pointer refers to can modify these arguments after your supervisor has inspected them but before the kernel acts on them: by the time the kernel follows the pointer, another schedulable task (process or thread) may have deliberately changed the pointed-to contents. For example:

Thread 1                           Supervisor             Thread 2
-----------------------------------------------------------------------------------------------------
strcpy(filename, "/dev/null");
open(filename, O_RDONLY);
                                   Check filename - OK
                                                          strcpy(filename, "/home/user/.ssh/id_rsa");
(in kernel) opens "/home/user/.ssh/id_rsa"

One way to stop this is to disallow calling clone() with the CLONE_VM flag and, in addition, to prevent the creation of writable MAP_SHARED memory mappings (or at least keep track of them, so that you can deny any syscall that directly references data in such a mapping). This effectively prevents any threaded application from running in the sandbox. Alternatively, you could copy any such argument into a non-shared bounce buffer before allowing the syscall to proceed.
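A seccomp filter can enforce the clone() part of the first mitigation without a ptrace round trip, because clone()'s flags are passed by value in a register and cannot be changed after the check. The sketch below is a hedged illustration, not a complete supervisor: function names are hypothetical; it assumes a little-endian x86-64 or AArch64 system; it makes clone3(2) report ENOSYS (its flags live in memory the filter cannot read) so that libc falls back to clone(2); and it does nothing about MAP_SHARED mappings. Expect pthread_create(), and helpers such as posix_spawn() that rely on CLONE_VM, to fail under it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>

#if defined(__x86_64__)
#define SANDBOX_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SANDBOX_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86-64 and AArch64"
#endif

#define RET_ERRNO(e) BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | ((e) & SECCOMP_RET_DATA))

/* Installs a filter refusing clone() with CLONE_VM. Returns 0 or -1. */
int deny_clone_vm(void)
{
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SANDBOX_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
#if defined(__x86_64__)
        BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, 0x40000000, 0, 1),  /* x32 ABI */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
#endif
#ifdef __NR_clone3
        /* clone3() keeps its flags in memory: pretend it does not exist. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_clone3, 0, 1),
        RET_ERRNO(ENOSYS),
#endif
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_clone, 0, 3),
        /* clone()'s flags are args[0]; load the low 32 bits (little-endian). */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args)),
        BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, CLONE_VM, 0, 1),
        RET_ERRNO(EPERM),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),  /* vfork(2) is not filtered */
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };

    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) == 0 ? 0 : -1;
}

static int child_fn(void *arg)
{
    (void)arg;
    return 0;
}

/* Returns 1 if clone(CLONE_VM) is refused with EPERM, 0 if it succeeded. */
int clone_vm_is_denied(void)
{
    static _Alignas(16) char stack[64 * 1024];
    /* CLONE_VFORK suspends us until the child exits, so sharing memory is safe. */
    int pid = clone(child_fn, stack + sizeof stack, CLONE_VM | CLONE_VFORK | SIGCHLD, NULL);

    if (pid == -1)
        return errno == EPERM;
    waitpid(pid, NULL, 0);
    return 0;
}
```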

The alternative is to SIGSTOP every other process in the traced group around every potentially dangerous syscall, wait for them to actually stop, then allow the syscall to proceed. After it returns, you then SIGCONT them (unless they were already stopped). Needless to say, this may have a significant performance impact.

(There are also analogous problems with syscall arguments that are passed on the stack, and with shared open file tables.)
