Mmap Flag Map_Uninitialized Not Defined

mmap flag MAP_UNINITIALIZED not defined

In order to understand what to do about the fact that #include <sys/mman.h> does not define MAP_UNINITIALIZED, it is helpful to understand how the interface to the kernel is defined.

To build a kernel module, you need the kernel headers for the exact kernel version you are building the module for. Since you want to run in userspace, you won't need these.

The headers that define the kernel API for userspace are largely in /usr/include/linux and /usr/include/asm (they are exported from the kernel source tree with make headers_install). One of the more important consumers of these headers is the C standard library, e.g., glibc, which must be built against some version of them. Since the Linux kernel API is backward compatible, your glibc (or other C library) may have been built against an older version of these headers than the kernel you are running. I'm by no means an expert on how the various distros distribute glibc, but my impression is that the installed kernel headers defining its userspace API are generally the version that glibc was built against.

Finally, glibc defines its own API through headers also installed under /usr/include, such as those in /usr/include/sys. I don't know exactly what backward or forward compatibility, if any, is provided for applications built with older or newer glibc headers, but I'm guessing that the library's .so version number gets bumped when backward compatibility would be broken.

So your problem is that the glibc headers don't actually define MAP_UNINITIALIZED on the distros/versions that you tried.

However, the Linux kernel API has exposed MAP_UNINITIALIZED since Linux 2.6.33. If the glibc headers don't define it for you, you can #include <linux/mman.h> from the Linux kernel API headers, provided it defines the flag there. Note that you still need to #include <sys/mman.h> to get the prototype for mmap, among other things.

If your Linux kernel API headers don't define MAP_UNINITIALIZED but your kernel implements it, you can define it yourself:

 #ifndef MAP_UNINITIALIZED
 #define MAP_UNINITIALIZED 0x4000000
 #endif
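A minimal sketch of using the flag with such a fallback definition, assuming the generic value 0x4000000 from the kernel's asm-generic/mman-common.h applies to your architecture. The helper name map_anon_uninit is hypothetical. On kernels built without CONFIG_MMAP_ALLOW_UNINITIALIZED the flag should simply be ignored for a MAP_PRIVATE mapping, so the pages may still come back zeroed.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* expose MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Fallback for C library headers that lack the flag. 0x4000000 is the
 * value in the kernel's asm-generic/mman-common.h; assumed to apply to
 * the target architecture. */
#ifndef MAP_UNINITIALIZED
#define MAP_UNINITIALIZED 0x4000000
#endif

/* Hypothetical helper: map len bytes of anonymous memory, asking the
 * kernel not to zero it. Kernels built without
 * CONFIG_MMAP_ALLOW_UNINITIALIZED ignore the request, so the memory may
 * still be zero-filled. Returns NULL on failure. */
static void *map_anon_uninit(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_UNINITIALIZED, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Callers must never rely on the contents being either zero or garbage: the flag is only a hint that zeroing may be skipped.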

You don't have to worry that you are effectively using "newer" headers than your glibc was built with, because the glibc implementation of mmap is very thin:

#include <sys/types.h>
#include <sys/mman.h>
#include <errno.h>
#include <sysdep.h>

#ifndef MMAP_PAGE_SHIFT
#define MMAP_PAGE_SHIFT 12
#endif

__ptr_t
__mmap (__ptr_t addr, size_t len, int prot, int flags, int fd, off_t offset)
{
  if (offset & ((1 << MMAP_PAGE_SHIFT) - 1))
    {
      __set_errno (EINVAL);
      return MAP_FAILED;
    }
  return (__ptr_t) INLINE_SYSCALL (mmap2, 6, addr, len, prot, flags, fd,
                                   offset >> MMAP_PAGE_SHIFT);
}

weak_alias (__mmap, mmap)

It is just passing your flags straight through to the kernel.
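To illustrate that pass-through, here is a hedged sketch that skips the glibc wrapper and invokes the mmap system call directly through syscall(). It assumes a 64-bit architecture such as x86_64 or aarch64, where SYS_mmap takes a byte offset; 32-bit targets use mmap2 with a page-shifted offset, as in the glibc code above. The name raw_mmap is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* for syscall() and MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: issue the mmap system call directly, so whatever
 * flags the caller passes reach the kernel unchanged. */
static void *raw_mmap(void *addr, size_t len, int prot, int flags,
                      int fd, off_t offset)
{
#if defined(__x86_64__) || defined(__aarch64__)
    /* Widen the int arguments to long: syscall() is variadic and the
     * kernel reads full registers. */
    long ret = syscall(SYS_mmap, addr, len, (long)prot, (long)flags,
                       (long)fd, (long)offset);
    /* On error syscall() returns -1 and sets errno, i.e. MAP_FAILED. */
    return (void *)ret;
#else
    /* Other architectures: fall back to the library wrapper. */
    return mmap(addr, len, prot, flags, fd, offset);
#endif
}
```

Passing the same arguments to raw_mmap and to mmap should yield equivalent mappings, which is the point: for these flags the wrapper adds no handling of its own.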

Do I have to add the length of the mapping to a pointer returned by mmap with the MAP_GROWSDOWN and MAP_STACK flags?

Yes, you have to add 65536 to the resulting pointer. Note: not 65535. Most architectures implement push(x) as *--sp = x;, so having sp start just above the stack is fine. More importantly, sp has to be aligned, and stack + 65535 is not.
The documentation appears to be wrong. I think it intends "is one page higher than the...". That better matches the source implementation and the output of the small sample program below:

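  #include <signal.h>
  #include <stdio.h>
  #include <unistd.h>
  #include <sys/mman.h>

  volatile int sp;   /* current probe offset, read by the signal handler */

  /* Report the offset that faulted. snprintf is not formally
   * async-signal-safe, but it is adequate for this quick experiment. */
  void segv(int signo) {
          char buf[80];
          int n = snprintf(buf, sizeof buf, "(%d): sp = %d\n", signo, sp);
          write(1, buf, n);
          _exit(1);
  }

  int main(void) {
          int N = 65535;
          signal(SIGSEGV, segv);
          signal(SIGBUS, segv);
          char *stack = (char *)mmap(NULL,
                       N,
                       PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_STACK |
                         MAP_GROWSDOWN | /*MAP_UNINITIALIZED |*/
                         MAP_ANONYMOUS,
                       -1,
                       0);
          if (stack == MAP_FAILED) {
                  perror("mmap");
                  return 1;
          }
          printf("stack %p\n", (void *)stack);
          fflush(stdout);   /* _exit() in the handler skips stdio flushing */
          /* Probe upward from the returned address: every page is mapped. */
          for (sp = 0; sp < N; sp += 4096) {
                  if (stack[sp]) {
                          printf("stack[%d] = %x\n", sp, (unsigned char)stack[sp]);
                  }
          }
          /* Probe downward: the first page below the returned address faults. */
          for (sp = 0; sp > -N; sp -= 4096) {
                  if (stack[sp]) {
                          printf("stack[%d] = %x\n", sp, (unsigned char)stack[sp]);
                  }
          }
          return 0;
  }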

which prints out:

$ ./a.out
stack 0x7f805c5fb000
(11): sp = -4096

on my system:

$ uname -a
Linux u2 4.15.0-42-generic #45-Ubuntu SMP Thu Nov 15 19:32:57 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
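A minimal sketch of the pointer arithmetic in this answer, assuming pages of at least 4 KiB: round the requested length up to whole pages, map it, and start the stack pointer at base plus the rounded length. The helper names map_stack_top and push are hypothetical; push only models the *--sp = x convention in C, it is not how a CPU's push instruction is used.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* MAP_GROWSDOWN, MAP_STACK, MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map a MAP_GROWSDOWN stack of at least len bytes
 * and return the initial stack pointer, one past the end of the mapping.
 * The length is rounded up to whole pages first. Returns NULL on failure. */
static char *map_stack_top(size_t len, char **base_out, size_t *mapped_out)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t mapped = (len + page - 1) / page * page;   /* round up */
    char *base = mmap(NULL, mapped, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK | MAP_GROWSDOWN,
                      -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    *base_out = base;
    *mapped_out = mapped;
    return base + mapped;
}

/* Pre-decrement push: sp may start one past the end of the stack
 * because it is decremented before the store. */
static void push(uintptr_t **sp, uintptr_t value)
{
    *--*sp = value;
}
```

For a request of 65535 bytes on a system with 4 KiB pages, map_stack_top returns base + 65536, which is page aligned, and the first push writes to the last word inside the mapping.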

higher page reclaims when using munmap

The important thing to remember about mmap is that MAP_ANONYMOUS memory must be zeroed. What usually happens is that the kernel maps a page frame containing only zeroes (the shared zero page), and only when a write hits the page is a writable, zero-filled page mapped in its place.

However, this is why the kernel cannot reuse the originally mapped page right away: it does not know that only the first byte of the page is dirty, so it must zero all 4 KiB of the page before giving it back to the process in a new anonymous mapping. Hence both examples incur at least 1024 page faults.
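One way to observe these faults from userspace is to read the process's minor fault counter with getrusage() around the first writes to freshly mapped pages. This is a hedged sketch; faults_touching_fresh_pages is a hypothetical name, and the region is kept under 2 MiB on the assumption (x86_64) that transparent huge pages then cannot back it and hide the per-page faults.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Read this process's minor (no I/O) page fault count. */
static long minor_faults(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

/* Hypothetical helper: map npages fresh anonymous pages, write one byte
 * to each, and return how many minor faults that caused (-1 if mmap
 * fails). Each first write should make the kernel supply a zero-filled
 * page, so the result is typically close to npages. */
static long faults_touching_fresh_pages(size_t npages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    long before = minor_faults();
    for (size_t i = 0; i < npages; i++)
        p[i * page] = 1;               /* first write to each page */
    long after = minor_faults();
    munmap(p, len);
    return after - before;
}
```

Calling faults_touching_fresh_pages(64) should report on the order of 64 faults on a typical kernel with 4 KiB pages; the exact figure can vary with kernel configuration.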

For memory that does not need to be zeroed, Linux has an extra flag, MAP_UNINITIALIZED, that tells the kernel the pages need not be zeroed, but it is only available on embedded devices:

MAP_UNINITIALIZED (since Linux 2.6.33)

Don't clear anonymous pages. This flag is intended to improve performance on embedded devices. This flag is honored only if the kernel was configured with the CONFIG_MMAP_ALLOW_UNINITIALIZED option. Because of the security implications, that option is normally enabled only on embedded devices (i.e., devices where one has complete control of the contents of user memory).

I guess it is unavailable in generic Linux kernels because the kernel does not keep track of which process previously mapped the page frame, so the page could leak information from a sensitive process.

Zeroing the page yourself (e.g., with bzero) would not improve performance. The kernel would not know that it had been zeroed, because no architecture tracks this in hardware, and it is cheaper to write zeroes over the page than to check whether it is all zeroes and then, in 99.9999999% of cases, write zeroes over it anyway.
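To make the cost argument concrete, here is a hypothetical userspace comparison of the two strategies. It only models the choice the kernel faces; it is not kernel code, and the function names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Strategy 1: overwrite the page with zeroes (a single write pass). */
static void zero_page(unsigned char *page, size_t size)
{
    memset(page, 0, size);
}

/* Strategy 2: scan first and zero only if needed. A clean page costs a
 * full read pass; a recycled page almost always has a non-zero byte, so
 * it pays for part of a read pass plus the full write pass anyway. */
static bool zero_page_if_dirty(unsigned char *page, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        if (page[i] != 0) {
            memset(page, 0, size);
            return true;    /* page was dirty and has now been zeroed */
        }
    }
    return false;           /* page was already all zeroes */
}
```

Strategy 2 can only win when pages are usually already zero, which is the rare case for a page frame recycled from another mapping.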

Change user space memory protection flags from kernel module

After some more research, I found a function called get_user_pages() that returns a list of the userspace pages at a given address. These pages can be mapped into kernel space with kmap() and written to that way (in my case, using kernel_read()). This can replace copy_to_user() because it allows forcing write permission on the retrieved pages. The only drawback is that you have to write page by page instead of all in one go, but it does solve the problem I described in my question.
