What Is an Anonymous Inode in Linux

What is an anonymous inode in Linux?

At least in some contexts, an anonymous inode is an inode without an attached directory entry. The easiest way to create such an inode is as such:

int fd = open( "/tmp/file", O_CREAT | O_RDWR, 0666 );
unlink( "/tmp/file" );
// Note that the descriptor fd now points to an inode that has no filesystem entry; you
// can still write to it, fstat() it, etc. but you can't find it in the filesystem.

What are memory mapped page and anonymous page?

The correct terms are memory mapped files and anonymous mappings. When referring to memory mapping, one is usually referring to mmap(2). There are 2 categories for using mmap. One category is SHARED vs PRIVATE mappings. The other category is FILE vs ANONYMOUS mappings. Mixed together you get the 4 following combinations:

  1. PRIVATE FILE MAPPING
  2. SHARED FILE MAPPING
  3. PRIVATE ANONYMOUS MAPPING
  4. SHARED ANONYMOUS MAPPING

A File Mapping specifies a file, on disk, that will have N many bytes mapped into memory. The function mmap(2) takes, as its 4th argument a file descriptor to the file to be mapped into memory. The 5th argument is the number of bytes to be read in, as an offset. The typical process of using mmap to create a memory mapped file goes

  1. open(2) file to get a file descriptor.
  2. fstat(2) the file to get the size from the file descriptor data structure.
  3. mmap(2) the file using the file descriptor returned from open(2).
  4. close(2) the file descriptor.
  5. do whatever to the memory mapped file.

When a file is mapped in as PRIVATE, changes made are not committed to the underlying file. It is a PRIVATE, in-memory copy of the file. When a file is mapped SHARED, changes made are committed to the underlying file by the kernel automatically. Files mapped in as shared can be used for what is called Memory Mapped I/O, and IPC. You would use a memory mapped file for IPC instead of a shared memory segment if you need the persistence of the file

If you use strace(1) to watch a process initialize, you will notice that the different sections of the file are mapped in using mmap(2) as private file mappings. The same is true for system libs.

Examples of output from strace(1) where mmap(2) is being used to map in libraries to the process.

open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=42238, ...}) = 0
mmap(NULL, 42238, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7ff7ca71e000
close(3) = 0
open("/lib64/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\356\341n8\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1926760, ...}) = 0
mmap(0x386ee00000, 3750152, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x386ee00000

Anonymous mappings are not backed by a file. To be specific, the 4th (file descriptor) and 5th (offset) argument of mmap(2) are not even used when the MAP_ANONYMOUS flag is used as the 3rd argument to mmap(2). An alternative to using the MAP_ANONYMOUS flag is to use /dev/zero as the file.

The word 'Anonymous' is, to me, a poor choice in that it sounds as if the file is mapped anonymously. Instead, it is the file that is anonymous, ie. there isn't a file specified.

Uses for private anonymous mappings are few in user land programming. You could use a shared anonymous mapping so that applications could share a region of memory, but I do not know the reason why you wouldn't use SYSV or POSIX shared memory instead.

Since memory mapped in using Anonymous mappings is guaranteed to be zero filled, it could be useful for some applications that expect/require zero filled regions of memory to use mmap(2) in this way instead of the malloc(2) + memset(2) combo.

Relinking an anonymous (unlinked but open) file

A patch for a proposed Linux flink() system call was submitted several years ago, but when Linus stated "there is no way in HELL we can do this securely without major other incursions", that pretty much ended the debate on whether to add this.

Update: As of Linux 3.11, it is now possible to create a file with no directory entry using open() with the new O_TMPFILE flag, and link it into the filesystem once it is fully formed using linkat() on /proc/self/fd/fd with the AT_SYMLINK_FOLLOW flag.

The following example is provided on the open() manual page:

    char path[PATH_MAX];
fd = open("/path/to/dir", O_TMPFILE | O_RDWR, S_IRUSR | S_IWUSR);

/* File I/O on 'fd'... */

snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd);
linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file", AT_SYMLINK_FOLLOW);

Note that linkat() will not allow open files to be re-attached after the last link is removed with unlink().

creating inode while creating pipe, fifo or socket

No inode will be created for an anonymous pipe or a socket, as an inode is a property of a filesystem and neither of these two lives as a filesystem entity (they don't have a file path). They only have file descriptors.

However, for named pipes (aka fifo) an inode is created as it lives as an filesystem entity.

SMAPS Un-named segment of memory

The mappings with inode number 0 are anonymous mappings - essentially those that have been created with the MAP_ANONYMOUS flag to mmap().

This just means that they aren't associated with a disk file. The inode number isn't going to change though; it'll always stay as 0 for that mapping.

Anonymous mappings don't get converted to heap. In fact "[heap]" is just a convenience marker for the anonymous mapping that is set up by the kernel at exec time and altered by the brk() system call.

Understanding Linux /proc/pid/maps or /proc/self/maps

Each row in /proc/$PID/maps describes a region of contiguous virtual memory in a process or thread. Each row has the following fields:

address           perms offset  dev   inode   pathname
08048000-08056000 r-xp 00000000 03:0c 64593 /usr/sbin/gpm
  • address - This is the starting and ending address of the region in the process's address space
  • permissions - This describes how pages in the region can be accessed. There are four different permissions: read, write, execute, and shared. If read/write/execute are disabled, a - will appear instead of the r/w/x. If a region is not shared, it is private, so a p will appear instead of an s. If the process attempts to access memory in a way that is not permitted, a segmentation fault is generated. Permissions can be changed using the mprotect system call.
  • offset - If the region was mapped from a file (using mmap), this is the offset in the file where the mapping begins. If the memory was not mapped from a file, it's just 0.
  • device - If the region was mapped from a file, this is the major and minor device number (in hex) where the file lives.
  • inode - If the region was mapped from a file, this is the file number.
  • pathname - If the region was mapped from a file, this is the name of the file. This field is blank for anonymous mapped regions. There are also special regions with names like [heap], [stack], or [vdso]. [vdso] stands for virtual dynamic shared object. It's used by system calls to switch to kernel mode. Here's a good article about it: "What is linux-gate.so.1?"

You might notice a lot of anonymous regions. These are usually created by mmap but are not attached to any file. They are used for a lot of miscellaneous things like shared memory or buffers not allocated on the heap. For instance, I think the pthread library uses anonymous mapped regions as stacks for new threads.



Related Topics



Leave a reply



Submit