Data Pointers in Inode Data Structure


It is up to each specific file system how it accesses its data, so there are no "data pointers" in general (some file systems are virtual, meaning they generate their data on the fly or retrieve it from the network).

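If you're interested in ext4, you can look up the ext4-specific inode structure (struct ext4_inode) in fs/ext4/ext4.h. For files that use the traditional block map (inherited from ext2/ext3), an inode's data is indeed referenced through the i_block array: 12 direct block numbers, 1 single-indirect, 1 double-indirect and 1 triple-indirect. Files with the extents flag set, which is the default on ext4, store the root of an extent tree in i_block instead.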

This means that blocks [0..11] of an inode's data have the block numbers e4inode->i_block[0], ..., e4inode->i_block[11], whereas e4inode->i_block[12] is the number of a block that is itself filled with data block numbers, so it covers the inode's data blocks in the range [12, 12 + fs->block_size / sizeof(__le32)). The same trick is applied to i_block[13], except that it holds double-indirect indices (a block filled with numbers of blocks, each of which lists the blocks holding the actual data), starting from index 12 + fs->block_size / sizeof(__le32); i_block[14] holds triple-indirect indices.
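The index arithmetic above can be sketched in Python. This is a minimal, hypothetical helper (map_logical_block is not a kernel function), assuming the ext2/ext3-style block map with 12 direct slots and 4-byte block numbers; it does not model ext4 extents. It returns the i_block slot to start from and the entry index to follow inside each indirect block on the way down.

```python
N_DIRECT = 12   # i_block[0..11] are direct pointers
ADDR_SIZE = 4   # sizeof(__le32)

def map_logical_block(lblk: int, block_size: int) -> tuple:
    """Return (i_block slot, entry index inside each indirect block on the path)."""
    per_block = block_size // ADDR_SIZE  # block numbers per indirect block
    if lblk < 0:
        raise ValueError("logical block must be non-negative")
    if lblk < N_DIRECT:
        return (lblk, ())                                   # direct pointer
    lblk -= N_DIRECT
    if lblk < per_block:
        return (12, (lblk,))                                # single indirect
    lblk -= per_block
    if lblk < per_block ** 2:
        return (13, (lblk // per_block, lblk % per_block))  # double indirect
    lblk -= per_block ** 2
    if lblk < per_block ** 3:
        return (14, (lblk // per_block ** 2,
                     (lblk // per_block) % per_block,
                     lblk % per_block))                     # triple indirect
    raise ValueError("logical block is beyond the triple-indirect range")
```

For example, with 4096-byte blocks (1024 entries per indirect block), logical block 1036 is the first one reached through i_block[13]: map_logical_block(1036, 4096) returns (13, (0, 0)).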

Understanding the concept of Inodes

The pointers referred to are disk block addresses: each pointer contains the information necessary to identify a block on disk. Since each disk block is at least 512 bytes (sometimes 4096 or 8192 bytes), 32-bit addresses can reach 2^32 blocks, i.e. up to 512 × 2^32 = 512 × 4 × 1024^3 bytes = 2 TiB (tebibytes, more commonly called terabytes) assuming 1/2 KiB blocks; the limit grows correspondingly with the block size (so 32 TiB at 8 KiB block size). To address larger disks, you would have to move to larger block sizes or larger disk addresses, hence 48-bit or 64-bit addresses might be plausible.
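The arithmetic is simply (number of distinct addresses) × (block size). A small Python sketch, where max_addressable_bytes is a hypothetical helper name:

```python
TiB = 1024 ** 4

def max_addressable_bytes(block_size: int, address_bits: int = 32) -> int:
    """Bytes reachable with address_bits-wide block addresses."""
    return (2 ** address_bits) * block_size  # distinct addresses x bytes per block

for bs in (512, 4096, 8192):
    print(f"{bs:5d}-byte blocks: {max_addressable_bytes(bs) // TiB} TiB")
# prints 2 TiB, 16 TiB and 32 TiB respectively
```

Passing a larger address_bits (48 or 64) shows how quickly the limit grows once addresses are widened.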

So, to answer Q1, 32 bits is a common size for lots of things. Very often, when 32 bits are no longer big enough, the next sensible size is 64 bits.

Answering Q2:

  • With 8 KiB data blocks, if the file is 96 KiB or smaller, then it uses 12 blocks or fewer on disk, and all those block addresses are stored directly in the inode itself.

  • When the file grows bigger, the disk driver allocates a single indirect block and records its address in the inode. When the driver needs to get a data block, it reads the indirect block into memory and then finds the address of the block it needs in the indirect block. Thus, it requires (nominally) two reads to get to the data, though of course the indirect block tends to be cached in memory.

  • With an 8 KiB block size and 4-byte disk addresses, you can fit 2048 disk addresses in the single indirect block. So, for files from 96 KiB + 1 byte to 16 MiB or so, there is only a single indirect block.

  • If a file grows still bigger, then the driver allocates a double indirect block. Each pointer in the double indirect block points to a single indirect block. So, you can have 2048 more indirect blocks, each of which can effectively point at 16 MiB, leading to files of up to 32 GiB (approx) being storable.

  • If a file grows still larger, then the driver allocates a triple indirect block. Each of the 2048 pointers in a triple indirect block points to a double indirect block. So, with 32-bit addresses, files up to about 64 TiB could be described. In practice, though, you run out of disk addresses first: 2^32 blocks of 8 KiB is 32 TiB.
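The thresholds in the list above can be checked with a short Python sketch. It assumes, as the list does, 12 direct pointers and 4-byte block addresses; level_limits is a hypothetical name. Each entry is the largest file size reachable once that level of indirection is in use.

```python
KiB, MiB, GiB, TiB = 1024, 1024 ** 2, 1024 ** 3, 1024 ** 4

def level_limits(block_size: int, n_direct: int = 12, addr_size: int = 4) -> list:
    """Largest file size (bytes) with direct, +single, +double, +triple indirection."""
    per_block = block_size // addr_size     # addresses per indirect block (2048 for 8 KiB)
    total_blocks = n_direct                 # blocks reachable from the direct slots
    limits = [total_blocks * block_size]
    for level in (1, 2, 3):
        total_blocks += per_block ** level  # each level multiplies reach by per_block
        limits.append(total_blocks * block_size)
    return limits

limits = level_limits(8 * KiB)
practical_max = min(limits[-1], (2 ** 32) * 8 * KiB)  # capped by 32-bit disk addresses

for name, size in zip(("direct", "single", "double", "triple"), limits):
    print(f"{name:>6}: {size / MiB:,.1f} MiB")
```

With 8 KiB blocks the four limits come out to 96 KiB, 96 KiB + 16 MiB, roughly 32 GiB and roughly 64 TiB, while practical_max shows the 32 TiB cap imposed by the 32-bit disk addresses.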

So, the inode structure can describe files bigger than 32-bit disk addresses can reach.

I'll leave it as an exercise for the reader to see how things change with 64-bit disk addresses.

The concept of Inodes and block sizes

I'm sure Google could have given you a very complete answer far more quickly and easily than asking here, and indeed even here I get 3,145 search results for posts containing "inode indirect blocks", but since you did ask here, here goes a reply:

Well, an inode structure, on disk, has room for only a certain number of block addresses, along with all the other information it must contain, if it is to fit inside one block itself.

In the case of SysV inodes there's room for 40 bytes of data block addresses, which was broken down into thirteen 3-byte addresses and one byte left over for the "file generation number" (which you can ignore here).

So, you have 13 addresses, how are you going to use them efficiently to address file data blocks for files that contain many more than just 13 data blocks?

The decision was to use the first 10 as direct addresses, i.e. they directly identify the 1st through 10th data blocks of the file. The 11th, 12th, and 13th addresses point to indirect blocks: a single-indirect block, a double-indirect block, and a triple-indirect block respectively.

As the question notes, each indirect block can hold 256 addresses. So, you just have to multiply them out and add them up, bearing in mind that the addresses in the single-indirect block point directly at data blocks, the double-indirect block points at blocks of data-block addresses, and the triple-indirect block points at blocks of pointers to more blocks of data-block addresses.
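Multiplying it out for the SysV case (10 direct addresses, 256 addresses per indirect block) gives the maximum number of data blocks a file can have. A sketch, with max_file_blocks as a hypothetical helper:

```python
def max_file_blocks(n_direct: int, per_indirect: int) -> int:
    """Data blocks reachable from n_direct direct addresses plus one
    single-, one double- and one triple-indirect block."""
    single = per_indirect       # one block of data-block addresses
    double = per_indirect ** 2  # one block of pointers to such blocks
    triple = per_indirect ** 3  # one more level of pointers on top
    return n_direct + single + double + triple

print(max_file_blocks(10, 256))  # 16843018
```

Multiplying the result by the block size gives the maximum file size in bytes.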

This page has a nice diagram, and in this case perhaps your confusion will not be cleared up without such a diagram. Note this page talks about details that differ slightly from the strict SysV on-disk format (it has more direct blocks, and :

Understanding Indirect Blocks in Unix File Systems

Are file names actually pointers to their respective data stored?

I might suggest reading the following links which explain the concepts of files, filesystems, and inodes.

To quickly summarize, however, your intuition is correct that a filename is not directly linked to a file's data. A filename lives in a directory entry, which associates it with an inode: a data structure that contains metadata about the file and points to where the file's data can be found on disk. Renaming a file simply changes the name in the directory entry; the inode itself does not change and still points to the same data on disk.
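A toy Python model of this (Inode, inode_table, directory and rename here are all hypothetical, not a real file system API) shows that a rename only touches the name-to-inode-number mapping:

```python
from dataclasses import dataclass, field

@dataclass
class Inode:
    number: int
    size: int
    block_numbers: list = field(default_factory=list)  # where the data lives on disk

# Toy model: the inode table maps inode numbers to inodes,
# and a directory maps names to inode numbers.
inode_table = {42: Inode(number=42, size=8192, block_numbers=[1001, 1002])}
directory = {"report.txt": 42}

def rename(entries: dict, old: str, new: str) -> None:
    """Move a name to a new name; the inode it refers to is untouched."""
    entries[new] = entries.pop(old)

rename(directory, "report.txt", "final.txt")
print(directory)                            # {'final.txt': 42}
print(inode_table[directory["final.txt"]])  # same inode, same blocks
```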

struct inode in system has number of blocks & block numbers but no corresponding data offset byte in storage device. How does the filesystem even work without it?

Short sketch for finding inode number ii:

  • Find the inode block where ii lives: ii / InodesPerBlock (integer division); use this as an index into the inode blocks.
  • Find the index within that block: ii % InodesPerBlock.
  • Treat (cast) this location as an Inode, and use the first entry in its blocks[] array as the number of the first data block.

To find a file by name, this operation must be preceded by a similar one that finds the directory entry and reads the file's inodeNumber from it.
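The inode-locating steps can be sketched in Python. The Superblock fields and locate_inode are hypothetical names, and the sketch assumes inodes are numbered from 0 and stored in one contiguous table starting at first_inode_block; real file systems often number inodes from 1 and split the table into groups.

```python
from dataclasses import dataclass

@dataclass
class Superblock:
    block_size: int         # bytes per block
    inode_size: int         # bytes per on-disk inode
    first_inode_block: int  # block number where the inode table starts

    @property
    def inodes_per_block(self) -> int:
        return self.block_size // self.inode_size

def locate_inode(sb: Superblock, ii: int) -> tuple:
    """Return (block number, byte offset inside that block) of inode ii."""
    block = sb.first_inode_block + ii // sb.inodes_per_block  # which inode block
    offset = (ii % sb.inodes_per_block) * sb.inode_size        # which slot in it
    return block, offset

sb = Superblock(block_size=512, inode_size=64, first_inode_block=2)
print(locate_inode(sb, 17))  # (4, 64)
```

The caller would then read that block, interpret the inode_size bytes at the returned offset as an inode, and take the first entry of its blocks[] array.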

NOTE: there are a lot of manifest constants; these can all be found in the superblock:

  • Block size
  • File system type
  • Size of a block number
  • Number of inode blocks (or: number of inodes)
  • Size of an inode (or: InodesPerBlock)
  • Number of data blocks
  • Location of the inode containing the root directory
  • Location of the freelist containing the unused block numbers
  • State/backup/...
  • et cetera ...

NOTE2: this is a simplified scheme. Modern file systems may contain additional structures for efficiency or redundancy. Also, a filesystem may combine more than one physical block into one logical block (e.g. 8 × 512-byte sectors → one 4096-byte block).
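Mapping one logical block onto the physical sectors behind it is just a multiplication. A Python sketch, assuming 512-byte sectors and a file system that starts at sector 0 of the device (a real partition adds its starting offset); logical_to_sectors is a hypothetical name:

```python
SECTOR_SIZE = 512  # bytes per physical sector (assumed)

def logical_to_sectors(lblock: int, block_size: int = 4096) -> range:
    """Physical sector numbers backing logical block lblock.

    Assumes the file system starts at sector 0 of the device.
    """
    sectors_per_block = block_size // SECTOR_SIZE  # 8 for a 4096-byte block
    first = lblock * sectors_per_block
    return range(first, first + sectors_per_block)

print(list(logical_to_sectors(3)))  # [24, 25, 26, 27, 28, 29, 30, 31]
```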

NOTE3: block number 0 is reserved at the start of the file system (for the boot block and/or the superblock, depending on the layout), so it never holds file data. That means 0 can be used as a sentinel value for block numbers that refer to actual data blocks, and the block-number arrays inside the inodes can be initialized to all zeros.
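One way a file system can use that sentinel: a block-map entry of 0 means "no block allocated", and such a hole reads back as zeros, as in classic Unix file systems. A Python sketch with a hypothetical read_block helper and a dict standing in for the disk:

```python
HOLE = 0  # block number 0 never holds file data, so it can mean "not allocated"

def read_block(block_map: list, lblk: int, disk: dict, block_size: int) -> bytes:
    """Return the contents of logical block lblk of a file.

    block_map is the inode's array of block numbers; disk maps block
    numbers to their contents (a stand-in for the real device).
    """
    if lblk >= len(block_map) or block_map[lblk] == HOLE:
        return bytes(block_size)  # unallocated block reads back as zeros
    return disk[block_map[lblk]]

disk = {7: b"DATA"}
block_map = [7, HOLE, HOLE]       # zero-initialized except slot 0
print(read_block(block_map, 0, disk, 4))  # b'DATA'
print(read_block(block_map, 1, disk, 4))  # b'\x00\x00\x00\x00'
```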

NOTE4: these structures all reside on disk. The kernel may (and will!) maintain additional, similar structures in memory. These in-memory structures refer either to each other (using pointers, offsets, or indices) or to disk blocks (by number).
