What Is the Fastest Way to Find All the Files with the Same Inode

What is the fastest way to find all the files with the same inode?

Here's a way:

  • Use find -printf "%i:\t%p\n" (GNU find) or similar to create a listing of all files prefixed by inode number, and write it to a temporary file.
  • Extract the first field (the inode number with ':' appended), sort it to bring duplicates together, and keep only the duplicates, using cut -f 1 | sort | uniq -d. Write that to a second temporary file.
  • Use fgrep -f (equivalently grep -F -f) to load the second file as a list of fixed strings and search the first temporary file for them. Note that this is a substring match, so the entry 12: also matches the lines for inode 112:; filter on the first field afterwards if that matters.
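For anyone who would rather do the first step from C than from find, here is a minimal sketch. It assumes you only care about regular files whose link count is above 1, and it writes the same "inode:<TAB>path" lines as the listing above. The function name index_multilinked is made up for illustration, and the sketch does not stop at mount points, so it is only meaningful when the whole tree sits on one filesystem.

```c
#include <dirent.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: recursively print "inode:<TAB>path" for each regular
 * file under dir whose link count is above 1. Returns the number of lines
 * written, or -1 if dir itself cannot be opened. */
long index_multilinked(const char *dir, FILE *out)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    long hits = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;

        char path[4096];
        int n = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;                   /* path too long for this sketch */

        struct stat sb;
        if (lstat(path, &sb) == -1)     /* lstat: do not follow symlinks */
            continue;

        if (S_ISDIR(sb.st_mode)) {
            long sub = index_multilinked(path, out);
            if (sub > 0)                /* unreadable subdirectories are skipped */
                hits += sub;
        } else if (S_ISREG(sb.st_mode) && sb.st_nlink > 1) {
            fprintf(out, "%ju:\t%s\n", (uintmax_t)sb.st_ino, path);
            hits++;
        }
    }
    closedir(d);
    return hits;
}
```

Calling index_multilinked(".", stdout) and piping the result through sort groups the names of each hard-linked file together, since files with a link count of 1 never make it into the listing.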

(When I wrote this, I interpreted the question as finding all files which had duplicate inodes. Of course, one could use the output of the first half of this as a kind of index, from inode to path, much like how locate works.)

On my own machine, I use these kinds of files a lot, and keep them sorted. I also have a text indexer application which can then apply binary search to quickly find all lines that have a common prefix. Such a tool ends up being quite useful for jobs like this.

I have two identical files in different directories with the same inode

I had a similar query. The answer is long, but hopefully it will help you:

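Two different files can have the same inode number, but only if they are on different partitions. (Two names with the same inode number on the same partition are hard links to one file; more on that below.)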

Inodes are only unique on a partition level, not on the whole system.
On each partition, there is a superblock. This superblock tells the system which inodes are used, which are free, etc. (I'll spare you the technical details).

Each item on the disk (files, but also directories, FIFOs and special device files) has its own inode. All the inodes are stored on the disk, right beside the superblock (normally).

For instance, in the case of regular files, inodes simply contain information such as the last access/modification times, the size of the file, the file permissions, the disk blocks it occupies, etc.

For directories, inodes tell the system where the blocks that contain the contents of the directory are stored, as well as the last access/modified dates and permission on the directory.

You can see this if you look at the size of a directory via "ls -ld dir". It is usually a multiple of the filesystem block size (typically between 1 KiB and 4 KiB; 4096 bytes is common on ext4). The contents of directory blocks are nothing more than a list of name-inode pairs (i.e. a filename + its inode number). When you do an "ls", those contents get printed, without the need for the system to actually locate all files in the directory. If you access a file or subdirectory, the system simply looks up the inode number from the directory contents and then retrieves the inode in question, so that you can have fast access to the file/subdirectory in question.
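To see those name-inode pairs for yourself, something like the following sketch works. It reads only what readdir() hands back and never calls stat() on the entries. The function name list_dir_inodes is my own, not a library call.

```c
#include <dirent.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: print each directory entry as "inode<TAB>name",
 * using only the data returned by readdir(); no stat() call is made on
 * the entries themselves. Returns the number of entries printed, or -1
 * if dir cannot be opened. */
long list_dir_inodes(const char *dir, FILE *out)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    long count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        fprintf(out, "%ju\t%s\n", (uintmax_t)e->d_ino, e->d_name);
        count++;
    }
    closedir(d);
    return count;
}
```

The output of list_dir_inodes(".", stdout) should line up with what "ls -ai" shows for the same directory, apart from ordering.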

So, you can immediately see that the inode information describes directly what's on a partition to the system. It is the core of the filesystem on your partition.

Unless you are using special software like LVM to make the system believe that multiple physical disks or multiple partitions act as one partition, each partition needs to know its own contents. Otherwise, you would never be able to share disks, for instance via NFS mounting (each computer that shares the disk must know its contents).

Continuing this line of thought, it is logical that inodes are unique on the (logical) partition level.

To answer your question on hard links, you first need to know the difference between a hard link and a soft link (or symbolic link). Let's say you want to link A to B, regardless of what A and B really are (files, directories, device files, etc.), thus creating two ways of accessing the same item on your filesystem.

Using a soft link is easy. A and B both have their own inodes and both are part of different directories.

However, A actually contains the path and name of B (either a full path or one relative to A's directory). When your system tries to access A, it will see the reference to B, locate B by following the path and then access it. Since a filesystem path is used rather than an inode number, soft links work across different partitions. If A is indeed a soft link to B on a different partition and B's partition is unmounted, then the link will continue to exist but will simply point to something unreachable. So you can't access A either. The same goes when B gets deleted. If A (the soft link) is deleted, then B is still there, unaltered.
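The "link still exists but points to something unreachable" case can be detected from code. This is a small sketch, assuming a POSIX system; is_dangling_symlink is a name I picked for illustration. It relies on lstat() looking at the link itself while stat() follows the stored path.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if path is a symbolic link whose target
 * cannot be reached, 0 if path is anything else, and -1 if path itself
 * cannot be examined. */
int is_dangling_symlink(const char *path)
{
    struct stat link_sb, target_sb;

    if (lstat(path, &link_sb) == -1)    /* lstat() inspects the link itself */
        return -1;
    if (!S_ISLNK(link_sb.st_mode))
        return 0;
    /* stat() follows the stored path; failure means the target is unreachable. */
    return stat(path, &target_sb) == -1 ? 1 : 0;
}
```

A link whose target was deleted, or whose target lives on a partition that is currently unmounted, would both be reported as dangling here.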

Hard links are a different story. As I explained, the contents of a directory are nothing more than pairs of inode numbers and names. The inodes are used to access the actual items in the directory. The names are just there for the ease-of-use, more or less. A hard link is nothing more than copying the inode number from one entry in some directory's contents into another entry. This second entry can be in a different directory's contents or even in the same directory (under a different name).

Since both directory entries have the same inode number, they point to the same item on the disk (i.e. the same physical file). Of course, inode numbers, as explained above, are partition-specific. So, duplicating an inode number on a different partition would not work as expected. That's why hard links cannot work across partitions.

Internally, the inodes of items such as files and directories also contain a link counter. This counter holds the number of (hard) links to the item. When you delete the item (using "rm" for instance), you internally "unlink" it (hence the term "unlink" instead of "delete" or "remove" that you see in the system call and in some languages, like Perl). Unlinking simply decreases the link counter in the inode and deletes the entry in the directory's contents list. If the link counter drops to 0 (the last link is deleted) and no process still has the item open, then the disk blocks occupied by the item get freed. When new items are created on the disk later on, they may use the freed blocks and overwrite them. In other words, when the last link is deleted, the item becomes unreachable (as if it was deleted from the disk). So, as long as the last link isn't deleted, the contents of the item are still accessible and usable via the remaining hard links.
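If you want to watch the link counter move, here is a rough sketch, assuming both names are on the same partition and neither exists yet. The function name demo_link_counter and its parameters are made up for this example; link() and unlink() are the system calls behind "ln" and "rm".

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: create file a, add a second name b for it, then remove
 * the name a. *before receives the link count while both names exist,
 * *after the count once only b is left. Returns 0 on success, -1 on failure. */
int demo_link_counter(const char *a, const char *b, long *before, long *after)
{
    struct stat sb;

    int fd = open(a, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd == -1)
        return -1;
    close(fd);

    if (link(a, b) == -1) {             /* same as "ln a b" */
        unlink(a);
        return -1;
    }
    if (stat(b, &sb) == -1)
        return -1;
    *before = (long)sb.st_nlink;

    if (unlink(a) == -1)                /* same as "rm a" */
        return -1;
    if (stat(b, &sb) == -1)             /* the item is still reachable via b */
        return -1;
    *after = (long)sb.st_nlink;
    return 0;
}
```

On a filesystem that supports hard links I would expect *before to be 2 and *after to be 1. The sketch leaves b in place, so remove it yourself when you are done.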

Symbolic links are created by "ln -s", hard links via simple "ln". See ln's man pages for details.

Find all hard links of a certain file

Go through each file in the directory and lstat() it. If its inode number (st_ino) and device (st_dev) are the same as those of the file you're interested in, and they both have the same link count (st_nlink), which is greater than 1, then they're hard-linked together.

(The link count check isn't strictly necessary, but it's a good sanity check.)
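A minimal sketch of that loop, assuming a single flat directory (no recursion). The name find_links_in_dir is hypothetical. It compares st_dev together with st_ino so a mount point inside the directory cannot produce a false match, and it skips the optional link count check.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: print every entry of dir that is a hard link to
 * target (target's own name included if it lives in dir). Returns the
 * number of matches, or -1 if target or dir cannot be examined. */
long find_links_in_dir(const char *target, const char *dir, FILE *out)
{
    struct stat want;
    if (lstat(target, &want) == -1)
        return -1;

    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    long matches = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        char path[4096];
        int n = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;                   /* path too long for this sketch */

        struct stat sb;
        if (lstat(path, &sb) == -1)
            continue;
        /* Same device and same inode number means same underlying file. */
        if (sb.st_dev == want.st_dev && sb.st_ino == want.st_ino) {
            fprintf(out, "%s\n", path);
            matches++;
        }
    }
    closedir(d);
    return matches;
}
```

To cover a whole tree you would have to recurse into subdirectories yourself, which is essentially what "find -samefile" does in GNU find.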

How to check whether two file names point to the same physical file

On Linux, open both files and use fstat to check whether st_ino and st_dev are the same. open will follow symbolic links. Don't use stat directly on the names, to prevent race conditions.
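As a sketch of that check (same_file is a name chosen here for illustration): it assumes both paths can be opened read-only, so it will report an error for files you cannot read, and it is not meant for FIFOs, where open() may block.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if both names open the same file, 0 if
 * they open different files, and -1 if either cannot be opened or examined. */
int same_file(const char *a, const char *b)
{
    int result = -1;
    int fa = open(a, O_RDONLY);
    int fb = open(b, O_RDONLY);

    if (fa != -1 && fb != -1) {
        struct stat sa, sb;
        /* fstat() works on the open descriptors, so the names cannot be
         * swapped underneath us between the lookup and the comparison. */
        if (fstat(fa, &sa) == 0 && fstat(fb, &sb) == 0)
            result = (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino);
    }
    if (fa != -1)
        close(fa);
    if (fb != -1)
        close(fb);
    return result;
}
```

Because open() follows symbolic links, a symlink and its target compare as the same file here, as do two hard links to one inode.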

What is the fastest way to detect file size is not zero without knowing the file descriptor?

You should probably benchmark it for yourself.

I've measured

//Real-time System-time
272.58 ns(R) 170.11 ns(S) //lseek
366.44 ns(R) 366.28 ns(S) //fstat
812.77 ns(R) 711.69 ns(S) //stat("/etc/profile",&sb)

on my Linux laptop. It fluctuates a little between runs, but lseek is usually a bunch of ns faster than fstat. You also need an fd for it, though, and opening is quite expensive at about 1.6µs, so stat is probably the best choice for your case.


As tom-karzes has noted, the cost of stat should depend on the number of directory components in the path. I tried it on a PATH_MAX-long "/foo/foo/.../foo" directory, and there I'm getting about 80µs.
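For completeness, the stat-based check is only a few lines. This is a sketch with a made-up name, file_is_nonempty, and it assumes a regular file: for things like device nodes or /proc entries, st_size is not a reliable indicator of whether there is data to read.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if path names a file whose size is above
 * zero, 0 if the size is zero, and -1 if stat() fails (for example because
 * the path does not exist). No file descriptor is needed. */
int file_is_nonempty(const char *path)
{
    struct stat sb;
    if (stat(path, &sb) == -1)
        return -1;
    return sb.st_size > 0 ? 1 : 0;
}
```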

Fast way to find the number of files in one directory on Linux

Why should the data structure contain the number? A tree doesn't need to know its size in O(1) unless that's a requirement (and providing it could require more locking and possibly create a performance bottleneck).

By tree I don't mean including subdir contents, but files with -maxdepth 1 -- supposing they are not really stored as a list..

edit: ext2 stored them as a linked list.

Modern ext3 implements hashed B-trees (HTree).

Having said that, /bin/ls does a lot more than counting, and actually scans all the inodes. Write your own C program or script using opendir() and readdir().

from here:

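#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>

int main(void)
{
    int count = 0;              /* must start at 0 in case opendir() fails */
    DIR *d = opendir(".");      /* DIR is a typedef, not a struct tag */

    if (d == NULL) {
        perror("opendir");
        return 1;
    }
    /* Count every entry readdir() returns; this includes "." and "..". */
    while (readdir(d) != NULL)
        count++;
    closedir(d);
    printf("%d\n", count);
    return 0;
}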

