Regarding Hard Link

Back in the days of 7th Edition (or Version 7) UNIX, there were no mkdir(2) and rmdir(2) system calls. The mkdir(1) program was SUID root; it used the mknod(2) system call to create the directory and the link(2) system call to make the entries for . and .. in the new directory, and link(2) permitted only root to link to a directory. Consequently, way back then (circa 1978), the superuser could create links to directories, but only the superuser was permitted to do so, to ensure that there were no problems with cycles or dangling directories. There were diagnostic programs to pick up the pieces if, for example, the system crashed while a directory was partly created.


You can find the Unix 7th Edition manuals at Bell Labs. Sections 2 and 3 are devoid of mkdir(2) and rmdir(2). You used the mknod(2) system call to make the directory:

NAME


mknod – make a directory or a special file

SYNOPSIS

mknod(name, mode, addr)
char *name;

DESCRIPTION


Mknod creates a new file whose name is the null-terminated string pointed to by name. The mode of
the new file (including directory and special file bits) is initialized from mode. (The protection part of
the mode is modified by the process’s mode mask; see umask(2)). The first block pointer of the i-node
is initialized from addr. For ordinary files and directories addr is normally zero. In the case of a special
file, addr specifies which special file.

Mknod may be invoked only by the super-user.

SEE ALSO


mkdir(1), mknod(1), filsys(5)

DIAGNOSTICS


Zero is returned if the file has been made; -1 if the file already exists or if the user is not the super-user.

The entry for link(2) states:

DIAGNOSTICS


Zero is returned when a link is made; -1 is returned when name1 cannot be found; when name2 already
exists; when the directory of name2 cannot be written; when an attempt is made to link to a directory by
a user other than the super-user; when an attempt is made to link to a file on another file system; when a
file has too many links.

The entry for unlink(2) states:

DIAGNOSTICS


Zero is normally returned; -1 indicates that the file does not exist, that its directory cannot be written,
or that the file contains pure procedure text that is currently in use. Write permission is not required on
the file itself. It is also illegal to unlink a directory (except for the super-user).

The manual page for the ln(1) command noted:

It is forbidden to link to a directory or to link across file systems.

The manual page for the mkdir(1) command notes:

Standard entries, '.', for the directory itself, and '..'
for its parent, are made automatically.

This would not be worthy of comment were it not that it was possible to create directories without those links.


Nowadays, the mkdir(2) and rmdir(2) system calls are standard and permit any user to create and remove directories, preserving the correct semantics. There is no longer a need to permit users to create hard links to directories. This is doubly true since symbolic links were introduced; they were not in 7th Edition UNIX, but were in the BSD versions of UNIX from quite early on.


With normal directories, the .. entry unambiguously links back to the (single, solitary) parent directory. If you have two hard links (two names) for the same directory in different directories, where does the .. entry point? Presumably to the original parent directory; and presumably there is no way to get to the 'other' parent directory from the linked directory. That's an asymmetry that can cause trouble. Normally, if you do:

chdir("./subdir");
chdir("..");

(where ./subdir is not a symbolic link), then you will be back in the directory you started from. If ./subdir is a hard link to a directory somewhere else, then you will be in a different directory from where you started after the second chdir(). You'd have to show that with a pair of stat() calls before and after the chdir() operations shown.
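The stat()-pair check described above can be sketched like this (Python used for brevity; the scratch-directory layout is illustrative). With an ordinary subdirectory, the two stats name the same directory, because they share a device and inode number:

```python
import os
import tempfile

def same_dir(a, b):
    # Two stat results refer to the same directory iff device and inode match.
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "subdir"))

os.chdir(root)
before = os.stat(".")   # stat of the starting directory
os.chdir("./subdir")
os.chdir("..")          # '..' of an ordinary subdir leads back to its one parent
after = os.stat(".")

print(same_dir(before, after))  # True for an ordinary (non-linked) subdirectory
```

If ./subdir were a hard link to a directory elsewhere, the second stat would show a different device/inode pair.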

What is the difference between a symbolic link and a hard link?

Underneath the file system, each file is represented by an inode (exactly one inode per file).

A file in the file system is basically a link to an inode.

A hard link, then, just creates another directory entry pointing to the same underlying inode.

When you delete a file, it removes one link to the underlying inode. The inode is only deleted (or deletable/over-writable) when all links to the inode have been deleted.

A symbolic link is a link to another name in the file system.

Once a hard link has been made, the link is to the inode. Deleting, renaming, or moving the original file will not affect the hard link, as it links to the underlying inode. Any changes to the data on the inode are reflected in all files that refer to that inode.

Note: hard links are only valid within the same file system. Symbolic links can span file systems, as they simply store the name of another file.
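These rules can be demonstrated in a few lines of Python on a POSIX system (the file names are illustrative):

```python
import os
import tempfile

d = tempfile.mkdtemp()
target = os.path.join(d, "f1")
with open(target, "w") as f:
    f.write("hello")

hard = os.path.join(d, "f2")
soft = os.path.join(d, "f3")
os.link(target, hard)     # another directory entry for the same inode
os.symlink(target, soft)  # a new inode that merely stores the path to f1

print(os.stat(target).st_ino == os.stat(hard).st_ino)  # True: one inode, two names
print(os.stat(target).st_nlink)                        # 2

os.remove(target)              # drops one link; the inode survives
print(open(hard).read())       # hello: data still reachable via the other link
print(os.path.exists(soft))    # False: the symlink now dangles
```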

Find all paths for hard-linked files with Java

The OS doesn't provide a way to do this efficiently. What you need to do is to traverse the file system and look for all file entries with the same inode number as the one you already have.

You would do it the same way in Java.

(This Q&A explains how to get the inode number for a file in Java: Uniquely identify file in Java)
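The brute-force traversal can be sketched as follows (Python used for brevity; a Java version would walk the tree with Files.walk and compare BasicFileAttributes.fileKey() values instead). The function name and root parameter are illustrative:

```python
import os

def find_hardlink_paths(target, root):
    """Collect every path under `root` that is a hard link to `target`,
    i.e. has the same (st_dev, st_ino) pair."""
    want = os.stat(target)
    key = (want.st_dev, want.st_ino)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat, so symlinks to the target don't count as hard links
                info = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if (info.st_dev, info.st_ino) == key:
                matches.append(path)
    return matches
```

Note that the scan must cover the whole file system (or at least every directory that could contain a link) to be exhaustive, which is why there is no cheap way to do this.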

How to determine if two file are hard-linked to the same data?

After following Bennett Yeo's suggestion, I found the following:

There's no direct way to check whether two files are linked to the same data, but we can build our own check by comparing the files' unique IDs (inodes on UNIX-based systems). In my understanding, this value serves as an index to the actual content on disk.

Bennett also linked this thread, which gave me two ways to get a file's unique ID:

  1. The linked answer proposed calling GetFileInformationByHandle from kernel32.dll. As the method's name implies, I must first get a handle for the file, but whenever I try to get one, an exception is thrown saying that the targeted file is being used by another process.
  2. Using the command fsutil file queryfileid <filename> (credit to this answer).

The second method worked for me, so I wrote the following code:

private static string InvokeShellAndGetOutput(string fileName, string arguments) {
    Process p = new Process();
    p.StartInfo.UseShellExecute = false;
    p.StartInfo.RedirectStandardOutput = true;
    p.StartInfo.FileName = fileName;
    p.StartInfo.Arguments = arguments;
    p.Start();
    string output = p.StandardOutput.ReadToEnd();
    p.WaitForExit();
    return output;
}

public static long GetFileId(this FileInfo fileInfo) {
    // Call "fsutil" to get the unique file id; quote the path in case it contains spaces.
    string output = InvokeShellAndGetOutput("fsutil", $"file queryfileid \"{fileInfo.FullName}\"");

    // Strip the leading "File ID is " and the trailing EOL; what remains is a hex string with a "0x" prefix.
    string parsedOutput = output.Remove(0, 11).Trim();
    return Convert.ToInt64(parsedOutput, 16);
}

public static bool IsHardlinkedToSameData(this FileInfo fileInfo, FileInfo otherFileInfo) {
    return fileInfo.GetFileId() == otherFileInfo.GetFileId();
}

It's patchy but I feel it's already more reliable than my previous ideas. As long as the host running the test has "fsutil" installed, it should work.

Any more reliable solutions are still welcomed.
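For comparison, on runtimes that expose the device and inode pair directly, the whole check collapses to two stat calls. Python's os.path.samefile, for instance, compares (st_dev, st_ino) for you; a minimal sketch (file names are illustrative):

```python
import os
import tempfile

d = tempfile.mkdtemp()
a = os.path.join(d, "a.txt")
b = os.path.join(d, "b.txt")
c = os.path.join(d, "c.txt")

with open(a, "w") as f:
    f.write("data")
os.link(a, b)            # hard link: b is another name for a's inode
with open(c, "w") as f:
    f.write("data")      # identical contents, but a separate file

print(os.path.samefile(a, b))  # True: same device and inode
print(os.path.samefile(a, c))  # False: equal contents don't make files the same
```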

How can I programmatically distinguish hard links from real files in Windows 7?

You can't, because all files are hard links. No. Really. A file is just a hard link to a chunk of data: a listing in a directory. (Perhaps you mean symlinks? You can distinguish those...)

Use the builtin methods Windows provides for calculating used space instead.

EDIT: Reference (emphasis mine)

The link itself is only a directory entry, and does not have a security descriptor. Therefore, when you change the security descriptor of a hard link, you change the security descriptor of the underlying file, and all hard links that point to the file allow the newly specified access.

USN journal for hard links

The journal will get a USN_REASON_HARD_LINK_CHANGE entry when you add the hard link. Then, as time goes on, any of the hard links may be opened and changes made. The subsequent USN entries will all reference the original file's FileReferenceNumber, but will contain a FileName and ParentFileReferenceNumber that depend on which link was actually opened. This is what you have available to distinguish between links. Note that it might be tempting to distinguish using only the ParentFileReferenceNumber, but this isn't really safe. While the most widely used pattern is to have the same-named link in different directories, you could have a link in the same directory but with a different name.

Note on moved links: If you choose to read the USN in "summary mode" (your READ_USN_JOURNAL_DATA_V0 has ReturnOnlyOnClose = 1), where you only read the entries that have accumulated to the point of the file closing, you can miss the USN_REASON_RENAME_OLD_NAME entries...and lose track of which link the rename was made through. This kind of USN record doesn't accumulate into the file close event...I'm guessing because of the potential collision of ParentFileReferenceNumber and FileName.

Windows hard links disk usage inconsistency

According to Harry Johnston:

"Explorer isn't the file system. The fact that Explorer doesn't take hard links into account when calculating the total size of a group of files doesn't affect the amount of disk space available. (If you look at the properties of the drive rather than of a particular set of files, Explorer asks the file system for the actual amount of space used and available on the volume. Those figures are correct.)"

This was the answer I was looking for. Thanks!

Hard link and Symbolic links in Unix

Yes, and no :-)

In UNIX, the contents of a file are distinct from the directory entries for that file. You can have multiple directory entries pointing to the same contents (look up inode for a description of how this works) and, here's the tricky bit:

All those directory entries are equal. Even though one may have been created first, there's nothing special about it. If you remove it, the contents don't disappear, just the directory entry. The contents disappear only once the inode has zero directory entries pointing to it and all processes have closed the file. (I've been bitten before when trying to free disk space by deleting log files, only to find that, because a process still had a file open, the space wasn't reclaimed even though no directory entries pointed to the contents.)

That's for hard links.

Soft links are a bit trickier. They do create a "file" of sorts (a separate inode), containing the path to the target file. And those links are not equal. Deleting the original will leave you with a soft link pointing nowhere.

Because inodes are unique on a given filesystem, hard links cannot refer to data on a different filesystem.

Soft links do not have that limitation since they store the path to the target file, not its inode.

The following transcript may help:

$ echo hello >f1
$ ln f1 f2
$ ln -s f1 f3
$ ls -ial f*
7385 -rw-r--r-- 2 pax None 6 May 11 14:09 f1
7385 -rw-r--r-- 2 pax None 6 May 11 14:09 f2
4672 lrwxrwxrwx 1 pax None 6 May 11 14:09 f3 -> f1
$ cat f1
hello
$ cat f2
hello
$ cat f3
hello
$ rm f1
$ ls -ial f*
7385 -rw-r--r-- 2 pax None 6 May 11 14:09 f2
4672 lrwxrwxrwx 1 pax None 6 May 11 14:09 f3 -> f1
$ cat f1
cat: f1: No such file or directory
$ cat f2
hello
$ cat f3
cat: f3: No such file or directory

I've used only the last four digits of the inode number to keep the entry short (and not hit you with inode numbers like 43910096366994672) but you can see that f1 and f2 have the exact same inode whereas f3 is different. You can also see that the contents of the file created originally as f1 survive its deletion because f2 is still referencing it.

However, because f3 is referencing the f1 name rather than its inode, you get an error trying to use it.


Aside: You've gotta love it when UNIX toys with you like this:

$ ls f*
f2 f3
$ cat f3 # What the ...?
cat: f3: No such file or directory

Almost as much fun as creating a file named space, backspace, "x" and then watching somebody try to delete it :-)


