Get Canonical Path from Pathname

What's a canonical path?

The whole point of making anything "canonical" is so that you can compare two things. For example, ../../here/bar/x and ./test/../../bar/x may refer to the same location, but you can't tell that from a textual comparison of the two paths. However, if the current directory sits directly inside a directory named here (say /home/here/work), turning both into their canonical representation gives /home/here/bar/x, or ../bar/x relative to the current directory, and we see that they actually refer to the same thing.
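The comparison can be sketched with java.nio.file, assuming a Unix-like system and a hypothetical working directory /home/here/work. The helper name canonicalize is ours, not a library method, and this version is purely lexical: it resolves against the given directory and collapses . and .. without consulting the file system, so symbolic links are ignored.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class SameLocation {
    // Hypothetical helper: resolve a relative path against cwd, then collapse
    // "." and ".." lexically (no file-system access, symlinks ignored).
    static Path canonicalize(Path cwd, String relative) {
        return cwd.resolve(relative).normalize();
    }

    public static void main(String[] args) {
        Path cwd = Paths.get("/home/here/work"); // assumed working directory
        Path a = canonicalize(cwd, "../../here/bar/x");
        Path b = canonicalize(cwd, "./test/../../bar/x");
        System.out.println(a);                  // /home/here/bar/x
        System.out.println(b);                  // /home/here/bar/x
        System.out.println(a.equals(b));        // true
        System.out.println(cwd.relativize(a));  // ../bar/x
    }
}
```

Once both paths are in this form, a plain equals() is enough to see that they name the same location.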

In short, when there are many ways of referring to one thing, you may be able to define a canonical representation which is unique and which allows you to get a handle on collections of such things.

(If you're looking for more examples, mathematics is full of "canonical" constructions for all sorts of objects, with very much the same purpose in mind. The Wikipedia article on canonical forms may provide some additional directions.)

Getting canonical path from Path in Java

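See Path.normalize(), which removes redundant . and .. elements purely syntactically, without touching the file system, and Path.toRealPath(), which asks the file system for the real path: it returns an absolute path, follows symbolic links unless LinkOption.NOFOLLOW_LINKS is passed, and throws an IOException if the file does not exist.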
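A minimal sketch of the difference, assuming a scratch directory created with Files.createTempDirectory; the class and method names are illustrative only. normalize() only rewrites the path text, while toRealPath() consults the file system, so on systems where the temporary directory itself sits behind a symbolic link the two results can differ in their prefix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class NormalizeVsRealPath {
    // Returns { normalize() result, toRealPath() result } for dir/sub/.././file.txt
    static Path[] compare(Path dir) throws IOException {
        Files.createDirectories(dir.resolve("sub"));
        Files.createFile(dir.resolve("file.txt"));
        Path messy = dir.resolve("sub/.././file.txt");
        return new Path[] {
            messy.normalize(),  // text only: dir/file.txt
            messy.toRealPath()  // file system: real absolute path, symlinks followed
        };
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("canon");
        Path[] r = compare(dir);
        System.out.println("normalize():  " + r[0]);
        System.out.println("toRealPath(): " + r[1]);
    }
}
```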

What's the difference between getPath(), getAbsolutePath(), and getCanonicalPath() in Java?

Consider these filenames:

C:\temp\file.txt - This is a path, an absolute path, and a canonical path.

.\file.txt - This is a path. It's neither an absolute path nor a canonical path.

C:\temp\myapp\bin\..\..\file.txt - This is a path and an absolute path. It's not a canonical path.

A canonical path is always an absolute path.

Converting a path to a canonical path makes it absolute: the current working directory is usually prepended, so e.g. .\file.txt becomes C:\temp\file.txt when the working directory is C:\temp. Canonicalization also "purifies" the path, removing and resolving elements such as ..\ and, on Unix-like systems, resolving symbolic links.

Also note the following example with java.nio.file.Paths:

import java.nio.file.Paths;

String canonicalPathString = "C:\\Windows\\System32\\";
String absolutePathString = "C:\\Windows\\System32\\drivers\\..\\";

// getParent() only drops the last name element; it never resolves ".."
System.out.println(Paths.get(canonicalPathString).getParent());
System.out.println(Paths.get(absolutePathString).getParent());

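Both paths refer to the same location, yet on Windows the output is quite different, because getParent() simply drops the last name element (here, the .. itself) instead of resolving it: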

C:\Windows
C:\Windows\System32\drivers
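Normalizing before asking for the parent makes the two forms agree. The sketch below uses Unix-style paths (an assumption, chosen so the expected output is easy to read) and a hypothetical helper parentOf; neither path needs to exist, because both getParent() and normalize() work on the path text alone.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ParentAfterNormalize {
    // Hypothetical helper: parent of p, optionally normalizing first
    static Path parentOf(String p, boolean normalizeFirst) {
        Path path = Paths.get(p);
        return (normalizeFirst ? path.normalize() : path).getParent();
    }

    public static void main(String[] args) {
        String plain = "/usr/lib/";
        String dotted = "/usr/lib/jvm/../"; // same directory as plain
        System.out.println(parentOf(plain, false));  // /usr
        System.out.println(parentOf(dotted, false)); // /usr/lib/jvm
        System.out.println(parentOf(dotted, true));  // /usr
    }
}
```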

Generating a canonical path

You can use the java.net.URI class for this. If the path contains no characters that need escaping in a URI path component, you can write:

import java.net.URI;
String normalized = new URI(path).normalize().getPath(); // throws URISyntaxException if path is not a valid URI reference

If the path contains (or might contain) characters that need escaping, use one of the multi-argument constructors instead: they escape the path argument, and you can pass null for the other arguments.

Notes:

  1. The above normalizes a file path by treating it as a relative URI. If you want to normalize an entire URI, including the (optional) scheme, authority, and other components, don't call getPath()!

  2. Unlike File canonicalization, URI normalization does not look at the file system. The flip side is that normalization can give a different result from canonicalization when the path contains symbolic links: it collapses a/link/.. to a, even if link points somewhere else.
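For the escaping case, here is a minimal sketch using the five-argument URI(scheme, authority, path, query, fragment) constructor with everything except the path set to null; the class and method names are hypothetical. The space in the first sample path is quoted by the constructor, and getPath() returns the decoded result.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriNormalize {
    // Hypothetical helper: lexical normalization of a file path via java.net.URI.
    // The multi-argument constructor quotes illegal characters such as spaces.
    static String normalize(String path) throws URISyntaxException {
        return new URI(null, null, path, null, null).normalize().getPath();
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(normalize("/home/me/My Documents/../notes/./a.txt")); // /home/me/notes/a.txt
        System.out.println(normalize("a/b/../c"));   // a/c
        System.out.println(normalize("../x/../y"));  // ../y
    }
}
```

Note that a leading .. in a relative path is kept, since there is no preceding segment for it to cancel.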

How do I resolve a canonical filename in Windows?

Short answer: not really.

There is no simple way to get the canonical name of a file on Windows. Local files can be reached through reparse points or through drive letters created with SUBST. Do you want to deal with NTFS junctions? Windows shortcuts? What about \\?\-escaped filenames?

Remote files can be reached via a mapped drive letter or via UNC. Is that the UNC path to the origin server? Are you using DFS? Is the server using reparse points, etc.? Is the server available by more than one name? What about the IP address? Does it have more than one IP address?

So, if you're looking for something like the inode number on Windows, it ain't there.

How can I determine the canonical path of a file without following symbolic links?

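Calling path.toAbsolutePath().normalize() does the trick: it makes the path absolute and removes redundant . and .. elements without consulting the file system, so symbolic links are left as they are. Unlike toRealPath(NOFOLLOW_LINKS), it also works when the file does not exist.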

Suppose /var/spool/mail is a symlink pointing to /var/mail:

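import java.nio.file.*;
import static java.nio.file.LinkOption.NOFOLLOW_LINKS;

final Path path = Paths.get("/var/./spool/../spool//mail/./");
System.out.println(path.toAbsolutePath().normalize());
System.out.println(path.toRealPath(NOFOLLOW_LINKS)); // throws IOException if the file does not exist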

In the above example, both calls print the path with the symlink left unresolved:

/var/spool/mail
/var/spool/mail
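The trade-off is that, without following links, .. is interpreted lexically. A sketch, assuming a POSIX system where the process may create symbolic links (on Windows this can require extra privileges); the layout and names are hypothetical: dir/link points to dir/a/b, and dir/link/.. is resolved both ways.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LinkDotDot {
    // Builds dir/a/b plus a symlink dir/link -> dir/a/b, then resolves dir/link/..
    // lexically and against the file system.
    static Path[] resolveDotDot(Path dir) throws IOException {
        Path b = Files.createDirectories(dir.resolve("a").resolve("b"));
        Files.createSymbolicLink(dir.resolve("link"), b);
        Path p = dir.resolve("link").resolve("..");
        return new Path[] {
            p.toAbsolutePath().normalize(), // lexical: "link/.." cancels out, giving dir
            p.toRealPath()                  // follows the link: parent of dir/a/b is dir/a
        };
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("canon");
        Path[] r = resolveDotDot(dir);
        System.out.println("normalize():  " + r[0]);
        System.out.println("toRealPath(): " + r[1]);
    }
}
```

Neither answer is wrong; they answer different questions, so pick the one that matches whether links should count as part of the path's identity.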

What's the difference between canonical path and absolute path?

The difference is that there is only one canonical path to a file[1], while there can be many absolute paths to a file (depending on the system). For instance, on a Unix system, /usr/local/../bin is the same as /usr/bin. getCanonicalPath() resolves those ambiguities and returns the (unique) canonical path. So if the current directory was /usr/local, then:

import java.io.File;

File file = new File("../bin");
System.out.println(file.getPath());
System.out.println(file.getAbsolutePath());
System.out.println(file.getCanonicalPath()); // may throw IOException

would print:

../bin
/usr/local/../bin
/usr/bin

Per Voo's suggestion: on Unix systems, getCanonicalPath() will also resolve symbolic links if the symbolic link exists. Hard links are treated like normal files (which is basically what they are). Note, however, that a file need not exist for these methods to succeed.

[1] Well, not quite. As @Tom Hale points out in a comment, if the file system supports hard-linked directories, there may be multiple canonical paths to a given file.
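The claim that the file need not exist can be checked with a small sketch. The names are hypothetical, and the expected behaviour assumes a standard JDK on a Unix-like system: File.getCanonicalPath() collapses missing/.. even though missing does not exist, whereas Path.toRealPath() throws NoSuchFileException for the same path.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class MissingFile {
    // Canonical path of dir/missing/../x.txt; neither "missing" nor "x.txt" exists
    static String canonical(File dir) throws IOException {
        return new File(dir, "missing/../x.txt").getCanonicalPath();
    }

    // true if toRealPath() rejects the same path because it does not exist
    static boolean realPathRejects(File dir) throws IOException {
        Path p = new File(dir, "missing/../x.txt").toPath();
        try {
            p.toRealPath();
            return false;
        } catch (NoSuchFileException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("canon").toFile();
        System.out.println(canonical(dir));       // the canonical temp directory followed by /x.txt
        System.out.println(realPathRejects(dir)); // true
    }
}
```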

How to obtain the absolute path of a file via Shell (BASH/ZSH/SH)?

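Use realpath, which prints the absolute path with . and .. removed and symbolic links resolved (GNU coreutils realpath also accepts -s to leave symlinks unexpanded):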

$ realpath example.txt
/home/username/example.txt

