Find Broken Symlinks with Python

A common Python saying is that it's easier to ask forgiveness than permission. While I'm not a fan of this statement in real life, it does apply in a lot of cases. Usually you want to avoid code that chains two system calls on the same file, because you never know what will happen to the file between those two calls.

A typical mistake is to write something like:

import os

# Race: the file can disappear between these two calls.
if os.path.exists(path):
    os.unlink(path)

The second call (os.unlink) may fail if something else deletes the file after your if test; it then raises an exception and stops the rest of your function from executing. (You might think this doesn't happen in real life, but just last week we fished another bug like that out of our codebase, and it was the kind of bug that had left a few programmers scratching their heads and crying 'Heisenbug' for months.)
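In Python 3 the forgiveness-style version of that snippet needs no existence check at all. A minimal sketch, assuming you only want to ignore the "already gone" case; the helper name remove_if_present is hypothetical:

```python
import os
import tempfile


def remove_if_present(path):
    """Delete path; return True if it was removed, False if it was already gone."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        # Someone else removed it first (or it never existed): not an error here.
        return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "scratch.txt")
        open(p, "w").close()
        print(remove_if_present(p))  # True: the file existed
        print(remove_if_present(p))  # False: already deleted, no exception
```

Any other failure (permissions, path is a directory) still propagates, which is usually what you want.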

So, in your particular case, I would probably do:

import errno
import os

try:
    os.stat(path)
except OSError as e:
    if e.errno == errno.ENOENT:
        print('path %s does not exist or is a broken symlink' % path)
    else:
        raise

The annoyance here is that stat returns the same error code (ENOENT) for a path that doesn't exist at all and for a broken symlink, because it follows the link before looking at anything.
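A small sketch to see the ambiguity for yourself, assuming a POSIX system where os.symlink works without special privileges. It creates a missing path and a dangling link in a temporary directory and compares os.stat, which follows links, with os.stat(..., follow_symlinks=False), which behaves like os.lstat. The helper stat_errno is hypothetical:

```python
import errno
import os
import tempfile


def stat_errno(path, follow_symlinks=True):
    """Return None if stat succeeds, otherwise the errno name (e.g. 'ENOENT')."""
    try:
        os.stat(path, follow_symlinks=follow_symlinks)
    except OSError as e:
        return errno.errorcode[e.errno]
    return None


with tempfile.TemporaryDirectory() as d:
    missing = os.path.join(d, "missing")
    broken = os.path.join(d, "broken")
    os.symlink(os.path.join(d, "no-such-target"), broken)  # dangling link

    # Following links: both fail the same way, so they look identical.
    print(stat_errno(missing), stat_errno(broken))  # ENOENT ENOENT
    # Not following links (lstat): only the truly missing path fails.
    print(stat_errno(missing, False), stat_errno(broken, False))  # ENOENT None
```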

So, I guess you have no choice but to break the atomicity and do a second check, something like:

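if os.path.islink(path) and not os.path.exists(path):
    print('path %s is a broken symlink' % path)

(os.path.exists follows the link, so it returns False when the target is missing. Passing os.readlink(path) to it directly would resolve a relative link target against the current directory instead of the link's own directory, and os.readlink raises if path does not exist at all.)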
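Putting the two calls together, a helper might look like the following. It is a sketch rather than a drop-in: the name is_broken_symlink is hypothetical, it also treats symlink loops (ELOOP) as broken, and it keeps the common case (a healthy path) down to a single stat call.

```python
import errno
import os
import tempfile


def is_broken_symlink(path):
    """True if path is a symlink whose target cannot be reached."""
    try:
        os.stat(path)  # follows the link
    except OSError as e:
        if e.errno in (errno.ENOENT, errno.ELOOP):
            # Second, non-atomic call: is path itself a link entry?
            return os.path.islink(path)
        raise  # e.g. permission denied: let the caller decide
    return False  # target reachable (or path is not a link at all)


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "target.txt"), "w").close()
    good = os.path.join(d, "good")
    bad = os.path.join(d, "bad")
    os.symlink("target.txt", good)  # relative targets resolve next to the link
    os.symlink("gone.txt", bad)
    print(is_broken_symlink(good), is_broken_symlink(bad))  # False True
```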

How to override a bad symlink with Python

For your first question:

Your logic is incorrect. The root cause is that the link points to a version of the original file that no longer exists: you need to find the correct, existing version and link to that, instead of recreating the link to the same missing target.

Thus:

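/etc/ecm/zookeeper-conf-3.4.6  <== does not exist

ln -s /etc/ecm/zookeeper-conf-3.4.6 /etc/ecm/zookeeper-conf  <== this is incorrect

# find the existing version zookeeper-conf-x.x.x, then:
ln -sfn /etc/ecm/zookeeper-conf-x.x.x /etc/ecm/zookeeper-conf  <== this is correct

(-f replaces the existing link instead of failing with "File exists"; -n stops ln from treating an existing link to a directory as a directory and creating the new link inside it.)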
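If you need to do the same repair from Python, one option, assuming POSIX rename semantics, is to create the new link under a temporary name and move it over the old one with os.replace. That swaps the link in a single step, so there is never a moment when the path is missing. The helper replace_symlink, its temp-name scheme, and the 3.4.10 version below are all hypothetical:

```python
import os
import tempfile


def replace_symlink(target, link_path):
    """Point link_path at target, replacing an existing (possibly broken) link."""
    tmp = "%s.tmp-%d" % (link_path, os.getpid())  # hypothetical temp-name scheme
    os.symlink(target, tmp)
    try:
        os.replace(tmp, link_path)  # rename(2) replaces the old link in one step
    except OSError:
        os.unlink(tmp)  # don't leave the temporary link behind
        raise


with tempfile.TemporaryDirectory() as d:
    conf = os.path.join(d, "zookeeper-conf")
    os.symlink(os.path.join(d, "zookeeper-conf-3.4.6"), conf)  # dangling
    real = os.path.join(d, "zookeeper-conf-3.4.10")  # hypothetical existing version
    os.mkdir(real)
    replace_symlink(real, conf)
    print(os.readlink(conf) == real, os.path.isdir(conf))  # True True
```

Because rename does not follow a symlink in its destination, this also works when the old link points at an existing directory.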

For your second question, you can solve this with the shell command ls: ls -L $symbolic_link checks whether the link is valid. If the link's target does not exist, it prints "No such file or directory" and returns a non-zero exit status.

For example:

$ ls -al
symlink1 -> ./exist.txt
symlink2 -> ./not_exist.txt
exist.txt

$ ls -L symlink1
symlink1

$ ls -L symlink2
ls: symlink2: No such file or directory

Python command line solution

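import subprocess

def check_symlink_exist(target_file):
    # Pass arguments as a list (no shell) so names with spaces or shell
    # metacharacters are safe; "--" stops names starting with "-" being options.
    result = subprocess.run(["ls", "-L", "--", target_file],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0  # ls -L fails if the link target is missing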

Python os solution

import os

def check_symlink_exist(target_file):
    try:
        os.stat(target_file)  # follows the link; fails if the target is missing
        return True
    except OSError:
        return False
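To connect the two approaches, here is a sketch that rebuilds the symlink1/symlink2 example in a temporary directory, assuming a POSIX system. os.path.exists follows the link just like os.stat and ls -L, while os.path.islink looks only at the link entry, like the -> in the ls -al listing. The helper link_status is hypothetical:

```python
import os
import tempfile


def link_status(path):
    """Return (is_link, target_exists), roughly what ls -al and ls -L tell you."""
    return os.path.islink(path), os.path.exists(path)


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "exist.txt"), "w").close()
    os.symlink("./exist.txt", os.path.join(d, "symlink1"))
    os.symlink("./not_exist.txt", os.path.join(d, "symlink2"))
    for name in ("symlink1", "symlink2"):
        print(name, link_status(os.path.join(d, name)))
```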

Create broken symlink with Python

This error is raised when you try to create a symlink in a non-existent directory; os.symlink itself never checks whether the link target exists. For example, the following code fails with FileNotFoundError if /tmp/subdir doesn't exist:

os.symlink('/usr/bin/python', '/tmp/subdir/python')

But this should run successfully:

import os

src = '/usr/bin/python'
dst = '/tmp/subdir/python'

# Create the parent directory first; exist_ok avoids a check-then-create race.
os.makedirs(os.path.dirname(dst), exist_ok=True)
os.symlink(src, dst)
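A sketch that shows both cases side by side, assuming a POSIX system: os.symlink happily creates a dangling link, and the only failure comes from the missing parent directory. The helper make_symlink and the /no/such/python target are made up for illustration:

```python
import os
import tempfile


def make_symlink(src, dst):
    """Create dst -> src, creating dst's parent directories if needed."""
    parent = os.path.dirname(dst)
    if parent:  # dst may be a bare name in the current directory
        os.makedirs(parent, exist_ok=True)
    os.symlink(src, dst)  # src is not checked, so this may create a dangling link


with tempfile.TemporaryDirectory() as d:
    dst = os.path.join(d, "subdir", "python")
    try:
        os.symlink("/no/such/python", dst)  # fails: 'subdir' does not exist
    except FileNotFoundError as e:
        print("os.symlink failed:", e.strerror)
    make_symlink("/no/such/python", dst)  # succeeds despite the missing target
    print(os.path.islink(dst), os.path.exists(dst))  # True False
```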

Python os.walk and symlinks

As Antti Haapala already mentioned in a comment, the script does not break on symlinks, but on broken symlinks. One way to avoid that, taking the existing script as a starting point, is to use try/except:

#!/usr/bin/env python3
import os
import sys

space = 0
for root, dirs, files in os.walk(sys.argv[1]):
    for f in files:
        fpath = os.path.join(root, f)
        try:
            space += os.stat(fpath).st_size
        except OSError:
            print("could not read " + fpath)

sys.stdout.write("Total: {:d}\n".format(space))

As a side effect, it gives you information on possible broken links.
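If you want the broken links as data rather than printed warnings, a variant could collect them separately. This is a sketch, not the original script: the function name total_size is hypothetical, and like the script above it follows symlinks to regular files, so a linked file is counted at its target's size.

```python
import os
import tempfile


def total_size(root):
    """Return (total_bytes, broken_links) for the files os.walk finds under root."""
    total = 0
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.stat(path).st_size  # follows symlinks
            except OSError:
                # Dangling link, symlink loop, permission problem, or a file
                # deleted mid-walk: count nothing, but remember any symlink.
                if os.path.islink(path):
                    broken.append(path)
    return total, broken


with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "data.bin"), "wb") as fh:
        fh.write(b"12345")
    os.symlink("missing.bin", os.path.join(d, "dangling"))
    total, broken = total_size(d)
    print(total, [os.path.basename(p) for p in broken])  # 5 ['dangling']
```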

How to check if file is a symlink in Python?

To determine if a directory entry is a symlink use this:

os.path.islink(path)

Return True if path refers to a directory entry that is a symbolic link. Always False if symbolic links are not supported.

For instance, given:

drwxr-xr-x   2 root root  4096 2011-11-10 08:14 bin/
lrwxrwxrwx   1 root root    57 2011-07-10 05:11 initrd.img -> boot/initrd.img-2..

>>> import os.path
>>> os.path.islink('initrd.img')
True
>>> os.path.islink('bin')
False

Resolving symlinks in Python without failing with missing components (like readlink -m)

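os.path.realpath() does this: in its default (non-strict) mode it resolves every symlink it can and leaves path components that don't exist as they are.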

$ rm -rf -- /tmp/nonexisting.XYZ     # just to be extra clear
$ orig_path=/tmp/nonexisting.XYZ/foo # on MacOS, where /tmp is at /private/tmp
$ greadlink -m "$orig_path"          # demonstrate GNU readlink output for comparison...
/private/tmp/nonexisting.XYZ/foo
$ python -c 'import os.path, sys; print(os.path.realpath(sys.argv[1]))' "$orig_path"
/private/tmp/nonexisting.XYZ/foo
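The same behaviour inside Python, as a sketch assuming a POSIX system: a symlinked directory component is resolved even though the trailing missing/foo components do not exist. Comparing against os.path.realpath(real_dir) keeps the check valid on macOS, where the temporary directory itself may sit behind a symlink.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    real_dir = os.path.join(d, "real")
    os.mkdir(real_dir)
    link = os.path.join(d, "link")
    os.symlink(real_dir, link)  # 'link' -> real directory

    # 'missing/foo' does not exist; realpath still resolves the 'link' part.
    resolved = os.path.realpath(os.path.join(link, "missing", "foo"))
    expected = os.path.join(os.path.realpath(real_dir), "missing", "foo")
    print(resolved == expected)  # True
```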

How to find all symlinks in directory and its subdirectories in python

You can use code similar to the following. Directories to search are passed as arguments, with the current directory as the default. The lll() function already recurses into subdirectories (without following symlinked directories); you could also rewrite it on top of os.walk.

import os
import sys

def lll(dirname):
    for name in os.listdir(dirname):
        full = os.path.join(dirname, name)
        if os.path.islink(full):
            # Report the link itself; don't follow it into a directory.
            print(full, '->', os.readlink(full))
        elif os.path.isdir(full):
            lll(full)

def main(args):
    if not args:
        args = [os.curdir]
    first = True
    for arg in args:
        if len(args) > 1:
            if not first:
                print()
            first = False
            print(arg + ':')
        lll(arg)

if __name__ == '__main__':
    main(sys.argv[1:])

Ref: https://github.com/python/cpython/blob/master/Tools/scripts/lll.py
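Because os.walk already handles the recursion, the same logic can be written as a generator. A sketch, assuming you want symlinks to directories reported but not descended into (os.walk's default, followlinks=False); iter_symlinks is a hypothetical name:

```python
import os
import tempfile


def iter_symlinks(top):
    """Yield (path, link_text) for every symlink under top."""
    for dirpath, dirnames, filenames in os.walk(top):
        # Links to directories land in dirnames (os.walk won't descend into
        # them by default); dangling links land in filenames.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                yield path, os.readlink(path)


with tempfile.TemporaryDirectory() as d:
    os.mkdir(os.path.join(d, "sub"))
    os.symlink("sub", os.path.join(d, "sub-link"))  # link to a directory
    os.symlink("gone", os.path.join(d, "sub", "dangling"))  # broken link
    for path, target in sorted(iter_symlinks(d)):
        print(os.path.relpath(path, d), "->", target)
```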

How do I get a list of symbolic links, excluding broken links?

With find(1), use:

$ find . -type l -not -xtype l

This command finds links that stop being links after they are followed, that is, unbroken links. (Note that they may point to ordinary or special files, directories, or anything else. Use -xtype f instead of -not -xtype l to find only links pointing to ordinary files. -xtype is specific to GNU find.)

$ find . -type l -not -xtype l -ls

additionally reports where they point.
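If find with -xtype isn't available, roughly the same filter can be written in Python: a link counts as unbroken when os.path.islink is true and os.path.exists, which follows the link, is also true. A sketch assuming a POSIX system; unbroken_symlinks is a hypothetical name, and unlike -ls it does not print the targets (os.readlink would add them):

```python
import os
import tempfile


def unbroken_symlinks(top):
    """Roughly `find TOP -type l -not -xtype l`: links whose targets resolve."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink: the entry is a link; exists: following it reaches something
            if os.path.islink(path) and os.path.exists(path):
                found.append(path)
    return sorted(found)


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "file.txt"), "w").close()
    os.mkdir(os.path.join(d, "dir"))
    os.symlink("file.txt", os.path.join(d, "to-file"))
    os.symlink("dir", os.path.join(d, "to-dir"))
    os.symlink("nowhere", os.path.join(d, "broken"))
    print([os.path.relpath(p, d) for p in unbroken_symlinks(d)])  # ['to-dir', 'to-file']
```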

If you frequently encounter similar questions in your interactive shell usage, zsh is your friend:

% echo **/*(@^-@)

which follows the same idea: the glob qualifier @ restricts matches to links, and ^-@ means "not (^) a link (@) when following is enabled (-)".

(See also this post about finding broken links in Python, which actually demonstrates what you can do in any language under Linux, and this blog post for a complete answer to the related question of "how to find broken links".)
