A Way to "Listen" for Changes to a File System from Python on Linux

A way to listen for changes to a file system from Python on Linux?

pyinotify is, in my opinion, the only way to be notified of file system changes without repeatedly scanning the directory.

Detect File Change Without Polling

For linux, there is pyinotify.

From the homepage:

Pyinotify is a Python module for monitoring filesystem changes. Pyinotify relies on a Linux kernel feature (merged in kernel 2.6.13) called inotify. inotify is an event-driven notifier; its notifications are exported from kernel space to user space through three system calls. pyinotify binds these system calls and provides an implementation on top of them, offering a generic and abstract way to manipulate those functionalities.

Thus it is obviously not cross-platform and requires a sufficiently recent kernel. However, as far as I can see, any non-polling mechanism will need some form of kernel support.
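
For completeness, here is a minimal pyinotify sketch; the watched path /tmp/watched is just a placeholder, so adapt it to your setup:

import pyinotify

class Handler(pyinotify.ProcessEvent):
    # Each process_* method is called for the matching inotify event.
    def process_IN_CREATE(self, event):
        print("created:", event.pathname)

    def process_IN_DELETE(self, event):
        print("deleted:", event.pathname)

    def process_IN_MODIFY(self, event):
        print("modified:", event.pathname)

wm = pyinotify.WatchManager()
mask = pyinotify.IN_CREATE | pyinotify.IN_DELETE | pyinotify.IN_MODIFY
notifier = pyinotify.Notifier(wm, Handler())
wm.add_watch("/tmp/watched", mask, rec=True)  # rec=True also watches subdirectories
notifier.loop()  # blocks, dispatching events to the handler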

How can I get changes in a directory in Python

For what it's worth, if you need to use a polling scanner, here is an implementation. It is subject to the obvious caveats about performance, and it will not notice files that appear and then disappear again within a single poll interval.

import time
import pathlib
import logging

logging.basicConfig(level=logging.DEBUG)

def get_paths(path):
    # Map every path under `path` to (ctime, is_dir); files may vanish mid-scan.
    answer = {}
    for x in pathlib.Path(path).rglob("*"):
        try:
            answer[str(x)] = (x.stat().st_ctime, x.is_dir())
        except FileNotFoundError:
            pass
    return answer

def log(name, is_dir, action):
    descrip = "Directory" if is_dir else "File"
    logging.info("{} {}: {}".format(descrip, action, name))


def scan(top_dir, sleep_time):

    old_paths = get_paths(top_dir)
    s_old_paths = set(old_paths)

    while True:
        time.sleep(sleep_time)
        new_paths = get_paths(top_dir)
        s_new_paths = set(new_paths)
        cre_names = s_new_paths - s_old_paths
        del_names = s_old_paths - s_new_paths

        for name in cre_names:
            _, is_dir = new_paths[name]
            log(name, is_dir, "created")

        for name in del_names:
            _, is_dir = old_paths[name]
            log(name, is_dir, "deleted")

        # Paths present in both snapshots: a changed ctime counts as a modification.
        for name in s_old_paths & s_new_paths:
            new_time, is_dir = new_paths[name]
            old_time, _ = old_paths[name]
            if new_time != old_time:
                log(name, is_dir, "modified")

        old_paths = new_paths
        s_old_paths = s_new_paths


top_dir = "U:"
sleep_time = 10
scan(top_dir, sleep_time)

Reading from a frequently updated file

I would recommend looking at David Beazley's Generator Tricks for Python, especially Part 5: Processing Infinite Data. It covers the Python equivalent of a tail -f logfile command in real time.

# follow.py
#
# Follow a file like tail -f.

import time

def follow(thefile):
    thefile.seek(0, 2)          # go to the end of the file
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)     # no new line yet; wait briefly and retry
            continue
        yield line

if __name__ == '__main__':
    logfile = open("run/foo/access-log", "r")
    loglines = follow(logfile)
    for line in loglines:
        print(line, end="")
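
The generator composes nicely with further generator stages, in the spirit of the rest of Beazley's talk. A small usage sketch, where the "404" pattern is only illustrative:

from follow import follow   # the generator defined above in follow.py

logfile = open("run/foo/access-log")
lines = follow(logfile)
errors = (line for line in lines if "404" in line)   # filter stage: keep only 404 hits
for line in errors:
    print(line, end="")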

watchdog monitoring file for changes

Instead of LoggingEventHandler, define your own handler:

#!/usr/bin/python
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class MyHandler(FileSystemEventHandler):
    def on_modified(self, event):
        print(f'event type: {event.event_type}  path: {event.src_path}')

if __name__ == "__main__":
    event_handler = MyHandler()
    observer = Observer()
    observer.schedule(event_handler, path='/data/', recursive=False)
    observer.start()

    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

on_modified is called when a file or directory is modified.
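
If you also care about creations, deletions, and moves, FileSystemEventHandler exposes matching callbacks you can override in the same way; a minimal sketch:

from watchdog.events import FileSystemEventHandler

class MyHandler(FileSystemEventHandler):
    def on_created(self, event):
        print(f'created: {event.src_path}')

    def on_deleted(self, event):
        print(f'deleted: {event.src_path}')

    def on_moved(self, event):
        print(f'moved: {event.src_path} -> {event.dest_path}')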

Monitoring contents of files/directories?

For Unix/Linux-based systems, you can use the File Alteration Monitor (FAM) through the Python bindings to libfam.

For Windows-based systems, you should tie into the Win32 API FindFirstChangeNotification and related functions.

As for a cross-platform approach, I don't know of a good one. It would probably be best to write a small module yourself that detects the operating system at runtime and then delegates to one of the two mechanisms above, as sketched below.
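
A minimal sketch of that dispatch, assuming you write the two backends yourself (watch_linux and watch_windows are hypothetical names, shown here only as stubs):

import sys

def watch_linux(path, callback):
    # Hypothetical backend built on inotify or libfam (e.g. via pyinotify).
    raise NotImplementedError

def watch_windows(path, callback):
    # Hypothetical backend built on FindFirstChangeNotification (e.g. via pywin32).
    raise NotImplementedError

def watch(path, callback):
    # Pick the platform-specific backend at runtime.
    if sys.platform.startswith("linux"):
        return watch_linux(path, callback)
    elif sys.platform.startswith("win"):
        return watch_windows(path, callback)
    raise NotImplementedError("No watcher backend for {}".format(sys.platform))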


