A way to listen for changes to a file system from Python on Linux?
pyinotify is IMHO the only way to get file system changes without scanning the directory.
Detect File Change Without Polling
For Linux, there is pyinotify.
From the homepage:
Pyinotify is a Python module for monitoring filesystems changes. Pyinotify relies on a Linux Kernel feature (merged in kernel 2.6.13) called inotify. inotify is an event-driven notifier, its notifications are exported from kernel space to user space through three system calls. pyinotify binds these system calls and provides an implementation on top of them offering a generic and abstract way to manipulate those functionalities.
It is therefore Linux-only and requires kernel 2.6.13 or later. As far as I can see, though, any non-polling mechanism needs kernel support of some kind.
How can I get changes in a directory in Python
For what it's worth, if you need a polling scanner, here is an implementation. The obvious caveats apply: it rescans the whole tree on every pass, and it will not notice files that appear and disappear again between polls.
import time
import pathlib
import logging

logging.basicConfig(level=logging.DEBUG)


def get_paths(path):
    """Map every path under `path` to (modification time, is_dir)."""
    answer = {}
    for x in pathlib.Path(path).rglob("*"):
        try:
            # st_mtime rather than st_ctime: on Windows, st_ctime has
            # historically been the creation time, so edits would never
            # be reported as "modified".
            answer[str(x)] = (x.stat().st_mtime, x.is_dir())
        except FileNotFoundError:
            # The entry vanished between listing and stat(); skip it.
            pass
    return answer


def log(name, is_dir, action):
    descrip = "Directory" if is_dir else "File"
    logging.info("{} {}: {}".format(descrip, action, name))


def scan(top_dir, sleep_time):
    old_paths = get_paths(top_dir)
    s_old_paths = set(old_paths)
    while True:
        time.sleep(sleep_time)
        new_paths = get_paths(top_dir)
        s_new_paths = set(new_paths)
        cre_names = s_new_paths - s_old_paths
        del_names = s_old_paths - s_new_paths
        for name in cre_names:
            _, is_dir = new_paths[name]
            log(name, is_dir, "created")
        for name in del_names:
            _, is_dir = old_paths[name]
            log(name, is_dir, "deleted")
        # Paths present in both snapshots: compare timestamps.
        for name in s_old_paths & s_new_paths:
            new_time, is_dir = new_paths[name]
            old_time, _ = old_paths[name]
            if new_time != old_time:
                log(name, is_dir, "modified")
        old_paths = new_paths
        s_old_paths = s_new_paths


top_dir = "U:"
sleep_time = 10
scan(top_dir, sleep_time)
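The comparison step inside scan can also be pulled out into a pure function, which makes it possible to check the logic without sleeping or touching the disk. A minimal sketch, assuming snapshots shaped like the ones get_paths returns, i.e. {path: (timestamp, is_dir)}; the name diff_snapshots is hypothetical:

```python
def diff_snapshots(old, new):
    """Compare two {path: (timestamp, is_dir)} snapshots.

    Returns three sorted lists: created, deleted and modified paths.
    """
    old_names, new_names = set(old), set(new)
    created = sorted(new_names - old_names)
    deleted = sorted(old_names - new_names)
    # Present in both snapshots, but with a different timestamp.
    modified = sorted(
        name for name in old_names & new_names
        if old[name][0] != new[name][0]
    )
    return created, deleted, modified


before = {"a.txt": (100.0, False), "b.txt": (100.0, False)}
after = {"b.txt": (105.0, False), "c": (103.0, True)}
print(diff_snapshots(before, after))
# prints (['c'], ['a.txt'], ['b.txt'])
```

This also shows the caveat mentioned above: a file that is created and deleted between two snapshots appears in neither of them, so no comparison can report it.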
Reading from a frequently updated file
I would recommend looking at David Beazley's Generator Tricks for Python, especially Part 5: Processing Infinite Data. It shows how to write the Python equivalent of tail -f logfile, yielding new lines in real time.
# follow.py
#
# Follow a file like tail -f.
import time


def follow(thefile):
    thefile.seek(0, 2)  # jump to the end of the file
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)  # nothing new yet; wait briefly and retry
            continue
        yield line


if __name__ == '__main__':
    with open("run/foo/access-log", "r") as logfile:
        loglines = follow(logfile)
        for line in loglines:
            print(line, end="")
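Because follow is a generator, it can be chained with other generators to filter the stream as it arrives. A minimal sketch of that idea: the grep and watch_log helpers and the poll_interval parameter are illustrative additions, not part of the code above.

```python
import re
import time


def follow(thefile, poll_interval=0.1):
    """Yield lines appended to an open text file, like tail -f."""
    thefile.seek(0, 2)  # start at the current end of the file
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(poll_interval)
            continue
        yield line


def grep(pattern, lines):
    """Yield only the lines that match the regular expression `pattern`."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line


def watch_log(path, pattern):
    """Print matching lines as they are appended (runs until interrupted)."""
    with open(path, "r") as logfile:
        for line in grep(pattern, follow(logfile)):
            print(line, end="")
```

Note that follow still polls with readline and sleep; it is simple and portable, but it is not an event-driven mechanism.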
watchdog monitoring file for changes
Instead of LoggingEventHandler, define your own handler:
#!/usr/bin/python
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler


class MyHandler(FileSystemEventHandler):
    def on_modified(self, event):
        print(f'event type: {event.event_type} path : {event.src_path}')


if __name__ == "__main__":
    event_handler = MyHandler()
    observer = Observer()
    observer.schedule(event_handler, path='/data/', recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
on_modified is called when a file or directory is modified.
Monitoring contents of files/directories?
For Unix/Linux-based systems, you should use the File Alteration Monitor (FAM) Python bindings to libfam.
For Windows-based systems, you should tie into the Win32 API function FindFirstChangeNotification and its related functions.
As for a cross-platform way, I don't know of a good one. I think it would be best to build a module yourself that detects the OS and then uses whichever of the two methods above applies.
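The OS detection for such a module can be sketched with sys.platform. Everything here is a placeholder: pick_backend and the backend labels are hypothetical names, only Linux is mapped to FAM for simplicity, and the polling fallback for other platforms is an assumption rather than part of the suggestion above.

```python
import sys


def pick_backend(platform=None):
    """Return a label for the change-notification backend to use.

    `platform` defaults to sys.platform; passing it explicitly makes
    the function easy to test.
    """
    if platform is None:
        platform = sys.platform
    if platform.startswith("linux"):
        return "fam"        # libfam bindings
    if platform == "win32":
        return "win32"      # FindFirstChangeNotification and friends
    return "polling"        # assumed fallback: periodic directory scan


print(pick_backend("linux"))
# prints fam
```

A real module would map each label to a class exposing the same interface, so that calling code never needs to know which mechanism is in use.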