What's the Ruby Equivalent of Python's os.walk

What's the Ruby equivalent of Python's os.walk?

The following prints every file and directory under the current directory, recursively. You can then use File.directory? to check whether each entry is a directory or a file.

Dir['**/*'].each { |f| puts f }
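For comparison, here is a minimal Python sketch of the behaviour the Ruby glob stands in for. The helper name list_tree is hypothetical; it flattens the output of os.walk into one list of relative paths, much as Dir['**/*'] returns a flat list. One difference to keep in mind: the Ruby glob skips dotfiles by default, while this sketch includes them.

```python
import os

def list_tree(top):
    """Return the sorted relative paths of every file and directory under top."""
    paths = []
    for root, dirs, files in os.walk(top):
        # os.walk reports subdirectories and files separately for each level.
        for name in dirs + files:
            full = os.path.join(root, name)
            paths.append(os.path.relpath(full, top))
    return sorted(paths)
```

Calling list_tree('.') and testing each result with os.path.isdir gives the same file-or-directory distinction that File.directory? provides on the Ruby side.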

Benchmarks: does Python have a faster way of walking a network folder?

Ruby's Dir is implemented in C (in the file dir.c, according to the documentation). Python's os.walk, by contrast, is implemented in Python.

It's not surprising that Python is less performant than C, but the approach used in Python gives a little more flexibility - for example, you could skip entire subtrees named e.g. '.svn', '.git', '.hg' while traversing a directory hierarchy.

Most of the time, the Python implementation is fast enough.

Update: Skipping files or subdirectories doesn't affect the traversal rate itself, but it can reduce the overall time taken to process a directory tree, because you avoid traversing potentially large subtrees. The time saved is proportional to how much you skip. In your case, which looks like folders of images, it's unlikely you would save much time (unless the images were under revision control, in which case skipping the subtrees owned by the revision control system might have some impact).

Additional update: Skipping folders is done by changing the dirs value in place:

import os

# path is the top-level directory to walk.
for root, dirs, files in os.walk(path):
    for skip in ('.hg', '.git', '.svn', '.bzr'):
        if skip in dirs:
            dirs.remove(skip)
    # Now process other stuff at this level, i.e.
    # in directory "root". The skipped folders
    # won't be recursed into.
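
The same pruning can be written with slice assignment, which is a common alternative to calling remove in a loop. This is a sketch only: the names walk_pruned and SKIP are hypothetical, and it relies on the default topdown=True behaviour of os.walk, where the dirs list is consulted after each step.

```python
import os

SKIP = {".hg", ".git", ".svn", ".bzr"}

def walk_pruned(top, skip=SKIP):
    """Yield (root, files) for each directory under top, never entering skipped names."""
    for root, dirs, files in os.walk(top):
        # Slice assignment mutates the list object that os.walk itself holds.
        # Rebinding with "dirs = [...]" would not affect the traversal.
        dirs[:] = [d for d in dirs if d not in skip]
        yield root, files
```

The important detail is that dirs must be changed in place; assigning a new list to the name dirs leaves the traversal untouched.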

os.walk and read file, having some problems with walking the path

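Remove the chdir call and open the file by its full path instead: f = csv.reader(open(os.path.join(root, 'out.csv'),'rb')). Note that the 'rb' mode is for Python 2; in Python 3 the csv module expects a text-mode file opened with newline=''.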
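
A minimal Python 3 sketch of that advice follows. The function name read_out_csv_files is hypothetical, and the file name out.csv is taken from the answer above. Because every path is built with os.path.join(root, ...), the current working directory never needs to change.

```python
import csv
import os

def read_out_csv_files(top, filename="out.csv"):
    """Yield (path, rows) for each file called filename found under top."""
    for root, dirs, files in os.walk(top):
        if filename in files:
            # Join with root rather than calling os.chdir(root).
            path = os.path.join(root, filename)
            # newline="" is what the csv module asks for in Python 3.
            with open(path, newline="") as f:
                yield path, list(csv.reader(f))
```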

C++ vs Python vs Ruby Performance in Listing All Directories Recursively

By running both the C++ version and the Ruby version with strace we can get some clues why the C++ version is slower.

Using the Linux source code for testing (65000 files):

strace -o '|wc' cpp_recursion
86417 518501 9463879

strace -o '|wc' ruby -e 'Dir.glob("**/*")'
30563 180115 1827588

We see that the C++ version makes almost 3x as many system calls as the Ruby version (86417 versus 30563 lines of strace output).

Looking more closely at the strace output you will find that both programs use getdents to retrieve directory entries, but the C++ version runs lstat on every single file, while the Ruby version does not.

I can only conclude that the C++ version is not implemented as efficiently as the Ruby version (or that it possibly serves a different purpose). The speed difference is not a language issue, but an implementation issue.

N.B. The C++ version with -O optimization runs in 0.347s, while the Ruby version runs in 0.304s. At least on Linux lstat seems to not incur much overhead. Perhaps the situation is different on Windows.
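
The two access patterns seen in the strace output can be sketched in Python. This is an illustration of the pattern only, not the code of either benchmark program, and both function names are hypothetical. The first variant calls lstat on every entry; the second uses os.scandir, whose entries usually carry the file type from the directory listing itself, so is_dir typically needs no extra system call (on filesystems that don't report the type it falls back to a stat call).

```python
import os
import stat

def walk_with_lstat(top):
    """List every path under top, calling lstat once per entry."""
    paths = []
    for name in os.listdir(top):
        path = os.path.join(top, name)
        st = os.lstat(path)  # one extra system call per entry
        paths.append(path)
        if stat.S_ISDIR(st.st_mode):
            paths.extend(walk_with_lstat(path))
    return paths

def walk_with_scandir(top):
    """List every path under top, using the entry type that scandir provides."""
    paths = []
    with os.scandir(top) as entries:
        for entry in entries:
            paths.append(entry.path)
            # Usually answered from the directory entry, without a stat call.
            if entry.is_dir(follow_symlinks=False):
                paths.extend(walk_with_scandir(entry.path))
    return paths
```

Both functions return the same set of paths; running each under strace should show the difference in the number of lstat calls.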

Python os.walk and japanese filename crash

It seems like all answers so far are from Unix people who assume the Windows console is like a Unix terminal, which it is not.

The problem is that you can't write Unicode output to the Windows console using the normal underlying file I/O functions. The Windows API WriteConsole needs to be used. Python should probably be doing this transparently, but it isn't.

There's a different problem if you redirect the output to a file: Windows text files are historically in the ANSI codepage, not Unicode. You can fairly safely write UTF-8 to text files in Windows these days, but Python doesn't do that by default.
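
If the goal is simply to save the names to a file, one way to sidestep the default codepage is to open the output file yourself with an explicit encoding. A minimal sketch, assuming Python 3; the function name write_names is hypothetical:

```python
def write_names(names, out_path):
    """Write each name on its own line, always encoded as UTF-8."""
    # The explicit encoding overrides the platform default (an ANSI codepage on Windows).
    with open(out_path, "w", encoding="utf-8") as out:
        for name in names:
            out.write(name + "\n")
```

The names collected from os.walk can be passed straight to this function; the file must then also be read back as UTF-8.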

I think it should do these things, but here's some code to make it happen. You don't have to worry about the details if you don't want to; just call ConsoleFile.wrap_standard_handles(). You do need PyWin (the pywin32 package) installed to get access to the necessary APIs.

import os, sys, io, win32api, win32console, pywintypes

def change_file_encoding(f, encoding):
    """
    TextIOWrapper is missing a way to change the file encoding, so we have to
    do it by creating a new one.
    """

    errors = f.errors
    line_buffering = f.line_buffering
    # f.newlines is not the same as the newline parameter to TextIOWrapper.
    # newlines = f.newlines

    buf = f.detach()

    # TextIOWrapper defaults newline to \r\n on Windows, even though the underlying
    # file object is already doing that for us. We need to explicitly say "\n" to
    # make sure we don't output \r\r\n; this is the same as the internal function
    # create_stdio.
    return io.TextIOWrapper(buf, encoding, errors, "\n", line_buffering)

class ConsoleFile:
    class FileNotConsole(Exception): pass

    def __init__(self, handle):
        handle = win32api.GetStdHandle(handle)
        self.screen = win32console.PyConsoleScreenBufferType(handle)
        try:
            self.screen.GetConsoleMode()
        except pywintypes.error as e:
            raise ConsoleFile.FileNotConsole

    def write(self, s):
        self.screen.WriteConsole(s)

    def close(self): pass
    def flush(self): pass
    def isatty(self): return True

    @staticmethod
    def wrap_standard_handles():
        sys.stdout.flush()
        try:
            # There seems to be no binding for _get_osfhandle.
            sys.stdout = ConsoleFile(win32api.STD_OUTPUT_HANDLE)
        except ConsoleFile.FileNotConsole:
            sys.stdout = change_file_encoding(sys.stdout, "utf-8")

        sys.stderr.flush()
        try:
            sys.stderr = ConsoleFile(win32api.STD_ERROR_HANDLE)
        except ConsoleFile.FileNotConsole:
            sys.stderr = change_file_encoding(sys.stderr, "utf-8")

ConsoleFile.wrap_standard_handles()

print("English 漢字 Кири́ллица")

This is a little tricky: if stdout or stderr is the console, we need to output with WriteConsole; but if it's not (e.g. foo.py > file), that's not going to work, and we need to change the file's encoding to UTF-8 instead.

The opposite in either case will not work. You can't output to a regular file with WriteConsole (it's not actually a byte API, but a UTF-16 one; PyWin hides this detail), and you can't write UTF-8 to a Windows console.

Also, it really should be using _get_osfhandle to get the handle to stdout and stderr, rather than assuming they're assigned to the standard handles, but that API doesn't seem to have any PyWin binding.
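
On Python 3.7 and later, io.TextIOWrapper has a reconfigure() method, so the redirected-file half of this workaround (the job done by change_file_encoding above) can probably be written more briefly. A sketch under that assumption; the helper name force_utf8 is hypothetical, and it does not replace the WriteConsole branch:

```python
import io

def force_utf8(stream):
    """Switch a text stream to UTF-8 in place when it supports reconfigure()."""
    # Only TextIOWrapper offers reconfigure(); other stream types are left alone.
    if isinstance(stream, io.TextIOWrapper):
        stream.reconfigure(encoding="utf-8")
    return stream
```

Calling force_utf8(sys.stdout) and force_utf8(sys.stderr) at startup would then cover the case where output is redirected to a file.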


