What's the Ruby equivalent of Python's os.walk?
The following prints every file and directory recursively, one per line. You can then use File.directory? to check whether each entry is a directory or a file.
Dir['**/*'].each { |f| puts f }
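For readers coming from the Python side, a rough analogue of the Dir['**/*'] idiom is glob.glob with recursive=True, followed by os.path.isdir to tell directories from files. This is only a sketch for comparison: the helper name list_tree and the temporary directory layout are made up for illustration.

```python
import glob
import os
import tempfile


def list_tree(top):
    """Return sorted (relative_path, kind) pairs for everything under `top`."""
    # glob.escape protects any glob metacharacters that appear in `top` itself.
    pattern = os.path.join(glob.escape(top), "**", "*")
    results = []
    for path in glob.glob(pattern, recursive=True):
        kind = "dir" if os.path.isdir(path) else "file"
        results.append((os.path.relpath(path, top), kind))
    return sorted(results)


if __name__ == "__main__":
    # Hypothetical layout, built only so the example has something to list.
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "a", "b"))
        open(os.path.join(top, "a", "b", "x.txt"), "w").close()
        for rel, kind in list_tree(top):
            print(kind, rel)
```

Like Dir['**/*'], this pattern skips names that start with a dot by default.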
benchmarks: does python have a faster way of walking a network folder?
The Ruby implementation for Dir is in C (the file dir.c, according to this documentation). However, the Python equivalent is implemented in Python.
It's not surprising that Python is less performant than C, but the approach used in Python gives a little more flexibility: for example, you could skip entire subtrees named '.svn', '.git' or '.hg' while traversing a directory hierarchy.
Most of the time, the Python implementation is fast enough.
Update: The skipping of files/subdirs doesn't affect the traversal rate at all, but the overall time taken to process a directory tree could certainly be reduced because you avoid having to traverse potentially large subtrees of the main tree. The time saved is of course proportional to how much you skip. In your case, which looks like folders of images, it's unlikely you would save much time (unless the images were under revision control, when skipping subtrees owned by the revision control system might have some impact).
Additional update: Skipping folders is done by changing the dirs value in place:
import os

# "path" is the top of the tree you want to walk.
for root, dirs, files in os.walk(path):
    for skip in ('.hg', '.git', '.svn', '.bzr'):
        if skip in dirs:
            dirs.remove(skip)
    # Now process other stuff at this level, i.e.
    # in directory "root". The skipped folders
    # won't be recursed into.
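The same pruning can also be written as a single slice assignment, which is a common alternative to calling remove repeatedly. The sketch below is self-contained so it can be run as-is. The names walk_pruned and SKIP and the temporary directory layout are invented for the example, not part of the original answer.

```python
import os
import tempfile

SKIP = {".hg", ".git", ".svn", ".bzr"}


def walk_pruned(top, skip=SKIP):
    """Yield (root, files) for each directory under `top`, never entering names in `skip`."""
    for root, dirs, files in os.walk(top):
        # Slice assignment mutates the very list os.walk keeps using.
        # Rebinding with `dirs = [...]` would not affect the traversal.
        dirs[:] = [d for d in dirs if d not in skip]
        yield root, sorted(files)


if __name__ == "__main__":
    # Hypothetical tree: one revision-control folder and one source folder.
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, ".git", "objects"))
        os.makedirs(os.path.join(top, "src"))
        open(os.path.join(top, ".git", "objects", "blob"), "w").close()
        open(os.path.join(top, "src", "main.py"), "w").close()
        for root, files in walk_pruned(top):
            print(os.path.relpath(root, top), files)
```

Pruning only works with the default topdown=True, because in bottom-up mode the subdirectories have already been visited by the time dirs is handed to you.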
os.walk and read file, having some problems with walking the path
Remove the chdir call and open the file by its full path instead: f = csv.reader(open(os.path.join(root, 'out.csv'),'rb'))
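The 'rb' mode in the line above is the Python 2 convention. On Python 3 the csv module expects a text-mode file opened with newline=''. A minimal sketch of the same idea for Python 3 follows, joining root with the file name instead of changing directory. The function name read_all_out_csv and the returned dictionary shape are assumptions made for the example.

```python
import csv
import os
import tempfile


def read_all_out_csv(top, filename="out.csv"):
    """Map each directory (relative to `top`) containing `filename` to its parsed rows."""
    rows_by_dir = {}
    for root, dirs, files in os.walk(top):
        if filename not in files:
            continue
        # Build the full path from `root`; no os.chdir is needed.
        path = os.path.join(root, filename)
        with open(path, newline="") as fh:
            rows_by_dir[os.path.relpath(root, top)] = list(csv.reader(fh))
    return rows_by_dir


if __name__ == "__main__":
    # Hypothetical data, written only so the example can run.
    with tempfile.TemporaryDirectory() as top:
        sub = os.path.join(top, "sub")
        os.makedirs(sub)
        with open(os.path.join(sub, "out.csv"), "w", newline="") as fh:
            fh.write("a,b\n1,2\n")
        print(read_all_out_csv(top))
```

Because the working directory never changes, the relative paths that os.walk produces stay valid for the whole traversal.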
C++ vs Python vs Ruby Performance in Listing All Directories Recursively
By running both the C++ version and the Ruby version with strace we can get some clues why the C++ version is slower.
Using the Linux source code for testing (65000 files):
strace -o '|wc' cpp_recursion
86417 518501 9463879
strace -o '|wc' ruby -e 'Dir.glob("**/*")'
30563 180115 1827588
We see that the C++ version makes almost 3x as many system calls as the Ruby version (wc counts one line of strace output per call).
Looking more closely at the strace output you will find that both programs use getdents to retrieve directory entries, but the C++ version runs lstat on every single file, while the Ruby version does not.
I can only conclude that the C++ version is not implemented as efficiently as the Ruby version (or possibly serves a different purpose). The speed difference is not a language issue, but an implementation issue.
N.B. The C++ version with -O optimization runs in 0.347s, while the Ruby version runs in 0.304s. At least on Linux lstat seems to not incur much overhead. Perhaps the situation is different on Windows.
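The getdents-without-lstat pattern described above has a counterpart in Python's os.scandir, whose DirEntry objects can usually answer is_dir() from the directory listing itself. The following is a sketch of a recursive listing built on that, not a benchmark. The function name list_recursive is made up for the example, and on filesystems that do not report entry types the call may still fall back to a stat.

```python
import os
import tempfile


def list_recursive(top):
    """Return every path under `top`, relying on DirEntry type info rather than a stat per file."""
    found = []
    stack = [top]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                found.append(entry.path)
                # In most cases this needs no extra system call: the entry
                # type comes back with the directory listing itself.
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
    return found


if __name__ == "__main__":
    # Hypothetical tree, created only so the example has something to list.
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "a", "b"))
        open(os.path.join(top, "a", "b", "x.txt"), "w").close()
        for path in sorted(list_recursive(top)):
            print(os.path.relpath(path, top))
```

Running a script like this under strace would be one way to check how many stat-family calls a given Python build and filesystem actually produce.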
Python os.walk and japanese filename crash
It seems like all answers so far are from Unix people who assume the Windows console is like a Unix terminal, which it is not.
The problem is that you can't write Unicode output to the Windows console using the normal underlying file I/O functions. The Windows API WriteConsole needs to be used. Python should probably be doing this transparently, but it isn't.
There's a different problem if you redirect the output to a file: Windows text files are historically in the ANSI codepage, not Unicode. You can fairly safely write UTF-8 to text files in Windows these days, but Python doesn't do that by default.
I think it should do these things, but here's some code to make it happen. You don't have to worry about the details if you don't want to; just call ConsoleFile.wrap_standard_handles(). You do need PyWin installed to get access to the necessary APIs.
import os, sys, io, win32api, win32console, pywintypes

def change_file_encoding(f, encoding):
    """
    TextIOWrapper is missing a way to change the file encoding, so we have to
    do it by creating a new one.
    """
    errors = f.errors
    line_buffering = f.line_buffering
    # f.newlines is not the same as the newline parameter to TextIOWrapper.
    # newlines = f.newlines
    buf = f.detach()
    # TextIOWrapper defaults newline to \r\n on Windows, even though the underlying
    # file object is already doing that for us. We need to explicitly say "\n" to
    # make sure we don't output \r\r\n; this is the same as the internal function
    # create_stdio.
    return io.TextIOWrapper(buf, encoding, errors, "\n", line_buffering)

class ConsoleFile:
    class FileNotConsole(Exception): pass

    def __init__(self, handle):
        handle = win32api.GetStdHandle(handle)
        self.screen = win32console.PyConsoleScreenBufferType(handle)
        try:
            self.screen.GetConsoleMode()
        except pywintypes.error as e:
            raise ConsoleFile.FileNotConsole

    def write(self, s):
        self.screen.WriteConsole(s)

    def close(self): pass
    def flush(self): pass
    def isatty(self): return True

    @staticmethod
    def wrap_standard_handles():
        sys.stdout.flush()
        try:
            # There seems to be no binding for _get_osfhandle.
            sys.stdout = ConsoleFile(win32api.STD_OUTPUT_HANDLE)
        except ConsoleFile.FileNotConsole:
            sys.stdout = change_file_encoding(sys.stdout, "utf-8")
        sys.stderr.flush()
        try:
            sys.stderr = ConsoleFile(win32api.STD_ERROR_HANDLE)
        except ConsoleFile.FileNotConsole:
            sys.stderr = change_file_encoding(sys.stderr, "utf-8")

ConsoleFile.wrap_standard_handles()
print("English 漢字 Кири́ллица")
This is a little tricky: if stdout or stderr is the console, we need to output with WriteConsole; but if it's not (e.g. foo.py > file), that's not going to work, and we need to change the file's encoding to UTF-8 instead.
The opposite in either case will not work. You can't output to a regular file with WriteConsole (it's not actually a byte API, but a UTF-16 one; PyWin hides this detail), and you can't write UTF-8 to a Windows console.
Also, it really should be using _get_osfhandle to get the handle to stdout and stderr, rather than assuming they're assigned to the standard handles, but that API doesn't seem to have any PyWin binding.
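The redirected-file half of this (re-wrapping a text stream so it encodes as UTF-8) can be tried without Windows or PyWin by using an in-memory buffer. The sketch below mirrors the detach-and-rewrap approach of change_file_encoding above, and uses TextIOWrapper.reconfigure where it exists (Python 3.7 and later). The helper name rewrap_as_utf8 is hypothetical, and the example does not touch the console path at all.

```python
import io


def rewrap_as_utf8(stream):
    """Return a text stream over the same byte buffer that encodes as UTF-8."""
    stream.flush()
    if hasattr(stream, "reconfigure"):  # available on Python 3.7+
        stream.reconfigure(encoding="utf-8")
        return stream
    # Older Pythons: read the settings first, then detach and wrap again.
    errors = stream.errors
    line_buffering = stream.line_buffering
    return io.TextIOWrapper(stream.detach(), encoding="utf-8", errors=errors,
                            newline="\n", line_buffering=line_buffering)


if __name__ == "__main__":
    # An ASCII-only stream stands in for a redirected stdout here.
    raw = io.BytesIO()
    out = io.TextIOWrapper(raw, encoding="ascii", newline="\n")
    out = rewrap_as_utf8(out)
    out.write("English 漢字\n")
    out.flush()
    print(raw.getvalue())
```

Writing the same string through the original ASCII wrapper would raise UnicodeEncodeError, which is the failure mode the re-wrapping is meant to avoid.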