Non-alphanumeric list order from os.listdir()
I think the order has to do with the way the files are indexed on your filesystem.
If you really want it to follow some order, you can always sort the list after getting the files.
Order in which files are read using os.listdir?
You asked several questions:
- Is there an order in which Python loops through the files?
No, Python does not impose any predictable order. The docs say 'The list is in arbitrary order'. If order matters, you must impose it. Practically speaking, the files are returned in the same order used by the underlying operating system, but one mustn't rely on that.
- Is it alphabetical?
Probably not. But even if it were, you mustn't rely on that (see above).
- How could I establish an order?
for file in sorted(os.listdir(path)):
    print(file)  # entries now arrive in sorted order
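A self-contained sketch of that loop, assuming a throwaway directory created with tempfile (the file names and the list_sorted() helper are made up for illustration). sorted() compares names as plain strings, so the result is the same on every platform regardless of the order os.listdir() returned:

```python
import os
import tempfile

def list_sorted(path):
    """Return the entries of path in plain lexicographic (code point) order."""
    return sorted(os.listdir(path))

with tempfile.TemporaryDirectory() as path:
    # Create empty files in an order that differs from the sorted order.
    for name in ["c.txt", "a.txt", "b.txt"]:
        open(os.path.join(path, name), "w").close()

    for file in list_sorted(path):
        print(file)  # prints a.txt, then b.txt, then c.txt
```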
Python - order of os.listdir
This question has been addressed on SO, for example, here:
Nonalphanumeric list order from os.listdir() in Python
It looks like Python returns entries in the order the native filesystem uses, so you have to sort them afterwards.
Order of filenames from os.listdir
No, the entry at the j-th position will (or at least CAN) vary. From the docs:
os.listdir(path='.')
Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries '.' and '..' even if they are present in the directory.
That said, if you want it sorted, sorted() produces a repeatable, lexicographically sorted list, so sorted(os.listdir("your/path/here"))[n] should always point to the same n-th (zero-based) file, unless your directory changes contents!
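A small sketch of picking the n-th file, using a hypothetical helper nth_file(). Keep in mind that n is zero-based and that plain sorted() orders by Unicode code point, so every uppercase ASCII name sorts before every lowercase one. Passing key=str.lower is one way to get a case-insensitive order, assuming that is what you want:

```python
import os
import tempfile

def nth_file(path, n, case_insensitive=False):
    """Return the n-th (zero-based) entry of path in sorted order.

    Hypothetical helper: the result only stays the same while the
    directory's contents do not change between calls.
    """
    key = str.lower if case_insensitive else None
    return sorted(os.listdir(path), key=key)[n]

with tempfile.TemporaryDirectory() as path:
    # Names chosen so they cannot collide on case-insensitive filesystems.
    for name in ["b.txt", "C.txt", "a.txt"]:
        open(os.path.join(path, name), "w").close()
    print(nth_file(path, 0))                         # C.txt ('C' < 'a' by code point)
    print(nth_file(path, 0, case_insensitive=True))  # a.txt
```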
List ordering with os.listdir - appending files in order
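import os
import re

def sorted_alphanumeric(data):
    # Split names into digit and non-digit runs; digit runs compare as integers
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
    return sorted(data, key=alphanum_key)

def load_images_sorted(folder):
    images = []
    for filename in sorted_alphanumeric(os.listdir(folder)):
        # do whatever with these sorted files; collecting full paths is only a placeholder
        images.append(os.path.join(folder, filename))
    return images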
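To see what the natural-sort key buys you, here is the same key applied to a few made-up file names and compared with plain sorted(). The function is repeated so the snippet runs on its own; it uses nested def functions instead of lambdas but behaves the same:

```python
import re

def sorted_alphanumeric(data):
    """Sort strings so that runs of digits compare as integers ("natural" order)."""
    def convert(text):
        return int(text) if text.isdigit() else text.lower()

    def alphanum_key(key):
        # re.split with a capturing group keeps the digit runs in the result,
        # e.g. "img10.png" -> ["img", "10", ".png"]
        return [convert(c) for c in re.split('([0-9]+)', key)]

    return sorted(data, key=alphanum_key)

names = ["img10.png", "img2.png", "img1.png", "IMG3.png"]
print(sorted(names))               # ['IMG3.png', 'img1.png', 'img10.png', 'img2.png']
print(sorted_alphanumeric(names))  # ['img1.png', 'img2.png', 'IMG3.png', 'img10.png']
```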
Files from directory being pulled in wrong order with python
os.listdir doesn't guarantee any ordering of the contents of a directory. If you want the items sorted, just sort them using the built-in sorted function (with an appropriate key function if necessary).
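For example, the key can be any function of the name. A sketch, assuming you want the entries ordered by modification time (oldest first) as reported by os.path.getmtime; the helper name sorted_by_mtime() is hypothetical. Ties keep name order because the names are sorted first and sorted() is stable:

```python
import os
import tempfile

def sorted_by_mtime(folder):
    """Return entry names in folder, oldest modification time first."""
    names = sorted(os.listdir(folder))  # deterministic tie-break by name
    return sorted(names, key=lambda name: os.path.getmtime(os.path.join(folder, name)))

with tempfile.TemporaryDirectory() as folder:
    for name, mtime in [("a.txt", 300), ("b.txt", 100), ("c.txt", 200)]:
        path = os.path.join(folder, name)
        open(path, "w").close()
        os.utime(path, (mtime, mtime))  # set (atime, mtime) explicitly
    print(sorted_by_mtime(folder))  # ['b.txt', 'c.txt', 'a.txt']
```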
os.listdir(folder) function printing out the wrong order of files
From the documentation of os.listdir():
Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries '.' and '..' even if they are present in the directory.
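To order the files by the numbers in their names, you can use a regex pattern, for example:
import re
import os
files = os.listdir('data')
# Raw string so that \d and \. reach the regex engine unchanged
re_pattern = re.compile(r'.+?(\d+)\.([a-zA-Z0-9]+)')
# Sort by the integer captured before the dot (raises AttributeError if a name doesn't match)
files_ordered = sorted(files, key=lambda x: int(re_pattern.match(x).groups()[0]))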
Output:
In [1]: files
Out[1]: ['spam2.txt', 'spam3.txt', 'spam304.txt', 'spam3300.txt', 'spam34.txt']
In [2]: files_ordered
Out[2]: ['spam2.txt', 'spam3.txt', 'spam34.txt', 'spam304.txt', 'spam3300.txt']
Short explanation:
- sorted() accepts a key argument that can be used to sort your list. Here we read the number before the dot in the filename. Note: you have to make sure yourself that the regex pattern matches all of your files.
- .+? at the beginning matches anything, but is non-greedy (it matches as little as possible).
- (\d+) matches and captures the digits, as many as there are, which you can then read from .groups()[0].
- \. matches the dot in the filename.
- ([a-zA-Z0-9]+) matches the file extension (alphanumeric).
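If some names do not match the pattern (for example a README without digits), re_pattern.match(x) returns None and the key raises AttributeError. One way around that, sketched with a hypothetical number_key() helper and the pattern written with [a-zA-Z0-9]+ as described above, is a tuple key that sends non-matching names to the end, sorted by name:

```python
import re

# Raw string so that \d and \. reach the regex engine unchanged.
re_pattern = re.compile(r'.+?(\d+)\.([a-zA-Z0-9]+)')

def number_key(name):
    """Sort key: (0, number, name) for matching names, (1, 0, name) otherwise."""
    m = re_pattern.match(name)
    if m is None:
        return (1, 0, name)  # non-matching names go last
    return (0, int(m.group(1)), name)

files = ['spam2.txt', 'spam3.txt', 'spam304.txt', 'spam3300.txt', 'spam34.txt', 'README']
print(sorted(files, key=number_key))
# ['spam2.txt', 'spam3.txt', 'spam34.txt', 'spam304.txt', 'spam3300.txt', 'README']
```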
What method does os.listdir() use to obtain a list of files in a directory?
Answer:
This is intended behaviour for the os.listdir() function.
More Information:
According to the Python Software Foundation Documentation:
os.listdir(path='.')
Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries '.' and '..' even if they are present in the directory.
os.listdir() is implemented in C, in Modules/posixmodule.c of the CPython source. Its result depends on the filesystem the files are stored on, and the function has separate implementations, selected by conditional compilation, for different operating systems (for example, Windows versus POSIX). On POSIX systems, the directory you pass to os.listdir() is opened with the following C code:
static PyObject *
_posix_listdir(path_t *path, PyObject *list) {
    /* stuff */
    dirp = opendir(name);
This opens a directory stream for the directory named by name and returns a pointer to that stream, positioned at the first directory entry.
Continuing on:
for (;;) {
    errno = 0;
    Py_BEGIN_ALLOW_THREADS
    ep = readdir(dirp);
    Py_END_ALLOW_THREADS
    if (ep == NULL) {
        if (errno == 0) {
            break;
        } else {
            Py_DECREF(list);
            list = path_error(path);
            goto exit;
        }
    }
    if (ep->d_name[0] == '.' &&
        (NAMLEN(ep) == 1 ||
         (ep->d_name[1] == '.' && NAMLEN(ep) == 2)))
        continue;
    if (return_str)
        v = PyUnicode_DecodeFSDefaultAndSize(ep->d_name, NAMLEN(ep));
    else
        v = PyBytes_FromStringAndSize(ep->d_name, NAMLEN(ep));
    if (v == NULL) {
        Py_CLEAR(list);
        break;
    }
    if (PyList_Append(list, v) != 0) {
        Py_DECREF(v);
        Py_CLEAR(list);
        break;
    }
    Py_DECREF(v);
}
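readdir() is called repeatedly with the directory stream pointer dirp obtained from opendir(). On Linux, each call returns a pointer to a dirent structure describing the next entry in the directory stream that dirp refers to, or NULL at the end of the stream (or on error). The loop skips the . and .. entries and appends every other name to the list in exactly the order readdir() produced it.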
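As a rough mental model only (this is not how CPython is written), the loop above behaves like the following Python sketch, where raw_entries is a hypothetical stand-in for whatever sequence readdir() happens to produce. The output is simply the input order minus the two special entries, which is why os.listdir() inherits the filesystem's order:

```python
def model_listdir(raw_entries):
    """Mimic the C loop: skip '.' and '..', keep everything else in input order.

    raw_entries is a hypothetical stand-in for successive readdir() results.
    """
    result = []
    for name in raw_entries:
        if name in ('.', '..'):
            continue  # same effect as the d_name[0] == '.' check in the C code
        result.append(name)
    return result

print(model_listdir(['.', 'b.txt', '..', 'a.txt', '.hidden']))
# ['b.txt', 'a.txt', '.hidden']
```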
As documented on the readdir() Linux man page:
A directory stream is opened using opendir(3). The order in which filenames are read by successive calls to readdir() depends on the filesystem implementation; it is unlikely that the names will be sorted in any fashion.
So this behaviour is expected and is a result of the filesystem implementation.
References:
- Miscellaneous operating system interfaces - os.listdir(path='.')
- cpython / Modules / posixmodule.c on GitHub
- Linux Programmer's Manual - readdir()
- Open Group Base Specifications Issue 6 - dirent.h format