Non-Alphanumeric List Order from os.listdir()

Non-alphanumeric list order from os.listdir()

The order most likely reflects how the underlying filesystem indexes its directory entries.
If you need a particular order, sort the list after retrieving it.

Order in which files are read using os.listdir?

You asked several questions:

  • Is there an order in which Python loops through the files?

No, Python does not impose any predictable order. The docs say 'The list is in arbitrary order'. If order matters, you must impose it. Practically speaking, the files are returned in the same order used by the underlying operating system, but one mustn't rely on that.

  • Is it alphabetical?

Probably not. But even if it were you mustn't rely upon that. (See above).

  • How could I establish an order?

for file in sorted(os.listdir(path)):
    print(file)

Python - order of os.listdir

This question has been addressed on SO, for example, here:
Nonalphanumeric list order from os.listdir() in Python

Python returns the entries in whatever order the native filesystem yields them, so you have to sort them afterwards.

Order of filenames from os.listdir

No, the j-th position will (or at least CAN) vary. From the docs:

os.listdir(path='.')
Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries '.' and '..' even if they are present in the directory.

That said, if you want it sorted, sorted() returns a new list in lexicographic order, which is the same on every call for the same set of names. sorted(os.listdir("your/path/here"))[n] will then always point to the n-th file (unless the contents of your directory change).
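One caveat is worth illustrating: lexicographic order compares names character by character, so numbered names may not come out in numeric order. A minimal sketch, using made-up file names in place of a real directory listing:

```python
# Hypothetical file names, standing in for the result of os.listdir().
names = ["file10.txt", "file2.txt", "file1.txt"]

# sorted() compares strings character by character, so "10" sorts before "2".
lexicographic = sorted(names)
print(lexicographic)  # ['file1.txt', 'file10.txt', 'file2.txt']
```

If numeric order matters, a key function that compares the digits as integers is needed.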

List ordering with os.listdir - appending files in order

import os
import re

def sorted_alphanumeric(data):
    # Digit runs compare as integers; everything else compares case-insensitively.
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
    return sorted(data, key=alphanum_key)

def load_images_sorted(folder):
    images = []
    for filename in sorted_alphanumeric(os.listdir(folder)):
        # do whatever with these sorted files
        ...
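To see what this natural sort does, here is a small usage sketch. It repeats the sorting function so that it runs on its own, and the file names are invented for illustration rather than read from a real folder:

```python
import re

def sorted_alphanumeric(data):
    """Sort strings so that embedded digit runs compare as integers."""
    def convert(text):
        return int(text) if text.isdigit() else text.lower()

    def alphanum_key(key):
        # re.split with a capturing group keeps the digit runs in the result,
        # so the key alternates between text chunks and integer chunks.
        return [convert(chunk) for chunk in re.split(r"([0-9]+)", key)]

    return sorted(data, key=alphanum_key)

# Hypothetical names standing in for os.listdir(folder).
names = ["img12.png", "img1.png", "IMG3.png", "img2.png"]
natural = sorted_alphanumeric(names)
print(natural)  # ['img1.png', 'img2.png', 'IMG3.png', 'img12.png']
```

Because the key always starts with a text chunk and alternates with integers, same-position elements are of the same type, so Python never has to compare a string with an int.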

Files from directory being pulled in wrong order with python

os.listdir doesn't guarantee any ordering of the contents of a directory. If you want the items to be sorted, just sort them using the builtin sorted function (with an appropriate key function if necessary).
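As a rough illustration of what a key function changes, the sketch below sorts the same invented names twice, once with the default ordering and once case-insensitively with key=str.lower:

```python
# Hypothetical file names, standing in for the result of os.listdir().
names = ["b.txt", "A.txt", "c.TXT", "a.log"]

# Default ordering: uppercase letters sort before lowercase ones.
default_order = sorted(names)
print(default_order)  # ['A.txt', 'a.log', 'b.txt', 'c.TXT']

# Case-insensitive ordering: each name is compared by its lowercased form.
case_insensitive = sorted(names, key=str.lower)
print(case_insensitive)  # ['a.log', 'A.txt', 'b.txt', 'c.TXT']
```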

os.listdir(folder) function printing out the wrong order of files

From the documentation of os.listdir():

Return a list containing the names of the entries in the directory
given by path. The list is in arbitrary order, and does not include
the special entries '.' and '..' even if they are present in the
directory.

To get the order by numbers you can use for example regex pattern:

import re
import os

files = os.listdir('data')
re_pattern = re.compile(r'.+?(\d+)\.([a-zA-Z0-9]+)')
files_ordered = sorted(files, key=lambda x: int(re_pattern.match(x).groups()[0]))

Output:

In [1]: files
Out[1]: ['spam2.txt', 'spam3.txt', 'spam304.txt', 'spam3300.txt', 'spam34.txt']

In [2]: files_ordered
Out[2]: ['spam2.txt', 'spam3.txt', 'spam34.txt', 'spam304.txt', 'spam3300.txt']

Short explanation:

  • sorted() accepts a key argument that determines how the items are compared. Here the key is the number before the dot in the filename. Note: make sure the regex pattern matches all of your filenames; for a name that does not match, re_pattern.match(x) returns None and the key function raises an AttributeError.
  • .+? in the beginning matches anything, but is non-greedy (will match as little as possible)
  • (\d+) will match and capture the digits, as many as there are, which you can then read from .groups()[0].
  • \. will match the dot in the filename
  • ([a-zA-Z0-9]+) will match the file extension (alphanumeric)
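If some names in the directory might not fit the pattern, one option is a key function that checks the match before using it. The sketch below is only one way to do that: numeric_key is a hypothetical helper, the file names are invented, and placing unmatched names last is an arbitrary choice.

```python
import re

# The pattern described in the bullets above: lazy prefix, captured digits,
# a literal dot, then an alphanumeric extension.
re_pattern = re.compile(r".+?(\d+)\.([a-zA-Z0-9]+)")

def numeric_key(name):
    """Hypothetical helper: sort by the captured number, unmatched names last."""
    match = re_pattern.match(name)
    if match is None:
        # A leading 1 sorts after every matched name, whose key starts with 0.
        return (1, 0, name)
    return (0, int(match.group(1)), name)

# Invented names standing in for os.listdir('data').
files = ["spam34.txt", "README", "spam2.txt", "spam304.txt"]
files_ordered = sorted(files, key=numeric_key)
print(files_ordered)  # ['spam2.txt', 'spam34.txt', 'spam304.txt', 'README']
```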

What method does os.listdir() use to obtain a list of files in a directory?

Answer:

This is the intended behaviour of the os.listdir() function.

More Information:

According to the Python Software Foundation Documentation:

os.listdir(path='.')

Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries '.' and '..' even if they are present in the directory.

os.listdir() is implemented in C, in Modules/posixmodule.c of the CPython source. The result depends on the structure of the filesystem the files are stored on, and a different implementation is selected by conditional compilation depending on the local operating system. On POSIX systems, the directory passed to os.listdir() is opened with the following C code:

static PyObject *
_posix_listdir(path_t *path, PyObject *list) {
    /* stuff */
    dirp = opendir(name);

This opens a stream for the directory whose name is stored in name and returns a pointer to the directory stream, positioned at the first directory entry.

Continuing on:

for (;;) {
    errno = 0;
    Py_BEGIN_ALLOW_THREADS
    ep = readdir(dirp);
    Py_END_ALLOW_THREADS
    if (ep == NULL) {
        if (errno == 0) {
            break;
        } else {
            Py_DECREF(list);
            list = path_error(path);
            goto exit;
        }
    }
    if (ep->d_name[0] == '.' &&
        (NAMLEN(ep) == 1 ||
         (ep->d_name[1] == '.' && NAMLEN(ep) == 2)))
        continue;
    if (return_str)
        v = PyUnicode_DecodeFSDefaultAndSize(ep->d_name, NAMLEN(ep));
    else
        v = PyBytes_FromStringAndSize(ep->d_name, NAMLEN(ep));
    if (v == NULL) {
        Py_CLEAR(list);
        break;
    }
    if (PyList_Append(list, v) != 0) {
        Py_DECREF(v);
        Py_CLEAR(list);
        break;
    }
    Py_DECREF(v);
}

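readdir() is called repeatedly with the directory stream pointer obtained from opendir(). On Linux, readdir() returns a pointer to a dirent structure representing the next entry in the directory stream that dirp points to, or NULL at the end of the stream or on error. Each entry name other than '.' and '..' is appended to the list in the order readdir() produces it; no sorting takes place.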

As documented on the readdir() Linux man page:

A directory stream is opened using opendir(3). The order in which filenames are read by successive calls to readdir() depends on the filesystem implementation; it is unlikely that the names will be sorted in any fashion.

So this behaviour is expected and is a result of the filesystem implementation.
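The effect can be observed from Python with a small experiment. The sketch below creates a few invented files in a temporary directory and compares the raw listing with a sorted one; the raw order is whatever the filesystem returns, so no particular output is claimed for it.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create a few hypothetical files in a deliberately unsorted order.
    for name in ["c.txt", "a.txt", "b.txt"]:
        with open(os.path.join(folder, name), "w") as handle:
            handle.write("")

    raw = os.listdir(folder)  # order depends on the filesystem
    ordered = sorted(raw)     # order imposed explicitly

print(raw)      # some permutation of the three names
print(ordered)  # ['a.txt', 'b.txt', 'c.txt']
```

On some filesystems the raw listing may happen to look sorted; that is a property of the filesystem, not something os.listdir() promises.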

References:

  • Miscellaneous operating system interfaces - os.listdir(path='.')
  • cpython/Modules/posixmodule.c on GitHub
  • Linux Programmer's Manual - readdir()
  • Open Group Base Specifications Issue 6 - dirent.h format

