How Are glob.glob()'s Return Values Ordered?

How are glob.glob()'s return values ordered?

It is probably not sorted at all and simply uses the order in which entries appear in the filesystem, i.e. the order you get from ls -U. (At least on my machine, ls -U lists files in the same order as the glob matches.)

What is the order in which glob.glob reads files? If there is no specific order, can one be specified?

glob uses os.listdir to get filenames to match, and the doc for listdir reads: "Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order."

So you cannot count on an order - even if one appears to exist, it could be platform-specific and unreliable. You will need to sort the list of files yourself based on the criteria you want.
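To make the "sort it yourself" advice concrete, here is a minimal sketch. The helper name sorted_glob and the file names are made up for the demo; the only real calls are glob.glob and the built-in sorted.

```python
import glob
import os
import tempfile


def sorted_glob(pattern):
    """Return glob matches in a deterministic, lexicographic order."""
    return sorted(glob.glob(pattern))


# Demo in a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.txt", "c.txt", "a.txt"):
        open(os.path.join(tmp, name), "w").close()
    names = [os.path.basename(p) for p in sorted_glob(os.path.join(tmp, "*.txt"))]
    print(names)  # ['a.txt', 'b.txt', 'c.txt']
```

Plain sorted() compares the full path strings, which is usually enough when all matches live in one directory.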

As far as I can see, glob does not contain any code that sorts its results. It's actually a pretty short module in case you want to read it to see what's going on under the hood.

glob.glob sorting - not as expected

That is because the files are sorted by name, and names are strings, so they end up in lexicographic order. Check [Python.Docs]: Sorting HOW TO for more sorting related details.
For things to work as you'd expect, the "faulty" file 9.bmp should be named 09.bmp (this applies to all such files). If you had more than 100 files, things would be even clearer (and the desired file names would be 009.bmp, 035.bmp).

Anyway, there is an alternative (provided that all of the files follow the naming pattern): convert the file's base name (without extension - check [Python.Docs]: os.path - Common pathname manipulations) to an int, and sort on that (by passing key to [Python.Docs]: sorted(iterable, *, key=None, reverse=False)).

import glob
import os

# Sort numerically on the base name without extension ("9.bmp" -> 9).
files = sorted(glob.glob("../../Documents/ImageAnalysis.nosync/sliceImage/*.bmp"),
               key=lambda x: int(os.path.splitext(os.path.basename(x))[0]))
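The int() key above fails as soon as a base name is not purely numeric. If that can happen, a "natural sort" key is one possible workaround. This is only a sketch: natural_key and the sample paths are hypothetical and not part of the original answer.

```python
import os
import re


def natural_key(path):
    """Split the base name into text and integer chunks for numeric-aware sorting."""
    name = os.path.basename(path)
    parts = re.split(r"(\d+)", name)
    # With a capturing group, odd positions hold the digit runs, even positions the text.
    return [int(part) if index % 2 else part.lower()
            for index, part in enumerate(parts)]


paths = ["img/10.bmp", "img/9.bmp", "img/slice2.bmp", "img/slice11.bmp", "img/1.bmp"]
ordered = sorted(paths, key=natural_key)
print(ordered)
# ['img/1.bmp', 'img/9.bmp', 'img/10.bmp', 'img/slice2.bmp', 'img/slice11.bmp']
```

Because text and numbers always alternate in the key, Python never has to compare a str with an int.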

Python glob() returning list of paths in an Unexpected/Strange Pattern

The first sentence of the glob module's documentation says:

The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order.

So, there is no order to the results you get from glob. You can sort it in any way you want, as shown in this answer.
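"Any way you want" includes file metadata, not just names. A small sketch, assuming you want to order matches by size or by modification time; the file names and sizes below are invented for the demo.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create three files of different sizes (hypothetical names).
    for name, size in (("big.log", 30), ("small.log", 1), ("medium.log", 10)):
        with open(os.path.join(tmp, name), "wb") as fh:
            fh.write(b"x" * size)

    matches = glob.glob(os.path.join(tmp, "*.log"))

    # Smallest file first.
    by_size = [os.path.basename(p) for p in sorted(matches, key=os.path.getsize)]
    # Most recently modified file first.
    newest_first = sorted(matches, key=os.path.getmtime, reverse=True)

    print(by_size)  # ['small.log', 'medium.log', 'big.log']
```

Any function that takes a path and returns something comparable can serve as the key.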

How are the results from glob ordered?

As quoted from the man page:

The pathnames shall be in sort order as defined by the current setting of the LC_COLLATE category;
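Note that this quote describes the POSIX glob behaviour, whereas Python's glob module returns results unsorted. If you want a collation-aware order in Python, one option is to sort with locale.strxfrm as the key. This is a rough sketch: collate_sorted is a made-up helper, and the "C" locale is chosen only because it is always available.

```python
import locale


def collate_sorted(names):
    """Sort names according to the current LC_COLLATE setting."""
    return sorted(names, key=locale.strxfrm)


# Pin the collation locale for the demo; a real program would more likely
# call locale.setlocale(locale.LC_COLLATE, "") to honour the user's environment.
locale.setlocale(locale.LC_COLLATE, "C")
result = collate_sorted(["b.txt", "B.txt", "a.txt"])
print(result)
```

In the "C" locale uppercase letters sort before lowercase ones. Other locales may interleave them, so the output depends on the locale you select.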

What are iterator, iterable, and iteration?

Iteration is a general term for taking each item of something, one after another. Any time you use a loop, explicit or implicit, to go over a group of items, that is iteration.

In Python, iterable and iterator have specific meanings.

An iterable is an object that has an __iter__ method which returns an iterator, or which defines a __getitem__ method that can take sequential indexes starting from zero (and raises an IndexError when the indexes are no longer valid). So an iterable is an object that you can get an iterator from.

An iterator is an object with a next (Python 2) or __next__ (Python 3) method.

Whenever you use a for loop, or map, or a list comprehension, etc. in Python, the next method is called automatically to get each item from the iterator, thus going through the process of iteration.
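A short illustration of both kinds of iterable described above (Python 3). The class names are invented for the example.

```python
class Countdown:
    """An iterable: __iter__ hands out a fresh iterator each time."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """An iterator: __next__ returns the next item or raises StopIteration."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        # Iterators return themselves so they can be used in for loops too.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


class Squares:
    """An iterable through __getitem__ alone (indexes 0, 1, 2, ... until IndexError)."""

    def __init__(self, count):
        self.count = count

    def __getitem__(self, index):
        if index >= self.count:
            raise IndexError(index)
        return index * index


it = iter(Countdown(3))   # ask the iterable for an iterator
first = next(it)          # calls it.__next__()
rest = list(it)           # consumes the remaining items
print(first, rest)        # 3 [2, 1]
print(list(Squares(4)))   # [0, 1, 4, 9]
```

A for loop does the same thing implicitly: it calls iter() once, then next() repeatedly until StopIteration is raised.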

A good place to start learning would be the iterators section of the tutorial and the iterator types section of the standard types page. After you understand the basics, try the iterators section of the Functional Programming HOWTO.

Is there an equivalent of java.util.regex for glob type patterns?

There's nothing built-in, but it's pretty simple to convert something glob-like to a regex:

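public static String createRegexFromGlob(String glob)
{
    // Anchor the pattern so it must match the whole input.
    StringBuilder out = new StringBuilder("^");
    for (int i = 0; i < glob.length(); ++i)
    {
        final char c = glob.charAt(i);
        switch (c)
        {
            case '*': out.append(".*"); break;     // any run of characters
            case '?': out.append('.'); break;      // exactly one character
            case '.': out.append("\\."); break;    // literal dot
            case '\\': out.append("\\\\"); break;  // literal backslash
            default: out.append(c);                // other regex metacharacters pass through unescaped
        }
    }
    out.append('$');
    return out.toString();
}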

This works for me, but I'm not sure if it covers the glob "standard", if there is one :)
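For comparison, the same idea can be sketched in Python. glob_to_regex is a hypothetical helper, not a library function, and unlike the Java method above it escapes every other character with re.escape. Python's standard library also ships fnmatch.translate, which does a more complete conversion.

```python
import re


def glob_to_regex(pattern):
    """Translate '*' and '?' to regex equivalents and escape everything else."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # any run of characters
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character
    return "^" + "".join(parts) + "$"


regex = glob_to_regex("report-?.v*.txt")
print(bool(re.match(regex, "report-1.v2.txt")))   # True
print(bool(re.match(regex, "report-12.v2.txt")))  # False: '?' matches exactly one character
```

Like the Java version, this lets "*" match path separators, which differs from how a shell expands globs.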

Update by Paul Tomblin: I found a Perl program that does glob conversion, and adapting it to Java I ended up with:

private String convertGlobToRegEx(String line)
{
    LOG.info("got line [" + line + "]");
    line = line.trim();
    int strLen = line.length();
    StringBuilder sb = new StringBuilder(strLen);
    // Remove beginning and ending * globs because they're useless
    if (line.startsWith("*"))
    {
        line = line.substring(1);
        strLen--;
    }
    if (line.endsWith("*"))
    {
        line = line.substring(0, strLen-1);
        strLen--;
    }
    boolean escaping = false;
    int inCurlies = 0;
    for (char currentChar : line.toCharArray())
    {
        switch (currentChar)
        {
        case '*':
            if (escaping)
                sb.append("\\*");
            else
                sb.append(".*");
            escaping = false;
            break;
        case '?':
            if (escaping)
                sb.append("\\?");
            else
                sb.append('.');
            escaping = false;
            break;
        // Regex metacharacters that have no special meaning in a glob
        case '.':
        case '(':
        case ')':
        case '+':
        case '|':
        case '^':
        case '$':
        case '@':
        case '%':
            sb.append('\\');
            sb.append(currentChar);
            escaping = false;
            break;
        case '\\':
            if (escaping)
            {
                sb.append("\\\\");
                escaping = false;
            }
            else
                escaping = true;
            break;
        case '{':
            if (escaping)
            {
                sb.append("\\{");
            }
            else
            {
                sb.append('(');
                inCurlies++;
            }
            escaping = false;
            break;
        case '}':
            if (inCurlies > 0 && !escaping)
            {
                sb.append(')');
                inCurlies--;
            }
            else if (escaping)
                sb.append("\\}");
            else
                sb.append("}");
            escaping = false;
            break;
        case ',':
            if (inCurlies > 0 && !escaping)
            {
                sb.append('|');
            }
            else if (escaping)
                sb.append("\\,");
            else
                sb.append(",");
            // Reset here too, otherwise an escaped comma leaks into the next character
            escaping = false;
            break;
        default:
            escaping = false;
            sb.append(currentChar);
        }
    }
    return sb.toString();
}

I'm editing into this answer rather than making my own because this answer put me on the right track.
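The brace handling above can be sketched in Python as well. This is a loose adaptation, not a line-by-line port: glob_to_regex_with_braces is a hypothetical helper, it emits non-capturing groups, it does not strip leading or trailing "*", and an unbalanced "{" produces an invalid regex.

```python
import re


def glob_to_regex_with_braces(pattern):
    """Convert a glob with *, ?, backslash escapes and {a,b} alternation to a regex."""
    out = []
    escaping = False   # True right after an unescaped backslash
    in_braces = 0      # current nesting depth of {...}
    for ch in pattern:
        if escaping:
            out.append(re.escape(ch))  # escaped character is always literal
            escaping = False
        elif ch == "\\":
            escaping = True
        elif ch == "*":
            out.append(".*")
        elif ch == "?":
            out.append(".")
        elif ch == "{":
            out.append("(?:")
            in_braces += 1
        elif ch == "}" and in_braces > 0:
            out.append(")")
            in_braces -= 1
        elif ch == "," and in_braces > 0:
            out.append("|")            # comma separates alternatives inside braces
        else:
            out.append(re.escape(ch))
    return "".join(out)


images = glob_to_regex_with_braces("*.{jpg,png}")
print(bool(re.fullmatch(images, "photo.png")))  # True
print(bool(re.fullmatch(images, "photo.gif")))  # False
```

re.fullmatch is used here instead of adding ^ and $ anchors, so the caller decides whether the pattern must cover the whole string.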


