Regular Expression Usage in Glob.Glob

Regular expression usage in glob.glob?

The easiest way would be to filter the glob results yourself. Here is how to do it using a simple list comprehension:

import glob

res = [f for f in glob.glob("*.txt") if "abc" in f or "123" in f or "a1b" in f]
for f in res:
    print(f)

You could also use a regexp and no glob:

import os
import re

path = "."  # directory to search; adjust as needed
res = [f for f in os.listdir(path) if re.search(r'(abc|123|a1b).*\.txt$', f)]
for f in res:
    print(f)

(By the way, naming a variable list is a bad idea, since it shadows Python's built-in list type.)
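To compare the two approaches above without touching the filesystem, here is a minimal sketch that applies both filters to a hard-coded list of names. The sample names and the helper names `by_substring` and `by_regex` are made up for illustration.

```python
import re

# Hypothetical file names standing in for a directory listing.
names = ["abc_report.txt", "notes.txt", "x123.txt", "a1b.csv", "za1b.txt"]

def by_substring(names):
    """Keep .txt names containing any of the three substrings."""
    return [n for n in names
            if n.endswith(".txt") and any(s in n for s in ("abc", "123", "a1b"))]

def by_regex(names):
    """Same filter expressed as one regular expression."""
    pattern = re.compile(r"(abc|123|a1b).*\.txt$")
    return [n for n in names if pattern.search(n)]

print(by_substring(names))
print(by_regex(names))
```

Both calls should select the same names here; the regex version just folds the extension check and the three alternatives into one pattern.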

Regular expression and Python glob


glob patterns do not support the alternation pipe symbol (|) that you used. It is better to use a regular expression (the re module) to build the desired file list in one line and then iterate over it. You have three ranges, so you need three for loops. One of them, using the regex you mentioned, looks like this:

import re
import glob

dest_dir = "/tmp/folder3/"
for file in [f for f in glob.glob("/tmp/source/*.jpg") if re.search(r'([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|1000)\.jpg', f)]:
    # shutil.copy(file, dest_dir)
    print(file)
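One caveat with `re.search` on the full path: the pattern is not anchored, so a name such as `12345.jpg` also matches, because `5.jpg` satisfies the first alternative. A minimal sketch of a stricter variant matches the whole base name instead. It assumes the files are named with the bare number (for example `42.jpg`); the helper name `in_range_0_1000` and the sample paths are hypothetical.

```python
import os
import re

# Whole-name pattern: 0-9, 10-999, or 1000, followed by ".jpg".
NUMBERED_JPG = re.compile(r"([0-9]|[1-9][0-9]{1,2}|1000)\.jpg")

def in_range_0_1000(path):
    """True if the base name is exactly <number 0..1000>.jpg."""
    return NUMBERED_JPG.fullmatch(os.path.basename(path)) is not None

# Hypothetical paths for illustration.
samples = [
    "/tmp/source/7.jpg",
    "/tmp/source/1000.jpg",
    "/tmp/source/12345.jpg",
    "/tmp/source/007.jpg",
]
print([p for p in samples if in_range_0_1000(p)])
```

If the real file names carry a prefix or zero padding, the pattern would need to be adjusted accordingly.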

Regular expression, glob, Python

For this specific case, glob already supports what you need (see fnmatch docs for glob wildcards). You can just do:

for filename in glob.glob("pc[23456]??.txt"):
    print(filename)

If you need to be extra specific that the two trailing characters are numbers (some files might have non-numeric characters there), you can replace the ?s with [0123456789], but otherwise, I find the ? a little less distracting.

In a more complicated scenario, you might be forced to resort to regular expressions, and you could do so here with:

import os
import re

for filename in filter(re.compile(r'^pc[2-6]\d\d\.txt$').match, os.listdir('.')):
    print(filename)

but given that glob-style wildcards work well enough, you don't need to break out the big guns just yet.
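The relationship between the two pattern styles can be inspected with `fnmatch.translate`, a standard-library function that converts a glob-style pattern into a regular expression string. The sketch below compiles the translated pattern and applies it to a list of made-up file names. The exact regex text produced varies between Python versions, so it is not shown.

```python
import fnmatch
import re

glob_pattern = "pc[23456]??.txt"

# translate() returns a regex string intended for use with re.match.
regex = re.compile(fnmatch.translate(glob_pattern))

# Hypothetical names for illustration.
names = ["pc200.txt", "pc699.txt", "pc100.txt", "pc2ab.txt", "pc2000.txt"]
matches = [n for n in names if regex.match(n)]
print(matches)
```

Note that `pc2ab.txt` is accepted, since `?` matches any single character; this is the case where swapping in a digit class is worthwhile.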

Finding file name using regex from glob

regex solution:

import os
import re

res = [i for i in os.listdir(BASEDIR) if re.match(r'test\.[a-zA-Z0-9]{8}\.js', i)]
print(res)

NOTE: the result contains only file names. Use

os.path.join(BASEDIR, res[i])

to get the full path of the i-th entry.
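Putting the two steps together, a minimal sketch might wrap the listing and the join in one function. The function name `find_matches` is hypothetical. As an assumption, the pattern is applied with `fullmatch` here, so that names such as `test.abcdefgh.json` (which `re.match` would accept) are not picked up.

```python
import os
import re

PATTERN = re.compile(r"test\.[a-zA-Z0-9]{8}\.js")

def find_matches(basedir):
    """Return sorted full paths of files named test.<8 alphanumerics>.js."""
    return sorted(
        os.path.join(basedir, name)
        for name in os.listdir(basedir)
        if PATTERN.fullmatch(name)
    )
```

Calling `find_matches(BASEDIR)` would then return paths that can be opened directly.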

Python Glob regex file search with for single result from multiple matches

glob accepts Unix wildcards, not regexes. Those are less powerful but what you're asking can still be achieved. This:

glob.glob("/path/to/file/*[!0-9]3.txt")

matches files whose names end in 3.txt with a non-digit character immediately before the 3.

For other cases, you can use a list comprehension and regex:

[x for x in glob.glob("/path/to/file/*") if re.match(some_regex, os.path.basename(x))]
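Because glob uses the same wildcard rules as the `fnmatch` module, the bracket pattern above can be checked against sample names without a directory on disk. The names below are made up for illustration.

```python
import fnmatch

pattern = "*[!0-9]3.txt"

# Hypothetical file names.
names = ["file_3.txt", "file_13.txt", "a3.txt", "3.txt"]

# fnmatchcase applies the wildcard rules without OS-specific case folding.
kept = [n for n in names if fnmatch.fnmatchcase(n, pattern)]
print(kept)
```

A bare `3.txt` is rejected as well, since `[!0-9]` must consume one character before the 3.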

Filesystem independent way of using glob.glob and regular expressions with unicode filenames in Python

I'm assuming you want to match unicode equivalent filenames, e.g. you expect an input pattern of u'\xE9*' to match both filenames u'\xE9qui' and u'e\u0301qui' on any operating system, i.e. character-level pattern matching.

You have to understand that this is not the default on Linux, where bytes are taken as bytes, and where not every filename is a valid unicode string in the current system encoding (although Python 3 uses the 'surrogateescape' error handler to represent these as str anyway).

With that in mind, this is my solution:

import fnmatch
import os
import sys
import unicodedata

def myglob(pattern, directory=u'.'):
    pattern = unicodedata.normalize('NFC', pattern)
    results = []
    enc = sys.getfilesystemencoding()
    for name in os.listdir(directory):
        if isinstance(name, bytes):
            try:
                name = name.decode(enc)
            except UnicodeDecodeError:
                # Filenames that are not proper unicode won't match any pattern
                continue
        if fnmatch.filter([unicodedata.normalize('NFC', name)], pattern):
            results.append(name)
    return results
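The key step in the function above is normalizing both the pattern and each name to the same Unicode form before matching. Here is a small sketch of that step in isolation, using the two spellings from the example. The variable names are chosen for illustration.

```python
import fnmatch
import unicodedata

composed = "\xe9qui"        # e-acute as a single code point
decomposed = "e\u0301qui"   # e followed by a combining acute accent

# The raw strings differ, so a pattern built from one misses the other.
raw_equal = composed == decomposed
raw_match = fnmatch.fnmatchcase(decomposed, "\xe9*")

# After NFC normalization both spellings compare equal and match.
nfc = unicodedata.normalize("NFC", decomposed)
nfc_equal = nfc == composed
nfc_match = fnmatch.fnmatchcase(nfc, "\xe9*")

print(raw_equal, raw_match, nfc_equal, nfc_match)
```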

how can I use a particular regex with glob

I did the following:

import glob
import re

local = glob.glob('/Users/tp/Downloads/example/*/[A-Z]*-circle.txt')
for filePath in local:
    matches = re.findall(r"[A-Z]{2,4}-circle\.txt", filePath)
    if matches:
        print(filePath)

and that worked!


