How do I search directories and find files that match a regex?
import os
import re
rootdir = "/mnt/externa/Torrents/completed"
# Raw string; the dots are escaped so only real .zip/.rar/.r01 extensions match
regex = re.compile(r'(.*\.zip$)|(.*\.rar$)|(.*\.r01$)')
for root, dirs, files in os.walk(rootdir):
    for file in files:
        if regex.match(file):
            print(file)
The code below answers this follow-up question from the comments:
That worked really well. Is there a way to do one thing if a match is found on regex group 1, and another if a match is found on regex group 2, etc.? – nillenilsson
import os
import re
# Each alternative is its own capture group; only the group whose
# alternative matched is non-empty, which tells you the extension
regex = re.compile(r'(.*\.zip$)|(.*\.rar$)|(.*\.r01$)')
for root, dirs, files in os.walk("../Documents"):
    for file in files:
        res = regex.match(file)
        if res:
            if res.group(1):
                print("ZIP", file)
            elif res.group(2):
                print("RAR", file)
            elif res.group(3):
                print("R01", file)
It might be possible to do this in a nicer way, but this works.
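One possibly tidier variant, sketched under the assumption that you mainly need a label per match: give each alternative a named group and read `Match.lastgroup`, which holds the name of the group that took part in the match. `archive_kind` and `report_archives` are illustrative names, not part of the original answer.

```python
import os
import re

# Named groups: the group name doubles as the label to report.
# Dots are escaped so "moviezip" (no real extension) is not matched.
ARCHIVE_RE = re.compile(r'(?P<ZIP>.*\.zip$)|(?P<RAR>.*\.rar$)|(?P<R01>.*\.r01$)')


def archive_kind(filename):
    """Return 'ZIP', 'RAR' or 'R01' for a matching name, else None."""
    m = ARCHIVE_RE.match(filename)
    # With alternation only one group participates, so lastgroup is its name
    return m.lastgroup if m else None


def report_archives(rootdir):
    """Print the label and full path of every archive under rootdir."""
    for root, dirs, files in os.walk(rootdir):
        for file in files:
            kind = archive_kind(file)
            if kind:
                print(kind, os.path.join(root, file))
```

Calling `report_archives("../Documents")` prints the same labels as the if/elif version. If each case needs different behaviour, a dict mapping group names to functions can replace the if/elif chain.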
How do I search through a folder for the filename that matches a regular expression using Python?
This will find all files whose names start with two digits and end in "gif"; you can add the files to a global list if you wish:
import re
import os
r = re.compile(r'\d{2}.+gif$')
for root, dirs, files in os.walk('/home/vinko'):
    l = [os.path.join(root, x) for x in files if r.match(x)]
    if l:
        print(l)  # Or append to a global list, whatever
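If you would rather not keep a global list, a function can return the list instead. This sketch swaps `os.walk` for `pathlib.Path.rglob`, a substitute technique rather than the original answer's; `find_matching` is a hypothetical name, and the default pattern is the one from the answer above.

```python
import re
from pathlib import Path


def find_matching(top, pattern=r'\d{2}.+gif$'):
    """Return every file under `top` whose name matches `pattern`.

    re.match anchors at the start of the name, so the default pattern
    finds names that start with two digits and end in "gif".
    """
    regex = re.compile(pattern)
    # rglob('*') descends into all subdirectories; keep only regular files
    return [p for p in Path(top).rglob('*') if p.is_file() and regex.match(p.name)]
```

For example, `find_matching('/home/vinko')` returns a list of `Path` objects you can store or iterate over later.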
How can I recursively find all files in current and subfolders based on regular expressions
To match whole paths that end in a filename matching a given regular expression, you could prepend .*/ to it, for example .*/f.+1$. The .*/ should match the path preceding the filename.
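The same whole-path idea can be tried in Python, where `re.fullmatch` plays the role of find's requirement that the pattern cover the entire path. This is a sketch assuming POSIX-style "/" separators; `path_matches` and `find_by_path` are illustrative names.

```python
import os
import re


def path_matches(path, pattern):
    """True if `pattern` matches the entire path, as find -regex requires."""
    return re.fullmatch(pattern, path) is not None


def find_by_path(top, pattern):
    """Yield file paths under `top` whose whole path matches `pattern`."""
    for root, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            if path_matches(path, pattern):
                yield path
```

For example, `list(find_by_path(".", r".*/f.+1$"))` lists paths such as `./docs/foo1`. Without the leading `.*/`, the pattern `f.+1` cannot match `./docs/foo1` at all, which is why the prefix is needed. Note that `.+` can also span `/` (so `./fdir/x1` matches too); `[^/]+` keeps the match inside the filename.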
Regular expression matching of the contents of text files in a directory
You need to read the files; right now you're only checking the patterns against the filenames.
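import os
import re
folder = '/home/ea/medical'
for file in os.listdir(folder):
    path = os.path.join(folder, file)
    if not os.path.isfile(path):  # skip subdirectories
        continue
    with open(path) as f:  # closes the file when done
        contents = f.read()
    status = 1
    # pattern1 and pattern2 are the question's own regexes (defined elsewhere)
    if re.search(pattern1, contents):
        status += 1
    if re.search(pattern2, contents):
        status += 1
    print(f"{file} Status: {status}")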
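If you need to search the contents recursively and know which lines matched, a small grep-like generator is one option. This is a sketch assuming the files are UTF-8 text (undecodable bytes are ignored); `grep_tree` is a hypothetical name.

```python
import os
import re


def grep_tree(top, pattern):
    """Yield (path, line_number, line) for every line under `top` that
    contains a match for `pattern`, like a simple recursive grep."""
    regex = re.compile(pattern)
    for root, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            # errors="ignore" skips undecodable bytes instead of raising
            with open(path, encoding="utf-8", errors="ignore") as f:
                for lineno, line in enumerate(f, start=1):
                    if regex.search(line):
                        yield path, lineno, line.rstrip("\n")
```

Because it reads line by line, a pattern that spans several lines will not match; read the whole file with `f.read()`, as in the answer above, if you need that.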
How to search for files using regex in a Linux shell script
Find all .py files.
find / -name '*.py'
Find files with the word "python" in the name.
find / -name '*python*'
Same as above but case-insensitive.
find / -iname '*python*'
Regex match, more flexible. Find both .py files and files with the word "python" in their path (-regex matches against the whole path, not just the name).
find / -regex '.*python.*\|.*\.py'
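To make the difference between these tests concrete, here is a rough Python analogue of each one. It is an illustrative sketch, not how find is implemented: `-name` and `-iname` compare a shell glob against the basename only, while `-regex` must match the whole path. GNU find's default (emacs) syntax writes alternation as `\|`, which is plain `|` in Python.

```python
import fnmatch
import os
import re


def name_match(path, glob):
    """Like find -name: shell glob against the basename, case-sensitive."""
    return fnmatch.fnmatchcase(os.path.basename(path), glob)


def iname_match(path, glob):
    """Like find -iname: the same glob, ignoring case."""
    return fnmatch.fnmatchcase(os.path.basename(path).lower(), glob.lower())


def regex_match(path, pattern):
    """Like find -regex: the pattern must match the entire path."""
    return re.fullmatch(pattern, path) is not None
```

For example, `regex_match("/usr/lib/python3/readme.txt", r".*python.*|.*\.py")` is true because "python" appears in a directory name, even though the file itself is not a .py file.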
List (find) files with repeated pattern in their name
You can use
find . -type f -regextype posix-extended -regex '.*/(20190[0-9]{3})_fl_\1\.nc$'
The regex matches:
- .*/ - any chars up to the rightmost / (necessary because the pattern used with find requires a full string match)
- (20190[0-9]{3}) - Group 1: 20190 followed by any three digits (e.g. 20190101)
- _fl_ - a fixed substring
- \1 - backreference to the Group 1 value
- \.nc - the literal string .nc
- $ - end of input.
The -regextype posix-extended option is necessary since the pattern above is written in POSIX ERE syntax.
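Python's `re` module also supports `\1` backreferences, so the same pattern can be checked outside find. This sketch uses `re.fullmatch` to stand in for find's full-path match (which makes the trailing `$` optional); `has_repeated_date` is an illustrative name.

```python
import re

# Same pattern as the find command above; \1 must repeat Group 1 exactly.
REPEATED = re.compile(r'.*/(20190[0-9]{3})_fl_\1\.nc')


def has_repeated_date(path):
    """True when the 8-character group before _fl_ appears again after it."""
    return REPEATED.fullmatch(path) is not None
```

A path whose two date parts differ, such as `./data/20190101_fl_20190102.nc`, is rejected because the backreference must repeat the captured text exactly.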
Trying to use GNU find to search recursively for filenames only (not directories) containing a string in any portion of the file name
specification:
1. match "rain"
2. in filename
3. only at start of a word
4. case-insensitive
assumptions:
5. define "word" to be a sequence of letters (no punctuation, digits, etc.)
6. paths have the form prefix/name, where prefix can have one or more levels delimited by / and name does not contain /
constraints:
7. find -iregex matches against the entire path (-name only matches the filename)
8. find -iregex must match the entirety of the path (e.g. "c" is only a partial match and does not match the path "a/b/c")
method:
find can return matches against non-files (e.g. directories). Given definition 6, we would be unable to tell whether name is a directory or an ordinary file. To satisfy 2, we can exclude non-files using find's -type f predicate.
We can compare paths found by find against our specification using find's case-insensitive regex matching predicate (-iregex). The "grep" flavour (-regextype grep) is sufficiently expressive.
Just using 1, a suitable regex is: rain
2+6+7 say we must forbid / after "rain": rain[^/]*$
- [/] matches a character in the set (i.e. /)
- [^/]: ^ inverts the match, i.e. any character that is not /
- * matches the preceding match zero or more times
- $ constrains the preceding match to occur at the end of input
3+5 say there must be no immediately preceding word characters: [^a-z]rain[^/]*$
- a-z is a shortcut for the range a to z
8 requires matching the prefix explicitly: ^.*[^a-z]rain[^/]*$
- ^ outside of [...] constrains the subsequent match to occur at the beginning of input
- . matches any character
- [^a-z] matches a non-alphabetic character
Final command-line:
find . -type f -regextype grep -iregex '^.*[^a-z]rain[^/]*$'
Note: the leading ^ and trailing $ are not actually required, given 8, and could be elided.
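For comparison, the same specification can be expressed with Python's `re`, using `re.fullmatch` for constraint 8 and `re.IGNORECASE` for 4. This is a sketch assuming "/"-separated (POSIX) paths; `os.walk`'s file list only roughly corresponds to `-type f`, and `is_rain_file` / `find_rain_files` are illustrative names.

```python
import os
import re

# A non-letter, then "rain", then no further "/" (so "rain" sits in the name).
RAIN = re.compile(r'.*[^a-z]rain[^/]*', re.IGNORECASE)


def is_rain_file(path):
    """True if some word in the final path component starts with "rain"."""
    return RAIN.fullmatch(path) is not None


def find_rain_files(top="."):
    """Yield non-directory entries under `top` whose name matches."""
    for root, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            if is_rain_file(path):
                yield path
```

With `top="."` the paths start with `./`, so a name that begins with "rain" is still preceded by the non-letter `/`, just as in the find version.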
Exercise for the reader:
- extend "word" to non-ASCII characters (e.g. UTF-8)