How to Search Directories and Find Files That Match Regex

How do I search directories and find files that match a regex?

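import os
import re

rootdir = "/mnt/externa/Torrents/completed"
# Escape the dots so that e.g. "foo.gzip" is not treated as a zip file
regex = re.compile(r'(.*\.zip$)|(.*\.rar$)|(.*\.r01$)')

for root, dirs, files in os.walk(rootdir):
    for file in files:
        if regex.match(file):
            # Print the full path, not just the bare file name
            print(os.path.join(root, file))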
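If you prefer pathlib, a roughly equivalent sketch (assuming Python 3.6+) walks the tree with Path.rglob and tests only the final name component. The function name find_archives is hypothetical, and the pattern escapes the dot so that a name such as "foo.gzip" is not picked up as a zip file.

```python
import re
from pathlib import Path

# Names ending in .zip, .rar or .r01; the escaped dot means "gzip" alone won't match
ARCHIVE_RE = re.compile(r'\.(zip|rar|r01)$')

def find_archives(rootdir):
    """Yield paths of regular files under rootdir whose name matches ARCHIVE_RE."""
    for path in Path(rootdir).rglob('*'):
        # rglob('*') also yields directories, so keep regular files only
        if path.is_file() and ARCHIVE_RE.search(path.name):
            yield path
```

Usage would be along the lines of: for p in find_archives('/mnt/externa/Torrents/completed'): print(p)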

The code below answers the question in the following comment:

That worked really well. Is there a way to do one thing if a match is found on regex group 1, another if it is found on regex group 2, etc.? – nillenilsson

import os
import re

# Escaped dots; each group corresponds to one extension
regex = re.compile(r'(.*\.zip$)|(.*\.rar$)|(.*\.r01$)')

for root, dirs, files in os.walk("../Documents"):
    for file in files:
        res = regex.match(file)
        if res:
            # Only one alternative can match, so exactly one group is set
            if res.group(1):
                print("ZIP", file)
            elif res.group(2):
                print("RAR", file)
            elif res.group(3):
                print("R01", file)

It might be possible to do this in a nicer way, but this works.
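One possibly tidier variant (a sketch, not the answerer's code) gives each group a name and reads Match.lastgroup, which holds the name of the last capturing group that matched. Because the alternatives are mutually exclusive, that name identifies the branch directly. The functions classify and scan and the handler labels are hypothetical.

```python
import os
import re

# One named group per extension; only one alternative can match a given name
ARCHIVE_RE = re.compile(r'.*\.(?:(?P<zip>zip)|(?P<rar>rar)|(?P<r01>r01))$')

def classify(filename):
    """Return 'zip', 'rar' or 'r01' for a matching filename, else None."""
    m = ARCHIVE_RE.match(filename)
    return m.lastgroup if m else None

def scan(rootdir):
    # Map each group name to an action (placeholder print calls here)
    handlers = {
        'zip': lambda path: print('ZIP', path),
        'rar': lambda path: print('RAR', path),
        'r01': lambda path: print('R01', path),
    }
    for root, dirs, files in os.walk(rootdir):
        for name in files:
            kind = classify(name)
            if kind is not None:
                handlers[kind](os.path.join(root, name))
```

Adding a new extension then means adding one named group and one dictionary entry, rather than another if branch.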

How do I search through a folder for the filename that matches a regular expression using Python?

This finds all files whose names start with two digits and end in "gif". You can add the files to a global list if you wish:

import re
import os

r = re.compile(r'\d{2}.+gif$')
for root, dirs, files in os.walk('/home/vinko'):
    l = [os.path.join(root, x) for x in files if r.match(x)]
    if l:
        print(l)  # Or append to a global list, whatever
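Since r.match anchors only at the start of the name, the $ in the pattern is what enforces the "gif" ending. A quick check against a few invented names shows what \d{2}.+gif$ accepts: .+ needs at least one character, and nothing requires a dot before "gif" (something like \d{2}.*\.gif$ would).

```python
import re

r = re.compile(r'\d{2}.+gif$')

names = ['01_cat.gif', '42dog.gif', '12xgif', 'a12x.gif', '7.gif', '99.gif.bak', '12gif']
# re.match succeeds only if the pattern matches starting at the first character
matched = [n for n in names if r.match(n)]
print(matched)  # ['01_cat.gif', '42dog.gif', '12xgif']
```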

How can I recursively find all files in current and subfolders based on regular expressions

To match whole paths that end in a filename matching a given regular expression, you could prepend .*/ to it, for example .*/f.+1$. The .*/ should match the path preceding the filename.
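Python's re.fullmatch, like find's -regex, only succeeds when the pattern covers the entire string, so it is a convenient way to try such a pattern on a few invented paths. One caveat: .+ can also match /, so .*/f.+1$ accepts ./foo/bar1 even though the filename there is bar1. Using [^/]+ keeps the match inside the last path component.

```python
import re

loose = re.compile(r'.*/f.+1$')
strict = re.compile(r'.*/f[^/]+1$')  # [^/] cannot cross into another path component

paths = ['./src/foo1', './f1', './foo1/bar', './foo/bar1']
for p in paths:
    # fullmatch must span the entire path, mirroring find -regex
    print(p, bool(loose.fullmatch(p)), bool(strict.fullmatch(p)))
# ./src/foo1 True True
# ./f1 False False
# ./foo1/bar False False
# ./foo/bar1 True False
```

./f1 fails with both patterns because .+ and [^/]+ each require at least one character between the f and the 1.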

Regular expression matching of the contents of text files in a directory

You need to read the files; at the moment you are only checking the patterns against the filenames.

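import os
import re

folder = '/home/ea/medical'
# pattern1 and pattern2 are the regexes from the question
for file in os.listdir(folder):
    path = os.path.join(folder, file)
    if not os.path.isfile(path):
        continue  # skip subdirectories
    with open(path) as f:  # closes the file when done
        contents = f.read()
    status = 1
    if re.search(pattern1, contents):
        status += 1
    if re.search(pattern2, contents):
        status += 1
    print(f"{file} Status: {status}")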
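The same idea can be extended to subdirectories. In this sketch os.walk visits every folder, and each file is read as text and searched with precompiled patterns. The function name grep_tree, the UTF-8 default, and errors='replace' are assumptions, not part of the original answer.

```python
import os
import re

def grep_tree(rootdir, patterns, encoding='utf-8'):
    """Return {path: [patterns found in its contents]} for files under rootdir."""
    compiled = [re.compile(p) for p in patterns]
    hits = {}
    for root, dirs, files in os.walk(rootdir):
        for name in files:
            path = os.path.join(root, name)
            # errors='replace' stops undecodable bytes from raising UnicodeDecodeError
            with open(path, encoding=encoding, errors='replace') as f:
                contents = f.read()
            matched = [rx.pattern for rx in compiled if rx.search(contents)]
            if matched:
                hits[path] = matched
    return hits
```

Returning a dictionary rather than printing lets the caller compute something like the status count above, e.g. 1 + len(hits.get(path, [])).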

How to search for files using regex in a Linux shell script

Find all .py files.

find / -name '*.py'

Find files with the word "python" in the name.

find / -name '*python*'

Same as above but case-insensitive.

find / -iname '*python*'

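Regex match, more flexible. Find both .py files and files whose path contains the word "python". Unlike -name, -regex is matched against the whole path (directory part included), and find's default Emacs-style syntax writes alternation as \|.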

find / -regex '.*python.*\|.*\.py'
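For comparison, here is a rough Python counterpart of those find tests, for when the matching has to happen inside a script. fnmatch implements shell-style wildcards like -name, lowercasing approximates -iname, and re.fullmatch on the whole path mirrors -regex. The helper names are hypothetical. Like find's default, os.walk does not follow symlinked directories; unlike find, it silently skips unreadable directories and does not test the starting directory itself.

```python
import fnmatch
import os
import re

def find_like(top, test):
    """Yield every path under top (directories and files, as find does) where test(path) is true."""
    for root, dirs, files in os.walk(top):
        for name in dirs + files:
            path = os.path.join(root, name)
            if test(path):
                yield path

def by_name(path):
    # like -name '*.py': shell wildcard on the last path component, case-sensitive
    return fnmatch.fnmatchcase(os.path.basename(path), '*.py')

def by_iname(path):
    # like -iname '*python*': the same wildcard match, ignoring case
    return fnmatch.fnmatchcase(os.path.basename(path).lower(), '*python*')

def by_regex(path):
    # like -regex '.*python.*\|.*\.py': the regex must cover the WHOLE path;
    # Python spells alternation | where find's default Emacs syntax uses \|
    return re.fullmatch(r'.*python.*|.*\.py', path) is not None
```

For example, for p in find_like('/', by_name): print(p) corresponds to the first find command.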

List (find) files with repeated pattern in their name

You can use

find . -type f -regextype posix-extended -regex '.*/(20190[0-9]{3})_fl_\1\.nc$'

The regex matches

.*/ - any chars up to the rightmost / (necessary because the pattern used with find requires a full string match)
(20190[0-9]{3}) - Group 1: 20190 followed by any three digits
_fl_ - a fixed substring
\1 - backreference to Group 1 value
\.nc - .nc string
$ - end of input.

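The -regextype posix-extended option is necessary because the pattern uses POSIX ERE syntax (unescaped ( ) for grouping and {3} for repetition); in find's default Emacs-style syntax those characters would be matched literally.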
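The same pattern can be tried in Python, whose re module also supports the \1 backreference; fullmatch stands in for find's whole-path matching. The sample paths are invented for illustration.

```python
import re

# Same pattern as the find command above
pattern = re.compile(r'.*/(20190[0-9]{3})_fl_\1\.nc$')

paths = [
    './data/20190101_fl_20190101.nc',  # both halves identical
    './data/20190101_fl_20190102.nc',  # halves differ
    './20190999_fl_20190999.nc.bak',   # wrong extension
]
for p in paths:
    # fullmatch mirrors find -regex, which must match the entire path
    print(p, bool(pattern.fullmatch(p)))
# ./data/20190101_fl_20190101.nc True
# ./data/20190101_fl_20190102.nc False
# ./20190999_fl_20190999.nc.bak False
```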

Trying to use GNU find to search recursively for filenames only (not directories) containing a string in any portion of the file name

specification:

1. match "rain"
2. in filename
3. only at start of a word
4. case-insensitive

assumptions:

5. define "word" to be a sequence of letters (no punctuation, digits, etc.)
6. paths have the form prefix/name, where prefix can have one or more levels delimited by / and name does not contain /

constraints:

7. find -iregex matches against the entire path (-name only matches the filename)
8. find -iregex must match the entirety of the path (e.g. "c" is only a partial match and does not match the path "a/b/c")

method:

find can also return matches that are not files (e.g. directories). Given definition 6, a regex alone cannot tell whether name is a directory or an ordinary file. To satisfy 2, we can exclude non-files using find's -type f predicate.

We can compare paths found by find against our specification by using find's case-insensitive regex matching predicate (-iregex). The "grep" flavour (-regextype grep) is sufficiently expressive.

Just using 1, a suitable regex is: rain

2+6+7 says we must forbid / after "rain": rain[^/]*$

[/] matches any character in the set (i.e. /)
[^/]: the ^ inverts the set, i.e. any character that is not /
* matches the preceding item zero or more times
$ anchors the preceding match to the end of input

3+5 says there must be no immediately preceding word characters: [^a-z]rain[^/]*$

a-z is a shortcut for the range a to z

8 requires matching the prefix explicitly: ^.*[^a-z]rain[^/]*$

^ outside of [...] anchors the subsequent match to the beginning of input
. matches any character
[^a-z] matches a non-alphabetic character

Final command-line:

find . -type f -regextype grep -iregex '^.*[^a-z]rain[^/]*$'

Note: The leading ^ and trailing $ are not actually required, given 8, and could be elided.
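To sanity-check the final regex without touching the filesystem, the sketch below compiles it with Python's re plus re.IGNORECASE (standing in for -iregex) and applies fullmatch to some invented paths. It assumes the constructs used here (^, ., *, [^...], $) behave the same in Python as in GNU find's grep flavour. find_rain is a hypothetical helper that approximates -type f by looking only at os.walk's list of non-directory entries.

```python
import os
import re

# The final -iregex pattern; IGNORECASE makes [^a-z] reject upper-case letters too
RAIN_RE = re.compile(r'^.*[^a-z]rain[^/]*$', re.IGNORECASE)

def find_rain(top='.'):
    """Yield non-directory paths under top whose name has a word starting with "rain"."""
    for root, dirs, files in os.walk(top):
        for name in files:  # directories are never listed here, roughly like -type f
            path = os.path.join(root, name)
            if RAIN_RE.fullmatch(path):  # must match the entire path, as in constraint 8
                yield path

samples = ['./Rainfall.csv', './data/acid-rain.txt', './brain.txt',
           './rain/notes.txt', './docs/2019_RAIN']
for p in samples:
    print(p, bool(RAIN_RE.fullmatch(p)))
# ./Rainfall.csv True
# ./data/acid-rain.txt True
# ./brain.txt False
# ./rain/notes.txt False
# ./docs/2019_RAIN True
```

./rain/notes.txt is rejected because "rain" there is a directory name, so [^/]* cannot reach the end of the path; ./docs/2019_RAIN is accepted because digits and _ are not word characters under assumption 5.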

exercise for the reader:

extend "word" to non-ASCII characters (eg. UTF-8)