How to Exclude Directories from Grep -R

How can I exclude directories from grep -R?

SOLUTION 1 (combine find and grep)

This solution is not about grep performance; it aims to be portable, and should also work with BusyBox grep or GNU grep versions older than 2.5.

Use find to exclude the directories foo and bar:

find /dir \( -name foo -prune \) -o \( -name bar -prune \) -o -name "*.sh" -print

Then combine find with a non-recursive grep as a portable solution:

find /dir \( -name node_modules -prune \) -o -name "*.sh" -exec grep --color -Hn "your text to find" {} \; 2>/dev/null
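To try the find-plus-grep combination without touching real data, here is a minimal sketch that builds a scratch tree and runs the same kind of command against it. The directory layout, the file names and the search word needle are all made up for illustration, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Scratch tree with hypothetical names; nothing here comes from a real project.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/node_modules/pkg"
printf 'echo needle\n' > "$demo/src/run.sh"
printf 'echo needle\n' > "$demo/node_modules/pkg/install.sh"
printf 'needle\n' > "$demo/src/notes.txt"

# Prune node_modules, then run a non-recursive grep on each remaining *.sh file.
# -l prints only the names of the files that contain a match.
found=$(find "$demo" \( -name node_modules -prune \) -o -name '*.sh' -exec grep -l 'needle' {} \;)
printf '%s\n' "$found"

rm -rf "$demo"
```

Only src/run.sh should be listed: install.sh sits under the pruned directory, and notes.txt does not match the *.sh name filter.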

SOLUTION 2 (using the --exclude-dir option of grep):

You probably know this solution already, but I include it because it is the most recent and efficient one. Note that it is less portable but more human-readable.

grep -R --exclude-dir=node_modules 'some pattern' /path/to/search

To exclude multiple directories, pass --exclude-dir a brace list (which a shell such as bash expands into one option per directory):

--exclude-dir={node_modules,dir1,dir2,dir3}
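The brace list is only shorthand for several separate --exclude-dir options. A small sketch, assuming a grep that supports --exclude-dir and using an invented scratch tree, spells the options out one by one so it also runs in shells without brace expansion:

```shell
#!/bin/sh
# Hypothetical scratch tree: three directories, each holding one matching file.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/node_modules" "$demo/dir1"
echo 'some pattern' > "$demo/src/a.txt"
echo 'some pattern' > "$demo/node_modules/b.txt"
echo 'some pattern' > "$demo/dir1/c.txt"

# Written out in full; bash would produce the same two options
# from --exclude-dir={node_modules,dir1}.
matches=$(grep -R -l --exclude-dir=node_modules --exclude-dir=dir1 'some pattern' "$demo")
printf '%s\n' "$matches"

rm -rf "$demo"
```

Only the file under src should be reported, since the other two directories are skipped.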

SOLUTION 3 (Ag)

If you frequently search through code, Ag (The Silver Searcher) is a much faster alternative to grep that is tailored to searching code. For instance, it automatically ignores files and directories listed in .gitignore, so you don't have to keep passing the same cumbersome exclude options to grep or find.

Excluding directories with grep

This works well:

grep -rio --exclude-dir={ece,pytorch,sys,proc} 'hello' /

Note: This also excludes any other directories with the same name, wherever they appear in the tree.

Explanation:

The grep man page says:

   --exclude-dir=GLOB
Skip any command-line directory with a name suffix that matches the pattern GLOB. When
searching recursively, skip any subdirectory whose base name matches GLOB. Ignore any
redundant trailing slashes in GLOB.

This means the pattern (GLOB) is applied only to the name of the directory itself, and since a directory name cannot contain /, a pattern like /proc will never match.

Hence, use --exclude-dir=proc or --exclude-dir=sys (or --exclude-dir={proc,sys}): just the names of the directories to exclude, without '/'.
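The base-name rule can be checked on a scratch tree. This is a rough sketch that assumes GNU grep; the tree is invented and merely reuses the name proc from the text. The second command is there for comparison only: given the man page excerpt, a pattern containing a slash is expected to skip nothing, but other grep implementations may behave differently.

```shell
#!/bin/sh
# Hypothetical tree with two directories named "proc" at different depths.
demo=$(mktemp -d)
mkdir -p "$demo/proc" "$demo/home/user/proc" "$demo/home/user/docs"
echo hello > "$demo/proc/x"
echo hello > "$demo/home/user/proc/y"
echo hello > "$demo/home/user/docs/z"

# Bare name: every directory called "proc" is skipped, at any depth.
by_name=$(grep -rl --exclude-dir=proc 'hello' "$demo" | sort)
printf 'bare name:\n%s\n' "$by_name"

# Pattern with a slash: compared against base names it cannot match,
# so (with GNU grep) no directory is expected to be skipped.
with_slash=$(grep -rl --exclude-dir=/proc 'hello' "$demo" | sort)
printf 'with slash:\n%s\n' "$with_slash"

rm -rf "$demo"
```

The first listing also illustrates the earlier note: both proc directories disappear, not just the top-level one.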

How to exclude a directory in a recursive search using grep?

Are you looking for this?

From the grep man page:

--exclude-dir=DIR
Exclude directories matching the pattern DIR from recursive searches.

grep how to exclude sub directory

Using find you can exclude a whole path with slashes in it:

find . -path ./application/res -prune -o -type f -exec grep -l super {} +

Although it is more portable, this will be slower than grep -r. But as far as I know, GNU grep doesn't provide a mechanism for excluding whole paths.
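A short sketch of the path-based exclusion, run on an invented tree that contains two directories named res. The layout and file names are assumptions for illustration; the point is that -path matches the whole path as find prints it, so only ./application/res is pruned.

```shell
#!/bin/sh
# Hypothetical tree: two "res" directories, only one of which should be skipped.
demo=$(mktemp -d)
mkdir -p "$demo/application/res" "$demo/application/src" "$demo/other/res"
echo super > "$demo/application/res/skip.txt"
echo super > "$demo/application/src/keep.txt"
echo super > "$demo/other/res/also_keep.txt"

# Run from inside the tree so that paths start with "./", as in the command above.
paths=$(cd "$demo" && find . -path ./application/res -prune -o -type f -exec grep -l super {} + | sort)
printf '%s\n' "$paths"

rm -rf "$demo"
```

Unlike --exclude-dir=res, this keeps other/res in the search, because the exclusion is tied to one specific path rather than to a directory name.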

exclude dir option in grep does not work as expected

The --exclude-dir flag of GNU grep takes a glob expression as an argument. When you list more than one item inside braces, the argument becomes a brace expansion sequence, which is expanded by the shell before grep ever sees it.

The expansion works on comma-separated words and does not tolerate spaces between them. So the option should have been written as

--exclude-dir={folder1,folder2}

You can see this as a simple brace expansion in your shell by running

echo {a,b}    # prints 'a b'
echo {a, b}   # prints '{a, b}': no expansion by the shell
echo --exclude-dir={folder1, folder2}   # prints '--exclude-dir={folder1, folder2}' unchanged

So your original command is effectively

grep -r '--exclude-dir={folder1,' 'folder2}' foo

That is, --exclude-dir receives the unexpanded string {folder1, as its value, folder2} becomes the pattern to search for, and foo is left over as an extra argument that grep treats as a file or directory to search rather than as the pattern. The command therefore does not do what was intended.

Remember that brace expansion is a feature of the shell (e.g. bash), not of grep. In shells that don't support it, directories placed between {..} are passed literally and the option will not work as intended.
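To see exactly which words reach grep, you can substitute a small helper that prints its arguments. The show_args function below is a hypothetical stand-in, not part of grep or any shell; the sketch shows the broken form with the space, then the fully spelled-out form that works in any POSIX shell.

```shell
#!/bin/sh
# Hypothetical helper: print each argument on its own line, wrapped in brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# With a space after the comma nothing is expanded, and the shell
# splits the text into three separate words.
broken=$(show_args --exclude-dir={folder1, folder2} foo)
printf 'broken form:\n%s\n' "$broken"

# Portable spelling: one --exclude-dir per directory, no braces needed.
portable=$(show_args --exclude-dir=folder1 --exclude-dir=folder2 foo)
printf 'portable form:\n%s\n' "$portable"
```

The broken form yields [--exclude-dir={folder1,], [folder2}] and [foo], which matches the command shown above; the portable form passes two clean options followed by foo.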

How to exclude multiple directories that match a glob pattern in grep -R ?

You could use:

grep -R hill *-2013

Using --exclude-dir should work, too:

grep -R hill --exclude-dir="*-201[0-2]" .

Without the quotes, the shell could expand the asterisk before grep sees it. Note that --exclude-dir takes a shell-style glob, not a regular expression, so the wildcard is * rather than .*:

* - matches any sequence of characters, including none
[0-2] - matches a single character in the range 0 to 2
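Since --exclude-dir is documented as taking a GLOB (see the man page excerpt quoted earlier), a glob such as *-201[0-2] is one way to skip a family of directories. A minimal sketch on an invented tree, where the logs-YEAR directory names are assumptions chosen to resemble the *-2013 example:

```shell
#!/bin/sh
# Hypothetical tree: one directory per year, each with a matching file.
demo=$(mktemp -d)
for year in 2010 2011 2012 2013; do
    mkdir -p "$demo/logs-$year"
    echo hill > "$demo/logs-$year/data.txt"
done

# The quotes make sure the glob reaches grep untouched by the shell.
# Directories ending in -2010, -2011 or -2012 are skipped.
kept=$(grep -R -l --exclude-dir='*-201[0-2]' hill "$demo")
printf '%s\n' "$kept"

rm -rf "$demo"
```

Only the file under logs-2013 should be listed.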

