Finding Human-Readable Files on Unix

Finding human-readable files on Unix

find and file are your friends here:

find /dir/to/search -type f -exec sh -c 'file -b {} | grep text &>/dev/null' \; -print

This will find any files (NOTE: it will not find symlinks directories sockets, etc., only regular files) in /dir/to/search and run sh -c 'file -b {} | grep text &>/dev/null' ; which looks at the type of file and looks for text in the description. If this returns true (i.e., text is in the line) then it prints the filename.

NOTE: using the -b flag to file means that the filename is not printed and therefore cannot create any issues with the grep. E.g., without the -b flag the binary file gettext would erroneously be detected as a textfile.

For example,

root@osdevel-pete# find /bin -exec sh -c 'file -b {} |  grep text &>/dev/null' \; -print
/bin/gunzip
/bin/svnshell.sh
/bin/unicode_stop
/bin/unicode_start
/bin/zcat
/bin/redhat_lsb_init
root@osdevel-pete# find /bin -type f -name *text*
/bin/gettext

If you want to look in compressed files use the --uncompress flag to file. For more information and flags to file see man file.

Viewing all directories that are world readable

You can use find -perm like this:

find /base/path -type d -perm +o+r

+o+r will only list directories with word (others) read bit on.

Linux command: How to 'find' only text files?

I know this is an old thread, but I stumbled across it and thought I'd share my method which I have found to be a very fast way to use find to find only non-binary files:

find . -type f -exec grep -Iq . {} \; -print

The -I option to grep tells it to immediately ignore binary files and the . option along with the -q will make it immediately match text files so it goes very fast. You can change the -print to a -print0 for piping into an xargs -0 or something if you are concerned about spaces (thanks for the tip, @lucas.werkmeister!)

Also the first dot is only necessary for certain BSD versions of find such as on OS X, but it doesn't hurt anything just having it there all the time if you want to put this in an alias or something.

EDIT: As @ruslan correctly pointed out, the -and can be omitted since it is implied.

Recursively find files that are not publicly readable

Use the find command:

find . ! -perm -o=r

Will search for files within the current directory and subdirectories that has a file permission so that the "others" group cannot read that file.

The manual page for find gives some examples of these options.

You can run this command as the www-data user:

find . ! -readable

To find all files that are NOT readable by the web server.

How to search for a text in specific files in unix

It might be better to use find, since grep's include/exclude can get a bit confusing:

find -type f -name "*.xml" -exec grep -l 'hello' {} +

This looks for files whose name finishes with .xml and performs a grep 'hello' on them. With -l (L) we make the file name to be printed, without the matched line.

Explanation

  • find -type f this finds files in the given directory structure.
  • -name "*.xml" selects those files whose name finishes with .xml.
  • -exec execute a command on every result of the find command.
  • -exec grep -l 'hello' {} + execute grep -l 'hello' on the given file. With {} + we are refering to the matched name (it is like doing grep 'hello' file but refering to the name of the file provided by the find command). Also, grep -l (L) returns the file name, not the match itself.

How to find all files containing specific text (string) on Linux?

Do the following:

grep -rnw '/path/to/somewhere/' -e 'pattern'
  • -r or -R is recursive,
  • -n is line number, and
  • -w stands for match the whole word.
  • -l (lower-case L) can be added to just give the file name of matching files.
  • -e is the pattern used during the search

Along with these, --exclude, --include, --exclude-dir flags could be used for efficient searching:

  • This will only search through those files which have .c or .h extensions:

    grep --include=\*.{c,h} -rnw '/path/to/somewhere/' -e "pattern"
  • This will exclude searching all the files ending with .o extension:

    grep --exclude=\*.o -rnw '/path/to/somewhere/' -e "pattern"
  • For directories it's possible to exclude one or more directories using the --exclude-dir parameter. For example, this will exclude the dirs dir1/, dir2/ and all of them matching *.dst/:

    grep --exclude-dir={dir1,dir2,*.dst} -rnw '/path/to/search/' -e "pattern"

This works very well for me, to achieve almost the same purpose like yours.

For more options, see man grep.



Related Topics



Leave a reply



Submit