How to Find a File Name That Contains a String "Abcd" in the File Content from Multiple Files in a Directory Using Awk

How can I find, using awk, the name of a file whose content contains the string "abcd", searching across multiple files in a directory?

With awk:

$ awk -v search_string="$name" '$0~search_string{print FILENAME; exit}' bookrecords/*

However, grep is the better tool if you're not doing field-based (structural) searching:

$ grep -lF "$name" bookrecords/*
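As a quick sanity check, here is a hypothetical run; the bookrecords/ directory and its file contents are invented for the demo:

```shell
# Build a throwaway bookrecords/ directory (contents invented for the demo).
dir=$(mktemp -d)
mkdir "$dir/bookrecords"
printf 'title: abcd stories\n' > "$dir/bookrecords/a.txt"
printf 'title: other\n'        > "$dir/bookrecords/b.txt"

name=abcd

# awk version: prints the first matching file name, then stops.
awk -v search_string="$name" '$0~search_string{print FILENAME; exit}' "$dir"/bookrecords/*

# grep version: -l lists every matching file name, -F searches literally.
grep -lF "$name" "$dir"/bookrecords/*

rm -r "$dir"
```

Note one behavioral difference: the awk version exits after the first matching file, while grep -l lists every matching file, and -F treats the string literally rather than as a regexp.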

How to find a file containing some string in the body, and print its contents, using an awk command across multiple files?


sfind='abcd' awk '
    BEGIN { sfind = ENVIRON["sfind"] }
    FNR == 1 { secondPass = seen[FILENAME]++ }
    secondPass { print FILENAME, $0; next }
    index($2, sfind) {
        ARGV[ARGC++] = FILENAME
        nextfile    # for efficiency if using GNU gawk
    }
' ./Record/*.txt

The above makes 2 passes over the input files: the first pass identifies those that contain the value of the string stored in sfind in $2 and adds them back onto the end of ARGV[] so they'll be processed again later; the second pass prints the contents of the files identified on the first pass. If you don't want the input file name printed at the start of each output line, just change print FILENAME, $0 to print.

The above will work for any number of matches in any number of files (0, 1, 2, whatever), for any file names, even if they contain spaces, globbing characters, etc., and for any characters in sfind, including backslash escapes and regexp metacharacters like . or *.

The above does partial string matching. Here are your options:

  • Partial string: index($2,sfind) (as shown)
  • Full field string: $2 == sfind
  • Partial regexp: $2 ~ sfind
  • Full field regexp: $2 ~ ("^" sfind "$")

Full word matching gets trickier, depends on your definition of a "word", and can be served by implementation-specific constructs so I'll leave that out unless you need it.
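A hypothetical end-to-end run of the two-pass script; the ./Record layout (whitespace-separated records with the string of interest in field 2) and the file contents are invented for the demo:

```shell
# Invented Record layout: field 2 holds the string being searched for.
dir=$(mktemp -d)
mkdir "$dir/Record"
printf '1 abcd first\n' > "$dir/Record/x.txt"
printf '2 zzzz other\n' > "$dir/Record/y.txt"

# First pass queues x.txt (its $2 contains "abcd") back onto ARGV;
# second pass prints its contents prefixed with the file name.
sfind='abcd' awk '
    BEGIN { sfind = ENVIRON["sfind"] }
    FNR == 1 { secondPass = seen[FILENAME]++ }
    secondPass { print FILENAME, $0; next }
    index($2, sfind) { ARGV[ARGC++] = FILENAME; nextfile }
' "$dir"/Record/*.txt

rm -r "$dir"
```

Only x.txt is re-read and printed on the second pass; y.txt is never queued.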

List a count of similar filenames in a folder

If you want to extract the first 5 characters you can use

ls | cut -c1-5 | sort | uniq -c | awk '{ print $2,$1 }'

which prints for the first example from the question

file1 3
file2 3

If you want to have a different number of characters, change the cut command as necessary, e.g. cut -c1-6 for the first 6 characters.
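The pipeline can be tried out end to end; the file names below follow the question's first example, with the date-time parts invented for the demo:

```shell
# Recreate the question's first example: three file1_* and three file2_*
# CSVs (the date-time suffixes are made up for the demo).
dir=$(mktemp -d); cd "$dir"
touch file1_20230101_120000.csv file1_20230102_120000.csv file1_20230103_120000.csv
touch file2_20230101_120000.csv file2_20230102_120000.csv file2_20230103_120000.csv

# Count files sharing the same first 5 characters.
ls | cut -c1-5 | sort | uniq -c | awk '{ print $2,$1 }'
# file1 3
# file2 3

cd /; rm -r "$dir"
```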

If you want to separate the fields with a TAB character instead of a space, change the awk command to

awk -vOFS=\\t '{ print $2,$1 }'

This would result in

file1	3
file2	3

The following solutions work with the first example in the question (file names containing a date and time string), but not with the additional example added later:

With your first example files, the command

ls | sed 's/_[0-9]\{8\}_[0-9]\{6\}/_*/' | sort | uniq -c

prints

      3 file1_*.csv
      3 file2_*.csv

Explanation:

  • The sed command replaces the sequence of a _, 8 digits, another _ and another 6 digits with _*.

    With your first example file names, you will get file1_*.csv or
    file2_*.csv 3 times each.
  • sort sorts the lines.
  • uniq -c counts the number of subsequent lines that are equal.
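The steps above can be run against the same assumed example files (date-time parts invented for the demo):

```shell
# Same invented example files as before.
dir=$(mktemp -d); cd "$dir"
touch file1_20230101_120000.csv file1_20230102_120000.csv file1_20230103_120000.csv
touch file2_20230101_120000.csv file2_20230102_120000.csv file2_20230103_120000.csv

# Collapse the _YYYYMMDD_HHMMSS part to _* and count the groups.
ls | sed 's/_[0-9]\{8\}_[0-9]\{6\}/_*/' | sort | uniq -c

cd /; rm -r "$dir"
```

This prints a count of 3 each for file1_*.csv and file2_*.csv.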

Or if you want to strip everything from the first _ up to the end, you can use

ls | sed 's/_.*//' | sort | uniq -c

which will print

      3 file1
      3 file2

You can add the awk command from the first solution to change the output format.

Print all file names and append 1 to each name if there is a particular string else append 0

I would do it like this:

word=abc
for f in *
do
    if grep -qswF "$word" "$f"
    then
        label=1
    else
        label=0
    fi
    echo "$f $word $label"
done

grep exits with status 0 if the word is in the file and 1 if it is not, and the if turns that into a 1/0 label. -q ensures that grep does not output anything to stdout. In my example, I also used -s, which suppresses error messages from unreadable files. You don't have to use this, but errors like that will show up in the exit code (usually 2 in such a case). -F ensures that your code will still work if you set the word to something containing characters that are special to grep. (Note that grep's -v option inverts which lines are selected, not the exit code, so it cannot be used to flip the label.)
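A hypothetical end-to-end run; the file names has.txt and not.txt are invented for the demo, and the label is derived from plain grep's exit status (0 when the word is present, 1 when it is absent):

```shell
# Two invented demo files: one contains the word, one doesn't.
dir=$(mktemp -d); cd "$dir"
printf 'abc here\n' > has.txt
printf 'nothing\n'  > not.txt

word=abc
for f in *
do
    # grep exits 0 when the word is present, 1 when it is not.
    if grep -qswF "$word" "$f"; then label=1; else label=0; fi
    echo "$f $word $label"
done
# has.txt abc 1
# not.txt abc 0

cd /; rm -r "$dir"
```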

awk one-liner script to search files under a dir for 2 strings

This is not exactly a one-liner, but you can delete the newlines and the problem is solved :)

for file in * ; do
    awk "/str1/{found=1} /str2/{if (found) print \"$file\"}" "$file"
done

What does it do: for each file in the current directory (a glob is safer than parsing ls output), if str1 appears in it, the script marks that in a variable found:

/str1/{found=1}

then, when str2 appears in a line, it verifies if found is set. If so, prints the file name:

/str2/{
    if (found)
        print "$file"
}

EDIT: there is still a more concise way to solve your problem, using find and xargs:

find . -maxdepth 1 -type f -print0 | \
    xargs -0 -I{} awk '/str1/{found=1} /str2/{if (found) print "{}"}' "{}"

It is safer, too, because it handles files with spaces in their names. Also, you can extend it to search in subdirectories just removing the -maxdepth 1 option. Note that the awk script was not changed.
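A hypothetical run; the file names are invented for the demo, and note that GNU find expects -maxdepth to come before actions such as -print0:

```shell
# Invented demo files: only both.txt contains str1 and then str2.
dir=$(mktemp -d); cd "$dir"
printf 'str1\nstr2\n' > both.txt
printf 'str1 only\n'  > one.txt

# -I{} substitutes the file name both into awk's program text and its argument.
find . -maxdepth 1 -type f -print0 | \
    xargs -0 -I{} awk '/str1/{found=1} /str2/{if (found) print "{}"}' "{}"
# ./both.txt

cd /; rm -r "$dir"
```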

(There is always a good solution using find and xargs, but that solution is always a bit hard to find :D )

HTH!

How to find all files containing specific text (string) on Linux

Do the following:

grep -rnw '/path/to/somewhere/' -e 'pattern'
  • -r or -R makes the search recursive,
  • -n prints the line number, and
  • -w matches only whole words.
  • -l (lower-case L) can be added to print just the names of matching files.
  • -e gives the pattern used during the search.

Along with these, the --exclude, --include and --exclude-dir flags can be used for efficient searching:

  • This will only search through those files which have .c or .h extensions:
grep --include=\*.{c,h} -rnw '/path/to/somewhere/' -e "pattern"
  • This will exclude all files ending with the .o extension from the search:
grep --exclude=\*.o -rnw '/path/to/somewhere/' -e "pattern"
  • For directories it's possible to exclude one or more directories using the --exclude-dir parameter. For example, this will exclude the dirs dir1/, dir2/ and all of them matching *.dst/:
grep --exclude-dir={dir1,dir2,*.dst} -rnw '/path/to/somewhere/' -e "pattern"
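A hypothetical demo of --include filtering; the directory tree and file contents are invented, and the brace expansion is spelled out as two --include flags so it also works in shells without brace expansion:

```shell
# Invented tree: a .c file containing the word, and a .md file that also
# contains it but is filtered out by --include.
dir=$(mktemp -d)
mkdir "$dir/src"
printf 'use pattern here\n' > "$dir/src/a.c"
printf 'pattern\n'          > "$dir/src/notes.md"

# Equivalent to --include=\*.{c,h} under bash; -l prints only file names.
grep --include='*.c' --include='*.h' -rnwl "$dir" -e "pattern"

rm -r "$dir"
```

Only the .c file is reported; notes.md is never searched.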

This works very well for me to achieve almost the same purpose as yours.

For more options, see man grep.


