How to find a file name that contains a string abcd in the file content from multiple files in the directory using awk
With awk (this treats the string as a regular expression and stops after the first matching file):
$ awk -v search_string="$name" '$0~search_string{print FILENAME; exit}' bookrecords/*
However, I think grep is better if you're not structurally searching:
$ grep -lF "$name" bookrecords/*
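If you want every matching file rather than only the first, here is a minimal sketch. The temporary directory, file names and contents are made up for illustration. It uses index() so the string is matched literally, as grep -F does, and it passes the string through the environment so backslashes are not interpreted.

```shell
# Hypothetical sample data in a throwaway directory.
dir=$(mktemp -d)
printf 'title: abcd\n'   > "$dir/a.txt"
printf 'title: wxyz\n'   > "$dir/b.txt"
printf 'x\nabcd again\n' > "$dir/c.txt"

name='abcd'

# awk: print each matching file name once; the flag is reset for every new file.
awk_hits=$(name="$name" awk '
    FNR == 1 { hit = 0 }
    !hit && index($0, ENVIRON["name"]) { print FILENAME; hit = 1 }
' "$dir"/*)

# grep: -l lists matching files, -F treats the pattern as a fixed string.
grep_hits=$(grep -lF -- "$name" "$dir"/*)

printf '%s\n' "$awk_hits"
rm -r "$dir"
```

Both commands should list the same files; the grep form is shorter and stops reading each file at its first match.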
How to find a file and delete containing some string in the body using awk command from multiple files?
sfind='abcd' awk '
BEGIN { sfind = ENVIRON["sfind"] }
FNR == 1 { secondPass = seen[FILENAME]++ }
secondPass { print FILENAME, $0; next }
index($2,sfind) {
ARGV[ARGC++] = FILENAME
nextfile # for efficiency if using GNU gawk.
}
' ./Record/*.txt
The above makes 2 passes of the input files: the first pass to identify those that contain the value of the string stored in sfind in $2 and add them back onto the end of ARGV[] so they'll be processed again later, the second to print the contents of those files identified on the first pass. If you don't want the input file name printed at the start of each output line then just change print FILENAME, $0 to print.
The above will work for any number of matches in any number of files (0, 1, 2, whatever), for any file names, even if they contain spaces, globbing characters, etc., and for any characters in sfind including backslash escapes and regexp metacharacters like . or *.
The above does partial string matching. Here are your options:
- Partial string: index($2,sfind) (as shown)
- Full field string: $2 == sfind
- Partial regexp: $2 ~ sfind
- Full field regexp: $2 ~ ("^" sfind "$")
Full word matching gets trickier, depends on your definition of a "word", and can be served by implementation-specific constructs so I'll leave that out unless you need it.
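To see how the four matching options differ, here is a small sketch. The three input records and the search value a.cd are invented for illustration; the dot is there on purpose so the string and regexp tests disagree.

```shell
# Each input line is "<record id> <field to test>"; sfind contains a regexp metacharacter.
result=$(printf '%s\n' 'r1 abcd' 'r2 xabcdx' 'r3 a.cd' |
sfind='a.cd' awk '
    BEGIN { sfind = ENVIRON["sfind"] }
    {
        out = $1 ":"
        if (index($2, sfind))     out = out " partial-string"
        if ($2 == sfind)          out = out " full-string"
        if ($2 ~ sfind)           out = out " partial-regexp"
        if ($2 ~ ("^" sfind "$")) out = out " full-regexp"
        print out
    }')
printf '%s\n' "$result"
```

Only r3 passes the string tests, because the dot is compared literally there, while the regexp tests also accept abcd since the dot matches any character.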
List a count of similar filenames in a folder
If you want to extract the first 5 characters you can use
ls | cut -c1-5 | sort | uniq -c |awk '{ print $2,$1 }'
which prints for the first example from the question
file1 3
file2 3
If you want to have a different number of characters, change the cut command as necessary, e.g. cut -c1-6 for the first 6 characters.
If you want to separate the fields with a TAB character instead of a space, change the awk command to
awk -vOFS=\\t '{ print $2,$1 }'
This would result in
file1 3
file2 3
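A quick way to try this out is sketched below. The file names are hypothetical, shaped like a prefix followed by a date and time, and are created empty in a temporary directory.

```shell
dir=$(mktemp -d)
# Hypothetical names: <prefix>_<date>_<time>.csv
for stamp in 20240101_010101 20240102_020202 20240103_030303; do
    : > "$dir/file1_$stamp.csv"
    : > "$dir/file2_$stamp.csv"
done

# Count names by their first 5 characters and print "prefix<TAB>count".
counts=$(cd "$dir" && ls | cut -c1-5 | sort | uniq -c | awk -v OFS='\t' '{ print $2, $1 }')
printf '%s\n' "$counts"
rm -r "$dir"
```

Keep in mind that parsing ls output only works for names without newlines, which is fine for names like these.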
Other solutions that work with the first example that shows file names with a date and time string, but don't work with the additional example added later:
With your first example files, the command
ls | sed 's/_[0-9]\{8\}_[0-9]\{6\}/_*/' | sort | uniq -c
prints
3 file1_*.csv
3 file2_*.csv
Explanation:
- The sed command replaces the sequence of a _, 8 digits, another _ and another 6 digits with _*. With your first example file names, you will get file1_*.csv or file2_*.csv 3 times each.
- sort sorts the lines.
- uniq -c counts the number of consecutive lines that are equal.
Or if you want to strip everything from the first _ up to the end, you can use
ls | sed 's/_.*//' | sort | uniq -c
which will print
3 file1
3 file2
You can add the awk command from the first solution to change the output format.
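Here is a sketch of both sed variants with the awk reformatting step appended. The names are fed in with printf instead of ls so it runs anywhere, and they are invented examples.

```shell
# Hypothetical file names, one per line.
names='file1_20240101_010101.csv
file1_20240102_020202.csv
file2_20240101_010101.csv'

# Replace "_<8 digits>_<6 digits>" with "_*", then count identical lines.
by_pattern=$(printf '%s\n' "$names" |
    sed 's/_[0-9]\{8\}_[0-9]\{6\}/_*/' | sort | uniq -c | awk '{ print $2, $1 }')

# Strip everything from the first "_" onward, then count.
by_prefix=$(printf '%s\n' "$names" |
    sed 's/_.*//' | sort | uniq -c | awk '{ print $2, $1 }')

printf '%s\n' "$by_pattern" "$by_prefix"
```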
Print all file names and append 1 to each name if there is a particular string else append 0
I would do it like this:
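word=abc
for f in *
do
if grep -qswF -- "$word" "$f"
then label=1
else label=0
fi
echo "$f $word $label"
done
grep exits with 0 if the word is in the file and with 1 if it is not, so the if statement turns that into the label 1 or 0. -q ensures that grep does not output anything to stdout. In my example, I also used -s, which suppresses error messages from unreadable files. You don't have to use this, and such errors still show up in the exit code (usually 2 in such a case), which gives the label 0 here. -w matches whole words only. -F ensures that your code will still work if you set the word to something containing special characters for grep. Do not use -v to flip the result: it inverts which lines match, not the exit code, so a file that has any line without the word would count as a match.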
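The same labelling can be wrapped in a small function that tests grep's exit status directly. This is only a sketch: label_file is a made-up helper name, and the two sample files are created just for the demonstration.

```shell
# label_file WORD FILE: print 1 if FILE contains WORD as a whole word, else 0.
label_file() {
    if grep -qswF -- "$1" "$2"; then
        echo 1
    else
        echo 0      # no match, or the file could not be read
    fi
}

dir=$(mktemp -d)
printf 'abc def\n' > "$dir/has_word"
printf 'abcdef\n'  > "$dir/no_word"   # abc is only part of a longer word here

with=$(label_file abc "$dir/has_word")
without=$(label_file abc "$dir/no_word")
echo "$with $without"
rm -r "$dir"
```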
awk one-liner script to search files under a dir for 2 strings
This is not exactly a one-liner but you can delete the newlines and the problem is solved :)
for file in $(ls) ; do
awk "/str1/{found=1}/str2/{if(found) print \"$file\"}" $file
done
What does it do: for each file listed by ls, if str1 appears in it, the script marks it in a variable found:
/str1/{found=1}
Then, when str2 appears in a line, it verifies if found is set. If so, it prints the file name:
/str2/{
if (found)
print "$file"
}
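EDIT: there is still a more concise way to solve your problem, using find and xargs:
find . -maxdepth 1 -type f -print0 | \
xargs -0 -I{} awk '/str1/{found=1}/str2/{if(found) print FILENAME}' {}
It is safer, too, because it handles files with spaces in their names. Also, you can extend it to search in subdirectories just by removing the -maxdepth 1 option. Note that the awk script here prints awk's built-in FILENAME variable rather than having the file name pasted into the script text, so names containing quotes or backslashes are safe as well. -maxdepth comes before -print0 because GNU find warns when it follows other expressions, and -type f keeps directories out of awk's input.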
(There always is a good solution using find and xargs but this solution is always a bit hard to find :D )
HTH!
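As a variation, here is a sketch that lets find pass many files to a single awk run with -exec ... {} +. It resets the flag at the start of each file and prints each name at most once. The sample files are invented, and -maxdepth assumes GNU or BSD find.

```shell
dir=$(mktemp -d)
printf 'str1\nstr2\n' > "$dir/both in order.txt"   # name with spaces on purpose
printf 'str2\nstr1\n' > "$dir/reversed.txt"
printf 'str1\n'       > "$dir/only1.txt"

found_files=$(find "$dir" -maxdepth 1 -type f -exec awk '
    FNR == 1 { found = 0; printed = 0 }    # reset state at the start of each file
    /str1/   { found = 1 }
    /str2/ && found && !printed { print FILENAME; printed = 1 }
' {} +)

printf '%s\n' "$found_files"
rm -r "$dir"
```

Only the file where str1 appears before (or on the same line as) str2 is reported, which matches the behaviour of the script above.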
How to find all files containing specific text (string) on Linux
Do the following:
grep -rnw '/path/to/somewhere/' -e 'pattern'
- -r or -R is recursive,
- -n is line number, and
- -w stands for match the whole word.
- -l (lower-case L) can be added to just give the file name of matching files.
- -e is the pattern used during the search
Along with these, the --exclude, --include and --exclude-dir flags could be used for efficient searching:
- This will only search through those files which have .c or .h extensions:
grep --include=\*.{c,h} -rnw '/path/to/somewhere/' -e "pattern"
- This will exclude searching all the files ending with .o extension:
grep --exclude=\*.o -rnw '/path/to/somewhere/' -e "pattern"
- For directories it's possible to exclude one or more directories using the --exclude-dir parameter. For example, this will exclude the dirs dir1/, dir2/ and all of them matching *.dst/:
grep --exclude-dir={dir1,dir2,*.dst} -rnw '/path/to/somewhere/' -e "pattern"
This works very well for me, to achieve almost the same purpose as yours.
For more options, see man grep.
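A self-contained sketch of these flags working together is shown below. The directory layout and file contents are invented, and it assumes a grep that supports --include and --exclude-dir, such as GNU grep.

```shell
dir=$(mktemp -d)
mkdir -p "$dir/src" "$dir/build"
printf 'int pattern = 1;\n' > "$dir/src/main.c"
printf 'patterns only\n'    > "$dir/src/util.c"    # not a whole-word match
printf 'pattern\n'          > "$dir/build/main.o"  # wrong extension, excluded dir

# -l prints only file names; the quoted globs are matched by grep, not the shell.
matches=$(grep -rlw --include='*.c' --include='*.h' --exclude-dir=build -e 'pattern' "$dir")

printf '%s\n' "$matches"
rm -r "$dir"
```

Repeating --include once per extension avoids relying on the shell's brace expansion.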