Find String Inside a Gzipped File in a Folder

find string inside a gzipped file in a folder

zgrep searches inside gzipped files. Depending on the implementation, it also accepts grep's -R (recursive) option and -H (print the file name) option:

zgrep -R --include='*.gz' -H "pattern match" .

OS-specific commands, since not all arguments work on every platform:

Mac 10.5+: zgrep -R --include=\*.gz -H "pattern match" .

Ubuntu 16+: zgrep -i -H "pattern match" *.gz
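Because the zgrep options differ between platforms, a small script can be a portable fallback. The following is a minimal sketch, not a replacement for zgrep: the function name search_gz_tree is made up for illustration, it does plain substring matching rather than regular expressions, and it assumes the files are UTF-8 text.

```python
import gzip
from pathlib import Path


def search_gz_tree(root, needle, ignore_case=False):
    """Yield (path, line_number, line) for each matching line under root."""
    if ignore_case:
        needle = needle.lower()
    # rglob walks subdirectories, like a recursive grep restricted to *.gz
    for path in sorted(Path(root).rglob("*.gz")):
        # "rt" decodes the decompressed bytes as text; errors="replace"
        # keeps the scan going if a file is not valid UTF-8
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                haystack = line.lower() if ignore_case else line
                if needle in haystack:
                    yield path, number, line.rstrip("\n")
```

Looping over search_gz_tree(".", "pattern match") and printing each path, line number and line gives output roughly comparable to zgrep -H -n.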

how to search for a particular string from a .gz file?


gunzip -c mygzfile.gz | grep "string to be searched"

This only works if the .gz file contains a text file, which is true in your case.

Recursive grep for gz files search string from an output string

You can pipe the find results through a second grep:

find . -name "*.gz" -exec zgrep -H "PATTERN1" {} \; | grep "PATTERN2"
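One thing to keep in mind with the pipeline above: because -H prefixes each line with the file name, the second grep can also match on the file name. If both patterns should apply to the line text only, something like the sketch below may help. The name grep_both is hypothetical, and the patterns are Python regular expressions, which are close to but not identical to grep's syntax.

```python
import gzip
import re
from pathlib import Path


def grep_both(root, first, second):
    """Yield (path, line) where the line matches both regular expressions."""
    first_re = re.compile(first)
    second_re = re.compile(second)
    for path in sorted(Path(root).rglob("*.gz")):
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                # Both tests see only the line text, never the file name
                if first_re.search(line) and second_re.search(line):
                    yield path, line.rstrip("\n")
```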

How to search for a string in .gz file?

The statement

    if 'Alas!':

merely checks whether the string value 'Alas!' is "truthy" (it is, since any non-empty string is truthy); you want to check whether the variable line contains this substring:

    if 'Alas!' in line:

Another problem is that you are opening the output file multiple times, overwriting any results from previous input files. You want to open it only once, at the beginning (or open for appending; but repeatedly opening and closing the same file is unnecessary and inefficient).

A better design altogether might be to simply print to standard output, and let the user redirect the output to a file if they like. (Also, probably accept the input files as command-line arguments, rather than hardcoding a fugly complex relative path.)

A third problem is that the input line already contains a newline, but print() will add another. Either strip the newline before printing, or tell print not to supply another (or switch to write which doesn't add one).

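import gzip
import glob

# Open the output once so results from every input file are kept
with open('file1.txt', 'w') as o:
    for file in glob.glob('myfiles/all*/input.gz'):
        # 'rt' reads the decompressed data as text
        with gzip.open(file, 'rt') as f:
            for line in f:
                if 'Alas!' in line:
                    # line already ends with a newline
                    print(line, file=o, end='')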

Demo: https://ideone.com/rTXBSS
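The alternative design mentioned above (write to standard output, take the input files from the caller) could look roughly like this. It is only a sketch: the function name grep_gz and its out parameter are illustrative, not part of the original answer.

```python
import gzip
import sys


def grep_gz(paths, needle, out=None):
    """Write every line containing needle from the gzipped files to out."""
    if out is None:
        out = sys.stdout
    count = 0
    for path in paths:
        with gzip.open(path, "rt") as handle:
            for line in handle:
                if needle in line:
                    # write() adds no newline; the line keeps its own
                    out.write(line)
                    count += 1
    return count
```

A script that calls grep_gz(sys.argv[2:], sys.argv[1]) can then be run with the search string and file names as arguments, and the user redirects the output with > if they want a file.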

Iterate over *.gz files and return where contents do NOT contain string

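Add the -v flag to invert the match, which prints every line that does not contain the string (to list only the names of files with no match at all, grep's -L option is the usual choice, where your zgrep supports it):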

find . -name \*.gz -print0 | xargs -0 zgrep -v "STRING"
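The -v pipeline prints the non-matching lines. If the goal is the list of files that never contain the string, a short script is one way to get it. This is a minimal sketch under a few assumptions: files_without is a made-up name, matching is by plain substring, and the files are treated as UTF-8 text.

```python
import gzip
from pathlib import Path


def files_without(root, needle):
    """Return the *.gz paths under root in which no line contains needle."""
    missing = []
    for path in sorted(Path(root).rglob("*.gz")):
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            # any() stops reading at the first line that contains needle
            if not any(needle in line for line in handle):
                missing.append(path)
    return missing
```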

Search a String in folder containing zips of text files

You can use zgrep, which has the same semantics as grep, but can search within compressed files:

$ zgrep -Ril "My_Name"

Unix script to search within a compressed .gz file

The essence of how to accomplish this is to get the names of the files within the tarball to search over, and extract their content to be searched, while not extracting anything else. Because we don't want to write to the file system, we can use the -O flag to instead extract to standard-out.

tar -tzf file.tar.gz | grep '\.txt$' | xargs tar -Oxzf file.tar.gz | grep -B 3 "string-or-regex"

This concatenates all of the files in the .tar.gz with names ending in ".txt" and greps them for the given string, also outputting the 3 previous lines. It won't tell you which file in the tarball any match came from, and the "three previous lines" may in fact come from the previous file.

You can instead do:

for file in $(tar -tzf file.tar.gz | grep '\.txt$'); do
    tar -Oxzf file.tar.gz "$file" | grep -B 3 --label="$file" -H "string-or-regex"
done

which will respect file boundaries, and report the file names, but be much less efficient.

(-z tells tar the archive is gzip-compressed. -t lists the contents. -x extracts. -O writes to standard output rather than the file system. Older tars may not have the -O or -z flag, and will want the flags without the leading -, e.g. tar tzf file.tar.gz)
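If Python is available, its tarfile module can read the members without unpacking them to disk, and it opens the archive only once. The following is a rough sketch, not a drop-in equivalent of the shell loop: grep_tar and its parameters are illustrative names, matching is by substring rather than regex, and members are assumed to be UTF-8 text.

```python
import io
import tarfile
from collections import deque


def grep_tar(archive, needle, suffix=".txt", before=3):
    """Yield (member_name, context_lines) for each match in the archive."""
    # "r:gz" reads a gzip-compressed tarball without unpacking it to disk
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith(suffix)):
                continue
            raw = tar.extractfile(member)
            text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace")
            # A fresh buffer per member keeps context inside one file
            recent = deque(maxlen=before)
            for line in text:
                line = line.rstrip("\n")
                if needle in line:
                    yield member.name, list(recent) + [line]
                recent.append(line)
```

Each result carries the member name, so matches can be attributed to a file, and the preceding lines never spill over from the previous member.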

Okay, so you have an unusable grep. We can fix that with awk!

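#!/usr/bin/awk -f
# Print each matching line together with the lines just before it.
# context is the total number of lines printed per match, including the match.
BEGIN { context=3; }
{ add_buffer($0) }
/pattern/ { print_buffer() }

# Store the current line in a ring buffer indexed by line number.
function add_buffer(line)
{
    buffer[NR % context]=line
}

# Print the last `context` lines, oldest first.
function print_buffer()
{
    for(i = max(1, NR-context+1); i <= NR; i++) {
        print buffer[i % context]
    }
}

function max(a,b)
{
    if (a > b) { return a } else { return b }
}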

This will not coalesce adjacent matches, unlike grep -B, and can thus repeat lines that are within 3 lines of two different matches.
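For comparison, here is a sketch of the same before-context idea in Python that does avoid repeated lines, by buffering only lines that have not been emitted yet. The name with_context is hypothetical, and matching is by plain substring rather than a regex.

```python
from collections import deque


def with_context(lines, needle, before=3):
    """Yield (line_number, line) for each match plus up to `before`
    preceding lines, emitting every line at most once."""
    recent = deque(maxlen=before)  # recent lines that were not emitted yet
    for number, line in enumerate(lines, start=1):
        if needle in line:
            # Flush the buffered context, then the match itself
            yield from recent
            recent.clear()
            yield number, line
        else:
            recent.append((number, line))
```

Because the buffer is cleared whenever it is flushed, two matches that are close together share their context instead of printing it twice.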

grep several strings from gz file

Use zgrep to search inside compressed files. There are equivalent commands for other compression formats, such as bzgrep (for bzip2 files) and xzgrep (for xz files).

zgrep -f match_strings.txt file.gz

-f is the flag for reading the patterns from a specified file.
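The same idea can be sketched in Python for cases where zgrep is not available. Two assumptions to note: load_patterns and grep_any are made-up names, and each pattern is treated as a literal substring rather than a regular expression. Blank lines in the pattern file are skipped here.

```python
import gzip


def load_patterns(pattern_file):
    """Read one pattern per line, skipping blank lines."""
    with open(pattern_file, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def grep_any(gz_path, patterns):
    """Yield each line of gz_path that contains at least one pattern."""
    with gzip.open(gz_path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if any(pattern in line for pattern in patterns):
                yield line.rstrip("\n")
```

For example, grep_any("file.gz", load_patterns("match_strings.txt")) yields the lines that contain any of the listed strings.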


