quickest way to select/copy lines containing string from huge txt.gz file
Untested, but likely pretty close to this with GNU Parallel.
First make output directory so as not to overwrite any valuable data:
mkdir -p output
Now declare a function that does one file and export it to subprocesses so jobs started by GNU Parallel can find it:
doit(){
    echo "Processing $1"
    gzip -dc "$1" | awk '
        /^[ST]\|/ || /^#D=/ || /^##/ {next}   # ignore lines starting S|, T| etc
        /^H\|/ {printf ","}                   # prefix "H|" with "," (printf adds no newline)
        /^Q\|/ {printf ",,"}                  # prefix "Q|" with ",,"
        1                                     # print every line that was not skipped
    ' | gzip > output/"$1"
}
export -f doit
Now process all txt.gz files in parallel and show a progress bar too:
parallel --bar doit ::: *txt.gz
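For comparison, the same per-line rules can be sketched in Python with the standard gzip module. This is only an illustration of the filter, not a replacement for the GNU Parallel approach; the names transform and filter_gz are made up for this example.

```python
import gzip
import re

# Lines to drop: those starting with "S|" or "T|", "#D=", or "##".
SKIP = re.compile(r"^(?:[ST]\||#D=|##)")


def transform(line):
    """Return the rewritten line, or None if the line should be dropped."""
    if SKIP.match(line):
        return None
    if line.startswith("H|"):
        return "," + line      # prefix "H|" lines with ","
    if line.startswith("Q|"):
        return ",," + line     # prefix "Q|" lines with ",,"
    return line                # everything else passes through unchanged


def filter_gz(src, dst):
    """Stream src (gzip text) through transform() into dst (gzip text)."""
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for line in fin:
            out = transform(line)
            if out is not None:
                fout.write(out)
```

Because the file is streamed line by line, memory use stays flat regardless of file size.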
How to get few lines from a .gz compressed file without uncompressing
zcat(1) can be supplied by either compress(1) or gzip(1). On your system, it appears to be compress(1): it is looking for a file with a .Z extension.
Switch to gzip -cd in place of zcat and your command should work fine:
gzip -cd CONN.20111109.0057.gz | head
Explanation
-c --stdout --to-stdout
Write output on standard output; keep original files unchanged. If there are several input files, the output consists of a sequence of independently compressed members. To obtain better compression, concatenate all input files before compressing them.
-d --decompress --uncompress
Decompress.
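The same idea, reading only as much of the compressed stream as needed, can be sketched in Python. The helper name gz_head is an assumption made for this example; gzip.open and itertools.islice are standard library.

```python
import gzip
from itertools import islice


def gz_head(path, n=10):
    """Return the first n lines of a gzip-compressed text file."""
    # gzip.open decompresses lazily, so only the start of the file is read.
    with gzip.open(path, "rt") as f:
        return list(islice(f, n))
```

Like piping into head, this stops after n lines instead of decompressing the whole file.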
how to search for a particular string from a .gz file?
gunzip -c mygzfile.gz | grep "string to be searched"
This only works if the .gz file contains plain text, which is true in your case.
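A rough Python equivalent of that pipeline, assuming a plain substring match rather than a grep regular expression, might look like this. The function name gz_grep is hypothetical.

```python
import gzip


def gz_grep(path, needle):
    """Yield the lines of a gzip-compressed text file that contain needle."""
    # errors="replace" keeps the scan going if a few bytes are not valid text.
    with gzip.open(path, "rt", errors="replace") as f:
        for line in f:
            if needle in line:
                yield line.rstrip("\n")
```

Usage would be something like `for hit in gz_grep("mygzfile.gz", "string to be searched"): print(hit)`.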
Read .gz files from a list and print lines
You are passing the complete list of filtered log file names at once, which is why you get the error. Iterate over the list, open the files one by one, and search each file:
import re
import os
import glob
import gzip
from datetime import datetime

# Python 3: input() replaces raw_input()
date_entry = input('Give a date in format YEAR, MONTH, DAY \n')
date = datetime.strptime(re.sub(r"\s+", "", date_entry), "%Y,%m,%d").date()
path = "/applis/tacacs/log/"
# keep only the .gz files last modified on the requested date
list_of_files = [
    file for file in glob.glob(path + '*.gz')
    if date == datetime.fromtimestamp(os.path.getmtime(file)).date()
]
print("Files found: ")
print(list_of_files)
Adresse_IP = input('IP Address \n')
ip_pattern = re.compile(re.escape(Adresse_IP))  # escape so dots match literally
for fname in list_of_files:                # iterate log file names to open them one by one
    with gzip.open(fname, 'rt') as file:   # open a single file in text mode
        for line in file:                  # iterate all lines
            if ip_pattern.search(line):    # search line
                print(line, end='')        # print line if match (it already ends with a newline)
How to delete from a text file, all lines that contain a specific string?
To remove the line and print the output to standard out:
sed '/pattern to match/d' ./infile
To directly modify the file – does not work with BSD sed:
sed -i '/pattern to match/d' ./infile
Same, but for BSD sed (Mac OS X and FreeBSD) – does not work with GNU sed:
sed -i '' '/pattern to match/d' ./infile
To directly modify the file (and create a backup) – works with BSD and GNU sed:
sed -i.bak '/pattern to match/d' ./infile
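If sed is not available, the backup-then-rewrite behaviour of the last command can be approximated in Python. This is a sketch under two assumptions: the match is a plain substring rather than a sed regular expression, and the function name delete_matching_lines is invented for the example.

```python
import os
import shutil


def delete_matching_lines(path, needle, backup_suffix=".bak"):
    """Remove every line containing needle from path, keeping a backup copy."""
    backup = path + backup_suffix
    shutil.copy2(path, backup)       # keep the original, like sed -i.bak
    tmp = path + ".tmp"
    with open(backup) as src, open(tmp, "w") as dst:
        for line in src:
            if needle not in line:   # copy only the lines that do not match
                dst.write(line)
    os.replace(tmp, path)            # swap the filtered file into place
```

Writing to a temporary file and renaming it means the original is never left half-written if the process is interrupted.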
How to read first N lines of a file?
Python 3:
with open("datafile") as myfile:
    head = [next(myfile) for x in range(N)]
print(head)
Python 2:
with open("datafile") as myfile:
    head = [next(myfile) for x in xrange(N)]
print head
Here's another way (both Python 2 & 3):
from itertools import islice
with open("datafile") as myfile:
    head = list(islice(myfile, N))
print(head)
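One practical difference between the two approaches shows up when the file has fewer than N lines: next() raises StopIteration once the file is exhausted, while islice simply returns what is there. A small sketch, with head_next and head_islice as hypothetical wrapper names:

```python
from itertools import islice


def head_next(path, n):
    """First n lines via next(); raises StopIteration on a short file."""
    with open(path) as f:
        return [next(f) for _ in range(n)]


def head_islice(path, n):
    """First n lines via islice(); returns fewer lines on a short file."""
    with open(path) as f:
        return list(islice(f, n))
```

If short files are possible, the islice version is the safer default.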
How to extract a specific text from gz file?
Another using zgrep and positive lookbehind:
$ zgrep -oP "(?<=^[ACTGN]{4})[ACTGN]{6}" foo.gz
TNACGG
CNACCT
Explained:
zgrep: from man zgrep: search possibly compressed files for a regular expression
-o: Print only the matched (non-empty) parts of a matching line
-P: Interpret the pattern as a Perl-compatible regular expression (PCRE).
(?<=^[ACTGN]{4}): positive lookbehind
[ACTGN]{6}: match 6 of the named characters that are preceded by the above
foo.gz: my test file
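The same pattern also works with Python's re module, since the lookbehind has a fixed width. The following is a sketch only; the function name extract_after_prefix is an assumption made for this example.

```python
import gzip
import re

# Six bases that directly follow the first four bases of a line.
PATTERN = re.compile(r"(?<=^[ACTGN]{4})[ACTGN]{6}")


def extract_after_prefix(path):
    """Yield the matched 6-character run from each line of a gzip text file."""
    with gzip.open(path, "rt") as f:
        for line in f:
            m = PATTERN.search(line)
            if m:
                yield m.group(0)   # only the matched part, like grep -o
```

Lines that do not start with at least ten of the listed characters produce no output, just as with zgrep -o.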