How to use awk for a compressed file
You need to feed awk the decompressed streams, e.g. via process substitution:
awk '{ ... }' <(gzip -dc input1.vcf.gz) <(gzip -dc input2.vcf.gz)
Try this:
awk 'FNR==NR { sub(/AA=\.;/,""); array[$1,$2]=$8; next } ($1,$2) in array { print $0 ";" array[$1,$2] }' <(gzip -dc input1.vcf.gz) <(gzip -dc input2.vcf.gz) | gzip > output.vcf.gz
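To see what the FNR==NR idiom does, here is a minimal sketch on two tiny uncompressed stand-ins for the VCFs (the file names and data are made up):

```shell
# Two tiny sample files standing in for the VCFs (hypothetical data).
printf 'chr1\t100\tx\tx\tx\tx\tx\tAA=.;DP=10\n' > a.tsv
printf 'chr1\t100\tx\tx\tx\tx\tx\tQ=30\n' > b.tsv

# FNR==NR is true only while reading the first file: store column 8
# keyed by (chrom, pos). For the second file, append the stored INFO
# field to each line whose (chrom, pos) was seen in the first file.
awk 'FNR==NR { sub(/AA=\.;/, ""); a[$1,$2] = $8; next }
     ($1,$2) in a { print $0 ";" a[$1,$2] }' a.tsv b.tsv
```

With compressed inputs you would swap `a.tsv b.tsv` for the two `<(gzip -dc ...)` substitutions shown above.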
Use awk on zipped files found by a find command
Find all files in the current directory, recursively, whose names start with GAUR and end with .zip; read the results line by line; create a matching directory under ./gaur; unzip each file to stdout and pipe it into awk, which prints the 2nd and 3rd columns into a file at ./gaur/&lt;original file path&gt; (sed strips the .zip extension from the file name).
find . -name 'GAUR*.zip' | while IFS= read -r line ; do mkdir -p "gaur/$(dirname "$line")" && unzip -p "$line" | awk -F'|' '{ print $2 "," $3 }' > "gaur/$(echo "$line" | sed 's/\.zip$//')" ; done
You have to unzip the file first; only then can you run awk on it. So I made this ugly one-liner to do it. But it is hard to modify, so I would use a regular shell script for this.
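Unrolled into a plain script, the same steps are much easier to modify; this sketch mirrors the one-liner (the GAUR*.zip pattern comes from the question, and the .zip suffix is stripped with parameter expansion instead of sed):

```shell
#!/bin/sh
# Sketch of the one-liner as a script; assumes unzip is installed
# and the GAUR*.zip naming from the question.
find . -name 'GAUR*.zip' | while IFS= read -r line; do
    out="gaur/${line%.zip}"              # strip the .zip suffix
    mkdir -p "$(dirname "$out")"
    unzip -p "$line" | awk -F'|' '{ print $2 "," $3 }' > "$out"
done
```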
AWK to process compressed files and print the original (compressed) file names
Assuming you are looping over all the files and piping their decompressed contents directly into awk, something like the following will work:
for file in *.gz; do
    gunzip -c "$file" | awk -v origname="$file" '.... { print origname " whatever" }'
done
Edit: To use a list of filenames from some source other than a direct glob, something like the following can be used:
$ ls *.awk
a.awk e.awk
$ while IFS= read -r -d '' filename; do
    echo "$filename"
done < <(find . -name '*.awk' -printf '%P\0')
e.awk
a.awk
To use xargs instead of the above loop, I believe the body of the command needs to be in a pre-written script file, which xargs can then invoke with the filenames.
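A sketch of that xargs route, with a made-up helper script name (per_file.sh) and a made-up output format:

```shell
# Hypothetical helper script invoked by xargs with batches of filenames.
cat > per_file.sh <<'EOF'
#!/bin/sh
for f in "$@"; do
    gunzip -c "$f" | awk -v origname="$f" '{ print origname ": " $0 }'
done
EOF
chmod +x per_file.sh

# NUL-delimited filenames survive spaces and newlines;
# -r (GNU) skips the run entirely when find matches nothing.
find . -name '*.gz' -print0 | xargs -0 -r ./per_file.sh
```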
awk for many compressed files
find's '-exec' can be used to invoke (and pass arguments to) a single program. The challenge here is that two commands (zcat | awk) need to be combined with a pipe. Two possible paths: construct a shell command, OR use the more flexible xargs.
# Using 'sh -c' to build the pipeline; the filename is passed as a
# positional parameter rather than substituted into the command string
find . -iname '*.fastq.gz' -exec sh -c \
    'zcat "$1" | awk "(NR%4==2){N1+=length(\$0);gsub(/[AT]/,\"\");N2+=length(\$0)}END{print N2/N1}"' _ {} \;

# OR, using process substitution (requires bash)
find . -iname '*.fastq.gz' -exec bash -c \
    'awk "(NR%4==2){N1+=length(\$0);gsub(/[AT]/,\"\");N2+=length(\$0)}END{print N2/N1}" <(zcat "$1")' _ {} \;
See the many references to find/xargs on Stack Overflow.
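The xargs path mentioned above can look like the following sketch. AWKPROG is just a variable name chosen here to sidestep nested quoting, and the if(N1) guard is an addition to avoid a division by zero on empty input:

```shell
# Export the awk program so the inner sh -c can expand it,
# avoiding a third layer of quoting.
export AWKPROG='(NR%4==2){N1+=length($0);gsub(/[AT]/,"");N2+=length($0)}END{if(N1)print N2/N1}'

# -n1 runs one awk per file; -r (GNU) does nothing when no files match.
find . -iname '*.fastq.gz' -print0 |
    xargs -0 -r -n1 sh -c 'zcat "$1" | awk "$AWKPROG"' _
```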
How to use an awk script to generate a file
awk field numbering starts at 1, and $0 represents the full record, so the column numbers here are 1, 3 and 6. You may use this awk command:
awk 'BEGIN{FS=OFS=","} !$6{$6=$1} {print $1, $3, $6}' file
Time,MsgType,RTime
7:20:13,A,7:20:13
7:20:13,C,7:20:14
7:20:14,E,7:20:15
7:20:16,A,7:20:17
7:20:17,C,7:20:17
7:20:17,D,7:20:18
7:20:18,F,7:20:18
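The input file isn't shown in the question; with a hypothetical six-column CSV where column 6 (RTime) is sometimes empty, the one-liner fills the blanks from column 1 and produces output of the shape above:

```shell
# Hypothetical input: 6 CSV columns, column 6 sometimes empty.
cat > file <<'EOF'
Time,f2,MsgType,f4,f5,RTime
7:20:13,x,A,x,x,
7:20:13,x,C,x,x,7:20:14
EOF

# !$6 is true when column 6 is empty, so $6 falls back to $1.
awk 'BEGIN{FS=OFS=","} !$6{$6=$1} {print $1, $3, $6}' file
```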
Getting FILENAME in awk for multiple compressed files
Your command is parsing stdin, provided by the output of your previous command, so FILENAME is not available to awk. One way to deal with this:
for f in *.tsv.gz; do
zcat "$f" | awk -F, -v f="$f" '$1=="aaa" || $1=="bbb"{print f (NF?", ":"") $0}'
done
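As a quick check with a made-up demo.tsv.gz (the question's data isn't shown), the loop prefixes each matching line with its file name:

```shell
# Hypothetical sample file; note -F, treats the content as comma-separated.
printf 'aaa,1\nccc,2\n' | gzip > demo.tsv.gz

for f in demo.tsv.gz; do
    zcat "$f" | awk -F, -v f="$f" '$1=="aaa" || $1=="bbb"{print f (NF?", ":"") $0}'
done
```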
Use zcat and sed or awk to edit compressed .gz text file
You can't edit the file without decompressing it, but you can chain the decompress/edit/recompress steps together in an automated fashion:
for f in /dir/*; do
cp "$f" "$f~" &&
gzip -cd "$f~" | sed '2~4s/^.\{6\}//' | gzip > "$f"
done
If you're quite confident in the operation, you can remove the backup files by adding rm "$f~"
to the end of the loop body.
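For what the sed expression does: 2~4 is GNU sed's first~step address (line 2, then every 4th line after it, i.e. the sequence lines of a FASTQ file), and s/^.\{6\}// strips the first 6 characters of those lines. A round trip on a made-up 4-line file:

```shell
# Hypothetical 4-line FASTQ-style record, compressed.
printf 'h1\nAAAAAACGT\n+\nIIIIIIIII\n' | gzip > demo.gz

# Back up, edit the decompressed stream, recompress over the original.
cp demo.gz demo.gz~ &&
gzip -cd demo.gz~ | sed '2~4s/^.\{6\}//' | gzip > demo.gz
zcat demo.gz
```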
Split a large, compressed file into multiple outputs using AWK and BASH
This little Perl script does the job nicely:
- it keeps all destination files open, for performance
- it does elementary error handling
- Edit: it now also pipes the output through gzip on the fly
There is a bit of a kludge with $fh, because using the hash entry directly as a filehandle doesn't work.
#!/usr/bin/perl
use strict;
use warnings;

my $suffix = ".txt.gz";
my %pipes;

while (my $row = <>) {
    my ($id, $line) = split /\t/, $row, 2;
    exists $pipes{$id}
        or open ($pipes{$id}, "| gzip -9 > '$id$suffix'")
        or die "can't open/create $id$suffix, or cannot spawn gzip";
    my $fh = $pipes{$id};    # kludge: copy to a plain scalar filehandle
    print $fh $line;
}

print STDERR "Created: " . join(', ', map { "$_$suffix" } keys %pipes) . "\n";
Oh, use it like this:
zcat input.gz | ./myscript.pl
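Since the question asked for awk/bash, here is a rough awk sketch of the same split (the sample data is made up; like the Perl version, awk keeps one pipe open per distinct command string, so each id gets its own gzip):

```shell
# Hypothetical sample input: TAB-separated, destination id in column 1.
printf 'a\tline1\nb\tline2\na\tline3\n' | gzip > input.gz

# One "gzip -9 > id.txt.gz" pipe per distinct id, all flushed when awk exits.
zcat input.gz | awk -F'\t' '{
    line = $0
    sub(/^[^\t]*\t/, "", line)                  # drop the id column
    print line | ("gzip -9 > \"" $1 ".txt.gz\"")
}'
```

Note that both versions are limited by the process's open-file limit when there are very many distinct ids.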
How to replace a value to another value in a specific column on a gzipped file using awk?
You could check only for X in the first column, and check whether the row number is greater than 1. Then you can replace the X at the start of the line, matched with ^X, with 23.
zcat file.gz | awk 'NR > 1 && $1=="X" {sub(/^X/,"23")} 1' > out.txt
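A quick check on a made-up gzipped input with a header line and a chromosome column:

```shell
# Hypothetical gzipped input: header line, then chromosome in column 1.
printf 'chrom\tpos\nX\t100\nY\t200\n' | gzip > in.gz

# Skip the header (NR > 1); rewrite X to 23 only at the start of the line.
zcat in.gz | awk 'NR > 1 && $1 == "X" { sub(/^X/, "23") } 1'
```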