How to Check All Column Data Types of a Table Using an Awk Script

Use awk/bash to add 1 to all columns but the first

You can do this with one awk call:

awk 'BEGIN{OFS="\t"} FNR==1{print; next} {for (i=2;i<=NF;i++)$i=$i+1}1' infile > outfile
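For example, with a hypothetical whitespace-separated infile that has a header row (the sample data here is made up purely for illustration):

$ cat infile
id  a  b
x   1  2
y   3  4

$ awk 'BEGIN{OFS="\t"} FNR==1{print; next} {for (i=2;i<=NF;i++)$i=$i+1}1' infile
id  a  b
x   2  3
y   4  5

The FNR==1 rule passes the header through untouched; every later record is rebuilt with each field after the first incremented, and the fields re-joined by OFS (a tab).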

AWK: Help on transforming a data table

Using GNU awk for arrays of arrays:

$ cat tst.awk
BEGIN { OFS="\t" }
{
    sub(/-.*/,"",$1)            # reduce the date in $1 to just the year
    minYear = ( NR==1 || $1 < minYear ? $1 : minYear )
    maxYear = ( NR==1 || $1 > maxYear ? $1 : maxYear )
    items[$2][$3]               # remember every item seen per type
    vals[$1][$2][$3] += $4      # per year/type/item total
    typeTots[$1][$2] += $4      # per year/type subtotal
    yearTots[$1]     += $4      # per year grand total
}
END {
    printf "%s", OFS            # leave the type and item columns empty in the header
    for ( year=minYear; year<=maxYear; year++ ) {
        printf "%s%s", OFS, year
    }
    print ""

    for ( type in items ) {
        itemCnt = 0
        for ( item in items[type] ) {
            printf "%s%s%s", (itemCnt++ ? "" : type), OFS, item
            for ( year=minYear; year<=maxYear; year++ ) {
                printf "%s%0.2f", OFS, vals[year][type][item]
            }
            print ""
        }
        printf "Subt%s", OFS
        for ( year=minYear; year<=maxYear; year++ ) {
            printf "%s%0.2f", OFS, typeTots[year][type]
        }
        print ORS               # print appends ORS itself, so this emits a blank separator line
    }

    printf "Total%s", OFS
    for ( year=minYear; year<=maxYear; year++ ) {
        printf "%s%0.2f", OFS, yearTots[year]
    }
    print ""
}

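The script expects whitespace-separated input of the form "date type item amount". Here is a plausible in.txt, reconstructed from the output below; the exact dates are assumptions (only the year part survives the sub()), and each per-year sum could equally be split across several records:

$ cat in.txt
2020-03-05 alcohol smirnov 26.99
2020-06-14 fruit orange 8.40
2020-07-01 fruit banana 3.40
2021-02-11 alcohol beer 6.00
2021-09-23 fruit mango 6.99
2022-01-30 alcohol beer 12.00
2022-04-18 fruit orange 4.30
2022-08-02 fruit mango 7.20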

$ awk -f tst.awk in.txt
                  2020    2021    2022
alcohol  beer     0.00    6.00    12.00
         smirnov  26.99   0.00    0.00
Subt              26.99   6.00    12.00

fruit    orange   8.40    0.00    4.30
         mango    0.00    6.99    7.20
         banana   3.40    0.00    0.00
Subt              11.80   6.99    11.50

Total             38.79   12.99   23.50

or if you really want specific date ranges instead of just the year in the header:

$ cat tst.awk
BEGIN { OFS="\t" }
{
    sub(/-.*/,"",$1)
    minYear = ( NR==1 || $1 < minYear ? $1 : minYear )
    maxYear = ( NR==1 || $1 > maxYear ? $1 : maxYear )
    items[$2][$3]
    vals[$1][$2][$3] += $4
    typeTots[$1][$2] += $4
    yearTots[$1]     += $4
}
END {
    printf "%s", OFS
    for ( year=minYear; year<=maxYear; year++ ) {
        printf "%s%s-01-01", OFS, year      # first header row: range start
    }
    print ""

    printf "%s", OFS
    for ( year=minYear; year<=maxYear; year++ ) {
        printf "%s-%s-12-31", OFS, year     # second header row: range end
    }
    print ""

    for ( type in items ) {
        itemCnt = 0
        for ( item in items[type] ) {
            printf "%s%s%s", (itemCnt++ ? "" : type), OFS, item
            for ( year=minYear; year<=maxYear; year++ ) {
                printf "%s%0.2f", OFS, vals[year][type][item]
            }
            print ""
        }
        printf "Subt%s", OFS
        for ( year=minYear; year<=maxYear; year++ ) {
            printf "%s%0.2f", OFS, typeTots[year][type]
        }
        print ORS
    }

    printf "Total%s", OFS
    for ( year=minYear; year<=maxYear; year++ ) {
        printf "%s%0.2f", OFS, yearTots[year]
    }
    print ""
}


$ awk -f tst.awk in.txt | column -s$'\t' -t
                  2020-01-01   2021-01-01   2022-01-01
                  -2020-12-31  -2021-12-31  -2022-12-31
alcohol  beer     0.00         6.00         12.00
         smirnov  26.99        0.00         0.00
Subt              26.99        6.00         12.00
fruit    orange   8.40         0.00         4.30
         mango    0.00         6.99         7.20
         banana   3.40         0.00         0.00
Subt              11.80        6.99         11.50
Total             38.79        12.99        23.50

awk: find minimum and maximum in a column

Awk guesses the type of each value. In a string comparison, "10" is less than "4" because the character "1" sorts before "4". Force a numeric comparison by adding zero:

min=$(awk 'BEGIN{a=1000}{if ($1<0+a) a=$1} END{print a}' mydata.dat)
max=$(awk 'BEGIN{a=0}{if ($1>0+a) a=$1} END{print a}' mydata.dat)
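Note the hard-coded sentinels (1000 and 0) silently give wrong answers if every value is above 1000 or below 0. A sentinel-free variant seeds both bounds from the first record instead (a sketch; mydata.dat and the single-column layout are carried over from above):

awk 'NR==1 { min=max=$1 }
     { if ($1+0 < min+0) min=$1; if ($1+0 > max+0) max=$1 }
     END { print "min=" min, "max=" max }' mydata.dat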

Extract specific columns from delimited file using Awk

I don't know of a built-in way to do ranges in awk. You could do a for loop, but you would have to add handling to filter out the columns you don't want; a sketch of that approach follows the one-liner below. It's probably easier to do this:

awk -F, '{OFS=",";print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$20,$21,$22,$23,$24,$25,$30,$33}' infile.csv > outfile.csv
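For reference, the loop approach looks roughly like this. It is a sketch: the kept fields are hard-coded to mirror the print above, and building the output line in a variable is just one common idiom for it:

awk -F, '{
    out = ""
    for (i = 1; i <= NF; i++)
        # keep the same fields as the print above: 1-10, 20-25, 30 and 33
        if (i <= 10 || (i >= 20 && i <= 25) || i == 30 || i == 33)
            out = (out == "" ? $i : out "," $i)
    print out
}' infile.csv > outfile.csv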

Something else to consider, and this is faster and more concise:

cut -d "," -f1-10,20-25,30-33 infile.csv > outfile.csv

As to the second part of your question, I would probably write a script in Perl that knows how to handle header rows, parsing the column names from stdin or a file and then doing the filtering. It's probably a tool I would want to have for other things. I am not sure about doing it in a one-liner, although I am sure it can be done.

awk sql parsing

Rather than using awk: if you have access to the database, why not generate the SQL you need from the catalog (SYSCAT.COLUMNS here is the Db2 catalog view for column metadata):

SELECT 'ALTER TABLE "' || TABSCHEMA || '"."' || TABNAME || '" '
       || LISTAGG('ALTER COLUMN "' || COLNAME || '" DROP GENERATED', ' ') || ';'
FROM SYSCAT.COLUMNS
WHERE GENERATED = 'A' AND IDENTITY = 'N'
GROUP BY TABSCHEMA, TABNAME

which produces this against your test table:

ALTER TABLE "SCHEMA  "."TABLE" ALTER COLUMN "COL6" DROP GENERATED ALTER COLUMN "COL7" DROP GENERATED;

Simple.

Check if two files match in 2 column values and print those lines to a new output file

This should work:

$ awk 'NR==FNR{a[$4,$5]=$0;next}(($2,$5) in a)' file2 file1

Output:

CHR BP BETA SE P PHENOTYPE FDR CATEGORY SNP
10 110408937 3.386e+00 1.333e+00 1.112e-02 1 1 Medication rs113627704

Explained (NR==FNR is true only while the first file argument is being read, because FNR resets to 1 for each new file while NR keeps counting):

$ awk '
NR==FNR {             # process file2 first; the lines we print come from file1
    a[$4,$5]=$0       # use file2 fields 4 and 5 as the hash key
    next              # move on to the next record
}                     # file1 is processed below this point
(($2,$5) in a)        # print file1 lines whose fields 2 and 5 are in the hash
' file2 file1         # mind the file order
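
To make the key matching concrete, here is a hypothetical file2 layout that is consistent with the output above (the question does not show file2, so the column names here are assumptions beyond fields 4 and 5):

$ cat file2
SNP         CHR A1 BP        P
rs113627704 10  T  110408937 1.112e-02

The header line of file1 is printed because ("BP","P") is itself a stored key from file2's header, and the data line matches on the key ("110408937","1.112e-02").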

