Convert Column to Matrix Format Using Awk

Convert column to matrix format using awk

The following awk script handles:

  • matrices of any size
  • row and column indices that are unrelated to each other, so it tracks them separately
  • missing entries: if a row/column index pair does not appear, its value defaults to zero

The script:

awk '
BEGIN{PROCINFO["sorted_in"] = "@ind_num_asc"}
(NR==1){next}
{row[$1]=1;col[$2]=1;val[$1" "$2]=$3}
END { printf "%8s",""; for (j in col) { printf "%8.3f",j }; printf "\n"
for (i in row) {
printf "%8.3f",i; for (j in col) { printf "%8.3f",val[i" "j] }; printf "\n"
}
}' <file>

How it works:

  • PROCINFO["sorted_in"] = "@ind_num_asc" makes every for (... in ...) loop traverse arrays in ascending numeric index order.
  • (NR==1){next} skips the first (header) line.
  • {row[$1]=1;col[$2]=1;val[$1" "$2]=$3} processes each line by storing the row index, the column index, and the accompanying value.
  • The END block does all the printing: a header line of column indices, then one line per row index. An entry that was never stored prints as 0.000.

This outputs:

          20.500  21.500  22.500
  20.500  -4.100   1.200   7.000
  21.500  -6.200   4.300  10.400
  22.500   0.000   6.000  16.700

Note: PROCINFO["sorted_in"] is a gawk feature.
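
Because PROCINFO["sorted_in"] is gawk-only, here is a minimal portable sketch that should run under any POSIX awk. It swaps numeric sorting for first-seen order, so it gives the same layout only when the input is already sorted. The sample data is hypothetical, chosen to match the output above, with the (22.5, 20.5) entry left out on purpose; the helper arrays rs, cs, rows and cols are names introduced for this illustration.

```shell
# Hypothetical sample data: a header line, then "row col value" triples.
# The (22.5, 20.5) entry is deliberately missing.
data='x y z
20.5 20.5 -4.1
20.5 21.5 1.2
20.5 22.5 7.0
21.5 20.5 -6.2
21.5 21.5 4.3
21.5 22.5 10.4
22.5 21.5 6.0
22.5 22.5 16.7'

matrix=$(printf '%s\n' "$data" | awk '
NR == 1 { next }                               # skip the header line
!($1 in rs) { rs[$1]; rows[++nr] = $1 }        # remember rows in first-seen order
!($2 in cs) { cs[$2]; cols[++nc] = $2 }        # remember columns in first-seen order
{ val[$1 " " $2] = $3 }
END {
    printf "%8s", ""
    for (j = 1; j <= nc; j++) printf "%8.3f", cols[j]
    printf "\n"
    for (i = 1; i <= nr; i++) {
        printf "%8.3f", rows[i]
        # a pair that was never stored is empty and prints as 0.000
        for (j = 1; j <= nc; j++) printf "%8.3f", val[rows[i] " " cols[j]]
        printf "\n"
    }
}')
printf '%s\n' "$matrix"
```

If the input is not sorted, piping it through sort -g first (as the short versions further down do) is one way to get the same ordering.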

However, if you can make a few assumptions, the script can be much shorter:

  • the file contains all possible entries, with no missing values
  • you do not want the row and column indices printed
  • the indices are sorted in column-major order

Then you can use the following short versions:

sort -g <file> | awk '($1+0!=$1){next}
($1!=o)&&(NR!=1){printf "\n"}
{printf "%8.3f",$3; o=$1 }'

which outputs

  -4.100   1.200   7.000
  -6.200   4.300  10.400
   0.000   6.000  16.700

or for the transposed:

awk '(NR==1){next}
($2!=o)&&(NR!=2){printf "\n"}
{printf "%8.3f",$3; o=$2 }' <file>

This outputs

  -4.100  -6.200   0.000
   1.200   4.300   6.000
   7.000  10.400  16.700

convert column file into matrix with awk

You can also use pr, which lets you specify the number of columns required:

$ seq 72 | pr -24ats' '
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72


If pr complains that the page width is too narrow, increase the page width from its default of 72. The minimum width is (col-1)*len(delimiter) + col, where col is the number of columns required.

For example:

$ # 99 is minimum width required for 50 columns with single character wide delimiter
$ seq 100 | pr -J -w99 -50ats,
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50
51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
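
The width formula can be evaluated with shell arithmetic rather than by hand. This is only a sketch of the arithmetic quoted above; the variable names cols, delim and width are made up for illustration.

```shell
cols=50        # number of columns wanted
delim=','      # column separator that would be passed to -s
# (col-1)*len(delimiter) + col, with ${#delim} giving the delimiter length
width=$(( (cols - 1) * ${#delim} + cols ))
echo "$width"
```

The result can then be passed to pr as -w"$width".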


Since the required output delimiter is a space, you can also use

< input xargs -d'\n' -n24

-d'\n' makes each line of the input file count as a single argument (-d is supported by GNU xargs).

convert data matrix using awk

Here is one way:

awk '{a[$1,$2]=$3; col[$1]; row[$2]}
END {printf "%s", FS
for (c in col) printf "%s%s", c, FS; print "";
for (r in row) {
printf "%s%s", r, FS
for (c in col) printf "%s%s", a[c,r], FS
print ""
}
}' file

The code is fairly self-explanatory, but in short:

  • Store the data in an array a[col, row].
  • Store the possible names of cols and rows.

Once the file has been read, loop through the stored names and print each value.
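
As a side note on the a[col, row] notation: awk joins the two subscripts into one string key using the built-in SUBSEP variable. The sketch below, run on two made-up input lines, splits each key back into its parts. The final sort is only there because for (k in a) visits keys in an unspecified order.

```shell
pairs=$(printf '%s\n' 'f1 x1 1.2' 'f2 x1 1.1' | awk '
{ a[$1, $2] = $3 }                 # the key is really $1 SUBSEP $2
END {
    for (k in a) {
        split(k, part, SUBSEP)     # recover the two subscripts
        print part[1], part[2], a[k]
    }
}' | sort)
printf '%s\n' "$pairs"
```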

For the given input it returns:

$ awk '{a[$1,$2]=$3; col[$1]; row[$2]} END {printf "%s", FS; for (c in col) printf "%s%s", c, FS; print ""; for (r in row) { printf "%s%s", r, FS; for (c in col) printf "%s%s", a[c,r], FS; print ""}}' a
f1 f2 f3
x1 1.2 1.1 2.3
x2 2.2 1.2 4.4
x3 0 3.3 0.1

Insert a row and a column in a matrix using awk

$ cat tst.awk
BEGIN {
    lat = 100
    lon = 20
    latWid = lonWid = 6
    latDel = lonDel = 0.33
    latFmt = lonFmt = "%*.2f"
}
NR==1 {
    printf "%*s", latWid, ""
    for (i=1; i<=NF; i++) {
        printf lonFmt, lonWid, lon
        lon += lonDel
    }
    print ""
}
{
    printf latFmt, latWid, lat
    lat += latDel
    for (i=1; i<=NF; i++) {
        printf "%*s", lonWid, $i
    }
    print ""
}

$ awk -f tst.awk file
       20.00 20.33 20.66 20.99 21.32 21.65
100.00     2     3     4     1     2     3
100.33     3     4     5     2     4     6
100.66     2     4     0     5     0     7
100.99     0     0     5     6     3     8

Transpose column to row using awk

You can also use pr here:

$ seq 9 | pr -3ts' '
1 4 7
2 5 8
3 6 9
$ seq 9 | pr -5ts' '
1 3 5 7 9
2 4 6 8

where the number is how many columns you need and the s option specifies the delimiter between columns.
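
If pr is not available, the same column-wise fill can be approximated in awk. This is a sketch under the assumption that all input fits in memory; the variable names cols, rows and item are made up for illustration. It reproduces the two pr outputs above, though it is not guaranteed to match how pr balances columns for every input.

```shell
out=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 | awk -v cols=3 '
{ item[NR] = $0 }                              # keep every line in memory
END {
    rows = int((NR + cols - 1) / cols)         # ceil(NR / cols)
    for (r = 1; r <= rows; r++) {
        line = item[r]
        # items of the same output row are "rows" lines apart in the input
        for (i = r + rows; i <= NR; i += rows) line = line OFS item[i]
        print line
    }
}')
printf '%s\n' "$out"
```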

Convert matrix into value pairs with awk

An awk one-liner (well, a bit long maybe):

 awk 'NR==1{for(i=1;i<=NF;i++)t[i]=$i}{ r[NR]=$1; for(i=2;i<=NF;i++) v[t[i-1],$1]=$i}END{for(i=1;i<=length(t);i++) for(j=2;j<=NR;j++) print t[i], r[j], v[t[i],r[j]]}' file

Here is the same one-liner formatted as a "three-liner" :)

 awk 'NR==1{for(i=1;i<=NF;i++)t[i]=$i}
{ r[NR]=$1; for(i=2;i<=NF;i++) v[t[i-1],$1]=$i}
END{for(i=1;i<=length(t);i++)for(j=2;j<=NR;j++)print t[i], r[j], v[t[i],r[j]]} ' file

test:

kent$  cat t
15W 14.5W 14W 13.5W 13W
30N 19.3 19.3 19.2 18.9 18.6
30.5N 19.1 19 19 18.9 18.4
31N 18.9 18.8 18.7 18.6 18.3
31.5N 18.9 18.7 18.7 18.6 18.1
32N 18.6 18.5 18.6 18.5 17.5

kent$ awk 'NR==1{for(i=1;i<=NF;i++)t[i]=$i}
{ r[NR]=$1; for(i=2;i<=NF;i++) v[t[i-1],$1]=$i}
END{for(i=1;i<=length(t);i++)for(j=2;j<=NR;j++)print t[i], r[j], v[t[i],r[j]]} ' t
15W 30N 19.3
15W 30.5N 19.1
15W 31N 18.9
15W 31.5N 18.9
15W 32N 18.6
14.5W 30N 19.3
14.5W 30.5N 19
14.5W 31N 18.8
14.5W 31.5N 18.7
14.5W 32N 18.5
14W 30N 19.2
14W 30.5N 19
14W 31N 18.7
14W 31.5N 18.7
14W 32N 18.6
13.5W 30N 18.9
13.5W 30.5N 18.9
13.5W 31N 18.6
13.5W 31.5N 18.6
13.5W 32N 18.5
13W 30N 18.6
13W 30.5N 18.4
13W 31N 18.3
13W 31.5N 18.1
13W 32N 17.5
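
If the pairs do not have to come out grouped by column, a streaming variant can print them row by row without storing the whole matrix. This is a sketch of that alternative rather than the method above; it is run here on the first two columns of the test data.

```shell
out=$(printf '%s\n' '15W 14.5W' '30N 19.3 19.3' '30.5N 19.1 19' | awk '
NR == 1 { for (i = 1; i <= NF; i++) t[i] = $i; next }   # header: column labels
{ for (i = 2; i <= NF; i++) print t[i-1], $1, $i }      # label, row name, value
')
printf '%s\n' "$out"
```

Here the output order is row-major: all values for 30N come first, then all values for 30.5N.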

Row to column and column to row using awk

Some awk versions:

awk 1 RS=" |\n" file1  # gnu awk version
awk '{for (i=1;i<=NF;i++) print $i}' file1 # portable version
a
b
c
1
2
3


awk '{printf "%s" (NR%3==0?RS:FS),$1}' file2
a b c
1 2 3

printf "%s" # prints parameter #1 ($1)

(NR%3==0?RS:FS) # appends a separator to the format. It tests whether the line number is a multiple of 3. If it is not, it uses FS (a blank space); if it is, it uses RS (a newline).

So the separator becomes a newline after every third line.
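
The one-liner above leaves the last row without a newline when the number of input lines is not a multiple of 3. A minimal sketch of a variant that takes the row length as a variable n (a name chosen for this example) and always ends the output with a newline:

```shell
out=$(printf '%s\n' a b c 1 2 3 x | awk -v n=3 '
{ printf "%s%s", ((NR - 1) % n ? OFS : ""), $1 }   # separator before every item except the first of a row
NR % n == 0 { print "" }                           # end the row after every n-th line
END { if (NR % n) print "" }                       # finish a partial last row
')
printf '%s\n' "$out"
```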

Turn file of 3 columns into a matrix

awk to the rescue!

awk 'BEGIN {FS=OFS="\t"} 
{col[$1]; row[$2]; val[$2,$1]=$3}
END {for(c in col) printf "%s", OFS c; print "";
for(r in row)
{printf "%s", r;
for(c in col) printf "%s", OFS val[r,c]
print ""}}' file

Category1 Category2 Category3
type1 + + +
type2 - + -
type3 + + -

from xyz to matrix with awk

awk does not do real multidimensional arrays, but you can fake it with a properly constructed string:

awk '
{mx[$1 "," $2] = $3}
END {
size=sqrt(NR)
for (x=1; x<=size; x++) {
for (y=1; y<=size; y++)
printf("%s ",mx[x "," y])
print ""
}
}
' filename

You can also produce the output in your example with a single awk call plus a call to wc:

awk -v "nlines=$(wc -l < filename)" '
BEGIN {size = sqrt(nlines)}
{printf("%s%s", $3, (NR % size == 0 ? ORS : OFS))
}' filename
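
Both versions above take the matrix size from the square root of the line count, so they assume a square matrix. A sketch for the non-square case, assuming positive integer indices starting at 1, tracks the largest x and y seen instead (maxx and maxy are names introduced here, and the input is made up):

```shell
out=$(printf '%s\n' '1 1 a' '1 2 b' '1 3 c' '2 1 d' '2 2 e' '2 3 f' | awk '
{
    mx[$1 "," $2] = $3
    if ($1 + 0 > maxx) maxx = $1 + 0     # largest x index seen
    if ($2 + 0 > maxy) maxy = $2 + 0     # largest y index seen
}
END {
    for (x = 1; x <= maxx; x++)
        for (y = 1; y <= maxy; y++)
            printf "%s%s", mx[x "," y], (y < maxy ? OFS : ORS)
}')
printf '%s\n' "$out"
```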

