Convert column to matrix format using awk
The following awk script handles:
- a matrix of any size;
- no assumed relation between the row and column indices, so it keeps track of them separately;
- missing entries: if a given row/column index pair does not appear, its value defaults to zero.
It works like this:
awk '
BEGIN{PROCINFO["sorted_in"] = "@ind_num_asc"}
(NR==1){next}
{row[$1]=1;col[$2]=1;val[$1" "$2]=$3}
END { printf "%8s",""; for (j in col) { printf "%8.3f",j }; printf "\n"
for (i in row) {
printf "%8.3f",i; for (j in col) { printf "%8.3f",val[i" "j] }; printf "\n"
}
}' <file>
How it works:
- PROCINFO["sorted_in"] = "@ind_num_asc" makes every for (k in array) loop visit the indices in ascending numeric order.
- (NR==1){next} skips the first (header) line.
- {row[$1]=1;col[$2]=1;val[$1" "$2]=$3} processes each line by storing its row index, column index and the accompanying value.
- The END block does all the printing.
This outputs:
20.500 21.500 22.500
20.500 -4.100 1.200 7.000
21.500 -6.200 4.300 10.400
22.500 0.000 6.000 16.700
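Note: PROCINFO is a gawk feature. In other awks it is just an ordinary array, so the for (... in ...) loops in the END block visit the indices in an unspecified order.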
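For awks without PROCINFO, here is a minimal sketch of a portable variant. It swaps gawk's sorted traversal for a small insertion sort written in plain awk (the function name sortkeys and the shell wrapper xyz_matrix are hypothetical), and assumes the same input layout: one header line, then "row col value" lines.

```shell
# Sketch: POSIX-awk version of the matrix script above (no PROCINFO needed).
# Assumes a header line followed by "row col value" lines.
xyz_matrix() {
    awk '
    # Copy the keys of src into dst[1..n], sorted numerically; return n.
    function sortkeys(src, dst,    k, n, i, j, t) {
        n = 0
        for (k in src) dst[++n] = k
        for (i = 2; i <= n; i++) {               # insertion sort
            t = dst[i]
            for (j = i - 1; j >= 1 && dst[j] + 0 > t + 0; j--)
                dst[j + 1] = dst[j]
            dst[j + 1] = t
        }
        return n
    }
    NR == 1 { next }                             # skip the header line
    { row[$1]; col[$2]; val[$1 " " $2] = $3 }    # remember indices and value
    END {
        nr = sortkeys(row, r)
        nc = sortkeys(col, c)
        printf "%8s", ""
        for (j = 1; j <= nc; j++) printf "%8.3f", c[j]
        printf "\n"
        for (i = 1; i <= nr; i++) {
            printf "%8.3f", r[i]
            # a missing pair is an empty string, which %8.3f prints as 0.000
            for (j = 1; j <= nc; j++) printf "%8.3f", val[r[i] " " c[j]]
            printf "\n"
        }
    }' "$@"
}

# Hypothetical sample input; the pair (2, 1) is missing on purpose.
printf '%s\n' 'x y z' '1 1 5' '2 2 7' '1 2 3' | xyz_matrix
```

The missing (2, 1) entry comes out as 0.000, just like in the gawk version. An insertion sort is quadratic, which is fine for the handful of distinct row and column indices a matrix like this usually has.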
However, if you make a few assumptions, you can do it much more briefly:
- the file contains all possible entries (no missing values);
- you do not want the row and column indices printed;
- the indices are sorted in column-major order.
Then you can use the following short versions:
sort -k1,1g -k2,2g <file> | awk '($1+0!=$1){next}
(o!="")&&($1!=o){printf "\n"}
{printf "%8.3f",$3; o=$1 }
END{printf "\n"}'
which outputs
-4.100 1.200 7.000
-6.200 4.300 10.400
0.000 6.000 16.700
or, for the transposed matrix:
awk '(NR==1){next}
($2!=o)&&(NR!=2){printf "\n"}
{printf "%8.3f",$3; o=$2 }
END{printf "\n"}' <file>
This outputs
-4.100 -6.200 0.000
1.200 4.300 6.000
7.000 10.400 16.700
convert column file into matrix with awk
You can also use pr, which lets you specify the number of columns required (-a fills the rows across instead of down the columns):
$ seq 72 | pr -24ats' '
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
If pr complains that the page width is too narrow, increase the page width from its default of 72 with -w. The minimum width is (col-1)*len(delimiter) + col, where col is the number of columns required.
For example:
$ # 99 is minimum width required for 50 columns with single character wide delimiter
$ seq 100 | pr -J -w99 -50ats,
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50
51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
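The width formula can be computed in the shell instead of by hand. This is a minimal sketch; the helper name to_columns is hypothetical, and it assumes GNU pr (for -J) and a single-character delimiter, since -s takes one character.

```shell
# Sketch: lay input lines out in $1 columns separated by $2, letting the
# shell compute the minimum page width (cols-1)*len(delim) + cols.
# Assumes GNU pr and a single-character delimiter.
to_columns() {
    cols=$1
    delim=$2
    width=$(( (cols - 1) * ${#delim} + cols ))
    pr -J -w"$width" -"$cols"ats"$delim"
}

seq 100 | to_columns 50 ,   # equivalent to: pr -J -w99 -50ats,
```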
Since the required output delimiter is a space, you can also use xargs:
< input xargs -d'\n' -n24
Here -d'\n' makes each line of the input file a single argument (-d is a GNU xargs option).
convert data matrix using awk
Here is one way:
awk '{a[$1,$2]=$3; col[$1]; row[$2]}
END {printf "%s", FS
for (c in col) printf "%s%s", c, FS; print "";
for (r in row) {
printf "%s%s", r, FS
for (c in col) printf "%s%s", a[c,r], FS
print ""
}
}' file
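It is fairly self-explanatory, but still:
- store the data in an array a[col, row];
- store the possible names of the columns and rows;
- once the file has been read, loop through the results and print them.
Note that for (c in col) and for (r in row) visit the keys in an unspecified order, so the header and row labels are not guaranteed to come out sorted in every awk.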
For the given input it returns:
$ awk '{a[$1,$2]=$3; col[$1]; row[$2]} END {printf "%s", FS; for (c in col) printf "%s%s", c, FS; print ""; for (r in row) { printf "%s%s", r, FS; for (c in col) printf "%s%s", a[c,r], FS; print ""}}' a
f1 f2 f3
x1 1.2 1.1 2.3
x2 2.2 1.2 4.4
x3 0 3.3 0.1
Insert a row and a column in a matrix using awk
$ cat tst.awk
BEGIN {
lat = 100
lon = 20
latWid = lonWid = 6
latDel = lonDel = 0.33
latFmt = lonFmt = "%*.2f"
}
NR==1 {
printf "%*s", latWid, ""
for (i=1; i<=NF; i++) {
printf lonFmt, lonWid, lon
lon += lonDel
}
print ""
}
{
printf latFmt, latWid, lat
lat += latDel
for (i=1; i<=NF; i++) {
printf "%*s", lonWid, $i
}
print ""
}
$ awk -f tst.awk file
20.00 20.33 20.66 20.99 21.32 21.65
100.00 2 3 4 1 2 3
100.33 3 4 5 2 4 6
100.66 2 4 0 5 0 7
100.99 0 0 5 6 3 8
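If the starting latitude, longitude and step change from file to file, they can be passed in with -v instead of being hard-coded in BEGIN. A minimal sketch: the wrapper name label_grid and the single shared step are hypothetical simplifications (tst.awk uses separate latDel and lonDel), and fixed "%6.2f"/"%6s" formats replace the "%*" widths.

```shell
# Sketch: tst.awk with start values and one shared step passed as arguments.
# Usage: label_grid LAT LON STEP < file
label_grid() {
    awk -v lat="$1" -v lon="$2" -v step="$3" '
    NR == 1 {                                    # header row of longitudes
        printf "%6s", ""
        for (i = 1; i <= NF; i++) { printf "%6.2f", lon; lon += step }
        print ""
    }
    {                                            # latitude label, then the data
        printf "%6.2f", lat
        lat += step
        for (i = 1; i <= NF; i++) printf "%6s", $i
        print ""
    }'
}

printf '%s\n' '2 3' '3 4' | label_grid 100 20 0.33
```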
Transpose column to row using awk
You can also use pr here:
$ seq 9 | pr -3ts' '
1 4 7
2 5 8
3 6 9
$ seq 9 | pr -5ts' '
1 3 5 7 9
2 4 6 8
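where the number sets how many columns you need, -t omits the page header and trailer, and -s lets you specify the delimiter between columns. Without -a, pr fills each column from top to bottom, which is what turns the single column into rows.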
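The same fill-down layout can be sketched in plain awk. This is an illustration, not a drop-in replacement for pr: the wrapper name down_cols is hypothetical, and it assumes rows = ceil(lines / columns), which matches the two pr examples above but may differ from how pr balances other inputs.

```shell
# Sketch: fill $1 columns top to bottom, like the pr examples above.
# Assumption: rows = ceil(number of lines / columns).
down_cols() {
    awk -v cols="$1" '
    { item[NR] = $0 }                            # keep every input line
    END {
        rows = int((NR + cols - 1) / cols)       # ceil(NR / cols)
        for (r = 1; r <= rows; r++) {
            line = ""
            for (c = 0; c < cols; c++) {
                k = r + c * rows                 # item in row r, column c
                if (k <= NR) line = line (c ? " " : "") item[k]
            }
            print line
        }
    }'
}

seq 9 | down_cols 5
```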
Convert matrix into value pairs with awk
An awk one-liner (well, a bit long maybe):
awk 'NR==1{for(i=1;i<=NF;i++)t[i]=$i}{ r[NR]=$1; for(i=2;i<=NF;i++) v[t[i-1],$1]=$i}END{for(i=1;i<=length(t);i++) for(j=2;j<=NR;j++) print t[i], r[j], v[t[i],r[j]]
} ' file
Here is the same one-liner formatted as a "three-liner" :)
awk 'NR==1{for(i=1;i<=NF;i++)t[i]=$i}
{ r[NR]=$1; for(i=2;i<=NF;i++) v[t[i-1],$1]=$i}
END{for(i=1;i<=length(t);i++)for(j=2;j<=NR;j++)print t[i], r[j], v[t[i],r[j]]} ' file
test:
kent$ cat t
15W 14.5W 14W 13.5W 13W
30N 19.3 19.3 19.2 18.9 18.6
30.5N 19.1 19 19 18.9 18.4
31N 18.9 18.8 18.7 18.6 18.3
31.5N 18.9 18.7 18.7 18.6 18.1
32N 18.6 18.5 18.6 18.5 17.5
kent$ awk 'NR==1{for(i=1;i<=NF;i++)t[i]=$i}
{ r[NR]=$1; for(i=2;i<=NF;i++) v[t[i-1],$1]=$i}
END{for(i=1;i<=length(t);i++)for(j=2;j<=NR;j++)print t[i], r[j], v[t[i],r[j]]} ' t
15W 30N 19.3
15W 30.5N 19.1
15W 31N 18.9
15W 31.5N 18.9
15W 32N 18.6
14.5W 30N 19.3
14.5W 30.5N 19
14.5W 31N 18.8
14.5W 31.5N 18.7
14.5W 32N 18.5
14W 30N 19.2
14W 30.5N 19
14W 31N 18.7
14W 31.5N 18.7
14W 32N 18.6
13.5W 30N 18.9
13.5W 30.5N 18.9
13.5W 31N 18.6
13.5W 31.5N 18.6
13.5W 32N 18.5
13W 30N 18.6
13W 30.5N 18.4
13W 31N 18.3
13W 31.5N 18.1
13W 32N 17.5
Row to column and column to row using awk
Some awk versions:
awk 1 RS=" |\n" file1 # gnu awk version
awk '{for (i=1;i<=NF;i++) print $i}' file1 # portable version
a
b
c
1
2
3
awk '{printf "%s" (NR%3==0?RS:FS),$1}' file2
a b c
1 2 3
How the printf works:
- printf "%s" prints parameter #1 ($1).
- NR%3==0?RS:FS adds extra formatting: if the line number is a multiple of 3, it appends RS (a newline); otherwise it appends FS (a blank space).
So this starts a new output line after every third input line.
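The same idea generalizes to any group size. In this minimal sketch the group size is passed in as n (a hypothetical variable name, as is the wrapper rows_of). It collects each row before printing, so there is no trailing blank, and the END block terminates a last, incomplete row.

```shell
# Sketch: join every n input lines (first field only) into one output line.
rows_of() {
    awk -v n="$1" '
    { line = (k++ ? line FS : "") $1 }           # start or extend the current row
    k == n { print line; k = 0 }                 # row full: print it, start over
    END { if (k) print line }                    # flush a last, incomplete row
    '
}

printf '%s\n' a b c 1 2 3 x | rows_of 3
```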
Turn file of 3 columns into a matrix
awk to the rescue!
awk 'BEGIN {FS=OFS="\t"}
{col[$1]; row[$2]; val[$2,$1]=$3}
END {for(c in col) printf "%s", OFS c; print "";
for(r in row)
{printf "%s", r;
for(c in col) printf "%s", OFS val[r,c]
print ""}}' file
Category1 Category2 Category3
type1 + + +
type2 - + -
type3 + + -
from xyz to matrix with awk
awk does not do real multidimensional arrays, but you can fake it with a properly constructed string:
awk '
{mx[$1 "," $2] = $3}
END {
size=sqrt(NR)
for (x=1; x<=size; x++) {
for (y=1; y<=size; y++)
printf("%s ",mx[x "," y])
print ""
}
}
' filename
You can accomplish your example with a single awk call and a call to wc (this assumes the lines are already in row order):
awk -v "nlines=$(wc -l < filename)" '
BEGIN {size = sqrt(nlines)}
{printf("%s%s", $3, (NR % size == 0 ? ORS : OFS))
}' filename
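Both versions above assume a square grid. Here is a minimal sketch of a variant that tracks the largest x and y seen instead of using sqrt(NR). It still assumes integer indices starting at 1, and the wrapper name xyz_grid is hypothetical.

```shell
# Sketch: xyz -> matrix without assuming a square grid.
# Assumes integer x and y indices starting at 1; missing cells print empty.
xyz_grid() {
    awk '
    {
        mx[$1 "," $2] = $3                       # fake 2-D array via string key
        if ($1 + 0 > nx + 0) nx = $1             # largest x seen
        if ($2 + 0 > ny + 0) ny = $2             # largest y seen
    }
    END {
        for (x = 1; x <= nx; x++) {
            line = ""
            for (y = 1; y <= ny; y++) line = line (y > 1 ? " " : "") mx[x "," y]
            print line
        }
    }' "$@"
}

printf '%s\n' '1 1 a' '1 2 b' '2 1 c' '2 2 d' '3 1 e' '3 2 f' | xyz_grid
```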