Bash: transform key-value lines to CSV format
A simple solution with `cut`, `paste`, and `head` (assumes input file `file`, outputs to file `out.csv`):
#!/usr/bin/env bash
{ cut -d':' -f1 file | head -n 3 | paste -d, - - -;
cut -d':' -f2- file | paste -d, - - -; } >out.csv
`cut -d':' -f1 file | head -n 3` creates the header line: `cut -d':' -f1 file` extracts the first `:`-separated field from each input line, and `head -n 3` stops after 3 lines, given that the headers repeat every 3 lines. `paste -d, - - -` takes 3 input lines from stdin (one for each `-`) and combines them into a single, comma-separated output line (`-d,`).
`cut -d':' -f2- file | paste -d, - - -` creates the data lines: `cut -d':' -f2- file` extracts everything after the first `:` from each input line. As above, `paste` then combines 3 values into a single, comma-separated output line.
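To make this concrete, here is a sketch with a hypothetical input file (the key names and values are invented for illustration; the pipeline itself is the one above):

```shell
# Hypothetical input: 3 keys repeating every 3 lines (names invented).
cat >file <<'EOF'
Name:Alice
Age:30
City:Berlin
Name:Bob
Age:25
City:Paris
EOF

{ cut -d':' -f1 file | head -n 3 | paste -d, - - -;
  cut -d':' -f2- file | paste -d, - - -; } >out.csv

cat out.csv
# Name,Age,City
# Alice,30,Berlin
# Bob,25,Paris
```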
agc points out in a comment that the column count (`3`) and the `paste` operands (`- - -`) are hard-coded above. The following solution parameterizes the column count (set it via `n=...`):
{ n=3; pasteOperands=$(printf '%.s- ' $(seq $n))
cut -d':' -f1 file | head -n $n | paste -d, $pasteOperands;
cut -d':' -f2- file | paste -d, $pasteOperands; } >out.csv
`printf '%.s- ' $(seq $n)` is a trick that produces a list of as many space-separated `-` characters as there are columns (`$n`).
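For example, with `n=4` the trick expands as follows; `%.s` consumes each argument produced by `seq` while printing nothing (a zero-precision string conversion), leaving only the literal `- ` per argument:

```shell
n=4
pasteOperands=$(printf '%.s- ' $(seq $n))
echo "$pasteOperands"
# prints: - - - -   (one "-" operand per column; the trailing space is harmless to paste)
```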
While the previous solution is now parameterized, it still assumes that the column count is known in advance; the following solution dynamically determines the column count (it requires Bash 4+ due to the use of `readarray`, but could be made to work with Bash 3.x):
# Determine the unique list of column headers and
# read them into a Bash array.
readarray -t columnHeaders < <(awk -F: 'seen[$1]++ { exit } { print $1 }' file)
# Output the header line.
(IFS=','; echo "${columnHeaders[*]}") >out.csv
# Append the data lines.
cut -d':' -f2- file | paste -d, $(printf '%.s- ' $(seq ${#columnHeaders[@]})) >>out.csv
`awk -F: 'seen[$1]++ { exit } { print $1 }'` outputs each input line's column name (the 1st `:`-separated field), remembers the column names in associative array `seen`, and stops at the first column name that is seen for the second time.
`readarray -t columnHeaders` reads `awk`'s output line by line into array `columnHeaders`.
`(IFS=','; echo "${columnHeaders[*]}") >out.csv` prints the array elements using a comma as the separator (specified via `$IFS`); note the use of a subshell (`(...)`) so as to localize the effect of modifying `$IFS`, which would otherwise have global effects.
The `cut ...` pipeline uses the same approach as before, with the operands for `paste` created based on the count of the elements of array `columnHeaders` (`${#columnHeaders[@]}`).
To wrap the above up in a function that outputs to stdout and also works with Bash 3.x:
toCsv() {
    local file=$1 columnHeaders
    # Determine the unique list of column headers and
    # read them into a Bash array.
    IFS=$'\n' read -d '' -ra columnHeaders < <(awk -F: 'seen[$1]++ { exit } { print $1 }' "$file")
    # Output the header line.
    (IFS=','; echo "${columnHeaders[*]}")
    # Append the data lines.
    cut -d':' -f2- "$file" | paste -d, $(printf '%.s- ' $(seq ${#columnHeaders[@]}))
}
# Sample invocation
toCsv file > out.csv
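As a quick check, the function can be exercised against a hypothetical input file (key names and values invented for illustration):

```shell
# Bash 3.x-compatible toCsv from above, run on invented sample data.
toCsv() {
    local file=$1 columnHeaders
    IFS=$'\n' read -d '' -ra columnHeaders < <(awk -F: 'seen[$1]++ { exit } { print $1 }' "$file")
    (IFS=','; echo "${columnHeaders[*]}")
    cut -d':' -f2- "$file" | paste -d, $(printf '%.s- ' $(seq ${#columnHeaders[@]}))
}

cat >demo.txt <<'EOF'
Name:Alice
Age:30
City:Berlin
Name:Bob
Age:25
City:Paris
EOF

toCsv demo.txt
# Name,Age,City
# Alice,30,Berlin
# Bob,25,Paris
```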
How to convert space separated key value data into CSV format in bash?
`awk` to the rescue!
$ awk -v OFS=',' '{for(i=1;i<NF;i+=2)
{if(!($i in c)){c[$i];cols[++k]=$i};
v[NR,$i]=$(i+1)}}
END{for(i=1;i<=k;i++) printf "%s", cols[i] OFS;
print "";
for(i=1;i<=NR;i++)
{for(j=1;j<=k;j++) printf "%s", v[i,cols[j]] OFS;
print ""}}' file
Table,count,size,
SCOTT.TABLE1,3889,300,
SCOTT.TABLE2,7744,,
SCOTT.TABLE3,2622,,
SCOTT.TABLE4,22,2773,
SCOTT.TABLE5,,21,
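The input file is not shown in the question; judging from the output above, it was presumably space-separated key value pairs with some keys missing per line. A reconstruction (not the original data) that reproduces that exact output:

```shell
# Reconstructed input: space-separated key value pairs, some keys absent.
cat >file <<'EOF'
Table SCOTT.TABLE1 count 3889 size 300
Table SCOTT.TABLE2 count 7744
Table SCOTT.TABLE3 count 2622
Table SCOTT.TABLE4 count 22 size 2773
Table SCOTT.TABLE5 size 21
EOF

awk -v OFS=',' '{for(i=1;i<NF;i+=2)
      {if(!($i in c)){c[$i];cols[++k]=$i};
       v[NR,$i]=$(i+1)}}
 END{for(i=1;i<=k;i++) printf "%s", cols[i] OFS;
     print "";
     for(i=1;i<=NR;i++)
      {for(j=1;j<=k;j++) printf "%s", v[i,cols[j]] OFS;
       print ""}}' file
# Table,count,size,
# SCOTT.TABLE1,3889,300,
# SCOTT.TABLE2,7744,,
# ...
```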
If you have `gawk`, you can simplify it further with `PROCINFO["sorted_in"]`, which lets `for (i in ...)` iterate the array in a controlled order.
UPDATE: For the revised question, the header needs to be known in advance, since the keys might be completely missing. This simplifies the problem, and the following script should do the trick.
$ awk -v header='Table,count,size' \
'BEGIN{OFS=","; n=split(header,h,OFS); print header}
{for(i=1; i<NF; i+=2) v[NR,$i]=$(i+1)}
END{for(i=1; i<=NR; i++)
{printf "%s", v[i,h[1]];
for(j=2; j<=n; j++) printf "%s", OFS v[i,h[j]];
print ""}}' file
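Run against the same kind of reconstructed input, a record can now omit keys entirely and still land in the right columns (input invented for illustration):

```shell
# Reconstructed input; the last record has no "count" key at all.
cat >file <<'EOF'
Table SCOTT.TABLE1 count 3889 size 300
Table SCOTT.TABLE2 count 7744
Table SCOTT.TABLE5 size 21
EOF

awk -v header='Table,count,size' \
 'BEGIN{OFS=","; n=split(header,h,OFS); print header}
  {for(i=1; i<NF; i+=2) v[NR,$i]=$(i+1)}
  END{for(i=1; i<=NR; i++)
       {printf "%s", v[i,h[1]];
        for(j=2; j<=n; j++) printf "%s", OFS v[i,h[j]];
        print ""}}' file
# Table,count,size
# SCOTT.TABLE1,3889,300
# SCOTT.TABLE2,7744,
# SCOTT.TABLE5,,21
```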
Convert key:value to CSV file
The problem with the original script is these lines:
NF == 0 {printline(); delete data}
END {printline()}
The first line means: call `printline()` if the current line has no fields (i.e., it is blank). The second line means: call `printline()` after all data has been processed.
The difficulty with the input data format is that it does not really give a good indicator of when to output the next record. In the following, I have simply changed the script to output the data every six records. In case there can be duplicate keys, the criterion for output might be "all fields populated" or similar, which would need to be programmed slightly differently.
#!/bin/sh -e
awk -F ":" -v OFS="," '
BEGIN {
    records_in = 0
    print "category", "recommenderSubtype", "resource", "matchesPattern", "resource", "value"
}
{
    data[$1] = $2
    records_in++
    if (records_in == 6) {
        records_in = 0
        print data["category"], data["recommenderSubtype"], data["resource"], data["matchesPattern"], data["resource"], data["value"]
    }
}
' file.yaml
Other comments
- I have just removed the `delete` statement, because I am unsure what it does. The POSIX specification for `awk` only defines it for deleting single array elements; in case the whole array should be deleted, it recommends looping over the elements. If all fields are always present, however, it might be possible to eliminate it altogether.
- Welcome to SO (I am new here as well). Next time you are asking, I would recommend tagging the question `awk` rather than `bash`, because AWK is really the scripting language used in this question, with `bash` only being responsible for calling `awk` with suitable parameters :)
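If, alternatively, the records in the input are separated by blank lines, the original `NF == 0` trigger can be kept, with a POSIX-portable element loop to clear the array between records. A minimal sketch (field names taken from the script above, values invented, only three of the six fields shown for brevity):

```shell
# Invented sample: blank-line-separated records of key:value lines.
cat >file.yaml <<'EOF'
category:COST
recommenderSubtype:CHANGE_MACHINE_TYPE
value:-73

category:PERFORMANCE
recommenderSubtype:DELETE_IMAGE
value:0
EOF

awk -F ':' -v OFS=',' '
function printline() {
    if ("category" in data)
        print data["category"], data["recommenderSubtype"], data["value"]
    for (k in data) delete data[k]   # POSIX-portable replacement for "delete data"
}
NF == 0 { printline() }              # flush on blank line
NF > 0  { data[$1] = $2 }
END     { printline() }              # flush the final record
' file.yaml
# COST,CHANGE_MACHINE_TYPE,-73
# PERFORMANCE,DELETE_IMAGE,0
```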
shell script: convert raw output (from dict lines into csv)
This might work for you (GNU sed):
sed 'N;s/\n"Country":/,/;s/ "data":{/\n/;s/"data":{\|"myID"://g;s/ *"Country":/,/g;s/"//g;P;D' file
From the data provided, this sed solution produces the stated format, but somehow I think there may be edge cases not accounted for.
How can I convert a file consisting of key/value pairs to a CSV?
My thought is that `URL:` is used as `RS`, so we get three records from abc.txt; then we print only the odd-numbered fields of each record. Hope it helps you out. :)
$ cat abc.awk
BEGIN{
    RS="URL:";
}
{
    for(i=1;i<=NF;i+=2)
    {
        if (i<=3)
            printf("%s\t", $i)
        else if (i>3 && i<NF)
            printf("%s, ", $i)
        else
            printf("%s\n", $i)
    }
}
$ awk -f abc.awk abc.txt
bbc.com 10.10.10.5#53 1.1.1.1, 6.6.6.6
cdn.com 10.10.10.10#53 2.2.2.2
ngo.com 10.10.10.5#53 3.3.3.3, 4.4.4.4, 5.5.5.5
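The question's abc.txt is not shown; judging from the output, it was presumably nslookup-style blocks, which can be reconstructed for testing (data invented, not the original file):

```shell
# Reconstructed abc.txt: each record starts at "URL:"; the even-numbered
# fields ("Server:", "Address:") are the labels that get skipped.
cat >abc.txt <<'EOF'
URL: bbc.com
Server: 10.10.10.5#53
Address: 1.1.1.1
Address: 6.6.6.6
URL: cdn.com
Server: 10.10.10.10#53
Address: 2.2.2.2
URL: ngo.com
Server: 10.10.10.5#53
Address: 3.3.3.3
Address: 4.4.4.4
Address: 5.5.5.5
EOF

awk 'BEGIN{RS="URL:"}
{
    for(i=1;i<=NF;i+=2)
    {
        if (i<=3)
            printf("%s\t", $i)
        else if (i>3 && i<NF)
            printf("%s, ", $i)
        else
            printf("%s\n", $i)
    }
}' abc.txt
# bbc.com	10.10.10.5#53	1.1.1.1, 6.6.6.6   (tab-separated, then comma lists)
```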
How do I convert key value paired list into table with columns using AWK?
Your question isn't clear but this MAY be what you're looking for:
$ cat tst.awk
BEGIN { RS=""; FS="\n"; OFS=","; ofmt="\"%s\"%s" }
{
for (i=1; i<=NF; i++) {
tag = val = $i
sub(/[[:space:]].*/,"",tag)
sub(/[^[:space:]]+[[:space:]]+/,"",val)
tags[i] = tag
vals[i] = val
}
}
NR==1 {
for (i=1; i<=NF; i++) {
printf ofmt, tags[i], (i<NF ? OFS : ORS)
}
}
{
for (i=1; i<=NF; i++) {
printf ofmt, vals[i], (i<NF ? OFS : ORS)
}
}
$ awk -f tst.awk file
"part_no","date_part","history_code","user_id","other_information","pool_no"
"100000001","2010-10-13 12:12:12","ABCD","rsmith","note: Monday, December 10","101011777"
"100000002","2010-10-21 12:12:12","GHIJ","jsmith","other_information","101011888"
"100000002","2010-10-27 12:12:12","LMNO","fevers","[Mail]","101011999"
"100000003","2010-11-13 12:12:12","QXRT","sjohnson","note: Tuesday, August 31","101011111"
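Again the input file is not shown; from the output it was presumably blank-line-separated records of `tag value` lines. The first record, reconstructed, run through the same script:

```shell
# Reconstructed first record: one "tag value" pair per line.
cat >file <<'EOF'
part_no 100000001
date_part 2010-10-13 12:12:12
history_code ABCD
user_id rsmith
other_information note: Monday, December 10
pool_no 101011777
EOF

awk 'BEGIN { RS=""; FS="\n"; OFS=","; ofmt="\"%s\"%s" }
{
    for (i=1; i<=NF; i++) {
        tag = val = $i
        sub(/[[:space:]].*/,"",tag)                # keep first token
        sub(/[^[:space:]]+[[:space:]]+/,"",val)    # drop first token
        tags[i] = tag
        vals[i] = val
    }
}
NR==1 { for (i=1; i<=NF; i++) printf ofmt, tags[i], (i<NF ? OFS : ORS) }
{ for (i=1; i<=NF; i++) printf ofmt, vals[i], (i<NF ? OFS : ORS) }' file
# "part_no","date_part","history_code","user_id","other_information","pool_no"
# "100000001","2010-10-13 12:12:12","ABCD","rsmith","note: Monday, December 10","101011777"
```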
Generate Key value from a csv file in bash
Given the test file
header1,header2,header3,header4
value1,val2,val3,val4
a,b,c,d
We can do
declare -A map
{
    # read the first line
    IFS=, read -ra headers
    # iterate over the rest of the lines
    while IFS=, read -ra values; do
        # create the mapping
        for i in "${!headers[@]}"; do
            map["${headers[i]}"]=${values[i]}
        done
        # do something with the map
        declare -p map
    done
} < file.csv
outputs
declare -A map='([header4]="val4" [header1]="value1" [header3]="val3" [header2]="val2" )'
declare -A map='([header4]="d" [header1]="a" [header3]="c" [header2]="b" )'
But that seems needlessly complicated. It may be sufficient to use the common indices between the `headers` and `values` arrays.
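For instance, to pull a single column by header name from each data row with the same approach (the array is reset per row so stale keys from a previous row cannot leak through; uses the test file from above):

```shell
# Same test CSV as above.
cat >file.csv <<'EOF'
header1,header2,header3,header4
value1,val2,val3,val4
a,b,c,d
EOF

{
    IFS=, read -ra headers
    while IFS=, read -ra values; do
        declare -A map=()          # reset the map for each row
        for i in "${!headers[@]}"; do
            map["${headers[i]}"]=${values[i]}
        done
        echo "${map[header2]}"     # look up one column by name
    done
} < file.csv
# val2
# b
```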