Bash: Transform Key-Value Lines to CSV Format


A simple solution using cut, paste, and head (it reads from input file file and writes to file out.csv):

#!/usr/bin/env bash

{ cut -d':' -f1 file | head -n 3 | paste -d, - - -;
cut -d':' -f2- file | paste -d, - - -; } >out.csv
  • cut -d':' -f1 file | head -n 3 | paste -d, - - - creates the header line:

    • cut -d':' -f1 file extracts the first :-separated field (the key) from each input line, and head -n 3 stops after 3 lines, given that the keys repeat every 3 lines.

    • paste -d, - - - takes 3 input lines from stdin (one for each -) and combines them into a single, comma-separated output line (-d,).

  • cut -d':' -f2- file | paste -d, - - - creates the data lines:

    • cut -d':' -f2- file extracts everything after the : from each input line.

    • As above, paste then combines 3 values into a single, comma-separated output line.
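As a sanity check, the pipeline can be run against a small made-up input (the filename sample.txt and its contents are hypothetical):

```shell
# Hypothetical input: keys repeat every 3 lines.
printf '%s\n' name:Alice age:30 city:NY name:Bob age:25 city:LA > sample.txt

{ cut -d':' -f1 sample.txt | head -n 3 | paste -d, - - -;
  cut -d':' -f2- sample.txt | paste -d, - - -; }
# → name,age,city
#   Alice,30,NY
#   Bob,25,LA
```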


agc points out in a comment that the column count (3) and the paste operands (- - -) are hard-coded above.

The following solution parameterizes the column count (set it via n=...):

{ n=3; pasteOperands=$(printf '%.s- ' $(seq $n)) 
cut -d':' -f1 file | head -n $n | paste -d, $pasteOperands;
cut -d':' -f2- file | paste -d, $pasteOperands; } >out.csv
  • printf '%.s- ' $(seq $n) is a trick that produces as many space-separated - operands as there are columns ($n): the %.s conversion prints each argument from seq with zero precision, i.e., as an empty string, leaving only the literal - and space.
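Seen in isolation (the output below is exactly what printf produces for n=3, including a trailing space):

```shell
n=3
printf '%.s- ' $(seq $n)
# → '- - - '
```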

While the previous solution is now parameterized, it still assumes that the column count is known in advance; the following solution dynamically determines the column count (requires Bash 4+ due to use of readarray, but could be made to work with Bash 3.x):

# Determine the unique list of column headers and
# read them into a Bash array.
readarray -t columnHeaders < <(awk -F: 'seen[$1]++ { exit } { print $1 }' file)

# Output the header line.
(IFS=','; echo "${columnHeaders[*]}") >out.csv

# Append the data lines.
cut -d':' -f2- file | paste -d, $(printf '%.s- ' $(seq ${#columnHeaders[@]})) >>out.csv
  • awk -F: 'seen[$1]++ { exit } { print $1 }' outputs each input line's column name (the 1st :-separated field), remembers the column names in associative array seen, and stops at the first column name that is seen for the second time.

  • readarray -t columnHeaders reads awk's output line by line into array columnHeaders

  • (IFS=','; echo "${columnHeaders[*]}") >out.csv prints the array elements separated by commas (the separator specified via $IFS); note the use of a subshell ((...)) so as to localize the effect of modifying $IFS, which would otherwise have global effects.

  • The cut ... pipeline uses the same approach as before, with the operands for paste being created based on the count of the elements of array columnHeaders (${#columnHeaders[@]}).
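The header-detection step can be tried in isolation; with a hypothetical input whose keys repeat, awk stops as soon as it sees the first key for the second time:

```shell
printf '%s\n' name:Alice age:30 city:NY name:Bob age:25 city:LA |
  awk -F: 'seen[$1]++ { exit } { print $1 }'
# → name
#   age
#   city
```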


To wrap the above up in a function that outputs to stdout and also works with Bash 3.x:

toCsv() {
  local file=$1 columnHeaders

  # Determine the unique list of column headers and
  # read them into a Bash array.
  IFS=$'\n' read -d '' -ra columnHeaders < <(awk -F: 'seen[$1]++ { exit } { print $1 }' "$file")

  # Output the header line.
  (IFS=','; echo "${columnHeaders[*]}")

  # Output the data lines.
  cut -d':' -f2- "$file" | paste -d, $(printf '%.s- ' $(seq ${#columnHeaders[@]}))
}

# Sample invocation
toCsv file > out.csv

How to convert space separated key value data into CSV format in bash?

awk to the rescue!

$ awk -v OFS=',' '{for(i=1;i<NF;i+=2) 
{if(!($i in c)){c[$i];cols[++k]=$i};
v[NR,$i]=$(i+1)}}
END{for(i=1;i<=k;i++) printf "%s", cols[i] OFS;
print "";
for(i=1;i<=NR;i++)
{for(j=1;j<=k;j++) printf "%s", v[i,cols[j]] OFS;
print ""}}' file

Table,count,size,
SCOTT.TABLE1,3889,300,
SCOTT.TABLE2,7744,,
SCOTT.TABLE3,2622,,
SCOTT.TABLE4,22,2773,
SCOTT.TABLE5,,21,

if you have gawk you can simplify it more with sorted-in
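A sketch of that simplification (gawk only; PROCINFO["sorted_in"] is a gawk extension). Note that @ind_str_asc iterates the keys alphabetically rather than in first-seen order; LC_ALL=C is set here to force byte-wise ordering, which for the sample keys Table, count, size happens to match the desired column order:

```shell
LC_ALL=C gawk -v OFS=',' '
  { for (i=1; i<NF; i+=2) { c[$i]; v[NR,$i] = $(i+1) } }
  END {
    PROCINFO["sorted_in"] = "@ind_str_asc"  # make for-in traverse keys in sorted order
    for (k in c) printf "%s", k OFS
    print ""
    for (r=1; r<=NR; r++) {
      for (k in c) printf "%s", v[r,k] OFS
      print ""
    }
  }' file
```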

UPDATE: For the revised question, the header needs to be known in advance, since keys might be missing entirely. This simplifies the problem, and the following script should do the trick.

$ awk -v header='Table,count,size' \
'BEGIN{OFS=","; n=split(header,h,OFS); print header}
{for(i=1; i<NF; i+=2) v[NR,$i]=$(i+1)}
END{for(i=1; i<=NR; i++)
{printf "%s", v[i,h[1]];
for(j=2; j<=n; j++) printf "%s", OFS v[i,h[j]];
print ""}}' file
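For instance, with a single hypothetical input line that lacks the size key, the missing value comes out as an empty trailing field:

```shell
printf '%s\n' 'Table SCOTT.TABLE2 count 7744' |
  awk -v header='Table,count,size' \
    'BEGIN{OFS=","; n=split(header,h,OFS); print header}
     {for(i=1; i<NF; i+=2) v[NR,$i]=$(i+1)}
     END{for(i=1; i<=NR; i++)
         {printf "%s", v[i,h[1]];
          for(j=2; j<=n; j++) printf "%s", OFS v[i,h[j]];
          print ""}}'
# → Table,count,size
#   SCOTT.TABLE2,7744,
```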

Convert key:value to CSV file

The problem with the original script is these two lines:

NF == 0 {printline(); delete data}
END {printline()}

The first line means: call printline() whenever the current line has no fields (i.e., a blank line). The second line means: call printline() once more after all input has been processed.
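A minimal, hypothetical illustration of that mechanism, where a blank line triggers output of the record collected so far (a loop-based delete is used here because POSIX awk only defines delete for single array elements):

```shell
printf 'a:1\nb:2\n\na:3\nb:4\n' |
  awk -F: '
    NF == 0 { print data["a"], data["b"]; for (k in data) delete data[k]; next }
    { data[$1] = $2 }
    END { print data["a"], data["b"] }'
# → 1 2
#   3 4
```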

The difficulty with this input format is that it gives no reliable indicator of when to output the next record. Below, I have simply changed the script to output the data after every six records. If duplicate keys can occur, the output criterion might instead be "all fields populated" or similar, which would need to be programmed slightly differently.

#!/bin/sh -e
awk -F ":" -v OFS="," '
  BEGIN {
    records_in = 0
    print "category", "recommenderSubtype", "resource", "matchesPattern", "resource", "value"
  }
  {
    data[$1] = $2
    records_in++
    if (records_in == 6) {
      records_in = 0
      print data["category"], data["recommenderSubtype"], data["resource"], data["matchesPattern"], data["resource"], data["value"]
    }
  }
' file.yaml

Other comments

  • I removed the delete statement because deleting a whole array is not portable: the POSIX specification for awk only defines delete for single array elements, and recommends a loop over the elements to clear an entire array. If all fields are always present, it might as well be eliminated altogether.
  • Welcome to SO (I am new here as well). Next time you are asking, I would recommend tagging the question awk rather than bash because AWK is really the scripting language used in this question with bash only being responsible for calling awk with suitable parameters :)

shell script: convert raw output (from dict lines into csv)

This might work for you (GNU sed):

sed 'N;s/\n"Country":/,/;s/ "data":{/\n/;s/"data":{\|"myID"://g;s/ *"Country":/,/g;s/"//g;P;D' file

From the data provided, this sed solution produces the stated format, but I suspect there may be edge cases not accounted for.

How can I convert a file consisting of key/value pairs to a CSV?

My thought is to use URL: as RS, so that we get three records from abc.txt, and then to print only the odd-numbered fields of each record. Hope it helps you out.
:)

$ cat  abc.awk
BEGIN{
RS="URL:";
}

{
for(i=1;i<=NF;i+=2)
{
if (i<=3)
printf("%s\t", $i)
else if (i>3 && i<NF)
printf("%s, ", $i)
else
printf("%s\n", $i)
}
}

$ awk -f abc.awk abc.txt
bbc.com 10.10.10.5#53 1.1.1.1, 6.6.6.6
cdn.com 10.10.10.10#53 2.2.2.2
ngo.com 10.10.10.5#53 3.3.3.3, 4.4.4.4, 5.5.5.5

How do I convert key value paired list into table with columns using AWK?

Your question isn't clear but this MAY be what you're looking for:

$ cat tst.awk
BEGIN { RS=""; FS="\n"; OFS=","; ofmt="\"%s\"%s" }
{
for (i=1; i<=NF; i++) {
tag = val = $i
sub(/[[:space:]].*/,"",tag)
sub(/[^[:space:]]+[[:space:]]+/,"",val)
tags[i] = tag
vals[i] = val
}
}
NR==1 {
for (i=1; i<=NF; i++) {
printf ofmt, tags[i], (i<NF ? OFS : ORS)
}
}
{
for (i=1; i<=NF; i++) {
printf ofmt, vals[i], (i<NF ? OFS : ORS)
}
}

$ awk -f tst.awk file
"part_no","date_part","history_code","user_id","other_information","pool_no"
"100000001","2010-10-13 12:12:12","ABCD","rsmith","note: Monday, December 10","101011777"
"100000002","2010-10-21 12:12:12","GHIJ","jsmith","other_information","101011888"
"100000002","2010-10-27 12:12:12","LMNO","fevers","[Mail]","101011999"
"100000003","2010-11-13 12:12:12","QXRT","sjohnson","note: Tuesday, August 31","101011111"

Generate Key value from a csv file in bash

Given the test file

header1,header2,header3,header4
value1,val2,val3,val4
a,b,c,d

We can do

declare -A map
{
# read the first line
IFS=, read -ra headers

# iterate over the rest of the lines
while IFS=, read -ra values; do
# create the mapping
for i in "${!headers[@]}"; do
map["${headers[i]}"]=${values[i]}
done
# do something with the map
declare -p map
done
} < file.csv

outputs

declare -A map='([header4]="val4" [header1]="value1" [header3]="val3" [header2]="val2" )'
declare -A map='([header4]="d" [header1]="a" [header3]="c" [header2]="b" )'

But that seems needlessly complicated. It may be sufficient to have the common indices between the headers and values arrays.
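A simpler sketch along those lines, pairing headers and values by their shared indices (still reading the same hypothetical file.csv):

```shell
{
  IFS=, read -ra headers
  while IFS=, read -ra values; do
    for i in "${!headers[@]}"; do
      # headers and values share indices, so no associative array is needed
      printf '%s=%s\n' "${headers[i]}" "${values[i]}"
    done
  done
} < file.csv
```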


