Format and Then Convert Txt to CSV Using Shell Script and Awk

You may use this awk:

awk -v OFS=, '
{ k = $1 OFS $2 OFS $3 }                      # composite key: first three fields
!($4 in hdr) { hn[++h] = $4; hdr[$4] }        # remember each new $4 as a header column
k in row { row[k] = row[k] OFS $5; next }     # known key: append this value
{ rn[++n] = k; row[k] = $5 }                  # new key: remember row order, store value
END {
    printf "%s", rn[1]                        # rn[1] holds the "x,y,z" header prefix
    for (i=1; i<=h; i++)
        printf "%s", OFS hn[i]
    print ""
    for (i=2; i<=n; i++)
        print rn[i], row[rn[i]]
}' file
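
The question's input file isn't shown; a minimal sketch of what this script expects, reconstructed from the output below (the header names, including "value", are guesses), is one whitespace-separated line per (x, y, z, timestamp, value) tuple:

x y z t value
1 1 5 01hr01Jan2018 3
1 1 5 02hr01Jan2018 3.1
1 1 5 03hr01Jan2018 3.2
1 3.4 3 01hr01Jan2018 4.1
...

The script collects the distinct timestamps in $4 as header columns and gathers each (x, y, z) key's values onto a single row, producing: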

x,y,z,t,01hr01Jan2018,02hr01Jan2018,03hr01Jan2018
1,1,5,3,3.1,3.2
1,3.4,3,4.1,6.1,1.1
1,4.2,6,6.33,8.33,5.33
3.4,1,2,3.5,5.65,3.66
3.4,3.4,4,6.32,9.32,12.32
3.4,4.2,8.1,7.43,7.93,5.43
4.2,1,3.4,6.12,7.15,9.12
4.2,3.4,5.5,2.2,3.42,3.21
4.2,4.2,6.2,1.3,3.4,1

Convert text file to csv using shell script

I used awk.
The separator is a tab, because a tab is less likely than a comma to collide with the data itself.
If you want a comma, you can simply change the \t to ,.

awk '
BEGIN{print "Source\tResource\tUser\tExitCode"}
/^JOB/{i=0}
/^[[:space:]]/{
    i++
    # strip the leading whitespace and the label word, keep the value
    match($0,/[[:space:]]*[a-zA-Z]* /)
    a[i]=substr($0,RSTART+RLENGTH)
}
/^EndJob/{print a[1] "\t" a[2] "\t" a[3] "\t" a[4]}' InputFile.txt
  • The BEGIN block writes the header.
  • The /^JOB/ rule just resets the counter i to zero.
  • The rule matching leading whitespace fills the array a with the values (it relies on a fixed count and order of the rows within each job block); see the sample input after this list.
  • The /^EndJob/ rule prints the stored values.
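
The question's input file isn't reproduced; a plausible shape, reconstructed from the output below (job names, labels, and values are guesses), is:

JOB job1
  Source C://files/InputFile
  Resource 0 AC
  User Guest
  ExitCode 0 Success
EndJob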

Output (tab-separated; the columns are aligned here for legibility):

Source               Resource  User     ExitCode
C://files/InputFile  0 AC      Guest    0 Success
C://files/           1 AD      Current  1 Fail
C://files/Input/     3 AE      Guest2   0 Success

Convert .txt to .csv in shell

awk may be a bit of overkill here. IMHO, using tr for straightforward substitutions like this is much simpler:

$ tr -s '[:blank:]' ',' < ifile.txt > ofile.txt
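
For instance, with a made-up ifile.txt, the -s flag squeezes each run of spaces or tabs into a single comma:

$ printf 'a  b\tc\n' > ifile.txt
$ tr -s '[:blank:]' ',' < ifile.txt
a,b,c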

Convert .txt file to .csv with header in bash

This should do it, if your fields don't contain any funny business:

(echo "Date;Visit;Login;Euro;Rate" ; cat file.txt) | sed 's/;/<tab>/g' > file.csv

You'll just have to type the tab literally in bash (Ctrl-V, then Tab). If your version of sed supports it, you can write \t instead of a literal tab.
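
With GNU sed, for example, the \t escape is supported, so the whole pipeline becomes:

(echo "Date;Visit;Login;Euro;Rate" ; cat file.txt) | sed 's/;/\t/g' > file.csv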

(sed/awk) extract values from text file and write to csv (no pattern)

awk's basic structure is:

  1. read a record from the input (by default a record is a line)
  2. evaluate conditions
  3. apply actions

The record is split into fields (by default using whitespace as the separator).
Fields are referenced by their position, starting at 1: $1 is the first field, $2 is the second.
The variable NF holds the "number of fields," so $NF is the last field, $(NF-1) is the second-to-last field, etc.

A "BEGIN" section will be executed before any input file is read, and it can be used to initialize variables (which are implicitly initialized to 0).

BEGIN {
    counter = 1
    OFS = ","   # the output field separator used by the print statement
    print "file", "start", "stop", "epoch", "run"   # print the header line
}

/start value/ {
    startValue = $NF   # when a line contains "start value", store the last field as startValue
}

/epoch/ {
    epoch = $NF
}

/stop value/ {
    stopValue = $NF

    # we have everything we need to print our line
    print FILENAME, startValue, stopValue, epoch, counter
    counter = counter + 1
    startValue = ""   # clear variables so they aren't carried into the next record
    epoch = ""
}

Save that as processor.awk and invoke as:

awk -f processor.awk my_file_1.txt my_file_2.txt my_file_3.txt > output.csv
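
To make the flow concrete: a hypothetical my_file_1.txt containing lines such as these (the surrounding text and values are invented)

some instrument log ... start value 10
some instrument log ... epoch 1528
some instrument log ... stop value 20

would produce an output.csv like:

file,start,stop,epoch,run
my_file_1.txt,10,20,1528,1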

Create CSV file from a text file with header tokens using shell scripting

According to the documentation of COPY, PostgreSQL fully supports the CSV format, as well as a text format that is compatible with lossless TSV.

Because I'm using awk, I chose to generate a TSV. The reason is that there are newlines in the data, and POSIX awk doesn't allow literal newlines in a user-defined variable. A TSV doesn't have this problem, because the literal newlines are replaced with their C-style notation \n.

Also, I changed the input format to make it easier to parse. The new rule is that one or more empty lines delimit the records, which means that you can't have empty lines in the content of Summary or Article Body; the work-around is to put a single space character on such lines, as I did in the example.


Input example:

Title:
Article title

Word Count:
100

Summary:
Article summary.

Can consist of multiple lines.

Keywords:
keyword1, keyword2, keyword3


Article Body:
The rest of the article body.

Till the end of the file.

And here's the awk command, which accepts multiple files as arguments:

awk -v RS='' -v FS='^$' -v OFS='\t' '
FNR == 1 { ++onr } # the current file number is our "output record number"
/^[^:\n]+:/ {
    # lossless TSV escaping
    gsub(/\\/,"\\\\")
    gsub(/\n/,"\\n")
    gsub(/\r/,"\\r")
    gsub(/\t/,"\\t")

    # get the current field name
    id = substr($0,1,index($0,":")-1)

    # strip the first line (NOTE: the newline character is escaped at this point)
    sub(/^(\\[^n]|[^\\])*\\n/,"")

    # save the data
    fields[id]            # keep track of the field names that we came across
    records[0,id] = id    # for the header line
    records[onr,id] = $0  # for the output record
}
END {
    # print the header (i == 0) and the records (i >= 1)
    for (i = 0; i <= onr; i++) {
        out = sep = ""
        for (id in fields) {
            out = out sep records[i,id]
            sep = OFS
        }
        print out
    }
}
' *.txt

Then the output (I replaced all the literal tabs with | for better legibility):

Summary | Article Body | Word Count | Title | Keywords
Article summary.\n \nCan consist of multiple lines. | The rest of the article body.\n \nTill the end of the file. | 100 | Article title | keyword1, keyword2, keyword3

Postscript: once you have a valid TSV file, you can use a tool like mlr to convert it to CSV, JSON, etc., but for the purpose of importing the data into PostgreSQL, that isn't required.
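
For example, assuming the file is named file.tsv, the mlr conversion is a one-liner:

mlr --itsv --ocsv cat file.tsv > file.csv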

The SQL statement will be this (untested):

COPY table_name FROM '/path/file.tsv' WITH HEADER;

Remark: you don't need to specify the FORMAT or the DELIMITER, because the defaults are already text and \t. Note that the HEADER option with the text format is only accepted by PostgreSQL 15 and later; older versions allow HEADER only together with the CSV format.

How to convert a .txt file into .csv using AWK

Here's how to approach your problem (assuming that a blank line, or a line containing just |, marks the end of the MGP section in the input):

$ cat tst.awk
sub(/^[[:space:]]*MGP[^|]+[|][[:space:]]*/,"") { inMgp=1 }
inMgp {
    sub(/[[:space:]]*[|][[:space:]]*$/,"")
    if ( NF ) {
        data = data $0
    }
    else {
        gsub(/[[:space:]]*[|][[:space:]]*/,"|",data)
        print data
        data = ""   # reset so a later MGP section starts clean
        inMgp = 0
    }
}

$ awk -f tst.awk file
8,625|4.027,000|96.648,000|-|96.648,000
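
The question's input isn't reproduced here; a shape consistent with the script and the output above (the label, the spacing, and the trailing empty cell are guesses) would be a pipe-delimited table whose MGP row wraps across several lines:

   MGP Total  |      8,625 |
              |  4.027,000 |
              | 96.648,000 |
              |          - |
              | 96.648,000 |
              |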

How to format a TXT file into a structured CSV file in bash?

You can use awk:

awk 'NR%2{printf "%s,",$0;next}1' file.txt > file.csv

NR%2 is true on odd-numbered lines, so each odd line is printed with a trailing comma and no newline; the following even line then falls through to the 1 (print) and completes the row.
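
A quick illustration with a made-up file.txt:

$ printf 'name1\nvalue1\nname2\nvalue2\n' > file.txt
$ awk 'NR%2{printf "%s,",$0;next}1' file.txt
name1,value1
name2,value2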

Converting text data file to csv format via shell/bash

This line should help (the multi-character RS requires GNU awk):

awk 'BEGIN{FS=":|\n";RS="Gender";OFS=",";print "Gender,Age,History"}$0{print $2,$4,$6}' file
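
The question's sample input isn't shown; the records presumably look like this (a reconstruction from the field positions, so the exact spacing is a guess):

Gender: M
Age: 46
History: 01305

Gender: F
Age: 46
History: 01306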

With your example as input, it gives:

Gender,Age,History
M, 46, 01305
F, 46, 01306
M, 19, 01307
M, 19, 01308

