Format and then convert txt to csv using shell script and awk
You may use this awk:
awk -v OFS=, '{k=$1 OFS $2 OFS $3}
!($4 in hdr){hn[++h]=$4; hdr[$4]}
k in row{row[k]=row[k] OFS $5; next}
{rn[++n]=k; row[k]=$5}
END {
printf "%s", rn[1]
for(i=1; i<=h; i++)
printf "%s", OFS hn[i]
print ""
for (i=2; i<=n; i++)
print rn[i], row[rn[i]]
}' file
Output:
x,y,z,t,01hr01Jan2018,02hr01Jan2018,03hr01Jan2018
1,1,5,3,3.1,3.2
1,3.4,3,4.1,6.1,1.1
1,4.2,6,6.33,8.33,5.33
3.4,1,2,3.5,5.65,3.66
3.4,3.4,4,6.32,9.32,12.32
3.4,4.2,8.1,7.43,7.93,5.43
4.2,1,3.4,6.12,7.15,9.12
4.2,3.4,5.5,2.2,3.42,3.21
4.2,4.2,6.2,1.3,3.4,1
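To see the pivot in action, here is a minimal self-contained run. The input below is invented (the coordinates, the T1/T2 column names, and the v value column are assumptions, not the asker's data); the first input line plays the role of a header row, whose 4th column name ends up in the output header:

```shell
cat > file <<'EOF'
x y z t v
1 1 5 T1 3
1 1 5 T2 3.1
3 4 2 T1 5
3 4 2 T2 5.5
EOF

awk -v OFS=, '{k=$1 OFS $2 OFS $3}
!($4 in hdr){hn[++h]=$4; hdr[$4]}
k in row{row[k]=row[k] OFS $5; next}
{rn[++n]=k; row[k]=$5}
END {
  printf "%s", rn[1]
  for(i=1; i<=h; i++)
    printf "%s", OFS hn[i]
  print ""
  for (i=2; i<=n; i++)
    print rn[i], row[rn[i]]
}' file
# prints:
# x,y,z,t,T1,T2
# 1,1,5,3,3.1
# 3,4,2,5,5.5
```

Each distinct ($1,$2,$3) triple becomes one output row, and each distinct $4 becomes one output column collecting the $5 values.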
Convert text file to csv using shell script
I used awk.
The separator is a tab character. If you want a comma instead, simply change the \t to , in the script.
awk '
BEGIN{print "Source\tResource\tUser\tExitCode"}
/^JOB/{i=0}                        # a new JOB block starts: reset the counter
/^\s/{                             # indented line: capture its value
    i++
    match($0,/\s*[a-zA-Z]* /)      # find the leading label word
    a[i]=substr($0,RSTART+RLENGTH) # RSTART+RLENGTH, not the nonexistent RPOS
}
/^EndJob/{for(i=1;i<5;i++) printf "%s\t",a[i];print ""}' InputFile.txt
- The BEGIN block writes the header.
- The /^JOB/ rule matches the start of a job and only resets the iterator i to zero.
- The /^\s/ rule matches whitespace at the start of a line and fills the array a with values (it relies on a strict count and order of rows).
- The /^EndJob/ rule matches EndJob and prints the stored values.
Output:
Source | Resource | User | ExitCode
---|---|---|---
C://files/InputFile | 0 AC | Guest | 0 Success
C://files/ | 1 AD | Current | 1 Fail
C://files/Input/ | 3 AE | Guest2 | 0 Success
Convert .txt to .csv in shell
awk may be a bit of overkill here. IMHO, using tr for straightforward substitutions like this is much simpler:
$ tr -s '[:blank:]' ',' < ifile.txt > ofile.txt
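For instance (input invented), tr translates every blank character to a comma and -s then squeezes the repeats, so runs of spaces or tabs collapse into a single separator:

```shell
printf 'a  b\tc\nd e f\n' | tr -s '[:blank:]' ','
# prints:
# a,b,c
# d,e,f
```

Note the caveat: any blank inside a field also becomes a comma, so this only works when blanks are purely separators.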
Convert .txt file to .csv with header in bash
This should do it, if your fields don't contain any funny business:
(echo "Date;Visit;Login;Euro;Rate" ; cat file.txt) | sed 's/;/<tab>/g' > file.csv
You'll just have to type a literal tab in bash (Ctrl-V then Tab). If your version of sed supports it, you can write \t instead of a literal tab.
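A quick demonstration with invented data (the file contents and field values are made up), assuming GNU sed, which understands \t:

```shell
# hypothetical semicolon-separated data file
printf '01.01.2021;3;bob;10;0.5\n' > file.txt

# prepend the header, then turn every ";" into a tab
(echo "Date;Visit;Login;Euro;Rate" ; cat file.txt) | sed 's/;/\t/g' > file.csv

cat file.csv   # the fields are now tab-separated, header first
```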
(sed/awk) extract values text file and write to csv (no pattern)
awk's basic structure is:
- read a record from the input (by default a record is a line)
- evaluate conditions
- apply actions
The record is split into fields (by default based on whitespace as the separator).
The fields are referenced by their position, starting at 1. $1 is the first field, $2 is the second.
The last field is referenced by a variable named NF for "number of fields." $NF is the last field, $(NF-1) is the second-to-last field, etc.
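A tiny one-liner illustrates the field references described above (the sample line is invented):

```shell
echo 'alpha beta gamma' | awk '{print $1, $NF, NF}'
# prints: alpha gamma 3
```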
A "BEGIN" section will be executed before any input file is read, and it can be used to initialize variables (awk variables are otherwise implicitly initialized to 0 or the empty string).
BEGIN {
counter = 1
OFS = "," # This is the output field separator used by the print statement
print "file", "start", "stop", "epoch", "run" # Print the header line
}
/start value/ {
startValue = $NF # when a line contains "start value" store the last field as startValue
}
/epoch/ {
epoch = $NF
}
/stop value/ {
stopValue = $NF
# we have everything to print our line
print FILENAME, startValue, stopValue, epoch, counter
counter = counter + 1
startValue = "" # clear variables so they aren't maintained through the next iteration
epoch = ""
}
Save that as processor.awk and invoke as:
awk -f processor.awk my_file_1.txt my_file_2.txt my_file_3.txt > output.csv
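Here is an end-to-end check of the approach. The input file below is a made-up example; the "start value" / "epoch" / "stop value" markers are assumptions about the asker's format:

```shell
cat > processor.awk <<'EOF'
BEGIN {
    counter = 1
    OFS = ","                                 # output field separator
    print "file", "start", "stop", "epoch", "run"
}
/start value/ { startValue = $NF }
/epoch/       { epoch = $NF }
/stop value/  {
    stopValue = $NF
    print FILENAME, startValue, stopValue, epoch, counter
    counter = counter + 1
    startValue = ""
    epoch = ""
}
EOF

cat > my_file_1.txt <<'EOF'
run start value 10
run epoch 1609459200
run stop value 20
EOF

awk -f processor.awk my_file_1.txt
# prints:
# file,start,stop,epoch,run
# my_file_1.txt,10,20,1609459200,1
```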
Create CSV file from a text file with header tokens using shell scripting
According to the documentation of COPY, PostgreSQL fully supports the CSV format, and a Text format which is compatible with the lossless TSV format.
Because I'm using awk, I chose to generate a TSV. The reason is that there are newlines in the data and POSIX awk doesn't allow literal newlines in a user-defined variable. A TSV doesn't have this problem because the literal newlines are replaced with their C-style notation \n.
Also, I changed the input format to make it easier to parse. The new rule is that one or more empty lines delimit the records, which means that you can't have empty lines in the content of Summary or Article Body; the work-around is to add a single space character, like I did in the example.
Input example:
Title:
Article title
Word Count:
100
Summary:
Article summary.
Can consist of multiple lines.
Keywords:
keyword1, keyword2, keyword3
Article Body:
The rest of the article body.
Till the end of the file.
And here's the awk command, which accepts multiple files as arguments:
edit: added TSV escaping for the header / added basic comments / reduced code size
awk -v RS='' -v FS='^$' -v OFS='\t' '
FNR == 1 { ++onr } # the current file number is our "output record number"
/^[^:\n]+:/ {
# lossless TSV escaping
gsub(/\\/,"\\\\")
gsub(/\n/,"\\n")
gsub(/\r/,"\\r")
gsub(/\t/,"\\t")
# get the current field name
id = substr($0,1,index($0,":")-1)
# strip the first line (NOTE: the newline character is escaped)
sub(/^(\\[^n]|[^\\])*\\n/,"")
# save the data
fields[id] # keep track of the field names that we came across
records[0,id] = id # for the header line
records[onr,id] = $0 # for the output record
}
END {
# print the header (onr == 0) and the records (onr >= 1)
for (i = 0; i <= onr; i++) {
out = sep = ""
for (id in fields) {
out = out sep records[i,id]
sep = OFS
}
print out
}
}
' *.txt
Then the output (I replaced all the literal tabs with | for better legibility):
Summary | Article Body | Word Count | Title | Keywords
Article summary.\n \nCan consist of multiple lines. | The rest of the article body.\n \nTill the end of the file. | 100 | Article title | keyword1, keyword2, keyword3
Postscript: once you have a valid TSV file, you can use a tool like mlr to convert it to CSV, JSON, etc., but for the purpose of importing the data into PostgreSQL it isn't required.
The SQL statement will be this (untested):
COPY table_name FROM '/path/file.tsv' WITH (HEADER);
Remark: you don't need to specify the FORMAT and the DELIMITER, because the defaults are already text and \t.
How to convert a .txt file into .csv using AWK
Here's how to approach your problem (assuming a blank line, or a line containing just |, in the input indicates the end of the MGP section):
$ cat tst.awk
sub(/^[[:space:]]*MGP[^|]+[|][[:space:]]*/,"") { inMgp=1 }
inMgp {
sub(/[[:space:]]*[|][[:space:]]*$/,"")
if ( NF ) {
data = data $0
}
else {
gsub(/[[:space:]]*[|][[:space:]]*/,"|",data)
print data
inMgp = 0
}
}
$ awk -f tst.awk file
8,625|4.027,000|96.648,000|-|96.648,000
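For reference, here is a self-contained run. The input file below is only a guess reconstructed from the expected output, not the asker's real data, and this sketch handles a single MGP section (data is not reset between sections):

```shell
cat > tst.awk <<'EOF'
sub(/^[[:space:]]*MGP[^|]+[|][[:space:]]*/,"") { inMgp=1 }
inMgp {
    sub(/[[:space:]]*[|][[:space:]]*$/,"")
    if ( NF ) {
        data = data $0
    }
    else {
        gsub(/[[:space:]]*[|][[:space:]]*/,"|",data)
        print data
        inMgp = 0
    }
}
EOF

cat > file <<'EOF'
MGP section | 8,625 | 4.027,000 |
| 96.648,000 | - | 96.648,000 |

EOF

awk -f tst.awk file
# prints: 8,625|4.027,000|96.648,000|-|96.648,000
```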
How to format a TXT file into a structured CSV file in bash?
You can use awk:
awk 'NR%2{printf "%s,",$0;next;}1' file.txt > file.csv
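With an invented four-line input, the effect is easy to see: odd-numbered lines are printed with a trailing comma and no newline, and each even-numbered line completes the row:

```shell
printf 'name1\nvalue1\nname2\nvalue2\n' | awk 'NR%2{printf "%s,",$0;next;}1'
# prints:
# name1,value1
# name2,value2
```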
Converting text data file to csv format via shell/bash
This line should help:
awk 'BEGIN{FS=":|\n";RS="Gender";OFS=",";print "Gender,Age,History"}$0{print $2,$4,$6}' file
With your example as input, it gives:
Gender,Age,History
M, 46, 01305
F, 46, 01306
M, 19, 01307
M, 19, 01308
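To make this reproducible, here is a sketch with an input file reconstructed from the output (a guess at the real data's shape), assuming an awk such as gawk that supports a multi-character RS. Splitting on FS=":|\n" leaves a leading space in each captured field, which is why the output values are printed as " M", " 46", etc.:

```shell
cat > file <<'EOF'
Gender: M
Age: 46
History: 01305
Gender: F
Age: 46
History: 01306
EOF

awk 'BEGIN{FS=":|\n";RS="Gender";OFS=",";print "Gender,Age,History"}$0{print $2,$4,$6}' file
# prints:
# Gender,Age,History
#  M, 46, 01305
#  F, 46, 01306
```

The $0 condition skips the empty record created before the first "Gender" separator.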