Fastest way to convert a tab-delimited file to CSV in Linux
If all you need to do is translate all tab characters to comma characters, tr
is probably the way to go.
The blank space here is a literal tab:
$ echo "hello world" | tr "\\t" ","
hello,world
Of course, if the file contains embedded tabs inside quoted string literals, this will incorrectly translate those as well; but embedded literal tabs are fairly uncommon.
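To see that caveat in action, here is a hypothetical row whose quoted second field contains a literal tab; tr converts that tab too:

```shell
# Hypothetical data: the quoted second field contains an embedded tab.
printf 'a\t"b\tc"\n' | tr '\t' ','
# a,"b,c"   <- the embedded tab also became a comma
```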
Unable to convert a tab-delimited .txt file to CSV
It seems you have a mix of tabs and spaces:
cut -f 1,2,3 < input.txt | tr -s '[:blank:]' ','
Here tr will collapse all whitespace to a single character and then replace it with a comma. You also do not need cat, but you can use it if you prefer it that way :)
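A quick sanity check of that pipeline, using made-up data where a tab and a run of spaces both act as separators:

```shell
# Made-up sample: a tab after 'foo', two spaces between 'bar' and 'baz'.
printf 'foo\tbar  baz\n' > input.txt
cut -f 1,2,3 < input.txt | tr -s '[:blank:]' ','
# foo,bar,baz
```

Note that tr -s squeezes the run of spaces into a single comma; without -s you would get an empty field between bar and baz.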
How do I convert a tab-separated values (TSV) file to a comma-separated values (CSV) file in BASH?
Update: The following solutions are not generally robust, although they do work in the OP's specific use case; see the bottom section for a robust, awk-based solution.
To summarize the options (interestingly, they all perform about the same):
tr:
devnull's solution (provided in a comment on the question) is the simplest:
tr '\t' ',' < file.tsv > file.csv
sed:
The OP's own sed solution is perfectly fine, given that the input contains no quoted strings (with potentially embedded \t chars.):
sed 's/\t/,/g' file.tsv > file.csv
The only caveat is that on some platforms (e.g., macOS) the escape sequence \t is not supported, so a literal tab char. must be spliced into the command string using ANSI quoting ($'\t'):
sed 's/'$'\t''/,/g' file.tsv > file.csv
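To try the ANSI-quoting variant without creating a file first (bash/zsh/ksh, where $'\t' expands to a literal tab):

```shell
# $'\t' is expanded by the shell, so sed receives a literal tab in the pattern.
printf 'one\ttwo\tthree\n' | sed 's/'$'\t''/,/g'
# one,two,three
```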
awk:
The caveat with awk is that FS - the input field separator - must be set to \t explicitly - the default behavior would otherwise strip leading and trailing tabs and replace interior spans of multiple tabs with only a single ,:
awk 'BEGIN { FS="\t"; OFS="," } {$1=$1; print}' file.tsv > file.csv
Note that simply assigning $1 to itself causes awk to rebuild the input line using OFS - the output field separator; this effectively replaces all \t chars. with , chars. print then simply prints the rebuilt line.
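The difference the explicit FS makes shows up as soon as a row has an empty field. With made-up data containing two adjacent tabs, the default FS collapses them, while FS="\t" preserves the empty field:

```shell
# Default FS: runs of whitespace count as one separator - the empty field is lost.
printf 'a\t\tb\n' | awk 'BEGIN { OFS="," } { $1=$1; print }'
# a,b

# Explicit FS="\t": every tab is a separator - the empty field survives.
printf 'a\t\tb\n' | awk 'BEGIN { FS="\t"; OFS="," } { $1=$1; print }'
# a,,b
```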
Robust awk solution:
As A. Rabus points out, the above solutions do not handle unquoted input fields that themselves contain , characters correctly - you'll end up with extra CSV fields. The following awk solution fixes this, by enclosing such fields in "..." on demand (see the non-robust awk solution above for a partial explanation of the approach). If such fields also have embedded " chars., these are escaped as "", in line with RFC 4180. Thanks, Wyatt Israel.
awk 'BEGIN { FS="\t"; OFS="," } {
  rebuilt=0
  for (i=1; i<=NF; ++i) {
    if ($i ~ /[,"]/ && $i !~ /^".*"$/) {
      gsub("\"", "\"\"", $i)
      $i = "\"" $i "\""
      rebuilt=1
    }
  }
  if (!rebuilt) { $1=$1 }
  print
}' file.tsv > file.csv
- $i ~ /[,"]/ && $i !~ /^".*"$/ detects any field that contains , and/or " and isn't already enclosed in double quotes
- gsub("\"", "\"\"", $i) escapes embedded " chars. by doubling them
- $i = "\"" $i "\"" updates the result by enclosing it in double quotes

As stated before, updating any field causes awk to rebuild the line from the fields with the OFS value, i.e., , in this case, which amounts to the effective TSV -> CSV conversion; flag rebuilt is used to ensure that each input record is rebuilt at least once.
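A quick check of the robust script with a made-up row whose middle field contains both a comma and embedded quotes:

```shell
printf 'id\the said "hi, there"\tdone\n' |
  awk 'BEGIN { FS="\t"; OFS="," } {
    rebuilt=0
    for (i=1; i<=NF; ++i)
      if ($i ~ /[,"]/ && $i !~ /^".*"$/) {
        # Double embedded quotes, then enclose the whole field in quotes.
        gsub("\"", "\"\"", $i); $i = "\"" $i "\""; rebuilt=1
      }
    if (!rebuilt) { $1=$1 }
    print
  }'
# id,"he said ""hi, there""",done
```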
Convert tab separated value CSV to comma separated values in R?
Read it in using readLines, convert the tabs to commas and write it out:
writeLines(gsub("\t", ",", readLines("myfile.tab")), "myfile.csv")
Python - Convert tab delimited file into csv in a specific manner
This should work for you. Note that this uses the csv library for both input and output; we just change the delimiter. The csv writer will automatically quote and escape your quote characters when writing the file.
import csv

try:
    with open(r'input.tsv', 'r', newline='') as in_f, \
         open(r'output.csv', 'w', newline='') as out_f:
        reader = csv.reader(in_f, delimiter='\t')
        writer = csv.writer(out_f, delimiter=',', quoting=csv.QUOTE_ALL)  # Quoting added per comment from @Rob.
        for li in reader:
            try:
                writer.writerow([li[0], li[1], li[2], li[7], li[8], li[9]])
            except IndexError:  # Prevent errors on blank lines.
                pass
except IOError as err:
    print(err)
I wasn't able to parse out where the tabs should be in your sample data (as opposed to spaces), but testing it with the following data for input.tsv:
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
will generate the following results in output.csv:
"1","2","3","8","9","10"
"11","12","13","18","19","20"
"21","22","23","28","29","30"
Update
Note that the update in the code to add quoting=csv.QUOTE_ALL was per a suggestion in the comments from Rob. Thanks for the catch!
How to Convert a tab delimited file with commas in values to .CSV and the values with commas to be enclosed in double quotes?
This will produce the output you asked for, but it's not clear whether the criteria I'm assuming for which fields to put in quotes (any containing a comma or a space) are actually what you want, so test it yourself with other input to see:
$ awk 'BEGIN { FS=OFS="\t" }
{
gsub(/"/,"")
for (i=1;i<=NF;i++)
if ($i ~ /[,[:space:]]/)
$i = "\"" $i "\""
gsub(OFS,",")
print
}
' file
column1,column2,column3,column4,column5,column6,column7
12,455,"string with quotes, and with a comma in between",4432,6787,890,88
4432,6787,"another, string with quotes, and with two comma in between",890,88,12,455
11,22,"simple string",77,777,333,22
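Reconstructing one input line (tab-delimited, with quotes around the string field, as the output above implies) shows the quote-stripping and re-quoting in action:

```shell
printf '12\t455\t"string, with a comma"\t4432\n' |
  awk 'BEGIN { FS=OFS="\t" }
  {
    gsub(/"/,"")                   # strip the original quotes from the record
    for (i=1;i<=NF;i++)
      if ($i ~ /[,[:space:]]/)     # re-quote fields containing a comma or space
        $i = "\"" $i "\""
    gsub(OFS,",")                  # finally, turn the tab separators into commas
    print
  }'
# 12,455,"string, with a comma",4432
```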
(sed/awk) How to convert a field-delimited (like a csv) file into a txt with fixed-sized tab-delimited columns?
awk -F@ '{for(i=1;i<=NF;i++){printf "%-20s", $i};printf "\n"}' input.csv
Input
$ cat input.csv
1254343123@John@Smith@24@Engineer@Washington
23@Alexander@Kristofferson-Brown@35@Economic Advisor@Kent
Output
$ awk -F@ '{for(i=1;i<=NF;i++){printf "%-20s", $i};printf "\n"}' input.csv
1254343123 John Smith 24 Engineer Washington
23 Alexander Kristofferson-Brown 35 Economic Advisor Kent
If you want to make the field width (20 in the code above) a shell variable that can be passed in, you can do something like this:
#!/bin/bash
fldwth=20
awk -v fw=$fldwth -F@ '{for(i=1;i<=NF;i++){printf "%-*s", fw,$i};printf "\n"}' input.csv