Fastest Way Convert Tab-Delimited File to CSV in Linux


If all you need to do is translate all tab characters to comma characters, tr is probably the way to go.

The blank space here is a literal tab:

$ echo "hello   world" | tr "\\t" ","
hello,world

Of course, if you have embedded tabs inside string literals in the file, this will incorrectly translate those as well; but embedded literal tabs would be fairly uncommon.
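To see the caveat in action, here is a hypothetical record with a literal tab inside a quoted field; tr translates that tab too, producing an extra (wrong) CSV column:

```shell
# The tab inside the quoted field is translated along with the delimiter tab,
# so the quoted field is silently split into two CSV columns.
printf 'id\t"note with\ttab"\n' | tr '\t' ','
# id,"note with,tab"
```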

unable to convert tab delimited .txt file to csv

It seems you have a mix of tabs and spaces:

cut -f 1,2,3 < input.txt | tr -s '[:blank:]' ','

Here tr translates every blank character (tab or space) to a comma and, with -s, squeezes each resulting run of commas into a single one. Note that the character class must be quoted, or the shell may expand [:blank:] as a glob. You also do not need cat, but you can use it if you prefer it that way :)
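A quick sketch with made-up data, showing mixed tabs and spaces between fields being collapsed into single commas:

```shell
# Between "a" and "b" there is a tab plus two spaces; between "b" and "c"
# a space plus a tab. Each run of blanks becomes exactly one comma.
printf 'a\t  b \tc\n' | tr -s '[:blank:]' ','
# a,b,c
```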

How do I convert a tab-separated values (TSV) file to a comma-separated values (CSV) file in BASH?

Update: The following solutions are not generally robust, although they do work in the OP's specific use case; see the bottom section for a robust, awk-based solution.


To summarize the options (interestingly, they all perform about the same):

tr:

devnull's solution (provided in a comment on the question) is the simplest:

tr '\t' ',' < file.tsv > file.csv

sed:

The OP's own sed solution is perfectly fine, given that the input contains no quoted strings (with potentially embedded \t chars.):

sed 's/\t/,/g' file.tsv > file.csv

The only caveat is that on some platforms (e.g., macOS) the escape sequence \t is not supported, so a literal tab char. must be spliced into the command string using ANSI quoting ($'\t'):

sed 's/'$'\t''/,/g' file.tsv > file.csv
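A quick sanity check of the ANSI-quoting variant on sample input (this relies on a shell that supports $'\t', such as bash, ksh, or zsh, rather than on sed understanding \t):

```shell
# $'\t' expands to a literal tab, which is spliced between the two
# single-quoted halves of the sed script.
printf 'a\tb\tc\n' | sed 's/'$'\t''/,/g'
# a,b,c
```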

awk:

The caveat with awk is that FS - the input field separator - must be set to \t explicitly; otherwise the default whitespace-based splitting strips leading and trailing tabs and collapses interior runs of multiple tabs into a single ,:

awk 'BEGIN { FS="\t"; OFS="," } {$1=$1; print}' file.tsv > file.csv

Note that simply assigning $1 to itself causes awk to rebuild the input line using OFS - the output field separator; this effectively replaces all \t chars. with , chars. print then simply prints the rebuilt line.
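The difference is easy to demonstrate with a contrived record that has a leading tab and an empty interior field:

```shell
# Default FS: leading tab is stripped and the empty field between the
# two tabs is lost entirely.
printf '\ta\t\tb\n' | awk -v OFS=',' '{$1=$1; print}'
# a,b

# Explicit FS="\t": empty fields are preserved, as required for TSV -> CSV.
printf '\ta\t\tb\n' | awk 'BEGIN { FS="\t"; OFS="," } {$1=$1; print}'
# ,a,,b
```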


Robust awk solution:

As A. Rabus points out, the above solutions do not handle unquoted input fields that themselves contain , characters correctly - you'll end up with extra CSV fields.

The following awk solution fixes this, by enclosing such fields in "..." on demand (see the non-robust awk solution above for a partial explanation of the approach).

If such fields also have embedded " chars., these are escaped as "", in line with RFC 4180. Thanks, Wyatt Israel.

awk 'BEGIN { FS="\t"; OFS="," } {
  rebuilt=0
  for (i=1; i<=NF; ++i) {
    if ($i ~ /[,"]/ && $i !~ /^".*"$/) {
      gsub("\"", "\"\"", $i)
      $i = "\"" $i "\""
      rebuilt=1
    }
  }
  if (!rebuilt) { $1=$1 }
  print
}' file.tsv > file.csv
  • $i ~ /[,"]/ && $i !~ /^".*"$/ detects any field that contains , and/or " and isn't already enclosed in double quotes

  • gsub("\"", "\"\"", $i) escapes embedded " chars. by doubling them

  • $i = "\"" $i "\"" updates the result by enclosing it in double quotes

  • As stated before, updating any field causes awk to rebuild the line from the fields with the OFS value, i.e., , in this case, which amounts to the effective TSV -> CSV conversion; flag rebuilt ensures that records in which no field needed quoting are still rebuilt once, via $1=$1.
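Applied to a made-up record containing one plain field, one comma-bearing field, and one quote-bearing field, the approach above yields:

```shell
# Field 2 contains a comma and field 3 a quote, so both are enclosed in
# double quotes (embedded quotes doubled); field 1 passes through bare.
printf 'a\tb,c\td"e\n' | awk 'BEGIN { FS="\t"; OFS="," } {
  rebuilt=0
  for (i=1; i<=NF; ++i)
    if ($i ~ /[,"]/ && $i !~ /^".*"$/) {
      gsub("\"", "\"\"", $i); $i = "\"" $i "\""; rebuilt=1
    }
  if (!rebuilt) $1=$1
  print
}'
# a,"b,c","d""e"
```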

Convert tab separated value CSV to comma separated values in R?

Read it in using readLines, convert the tabs to commas and write it out:

writeLines(gsub("\t", ",", readLines("myfile.tab")), "myfile.csv")

Python - Convert tab delimited file into csv in a specific manner

This should work for you. Note that this uses the csv module for both input and output; we just change the delimiter. The csv writer automatically escapes quote characters when writing the file.

import csv

try:
    # Per the csv module docs, files should be opened with newline=''
    # so the module can handle line endings (and embedded newlines) itself.
    with open(r'input.tsv', 'r', newline='') as in_f, \
         open(r'output.csv', 'w', newline='') as out_f:
        reader = csv.reader(in_f, delimiter='\t')
        writer = csv.writer(out_f, delimiter=',', quoting=csv.QUOTE_ALL)  # Quoting added per comment from @Rob.
        for li in reader:
            try:
                writer.writerow([li[0], li[1], li[2], li[7], li[8], li[9]])
            except IndexError:  # Prevent errors on blank or short lines.
                pass
except IOError as err:
    print(err)

I wasn't able to parse out where the tabs should be in your sample data (as opposed to spaces), but testing it with the following data for input.tsv:

1   2   3   4   5   6   7   8   9   10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30

Will generate the following results in output.csv:

"1","2","3","8","9","10"
"11","12","13","18","19","20"
"21","22","23","28","29","30"

Update

Note that the update in the code to add quoting=csv.QUOTE_ALL was per a suggestion in the comments from Rob. Thanks for the catch!

How to Convert a tab delimited file with commas in values to .CSV and the values with commas to be enclosed in double quotes?

This will produce the output you asked for, but it's not clear whether the criterion I'm assuming for which fields to quote (any field containing a comma or a space) is actually what you want, so test it yourself with other input to see:

$ awk 'BEGIN { FS=OFS="\t" }
{
    gsub(/"/, "")
    for (i=1; i<=NF; i++)
        if ($i ~ /[,[:space:]]/)
            $i = "\"" $i "\""
    gsub(OFS, ",")
    print
}' file
column1,column2,column3,column4,column5,column6,column7
12,455,"string with quotes, and with a comma in between",4432,6787,890,88
4432,6787,"another, string with quotes, and with two comma in between",890,88,12,455
11,22,"simple string",77,777,333,22

(sed/awk) How to convert a field-delimited (like a csv) file into a txt with fixed-sized tab-delimited columns?


awk -F@ '{for(i=1;i<=NF;i++){printf "%-20s", $i};printf "\n"}' input.csv

Input

$ cat input.csv
1254343123@John@Smith@24@Engineer@Washington
23@Alexander@Kristofferson-Brown@35@Economic Advisor@Kent

Output

$ awk -F@ '{for(i=1;i<=NF;i++){printf "%-20s", $i};printf "\n"}' input.csv
1254343123          John                Smith               24                  Engineer            Washington
23                  Alexander           Kristofferson-Brown 35                  Economic Advisor    Kent

If you want to make the field width (20 in the code above) a shell variable that can be passed in you do something like this:

#!/bin/bash

fldwth=20

awk -v fw="$fldwth" -F@ '{for(i=1;i<=NF;i++){printf "%-*s", fw, $i}; printf "\n"}' input.csv
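As a quick check with a hypothetical width of 12 and made-up @-delimited data, each field is left-justified in a 12-character column (note that the last field is padded too):

```shell
# %-*s takes the column width from the fw variable at runtime, so the
# layout can be controlled from the shell without editing the awk script.
fldwth=12
printf 'a@bb@ccc\n' | awk -v fw="$fldwth" -F@ '{for(i=1;i<=NF;i++){printf "%-*s", fw, $i}; printf "\n"}'
```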

