View Tabular File Such as CSV from Command Line

You can use this:

column -s, -t < somefile.csv | less -#2 -N -S

column is a standard Unix utility and is very convenient: it works out the appropriate width of each column and displays the text as a nicely formatted table. In the less invocation above, -S disables line wrapping, -N shows line numbers, and -#2 sets horizontal scrolling to two positions at a time.

Note: whenever you have empty fields, you need to put some kind of placeholder in them, otherwise the column gets merged with the following columns. The following example demonstrates how to use sed to insert a placeholder:

$ cat data.csv
1,2,3,4,5
1,,,,5

$ column -s, -t < data.csv
1  2  3  4  5
1  5

$ sed 's/,,/, ,/g;s/,,/, ,/g' data.csv | column -s, -t
1  2  3  4  5
1           5

Note that the substitution of , , for ,, is done twice. If you do it only once, 1,,,4 will become 1, ,,4, because the second comma has already been consumed by the first match.
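
You can see the effect of the single pass versus the double pass directly:

$ echo '1,,,4' | sed 's/,,/, ,/g'
1, ,,4
$ echo '1,,,4' | sed 's/,,/, ,/g;s/,,/, ,/g'
1, , ,4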

Commandline CSV viewer

It's far from elegant, and it's probably something you've already thought of and are looking for a better solution to, but I work around this annoyance with a series of sed replacements that put a space in each empty field. I keep these as functions in my .bashrc:

csvcolumn() { sed -e "s/^$2/ $2/" -e "s/$2$/$2 /" -e "s/$2$2/$2 $2/g" -e "s/$2$2/$2 $2/g" "$1" | column -t -s"$2" ; }
csvcomma() { sed -e 's/^,/ ,/' -e 's/,$/, /' -e 's/,,/, ,/g' -e 's/,,/, ,/g' "$1" | column -t -s, ; }

The first one takes two args so you can specify the delimiter character. The second does the same thing but takes only one arg and assumes the delimiter is a comma, since that's what I use most often anyway.

csvcolumn input.csv ,

or

csvcomma input.csv
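
Applied to the data.csv example from earlier, the empty columns stay visible:

$ csvcomma data.csv
1  2  3  4  5
1           5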

Working with complex CSV from Linux command line

As long as your input doesn't contain columns with embedded , characters (e.g., quoted fields with commas inside), you should be able to parse it with awk, using , as the field separator; e.g.:

awk -F, '{ n = split($5, subField, "[[:blank:]]+"); for (i=1;i<=n;++i) print subField[i] }' file.csv

The above splits the 5th field into sub-fields by whitespace, using the split() function.
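
For instance, given a hypothetical file.csv whose 5th field holds whitespace-separated words:

$ cat file.csv
a,b,c,d,foo bar baz
$ awk -F, '{ n = split($5, subField, "[[:blank:]]+"); for (i=1;i<=n;++i) print subField[i] }' file.csv
foo
bar
baz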

How to pretty print the csv which has long columns from command line?

I believe that csvlook is treating the tab characters in that column just like any other character, and doesn't know about their special behaviour.

The easiest way to get the columns to line up is to minimally expand the tabs:

expand -t1 nupic_out.csv | csvlook 
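
With tab stops set at every column (-t1), each tab collapses to a single space, which lets csvlook measure the column widths by character count. A quick illustration:

$ printf 'a\tb\tc\n' | expand -t1
a b c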

How to extract one column of a csv file

You could use awk for this. Change '$2' to the nth column you want.

awk -F "\"*,\"*" '{print $2}' textfile.csv

creating a bash function to print csv to screen


function dl {
  wget -O stats.zip "http://...?DataFilter=itemCode:$1";
  unzip -c stats.zip | column -s, -t;
}
  1. You can use unzip's -c option to extract files to standard output
  2. Bash (and other shells) allow you to wrap statements in functions

Additionally, I've used the column command to format the CSV output.

You can call the function with the ItemCode as parameter: dl 526

Viewing a very large CSV file?

You could try PostgreSQL 9.1+ and its file_fdw (File Foreign Data Wrapper) which would pretend that the CSV file is a table. If you replaced the CSV file with another CSV file of the same name, then you would see the new info immediately in the database.

You can improve performance by using a materialized view (PG 9.3+) which essentially creates a real database table from the CSV data. You could use pgAgent to refresh the materialized view on a schedule.
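
A minimal sketch of the file_fdw route (the server, foreign-table, and extra column names here are made up; the foreign table's columns must match your CSV):

create extension if not exists file_fdw;

create server csv_server foreign data wrapper file_fdw;

/* columns must match the CSV layout: */
create foreign table my_csv_ft (
    some_field text,
    other_field text
) server csv_server
options (filename '/srv/vendor-name/my.csv', format 'csv');

/* optional (PG 9.3+): materialize it for faster queries */
create materialized view my_csv_mv as
select * from my_csv_ft;

refresh materialized view my_csv_mv;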

Another alternative would be to use the COPY statement:

/* the columns in this table are the same as the columns in your csv: */
create table if not exists my_csv (
    some_field text, ...
);

/* COPY appends, so truncate the table if loading fresh data again: */
truncate table my_csv;

/*
  you need to be a postgres superuser to use COPY;
  use psql's \copy if you can't be superuser;
  put the csv file in /srv/vendor-name/
*/

copy my_csv
from '/srv/vendor-name/my.csv'
with (format csv);
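
The client-side equivalent, which doesn't require superuser rights, is psql's \copy (the file path is then resolved on the client machine):

\copy my_csv from '/srv/vendor-name/my.csv' with (format csv)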

How do I convert a tab-separated values (TSV) file to a comma-separated values (CSV) file in BASH?

Update: The following solutions are not generally robust, although they do work in the OP's specific use case; see the bottom section for a robust, awk-based solution.


To summarize the options (interestingly, they all perform about the same):

tr:

devnull's solution (provided in a comment on the question) is the simplest:

tr '\t' ',' < file.tsv > file.csv

sed:

The OP's own sed solution is perfectly fine, given that the input contains no quoted strings (with potentially embedded \t chars.):

sed 's/\t/,/g' file.tsv > file.csv

The only caveat is that on some platforms (e.g., macOS) the escape sequence \t is not supported, so a literal tab char. must be spliced into the command string using ANSI quoting ($'\t'):

sed 's/'$'\t''/,/g' file.tsv > file.csv

awk:

The caveat with awk is that FS - the input field separator - must be set to \t explicitly - the default behavior would otherwise strip leading and trailing tabs and replace interior spans of multiple tabs with only a single ,:

awk 'BEGIN { FS="\t"; OFS="," } {$1=$1; print}' file.tsv > file.csv

Note that simply assigning $1 to itself causes awk to rebuild the input line using OFS - the output field separator; this effectively replaces all \t chars. with , chars. print then simply prints the rebuilt line.
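
A quick illustration of the rebuild, with a made-up three-field line:

$ printf 'a\tb\tc\n' | awk 'BEGIN { FS="\t"; OFS="," } {$1=$1; print}'
a,b,c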


Robust awk solution:

As A. Rabus points out, the above solutions do not correctly handle unquoted input fields that themselves contain , characters - you'll end up with extra CSV fields.

The following awk solution fixes this, by enclosing such fields in "..." on demand (see the non-robust awk solution above for a partial explanation of the approach).

If such fields also have embedded " chars., these are escaped as "", in line with RFC 4180. Thanks, Wyatt Israel.

awk 'BEGIN { FS="\t"; OFS="," } {
  rebuilt=0
  for (i=1; i<=NF; ++i) {
    if ($i ~ /[,"]/ && $i !~ /^".*"$/) {
      gsub("\"", "\"\"", $i)
      $i = "\"" $i "\""
      rebuilt=1
    }
  }
  if (!rebuilt) { $1=$1 }
  print
}' file.tsv > file.csv
  • $i ~ /[,"]/ && $i !~ /^".*"$/ detects any field that contains , and/or " and isn't already enclosed in double quotes

  • gsub("\"", "\"\"", $i) escapes embedded " chars. by doubling them

  • $i = "\"" $i "\"" updates the result by enclosing it in double quotes

  • As stated before, updating any field causes awk to rebuild the line from the fields with the OFS value, i.e., , in this case, which amounts to the effective TSV -> CSV conversion; flag rebuilt is used to ensure that each input record is rebuilt at least once.
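
For example, running the same logic on a made-up line that contains both a comma and a double quote:

$ printf 'foo\tbar,baz\tqu"x\n' |
  awk 'BEGIN { FS="\t"; OFS="," } { rebuilt=0
    for (i=1; i<=NF; ++i)
      if ($i ~ /[,"]/ && $i !~ /^".*"$/) { gsub("\"", "\"\"", $i); $i = "\"" $i "\""; rebuilt=1 }
    if (!rebuilt) { $1=$1 }
    print }'
foo,"bar,baz","qu""x"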


