How can I parse CSV files on the Linux command line?
My FOSS CSV stream editor CSVfix does exactly what you want. There is a binary installer for Windows, and a compilable version (via a makefile) for UNIX/Linux.
How to parse a CSV file in Bash?
You need to use IFS instead of -d:
while IFS=, read -r col1 col2
do
echo "I got:$col1|$col2"
done < myfile.csv
Note that for general-purpose CSV parsing you should use a specialized tool which can handle quoted fields with internal commas, among other issues that Bash can't handle by itself. Examples of such tools are csvtool and csvkit.
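To see why, here is a quick sketch using a hypothetical people.csv with a quoted, comma-containing field: the naive IFS loop splits inside the quotes, while a real CSV parser (Python's stdlib csv module, invoked from the shell) keeps the field intact.

```shell
# Hypothetical sample: a quoted field containing a comma
printf '%s\n' 'name,city' '"Doe, John",Boston' > people.csv

# The naive IFS=, loop splits inside the quotes:
while IFS=, read -r col1 col2; do
  echo "naive: $col1 | $col2"
done < people.csv

# A real CSV parser keeps the quoted field intact:
python3 -c '
import csv
for row in csv.reader(open("people.csv")):
    print(" | ".join(row))
'
```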
How to read a .csv file with shell command?
grep "my_string" file | awk -F ";" '{print $5}'
or
awk -F ";" '/my_string/ {print $5}' file
To match against the 2nd column:
awk -F ";" '$2 ~ /my_string/ {print $5}' file
For exact matching:
awk -F ";" '$2 == "my_string" {print $5}' file
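A quick demonstration of the difference between the regex match (~) and the exact match (==), using hypothetical data:

```shell
# Hypothetical semicolon-separated file
printf '%s\n' 'a;my_string;c;d;keep1' 'a;my_string_long;c;d;keep2' > demo.csv

awk -F ";" '$2 == "my_string" {print $5}' demo.csv   # exact match: only keep1
awk -F ";" '$2 ~ /my_string/ {print $5}' demo.csv    # regex match: keep1 and keep2
```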
View tabular file such as CSV from command line
You can also use this:
column -s, -t < somefile.csv | less -#2 -N -S
column is a standard Unix program that is very convenient -- it finds the appropriate width of each column, and displays the text as a nicely formatted table.
Note: whenever you have empty fields, you need to put some kind of placeholder in them, otherwise the field gets merged with the following fields. The following example demonstrates how to use sed to insert a placeholder:
$ cat data.csv
1,2,3,4,5
1,,,,5
$ column -s, -t < data.csv
1  2  3  4  5
1  5
$ sed 's/,,/, ,/g;s/,,/, ,/g' data.csv | column -s, -t
1  2  3  4  5
1           5
Note that the substitution of ,, for , , is done twice. If you do it only once, 1,,,4 will become 1, ,,4, since the second comma has already been consumed by the first match.
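You can verify this with a minimal example (hypothetical input 1,,,4):

```shell
# One pass leaves a ,, pair behind, two passes fix every empty field:
printf '1,,,4\n' | sed 's/,,/, ,/g'             # prints: 1, ,,4
printf '1,,,4\n' | sed 's/,,/, ,/g;s/,,/, ,/g'  # prints: 1, , ,4
```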
Bash: Parse CSV and edit cell values
See Why is using a shell loop to process text considered bad practice?
As the question is tagged linux, GNU sed is assumed to be available, and also that the input is actually CSV, not space/tab separated:
$ cat ip.csv
ID,Location,Way,Day,DayTime,NightTime,StandNo
1,abc,Up,mon,6.00,18.00,6
2,xyz,down,TUE,2.32,5.23,4
$ sed '2,$ {s/[^,]*/\L\u&/4; s/[^,]*/\U&/3; s/[^,]*/\U&/2}' ip.csv
ID,Location,Way,Day,DayTime,NightTime,StandNo
1,ABC,UP,Mon,6.00,18.00,6
2,XYZ,DOWN,Tue,2.32,5.23,4
- 2,$ -- process input from the 2nd line to the end of the file
- s/[^,]*/\L\u&/4 -- capitalize only the first letter of the 4th field
- s/[^,]*/\U&/3 -- capitalize all letters in the 3rd field
- s/[^,]*/\U&/2 -- capitalize all letters in the 2nd field
If the fields themselves can contain , within double quotes and so on, use perl, python, etc., which have csv modules.
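A minimal sketch of the csv-module approach, assuming Python 3 is available (file name and data are hypothetical). It uppercases the 2nd field like the \U& substitution above, but survives an embedded comma, which would throw off sed's field counting:

```shell
# Hypothetical file where the 2nd field contains a quoted comma
printf '%s\n' 'ID,Location,Way' '1,"abc, suite 2",Up' > ip2.csv

python3 -c '
import csv, sys
w = csv.writer(sys.stdout, lineterminator="\n")
r = csv.reader(open("ip2.csv"))
w.writerow(next(r))              # header unchanged
for row in r:
    row[1] = row[1].upper()      # uppercase the 2nd field, like \U& above
    w.writerow(row)
'
```

The csv writer re-quotes the field on output because it still contains a comma.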
How to parse a CSV in a Bash script?
First prototype using plain old grep and cut:
grep "${VALUE}" inputfile.csv | cut -d, -f"${INDEX}"
If that's fast enough and gives the proper output, you're done.
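A quick run of the prototype on hypothetical data, with VALUE and INDEX set as in the snippet above:

```shell
# Hypothetical input file
printf '%s\n' 'alpha,beta,gamma' 'delta,epsilon,zeta' > inputfile.csv

VALUE="epsilon"
INDEX=3
# grep picks the matching line, cut extracts the requested field:
grep "${VALUE}" inputfile.csv | cut -d, -f"${INDEX}"   # prints: zeta
```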
What method should I use to parse csv files in bash
This is repeating a deleted, but correct, answer:
IFS=";" read -r -a array < Input.csv
declare -p array
That reads the first line of the input file, splits on semicolons and stores the result in the array variable named array.
The -r option for the read command means any backslashes in the input are handled as literal characters, not as introducing an escape sequence.
The -a option reads the words from the input into an array named by the given variable name.
At a bash prompt, type help declare and help read. Also find a bash tutorial that talks about the effect of IFS on the read command, for example BashGuide. The bash tag info page has tons of resources.
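A minimal demonstration of the splitting, using a hypothetical one-line file:

```shell
# Hypothetical semicolon-separated input
printf 'one;two;three\n' > Input.csv

IFS=";" read -r -a array < Input.csv
declare -p array      # shows the three elements of the array
echo "${array[1]}"    # prints: two (arrays are zero-indexed)
```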
Bash / Shell: Parsing CSV file in bash script and skipping first line
OP hasn't (yet) provided any sample input data nor the desired output, so some assumptions:
- data values could be integer or reals, positive or negative
- the user wants the average for each line (no need to calculate an average for the entire file)
Some sample data:
$ cat user-list.txt
a,b,c,d,e,f,g,h
1,id1,3,4,5,6,7
2,id2,13,14.233,15,16,17
3,id2,3.2,4.3,5.9233,6.0,7.32
4,id4,-3.2,4.3,-15.3,96.0,7.32
One awk
solution:
$ awk -F"," 'FNR>=2 { printf "%s %10.3f\n", $2, ($3+$4+$5+$6+$7)/5.0 }' user-list.txt
Where:
- -F"," -- use comma as input field separator
- FNR>=2 -- skip the first line of the file
- printf "%s %10.3f\n" -- print field 2 using %s format; print the average using %10.3f format (width of 10: up to 6 digits to the left of the decimal, plus the decimal point, plus 3 digits to the right of the decimal); append a linefeed (\n) at the end
The above generates:
id1      5.000
id2     15.047
id2      5.349
id4     17.824
OP has added a new requirement: sort the output by the calculated averages. However, there are a few potential issues that need further input from the OP:
- Can a userID show up more than once in the data file?
- If a userID can show up more than once then do we need to generate a single line of output for each unique userID or do we generate separate lines for each occurrence of a userID?
- Is the data to be sorted in ascending or descending order?
For now I'm going to assume:
- A userID may show up more than once in the source data (eg, as with id2 in my sample data set, above).
- We will not combine multiple lines for a given userID (ie, each line will stand on its own).
- We'll show sorting in both ascending and descending order.
While the sorting can be done within awk, I'm going to opt for piping the awk output to sort, as this will require a bit less code and (imo) be a bit easier to understand.
Ascending sort:
$ awk -F"," 'FNR>=2 { printf "%s %10.3f\n", $2, ($3+$4+$5+$6+$7)/5.0 }' user-list.txt | sort -nk2
id1      5.000
id2      5.349
id2     15.047
id4     17.824
Where sort -nk2 says to sort by column #2 using a numeric sort.
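The -n flag matters because without it sort compares the fields as strings. A tiny illustration with hypothetical data:

```shell
printf '%s\n' 'a 10' 'b 9' | sort -k2    # lexical: "10" sorts before "9"
printf '%s\n' 'a 10' 'b 9' | sort -nk2   # numeric: 9 sorts before 10
```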
Descending sort:
$ awk -F"," 'FNR>=2 { printf "%s %10.3f\n", $2, ($3+$4+$5+$6+$7)/5.0 }' user-list.txt | sort -rnk2
id4     17.824
id2     15.047
id2      5.349
id1      5.000
Where sort -rnk2 says to sort by column #2 using a numeric sort but to reverse the order.