How can I parse CSV files on the Linux command line?
My FOSS CSV stream editor CSVfix does exactly what you want. There is a binary installer for Windows, and a compilable version (via a makefile) for UNIX/Linux.
How to parse a CSV file in Bash?
You need to use IFS instead of -d:
while IFS=, read -r col1 col2
do
echo "I got:$col1|$col2"
done < myfile.csv
Note that for general-purpose CSV parsing you should use a specialized tool which can handle quoted fields with internal commas, among other issues that Bash can't handle by itself. Examples of such tools are csvtool and csvkit.
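To see why, here is a quick sketch using a hypothetical people.csv with a quoted, comma-containing field: the naive IFS loop splits inside the quotes, while a real CSV parser (Python's stdlib csv module, invoked from the shell) keeps the field intact.

```shell
# Hypothetical sample: a quoted field containing a comma
printf '%s\n' 'name,city' '"Doe, John",Boston' > people.csv

# The naive IFS=, loop splits inside the quotes:
while IFS=, read -r col1 col2; do
  echo "naive: $col1 | $col2"
done < people.csv

# A real CSV parser keeps the quoted field intact:
python3 -c '
import csv
for row in csv.reader(open("people.csv")):
    print(" | ".join(row))
'
```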
How to read a .csv file with shell command?
grep "my_string" file | awk -F ";" '{print $5}'
or
awk -F ";" '/my_string/ {print $5}' file
To match against the 2nd column:
awk -F ";" '$2 ~ /my_string/ {print $5}' file
For exact matching:
awk -F ";" '$2 == "my_string" {print $5}' file
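A quick demonstration of the difference between the regex match (~) and the exact match (==), using hypothetical data:

```shell
# Hypothetical semicolon-separated file
printf '%s\n' 'a;my_string;c;d;keep1' 'a;my_string_long;c;d;keep2' > demo.csv

awk -F ";" '$2 == "my_string" {print $5}' demo.csv   # exact match: only keep1
awk -F ";" '$2 ~ /my_string/ {print $5}' demo.csv    # regex match: keep1 and keep2
```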
View tabular file such as CSV from command line
You can also use this:
column -s, -t < somefile.csv | less -#2 -N -S
column is a standard Unix program that is very convenient -- it finds the appropriate width of each column, and displays the text as a nicely formatted table.
Note: whenever you have empty fields, you need to put some kind of placeholder in them, otherwise the field gets merged with the following fields. The following example demonstrates how to use sed to insert a placeholder:
$ cat data.csv
1,2,3,4,5
1,,,,5
$ column -s, -t < data.csv
1  2  3  4  5
1  5
$ sed 's/,,/, ,/g;s/,,/, ,/g' data.csv | column -s, -t
1  2  3  4  5
1           5
Note that the substitution of ,, for , , is done twice. If you do it only once, 1,,,4 will become 1, ,,4, since the second comma has already been consumed by the first match.
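You can verify this with a minimal example (hypothetical input 1,,,4):

```shell
# One pass leaves a ,, pair behind, two passes fix every empty field:
printf '1,,,4\n' | sed 's/,,/, ,/g'             # prints: 1, ,,4
printf '1,,,4\n' | sed 's/,,/, ,/g;s/,,/, ,/g'  # prints: 1, , ,4
```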
Bash: Parse CSV and edit cell values
See Why is using a shell loop to process text considered bad practice?
As the question is tagged linux, GNU sed is assumed to be available, and also that the input is actually CSV, not space/tab separated:
$ cat ip.csv
ID,Location,Way,Day,DayTime,NightTime,StandNo
1,abc,Up,mon,6.00,18.00,6
2,xyz,down,TUE,2.32,5.23,4
$ sed '2,$ {s/[^,]*/\L\u&/4; s/[^,]*/\U&/3; s/[^,]*/\U&/2}' ip.csv
ID,Location,Way,Day,DayTime,NightTime,StandNo
1,ABC,UP,Mon,6.00,18.00,6
2,XYZ,DOWN,Tue,2.32,5.23,4
- 2,$ -- process input from the 2nd line to the end of the file
- s/[^,]*/\L\u&/4 -- capitalize only the first letter of the 4th field
- s/[^,]*/\U&/3 -- capitalize all letters in the 3rd field
- s/[^,]*/\U&/2 -- capitalize all letters in the 2nd field
If the fields themselves can contain , within double quotes and so on, use perl, python, etc., which have csv modules.
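A minimal sketch of the csv-module approach, assuming Python 3 is available (file name and data are hypothetical). It uppercases the 2nd field like the \U& substitution above, but survives an embedded comma, which would throw off sed's field counting:

```shell
# Hypothetical file where the 2nd field contains a quoted comma
printf '%s\n' 'ID,Location,Way' '1,"abc, suite 2",Up' > ip2.csv

python3 -c '
import csv, sys
w = csv.writer(sys.stdout, lineterminator="\n")
r = csv.reader(open("ip2.csv"))
w.writerow(next(r))              # header unchanged
for row in r:
    row[1] = row[1].upper()      # uppercase the 2nd field, like \U& above
    w.writerow(row)
'
```

The csv writer re-quotes the field on output because it still contains a comma.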
How to parse a CSV in a Bash script?
First prototype using plain old grep and cut:
grep "${VALUE}" inputfile.csv | cut -d, -f"${INDEX}"
If that's fast enough and gives the proper output, you're done.
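A quick run of the prototype on hypothetical data, with VALUE and INDEX set as in the snippet above:

```shell
# Hypothetical input file
printf '%s\n' 'alpha,beta,gamma' 'delta,epsilon,zeta' > inputfile.csv

VALUE="epsilon"
INDEX=3
# grep picks the matching line, cut extracts the requested field:
grep "${VALUE}" inputfile.csv | cut -d, -f"${INDEX}"   # prints: zeta
```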
What method should I use to parse csv files in bash
This is repeating a deleted, but correct, answer:
IFS=";" read -r -a array < Input.csv
declare -p array
That reads the first line of the input file, splits on semicolons and stores the result in the array variable named array.
The -r option for the read command means any backslashes in the input are handled as literal characters, not as introducing an escape sequence.
The -a option reads the words from the input into an array named by the given variable name.
At a bash prompt, type help declare and help read. Also find a bash tutorial that talks about the effect of IFS on the read command, for example BashGuide. The bash tag info page has tons of resources.
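A minimal demonstration of the splitting, using a hypothetical one-line file:

```shell
# Hypothetical semicolon-separated input
printf 'one;two;three\n' > Input.csv

IFS=";" read -r -a array < Input.csv
declare -p array      # shows the three elements of the array
echo "${array[1]}"    # prints: two (arrays are zero-indexed)
```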
Bash / Shell: Parsing CSV file in bash script and skipping first line
OP hasn't (yet) provided any sample input data nor the desired output, so some assumptions:
- data values could be integer or reals, positive or negative
- the user wants the average for each line (no need to calculate an average for the entire file)
Some sample data:
$ cat user-list.txt
a,b,c,d,e,f,g,h
1,id1,3,4,5,6,7
2,id2,13,14.233,15,16,17
3,id2,3.2,4.3,5.9233,6.0,7.32
4,id4,-3.2,4.3,-15.3,96.0,7.32
One awk
solution:
$ awk -F"," 'FNR>=2 { printf "%s %10.3f\n", $2, ($3+$4+$5+$6+$7)/5.0 }' user-list.txt
Where:
- -F"," -- use comma as input field separator
- FNR>=2 -- skip the first line of the file
- printf "%s %10.3f\n" -- print field 2 using %s format; print the average using %10.3f format (width of 10: up to 6 digits to the left of the decimal, plus the decimal point, plus 3 digits to the right of the decimal); append a linefeed (\n) at the end
The above generates:
id1      5.000
id2     15.047
id2      5.349
id4     17.824
OP has added a new requirement: sort the output by the calculated averages. However, there are a few potential issues that need further input from the OP:
- Can a userID show up more than once in the data file?
- If a userID can show up more than once then do we need to generate a single line of output for each unique userID or do we generate separate lines for each occurrence of a userID?
- Is the data to be sorted in ascending or descending order?
For now I'm going to assume:
- A userID may show up more than once in the source data (eg, as with id2 in my sample data set, above).
- We will not combine multiple lines for a given userID (ie, each line will stand on its own).
- We'll show sorting in both ascending and descending order.
While the sorting can be done within awk, I'm going to opt for piping the awk output to sort, as this will require a bit less code and (imo) be a bit easier to understand.
Ascending sort:
$ awk -F"," 'FNR>=2 { printf "%s %10.3f\n", $2, ($3+$4+$5+$6+$7)/5.0 }' user-list.txt | sort -nk2
id1      5.000
id2      5.349
id2     15.047
id4     17.824
Where sort -nk2 says to sort by column #2 using a numeric sort.
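The -n flag matters because without it sort compares the fields as strings. A tiny illustration with hypothetical data:

```shell
printf '%s\n' 'a 10' 'b 9' | sort -k2    # lexical: "10" sorts before "9"
printf '%s\n' 'a 10' 'b 9' | sort -nk2   # numeric: 9 sorts before 10
```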
Descending sort:
$ awk -F"," 'FNR>=2 { printf "%s %10.3f\n", $2, ($3+$4+$5+$6+$7)/5.0 }' user-list.txt | sort -rnk2
id4     17.824
id2     15.047
id2      5.349
id1      5.000
Where sort -rnk2 says to sort by column #2 using a numeric sort but to reverse the order.