How to parse a CSV file in Bash?
You need to use IFS instead of -d:
while IFS=, read -r col1 col2
do
echo "I got:$col1|$col2"
done < myfile.csv
Note that for general-purpose CSV parsing you should use a specialized tool that can handle quoted fields with embedded commas, among other issues that Bash can't handle by itself. Examples of such tools are csvtool and csvkit.
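To see why that caveat matters, here is a short demonstration (with a made-up quoted field, not from the original question) of the IFS approach mis-splitting a value that contains a comma inside quotes:

```shell
# Hypothetical input: the second field is quoted and contains a comma.
printf '%s\n' 'id,"Doe, John",42' > /tmp/quoted.csv

# The naive IFS split treats the comma inside the quotes as a separator,
# so col2 gets only the first half of the quoted field.
while IFS=, read -r col1 col2 col3
do
    echo "I got:$col1|$col2|$col3"
done < /tmp/quoted.csv
```

This prints I got:id|"Doe| John",42 — the quoted field is torn apart, which is exactly the case csvtool or csvkit handle correctly.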
Bash / Shell: Parsing CSV file in bash script and skipping first line
OP hasn't (yet) provided any sample input data or the desired output, so some assumptions:
- data values could be integer or reals, positive or negative
- the user wants the average for each line (no need to calculate an average for the entire file)
Some sample data:
$ cat user-list.txt
a,b,c,d,e,f,g,h
1,id1,3,4,5,6,7
2,id2,13,14.233,15,16,17
3,id2,3.2,4.3,5.9233,6.0,7.32
4,id4,-3.2,4.3,-15.3,96.0,7.32
One awk solution:
$ awk -F"," 'FNR>=2 { printf "%s %10.3f\n", $2, ($3+$4+$5+$6+$7)/5.0 }' user-list.txt
Where:
- -F"," - use comma as input field separator
- FNR>=2 - skip the first line of the file
- printf "%s %10.3f\n" - print field 2 using %s format; print the average using %10.3f format (width of 10 w/ max of 6 digits to the left of the decimal, plus the decimal, plus 3 digits to the right of the decimal); append a linefeed (\n) on the end
The above generates:
id1 5.000
id2 15.047
id2 5.349
id4 17.824
OP has added a new requirement ... sort the output by the calculated averages. However, there are a few potential issues that need further input from the OP:
- Can a userID show up more than once in the data file?
- If a userID can show up more than once then do we need to generate a single line of output for each unique userID or do we generate separate lines for each occurrence of a userID?
- Is the data to be sorted in ascending or descending order?
For now I'm going to assume:
- A userID may show up more than once in the source data (eg, as with id2 in my sample data set, above).
- We will not combine multiple lines for a given userID (ie, each line will stand on its own).
- We'll show sorting in both ascending and descending order.
While the sorting can be done within awk, I'm going to opt for piping the awk output to sort, as this will require a bit less code and (imo) be a bit easier to understand.
Ascending sort:
$ awk -F"," 'FNR>=2 { printf "%s %10.3f\n", $2, ($3+$4+$5+$6+$7)/5.0 }' user-list.txt | sort -nk2
id1 5.000
id2 5.349
id2 15.047
id4 17.824
Where sort -nk2 says to sort by column #2 using a numeric sort.
Descending sort:
$ awk -F"," 'FNR>=2 { printf "%s %10.3f\n", $2, ($3+$4+$5+$6+$7)/5.0 }' user-list.txt | sort -rnk2
id4 17.824
id2 15.047
id2 5.349
id1 5.000
Where sort -rnk2 says to sort by column #2 using a numeric sort but to reverse the order.
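For completeness, if the OP's answer to the first question turns out to be that duplicate userIDs should be collapsed into a single line, one possible sketch (assuming the same sample data and field layout as above) accumulates the per-line averages in an awk array:

```shell
# Sum the per-line averages for each userID (field 2), then emit one
# combined average per unique userID; sort ascending as before.
awk -F',' '
FNR>=2 {
    sum[$2] += ($3+$4+$5+$6+$7)/5.0   # running total of per-line averages
    cnt[$2]++                         # number of lines seen for this userID
}
END {
    for (id in sum)
        printf "%s %10.3f\n", id, sum[id]/cnt[id]
}' user-list.txt | sort -nk2
```

Against the sample data this collapses the two id2 lines into a single entry (their combined average, 10.198) while id1 and id4 are unchanged.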
How to parse a CSV in a Bash script?
First prototype using plain old grep and cut:
grep "${VALUE}" inputfile.csv | cut -d, -f"${INDEX}"
If that's fast enough and gives the proper output, you're done.
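A quick sanity check with made-up data and variable values (the file contents, VALUE and INDEX below are assumptions for illustration):

```shell
# Hypothetical input file and search parameters.
cat > inputfile.csv <<'EOF'
alice,admin,active
bob,user,inactive
EOF
VALUE='bob'
INDEX='2'

# grep selects the matching line(s); cut extracts the requested column.
grep "${VALUE}" inputfile.csv | cut -d, -f"${INDEX}"
```

This prints user. One caveat to keep in mind: grep matches the value anywhere on the line, not in a specific field, so a value that also occurs in another row's fields will pull in extra lines.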
Bash: Parse CSV and edit cell values
See Why is using a shell loop to process text considered bad practice?
As the question is tagged linux, GNU sed is assumed to be available, and also that the input is actually CSV, not space/tab separated:
$ cat ip.csv
ID,Location,Way,Day,DayTime,NightTime,StandNo
1,abc,Up,mon,6.00,18.00,6
2,xyz,down,TUE,2.32,5.23,4
$ sed '2,$ {s/[^,]*/\L\u&/4; s/[^,]*/\U&/3; s/[^,]*/\U&/2}' ip.csv
ID,Location,Way,Day,DayTime,NightTime,StandNo
1,ABC,UP,Mon,6.00,18.00,6
2,XYZ,DOWN,Tue,2.32,5.23,4
- 2,$ - process input from 2nd line to end of file
- s/[^,]*/\L\u&/4 - capitalize only the first letter of the 4th field
- s/[^,]*/\U&/3 - capitalize all letters in the 3rd field
- s/[^,]*/\U&/2 - capitalize all letters in the 2nd field
If the fields themselves can contain , within double quotes and so on, use perl, python, etc., which have csv modules.
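As a sketch of that suggestion (assuming the same ip.csv as above), python's standard csv module reproduces the sed result and would also survive quoted fields:

```shell
# Mirror the sed commands above with python's csv module; the field
# indexes and capitalization rules match the 2nd/3rd/4th-field rules.
python3 - <<'EOF'
import csv

with open('ip.csv', newline='') as f:
    for i, row in enumerate(csv.reader(f)):
        if i >= 1:                        # skip the header line
            row[1] = row[1].upper()       # 2nd field: all caps
            row[2] = row[2].upper()       # 3rd field: all caps
            row[3] = row[3].capitalize()  # 4th field: first letter only
        print(','.join(row))
EOF
```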
What method should I use to parse csv files in bash
This is repeating a deleted but correct answer:
IFS=";" read -r -a array < Input.csv
declare -p array
That reads the first line of the input file, splits on semicolons, and stores the result into the array variable named array.
The -r option for the read command means any backslashes in the input are handled as literal characters, not as introducing an escape sequence.
The -a option reads the words from the input into an array named by the given variable name.
At a bash prompt, type help declare and help read.
Also find a bash tutorial that talks about the effect of IFS on the read command, for example BashGuide. The bash tag info page has tons of resources.
Parsing multiple CSV files in bash by pattern with counter
Could you please try the following. Since no samples were given, I couldn't test it, but this should be faster than a for loop that traverses all the csv files and calls awk in each iteration.
Following are the points taken care of in this program:
- No need to use a for loop to traverse the .csv files, since awk is capable of it.
- OP's code is not taking care of getting the x, y values from the file names; I have added that logic too.
- One could set up the output file name in the BEGIN section of the code as per need too.
awk -v max=0 '
BEGIN{
OFS=" , "
output_file="output.txt"
}
FNR==1{
if(want){
print output":"ORS want > (output_file)
}
split(FILENAME,array,"[-.]")
output=array[2] array[3]
want=max=""
}
{
if($1>max){
want=$2
max=$1
}
}
END{
print output":"ORS want > (output_file)
}
' *.csv
Typo fixed by OP