Sort command is not working properly in Unix for sorting a CSV file
The following should work:
awk 'NR<2{print $0;next}{ print $0 | "sort -t, -k3.8,3.11rn -k3.1,3.3rM -k3.5,3.6rn -k3.12rd" }'
The awk snippet prints the header line unchanged and pipes every other line to the sort command.
The order of the keys is important here:
-k3.8,3.11rn extracts the year part of column three and sorts it in reverse numeric order.
-k3.1,3.3rM extracts the first three characters of column three (the month name) and sorts them in reverse month order.
-k3.5,3.6rn extracts the day and sorts it in reverse numeric order.
-k3.12rd takes the rest of the column (the time) and sorts it in reverse dictionary order.
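To see the key order in action, here is a minimal sketch on made-up data. It assumes column three looks like "Mon DD YYYY HH:MM:SS" (inferred from the character offsets in the keys; the asker's real data is not shown) and that your sort supports the M month modifier, as GNU and BSD sort do. The header is kept on top with head and tail rather than awk, which is a swapped-in but equivalent approach. The sample rows and variable names are hypothetical.

```shell
# Hypothetical sample data; column 3 is assumed to be "Mon DD YYYY HH:MM:SS".
sample=$(mktemp)
cat > "$sample" <<'EOF'
id,name,when
1,a,Jan 05 2013 10:00:00
2,b,Mar 01 2013 09:00:00
3,c,Dec 31 2012 23:59:59
4,d,Mar 01 2013 18:30:00
EOF

# Keep the header first, then sort the remaining lines newest-first:
# year (numeric), month name, day (numeric), then the time text.
newest_first=$(
  head -n 1 "$sample"
  tail -n +2 "$sample" |
    LC_ALL=C sort -t, -k3.8,3.11rn -k3.1,3.3rM -k3.5,3.6rn -k3.12rd
)
printf '%s\n' "$newest_first"
rm -f "$sample"
```

If the year key came after the month key, December 2012 would sort ahead of March 2013, which is why the year has to be the first key.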
How to sort this CSV file by date with the Unix sort command?
You don't need the n modifier; indeed, it is counterproductive. The dates are in ISO 8601 format, and they sort in time order when sorted alphanumerically. Numeric sorting only pays attention to the 2013 part of the field; the rest isn't part of a single number. You also don't need to worry about subsetting the time information; the fact that only some parts change won't matter.
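A small sketch of why the numeric modifier hurts, using three made-up rows with the timestamp in column two (the rows and variable names are hypothetical). With n, every key is read as 2013, so all rows tie and sort falls back to comparing whole lines.

```shell
# Hypothetical three-row sample; column 2 holds ISO 8601 timestamps.
rows='b,2013-01-16 19:26:00
a,2013-01-17 08:00:00
c,2013-01-15 23:00:00'

# Numeric key: every key parses as 2013, so the rows tie and sort falls
# back to comparing whole lines, which is not chronological here.
numeric=$(printf '%s\n' "$rows" | LC_ALL=C sort -t, -k2,2n)

# Plain (lexical) key: ISO 8601 text order is also time order.
lexical=$(printf '%s\n' "$rows" | LC_ALL=C sort -t, -k2,2)

printf 'numeric:\n%s\nlexical:\n%s\n' "$numeric" "$lexical"
```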
You've given a very minimal data set with the pickup-time information already in sorted order, so we have to get a little inventive. The heading line won't sort numerically; you can remove it, or let it float around. To show that the sorting works when the data is already sorted, I specify r (reverse order). This puts the heading line at the top and reverses the two lines of actual data.
$ sort -t, -k6r data.file
medallion,hack_license,vendor_id,rate_code,store_and_fwd_flag,pickup_datetime,dropoff_datetime,passenger_count,trip_time_in_secs,trip_distance,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude
B45D26A20BE724B0F752461C624233CB,B240D08915F9F593F219D9109127FF1A,VTS,1,,2013-01-16 19:26:00,2013-01-16 19:32:00,3,360,.67,-73.982338,40.768349,-73.981285,40.774017
A6699B6310BFDF8D1EE42C12622D94FA,66C6E65E8D6476B8DDA075A01D63E78A,VTS,1,,2013-01-16 19:21:00,2013-01-16 19:35:00,2,840,1.71,-73.986603,40.739986,-73.99221,40.719715
$
Or, in ascending order (the heading goes at the end):
$ sort -t, -k6 data.file
A6699B6310BFDF8D1EE42C12622D94FA,66C6E65E8D6476B8DDA075A01D63E78A,VTS,1,,2013-01-16 19:21:00,2013-01-16 19:35:00,2,840,1.71,-73.986603,40.739986,-73.99221,40.719715
B45D26A20BE724B0F752461C624233CB,B240D08915F9F593F219D9109127FF1A,VTS,1,,2013-01-16 19:26:00,2013-01-16 19:32:00,3,360,.67,-73.982338,40.768349,-73.981285,40.774017
medallion,hack_license,vendor_id,rate_code,store_and_fwd_flag,pickup_datetime,dropoff_datetime,passenger_count,trip_time_in_secs,trip_distance,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude
$
Also, you can decide which dates are relevant and modify this grep command to select the correct dates for the first week, which reduces the data size to about one quarter of its original size.
grep ',2013-01-0[1-7] [0-2][0-9]:[0-5][0-9]:[0-5][0-9],' data.file
That looks for dates in the range 2013-01-01 through 2013-01-07 (allowing any time for each day). You could omit the regex after the blank if you prefer; if the data is valid, it won't make any difference, but the regex avoids selecting some invalid data. Obviously, you can change the dates if you want the first week to run, for example, from the first Sunday through the first Saturday (Sunday 6th to Saturday 12th 2013):
grep -E ',2013-01-(0[6-9]|1[012]) [0-2][0-9]:[0-5][0-9]:[0-5][0-9],' data.file
You could then run this reduced data set through the sort process.
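For instance, the filter and the sort can be chained in one pipeline. This is only a sketch on made-up rows with the timestamp in column two (the file contents and variable names are hypothetical); note that the grep also drops the heading line, since it does not match the pattern.

```shell
# Hypothetical sample with the timestamp in column 2.
trips=$(mktemp)
cat > "$trips" <<'EOF'
id,pickup_datetime,passenger_count
t1,2013-01-09 08:15:00,1
t2,2013-01-03 22:40:00,2
t3,2013-01-01 00:05:00,1
t4,2013-01-07 23:59:59,3
EOF

# Keep only 2013-01-01 through 2013-01-07, then sort by the timestamp.
first_week=$(
  grep ',2013-01-0[1-7] [0-2][0-9]:[0-5][0-9]:[0-5][0-9],' "$trips" |
    LC_ALL=C sort -t, -k2,2
)
printf '%s\n' "$first_week"
rm -f "$trips"
```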
In future, please give 5 lines or so for sample data — it's easier to demonstrate what's working and what's not.
How do I sort a csv file in linux according to timestamps?
Because you are using a sensible timestamp format, you can simply use lexical sorting:
sort -t, -k3,3 file
Sort CSV file by multiple columns using the sort command
You need to use two options for the sort command:
--field-separator (or -t)
--key=<start,end> (or -k), to specify the sort key, i.e. which range of columns (start through end index) to sort by. Since you want to sort on 3 columns, you'll need to specify -k 3 times, for columns 2,2, 1,1, and 3,3.
To put it all together,
sort -t ';' -k 2,2 -k 1,1 -k 3,3
Note that sort can't handle the situation in which fields contain the separator, even if it's escaped or quoted.
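Here is that command run on a few made-up semicolon-separated rows (the data and variable names are hypothetical), to show how the second and third keys only come into play when the earlier ones tie.

```shell
# Hypothetical semicolon-separated rows.
rows='b;2;x
a;2;y
a;1;z
c;1;a'

# Sort by column 2, then column 1, then column 3.
by_cols=$(printf '%s\n' "$rows" | LC_ALL=C sort -t ';' -k 2,2 -k 1,1 -k 3,3)
printf '%s\n' "$by_cols"
```

The two rows with 1 in column two come first, ordered by column one; the rows with 2 follow in the same way.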
Also note: this is an old question, which belongs on UNIX.SE, and was also asked there a year later.
Old answer: depending on your system's version of sort, the following might also work:
sort --field-separator=';' --key=2,1,3
Or, you might get "stray character in field spec".
According to the sort manual, if you don't specify the end column of the sort key, it defaults to the end of the line.
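The effect of leaving off the end column is easy to miss, so here is a small sketch on made-up rows (the data and variable names are hypothetical). The only difference between the two commands is -k2 versus -k2,2.

```shell
# Hypothetical comma-separated rows.
rows='x,b,1
z,a,9
y,a,2'

# -k2 runs from field 2 to end of line, so "a,2" sorts before "a,9"
# and the second key is never consulted for this data.
open_key=$(printf '%s\n' "$rows" | LC_ALL=C sort -t, -k2 -k1,1r)

# -k2,2 stops at field 2, so the tie on "a" is broken by field 1, reversed.
closed_key=$(printf '%s\n' "$rows" | LC_ALL=C sort -t, -k2,2 -k1,1r)

printf 'open:\n%s\nclosed:\n%s\n' "$open_key" "$closed_key"
```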
Differences between Unix commands for Sorting CSV
See also the answer by @morido for some other pointers, but here's a description of exactly what those two sort invocations do:
sort -k 1n -o output.csv
This assumes that the "fields" in your file are delimited by a transition from non-whitespace to whitespace (i.e. leading whitespace is included in each field, not stripped, as many might expect/assume). It tells sort to order lines by a key that starts with the first field and extends to the end of the line, and to treat that key as a numeric value. The output is sent explicitly to a specific file.
sort -t "," -k1 -n -k2
This defines the field separator as a comma, and then defines two keys to sort on. Both keys run from their starting field to the end of the line: the first from field one, the second from field two. The -n is not attached to either key; an ordering option given on its own is global and applies to every key that has no modifiers of its own, wherever it appears on the command line, so both keys are compared numerically. A numeric comparison generally reads only the leading number in the key, so in practice the first key compares the number at the start of the line, and the second key, used when the first compares equal, compares the number at the start of field two. Because neither key has an end field, it is easy to misread what is being compared, which is why limiting each key to a single field is safer.
Since you didn't provide sample data, it's unknown whether the data in the first two fields is numeric or not, but I suspect you want something like what was suggested in the answer by @morido:
sort -t, -k1,1 -k2,2
or
sort -t, -k1,1n -k2,2n (alternatively sort -t, -n -k1,1 -k2,2)
if the data is numeric.
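A quick sketch of how the lexical and numeric variants differ, on made-up rows (the data and variable names are hypothetical). It also runs the variant with a standalone -n, which should give the same result as attaching n to each key.

```shell
# Hypothetical two-column numeric rows.
rows='10,2
9,1
10,10'

# Lexical keys: "10" sorts before "9", and "10" before "2".
lexical=$(printf '%s\n' "$rows" | LC_ALL=C sort -t, -k1,1 -k2,2)

# Numeric keys, with n attached to each key.
numeric=$(printf '%s\n' "$rows" | LC_ALL=C sort -t, -k1,1n -k2,2n)

# Numeric keys, with -n given once as a global option.
global_n=$(printf '%s\n' "$rows" | LC_ALL=C sort -t, -n -k1,1 -k2,2)

printf 'lexical:\n%s\nnumeric:\n%s\n' "$lexical" "$numeric"
```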
How to sort csv by specific column
Use the following sort command:
sort -t, -k4,4 -nr temperature.csv
The output:
2017-06-24 14:25,22.21,19.0,17.5,0.197,4.774
2017-06-24 14:00,22.22,19.0,17.4,0.197,4.639
2017-06-24 16:00,22.42,19.0,17.3,0.134,5.93
2017-06-24 15:10,22.30,19.0,17.1,0.134,5.472
2017-06-24 13:00,21.92,19.0,17.1,0.096,4.229
2017-06-24 12:45,22.03,19.0,17.1,0.096,4.152
2017-06-24 17:45,22.07,21.0,17.0,0.144,6.472
2017-06-24 19:40,23.01,21.0,16.9,0.318,8.503
2017-06-24 18:25,21.90,21.0,16.9,0.15,6.814
2017-06-24 11:25,23.51,19.0,16.7,0.087,3.689
2017-06-24 11:20,23.57,19.0,16.7,0.087,3.615
-t, - field delimiter
-k4,4 - sort by 4th field only
-nr - sort numerically in reverse order