Sort CSV by Column

Sort CSV file by multiple columns using the sort command

You need to use two options for the sort command:

  • --field-separator (or -t)
  • --key=<start,end> (or -k), to specify the sort key, i.e. the range of fields (from start to end) to sort by. Since you want to sort on 3 columns, you'll need to specify -k three times: for columns 2,2, then 1,1, then 3,3.

To put it all together,

sort -t ';' -k 2,2 -k 1,1 -k 3,3

Note that sort can't handle the situation in which fields contain the separator, even if it's escaped or quoted.
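Where fields may contain the separator inside quotes, a CSV-aware parser is needed instead. Below is a minimal Python sketch using the standard csv module. It assumes a semicolon-separated file with double-quote quoting and no header row. The function name sort_csv_text and the sample data are made up for illustration. Unlike sort, it compares strings by code point rather than by locale.

```python
import csv
import io

def sort_csv_text(text: str, delimiter: str = ";") -> str:
    """Sort rows by column 2, then 1, then 3 (1-based), like
    sort -t';' -k2,2 -k1,1 -k3,3, but honouring quoted fields."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    # Tuple key built from the 0-based indices 1, 0, 2
    rows.sort(key=lambda row: (row[1], row[0], row[2]))
    out = io.StringIO()
    # csv.writer re-quotes any field that contains the delimiter
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return out.getvalue()

# Hypothetical input: the second field of two rows contains ';' inside quotes
data = 'b;"x;y";2\na;"x;y";1\nc;alpha;3\n'
print(sort_csv_text(data), end="")
# c;alpha;3
# a;"x;y";1
# b;"x;y";2
```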

Also note: this is an old question, which belongs on UNIX.SE, and was also asked there a year later.


Old answer: depending on your system's version of sort, the following might also work:

sort --field-separator=';' --key=2,1,3

Or, you might get "stray character in field spec".

According to the sort manual, if you don't specify the end column of the sort key, it defaults to the end of the line.
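To illustrate why the end column matters, here is a small Python emulation of the two key definitions. It is not sort itself, and it compares by code point rather than by locale. -k2 takes everything from field 2 to the end of the line, while -k2,2 takes field 2 only, and GNU sort then breaks ties by comparing whole lines. The helper names are hypothetical.

```python
def key_to_end(line: str, start: int, sep: str = ";") -> str:
    # Emulates -kSTART with no end: field START through end of line, separators included
    return sep.join(line.split(sep)[start - 1:])

def key_single(line: str, field: int, sep: str = ";") -> str:
    # Emulates -kN,N: field N only
    return line.split(sep)[field - 1]

lines = ["a;x;2", "b;x;1"]
# Like sort -t';' -k2 : keys are "x;2" and "x;1", so the rest of the line decides
by_rest = sorted(lines, key=lambda ln: key_to_end(ln, 2))
# Like sort -t';' -k2,2 : keys tie on "x", the whole line breaks the tie
by_field = sorted(lines, key=lambda ln: (key_single(ln, 2), ln))
print(by_rest)   # ['b;x;1', 'a;x;2']
print(by_field)  # ['a;x;2', 'b;x;1']
```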

How can I sort csv data alphabetically then numerically by column?

The following assumes bash (if you don't use bash, replace $'\t' with a quoted literal tab character) and GNU coreutils. It also assumes that you want to sort alphabetically by the Make column first, then numerically in decreasing order by Total, and finally keep at most the first 3 entries of each Make.

Sorting is a job for sort; head and tail can isolate the header line, and awk can keep at most 3 entries per Make and re-number the first column:

$ head -n1 data.tsv; tail -n+2 data.tsv | sort -t$'\t' -k4,4 -k6,6rn |
awk -F'\t' -vOFS='\t' '$4==p {n+=1} $4!=p {n=1;p=$4} n<=3 {$1=++r; print}'
Ranking ID Year Make Model Total
1 113 2012 Acura Tsx sportwagon 116
2 112 2008 Acura TL 110
3 50 2015 Acura TLX 102
4 15 014 Audi S4 120
5 216 2007 Chrystler 300 96
6 83 2014 Honda Accord 112
7 65 2009 Honda Fit 106
8 31 2007 Honda Fit 102
9 128 2010 Infiniti G37 128
10 124 2015 Jeep Wrangler 124
11 91 2010 Mitsu Lancer 102
12 126 2010 Volkswagen Eos 92

Note that this is different from your expected output: Make is sorted in alphabetical order (Audi comes after Acura, not Honda) and only the 3 largest Totals are kept (112, 106, 102 for Honda, not 112, 102, 92).

If you use GNU awk and your input file is small enough to fit in memory, you can also do all of this with awk alone, thanks to its arrays of arrays and its asorti function, which sorts an array by its indices:

$ awk -F'\t' -vOFS='\t' 'NR==1 {print; next} {l[$4][$6][$0]}
END {
  PROCINFO["sorted_in"] = "@ind_str_asc"
  for(m in l) {
    n = asorti(l[m], t, "@ind_num_desc"); n = (n>3) ? 3 : n
    for(i=1; i<=n; i++) for(s in l[m][t[i]]) {$0 = s; $1 = ++r; print}
  }
}' data.tsv
Ranking ID Year Make Model Total
1 113 2012 Acura Tsx sportwagon 116
2 112 2008 Acura TL 110
3 50 2015 Acura TLX 102
4 15 014 Audi S4 120
5 216 2007 Chrystler 300 96
6 83 2014 Honda Accord 112
7 65 2009 Honda Fit 106
8 31 2007 Honda Fit 102
9 128 2010 Infiniti G37 128
10 124 2015 Jeep Wrangler 124
11 91 2010 Mitsu Lancer 102
12 126 2010 Volkswagen Eos 92
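For comparison, here is a minimal Python sketch of the sort | awk pipeline above. It assumes a tab-separated file whose columns are Ranking, ID, Year, Make, Model, Total, with integer Totals. The function name top_per_make and its keep parameter are illustrative only. Strings are compared by code point rather than by locale, so the Make order may differ from sort's when case is mixed.

```python
import csv
import io
from itertools import groupby
from typing import List

def top_per_make(tsv_text: str, keep: int = 3) -> List[List[str]]:
    """Sort by Make (column 4) ascending, then Total (column 6) numerically
    descending, keep at most `keep` rows per Make, and renumber column 1."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    # Negating Total gives descending order inside the ascending tuple sort
    rows = sorted(reader, key=lambda r: (r[3], -int(r[5])))
    result = [header]
    rank = 0
    # groupby works here because rows with the same Make are now adjacent
    for _make, group in groupby(rows, key=lambda r: r[3]):
        for row in list(group)[:keep]:
            rank += 1  # only kept rows get a number, so there are no gaps
            result.append([str(rank)] + row[1:])
    return result

# Rows taken from the output above, reordered; keep=2 shows the per-Make limit
sample = (
    "Ranking\tID\tYear\tMake\tModel\tTotal\n"
    "1\t31\t2007\tHonda\tFit\t102\n"
    "2\t83\t2014\tHonda\tAccord\t112\n"
    "3\t15\t014\tAudi\tS4\t120\n"
    "4\t65\t2009\tHonda\tFit\t106\n"
)
for row in top_per_make(sample, keep=2):
    print("\t".join(row))
```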

How to sort csv by specific column

Use the following sort command:

sort -t, -k4,4 -nr temperature.csv

The output:

2017-06-24 14:25,22.21,19.0,17.5,0.197,4.774
2017-06-24 14:00,22.22,19.0,17.4,0.197,4.639
2017-06-24 16:00,22.42,19.0,17.3,0.134,5.93
2017-06-24 15:10,22.30,19.0,17.1,0.134,5.472
2017-06-24 13:00,21.92,19.0,17.1,0.096,4.229
2017-06-24 12:45,22.03,19.0,17.1,0.096,4.152
2017-06-24 17:45,22.07,21.0,17.0,0.144,6.472
2017-06-24 19:40,23.01,21.0,16.9,0.318,8.503
2017-06-24 18:25,21.90,21.0,16.9,0.15,6.814
2017-06-24 11:25,23.51,19.0,16.7,0.087,3.689
2017-06-24 11:20,23.57,19.0,16.7,0.087,3.615

  • -t, - field delimiter

  • -k4,4 - sort by 4th field only

  • -nr - sort numerically in reverse order
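A rough Python equivalent follows. It assumes a comma-separated file with no header row and numeric values in the sorted column. sort_rows_desc is a hypothetical name, and the sample lines are copied from the output above.

```python
import csv
from typing import Iterable, List

def sort_rows_desc(lines: Iterable[str], column: int = 4) -> List[List[str]]:
    """Roughly what sort -t, -k<column>,<column> -nr does (column is 1-based)."""
    rows = list(csv.reader(lines))
    # float() parses values such as 17.5; reverse=True gives descending order.
    # Python's sort is stable, so rows with equal keys keep their input order,
    # whereas GNU sort by default breaks such ties by comparing whole lines.
    return sorted(rows, key=lambda row: float(row[column - 1]), reverse=True)

# For a real file: with open("temperature.csv", newline="") as f: rows = sort_rows_desc(f)
sample = [
    "2017-06-24 11:20,23.57,19.0,16.7,0.087,3.615",
    "2017-06-24 14:25,22.21,19.0,17.5,0.197,4.774",
    "2017-06-24 16:00,22.42,19.0,17.3,0.134,5.93",
]
for row in sort_rows_desc(sample):
    print(",".join(row))
```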

How do I sort a csv file so that my columns are in descending order?

Using the csv.reader and csv.writer functions, as well as sorted with a tuple key:

import csv

with open('medal.csv', 'r', newline='') as in_file:
    in_reader = csv.reader(in_file)
    header = next(in_reader)  # keep the header out of the sort
    # Compare gold, then silver, then bronze as integers, largest first
    data = sorted(in_reader, key=lambda row: tuple(int(x) for x in row[1:]), reverse=True)

with open('sorted_medal.csv', 'w', newline='') as out_file:
    out_writer = csv.writer(out_file)
    out_writer.writerow(header)
    out_writer.writerows(data)

Result:

# Input: medal.csv
team,gold,silver,bronze
t1,17,12,38
t2,8,7,29
t3,17,11,39
t4,17,12,37
t5,8,9,30

# Output: sorted_medal.csv
team,gold,silver,bronze
t1,17,12,38
t4,17,12,37
t3,17,11,39
t5,8,9,30
t2,8,7,29

sort csv by column

import operator
sortedlist = sorted(reader, key=operator.itemgetter(3), reverse=True)

Or use a lambda:

sortedlist = sorted(reader, key=lambda row: row[3], reverse=True)
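Both versions compare the fourth column (index 3) as a string, since csv.reader yields strings, so "10" sorts before "9". For a numeric column, convert inside the key. Here is a short illustration with made-up rows:

```python
import operator

rows = [["a", "x", "y", "9"], ["b", "x", "y", "10"], ["c", "x", "y", "100"]]

# String comparison works character by character: "9" > "100" > "10"
as_text = sorted(rows, key=operator.itemgetter(3), reverse=True)
# Numeric comparison: convert column index 3 to int first
as_number = sorted(rows, key=lambda row: int(row[3]), reverse=True)

print([r[3] for r in as_text])    # ['9', '100', '10']
print([r[3] for r in as_number])  # ['100', '10', '9']
```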

Sorting by column in a CSV and writing to a new CSV file in Python

As JonSG pointed out in the comments to your original post, you're calling writerows() (plural) on a single row, eachline.

Change that last line to writer.writerow(eachline) and you'll be good.

Looking at the problem in depth

writerows() expects an iterable of rows, where each row is itself a sequence of values (a list of lists). The outer list contains the rows; each inner list holds the cells (columns) of that row:

sort = [
['1', '9'],
['2', '17'],
['3', '4'],
['7', '10'],
]

writer.writerows(sort)

will produce the sorted CSV with two columns and four rows that you expect (and your print statement shows).

When you call writerows() with a single row:

for eachline in sort:
    writer.writerows(eachline)

you get some really weird output:

  • it interprets eachline as the outer list containing a number of rows, which means...

  • it interprets each item in eachline as a row having individual columns...

  • and each item in eachline is a string, which is itself a Python sequence, so writerows() iterates over each character in your string, treating each character as its own column...

    ['1','9'] is seen as two single-column rows, ['1'] and ['9']:

    1
    9

    ['2', '17'] is seen as the single-column row ['2'] and the double-column row ['1', '7']:

    2
    1,7
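The difference can be reproduced with an in-memory buffer. This sketch uses the same sort list as above, shortened to two rows. It writes to io.StringIO with lineterminator="\n" instead of a real file so the output is easy to inspect:

```python
import csv
import io

sort = [["1", "9"], ["2", "17"]]

# Buggy: writerows() on a single row treats each string as a row of characters
buggy = io.StringIO()
writer = csv.writer(buggy, lineterminator="\n")
for eachline in sort:
    writer.writerows(eachline)

# Fixed: one writerow() call per row (a single writer.writerows(sort) also works)
fixed = io.StringIO()
writer = csv.writer(fixed, lineterminator="\n")
for eachline in sort:
    writer.writerow(eachline)

print(repr(buggy.getvalue()))  # '1\n9\n2\n1,7\n'
print(repr(fixed.getvalue()))  # '1,9\n2,17\n'
```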

How to sort a csv file by a specific column in Java

Here is a Java 8 solution using streams with lambda syntax.

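// imports: java.nio.file.Files, java.nio.file.Paths, java.util.Comparator,
// java.util.stream.Collectors, java.util.stream.Stream; IOException must be handled
String filePath = "Filepath";
String content;
try (Stream<String> lines = Files.lines(Paths.get(filePath))) { // closes the file afterwards
    content = lines
            .sorted(Comparator.comparingInt(line -> Integer.parseInt(line.split(",")[2]))) // 3rd column, numeric
            .collect(Collectors.joining("\n"));
}
Files.write(Paths.get("OutputFilepath"), content.getBytes());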

Efficient way to Sort CSV raw string data

using System.Collections.Generic;

public class StackDemo
{
    private string source = "James,Mary,Patricia,Anthony,Donald\n145,10,100,39,101\n21,212,313,28,1";

    public string ProcessString()
    {
        var rows = source.Split('\n');

        var row1Values = rows[0].Split(',');
        var row2Values = rows[1].Split(',');
        var row3Values = rows[2].Split(',');

        // One Person per column: the index links the three values together
        List<Person> people = new List<Person>();
        for (int index = 0; index < row1Values.Length; index++)
        {
            people.Add(new Person()
            {
                Name = row1Values[index],
                SomeValue = row2Values[index],
                OtherValue = row3Values[index]
            });
        }

        people.Sort((x, y) => x.Name.CompareTo(y.Name));

        List<string> names = new List<string>();
        List<string> someValues = new List<string>();
        List<string> otherValues = new List<string>();

        foreach (Person p in people)
        {
            names.Add(p.Name);
            someValues.Add(p.SomeValue);
            otherValues.Add(p.OtherValue);
        }

        string result = "";
        result = BuildString(names, result);
        result = BuildString(someValues, result);
        result = BuildString(otherValues, result);

        // Drop the trailing "\n" added by the last BuildString call
        result = result.Remove(result.Length - 1, 1);

        return result;
    }

    // Appends the values as one comma-separated line, followed by "\n"
    private static string BuildString(List<string> names, string result)
    {
        foreach (string s in names)
        {
            result += s + ",";
        }

        result = result.Remove(result.Length - 1, 1);
        result += "\n";
        return result;
    }
}

public class Person
{
    public string Name { get; set; }
    public string SomeValue { get; set; }
    public string OtherValue { get; set; }
}

This code is very basic and rough, but I think it does what you want.

Also it returns the string in the same format as it was received.

EDIT: Expanded to address a question in the comments.

Added some unit tests to help validate how I understood your question:

using Xunit;

public class UnitTest1
{
    [Fact]
    public void TestWith5()
    {
        string input = "James,Mary,Patricia,Anthony,Donald\n145,10,100,39,101\n21,212,313,28,1";
        string expected = "Anthony,Donald,James,Mary,Patricia\n39,101,145,10,100\n28,1,21,212,313";

        // arrange
        StackDemo3 subject = new StackDemo3();

        // act
        string actualResult = subject.ProcessString(input);

        // assert
        Assert.Equal(expected, actualResult);
    }

    [Fact]
    public void TestWith4()
    {
        string input = "James,Mary,Patricia,Anthony,\n145,10,100,39,\n21,212,313,28,";
        string expected = ",Anthony,James,Mary,Patricia\n,39,145,10,100\n,28,21,212,313";

        // arrange
        StackDemo3 subject = new StackDemo3();

        // act
        string actualResult = subject.ProcessString(input);

        // assert
        Assert.Equal(expected, actualResult);
    }

    [Fact]
    public void TestWith3()
    {
        string input = "James,Mary,Patricia,,\n145,10,100,,\n21,212,313,,";
        string expected = ",,James,Mary,Patricia\n,,145,10,100\n,,21,212,313";

        // arrange
        StackDemo3 subject = new StackDemo3();

        // act
        string actualResult = subject.ProcessString(input);

        // assert
        Assert.Equal(expected, actualResult);
    }

    [Fact]
    public void TestWith2()
    {
        string input = ",,James,Mary,\n,,145,10,\n,,21,212,";
        string expected = ",,,James,Mary\n,,,145,10\n,,,21,212";

        // arrange
        StackDemo3 subject = new StackDemo3();

        // act
        string actualResult = subject.ProcessString(input);

        // assert
        Assert.Equal(expected, actualResult);
    }

    [Fact]
    public void TestWith1()
    {
        string input = "James,,,,\n145,,,,\n21,,,,";
        string expected = "James,,,,\n145,,,,\n21,,,,";

        // arrange
        StackDemo3 subject = new StackDemo3();

        // act
        string actualResult = subject.ProcessString(input);

        // assert
        Assert.Equal(expected, actualResult);
    }

    [Fact]
    public void TestWith0()
    {
        string input = ",,,,\n,,,,\n,,,,";
        string expected = ",,,,\n,,,,\n,,,,";

        // arrange
        StackDemo3 subject = new StackDemo3();

        // act
        string actualResult = subject.ProcessString(input);

        // assert
        Assert.Equal(expected, actualResult);
    }
}

Here is the actual implementation:

using System.Collections.Generic;
using System.Linq;

public interface IStringPeopleParser
{
    List<Person> ConvertToPeople(string input);
}

public interface IPeopleStringParser
{
    string ConvertPeopleToString(List<Person> people);
}

public class PeopleStringParser : IPeopleStringParser
{
    public string ConvertPeopleToString(List<Person> people)
    {
        List<string> names = new List<string>();
        List<string> someValues = new List<string>();
        List<string> otherValues = new List<string>();

        foreach (Person p in people)
        {
            names.Add(p.Name);
            someValues.Add(p.SomeValue);
            otherValues.Add(p.OtherValue);
        }

        // Rebuild the three comma-separated rows, separated by "\n"
        string output = "";
        output += string.Join(",", names);
        output += "\n";
        output += string.Join(",", someValues);
        output += "\n";
        output += string.Join(",", otherValues);

        return output;
    }
}

public class StringPeopleParser : IStringPeopleParser
{
    public List<Person> ConvertToPeople(string source)
    {
        var rows = source.Split('\n');

        string[] row1Values = rows[0].Split(',');
        string[] row2Values = rows[1].Split(',');
        string[] row3Values = rows[2].Split(',');

        // One Person per column; the index ties the three values together
        List<Person> people = new List<Person>();
        for (int index = 0; index < row1Values.Length; index++)
        {
            people.Add(new Person()
            {
                Name = row1Values[index],
                SomeValue = row2Values[index],
                OtherValue = row3Values[index]
            });
        }

        return people;
    }
}

public class StackDemo3
{
    IStringPeopleParser stringPeopleParser = new StringPeopleParser();
    IPeopleStringParser peopleStringParser = new PeopleStringParser();

    public string ProcessString(string s)
    {
        List<Person> people = stringPeopleParser.ConvertToPeople(s);
        int validCount = people.Count(x => x.IsValid());
        switch (validCount)
        {
            case 0:
            case 1:
                {
                    // Fewer than 2 complete columns: return the input layout unchanged
                    return peopleStringParser.ConvertPeopleToString(people);
                }
            case 2:
            case 3:
            case 4:
            case 5:
                {
                    people = people.OrderBy(x => x.Name).ToList();
                    return peopleStringParser.ConvertPeopleToString(people);
                }
            default:
                {
                    return ""; // only 5 columns are expected; anything more is treated as invalid
                }
        }
    }
}

public class Person
{
    public string Name { get; set; }
    public string SomeValue { get; set; }
    public string OtherValue { get; set; }

    public bool IsValid()
    {
        if (string.IsNullOrWhiteSpace(Name) || string.IsNullOrWhiteSpace(SomeValue) || string.IsNullOrWhiteSpace(OtherValue))
        {
            return false;
        }
        return true;
    }
}

Also, I'm not sure why you don't want the Person class. You need something that keeps together the 3 values at each index (one value from each row); by creating the Person class, each instance becomes that link.
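For comparison, the same idea can be sketched in Python without a dedicated class. zip(*rows) groups the 3 values at each index into a tuple, which plays the role of a Person instance. This assumes plain comma-separated values with no quoting and the same number of fields in every row. sort_columns_by_name is a hypothetical name. Unlike the C# version above, it does not leave input with fewer than 2 complete columns unchanged.

```python
def sort_columns_by_name(source: str) -> str:
    """Reorder the columns of a 3-row CSV string by the names in the first
    row, keeping each column's values together."""
    rows = [line.split(",") for line in source.split("\n")]
    # zip(*rows) turns rows into columns: one (name, some, other) tuple per index
    columns = sorted(zip(*rows), key=lambda col: col[0])
    # zip(*columns) turns the sorted columns back into rows
    return "\n".join(",".join(row) for row in zip(*columns))

print(sort_columns_by_name("James,Mary,Patricia,Anthony,Donald\n145,10,100,39,101\n21,212,313,28,1"))
```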


