How to Sort Files Numerically

How do you sort files numerically?

This is called "natural sorting" or "human sorting" (as opposed to lexicographical sorting, which is the default). Ned B wrote up a quick version of one.

import re

def tryint(s):
    """Return s as an int if it parses as one; otherwise return it unchanged."""
    try:
        return int(s)
    except ValueError:
        return s

def alphanum_key(s):
    """ Turn a string into a list of string and number chunks.
        "z23a" -> ["z", 23, "a"]
    """
    return [tryint(c) for c in re.split('([0-9]+)', s)]

def sort_nicely(l):
    """ Sort the given list in place, in the way that humans expect.
    """
    l.sort(key=alphanum_key)

It's similar to what you're doing, but perhaps a bit more generalized.
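To make the difference from the default ordering concrete, here is a minimal sketch of the same chunking idea applied to a few made-up file names. The names natural_key and names are illustrative and not part of the answer above; the sketch relies on re.split placing the captured digit runs at the odd positions of its result.

```python
import re

def natural_key(name):
    """Split a name into text and integer chunks: "z23a" -> ["z", 23, "a"]."""
    parts = re.split(r"([0-9]+)", name)
    # With a capturing group, the digit runs land at the odd indices.
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]

# Hypothetical file names, chosen only for illustration.
names = ["img12.png", "img10.png", "img2.png", "img1.png"]

print(sorted(names))                   # default lexicographic order
print(sorted(names, key=natural_key))  # natural order
```

The lexicographic sort puts img10.png before img2.png because it compares character by character, while the keyed sort compares 2 and 10 as integers.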

How to sort files in python in numeric order?

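sorted takes a key argument. You can pass a lambda function as the key to sort in numeric order. The example below assumes that every file name starts with an integer followed by a dot, such as 12.jpg.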

Ex:

import os
sorted(os.listdir('path/to/jpg/files'), key=lambda x: int(x.split(".")[0]))
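As a small variation, the key can be written as a named function that uses os.path.splitext, which splits on the last dot only. This is a sketch under the assumption that the part before the extension is a plain integer; numeric_stem and the sample names are made up for illustration.

```python
import os

def numeric_stem(filename):
    """Return the part of the name before the extension as an int."""
    stem, _extension = os.path.splitext(filename)
    return int(stem)  # raises ValueError if the stem is not an integer

# Hypothetical directory listing, used instead of os.listdir() so the
# example runs anywhere.
names = ["10.jpg", "9.jpg", "100.jpg", "1.jpg"]

print(sorted(names, key=numeric_stem))
```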

Sorting files 'numerically' instead of alphabetically in Java

Arrays.sort(fileArray, new Comparator<File>() {
    public int compare(File f1, File f2) {
        try {
            int i1 = Integer.parseInt(f1.getName());
            int i2 = Integer.parseInt(f2.getName());
            // Integer.compare avoids the overflow that i1 - i2 can cause
            return Integer.compare(i1, i2);
        } catch(NumberFormatException e) {
            throw new AssertionError(e);
        }
    }
});

Sort files numerically by name, then by parent directory in BASH

Use / as the separator, first sort by the filename, then by the directory name:

find . -type f | sort -t/ -k3,3n -k2,2n
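The same two-level ordering can be sketched in Python with a tuple key: the file name is compared first and the parent directory second, both as integers. This assumes paths of the form ./<number>/<number>, which is what the field numbers in the sort command imply; the sample paths and the function name are hypothetical.

```python
from pathlib import PurePosixPath

def name_then_parent(path):
    """Key that orders by numeric file name, then by numeric parent directory."""
    p = PurePosixPath(path)
    return (int(p.name), int(p.parent.name))

# Hypothetical output of `find . -type f`.
paths = ["./2/10", "./1/2", "./10/1", "./2/1", "./1/10"]

print(sorted(paths, key=name_then_parent))
```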

Sort files by numerical order while excluding files with non-numeric filenames

Try this:

import os

path = "/absolute/path/to/files/to/sort/"
numerical_files = []
for f in os.listdir(path):
    # splitext copes with names that contain no dot or several dots
    file_name, extension = os.path.splitext(f)
    # isdecimal() accepts only characters that int() can parse
    if extension == ".csv" and file_name.isdecimal():
        numerical_files.append(f)
    else:
        print("Invalid file: ", f)

# Sort the original names by integer value so leading zeros are preserved
numerical_files.sort(key=lambda f: int(os.path.splitext(f)[0]))

for file in numerical_files:
    print(file)

Sort filenames in directory in ascending order

Assuming there's just one number in each file name (this version relies on Python 2, where filter() applied to a string returns a string):

>>> dirFiles = ['Picture 03.jpg', '02.jpg', '1.jpg']
>>> dirFiles.sort(key=lambda f: int(filter(str.isdigit, f)))
>>> dirFiles
['1.jpg', '02.jpg', 'Picture 03.jpg']

A version that also works in Python 3:

>>> import re
>>> dirFiles.sort(key=lambda f: int(re.sub(r'\D', '', f)))
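The "one number per file name" assumption matters because stripping non-digits concatenates every digit in the name, including digits in the extension. A short sketch, with digits_key as an illustrative helper name and made-up file names:

```python
import re

def digits_key(filename):
    """Join all digits in the name and read them as one integer."""
    return int(re.sub(r"\D", "", filename))

dir_files = ["Picture 03.jpg", "02.jpg", "1.jpg"]
print(sorted(dir_files, key=digits_key))

# Names with more than one number are read as a single, larger number.
print(digits_key("take2_part10.txt"))  # digits "2" and "10" become 210
print(digits_key("song1.mp3"))         # the 3 in "mp3" is included, giving 13
```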

How to sort glob.glob numerically?

The general approach is to capture the number with re.match(), convert that string to an integer with int(), and use the resulting numbers as the sort key for sorted().

Code

import re
import math
from pathlib import Path

file_pattern = re.compile(r'.*?(\d+).*?')

def get_order(file):
    match = file_pattern.match(Path(file).name)
    if not match:
        return math.inf
    return int(match.groups()[0])

# files is the list of paths to sort, e.g. the result of glob.glob()
sorted_files = sorted(files, key=get_order)

Example input

Consider random files with one integer number in any part of the filename:

├── 012 some file.mp3
├── 1 file.txt
├── 13 file.mp3
├── 2 another file.txt
├── 3 file.csv
├── 4 file.mp3
├── 6 yet another file.txt
├── 88 name of file.mp3
├── and final 999.txt
├── and some another file7.txt
├── some 5 file.mp3
└── test.py

Example output

get_order() can be used to sort the files when it is passed as the key argument to the sorted() built-in function:

In [1]: sorted(files, key=get_order)
Out[1]:
['C:\\tmp\\file_sort\\1 file.txt',
'C:\\tmp\\file_sort\\2 another file.txt',
'C:\\tmp\\file_sort\\3 file.csv',
'C:\\tmp\\file_sort\\4 file.mp3',
'C:\\tmp\\file_sort\\some 5 file.mp3',
'C:\\tmp\\file_sort\\6 yet another file.txt',
'C:\\tmp\\file_sort\\and some another file7.txt',
'C:\\tmp\\file_sort\\012 some file.mp3',
'C:\\tmp\\file_sort\\13 file.mp3',
'C:\\tmp\\file_sort\\88 name of file.mp3',
'C:\\tmp\\file_sort\\and final 999.txt',
'C:\\tmp\\file_sort\\test.py']

Short explanation

  • re.compile gives a small speed boost when the same pattern is matched many times.
  • The compiled pattern's match() method matches the regular expression against the start of the filename.
  • In the regex pattern, .*? means any character (.), zero or more times (*), non-greedily (?). \d+ matches one or more digits, and the parentheses capture that match into the groups() tuple.
  • If there is no match (no digits in the filename), match is None and get_order returns infinity. These files are placed last, in their original relative order because sorted() is stable; one could add logic for them (this was not asked in the question).
  • sorted() takes a key argument, which should be a callable that takes one argument: an item of the list. In this case, that is one of the file strings (full file path).
  • Path(file).name takes just the filename part, including the suffix, from the full file path.
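One way to add the extra logic mentioned above for files without digits is to return a tuple, so that ties (including all the "infinity" files) fall back to the file name. This is a sketch rather than part of the original answer: it swaps re.match for re.search with a bare \d+ pattern, and the sample list is made up.

```python
import math
import re
from pathlib import Path

FIRST_NUMBER = re.compile(r"\d+")

def get_order(file):
    """Key: (first number in the filename or infinity, filename)."""
    name = Path(file).name
    match = FIRST_NUMBER.search(name)
    number = int(match.group()) if match else math.inf
    # The name acts as a tie-breaker, so digit-free files sort alphabetically.
    return (number, name)

# Hypothetical file list for illustration.
files = ["some 5 file.mp3", "test.py", "012 some file.mp3",
         "and final 999.txt", "1 file.txt", "notes.md"]

print(sorted(files, key=get_order))
```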

How to sort files numerically from the Linux command line

This would be my first thought:

ls -1 | sed 's/\-\([kM]\)\?\([0-9]\{2\}\)\./-\10\2./' | sort | sed 's/0\([0-9]\{2\}\)/\1/'

Basically I just use sed to pad the number with zeros and then use it again afterwards to strip off the leading zero.

I don't know if it might be quicker in Perl.
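The zero-padding idea can also be sketched in Python. Because the padded string is used only as a sort key, the names themselves are never modified and no second pass is needed to strip the zeros. The width of 8 digits and the sample names are assumptions made for this illustration.

```python
import re

def padded_key(name, width=8):
    """Left-pad every run of digits with zeros so plain string order works."""
    return re.sub(r"\d+", lambda m: m.group().zfill(width), name)

# Hypothetical file names.
names = ["file-100.txt", "file-20.txt", "file-3.txt"]

print(padded_key("file-3.txt"))
print(sorted(names, key=padded_key))
```

Numbers longer than the chosen width would no longer be padded to equal length, so the width has to be at least as large as the longest number expected.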

Sorting files by numerical order

Set Arg = WScript.Arguments
set WshShell = createObject("Wscript.Shell")
Set Inp = WScript.Stdin
Set Outp = Wscript.Stdout
Set rs = CreateObject("ADODB.Recordset")
If LCase(Arg(1)) = "n" then
    With rs
        .Fields.Append "SortKey", 4
        .Fields.Append "Txt", 201, 5000
        .Open
        Do Until Inp.AtEndOfStream
            Lne = Inp.readline
            SortKey = Mid(Lne, LCase(Arg(3)), LCase(Arg(4)) - LCase(Arg(3)))
            If IsNumeric(Sortkey) = False then
                Set RE = new Regexp
                re.Pattern = "[^0-9\.,]"
                re.global = true
                re.ignorecase = true
                Sortkey = re.replace(Sortkey, "")
            End If
            If IsNumeric(Sortkey) = False then
                Sortkey = 0
            ElseIf Sortkey = "" then
                Sortkey = 0
            ElseIf IsNull(Sortkey) = true then
                Sortkey = 0
            End If
            .AddNew
            .Fields("SortKey").value = CSng(SortKey)
            .Fields("Txt").value = Lne
            .UpDate
        Loop
        If LCase(Arg(2)) = "a" then SortColumn = "SortKey ASC"
        If LCase(Arg(2)) = "d" then SortColumn = "SortKey DESC"
        .Sort = SortColumn
        Do While not .EOF
            Outp.writeline .Fields("Txt").Value
            .MoveNext
        Loop
    End With

ElseIf LCase(Arg(1)) = "d" then
    With rs
        .Fields.Append "SortKey", 4
        .Fields.Append "Txt", 201, 5000
        .Open
        Do Until Inp.AtEndOfStream
            Lne = Inp.readline
            SortKey = Mid(Lne, LCase(Arg(3)), LCase(Arg(4)) - LCase(Arg(3)))
            If IsDate(Sortkey) = False then
                Set RE = new Regexp
                re.Pattern = "[^0-9\\\-:]"
                re.global = true
                re.ignorecase = true
                Sortkey = re.replace(Sortkey, "")
            End If
            If IsDate(Sortkey) = False then
                Sortkey = 0
            ElseIf Sortkey = "" then
                Sortkey = 0
            ElseIf IsNull(Sortkey) = true then
                Sortkey = 0
            End If
            .AddNew
            .Fields("SortKey").value = CDate(SortKey)
            .Fields("Txt").value = Lne
            .UpDate
        Loop
        If LCase(Arg(2)) = "a" then SortColumn = "SortKey ASC"
        If LCase(Arg(2)) = "d" then SortColumn = "SortKey DESC"
        .Sort = SortColumn
        Do While not .EOF
            Outp.writeline .Fields("Txt").Value
            .MoveNext
        Loop
    End With


ElseIf LCase(Arg(1)) = "t" then
    With rs
        .Fields.Append "SortKey", 201, 260
        .Fields.Append "Txt", 201, 5000
        .Open
        Do Until Inp.AtEndOfStream
            Lne = Inp.readline
            SortKey = Mid(Lne, LCase(Arg(3)), LCase(Arg(4)) - LCase(Arg(3)))
            .AddNew
            .Fields("SortKey").value = SortKey
            .Fields("Txt").value = Lne
            .UpDate
        Loop
        If LCase(Arg(2)) = "a" then SortColumn = "SortKey ASC"
        If LCase(Arg(2)) = "d" then SortColumn = "SortKey DESC"
        .Sort = SortColumn
        Do While not .EOF
            Outp.writeline .Fields("Txt").Value
            .MoveNext
        Loop
    End With
ElseIf LCase(Arg(1)) = "tt" then
    With rs
        .Fields.Append "SortKey", 201, 260
        .Fields.Append "Txt", 201, 5000
        .Open
        Do Until Inp.AtEndOfStream
            Lne = Inp.readline
            SortKey = Trim(Mid(Lne, LCase(Arg(3)), LCase(Arg(4)) - LCase(Arg(3))))
            .AddNew
            .Fields("SortKey").value = SortKey
            .Fields("Txt").value = Lne
            .UpDate
        Loop
        If LCase(Arg(2)) = "a" then SortColumn = "SortKey ASC"
        If LCase(Arg(2)) = "d" then SortColumn = "SortKey DESC"
        .Sort = SortColumn
        Do While not .EOF
            Outp.writeline .Fields("Txt").Value
            .MoveNext
        Loop
    End With
End If

To use

cscript //nologo script.vbs sort {n|d|t|tt} {a|d} startcolumn endcolumn < input.txt > output.txt

Options

n - extracts a number from the columns specified. Looks for the first number.
d - extracts a time or date from the columns specified. Looks for the first date.
t - extracts a text string including spaces from the columns specified.
tt - extracts a text string discarding leading and trailing spaces from the columns specified.
a - sorts ascending
d - sorts descending
startcolumn - the starting column, the first character is column 1
endcolumn - the ending column
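For readers who do not use VBScript, the n (numeric) mode can be approximated in Python as follows. This is a rough sketch and not a port of the script: it keeps the same 1-based columns, where the field runs from startcolumn up to but not including endcolumn as in the Mid() call, but it drops commas instead of treating them as numeric characters, and the function names and sample lines are invented for the example.

```python
import re

def column_number(line, start, end):
    """Extract a number from 1-based columns start..end-1; 0.0 if none is found."""
    field = line[start - 1:end - 1]
    cleaned = re.sub(r"[^0-9.]", "", field)  # simplification: commas are dropped
    try:
        return float(cleaned)
    except ValueError:
        return 0.0

def sort_by_column(lines, start, end, descending=False):
    """Return the lines ordered by the number found in the given columns."""
    return sorted(lines, key=lambda line: column_number(line, start, end),
                  reverse=descending)

# Hypothetical fixed-width input; the number occupies columns 5 to 8.
lines = ["item  42 apples", "item   7 pears", "item 100 plums", "item  -- none"]

for line in sort_by_column(lines, 5, 9):
    print(line)
```

As in the script, a field with no usable number is treated as 0, so such lines come first in ascending order.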

This is what the command-line syntax means

The following table describes the notation used to indicate command-line syntax.

Notation                          Description
Text without brackets or braces   Items you must type as shown
<Text inside angle brackets>      Placeholder for which you must supply a value
[Text inside square brackets]     Optional items
{Text inside braces}              Set of required items; choose one
Vertical bar (|)                  Separator for mutually exclusive items; choose one
Ellipsis (…)                      Items that can be repeated

