How do you sort files numerically?
This is called "natural sorting" or "human sorting" (as opposed to lexicographical sorting, which is the default). Ned B wrote up a quick version of one.
import re

def tryint(s):
    try:
        return int(s)
    except ValueError:
        return s

def alphanum_key(s):
    """ Turn a string into a list of string and number chunks.
        "z23a" -> ["z", 23, "a"]
    """
    return [ tryint(c) for c in re.split('([0-9]+)', s) ]

def sort_nicely(l):
    """ Sort the given list in the way that humans expect.
    """
    l.sort(key=alphanum_key)
It's similar to what you're doing, but perhaps a bit more generalized.
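As a quick check of the idea, here is a minimal sketch that applies the same kind of key through sorted() and compares it with the default string ordering. The name natural_key and the sample file names are made up for illustration; they are not from the original answer.

```python
import re

def natural_key(name):
    """Split a name into text and integer chunks: "img12.png" -> ["img", 12, ".png"]."""
    # re.split with a capturing group keeps the digit runs in the result,
    # so the chunks alternate text, number, text, ... (odd indices are numbers).
    chunks = re.split(r'([0-9]+)', name)
    return [int(c) if i % 2 else c for i, c in enumerate(chunks)]

names = ["img12.png", "img10.png", "img2.png", "img1.png"]

natural = sorted(names, key=natural_key)  # numeric chunks compared as integers
plain = sorted(names)                     # default lexicographic comparison

print(natural)
print(plain)
```

Because every key starts with a text chunk and alternates from there, integers are only ever compared with integers, which is what keeps this working in Python 3.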
How to sort files in python in numeric order?
sorted takes a key. You can use a lambda function as the key to do a numeric-order sort.
Ex:
import os
sorted(os.listdir('path/to/jpg/files'), key=lambda x: int(x.split(".")[0]))
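The one-liner above assumes every name looks like "<number>.<extension>". Here is a small sketch of the same key written as a named function and run on an in-memory list, so it can be tried without a real directory. The helper name numeric_stem and the sample listing are hypothetical.

```python
import os

def numeric_stem(filename):
    """Return the part before the extension as an int, e.g. "10.jpg" -> 10."""
    stem, _ext = os.path.splitext(filename)
    return int(stem)  # raises ValueError if the stem is not a plain integer

# Stand-in for os.listdir('path/to/jpg/files')
listing = ["10.jpg", "2.jpg", "1.jpg", "33.jpg"]

ordered = sorted(listing, key=numeric_stem)
print(ordered)
```

If the directory may contain names that are not purely numeric, the ValueError has to be handled or those names filtered out first.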
Sorting files 'numerically' instead of alphabetically in java
Arrays.sort(fileArray, new Comparator<File>() {
    public int compare(File f1, File f2) {
        try {
            int i1 = Integer.parseInt(f1.getName());
            int i2 = Integer.parseInt(f2.getName());
            // Integer.compare avoids the overflow that i1 - i2 can hit
            return Integer.compare(i1, i2);
        } catch(NumberFormatException e) {
            throw new AssertionError(e);
        }
    }
});
Sort files numerically by name, then by parent directory in BASH
Use / as the separator, first sort by the filename, then by the directory name:
find . -type f | sort -t/ -k3,3n -k2,2n
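For comparison, the same two-level ordering can be sketched in Python with a tuple key. This assumes paths shaped like "./<dir>/<file>" where both parts are plain integers; the function name and sample paths are made up for illustration.

```python
def file_then_dir_key(path):
    """Key for paths shaped like "./<dir>/<file>" with numeric dir and file names."""
    _dot, directory, filename = path.split("/")
    # Tuples compare element by element: file number first, then directory number.
    return (int(filename), int(directory))

paths = ["./2/10", "./1/10", "./1/2", "./2/1"]

ordered = sorted(paths, key=file_then_dir_key)
print(ordered)
```

Unlike sort -n, int() rejects names with non-numeric characters, so this sketch is stricter than the shell pipeline.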
Sort files by numerical order while excluding files with non-numeric filenames
Try this:
import os

path = "/absolute/path/to/files/to/sort/"
numerical_files = []
for f in os.listdir(path):
    # splitext copes with names that contain no dot or several dots
    file_name, extension = os.path.splitext(f)
    if extension == ".csv" and file_name.isdecimal():
        numerical_files.append(file_name)
    else:
        print("Invalid file: ", f)

numerical_filenames_ints = [ int(f) for f in numerical_files ]
numerical_filenames_ints.sort()
for f in numerical_filenames_ints:
    file = str(f) + ".csv"
    print(file)
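One thing to watch for: converting to int and back drops leading zeros, so a file named 007.csv would be printed as 7.csv. A possible variation, sketched here on an in-memory list, keeps the original names and only uses the integer as the sort key. The function name numeric_csv_files and the sample listing are hypothetical.

```python
import os

def numeric_csv_files(names):
    """Return the names shaped like "<digits>.csv", ordered by numeric value."""
    keep = []
    for name in names:
        stem, ext = os.path.splitext(name)
        # isdecimal() accepts only decimal digit characters (and rejects "")
        if ext == ".csv" and stem.isdecimal():
            keep.append(name)
    # Sort the original names, using the integer value of the stem as the key
    return sorted(keep, key=lambda n: int(os.path.splitext(n)[0]))

# Stand-in for os.listdir(path)
listing = ["10.csv", "2.csv", "007.csv", "notes.csv", "3.txt", "a.b.csv"]

result = numeric_csv_files(listing)
print(result)
```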
Sort filenames in directory in ascending order
Assuming there's just one number in each file name (this first version relies on Python 2, where filter() on a str returns a str):
>>> dirFiles = ['Picture 03.jpg', '02.jpg', '1.jpg']
>>> dirFiles.sort(key=lambda f: int(filter(str.isdigit, f)))
>>> dirFiles
['1.jpg', '02.jpg', 'Picture 03.jpg']
A version that also works in Python 3:
>>> import re
>>> dirFiles.sort(key=lambda f: int(re.sub(r'\D', '', f)))
How to sort glob.glob numerically?
The general answer would be to capture the number with re.match() and convert it from a string to an integer with int(). Use these numbers to sort the files with sorted():
import re
import math
from pathlib import Path

file_pattern = re.compile(r'.*?(\d+).*?')

def get_order(file):
    match = file_pattern.match(Path(file).name)
    if not match:
        return math.inf
    return int(match.groups()[0])

sorted_files = sorted(files, key=get_order)
Example input
Consider random files with one integer number in any part of the filename:
├── 012 some file.mp3
├── 1 file.txt
├── 13 file.mp3
├── 2 another file.txt
├── 3 file.csv
├── 4 file.mp3
├── 6 yet another file.txt
├── 88 name of file.mp3
├── and final 999.txt
├── and some another file7.txt
├── some 5 file.mp3
└── test.py
Example output
The get_order() function could be used to sort the files when passed to the sorted() builtin function as the key argument:
In [1]: sorted(files, key=get_order)
Out[1]:
['C:\\tmp\\file_sort\\1 file.txt',
'C:\\tmp\\file_sort\\2 another file.txt',
'C:\\tmp\\file_sort\\3 file.csv',
'C:\\tmp\\file_sort\\4 file.mp3',
'C:\\tmp\\file_sort\\some 5 file.mp3',
'C:\\tmp\\file_sort\\6 yet another file.txt',
'C:\\tmp\\file_sort\\and some another file7.txt',
'C:\\tmp\\file_sort\\012 some file.mp3',
'C:\\tmp\\file_sort\\13 file.mp3',
'C:\\tmp\\file_sort\\88 name of file.mp3',
'C:\\tmp\\file_sort\\and final 999.txt',
'C:\\tmp\\file_sort\\test.py']
Short explanation
- re.compile is used to give a small speed boost (when matching the same pattern multiple times).
- re.match is used to match the regular expression pattern.
- In the regex pattern, .*? means any character (.), zero or more times (*), non-greedily (?). \d+ matches one or more digits, and the parentheses just capture that match into the groups() tuple.
- In case of no match (no digits in the file name), match will be None and get_order gives infinity; these files sort last and keep their input order relative to each other, but one could add logic for these (was not asked in this question).
- The sorted() function takes a key argument, which should be a callable that takes one argument: the item in the list. In this case, it will be one of those file strings (full file path).
- Path(file).name just takes the filename part (including the suffix, without the directories) from the full file path.
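As one example of the extra logic mentioned for files without digits, the key can return a tuple so that ties (including all the infinity cases) fall back to the file name. This is only a sketch; the name order_with_tiebreak and the sample list are made up, and PurePath is used so the example does not touch the file system.

```python
import math
import re
from pathlib import PurePath

file_pattern = re.compile(r'.*?(\d+)')

def order_with_tiebreak(file):
    """Sort by the first number in the file name; ties fall back to the name itself."""
    name = PurePath(file).name
    match = file_pattern.match(name)
    number = int(match.group(1)) if match else math.inf
    # Second tuple element breaks ties, so digit-free names sort alphabetically
    return (number, name.lower())

files = ["some 5 file.mp3", "test.py", "012 some file.mp3", "1 file.txt", "readme.md"]

ordered = sorted(files, key=order_with_tiebreak)
print(ordered)
```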
How to sort files numerically from linux command line
This would be my first thought:
ls -1 | sed 's/\-\([kM]\)\?\([0-9]\{2\}\)\./-\10\2./' | sort | sed 's/0\([0-9]\{2\}\)/\1/'
Basically I just use sed to pad the number with zeros and then use it again afterwards to strip off the leading zero.
I don't know if it might be quicker in Perl.
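For what it's worth, the same zero-padding trick can be sketched in Python. Here the padded string is used only as a sort key, so there is nothing to strip afterwards. The function name zero_padded, the default width of 8, and the sample names are assumptions for illustration and do not mirror the exact sed patterns above.

```python
import re

def zero_padded(name, width=8):
    """Pad every digit run to a fixed width so plain string order matches numeric order."""
    return re.sub(r'\d+', lambda m: m.group().zfill(width), name)

names = ["log-100.txt", "log-9.txt", "log-25.txt"]

# The padded form is only the key; the original names are what gets returned.
ordered = sorted(names, key=zero_padded)
print(ordered)
```

This only works while no number has more digits than the chosen width.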
Sorting files by numerical order
Set Arg = WScript.Arguments
set WshShell = createObject("Wscript.Shell")
Set Inp = WScript.Stdin
Set Outp = Wscript.Stdout
Set rs = CreateObject("ADODB.Recordset")
If LCase(Arg(1)) = "n" then
    With rs
        .Fields.Append "SortKey", 4
        .Fields.Append "Txt", 201, 5000
        .Open
        Do Until Inp.AtEndOfStream
            Lne = Inp.readline
            SortKey = Mid(Lne, LCase(Arg(3)), LCase(Arg(4)) - LCase(Arg(3)))
            If IsNumeric(Sortkey) = False then
                Set RE = new Regexp
                re.Pattern = "[^0-9\.,]"
                re.global = true
                re.ignorecase = true
                Sortkey = re.replace(Sortkey, "")
            End If
            If IsNumeric(Sortkey) = False then
                Sortkey = 0
            ElseIf Sortkey = "" then
                Sortkey = 0
            ElseIf IsNull(Sortkey) = true then
                Sortkey = 0
            End If
            .AddNew
            .Fields("SortKey").value = CSng(SortKey)
            .Fields("Txt").value = Lne
            .UpDate
        Loop
        If LCase(Arg(2)) = "a" then SortColumn = "SortKey ASC"
        If LCase(Arg(2)) = "d" then SortColumn = "SortKey DESC"
        .Sort = SortColumn
        Do While not .EOF
            Outp.writeline .Fields("Txt").Value
            .MoveNext
        Loop
    End With
ElseIf LCase(Arg(1)) = "d" then
    With rs
        .Fields.Append "SortKey", 4
        .Fields.Append "Txt", 201, 5000
        .Open
        Do Until Inp.AtEndOfStream
            Lne = Inp.readline
            SortKey = Mid(Lne, LCase(Arg(3)), LCase(Arg(4)) - LCase(Arg(3)))
            If IsDate(Sortkey) = False then
                Set RE = new Regexp
                re.Pattern = "[^0-9\\\-:]"
                re.global = true
                re.ignorecase = true
                Sortkey = re.replace(Sortkey, "")
            End If
            If IsDate(Sortkey) = False then
                Sortkey = 0
            ElseIf Sortkey = "" then
                Sortkey = 0
            ElseIf IsNull(Sortkey) = true then
                Sortkey = 0
            End If
            .AddNew
            .Fields("SortKey").value = CDate(SortKey)
            .Fields("Txt").value = Lne
            .UpDate
        Loop
        If LCase(Arg(2)) = "a" then SortColumn = "SortKey ASC"
        If LCase(Arg(2)) = "d" then SortColumn = "SortKey DESC"
        .Sort = SortColumn
        Do While not .EOF
            Outp.writeline .Fields("Txt").Value
            .MoveNext
        Loop
    End With
ElseIf LCase(Arg(1)) = "t" then
    With rs
        .Fields.Append "SortKey", 201, 260
        .Fields.Append "Txt", 201, 5000
        .Open
        Do Until Inp.AtEndOfStream
            Lne = Inp.readline
            SortKey = Mid(Lne, LCase(Arg(3)), LCase(Arg(4)) - LCase(Arg(3)))
            .AddNew
            .Fields("SortKey").value = SortKey
            .Fields("Txt").value = Lne
            .UpDate
        Loop
        If LCase(Arg(2)) = "a" then SortColumn = "SortKey ASC"
        If LCase(Arg(2)) = "d" then SortColumn = "SortKey DESC"
        .Sort = SortColumn
        Do While not .EOF
            Outp.writeline .Fields("Txt").Value
            .MoveNext
        Loop
    End With
ElseIf LCase(Arg(1)) = "tt" then
    With rs
        .Fields.Append "SortKey", 201, 260
        .Fields.Append "Txt", 201, 5000
        .Open
        Do Until Inp.AtEndOfStream
            Lne = Inp.readline
            SortKey = Trim(Mid(Lne, LCase(Arg(3)), LCase(Arg(4)) - LCase(Arg(3))))
            .AddNew
            .Fields("SortKey").value = SortKey
            .Fields("Txt").value = Lne
            .UpDate
        Loop
        If LCase(Arg(2)) = "a" then SortColumn = "SortKey ASC"
        If LCase(Arg(2)) = "d" then SortColumn = "SortKey DESC"
        .Sort = SortColumn
        Do While not .EOF
            Outp.writeline .Fields("Txt").Value
            .MoveNext
        Loop
    End With
End If
To use
cscript //nologo script.vbs sort {n|d|t|tt} {a|d} startcolumn endcolumn < input.txt > output.txt
Options
n - extracts a number from the columns specified. Looks for the first number.
d - extracts a time or date from the columns specified. Looks for the first date.
t - extracts a text string including spaces from the columns specified.
tt - extracts a text string discarding leading and trailing spaces from the columns specified.
a - sorts ascending
d - sorts descending
startcolumn - the starting column, the first character is column 1
endcolumn - the ending column
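To make the column options concrete, here is a rough Python sketch of what the n mode does: take a slice of each line by column, strip everything that is not part of a number, and sort on the result. It follows the script's Mid(line, start, end - start) arithmetic, so the end column itself is not included. The function name and sample lines are hypothetical, and treating commas as thousands separators is my assumption (the script leaves that to CSng and the locale).

```python
import re

def sort_lines_numeric(lines, start, end, descending=False):
    """Sort lines by the number found in columns start..end-1 (1-based columns)."""
    def key(line):
        field = line[start - 1:end - 1]
        # Like the script: drop everything except digits, dots, and commas.
        # Assumption: commas are thousands separators, so they are removed.
        cleaned = re.sub(r'[^0-9.,]', '', field).replace(',', '')
        try:
            return float(cleaned)
        except ValueError:
            return 0.0  # the script also falls back to 0 for non-numeric fields
    return sorted(lines, key=key, reverse=descending)

lines = ["item  12 apples", "item   3 pears", "item 100 plums"]

ascending = sort_lines_numeric(lines, 5, 9)
descending = sort_lines_numeric(lines, 5, 9, descending=True)
print(ascending)
print(descending)
```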
This is what the command-line syntax means
The following table describes the notation used to indicate command-line syntax.
Notation                          Description
Text without brackets or braces   Items you must type as shown
<Text inside angle brackets>      Placeholder for which you must supply a value
[Text inside square brackets]     Optional items
{Text inside braces}              Set of required items; choose one
Vertical bar (|)                  Separator for mutually exclusive items; choose one
Ellipsis (…)                      Items that can be repeated