How to find files with same size?
A solution that works with file names containing spaces (based on the posts by Kent (+1) and awiebe (+1)):
for FILE in *; do stat -c"%s/%n" "$FILE"; done | awk -F/ '{if ($1 in a)print $2; else a[$1]=1}' | xargs -d '\n' echo rm
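To actually delete the files, remove echo from the xargs command. Be careful: this deletes every file whose size matches an earlier file, even if the contents differ.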
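For comparison, here is a rough Python sketch of the same "keep the first file of each size, report the rest" logic. It is a dry run only (it prints, like xargs echo rm), it skips directories (which the shell glob would also pass to stat), and sorted() only approximates the locale-dependent order of the shell glob. The function name files_to_remove is hypothetical.

```python
import os

def files_to_remove(directory="."):
    """Return names of files whose size matches an earlier file.

    Mirrors the awk step: the first file of each size is kept and every
    later file with that size is reported. Equal size does NOT imply
    equal content.
    """
    seen_sizes = set()
    duplicates = []
    # sorted() roughly mimics the alphabetical order of the glob "*"
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):  # skip directories and other non-files
            continue
        size = os.path.getsize(path)
        if size in seen_sizes:
            duplicates.append(name)
        else:
            seen_sizes.add(size)
    return duplicates

if __name__ == "__main__":
    # Dry run: print the commands instead of deleting anything
    for name in files_to_remove("."):
        print("rm", name)
```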
Windows batch file to find files of duplicate size on a hard disk (regardless of name and extension)
The batch file below should solve your problem; however, be aware that despite its apparent simplicity, it is based on advanced concepts such as array management in batch files.
@echo off
setlocal EnableDelayedExpansion
rem Group all file names by size
for /R %%a in (*.*) do (
   set "size[%%~Za]=!size[%%~Za]!,%%~Fa"
)
rem Show groups that have more than one element
for /F "tokens=2,3* delims=[]=," %%a in ('set size[') do (
   if "%%c" neq "" echo [%%a]: %%b,%%c
)
This program may take a long time if the number of files is large or if the starting folder has a long path.
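If Python is available, the same "array indexed by size" idea can be sketched with a dictionary that maps each size to a list of full paths. This is an illustrative alternative, not a translation of the batch file's exact behavior; the function name group_by_size is hypothetical, and unreadable files are silently skipped.

```python
import os
from collections import defaultdict

def group_by_size(root="."):
    """Map each file size to the full paths of all files under root with
    that size, keeping only sizes shared by more than one file."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.abspath(os.path.join(dirpath, name))
            try:
                groups[os.path.getsize(path)].append(path)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return {size: paths for size, paths in groups.items() if len(paths) > 1}

if __name__ == "__main__":
    # Same output shape as the batch file: [size]: path1,path2,...
    for size, paths in sorted(group_by_size(".").items()):
        print("[{}]: {}".format(size, ",".join(paths)))
```

A dictionary avoids the batch file's repeated string concatenation, which is one reason the batch version slows down with many files.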
Finding files with same size (potential duplicates) in nested sub-folders in Linux Mint shell?
#prefix each file path with its size, right-aligned in a 10-character field
find . -type f -printf "%10s\t%p\n" |
sort --numeric-sort | #sort numerically so equal sizes are adjacent (uniq needs this)
uniq --all-repeated=separate --check-chars=10 #print every file whose size occurs more than once
See the respective manpages for more details.
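The pipeline works because sorting makes equal sizes adjacent, and uniq then only has to compare neighbouring lines. A minimal Python sketch of that same sort-then-scan technique (the function name repeated_sizes is hypothetical) could look like this:

```python
import itertools
import os

def repeated_sizes(root="."):
    """Return (size, paths) pairs for sizes shared by two or more files.

    Mirrors "sort | uniq --all-repeated": sort (size, path) pairs so that
    equal sizes become adjacent, then keep only runs longer than one.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            entries.append((os.path.getsize(path), path))
    entries.sort()  # numeric sort on size, then by path
    result = []
    for size, run in itertools.groupby(entries, key=lambda entry: entry[0]):
        paths = [path for _size, path in run]
        if len(paths) > 1:  # like uniq, only adjacent equal keys are grouped
            result.append((size, paths))
    return result

if __name__ == "__main__":
    for size, paths in repeated_sizes("."):
        for path in paths:
            print("{:>10}\t{}".format(size, path))
        print()  # blank line between groups, like --all-repeated=separate
```

Unlike the fixed 10-character field used with --check-chars=10, this compares the sizes as integers, so files of 10 GB or more are handled the same way.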
List files with path and file size only in Command Line
Get-ChildItem -Recurse -File | Select-Object FullName, Length | Format-Table -AutoSize -HideTableHeaders | Out-File -Width 4096 filelist.txt
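Outside PowerShell, a comparable list can be sketched in Python. This version writes tab-separated "full path, size in bytes" lines with no header row; the tab separator and the function name write_file_list are assumptions made for illustration.

```python
import os

def write_file_list(root, out_path):
    """Write "full path<TAB>size in bytes" for every file under root,
    one per line and without a header row. Returns the number of files."""
    lines = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.abspath(os.path.join(dirpath, name))
            lines.append("{}\t{}".format(path, os.path.getsize(path)))
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("\n".join(lines) + ("\n" if lines else ""))
    return len(lines)

if __name__ == "__main__":
    write_file_list(".", "filelist.txt")
```

A tab-separated file is also easier to post-process than a formatted table, which may pad or truncate long paths.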
Compare files with the same name in two folders and check their size to delete the bigger one in Python
Deleting files based on size
This is a simple procedure and can be implemented with a few short functions.
import os

def compare_folders(path1, path2):
    ignore = [".", "..", ".DS_Store"] # ignore these pointers/ files
    for file in os.listdir(path1):
        if file in os.listdir(path2):
            if file not in ignore:
                delete_larger_file(os.path.join(path1, file), os.path.join(path2, file))

def merge_folders(path1, path2):
    # move every file that exists only in path1 into path2
    for file in os.listdir(path1):
        if file not in os.listdir(path2):
            os.rename(os.path.join(path1, file), os.path.join(path2, file))

def delete_larger_file(path1, path2):
    # on a tie, the file in path2 is deleted
    if os.path.getsize(path1) > os.path.getsize(path2):
        os.remove(path1)
    else:
        os.remove(path2)
What's going on here?
- The first function, compare_folders(), takes the paths to the two folders being compared as inputs. It iterates through the contents of the first folder and, for every file that also exists in the second folder, calls delete_larger_file(), which compares the sizes of the two files and deletes the larger one (on a tie, the copy in the second folder is deleted).
- A subsequent call to merge_folders() is necessary to merge the folders in place: it moves every file from the first folder that is not present in the second folder into the second folder. In the end, the first folder should be empty and the second should hold the smaller file of every pair.
- Be warned: this cannot be undone, so test it on a copy of your data first. Also, it does not handle subfolders; that would require recursion.
First call compare_folders(), then call merge_folders().
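Because the deletion cannot be undone, one cautious option is a dry run first. The sketch below uses the same comparison and tie rule as delete_larger_file() but only returns the paths that would be deleted. The name plan_deletions is hypothetical and not part of the answer above.

```python
import os

IGNORE = {".", "..", ".DS_Store"}

def plan_deletions(path1, path2):
    """Dry run of compare_folders(): return the paths it would delete,
    without touching anything. Ties go to path2, as in delete_larger_file()."""
    planned = []
    names_in_2 = set(os.listdir(path2))
    for name in sorted(os.listdir(path1)):
        if name in IGNORE or name not in names_in_2:
            continue
        file1 = os.path.join(path1, name)
        file2 = os.path.join(path2, name)
        if os.path.getsize(file1) > os.path.getsize(file2):
            planned.append(file1)
        else:
            planned.append(file2)
    return planned

if __name__ == "__main__":
    import sys
    for path in plan_deletions(sys.argv[1], sys.argv[2]):
        print("would delete", path)
```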
Group files by size, then find hash duplicates
Sort the file names by size, and then use itertools.groupby to group files of the same size together.
import os
import os.path
import itertools
#creates dummy files with a given number of bytes.
def create_file(name, size):
    if os.path.isfile(name):
        return
    with open(name, "w") as file:
        file.write("X" * size)
#create some sample files
create_file("foo.txt", 4)
create_file("bar.txt", 4)
create_file("baz.txt", 4)
create_file("qux.txt", 8)
create_file("lorem.txt", 8)
create_file("ipsum.txt", 16)
#get the filenames in this directory
filenames = [filename for filename in os.listdir(".") if os.path.isfile(filename)]
#sort by size (groupby only groups adjacent items)
filenames.sort(key=lambda name: os.stat(name).st_size)
#group by size and iterate
for size, items_iterator in itertools.groupby(filenames, key=lambda name: os.stat(name).st_size):
    items = list(items_iterator)
    print("{} item(s) of size {}:".format(len(items), size))
    #insert hashlib code here, or whatever else you want to do
    for item in items:
        print(item)
Result:
3 item(s) of size 4:
bar.txt
baz.txt
foo.txt
2 item(s) of size 8:
lorem.txt
qux.txt
1 item(s) of size 16:
ipsum.txt
1 item(s) of size 968:
test.py
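The "insert hashlib code here" step could be filled in along these lines: hash only files whose size is shared by at least one other file, then group them by digest. This is a sketch; SHA-256, the 64 KiB chunk size, and the names file_digest and find_content_duplicates are choices made for illustration.

```python
import hashlib
import itertools
import os

def file_digest(path, chunk_size=65536):
    """SHA-256 of a file, read in chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def size_of(name):
    return os.stat(name).st_size

def find_content_duplicates(filenames):
    """Group files by size first (cheap), then hash only files whose size
    is shared, and return lists of files with identical contents."""
    duplicates = []
    for _size, same_size in itertools.groupby(sorted(filenames, key=size_of), key=size_of):
        same_size = list(same_size)
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate; skip hashing
        by_hash = {}
        for name in same_size:
            by_hash.setdefault(file_digest(name), []).append(name)
        duplicates.extend(sorted(group) for group in by_hash.values() if len(group) > 1)
    return duplicates

if __name__ == "__main__":
    names = [n for n in os.listdir(".") if os.path.isfile(n)]
    for group in find_content_duplicates(names):
        print("identical:", ", ".join(group))
```

Grouping by size first means most files are never read at all, since a file with a unique size cannot be a duplicate.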
How do I get the find command to print out the file size with the file name?
find . -name '*.ear' -exec ls -lh {} \;
This is jer.drab.org's reply with just the extra h flag added; it saves converting to MB mentally. ;)
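A Python sketch that prints a human-readable size next to each matching file is below. The helper human_size uses powers of 1024 with one decimal place; it only approximates ls -h, which rounds differently, so the last digit may not always match. Both function names are hypothetical.

```python
import pathlib

def human_size(num_bytes):
    """Format a byte count with a binary unit suffix (K, M, G, T),
    roughly like ls -h."""
    size = float(num_bytes)
    for unit in ("", "K", "M", "G", "T"):
        if size < 1024 or unit == "T":
            return str(int(size)) if unit == "" else "{:.1f}{}".format(size, unit)
        size /= 1024

def print_sizes(root=".", pattern="*.ear"):
    """Print "size  path" for every file under root matching pattern."""
    for path in sorted(pathlib.Path(root).rglob(pattern)):
        if path.is_file():
            print("{:>8}  {}".format(human_size(path.stat().st_size), path))

if __name__ == "__main__":
    print_sizes()
```

For example, human_size(1536) returns "1.5K" and human_size(1048576) returns "1.0M".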
How can I create two files of identical size and date modified?
You can create a file with a specified size and arbitrary content using head --bytes=NUM < /dev/urandom > newfile, where NUM is the file size in bytes; it also accepts a multiplier suffix such as MB or GB (check the manual of head). You can run this command twice to make two files of the same size with different content, or run it once and then copy the file to get two files with the same size and content.
Timestamps can be manipulated with touch --date=STRING. Either touch both files using the same date string, or touch them with different date strings to give them different timestamps.
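The same two steps can be sketched in Python: os.urandom plays the role of head < /dev/urandom, and os.utime plays the role of touch --date, taking the time as seconds since the epoch. The function name make_twin_files and the example values are illustrative assumptions.

```python
import os

def make_twin_files(path_a, path_b, size, timestamp, same_content=False):
    """Create two files of exactly `size` bytes and give both the same
    access and modification time (`timestamp`, seconds since the epoch).

    With same_content=True the second file gets the same bytes as the
    first (like copying it); otherwise both get independent random data.
    """
    data_a = os.urandom(size)
    data_b = data_a if same_content else os.urandom(size)
    for path, data in ((path_a, data_a), (path_b, data_b)):
        with open(path, "wb") as f:
            f.write(data)
        os.utime(path, (timestamp, timestamp))  # (atime, mtime), like touch

if __name__ == "__main__":
    # 1,000,000 bytes each, both dated 2020-01-01 00:00:00 UTC
    make_twin_files("file1", "file2", 1000000, 1577836800)
```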
ASP.NET: Fast way to get file size and count of all files?
Try dropping the recursion and the LINQ; they can be slow and eat up a lot of memory.
Try this:
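' Requires "Imports System.IO" at the top of the file
Dim strFolder = "C:\Users\AlbertKallal\Desktop"
Dim MyDir As New DirectoryInfo(strFolder)
Dim MyFiles() As FileInfo = MyDir.GetFiles("*.*", SearchOption.AllDirectories)
Dim tSize As Long = 0 ' total size in bytes
For Each MyFile As FileInfo In MyFiles
    tSize += MyFile.Length
Next
Dim tCount As Integer = MyFiles.Length ' number of files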
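The same "no recursion" idea can be sketched outside .NET. This Python version walks the tree with an explicit stack and uses os.scandir, whose DirEntry objects can often report sizes without an extra system call (notably on Windows). It is an illustration of the technique, not ASP.NET code, and the name folder_stats is hypothetical; symbolic links are neither followed nor counted.

```python
import os

def folder_stats(root):
    """Return (total_bytes, file_count) for all regular files under root,
    using an explicit stack instead of recursion."""
    total_bytes = 0
    file_count = 0
    stack = [root]
    while stack:
        directory = stack.pop()
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)  # visit later instead of recursing
                elif entry.is_file(follow_symlinks=False):
                    total_bytes += entry.stat(follow_symlinks=False).st_size
                    file_count += 1
    return total_bytes, file_count

if __name__ == "__main__":
    size, count = folder_stats(".")
    print("{} files, {} bytes".format(count, size))
```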