How to Find Files with Same Size

A solution that works with file names containing spaces (based on Kent's (+1) and awiebe's (+1) posts):

for FILE in *; do stat -c "%s/%n" "$FILE"; done | awk -F/ '{if ($1 in a) print $2; else a[$1]=1}' | xargs -d '\n' echo rm

To actually remove the duplicates, drop echo from the xargs command. The -d '\n' option (GNU xargs) splits input on newlines only, so file names with spaces survive the xargs step.

Windows batch file to find files with duplicate sizes on a hard disk (regardless of name and extension)

The batch file below should solve your problem; however, be aware that despite its apparent simplicity, it is based on advanced concepts, like array management in Batch files.

@echo off
setlocal EnableDelayedExpansion

rem Group all file names by size
for /R %%a in (*.*) do (
   set "size[%%~Za]=!size[%%~Za]!,%%~Fa"
)

rem Show groups that have more than one element
for /F "tokens=2,3* delims=[]=," %%a in ('set size[') do (
   if "%%c" neq "" echo [%%a]: %%b,%%c
)

This program may take a long time if the number of files is large or if the starting folder has a long path.

Finding files with same size (potential duplicates) in nested sub-folders in Linux Mint shell?

#prefix each filepath with the size of the file padded to 10 places
find . -type f -printf "%10s\t%p\n" |
sort --numeric | #sort numerically (uniq needs adjacent duplicates)
uniq --all-repeated --check-chars=10 #print every file whose size repeats

See the respective manpages for more details.

List files with path and file size only in Command Line

Get-ChildItem -Recurse | select FullName,Length | Format-Table -HideTableHeaders | Out-File filelist.txt

compare files with same name in 2 folders and check their size to delete the bigger one in Python

Deleting files based on size

This is a simple procedure and can be implemented in a few short functions.

import os

def compare_folders(path1, path2):
    ignore = [".", "..", ".DS_Store"]  # ignore these pointers/files
    for file in os.listdir(path1):
        if file in os.listdir(path2):
            if file not in ignore:
                delete_larger_file(path1 + "/" + file, path2 + "/" + file)


def merge_folders(path1, path2):
    for file in os.listdir(path1):
        if file not in os.listdir(path2):
            os.rename(path1 + "/" + file, path2 + "/" + file)


def delete_larger_file(path1, path2):
    if os.path.getsize(path1) > os.path.getsize(path2):
        os.remove(path1)
    else:
        os.remove(path2)

What's going on here?

  • The first function, compare_folders(), takes the paths of the two folders being compared. It iterates through the contents of the first folder and, for each file that also exists in the second, calls delete_larger_file(), which compares the sizes of the two copies and deletes the larger one.
  • A subsequent call to merge_folders() is necessary to merge the folders in place: it compares the contents of both folders and moves files that exist in only one into the other. In the end, one folder should be empty and the other should hold all the smaller files.
  • Be warned: this cannot be undone, so test it on copies first. Also, it does not descend into subfolders; handling those would require recursion.

First call compare_folders(), then call merge_folders().
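Putting it together, a minimal runnable sketch. The folder names folder_a/folder_b and the sample files are my own invention for the demo, and the function bodies are repeated from above so the sketch runs standalone:

```python
import os

# Definitions from the answer above, repeated so this sketch runs standalone.
def delete_larger_file(p1, p2):
    if os.path.getsize(p1) > os.path.getsize(p2):
        os.remove(p1)
    else:
        os.remove(p2)

def compare_folders(path1, path2):
    ignore = [".", "..", ".DS_Store"]
    for name in os.listdir(path1):
        if name in os.listdir(path2) and name not in ignore:
            delete_larger_file(os.path.join(path1, name), os.path.join(path2, name))

def merge_folders(path1, path2):
    for name in os.listdir(path1):
        if name not in os.listdir(path2):
            os.rename(os.path.join(path1, name), os.path.join(path2, name))

# Demo setup with hypothetical folders: one shared file name, different sizes.
os.makedirs("folder_a", exist_ok=True)
os.makedirs("folder_b", exist_ok=True)
with open("folder_a/dup.txt", "w") as f:
    f.write("X" * 10)   # larger copy
with open("folder_b/dup.txt", "w") as f:
    f.write("X" * 4)    # smaller copy
with open("folder_a/only_a.txt", "w") as f:
    f.write("data")

compare_folders("folder_a", "folder_b")   # removes the larger dup.txt
merge_folders("folder_a", "folder_b")     # moves only_a.txt into folder_b

print(sorted(os.listdir("folder_a")))  # []
print(sorted(os.listdir("folder_b")))  # ['dup.txt', 'only_a.txt']
```

After both calls, folder_a is empty and folder_b holds the smaller copy of every shared file plus everything that existed in only one folder.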

Group files by size, then find hash duplicates

Sort the filenames by size, then use itertools.groupby to group same-sized files together.

import os
import os.path
import itertools

#creates dummy files with a given number of bytes.
def create_file(name, size):
    if os.path.isfile(name):
        return
    with open(name, "w") as f:
        f.write("X" * size)

#create some sample files
create_file("foo.txt", 4)
create_file("bar.txt", 4)
create_file("baz.txt", 4)
create_file("qux.txt", 8)
create_file("lorem.txt", 8)
create_file("ipsum.txt", 16)

#get the filenames in this directory
filenames = [filename for filename in os.listdir(".") if os.path.isfile(filename)]

#sort by size
filenames.sort(key=lambda name: os.stat(name).st_size)

#group by size and iterate
for size, items_iterator in itertools.groupby(filenames, key=lambda name: os.stat(name).st_size):
    items = list(items_iterator)
    print("{} item(s) of size {}:".format(len(items), size))
    # insert hashlib code here, or whatever else you want to do
    for item in items:
        print(item)

Result:

3 item(s) of size 4:
bar.txt
baz.txt
foo.txt
2 item(s) of size 8:
lorem.txt
qux.txt
1 item(s) of size 16:
ipsum.txt
1 item(s) of size 968:
test.py
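The hashlib step hinted at in the comment could look like the following sketch. The sample file names (dup_a.txt etc.) are my own; the idea is to hash only files that already share a size, since a unique size cannot have a duplicate:

```python
import hashlib
import itertools
import os

def file_hash(name):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(name, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# sample files: two identical, one same-sized but different
with open("dup_a.txt", "w") as f:
    f.write("XXXX")
with open("dup_b.txt", "w") as f:
    f.write("XXXX")
with open("odd.txt", "w") as f:
    f.write("YYYY")

names = ["dup_a.txt", "dup_b.txt", "odd.txt"]
names.sort(key=lambda n: os.stat(n).st_size)

for size, group in itertools.groupby(names, key=lambda n: os.stat(n).st_size):
    group = list(group)
    if len(group) < 2:
        continue  # a unique size cannot have duplicates
    by_hash = {}
    for name in group:
        by_hash.setdefault(file_hash(name), []).append(name)
    for digest, dupes in by_hash.items():
        if len(dupes) > 1:
            print("duplicates:", dupes)  # → duplicates: ['dup_a.txt', 'dup_b.txt']
```

Size grouping is only a cheap pre-filter: all three sample files are 4 bytes, but hashing separates the true duplicates from odd.txt.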

How do I get the find command to print out the file size with the file name?

find . -name '*.ear' -exec ls -lh {} \;

This just adds the h (human-readable) flag to jer.drab.org's reply; it saves converting sizes to MB mentally ;)

How can I create two files of identical size and date modified?

You can create a file of a specified size with arbitrary content using head --bytes=NUM < /dev/urandom > newfile, where NUM is the file size in bytes; it also accepts multiplier suffixes such as MB or GB (check the manual of head). Run this command twice to make two files of the same size and different content, or run it once and then copy the file to get two files with the same size and content.

Timestamps can be manipulated with touch --date=STRING. Either touch both files with the same date string, or touch them with different date strings to give them different timestamps.
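The same effect can be had from Python, as a sketch (the names twin_a.bin/twin_b.bin and the chosen size and timestamp are arbitrary): os.urandom supplies the random content, and os.utime pins both files to one modification time:

```python
import os

SIZE = 1024          # desired size in bytes
STAMP = 1700000000   # arbitrary shared timestamp (seconds since the epoch)

# same size, different random content
with open("twin_a.bin", "wb") as f:
    f.write(os.urandom(SIZE))
with open("twin_b.bin", "wb") as f:
    f.write(os.urandom(SIZE))

# pin both files to the same access and modification time
for name in ("twin_a.bin", "twin_b.bin"):
    os.utime(name, (STAMP, STAMP))

print(os.path.getsize("twin_a.bin") == os.path.getsize("twin_b.bin"))  # True
print(os.path.getmtime("twin_a.bin") == os.path.getmtime("twin_b.bin"))  # True
```

To get identical content as well, write the same bytes once and copy the file instead of calling os.urandom twice.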

ASP.NET: Fast way to get file size and count of all files?

Try dropping recursion, and try dropping LINQ - here it is slow and eats up a lot of memory.

Try this:

' requires Imports System.IO
Dim strFolder As String = "C:\Users\AlbertKallal\Desktop"

Dim MyDir As New DirectoryInfo(strFolder)
Dim MyFiles() As FileInfo = MyDir.GetFiles("*.*", SearchOption.AllDirectories)

Dim tSize As Long = 0
For Each MyFile As FileInfo In MyFiles
    tSize += MyFile.Length
Next

