How to Compare The Size of Two Directories

Compare files with the same name in 2 folders and check their size to delete the bigger one in Python

Deleting files based on size

This is a simple procedure and can be implemented in a few short functions.

import os


def compare_folders(path1, path2):
    ignore = {".", "..", ".DS_Store"}  # ignore these pointers/files
    names2 = set(os.listdir(path2))  # list path2 once instead of on every iteration
    for file in os.listdir(path1):
        if file in names2 and file not in ignore:
            delete_larger_file(os.path.join(path1, file), os.path.join(path2, file))


def merge_folders(path1, path2):
    # move every entry that exists only in path1 into path2
    names2 = set(os.listdir(path2))
    for file in os.listdir(path1):
        if file not in names2:
            os.rename(os.path.join(path1, file), os.path.join(path2, file))


def delete_larger_file(path1, path2):
    # when both files have the same size, the copy at path2 is removed
    if os.path.getsize(path1) > os.path.getsize(path2):
        os.remove(path1)
    else:
        os.remove(path2)

What's going on here?

  • The first function, compare_folders(), takes the paths of the two folders being compared. It iterates over the contents of the first folder and, for every name that also exists in the second folder, calls delete_larger_file(), which compares the sizes of the 2 files and deletes the larger one.
  • A subsequent call to merge_folders() is necessary to merge the folders in place. It moves every file that exists only in the first folder into the second. In the end, the first folder should be empty and the second should hold the smallest copy of every file.
  • Be warned: the deletions cannot be undone, so test this on copies of your data first. It also does not handle subfolders; that would require recursion.

First call compare_folders(), then call merge_folders().
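
Since subfolders need recursion and the deletions cannot be undone, here is a minimal sketch of a recursive variant with a dry-run switch. It is only an illustration: the function name keep_smaller_recursive() and the dry_run parameter are made up for this example and are not part of the answer above. It assumes that two files "match" when they have the same path relative to their folder.

```python
import os


def keep_smaller_recursive(path1, path2, dry_run=True):
    """For every relative file path present in both trees, pick the larger copy.

    Returns the list of files that were removed (or would be, with dry_run=True).
    Hypothetical helper, not part of the original answer.
    """
    removed = []
    for dirpath, _dirnames, filenames in os.walk(path1):
        rel_dir = os.path.relpath(dirpath, path1)
        for name in filenames:
            file1 = os.path.join(dirpath, name)
            file2 = os.path.normpath(os.path.join(path2, rel_dir, name))
            if not os.path.isfile(file2):
                continue  # no counterpart in the second tree
            # same tie rule as delete_larger_file(): equal sizes pick the path2 copy
            if os.path.getsize(file1) > os.path.getsize(file2):
                victim = file1
            else:
                victim = file2
            removed.append(victim)
            if not dry_run:
                os.remove(victim)
    return removed
```

Calling it with dry_run=True first and reading the returned list is a cheap way to check what would be deleted before running it with dry_run=False.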

How to compare two directories and move files unique (in name or contents) to the 1st directory elsewhere?

You can use jdupes to do this, like so:

Windows:

jdupes.exe -O -R -d -N dir2 dir1
robocopy dir1 dir1 /S /MOVE
rename dir1 dir3
rmdir /S /Q dir2

Linux:

#! /bin/sh
./jdupes -O -R -d -N ./dir2 ./dir1
find ./dir1 -type d -empty -delete
mv ./dir1 ./dir3
rm -rf ./dir2

Note: These scripts/commands only work when run from the directory that contains both target directories and the jdupes binary.
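
If jdupes is not available, the content-matching part of the idea can be roughly approximated with the Python standard library. The sketch below is an assumption-laden illustration, not a description of how jdupes works: the names file_digest(), iter_files() and remove_duplicates_of() are made up here, files are compared by SHA-256 digest, and it only removes files from the target whose contents also appear somewhere in the reference directory. It does not delete empty directories or rename anything.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def iter_files(root):
    """Yield the path of every file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def remove_duplicates_of(reference_dir, target_dir, dry_run=True):
    """Remove files in target_dir whose contents also exist in reference_dir.

    Returns the list of files removed (or that would be, with dry_run=True).
    """
    known = {file_digest(p) for p in iter_files(reference_dir)}
    removed = []
    for path in list(iter_files(target_dir)):
        if file_digest(path) in known:
            removed.append(path)
            if not dry_run:
                os.remove(path)
    return removed
```

With the layout from the commands above, remove_duplicates_of("dir2", "dir1") would list the files in dir1 that duplicate something in dir2; nothing is deleted until dry_run=False is passed.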

Compare the size of multiple subdirectories before and after a pause in a PowerShell script

If I understood correctly, you want to find the folders whose size has changed after 30 seconds. If that's the case, you could use a function so that you don't need to repeat your code. The function can return a hash table where the keys are the folders' absolute paths and the values are their calculated sizes. Once you have both results (before and after the 30 seconds), you can compare the two hash tables and output a new object with the folder's absolute path, size before and size after, only for those folders whose calculated size has changed.

function GetFolderSize {
    [cmdletbinding()]
    param($path)

    $map = @{}
    Get-ChildItem $path -Directory -Force | ForEach-Object {
        $Size = (Get-ChildItem $_.Fullname -Recurse | Measure-Object -Property Length -Sum).Sum / 1Kb
        $FolderName = $_.BaseName -match '1B(\d{6})_LEAP 1A version aout2021_(\d{4})-(\d{2})-(\d{2})T(\d{2})h(\d{2})m(\d{2})s_S(\d{6})' -or $_.BaseName -match '1B(\d{6})_SML 10_LEAP 1A version aout2021_(\d{4})-(\d{2})-(\d{2})T(\d{2})h(\d{2})m(\d{2})s_S(\d{5})'
        if ($FolderName) {
            $map[$_.FullName] = $size
        }
    }
    if($map) { $map }
}

$path = "C:\Users\s611284\Desktop\archive"

$before = GetFolderSize $path -ErrorAction SilentlyContinue
Start-Sleep -Seconds 30
$after = GetFolderSize $path -ErrorAction SilentlyContinue

foreach($key in $after.PSBase.Keys) {
    if($before[$key] -ne $after[$key]) {
        # this is a folder with a different size than before
        [PSCustomObject]@{
            FullName   = $key
            SizeBefore = $before[$key]
            SizeAfter  = $after[$key]
        }
    }
}
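
For readers following the Python answers on this page, the same snapshot-and-compare idea can be sketched in Python. This is a loose equivalent rather than a translation of the script above: the function names folder_sizes_kb() and changed_folders() are invented for this example, and the folder-name filtering done by the regular expressions is left out.

```python
import os


def folder_sizes_kb(root):
    """Map each immediate subfolder of root (absolute path) to its size in KB."""
    sizes = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    fp = os.path.join(dirpath, name)
                    if not os.path.islink(fp):
                        total += os.path.getsize(fp)
            sizes[os.path.abspath(entry.path)] = total / 1024
    return sizes


def changed_folders(before, after):
    """Return one record per folder whose size differs between two snapshots."""
    return [
        {"FullName": key, "SizeBefore": before.get(key), "SizeAfter": size}
        for key, size in after.items()
        if before.get(key) != size
    ]
```

A typical use would be to call folder_sizes_kb() once, wait with time.sleep(30), call it again, and pass both dictionaries to changed_folders(). A folder that only appears in the second snapshot is reported with SizeBefore set to None.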

Comparing two directories and then removing mismatches (Python)

You can use Path.rglob:

from pathlib import Path

pl = Path('path/to/left')
pr = Path('path/to/right')

# relative paths that exist under pr but not under pl
difference = (set(map(lambda p: p.relative_to(pr), pr.rglob('*'))) -
              set(map(lambda p: p.relative_to(pl), pl.rglob('*'))))

Here is an example:

right
    file1
    file5
    dir1
        file2
        file6
    dir2
        file3
        file7
        subdir1
            file4
            file8
        subdir2
            file9
        subdir3

left
    file1
    dir1
        file2
    dir2
        file3
        subdir1
            file4

>>> difference
{PosixPath('dir1/file6'),
 PosixPath('file5'),
 PosixPath('dir2/subdir3'),
 PosixPath('dir2/subdir2'),
 PosixPath('dir2/subdir1/file8'),
 PosixPath('dir2/subdir2/file9'),
 PosixPath('dir2/file7')}

Now you just need to delete all files and directories in difference.
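
One way to carry out that last step is sketched below. The helper names mismatches() and remove_mismatches() are made up for this example; pl and pr are the left and right paths from the snippet above. The sketch assumes the extra entries should be removed from the right-hand tree, and it processes the deepest paths first so that a directory's extra children are gone before the directory itself is removed.

```python
import shutil
from pathlib import Path


def mismatches(pl, pr):
    """Relative paths that exist under pr but not under pl."""
    left = {p.relative_to(pl) for p in pl.rglob('*')}
    right = {p.relative_to(pr) for p in pr.rglob('*')}
    return right - left


def remove_mismatches(pl, pr):
    """Delete from pr every file and directory that has no counterpart in pl."""
    extra = sorted(mismatches(pl, pr), key=lambda p: len(p.parts), reverse=True)
    for rel in extra:
        target = pr / rel
        if target.is_dir() and not target.is_symlink():
            shutil.rmtree(target)
        elif target.exists() or target.is_symlink():
            target.unlink()
```

As with any bulk deletion, printing mismatches(pl, pr) and checking the result before calling remove_mismatches(pl, pr) is a sensible precaution.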

Compare folder size and run script if the size changed

Tracking only the total size of the directory is limiting. How about keeping a list of files and their sizes instead? That way you can act on changed files and new files. The example below uses a dictionary as a basic history; you can make it as elaborate as you wish, tracking creation dates, modification dates and so on. I have kept the total-size tracking in case that is all you need, but to act on individual files you still need to track which file(s) have changed.

import os
import time


def check_dir(fh, start_path='/tmp', new_cb=None, changed_cb=None):
    """Walk start_path, update the size history fh, and return the total size."""
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(start_path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            if not os.path.islink(fp):
                fs = os.path.getsize(fp)
                total_size += fs
                # key the history on the full path so that files with the
                # same name in different subfolders do not collide
                if fp in fh:
                    if fh[fp] == fs:
                        # file unchanged
                        pass
                    else:
                        if changed_cb:
                            changed_cb(fp)
                else:
                    # new file
                    if new_cb:
                        new_cb(fp)
                fh[fp] = fs

    return total_size


def new_file(fp):
    print("New File {0}!".format(fp))


def changed_file(fp):
    print("File {0} changed!".format(fp))


if __name__ == '__main__':
    file_history = {}
    total = 0

    while True:
        nt = check_dir(file_history, '/tmp/test', new_file, changed_file)
        if total and nt != total:
            print("Total size changed from {0} to {1}".format(total, nt))
        total = nt
        time.sleep(10)
        print("File list:\n{0}".format(file_history))

