Compare files with the same name in 2 folders and check their size to delete the bigger one in Python
Deleting files based on size
This is a simple procedure and can be implemented with a few short functions.
import os

def compare_folders(path1, path2):
    ignore = [".", "..", ".DS_Store"]  # ignore these pointers/files
    names_in_path2 = set(os.listdir(path2))
    for file in os.listdir(path1):
        if file in names_in_path2 and file not in ignore:
            delete_larger_file(os.path.join(path1, file), os.path.join(path2, file))

def merge_folders(path1, path2):
    # move every entry that exists only in path1 into path2
    names_in_path2 = set(os.listdir(path2))
    for file in os.listdir(path1):
        if file not in names_in_path2:
            os.rename(os.path.join(path1, file), os.path.join(path2, file))

def delete_larger_file(path1, path2):
    # on a size tie, the copy at path2 is the one removed
    if os.path.getsize(path1) > os.path.getsize(path2):
        os.remove(path1)
    else:
        os.remove(path2)
What's going on here?
- The first function, compare_folders(), takes the paths of the two folders being compared as inputs. It iterates through the contents of the first folder and, for every file name that also exists in the second folder, calls the other function, delete_larger_file(), which compares the sizes of the 2 files and deletes the larger one.
- A subsequent call to merge_folders() is necessary to merge the folders in place. It moves every file that exists only in the first folder into the second one. In the end, the first folder should be empty (apart from ignored files such as .DS_Store) and the second should hold the smallest copy of every file.
- Be warned: this cannot be undone, so test it on copies first. It also does not work if there are subfolders; handling those requires recursion.
First call compare_folders(), then call merge_folders().
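The two caveats above (no undo, no subfolders) can be softened with a recursive variant that has a dry-run mode. The following is only a sketch and not part of the original answer: the name delete_larger_recursive and the dry_run parameter are made up for illustration, and it assumes files are matched by their path relative to each root.

```python
import os

def delete_larger_recursive(root1, root2, dry_run=True):
    """For each file under root1 that also exists at the same relative path
    under root2, remove the larger copy. With dry_run=True nothing is
    deleted; the paths that would be removed are only returned."""
    to_delete = []
    for dirpath, _dirnames, filenames in os.walk(root1):
        rel_dir = os.path.relpath(dirpath, root1)
        for name in filenames:
            p1 = os.path.join(dirpath, name)
            p2 = os.path.normpath(os.path.join(root2, rel_dir, name))
            if not os.path.isfile(p2):
                continue  # no counterpart in the second tree
            # same tie-breaking as delete_larger_file: on equal size, p2 goes
            victim = p1 if os.path.getsize(p1) > os.path.getsize(p2) else p2
            to_delete.append(victim)
            if not dry_run:
                os.remove(victim)
    return to_delete
```

Calling it once with dry_run=True and inspecting the returned list before repeating the call with dry_run=False gives a chance to catch mistakes before anything is removed.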
How to compare two directories and move files unique (in name or contents) to the 1st directory elsewhere?
You can use jdupes to do this, like so:
Windows:
jdupes.exe -O -R -d -N dir2 dir1
robocopy dir1 dir1 /S /MOVE
rename dir1 dir3
rmdir /S /Q dir2
Linux:
#! /bin/sh
./jdupes -O -R -d -N ./dir2 ./dir1
find ./dir1 -type d -empty -delete
mv ./dir1 ./dir3
rm -rf ./dir2
Note: These scripts/commands only work when they are executed in the same directory (which must also contain the jdupes binary) as the target directories.
Compare size of multiple subdirectories before and after a break in a PowerShell script
If I understood correctly, you want to find the folders whose size has changed after 30 seconds. If that's the case, you can use a function so that you don't need to repeat your code. The function can return a hash table where the keys are the folders' absolute paths and the values are their calculated sizes. Once you have both results (before and after the 30 seconds), you can compare the two hash tables and output a new object with the folder's absolute path, size before, and size after, only for those folders whose calculated size has changed.
function GetFolderSize {
    [cmdletbinding()]
    param($path)
    $map = @{}
    Get-ChildItem $path -Directory -Force | ForEach-Object {
        $Size = (Get-ChildItem $_.Fullname -Recurse | Measure-Object -Property Length -Sum).Sum / 1Kb
        $FolderName = $_.BaseName -match '1B(\d{6})_LEAP 1A version aout2021_(\d{4})-(\d{2})-(\d{2})T(\d{2})h(\d{2})m(\d{2})s_S(\d{6})' -or $_.BaseName -match '1B(\d{6})_SML 10_LEAP 1A version aout2021_(\d{4})-(\d{2})-(\d{2})T(\d{2})h(\d{2})m(\d{2})s_S(\d{5})'
        if ($FolderName) {
            $map[$_.FullName] = $size
        }
    }
    if($map) { $map }
}

$path = "C:\Users\s611284\Desktop\archive"
$before = GetFolderSize $path -ErrorAction SilentlyContinue
Start-Sleep -Seconds 30
$after = GetFolderSize $path -ErrorAction SilentlyContinue

foreach($key in $after.PSBase.Keys) {
    if($before[$key] -ne $after[$key]) {
        # this is a folder with a different size than before
        [PSCustomObject]@{
            FullName   = $key
            SizeBefore = $before[$key]
            SizeAfter  = $after[$key]
        }
    }
}
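For readers following the Python answers on this page, the same snapshot-and-compare idea might be sketched in Python as below. This is an illustration rather than a translation of the script above: the names folder_sizes and changed_folders are made up, and the folder-name regex filter is left out.

```python
import os

def folder_sizes(path):
    """Map each immediate subfolder's absolute path to its total size in KB."""
    sizes = {}
    with os.scandir(path) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            total = 0
            for dirpath, _dirs, files in os.walk(entry.path):
                for name in files:
                    fp = os.path.join(dirpath, name)
                    if not os.path.islink(fp):
                        total += os.path.getsize(fp)
            sizes[os.path.abspath(entry.path)] = total / 1024
    return sizes

def changed_folders(before, after):
    """Return (path, size_before, size_after) for folders whose size differs."""
    return [(p, before.get(p), s)
            for p, s in sorted(after.items())
            if before.get(p) != s]
```

Taking one snapshot, sleeping with time.sleep(30), taking a second snapshot and passing both to changed_folders() would mirror the before/after flow described above. A folder that only appears in the second snapshot is reported with a size_before of None.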
Comparing two directories and then removing mismatches (Python)
You can use Path.rglob:
from pathlib import Path
pl = Path('path/to/left')
pr = Path('path/to/right')
# paths (relative to each root) that exist under right but not under left
difference = (set(map(lambda p: p.relative_to(pr), pr.rglob('*'))) -
              set(map(lambda p: p.relative_to(pl), pl.rglob('*'))))
Here is an example:
right
    file1
    file5
    dir1
        file2
        file6
    dir2
        file3
        file7
        subdir1
            file4
            file8
        subdir2
            file9
        subdir3
left
    file1
    dir1
        file2
    dir2
        file3
        subdir1
            file4
>>> difference
{PosixPath('dir1/file6'),
PosixPath('file5'),
PosixPath('dir2/subdir3'),
PosixPath('dir2/subdir2'),
PosixPath('dir2/subdir1/file8'),
PosixPath('dir2/subdir2/file9'),
PosixPath('dir2/file7')}
Now you just need to delete all files and directories in difference. Note that these paths are relative to pr, so join each one with pr before deleting it.
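One possible way to carry out that deletion is sketched below. The helper name remove_extras is made up for illustration, and the sketch assumes that difference holds paths relative to pr, as computed above.

```python
import shutil
from pathlib import Path

def remove_extras(pr, difference):
    """Delete every path in `difference` (relative to `pr`) from under `pr`."""
    # Shallowest paths first: removing a directory also removes its contents,
    # so deeper entries may already be gone when their turn comes.
    for rel in sorted(difference, key=lambda p: len(p.parts)):
        target = Path(pr) / rel
        if target.is_dir() and not target.is_symlink():
            shutil.rmtree(target)
        elif target.exists() or target.is_symlink():
            target.unlink()
        # otherwise it was already removed together with a parent directory
```

Like the other deletions on this page, this cannot be undone, so it is worth printing difference and checking it before calling the helper.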
Compare folder size and run script if the size changed
Tracking only the total size of the directory is limiting. How about keeping a list of files and their sizes? That way you can act on changed files and new files. A dictionary is used here as a basic example; you can make it as complicated as you wish by also tracking creation dates, modification dates, etc. In case you don't want that complexity, I have retained tracking of the total size, but you still need to track which file(s) have changed.
import os
import time

def check_dir(fh, start_path='/tmp', new_cb=None, changed_cb=None):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(start_path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            if not os.path.islink(fp):
                fs = os.path.getsize(fp)
                total_size += fs
                # key on the full path so same-named files in different
                # subdirectories do not overwrite each other
                if fp in fh:
                    if fh[fp] == fs:
                        # file unchanged
                        pass
                    else:
                        if changed_cb:
                            changed_cb(fp)
                else:
                    # new file
                    if new_cb:
                        new_cb(fp)
                fh[fp] = fs
    return total_size

def new_file(fp):
    print("New File {0}!".format(fp))

def changed_file(fp):
    print("File {0} changed!".format(fp))

if __name__ == '__main__':
    file_history = {}
    total = 0
    try:
        while True:
            nt = check_dir(file_history, '/tmp/test', new_file, changed_file)
            if total and nt != total:
                print("Total size changed from {0} to {1}".format(total, nt))
            total = nt
            time.sleep(10)
    except KeyboardInterrupt:
        # the loop runs forever; Ctrl+C stops it and prints the history
        print("File list:\n{0}".format(file_history))
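As one way of "making it more complicated", the sketch below also records each file's modification time and reports deleted files, which the loop above does not detect. The function names snapshot and diff_snapshots are made up for illustration, and the sketch assumes files do not disappear while the tree is being walked.

```python
import os

def snapshot(root):
    """Map each regular file under root to its (size, mtime_ns) pair."""
    snap = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            fp = os.path.join(dirpath, name)
            if os.path.islink(fp):
                continue
            st = os.stat(fp)
            snap[fp] = (st.st_size, st.st_mtime_ns)
    return snap

def diff_snapshots(old, new):
    """Return sorted lists of added, removed and changed file paths."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return added, removed, changed
```

In a polling loop, each new snapshot would be compared with the previous one and then kept as the baseline for the next round.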