Have one folder with files that have the same name but different file
Your if [ "$file_hash" == "$a" ];
compares a hash with a filename. You need something like
if [ "$file_hash" == "$(md5sum "$a" | cut -d ' ' -f 1)" ];
to compute the hash of each file in the destination folder.
Furthermore, your for loop, in its current version, runs only once; you need something like
for a in "$destination_folder"/*
to get all files in that folder rather than just the folder name.
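The two fixes above can be combined into one small function. This is only a sketch: dir_has_hash is a hypothetical helper name, and it assumes GNU md5sum is available. The extra -f test skips subdirectories and also covers an empty folder, where the unexpanded pattern would otherwise be passed to md5sum.

```bash
# Hypothetical helper: succeed if any regular file in directory $2
# has the MD5 hash given as $1.
dir_has_hash() {
    local wanted="$1" dir="$2" a
    for a in "$dir"/*; do
        # Skip subdirectories, and the literal pattern if the folder is empty.
        [ -f "$a" ] || continue
        if [ "$wanted" = "$(md5sum "$a" | cut -d ' ' -f 1)" ]; then
            return 0
        fi
    done
    return 1
}
```

It could be called as dir_has_hash "$file_hash" "$destination_folder" && echo "already there".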
Based on your edits, a solution would look like
#!/bin/bash -xv
file=$1
destination_folder=$2
file_hash=$(md5sum "$file" | cut -d ' ' -f 1)
# test that the filename exists in the destination dir
if [[ -f "$destination_folder/$file" ]] ; then
    dest_hash=$(md5sum "$destination_folder/$file" | cut -d ' ' -f 1)
    # test whether the hash is the same
    if [[ "$file_hash" == "$dest_hash" ]] ; then
        # identical file already present, do nothing
        :
    else
        # same name but different content: copy under a different name
        cp "$file" "$destination_folder/$file.JPG"
    fi
else
    # destination does not exist, copy file
    cp "$file" "$destination_folder/$file"
fi
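This does not ensure that there are no duplicates. It simply ensures that distinct files with identical names do not overwrite each other. To also skip a file whose content already exists in the destination under any name, compare its hash with every file there: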
#!/bin/bash -xv
file=$1
destination_folder=$2
file_hash=$(md5sum "$file" | cut -d ' ' -f 1)
exists=0
# test each file in destination
for a in "$destination_folder"/*
do
    curr_hash=$(md5sum "$a" | cut -d ' ' -f 1)
    if [ "$file_hash" == "$curr_hash" ];
    then
        # an identical file exists. (maybe under another name)
        # do nothing
        exists=1
        break
    fi
done
if [[ $exists != 1 ]] ; then
    if [[ -f "$destination_folder/$file" ]] ; then
        # same name but different content: copy under a different name
        cp "$file" "$destination_folder/$file.JPG"
    else
        cp "$file" "$destination_folder"
    fi
fi
Not tested.
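The fixed .JPG suffix can still collide when a third file with the same name shows up. One possible alternative, sketched here on the assumption that a numbered suffix is acceptable, is to search for the first free name. unique_dest is a hypothetical helper and not part of the scripts above.

```bash
# Hypothetical helper: print a path inside directory $2 for source file $1
# that does not exist yet, inserting _1, _2, ... before the extension.
unique_dest() {
    local name dir stem ext candidate n
    name=$(basename -- "$1")
    dir="$2"
    case $name in
        ?*.*) stem=${name%.*}; ext=.${name##*.} ;;  # split at the last dot
        *)    stem=$name; ext= ;;                   # no extension (or a dotfile)
    esac
    candidate="$dir/$name"
    n=1
    while [ -e "$candidate" ]; do
        candidate="$dir/${stem}_$n$ext"
        n=$((n + 1))
    done
    printf '%s\n' "$candidate"
}
```

Usage would be something like cp "$file" "$(unique_dest "$file" "$destination_folder")". The check and the copy are not atomic, so this is only suitable when nothing else writes to the folder at the same time.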
Files with same name in different folders
Try this to get the files which have names in common:
(cd dir1 && find . -type f | sort > /tmp/dir1.txt)
(cd dir2 && find . -type f | sort > /tmp/dir2.txt)
comm -12 /tmp/dir1.txt /tmp/dir2.txt
Then use a loop to do whatever you need:
comm -12 /tmp/dir1.txt /tmp/dir2.txt | while IFS= read -r filename; do
    cat "dir1/$filename"
    cat "dir2/$filename"
done
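If you need this more than once, the steps can be wrapped in a function that cleans up its own temporary files. This is a sketch only: common_files is a hypothetical name, and it assumes file names contain no newlines. Forcing the C locale keeps sort and comm in agreement about ordering.

```bash
# Hypothetical helper: print the relative paths of regular files that
# exist in both directory trees $1 and $2, one per line.
common_files() {
    local list1 list2
    list1=$(mktemp) || return 1
    list2=$(mktemp) || { rm -f "$list1"; return 1; }
    # Subshells keep each cd from affecting the caller or the other listing.
    (cd "$1" && find . -type f | LC_ALL=C sort) > "$list1"
    (cd "$2" && find . -type f | LC_ALL=C sort) > "$list2"
    LC_ALL=C comm -12 "$list1" "$list2"
    rm -f "$list1" "$list2"
}
```

The output can then be fed to a loop, for example common_files dir1 dir2 | while IFS= read -r filename; do ...; done.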
Find same file name in different folders, and utilize two files for a certain task in python
Use Python's os.walk() to get a list of the files in both folders. Then match them as needed and run whichever function you want on each pair. See the Python documentation for os.walk() to understand how to use it.
How to pack files into a ZIP file which have same name but different suffixes?
The task could be done with the following batch file:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
cd /D "C:\folder" || exit /B
for %%I in ("video\*-video.*") do (
    set "VideoFileName=%%~nxI"
    setlocal EnableDelayedExpansion
    set "CommonName=!VideoFileName:-video%%~xI=!"
    "%ProgramFiles%\7-Zip\7z.exe" a -bd -bso0 -mx0 -y -- "!CommonName!.zip" "cover\!CommonName!-art.*" "image\!CommonName!-screen.*" "mix\!CommonName!-HDphoto.*" "video\!VideoFileName!"
    endlocal
)
endlocal
The first two command lines of the batch file define the required execution environment:
- command echo mode turned off
- command extensions enabled
- delayed expansion disabled
The next command changes the current directory to C:\folder. If that command fails because the directory does not exist or a UNC path is used instead of a path starting with a drive letter and a colon, batch file processing exits without any message other than the error message output by command CD.
The FOR loop searches in subdirectory video for non-hidden files matching the wildcard pattern *-video.*. The file name with file extension of each video file found is assigned to the environment variable VideoFileName.
Next, delayed expansion is enabled as required to make use of the file name assigned to the environment variable VideoFileName. Please read this answer for details on what happens in the background on execution of setlocal EnableDelayedExpansion and later on execution of the corresponding endlocal.
A case-insensitive string substitution using delayed expansion removes from the video file name the string -video left of the file extension together with the file extension itself, which can be .mp4, .mpg, .mpeg, etc. VideoFileName was defined with the file extension in case the file name itself contains the string -video somewhere else, where it should not be removed by the string substitution. For example, My Home-Video-video.mp4 assigned to VideoFileName results in My Home-Video being assigned to the environment variable CommonName, because the file extension is also taken into account in the string substitution.
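For readers more at home in a Unix shell, the same name transformation can be tried out in bash. This is only an analogous sketch and not part of the batch solution: common_name is a hypothetical helper, and unlike the batch substitution the bash suffix removal is case-sensitive.

```bash
# Hypothetical helper: strip "-video" plus the file extension from the end
# of a video file name, leaving any earlier "-video" in the name untouched.
common_name() {
    local name="$1" ext common
    ext=${name##*.}                    # text after the last dot, e.g. mp4
    common=${name%-video."$ext"}       # remove only the trailing -video.<ext>
    printf '%s\n' "$common"
}
```

Calling common_name "My Home-Video-video.mp4" prints My Home-Video, matching the example above.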
Next, 7-Zip is executed with the command a and the switches as posted in the question. It creates, or adds to, a ZIP file in the current directory whose name is the common name part of the files in the four directories plus the file extension .zip, and packs into it the video file and the matching image files from the other three directories, whatever file extension those images have.
Then endlocal is executed to restore the previous environment with delayed expansion disabled again.
The commands setlocal EnableDelayedExpansion and endlocal are used inside the FOR loop so that a file collection with a common name like Superman & Batman (+ Robin!), which contains an exclamation mark, is also processed correctly.
The video files in subdirectory video define which files are packed into a ZIP file. So all image files in the three image directories for which no video file exists in directory video are ignored.
Note: The batch file cannot correctly process video file names which contain an exclamation mark in the file extension. I doubt that this limitation is ever a problem as I have never seen a video file extension with an exclamation mark.
To understand the commands used and how they work, open a command prompt window, execute the following commands there, and read all help pages displayed for each command carefully.
cd /?
echo /?
endlocal /?
exit /?
set /?
setlocal /?
See also single line with multiple commands using Windows batch file for an explanation of the conditional operator ||.
Merge files with same name in more than 100 folders
The cat command is just about the simplest there is, so there is no obvious and portable way to make the copying of file contents any faster. The bottleneck is probably going to be finding the files anyway, not copying them. If the files are indeed all in subdirectories immediately below the root directory,
cat /*/replaced_txt >merged_txt
will expand the wildcard alphabetically (so /folder10/replaced_txt comes before /folder2/replaced_txt) but might run into "Argument list too long" and/or take a long time to expand the wildcard if some of these directories are large (especially on an older Linux system with an ext3 filesystem, which doesn't scale to large directories very well). A more general solution is find, which is better at finding files in arbitrarily nested subdirectories, and won't run into "Argument list too long" because it never tries to assemble all the file names into an alphabetized list; instead, it just enumerates the files it finds as it traverses directories in whichever order the filesystem reports them, and creates a new cat process when the argument list fills up to the point where the system's ARG_MAX limit would be exceeded.
find / -xdev -type f -name replaced_txt -exec cat {} + >merged_txt
If you want to limit how far subdirectories will be traversed or you only want to visit some directories, look at the find man page for additional options.
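As an illustration of such options, here is a sketch that limits the depth and also merges the files in natural order, so that folder2 comes before folder10. It assumes GNU findutils and coreutils, since -maxdepth, -print0, sort -z -V and xargs -0 -r are extensions rather than POSIX. merge_named is a hypothetical helper name.

```bash
# Hypothetical helper: write to standard output the contents of every file
# named $2 found at most $3 levels below directory $1, in version order.
merge_named() {
    find "$1" -maxdepth "$3" -type f -name "$2" -print0 |
        sort -zV |           # NUL-separated natural sort: folder2 before folder10
        xargs -0 -r cat --   # -r: do not run cat at all when nothing was found
}
```

For the layout in the question this could be run as merge_named / replaced_txt 2 >merged_txt. Sorting means all paths are collected before anything is printed, which trades away some of the streaming behaviour of the plain find command.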