Splitting a Large Directory into Smaller Ones in Linux

Using bash:

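x=("path/to/dir1" "path/to/dir2" "path/to/dir3")   # target directories, which must already exist
c=0
for f in *
do
    [ -f "$f" ] || continue          # skip directories, including the one holding the targets
    mv "$f" "${x[c]}"
    c=$(( (c+1) % ${#x[@]} ))        # advance to the next target, wrapping around
done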
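
The loop above assumes the target directories already exist. A minimal sketch of the same round-robin idea wrapped in a function is shown below; the name distribute_round_robin and its argument order are made up for illustration, and it only moves regular files:

```shell
#!/bin/bash
# Hypothetical helper: distribute_round_robin SRC_DIR TARGET_DIR...
# Moves each regular file directly inside SRC_DIR into the targets in rotation.
distribute_round_robin() {
    local src=$1
    shift
    local targets=("$@")
    local n=${#targets[@]}
    local c=0 f

    (( n > 0 )) || { echo "need at least one target directory" >&2; return 1; }
    mkdir -p -- "${targets[@]}" || return 1      # create missing targets

    for f in "$src"/*; do
        [ -f "$f" ] || continue                  # skip directories and an unmatched glob
        mv -- "$f" "${targets[c]}/" || return 1
        c=$(( (c + 1) % n ))                     # next target, wrapping around
    done
}
```

It could be called as distribute_round_robin . path/to/dir1 path/to/dir2 path/to/dir3. Because directories are skipped, targets that live under the source directory are not moved into each other.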

Split a folder into multiple subfolders in terminal/bash script

This solution can handle names with whitespace and wildcards and can be easily extended to support less straightforward tree structures. It will look for files in all direct subdirectories of the working directory and sort them into new subdirectories of those. New directories will be named 0, 1, etc.:

#!/bin/bash

maxfilesperdir=20

# loop through all top level directories:
while IFS= read -r -d $'\0' topleveldir
do
    # enter top level subdirectory:
    cd "$topleveldir"

    declare -i filecount=0 # number of moved files per dir
    declare -i dircount=0  # number of subdirs created per top level dir

    # loop through all files in that directory and below
    while IFS= read -r -d $'\0' filename
    do
        # whenever file counter is 0, make a new dir:
        if [ "$filecount" -eq 0 ]
        then
            mkdir "$dircount"
        fi

        # move the file into the current dir:
        mv "$filename" "${dircount}/"
        filecount+=1

        # whenever our file counter reaches its maximum, reset it, and
        # increase dir counter:
        if [ "$filecount" -ge "$maxfilesperdir" ]
        then
            dircount+=1
            filecount=0
        fi
    done < <(find . -type f -print0)

    # go back to top level:
    cd ..
done < <(find . -mindepth 1 -maxdepth 1 -type d -print0)

The find -print0/read combination with process substitution has been stolen from another question.

It should be noted that simple globbing can handle all kinds of strange directory and file names as well. It is however not easily extensible for multiple levels of directories.
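
As a rough illustration of the globbing alternative, the sketch below splits the files of a single directory into numbered subdirectories. The function name split_dir_by_count is hypothetical, only regular files directly inside the directory are considered, and dotfiles are left alone because * does not match them by default:

```shell
#!/bin/bash
# Hypothetical helper: split_dir_by_count DIR MAX_FILES_PER_DIR
# Moves the regular files directly inside DIR into DIR/0, DIR/1, ...
split_dir_by_count() {
    local dir=$1 max=$2
    local filecount=0 dircount=0 f

    for f in "$dir"/*; do
        [ -f "$f" ] || continue                  # skip subdirectories and an unmatched glob
        if (( filecount == 0 )); then
            mkdir -p -- "$dir/$dircount" || return 1
        fi
        mv -- "$f" "$dir/$dircount/" || return 1
        if (( ++filecount >= max )); then        # current subdirectory is full
            filecount=0
            dircount=$(( dircount + 1 ))
        fi
    done
}
```

Quoting "$f" is what keeps names with whitespace or wildcard characters intact; the glob itself is expanded once, before any file is moved.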

Split large directory into subdirectories

I suspect that if you checked, you'd notice your program was actually moving the files, albeit really slowly. Launching a program is rather expensive (at least compared to making a system call), and you do so three or four times per file! As such, the following should be much faster:

perl -e'
   my $base_dir_qfn = ".";
   my $i = 0;
   my $dir_qfn;
   opendir(my $dh, $base_dir_qfn)
      or die("Can'\''t open dir \"$base_dir_qfn\": $!\n");

   while (defined( my $fn = readdir($dh) )) {
      next if $fn =~ /^(?:\.\.?|dir_\d+)\z/;

      my $qfn = "$base_dir_qfn/$fn";

      if ($i % 1000 == 0) {
         $dir_qfn = sprintf("%s/dir_%03d", $base_dir_qfn, int($i/1000)+1);
         mkdir($dir_qfn)
            or die("Can'\''t make directory \"$dir_qfn\": $!\n");
      }

      rename($qfn, "$dir_qfn/$fn")
         or do {
            warn("Can'\''t move \"$qfn\" into \"$dir_qfn\": $!\n");
            next;
         };

      ++$i;
   }
'

How can I split a large text file into smaller files with an equal number of lines?

Have a look at the split command:

$ split --help
Usage: split [OPTION] [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
size is 1000 lines, and default PREFIX is `x'. With no INPUT, or when INPUT
is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   use suffixes of length N (default 2)
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file
  -d, --numeric-suffixes  use numeric suffixes instead of alphabetic
  -l, --lines=NUMBER      put NUMBER lines per output file
      --verbose           print a diagnostic to standard error just
                            before each output file is opened
      --help     display this help and exit
      --version  output version information and exit

You could do something like this:

split -l 200000 filename

which will create files named xaa xab xac ..., each with 200000 lines (the last one holds whatever remains).
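
To see the naming and line counts on a small scale, here is a throwaway sketch that splits a 10-line file into pieces of 4 lines. The file name numbers.txt and the prefix part_ are arbitrary choices for the example:

```shell
#!/bin/bash
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

seq 1 10 > "$workdir/numbers.txt"                    # sample input: 10 lines

# 4 lines per piece, output names start with "part_"
split -l 4 "$workdir/numbers.txt" "$workdir/part_"

wc -l "$workdir"/part_*                              # line count of each piece
```

This produces part_aa and part_ab with four lines each and part_ac with the remaining two.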

Another option is to split by output file size, while still breaking only at line ends:

split -C 20m --numeric-suffixes input_filename output_prefix

creates files like output_prefix00 output_prefix01 output_prefix02 ... each of maximum size 20 megabytes.
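
A quick way to convince yourself that no data is lost is to concatenate the pieces and compare them with the input. The sketch below assumes GNU coreutils split and uses a tiny 20-byte limit and made-up names (input.txt, chunk_) purely for illustration:

```shell
#!/bin/bash
# Assumes GNU coreutils split (-C and --numeric-suffixes are GNU options).
scratch=$(mktemp -d)

seq 1 100 > "$scratch/input.txt"

# At most 20 bytes of whole lines per piece; pieces are chunk_00, chunk_01, ...
split -C 20 --numeric-suffixes "$scratch/input.txt" "$scratch/chunk_"

# Concatenating the pieces in glob order should reproduce the input exactly.
if cat "$scratch"/chunk_* | cmp -s - "$scratch/input.txt"; then
    echo "pieces reassemble to the original"
else
    echo "mismatch after reassembly" >&2
fi
```

The glob sorts chunk_00, chunk_01, ... in the same order split wrote them, which is why a plain cat is enough to put the file back together.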

split directory with 10000 files into 2 directories

One approach is to iterate over the files in the folder, keep a counter, and move each file to one of the two directories in turn:

counter=0
mkdir -p logos-0
mkdir -p logos-1
for file in logos/*
do
    [ -e "$file" ] || continue
    echo mv "$file" "logos-$((counter++%2))/"
done

Remove the echo once the printed mv commands look right.
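
After the real run, it may be worth checking that the files ended up evenly distributed. The helper below is a hypothetical sketch (count_files is not a standard command) that counts directory entries with a glob, so dotfiles are not included:

```shell
#!/bin/bash
# Hypothetical helper: count_files DIR
# Prints the number of non-hidden entries directly inside DIR.
count_files() {
    local dir=$1 f n=0
    for f in "$dir"/*; do
        [ -e "$f" ] && n=$(( n + 1 ))    # an unmatched glob is not counted
    done
    echo "$n"
}

for d in logos-0 logos-1; do
    echo "$d: $(count_files "$d")"
done
```

With 10000 input files and two targets, each line should report 5000.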


