How to Use Find on Dirs with White Spaces

How to use find on dirs with white spaces?

It can be done in several ways, but I find it much better to do it this way:

find . -type f -name \*.jpg | while IFS= read -r i ; do echo "Processing $i..." ; done
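That loop still breaks on filenames containing newlines, because read stops at the first newline. Below is a minimal sketch of a NUL-delimited variant. It assumes bash and a find that supports -print0 (GNU and BSD find both do). The scratch directory and file names are made up for the demo.

```shell
#!/usr/bin/env bash
# Scratch tree with awkward names: a directory with spaces, and a file
# whose name contains a newline (all names here are made up for the demo)
tmp=$(mktemp -d)
mkdir -p "$tmp/dir with spaces"
touch "$tmp/dir with spaces/a b.jpg" "$tmp/dir with spaces/line
break.jpg"

count=0
# -print0 ends each path with NUL, which can never occur inside a path;
# read -d '' reads up to each NUL, while IFS= and -r keep the name verbatim
while IFS= read -r -d '' i; do
  echo "Processing $i..."
  count=$((count + 1))
done < <(find "$tmp" -type f -name '*.jpg' -print0)

rm -rf "$tmp"
echo "processed $count files"
```

Feeding the loop with < <(...) rather than a pipe keeps the loop in the current shell, so count is still set after it finishes.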

Bash: Find directories with trailing spaces in their names

The following is written to be easy to follow (as a secondary goal) and correct in corner cases (as a primary goal):

# because "find"'s usage is dense, we're defining that command in an array, so each
# ...element of that array can have its usage described.
find_cmd=(
  find                   # run the tool 'find'
  .                      # searching from the current directory
  -depth                 # visit a directory's contents before the directory itself, so our renames don't invalidate paths still to come
  -type d                # including only directories in results
  -name '*[[:space:]]'   # and filtering *those* for ones that end in whitespace
  -print0                # ...delimiting output with NUL characters
)

shopt -s extglob                            # turn on extended glob syntax
while IFS= read -r -d '' source_name; do    # read NUL-separated values to source_name
  dest_name=${source_name%%+([[:space:]])}  # trim trailing whitespace from name
  mv -- "$source_name" "$dest_name"         # rename source_name to dest_name
done < <("${find_cmd[@]}")                  # w/ input from the find command defined above

See also:

  • BashFAQ #1, describing the while read loop's syntax in general, and the purposes of the specific amendments (IFS=, read -r, etc.) above.
  • BashFAQ #100, describing how to do string manipulation in bash (particularly the ${var%suffix} and ${var%%suffix} forms of "parameter expansion", used above to trim a suffix from a value).
  • Using Find, providing a general introduction both to the use of find and its integration with bash.
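Here is a small sketch on a made-up name that shows what the trimming expansion does. It also shows why %% rather than % is paired with the extglob pattern: % removes only the shortest matching suffix.

```shell
#!/usr/bin/env bash
shopt -s extglob   # needed for the +(...) pattern; must be on before the line is parsed

name='./some dir   '   # made-up directory name with three trailing spaces

shortest=${name%+([[:space:]])}   # % removes the shortest match: one space
longest=${name%%+([[:space:]])}   # %% removes the longest match: all trailing whitespace

printf '[%s]\n' "$shortest" "$longest"
# [./some dir  ]
# [./some dir]
```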

Get the name of the folder from a path with whitespace

When dealing with whitespace, double-quote every variable you pass as a command-line argument, so bash treats each one as a single argument:

mypath="/Users/ckull/Desktop/Winchester stuff/a b c/some other folder/"
dir="$(basename "$mypath")" # quote also around $mypath!
echo "looking in $dir"
# examples
ls "$dir" # quote only around $dir!
cp "$dir/a.txt" "$dir/b.txt"
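basename drops the trailing slash on its own. If you would rather not start an extra process, parameter expansion gives the same result. This sketch reuses the example path and assumes the path ends in at most one slash.

```shell
#!/usr/bin/env bash
mypath="/Users/ckull/Desktop/Winchester stuff/a b c/some other folder/"

# basename ignores the trailing slash and returns the last component
dir=$(basename "$mypath")

# Same result without a subprocess (assumes at most one trailing slash)
trimmed=${mypath%/}    # drop the trailing slash
dir2=${trimmed##*/}    # drop everything up to the last remaining slash

printf '%s\n' "$dir" "$dir2"
# some other folder
# some other folder
```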

This is how word splitting treats unquoted and quoted variable expansions in bash:

var="aaa bbb"
# args: 0 1 2 3
foo $var ccc # ==> "foo" "aaa" "bbb" "ccc"
foo "$var" ccc # ==> "foo" "aaa bbb" "ccc"
foo "$var ccc" # ==> "foo" "aaa bbb ccc"
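To try the three cases yourself, define a throwaway foo that reports what it received. This stand-in is hypothetical. $# counts the arguments after the command name, so the totals are one less than the "args" numbering above.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for "foo": prints its argument count, then each argument in brackets
foo() {
  printf '%d:' "$#"
  printf ' [%s]' "$@"
  printf '\n'
}

var="aaa bbb"
foo $var ccc      # 3: [aaa] [bbb] [ccc]
foo "$var" ccc    # 2: [aaa bbb] [ccc]
foo "$var ccc"    # 1: [aaa bbb ccc]
```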

Find all files inside a folder that do not contain a whitespace

find . -type f \( -exec grep -q '[[:space:]]' {} \; -o -print \)

When grep finds whitespace in a file, it exits with "success". If the command in -exec succeeds, find goes on to evaluate the next predicate. When the next operator is -o ("OR"), however, find evaluates the predicate after it only if the -exec command failed. That's why the above works: it tests each file for whitespace, and prints only the files whose contents do not match. (The parentheses are necessary so that -type f isn't also subject to the "or". Otherwise we would also get everything that is not a file, such as directory names.) You can limit it to just *.js files, if you like:

find . -type f -name '*.js' \
\( -exec grep -q '[[:space:]]' {} \; -o -print \)
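You can check the expression quickly by running it on a scratch directory. The file names and contents below are made up. Note that a.js still counts as whitespace-free even though it ends with a newline, because grep never matches against line terminators.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
printf 'nospace\n'   > "$tmp/a.js"   # only a trailing newline
printf 'has space\n' > "$tmp/b.js"
printf 'has\ttab\n'  > "$tmp/c.js"   # printf turns \t into a real tab

# Print only the .js files in which grep finds no whitespace
result=$(find "$tmp" -type f -name '*.js' \
  \( -exec grep -q '[[:space:]]' {} \; -o -print \))

rm -rf "$tmp"
printf '%s\n' "$result"
```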

It's worth noting that grep is not a good tool if you want to detect newlines: it reads its input line by line and never matches against the line terminators. For that, you might consider something brute-force:

# $d is the directory being searched
for file in "$d"/*.js; do
  origcheck=$(md5sum < "$file")
  nospacecheck=$(tr -d '[:space:]' < "$file" | md5sum)
  [[ "$origcheck" = "$nospacecheck" ]] || printf '%s\n' "$file"
done

This computes a checksum of each matching file with and without its whitespace. If the two checksums are the same, the file never had any whitespace. (But tr also deletes newlines, and many files end with one, so such files will be reported even if they contain no other whitespace.)
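md5sum comes from GNU coreutils, while macOS ships md5 instead. Here is a sketch of a swapped-in variant that compares byte counts from wc -c instead of checksums, so it relies only on POSIX tools. has_whitespace is a made-up helper name. Like the checksum version, it flags a file whose only whitespace is its final newline.

```shell
#!/usr/bin/env bash
# has_whitespace (hypothetical helper): succeeds when the file has fewer
# bytes after tr deletes every whitespace byte, newlines included
has_whitespace() {
  local total stripped
  total=$(wc -c < "$1")
  stripped=$(tr -d '[:space:]' < "$1" | wc -c)
  (( total != stripped ))   # arithmetic context tolerates wc's leading blanks
}

tmp=$(mktemp -d)
printf 'abc'   > "$tmp/plain.js"     # no whitespace, not even a final newline
printf 'abc\n' > "$tmp/newline.js"   # the only whitespace is the final newline
printf 'a b'   > "$tmp/space.js"

found=$(for file in "$tmp"/*.js; do
  has_whitespace "$file" && printf '%s\n' "${file##*/}"
done)

rm -rf "$tmp"
printf '%s\n' "$found"
```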

Notes on the original approach:

The grep manpage on my computer says

-L … Only the names of files not containing selected lines are written…
If the standard input is searched, the string ``(standard input)'' is written.

But the standards do not mention -L, so there is no guarantee that it behaves that way in other implementations. Here are some experiments:

Quick sanity check:

$ grep -L '[a]' <<< 'a'
$ grep -L '[a]' <<< 'b'
(standard input)

So far, so good.

$ grep -L '[ \t]' <<< 'ab c'
$ grep -L '[ \t]' <<< $'ab\tc'
(standard input)

(In bash, we can write literal characters like tabs and newlines with a special form of quoting that interprets backslash escapes. Here, $'\t' expands to a literal tab character.) So we see that the string with the space is a match, but the string with the literal tab is not a match.

$ grep -L '[ \t]' <<< t
$ grep -L '[ \t]' <<< '\'
$

The fact that a literal 't' is a match is evidence that the backslash-t is not a tab to grep. A literal backslash is a match, too, so it seems the expression is being taken by grep at face value. Well, we know one way to express a real tab:

$ grep -L $'[ \t]' <<< $'\t'
$ grep -L $'[ \t]' <<< 't'
(standard input)
$ grep -L $'[ \t]' <<< '\'
(standard input)

So the problem with the original expression was that we weren't looking for files that had no spaces or tabs: we were looking for files that had no spaces, backslashes, or 't' characters.
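If the goal is "space or tab" without relying on bash's $'...' quoting, the POSIX character class [[:blank:]] covers exactly those two characters in the POSIX locale. The sketch below reruns the inputs from the experiments above against that class.

```shell
#!/usr/bin/env bash
# Same inputs as the experiments: space, tab, plain text, a 't', a backslash
for s in 'ab c' $'ab\tc' 'abc' 't' '\'; do
  if printf '%s\n' "$s" | grep -q '[[:blank:]]'; then
    printf 'match:    %q\n' "$s"
  else
    printf 'no match: %q\n' "$s"
  fi
done
```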

I avoided talking about * until now, but it matches zero or more occurrences of the preceding item. So even if you get the character class to match the right characters, following it with an asterisk makes the pattern match every line:

$ grep -L $'[ \t]*' <<< $'\t'
$ grep -L $'[ \t]*' <<< t
$

Do the above input strings have zero or more tab characters? Yes. Both of them do. You just want to find one character, so don't make it complicated.

But what about [[:space:]]?

$ grep -L '[[:space:]]' <<< ' '
$ grep -L '[[:space:]]' <<< $'\t'
$ grep -L '[[:space:]]' <<< x
(standard input)

Well, this one I can't explain. It all works as expected on both machines I tested it on (OS X and Linux). Perhaps you originally had an asterisk after '[[:space:]]'? I don't know. It's a mystery.

find … | xargs

Piping find to xargs can introduce problems of its own. By default xargs splits its input on blanks and newlines (and treats quotes and backslashes specially), so a filename containing spaces arrives at the command as several separate arguments. It's a rare enough case that many simply don't think or care about it, but it can and does happen, and it's not hard to solve.

First, find has -exec, so instead of

find . -some -predicate | xargs some command

you can simply write

find . -some -predicate -exec some command {} +

If, for some reason, you really want to use xargs (perhaps to take advantage of parallelization), then tell both find and xargs that filenames are delimited with the NUL character instead of whitespace:

find . -some -predicate -print0 | xargs -0 some command
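The sketch below makes the difference visible on a scratch file with the made-up name "one two.txt". It uses printf as a stand-in for "some command", so each argument it receives is printed in brackets.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
touch "$tmp/one two.txt"

# Plain xargs splits the name at the space: two bogus arguments
split=$(find "$tmp" -name '*.txt' | xargs printf '[%s]\n')

# -exec ... {} + passes each path as a single argument
viaexec=$(find "$tmp" -name '*.txt' -exec printf '[%s]\n' {} +)

# The NUL-delimited hand-off keeps the name intact as well
vianul=$(find "$tmp" -name '*.txt' -print0 | xargs -0 printf '[%s]\n')

rm -rf "$tmp"
printf '%s\n' "$split" --- "$viaexec" --- "$vianul"
```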

Find files with spaces in their names in bash

Use the find command with a pattern that has a space between two wildcards; it matches names containing one or more spaces. "find ." searches the current folder and all of its subfolders, and "-type f" looks only for files, not folders.

find . -type f -name "* *"

EDIT

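To replace the spaces with underscores (in the file name only, leaving directory names alone), try this:

find . -type f -name "* *" -print0 | while IFS= read -r -d '' file; do
  base=${file##*/}; mv -- "$file" "${file%/*}/${base// /_}"; done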
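Substituting over the whole path would also rewrite a directory name such as "my photos", and mv would then fail because the target directory does not exist. Below is a sketch on a made-up scratch tree that rewrites only the last path component. It uses NUL delimiters so any filename survives the hand-off.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
mkdir -p "$tmp/my photos"
touch "$tmp/my photos/beach day.jpg"

find "$tmp" -type f -name '* *' -print0 | while IFS= read -r -d '' file; do
  base=${file##*/}                          # file name only
  mv -- "$file" "${file%/*}/${base// /_}"   # directory part left untouched
done

result=$(find "$tmp" -type f)
rm -rf "$tmp"
printf '%s\n' "$result"
```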

Removing white spaces from files but not from directories throws an error

What is passed to xargs is the full path of the file, not just the file name. So your s/ // substitution removes the first space anywhere in the path, which here is in a directory name. And as the new directory (without that space) doesn't exist, you get the error you see. The renaming, in your example, was:

./FOLDER WITH SPACES/FOLDER1.1/SUBFOLDER1.1/FILE.01 A.jpg ->
./FOLDERWITH SPACES/FOLDER1.1/SUBFOLDER1.1/FILE.01 A.jpg

And this is not possible if the directory ./FOLDERWITH SPACES/FOLDER1.1/SUBFOLDER1.1 doesn't already exist.

Try the -d option of rename:

find . -type f -name '* *' -print0 | xargs -0 rename -d 's/ //'

(The -d option of the Perl-based rename renames only the filename component of the path. The util-linux rename shipped on some distributions takes different arguments.)

Note that you don't need xargs. You could use the -execdir action of find:

find . -type f -name '* *' -execdir rename 's/ //' {} +

And as the -execdir command is executed in the subdirectory containing the matched file, you don't need the -d option of rename anymore. Nor do you need the -print0 action of find.

Last note: if you want to remove all spaces in the file names, not just the first one, do not forget to add the g flag: rename 's/ //g'.
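If the Perl rename isn't installed, the same -execdir idea works with bash doing the substitution. This sketch assumes a find that supports -execdir ... {} + (GNU and BSD find do), and the tree is a made-up copy of the example layout. Because {} names the file relative to its own directory, the substitution can only touch the file name. ${f// /} removes all spaces, like rename's g flag.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
mkdir -p "$tmp/FOLDER WITH SPACES/SUBFOLDER1.1"
touch "$tmp/FOLDER WITH SPACES/SUBFOLDER1.1/FILE.01 A.jpg"

# bash runs inside each matched file's directory; "_" fills $0 and the
# matched names become "$@", which "for f" walks through
find "$tmp" -type f -name '* *' -execdir bash -c '
  for f; do mv -- "$f" "${f// /}"; done
' _ {} +

result=$(find "$tmp" -type f)
rm -rf "$tmp"
printf '%s\n' "$result"
```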


