Removing a Non Empty Directory Programmatically in C or C++

You want to write a function that enumerates the children of a directory. (A recursive function is easiest, but it can run out of stack space on deeply nested directories.) If a child is a directory, you recurse on it; otherwise, you delete it. When you are done, the directory is empty and you can remove it with the appropriate system call (rmdir() on Unix).

To enumerate directories on Unix, you can use opendir(), readdir(), and closedir(). To remove, you use rmdir() on an empty directory (i.e. at the end of your function, after deleting the children) and unlink() on a file. Note that on many systems the d_type member in struct dirent is not supported; on these platforms, you will have to use stat() (or lstat(), if a symbolic link to a directory should be removed rather than followed) and S_ISDIR(statbuf.st_mode) to determine if a given path is a directory.
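
The d_type fallback described above can be sketched roughly as follows. The helper name entry_is_directory() is made up for illustration, and the _DIRENT_HAVE_D_TYPE test assumes a C library (such as glibc) that defines that macro when d_type exists; where it is not defined, the sketch simply always calls lstat().

```c
#include <dirent.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if the entry is a directory, 0 if it is
 * not, and -1 if the type could not be determined.
 * full_path must be the complete path of the entry (directory + "/" + name). */
int entry_is_directory(const char *full_path, const struct dirent *entry)
{
#if defined(_DIRENT_HAVE_D_TYPE) && defined(DT_UNKNOWN)
    /* Fast path: trust d_type unless the filesystem left it unset. */
    if (entry->d_type != DT_UNKNOWN)
        return entry->d_type == DT_DIR;
#else
    (void)entry;   /* d_type not available on this platform */
#endif
    struct stat sb;

    /* lstat() reports on a symbolic link itself, so a link to a directory
     * counts as "not a directory" and would be unlinked, not followed. */
    if (lstat(full_path, &sb) != 0)
        return -1;
    return S_ISDIR(sb.st_mode) ? 1 : 0;
}
```

Even where d_type is supported, some filesystems report DT_UNKNOWN, so the lstat() fallback is needed in any case.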

On Windows, you will use FindFirstFile()/FindNextFile() to enumerate, RemoveDirectory() on empty directories, and DeleteFile() to remove files.

Here's an example that might work on Unix (completely untested):

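#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Recursively remove path and everything below it. Returns 0 on success, -1 on failure. */
int remove_directory(const char *path) {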
   DIR *d = opendir(path);
   size_t path_len = strlen(path);
   int r = -1;

   if (d) {
      struct dirent *p;

      r = 0;
      while (!r && (p=readdir(d))) {
          int r2 = -1;
          char *buf;
          size_t len;

          /* Skip the names "." and ".." as we don't want to recurse on them. */
          if (!strcmp(p->d_name, ".") || !strcmp(p->d_name, ".."))
             continue;

          len = path_len + strlen(p->d_name) + 2; 
          buf = malloc(len);

          if (buf) {
             struct stat statbuf;

             snprintf(buf, len, "%s/%s", path, p->d_name);
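             /* lstat(), not stat(): a symbolic link to a directory must be
                unlinked, not followed and emptied. */
             if (!lstat(buf, &statbuf)) {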
                if (S_ISDIR(statbuf.st_mode))
                   r2 = remove_directory(buf);
                else
                   r2 = unlink(buf);
             }
             free(buf);
          }
          r = r2;
      }
      closedir(d);
   }

   if (!r)
      r = rmdir(path);

   return r;
}

How to delete a directory and its contents in (POSIX) C?

You need to use nftw() (or possibly ftw()) to traverse the hierarchy.
You need to use unlink() to remove files and other non-directories.
You need to use rmdir() to remove (empty) directories.

You would be better off using nftw() (rather than ftw()) since it gives you controls such as FTW_DEPTH to ensure that all files under a directory are visited before the directory itself is visited.
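
A minimal sketch of that approach, assuming a POSIX system. The names remove_entry() and remove_tree() and the descriptor limit of 16 are my own choices for illustration, not anything prescribed by nftw().

```c
#define _XOPEN_SOURCE 700   /* must precede every #include so <ftw.h> declares nftw() */
#include <ftw.h>
#include <unistd.h>

/* Callback that nftw() invokes once for every entry in the tree. */
static int remove_entry(const char *path, const struct stat *sb,
                        int typeflag, struct FTW *ftwbuf)
{
    (void)sb;
    (void)ftwbuf;

    /* With FTW_DEPTH, a directory is reported as FTW_DP, after its contents
     * have already been visited (FTW_DNR: a directory that cannot be read). */
    if (typeflag == FTW_DP || typeflag == FTW_DNR)
        return rmdir(path);

    /* Files, symbolic links and other non-directories. */
    return unlink(path);
}

/* Hypothetical helper: remove path and everything below it.
 * Returns 0 on success; a non-zero callback result stops the walk
 * and is passed back to the caller. */
int remove_tree(const char *path)
{
    /* FTW_DEPTH: children before parents; FTW_PHYS: do not follow symlinks. */
    return nftw(path, remove_entry, 16, FTW_DEPTH | FTW_PHYS);
}
```

A call such as remove_tree("/some/directory") then removes the whole hierarchy, or stops at the first entry that cannot be removed and leaves errno set by the failing call.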

If I want to empty a directory, is there any reason I shouldn't remove it and recreate it?

The rm -fr /some/directory command does the recursive work for you. Using the command is a lot simpler than writing your own code to do the same job; it embodies the virtue of laziness and exploits code reuse on the program scale. (You can't use the rmdir() system call on a directory unless it is already empty.) So that's not wholly unreasonable.

One issue is security: can someone place an alternative to the system's rm command on your path? On Unix, are /bin and /usr/bin first in your path, or do other directories come first?

If you decide that using /bin/rm (or /usr/bin/rm) is safe enough, that might be a better choice than an unadorned rm, but overall, that isn't too bad.

Another issue is a different aspect of security: can you actually remove and create the directory? Can you write in /some or not? And should you keep the current owner, group, and permissions of /some/directory if you remove and recreate it? After those operations, the directory will be owned by the effective UID of the process; it will belong to the group with the effective GID of the process (unless the set-group-ID bit is set on the /some directory, or unless you're on macOS, where a new directory inherits the group of its parent); and the permissions will be the mode passed to mkdir() (0777, say) as modified by the current umask.

If these issues are not important, or are circumventable, then remove and recreate is plausible.
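
One way to circumvent the ownership and permission issue is to record the attributes with stat() before removing the directory and to reapply them afterwards. The sketch below is only that, a sketch: recreate_like() is a made-up helper, and the chown() call will normally fail for an unprivileged process if the old owner was a different user.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: recreate dir with the owner, group and mode recorded
 * in *saved (filled in by stat() before the directory was removed).
 * Returns 0 on success, -1 on failure. */
int recreate_like(const char *dir, const struct stat *saved)
{
    struct stat now;

    /* Start with a restrictive mode so nobody else can get in before the
     * final attributes are in place. */
    if (mkdir(dir, 0700) != 0)
        return -1;
    if (stat(dir, &now) != 0)
        return -1;

    /* Only call chown() when something actually differs; changing the owner
     * usually requires privileges. */
    if ((now.st_uid != saved->st_uid || now.st_gid != saved->st_gid) &&
        chown(dir, saved->st_uid, saved->st_gid) != 0)
        return -1;

    /* chmod() last: it is not affected by the umask, and chown() may clear
     * the set-ID bits. */
    if (chmod(dir, saved->st_mode & 07777) != 0)
        return -1;
    return 0;
}
```

There is still a window between the removal and the recreation during which the directory does not exist, so this does not help if other processes expect it to be there at all times.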

Expanding on the comments above:

When you mentioned the system() call in your comment, I had no idea what you were talking about.

The system() function executes the string passed as an argument via a command interpreter. When you write code, you have to think about how it could go wrong when someone malicious is trying to get you to run it. When you write "rm -fr /some/directory", you are relying on the shell finding the normal rm command and working properly. However, if your PATH has a value such as $HOME/bin:/bin:/usr/bin (so that commands in your own private bin directory are used in preference to those provided by the system), then if the user can install their own script as $HOME/bin/rm, they can execute arbitrary code with your privileges, which could allow them to leave a way of breaking into your system later, keeping all the privileges you have. They might even clean up (most of) the evidence that there was once a $HOME/bin/rm script.

One way to avoid such a problem is to request "/bin/rm -fr /some/directory" (unless rm is in /usr/bin and not in /bin, of course). This is arguably safer. There used to be attacks available via the IFS environment variable; these are neutered by modern shells, which do not use any inherited value for IFS.

Note that one problem will be interpreting whether the rm command was successful. The -f option suppresses prompts and complaints about files that do not exist, so rm will report success in most circumstances; check the value returned by system() anyway. If the directory didn't vanish, your mkdir("/some/directory", 0777) call will fail.
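
A sketch of how the result might be checked, assuming POSIX. The helper run_rm_and_verify() is hypothetical; the command string is expected to be a fixed string in your program, never something built from untrusted input.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/wait.h>

/* Hypothetical helper: run command via system(), then confirm that dir is
 * really gone. Returns 0 only if the command exited with status 0 and dir
 * no longer exists; returns -1 otherwise. */
int run_rm_and_verify(const char *command, const char *dir)
{
    int status = system(command);

    if (status == -1)
        return -1;                 /* the child process could not be created */
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;                 /* killed by a signal, or non-zero exit */

    /* Do not rely on the exit status alone: look for the directory. */
    struct stat sb;
    if (lstat(dir, &sb) == 0)
        return -1;                 /* still there */
    return (errno == ENOENT) ? 0 : -1;
}
```

Typical use would be run_rm_and_verify("/bin/rm -fr /some/directory", "/some/directory"), followed by the mkdir() call only when it returns 0.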

As for the security of the /some/directory, my understanding of what you're trying to say, is that the wrong directory could be selected. I just can't imagine how that would occur.

Assuming that you did have permission to modify the /some directory (you need that to be able to remove /some/directory), and that you could modify all the sub-directories of the old version of /some/directory, then you might start off with /some/directory owned by user victim, group witless and with permission 0775, whereas after the command and system call (note that system() is a function rather than a system call; mkdir() is a system call) are successful, the directory might be owned by user victor, group mischief and with permission 0777. This might be less than ideal. If you don't want to break such permission settings, you probably don't want to use the remove and recreate technique, or at least not such a simple-minded one as this example shows. You would then have to work a bit harder to remove the contents of the directory without modifying those attributes. You might scan the directory (opendir(), readdir(), closedir()) and invoke rm on each name, or on sets of names, to clean up thoroughly without removing the directory itself. This is of hybrid complexity. It is more fiddly than simply removing and recreating, but far less fiddly than dealing with full recursive delete over multiple levels.
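
For comparison, here is a sketch of emptying a directory while leaving the directory itself, and therefore its owner, group and permissions, untouched. Note that it does not use the scan-and-invoke-rm hybrid described above; it swaps in nftw() and simply skips the starting directory. The names remove_below_top() and empty_directory() are invented for the example.

```c
#define _XOPEN_SOURCE 700   /* must precede every #include so <ftw.h> declares nftw() */
#include <ftw.h>
#include <unistd.h>

/* Callback: remove everything except the directory the walk started at. */
static int remove_below_top(const char *path, const struct stat *sb,
                            int typeflag, struct FTW *ftwbuf)
{
    (void)sb;

    /* Level 0 is the starting directory: keep it, attributes and all. */
    if (ftwbuf->level == 0)
        return 0;

    /* Directories arrive as FTW_DP, after their contents (FTW_DEPTH). */
    if (typeflag == FTW_DP || typeflag == FTW_DNR)
        return rmdir(path);
    return unlink(path);
}

/* Hypothetical helper: delete the contents of path but not path itself.
 * Returns 0 on success, non-zero if some entry could not be removed. */
int empty_directory(const char *path)
{
    return nftw(path, remove_below_top, 16, FTW_DEPTH | FTW_PHYS);
}
```

Since the directory is never removed, there is also no moment at which /some/directory does not exist.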

As I said before, you have to decide whether these issues matter or not. It is important that you're aware that the issues exist, and that you have made a conscious (and informed) decision about whether to handle the issues, and how to handle the issues.

Remember that if your program may be run with root (administrator) privileges, it is a lot more important to be careful — but it still matters even if only ordinary mortal users will run the program.