Differentiate Between a Unix Directory and File in C and C++

The following code uses the stat() function and the S_ISDIR ('is a directory') and S_ISREG ('is a regular file') macros to get information on the file. The rest is just error checking and enough to make a complete compilable program.

#include <stdio.h>
#include <errno.h>
#include <sys/stat.h>

int main (int argc, char *argv[]) {
    int status;
    struct stat st_buf;

    // Ensure argument passed.

    if (argc != 2) {
        printf ("Usage: progName <fileSpec>\n");
        printf ("       where <fileSpec> is the file to check.\n");
        return 1;
    }

    // Get the status of the file system object.

    status = stat (argv[1], &st_buf);
    if (status != 0) {
        printf ("Error, errno = %d\n", errno);
        return 1;
    }

    // Tell us what it is then exit.

    if (S_ISREG (st_buf.st_mode)) {
        printf ("%s is a regular file.\n", argv[1]);
    }
    if (S_ISDIR (st_buf.st_mode)) {
        printf ("%s is a directory.\n", argv[1]);
    }

    return 0;
}

Sample runs are shown here:


pax> vi progName.c ; gcc -o progName progName.c ; ./progName
Usage: progName <fileSpec>
       where <fileSpec> is the file to check.

pax> ./progName /home
/home is a directory.

pax> ./progName .profile
.profile is a regular file.

pax> ./progName /no_such_file
Error, errno = 2

Difference between ./ and ~/

./ means "starting from the current directory". The . refers to the current working directory, so something like ./foo.bar would look for a file called foo.bar in the current directory. (As a side note, .. refers to the parent directory of the current directory, so ../foo.bar would look for that file one directory above.)

~/ means "starting from the home directory". This could have different meanings in different scenarios. For example, in a Unix environment ~/foo.bar would be looking for a file called foo.bar in your home directory, something like /home/totzam/foo.bar. In many web applications, ~/foo.bar would be looking for a file called foo.bar in the web application root, something like /var/http/mywebapp/foo.bar.
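One point worth noting for C programs on Unix: the ~ is normally expanded by the shell before your program sees the argument, so functions like fopen() and stat() treat a literal "~/foo.bar" as a relative path whose first component is named ~. The following is a minimal sketch of doing the expansion yourself, assuming the HOME environment variable holds the home directory. The name expand_home is a hypothetical helper, not a standard function.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: expands a leading "~" or "~/" using $HOME.
 * Paths that do not start that way are copied unchanged.
 * Returns 0 on success, -1 if HOME is unset or the result does not fit. */
int expand_home(const char *path, char *out, size_t out_size)
{
    int n;

    if (path[0] == '~' && (path[1] == '/' || path[1] == '\0')) {
        /* Assumption: HOME is set and names the user's home directory. */
        const char *home = getenv("HOME");
        if (home == NULL) {
            return -1;
        }
        /* Skip the '~' and append the rest of the path to HOME. */
        n = snprintf(out, out_size, "%s%s", home, path + 1);
    } else {
        n = snprintf(out, out_size, "%s", path);
    }

    /* snprintf() reports the length it wanted; reject truncated results. */
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With HOME set to /home/totzam, calling expand_home("~/foo.bar", buf, sizeof buf) would leave /home/totzam/foo.bar in buf, while "./foo.bar" would be copied as is. This sketch does not handle the ~user form that shells also accept.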

What is the difference between a directory and a folder?

Check "The folder metaphor" section at Wikipedia. It states:

There is a difference between a directory, which is a file system concept, and the graphical user interface metaphor that is used to represent it (a folder). For example, Microsoft Windows uses the concept of special folders to help present the contents of the computer to the user in a fairly consistent way that frees the user from having to deal with absolute directory paths, which can vary between versions of Windows, and between individual installations. ...
If one is referring to a container of documents, the term folder is more appropriate. The term directory refers to the way a structured list of document files and folders is stored on the computer. The distinction can be due to the way a directory is accessed; on Unix systems, /usr/bin/ is usually referred to as a directory when viewed in a command line console, but if accessed through a graphical file manager, users may sometimes call it a folder.

Checking if a file is a directory or just a file

You can call the stat() function and use the S_ISREG() macro on the st_mode field of the stat structure in order to determine if your path points to a regular file:

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

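int is_regular_file(const char *path)
{
    struct stat path_stat;

    // If stat() fails (for example, the path does not exist), path_stat is
    // not filled in, so report "not a regular file" instead of reading it.
    if (stat(path, &path_stat) != 0)
        return 0;
    return S_ISREG(path_stat.st_mode);
}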

Note that there are other file types besides regular and directory, like devices, pipes, symbolic links, sockets, etc. You might want to take those into account.
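As a rough sketch of taking the other types into account, the function below checks each of the POSIX S_IS* macros in turn. It uses lstat() rather than stat() so that a symbolic link is reported as a link instead of as whatever it points to. The name file_type_name and the returned strings are made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper: returns a short description of what path refers to,
 * or NULL if lstat() fails (errno is left as set by lstat()). */
const char *file_type_name(const char *path)
{
    struct stat st;

    /* lstat() does not follow symbolic links, unlike stat(). */
    if (lstat(path, &st) != 0) {
        return NULL;
    }

    if (S_ISREG(st.st_mode))  return "regular file";
    if (S_ISDIR(st.st_mode))  return "directory";
    if (S_ISLNK(st.st_mode))  return "symbolic link";
    if (S_ISCHR(st.st_mode))  return "character device";
    if (S_ISBLK(st.st_mode))  return "block device";
    if (S_ISFIFO(st.st_mode)) return "FIFO (named pipe)";
    if (S_ISSOCK(st.st_mode)) return "socket";
    return "unknown";
}
```

If you would rather treat a link to a directory as a directory, swap lstat() for stat(); in that case the S_ISLNK branch will never be taken.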

Difference in using file path and file name in file-handling C

The string in the first argument has to match the file name exactly. It is simply a label which indicates to the file system the name of the requested file.

The terminology in the documentation is not entirely stable; both the phrases you are asking about basically mean the same thing.
The path terminology is strictly speaking more correct, and emphasizes that the string may contain a relative or absolute directory path as well as the file name within that directory.

C does not have any concept of a "file extension" which however is useful on some platforms to identify the file's "type". For example, Windows uses the file extension to identify (in rough terms) which application a file belongs to. But as far as C is concerned, if the file name includes an extension, that is a mandatory part of the name, and needs to be included.

(Technically, the OS could decide on your behalf which file to open if you omit the extension, but this is not how modern systems work. But for example, VMS had the concept of a file version, which was optional; if you omitted this part of the file name, the OS would always open the newest version of the file.)

If you want to open the file "/path/to/data.csv" then that is a valid file path. If your current working directory is /path/to then you can simply omit the directory, and open "data.csv" directly. You can also specify a relative path like "./data.csv" which simply uses the . alias for the current directory.
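If you want to convince yourself that two spellings of a path reach the same file, one approach on Unix is to compare the device and inode numbers that stat() reports. This is only a sketch; same_file is a hypothetical helper name.

```c
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if both paths name the same file,
 * 0 if they name different files, and -1 if either cannot be examined. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0) {
        return -1;
    }
    /* On a POSIX system, a file is identified by its (device, inode) pair. */
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

With the current working directory set to /path/to, same_file("data.csv", "./data.csv") and same_file("data.csv", "/path/to/data.csv") should both return 1, provided the file exists.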

There isn't really a dichotomy between .txt and .bin files based on the extension, though some systems make a distinction between "text" and "binary" files on another level. Very briefly, binary files can contain arbitrary byte streams, whereas text files have some conventions and (on some legacy systems) restrictions on what they can contain. These days, the distinction mainly pertains to normalization of line endings, where different systems still have different conventions for how to terminate a line of text; Windows uses CRLF, while Unix-based systems use plain LF. The identification of a byte stream as "text" offers some guidance for how to treat such differences.
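In C, this distinction shows up in the mode string passed to fopen(): "r" and "w" request text mode, while "rb" and "wb" request binary mode. The sketch below writes five bytes in binary mode and counts how many come back under a given mode. The helper names write_sample and count_bytes are invented for this example.

```c
#include <stdio.h>

/* Hypothetical helper: writes the 5 bytes 'a' CR LF 'b' LF in binary mode,
 * so no line-ending translation is applied. Returns 0 on success. */
int write_sample(const char *path)
{
    FILE *fp = fopen(path, "wb");
    size_t written;

    if (fp == NULL) {
        return -1;
    }
    written = fwrite("a\r\nb\n", 1, 5, fp);
    /* fclose() is evaluated first, so the stream is always closed. */
    return (fclose(fp) == 0 && written == 5) ? 0 : -1;
}

/* Hypothetical helper: counts the bytes fgetc() delivers when path is opened
 * with the given mode ("r" for text, "rb" for binary). Returns -1 on error. */
long count_bytes(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    long count = 0;

    if (fp == NULL) {
        return -1;
    }
    while (fgetc(fp) != EOF) {
        count++;
    }
    fclose(fp);
    return count;
}
```

After write_sample("sample.tmp"), count_bytes("sample.tmp", "rb") gives 5 on any system. On Unix-based systems "r" behaves the same as "rb"; on Windows, text mode translates CRLF to LF on input, so the "r" count would be expected to come out lower.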

Difference between compiled filename and source filename. C

The name of the executable file can be anything, as long as the operating system is okay with it.

By default, some build tools name the executable file a.out. In many common Unix tools, this is easily changed with the command-line switch -o name, which says to put the output file in a file named name.

Executable files automatically become commands because, when a command is typed in a command-line shell and it is not a command built into the shell, the shell looks for a file with that name and, if it is executable, it executes the file. When making use of this, you would like the file name to be meaningful to the user. a.out is generic and not particularly meaningful, so programs are usually given other names.

For simple programs, the name of the sole source file or the primary source file is often used as the name of the executable file (or vice versa), except that the source file has an extension designating its programming language (like .c) and the executable file either has no extension or has some extension like .exe (for “executable”).

Command-line shells might look only in certain directories for executable files. Often they can be told to search in the current directory, but that is not a good idea because it makes it too easy to run one program by accident when another is intended. In particular, an attacker might be able to get a person or program to run a program that the attacker controls if the shell is lax about where it searches for executable files. Because of this, you usually need to run programs in the current directory using ./name, which indicates you deliberately want to run a program in the current directory.
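To make the search concrete, here is a simplified sketch of the lookup a shell performs. It is not how any particular shell is implemented; find_in_path and is_executable_file are hypothetical names, and the colon-separated list stands in for the PATH variable.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if path is a regular file the caller may execute. */
static int is_executable_file(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 && S_ISREG(st.st_mode)
        && access(path, X_OK) == 0;
}

/* Hypothetical helper: looks for an executable called name in each directory
 * of path_list (colon-separated, like $PATH). On success, writes the full
 * path to out and returns 0. On failure, returns -1 and out is unspecified. */
int find_in_path(const char *name, const char *path_list,
                 char *out, size_t out_size)
{
    const char *start = path_list;

    /* A name containing '/' is used as given; this is why ./name works. */
    if (strchr(name, '/') != NULL) {
        int n = snprintf(out, out_size, "%s", name);
        if (n < 0 || (size_t)n >= out_size) {
            return -1;
        }
        return is_executable_file(out) ? 0 : -1;
    }

    while (start != NULL && *start != '\0') {
        const char *end = strchr(start, ':');
        size_t len = end ? (size_t)(end - start) : strlen(start);

        if (len > 0) {  /* this sketch simply skips empty entries */
            int n = snprintf(out, out_size, "%.*s/%s", (int)len, start, name);
            if (n >= 0 && (size_t)n < out_size && is_executable_file(out)) {
                return 0;
            }
        }
        start = end ? end + 1 : NULL;
    }
    return -1;
}
```

Passing getenv("PATH") as path_list approximates what the shell does for a bare command name. Because the current directory is not searched unless it is listed, a program in the current directory is found only when you write ./name.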

Many people designate a directory in their home folder where they put executable files and add that directory to the path their shell searches for executables.

What is the difference between ./ and / (root and current directory)?

./b and /b are the same thing if and only if your current working directory is /. You should use the former^(a).

By way of further clarification, let's say your current working directory is /my_code_dir and you have code of the form:

#include "./b.h"
#include "/b.h"

Putting aside the whole issue of C inclusions being implementation-defined, the former will use /my_code_dir/b.h and the latter will use /b.h.

^(a) Assuming they are your only two choices, of course. In any decent-sized development environment, you probably should be avoiding these "breadcrumb"-style paths (like ../../../include/xyzzy/plugh.h) and instead rely on your environment setting up include paths for you (so you can just use xyzzy/plugh.h). That way, things can move around freely without having to go and change large swathes of code.

Compare two folders which have many files inside contents

To get summary of new/missing files, and which files differ:

diff -arq folder1 folder2

-a treats all files as text, -r recursively searches subdirectories, and -q reports 'briefly', that is, only whether files differ.
