Is "Argv[0] = Name-Of-Executable" an Accepted Standard or Just a Common Convention

Is argv[0] = name-of-executable an accepted standard or just a common convention?

Guesswork (even educated guesswork) is fun but you really need to go to the standards documents to be sure. For example, ISO C11 states (my emphasis):

If the value of argc is greater than zero, the string pointed to by argv[0] represents the program name; argv[0][0] shall be the null character if the program name is not available from the host environment.

So no, it's only the program name if that name is available. And it "represents" the program name; it is not necessarily the program name itself. The section before that states:

If the value of argc is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to strings, which are given implementation-defined values by the host environment prior to program startup.

This is unchanged from C99, the previous standard, and means that even the values are not dictated by the standard - it's up to the implementation entirely.

This means that the program name can be empty if the host environment doesn't provide it, and anything else if the host environment does provide it, provided that "anything else" somehow represents the program name. In my more sadistic moments, I would consider translating it into Swahili, running it through a substitution cipher then storing it in reverse byte order :-).

However, implementation-defined does have a specific meaning in the ISO standards - the implementation must document how it works. So even UNIX, which can put anything it likes into argv[0] with the exec family of calls, has to (and does) document it.
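To make the exec point concrete, here is a minimal sketch, assuming a POSIX system. The helper name run_with_custom_argv is made up for illustration; fork, execv, waitpid and _exit are the standard POSIX calls. It shows that the path being executed and the argv[0] the new program sees are supplied independently.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the executable at `path`, handing it `argv`
 * unchanged. argv[0] is whatever the caller chose; nothing compares it
 * with `path`. Returns the child's exit status, or -1 on failure. */
int run_with_custom_argv(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */

    if (pid == 0) {
        execv(path, argv);         /* replaces the child process image */
        _exit(127);                /* only reached if execv itself failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with the path "/bin/sh" and an argument vector whose first element is, say, "not-the-real-name" starts the shell with that string as its argv[0]; the shells I know of then report it as $0 when no script name is given.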

Windows vs. Linux GCC argv[0] value

No, there isn't. Under most shells on Linux, argv[0] contains exactly what the user typed to run the binary. This allows binaries to do different things depending on what the user types.

For instance, a program with several command-line commands may install the binary once and then hard-link each command name to that same binary. On my system:


$ ls -l /usr/bin/git*
-rwxr-xr-x 109 root wheel 2500640 16 May 18:44 /usr/bin/git
-rwxr-xr-x 2 root wheel 121453 16 May 18:43 /usr/bin/git-cvsserver
-rwxr-xr-x 109 root wheel 2500640 16 May 18:44 /usr/bin/git-receive-pack
-rwxr-xr-x 2 root wheel 1021264 16 May 18:44 /usr/bin/git-shell
-rwxr-xr-x 109 root wheel 2500640 16 May 18:44 /usr/bin/git-upload-archive
-rwxr-xr-x 2 root wheel 1042560 16 May 18:44 /usr/bin/git-upload-pack
-rwxr-xr-x 1 root wheel 323897 16 May 18:43 /usr/bin/gitk

Notice how some of these files have exactly the same size. More investigation reveals:


$ stat /usr/bin/git
234881026 459240 -rwxr-xr-x 109 root wheel 0 2500640 "Oct 29 08:51:50 2011" "May 16 18:44:05 2011" "Jul 26 20:28:29 2011" "May 16 18:44:05 2011" 4096 4888 0 /usr/bin/git
$ stat /usr/bin/git-receive-pack
234881026 459240 -rwxr-xr-x 109 root wheel 0 2500640 "Oct 29 08:51:50 2011" "May 16 18:44:05 2011" "Jul 26 20:28:29 2011" "May 16 18:44:05 2011" 4096 4888 0 /usr/bin/git-receive-pack

The inode number (459240) is identical and so these are two links to the same file on disk. When run, the binary uses the contents of argv[0] to determine which function to execute. You can see this (sort of) in the code for Git's main().
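The dispatch idea can be sketched as follows. This is not Git's actual code; the function names and the enum are hypothetical, and only the two link names are taken from the listing above. It strips any directory prefix from argv[0] and picks a mode from what is left.

```c
#include <string.h>

/* Return the part of argv0 after the last '/', or argv0 itself if
 * it contains no '/'. */
const char *command_name(const char *argv0)
{
    const char *slash = strrchr(argv0, '/');
    return slash ? slash + 1 : argv0;
}

/* Hypothetical modes for a multi-call binary. */
enum mode { MODE_DEFAULT, MODE_RECEIVE_PACK, MODE_UPLOAD_ARCHIVE };

/* Choose a mode from the name the binary was invoked under. */
enum mode mode_for(const char *argv0)
{
    const char *name = command_name(argv0);

    if (strcmp(name, "git-receive-pack") == 0)
        return MODE_RECEIVE_PACK;
    if (strcmp(name, "git-upload-archive") == 0)
        return MODE_UPLOAD_ARCHIVE;
    return MODE_DEFAULT;           /* invoked under any other name */
}
```

A main() built this way would call mode_for(argv[0]) after checking that argc is greater than zero, since argv[0] is a null pointer otherwise.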

When can argv[0] have null?

With the exec family of calls, you specify the executable and the program name (argv[0]) separately, so the caller can set argv[0] to anything, including NULL.

But that quote is actually from the ISO standard (possibly paraphrased), and that standard covers an awfully large range of execution environments, from the smallest micro-controller to the latest z10 Enterprise-class mainframe.

Many of those embedded systems would be in the situation where an executable name makes little sense.

From the latest c1x draft:

The value of argc shall be nonnegative.

The value argv[argc] shall be a null pointer.

If the value of argc is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to strings, which are given implementation-defined values by the host environment prior to program startup.

This means that, if argc is zero (and it can be), argv[0] is NULL.

But, even when argc is not 0, you may not get the program name, since the standard also states:

If the value of argc is greater than zero, the string pointed to by argv[0] represents the program name; argv[0][0] shall be the null character if the program name is not available from the host environment. If the value of argc is greater than one, the strings pointed to by argv[1] through argv[argc-1] represent the program parameters.

So, there is no requirement under the standard that a program name be provided. I've seen programs use a wide selection of options for this value:

  • no value at all (for supposed security).
  • a blatant lie (such as sleep for a malicious piece of code).
  • the actual program name (such as sleep).
  • a slightly modified one (such as -ksh for the login shell).
  • a descriptive name (e.g., progname - a program for something).
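The "-ksh" entry reflects a common UNIX convention in which a login shell is started with a leading dash in argv[0]. A minimal sketch of how a program might test for that, with a function name invented for illustration:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if argv0 follows the leading-dash
 * login-shell convention, e.g. "-ksh". A null pointer or an empty
 * string is treated as "not a login shell". */
bool looks_like_login_shell(const char *argv0)
{
    return argv0 != NULL && argv0[0] == '-';
}
```

Because argv[0] is entirely under the caller's control, a check like this tells you what the caller claimed, not what is actually true.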

What is the type of command-line argument `argv` in C?

Directly quoting from C11, chapter §5.1.2.2.1/p2, program startup, (emphasis mine)

int main(int argc, char *argv[]) { /* ... */ }

[...] If the value of argc is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to strings, [...]

and

[...] and the strings pointed to by the argv array [...]

So, basically, argv is a pointer to the first element of an array of strings (see the note below). This can be made clearer from the alternative form,

int main(int argc, char **argv) { /* ... */ }

You can rephrase that as a pointer to the first element of an array of pointers to the first elements of null-terminated char arrays, but I'd prefer to stick to strings.
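A small sketch of what that type means in practice, using function names invented for illustration. Both parameter spellings declare the same type, and the vector can be walked up to its terminating null pointer without knowing argc.

```c
#include <stddef.h>

/* Count the strings in an argv-style vector by walking to the
 * terminating null pointer. For the vector passed to main, the
 * result equals argc, because argv[argc] is a null pointer. */
size_t count_args(char **argv)
{
    size_t n = 0;
    while (argv[n] != NULL)
        n++;
    return n;
}

/* Same parameter type written in array form: as a function
 * parameter, char *argv[] is adjusted to char **argv. */
size_t count_args_array_form(char *argv[])
{
    return count_args(argv);
}
```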


NOTE:

To clarify the usage of "pointer to the first element of an array" in the answer above, see §6.3.2.1/p3:

Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. [...]

Can argv[0] contain an empty string?

It's implementation-defined. §5.1.2.2.1 abridged:

  • If the value of argc is greater than zero, the array members argv[0] through
    argv[argc-1] inclusive shall contain pointers to strings, which are given
    implementation-defined values by the host environment prior to program startup. The
    intent is to supply to the program information determined prior to program startup
    from elsewhere in the hosted environment. [...]

  • If the value of argc is greater than zero, the string pointed to by argv[0]
    represents the program name; argv[0][0] shall be the null character if the
    program name is not available from the host environment. [...]

So if argc is greater than zero, the clear intent is that argv[0] names the program rather than being an empty string, but an empty string can happen. (Note that with argc equal to n, argv[0] through argv[n - 1] are never null and always point to a string. The string itself may be empty, though. If n is zero, argv[0] is null.)

In practice, of course, you just need to make sure the platforms you're targeting behave as needed.
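If you would rather not depend on the platform, a defensive helper is cheap. The following is a minimal sketch; the function name and the idea of a caller-supplied fallback are my own assumptions, not something the standard prescribes. It covers the three cases the standard allows: no argv[0] at all, an empty string, and a usable name.

```c
#include <stddef.h>

/* Hypothetical helper: return argv[0] if it is usable, otherwise the
 * caller's fallback. "Usable" means argc > 0 and argv[0] points to a
 * non-empty string. */
const char *program_name_or(int argc, char **argv, const char *fallback)
{
    if (argc < 1 || argv == NULL || argv[0] == NULL)
        return fallback;           /* no program name slot at all */
    if (argv[0][0] == '\0')
        return fallback;           /* name not available from the host */
    return argv[0];
}
```

A typical use would be in diagnostics, for example passing program_name_or(argc, argv, "myprog") to fprintf when printing a usage message.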


