Inconsistent strcmp() Return Value When Passing Strings as Pointers or as Literals

TL;DR: Use gcc -fno-builtin-strcmp so that strcmp() isn't treated as equivalent to __builtin_strcmp(). With optimization disabled, GCC can only do constant propagation within a single statement, not across statements. The actual library version subtracts the differing characters; the compile-time evaluation probably normalizes the result to 1 / 0 / -1, which ISO C neither requires nor guarantees.


You are most likely seeing the result of a compiler optimization. If we compile the code with gcc on godbolt at the -O0 optimization level, we can see that the first case does not call strcmp:

movl    $-1, %esi   #,
movl $.LC0, %edi #,
movl $0, %eax #,
call printf #

Since you are using constants as arguments to strcmp, the compiler is able to perform constant folding: it evaluates a compiler intrinsic at compile time and generates the -1 then, instead of calling strcmp at run time. The run-time strcmp is implemented in the standard library and will have a different implementation than the likely simpler compile-time one.

In the second case it does generate a call to strcmp:

call    strcmp  #
movl %eax, %esi # D.2047,
movl $.LC0, %edi #,
movl $0, %eax #,
call printf #

This is consistent with the fact that gcc has a builtin for strcmp, which is what gcc will use during constant folding.

If we test further at the -O1 optimization level or higher, gcc is able to fold both cases, and the result is -1 for both:

movl    $-1, %esi   #,
movl $.LC0, %edi #,
xorl %eax, %eax #
call printf #
movl $-1, %esi #,
movl $.LC0, %edi #,
xorl %eax, %eax #
call printf #

With more optimization options turned on, the optimizer can determine that a and b also point to constants known at compile time, so it can compute the result of strcmp at compile time for this case as well.

We can confirm that gcc is using a builtin function by building with the -fno-builtin flag and observing that a call to strcmp is generated in all cases.
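If changing compiler flags is not an option, one possible way to get a similar effect for a single call is to go through a volatile function pointer, which the optimizer should not be able to see through. This is only a minimal sketch; strcmp_no_fold is a hypothetical helper name, not something from the original code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: calls strcmp through a volatile function pointer.
   Because the pointer must be read at run time, the compiler should not be
   able to replace the call with a constant computed at compile time. */
int strcmp_no_fold(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);
    int (*volatile call)(const char *, const char *) = strcmp;
    return call(s1, s2);
}
```

Calling strcmp_no_fold("a", "d") then returns whatever the library's strcmp returns, which is only guaranteed to be negative.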

clang is slightly different in that it does not fold at all using -O0 but will fold at -O1 and above for both.

Note that any negative result is entirely conformant. We can see this in the draft C99 standard, section 7.21.4.2 The strcmp function, which says (emphasis mine):

int strcmp(const char *s1, const char *s2);

The strcmp function returns an integer greater than, equal to, or less
than zero, accordingly as the string pointed to by s1 is greater than,
equal to, or less than the string pointed to by s2.
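Since only the sign is specified, code that needs exactly -1, 0 or 1 has to normalize the result itself. A minimal sketch, where strcmp_sign is a hypothetical helper name introduced for illustration:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: collapses strcmp's result to -1, 0 or 1, so the
   value no longer depends on the library or on compile-time folding. */
int strcmp_sign(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);
    int r = strcmp(s1, s2);
    return (r > 0) - (r < 0);
}
```

In most code it is simpler to compare the result against zero directly, for example strcmp(a, b) < 0, and never against 1 or -1.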

technosurus points out that strcmp is specified to treat the strings as if they were composed of unsigned char. This is covered in C99 under 7.21.1, which says:

For all functions in this subclause, each character shall be
interpreted as if it had the type unsigned char (and therefore every
possible object representation is valid and has a different value).
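The unsigned char rule matters for bytes of 0x80 and above. The sketch below contrasts a naive comparison on plain char with one that converts to unsigned char first; both function names are hypothetical and only the first byte of each string is examined.

```c
#include <assert.h>
#include <string.h>   /* strcmp, for comparing against the library */

/* Naive difference of the first bytes using plain char. Where char is
   signed, a byte such as 0x80 is negative and sorts before ASCII. */
int naive_first_byte_diff(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);
    return s1[0] - s2[0];
}

/* Difference of the first bytes interpreted as unsigned char, which is
   the interpretation the standard requires of strcmp. */
int unsigned_first_byte_diff(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);
    return (unsigned char)s1[0] - (unsigned char)s2[0];
}
```

For the strings "\x80" and "\x7f", the unsigned version yields 1 and strcmp must return a positive value, while the sign of the naive version depends on whether char is signed on the platform.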

Weird return value in strcmp

It looks like you didn't enable optimizations (e.g. -O2).

From my tests it looks like gcc always recognizes strcmp with constant arguments and optimizes it, even with -O0 (no optimizations). Clang needs at least -O1 to do so.

That's where the difference comes from: The code produced by clang calls strcmp twice, but the code produced by gcc just does printf("%d\n", 1) in the first case because it knows that 'h' > 'H' (ASCIIbetically, that is). It's just constant folding, really.

Live example: https://godbolt.org/z/8Hg-gI

As the other answers explain, any positive value will do to indicate that the first string is greater than the second, so the compiler optimizer simply chooses 1. The strcmp library function apparently uses a different value.

Why the returns of strcmp is different?

Why the output is different

Because all that matters is the sign (positive, negative or zero) of the return value. strcmp() is not required to return +1 or -1, nor does it have to return consistent values. I suspect that in the first and third case, the compiler optimizes away the call to strcmp() and puts -1 into the place of the return value. In the second case, I think the function is actually called.

what is the code of strcmp?

Deducing from the fact that it seemingly returns the difference between the character codes of the first differing character, I'd say this is glibc's strcmp():

int
strcmp (p1, p2)
     const char *p1;
     const char *p2;
{
  register const unsigned char *s1 = (const unsigned char *) p1;
  register const unsigned char *s2 = (const unsigned char *) p2;
  unsigned char c1, c2;

  do
    {
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      if (c1 == '\0')
        return c1 - c2;
    }
  while (c1 == c2);

  return c1 - c2;
}

Edit: @AndreyT doesn't believe me, so here's the assembly GCC 4.2 generated for me (OS X 10.7.5 64-bit Intel, default optimization level - no flags):

    .section    __TEXT,__text,regular,pure_instructions
.globl _main
.align 4, 0x90
_main:
Leh_func_begin1:
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
subq $32, %rsp
Ltmp2:
leaq L_.str(%rip), %rax
movq %rax, -16(%rbp)
leaq L_.str1(%rip), %rax
movq %rax, -24(%rbp)
movl $-1, %ecx ; <- THIS!
xorb %dl, %dl
leaq L_.str2(%rip), %rsi
movq %rsi, %rdi
movl %ecx, %esi
movq %rax, -32(%rbp)
movb %dl, %al
callq _printf ; <- no call to `strcmp()` so far!
movq -16(%rbp), %rax
movq %rax, %rdi
movq -32(%rbp), %rsi
callq _strcmp ; <- strcmp()
movl %eax, %ecx
xorb %dl, %dl
leaq L_.str2(%rip), %rdi
movl %ecx, %esi
movb %dl, %al
callq _printf ; <- printf()
movq -16(%rbp), %rax
movq -24(%rbp), %rcx
movq %rax, %rdi
movq %rcx, %rsi
callq _strcmp ; <- strcmp()
movl %eax, %ecx
xorb %dl, %dl
leaq L_.str2(%rip), %rdi
movl %ecx, %esi
movb %dl, %al
callq _printf ; <- printf()
movl $0, -8(%rbp)
movl -8(%rbp), %eax
movl %eax, -4(%rbp)
movl -4(%rbp), %eax
addq $32, %rsp
popq %rbp
ret
Leh_func_end1:

.section __TEXT,__cstring,cstring_literals
L_.str:
.asciz "a"

L_.str1:
.asciz "d"

L_.str2:
.asciz "%d\n"

.section __TEXT,__eh_frame,coalesced,no_toc+strip_static_syms+live_support
EH_frame0:
Lsection_eh_frame:
Leh_frame_common:
Lset0 = Leh_frame_common_end-Leh_frame_common_begin
.long Lset0
Leh_frame_common_begin:
.long 0
.byte 1
.asciz "zR"
.byte 1
.byte 120
.byte 16
.byte 1
.byte 16
.byte 12
.byte 7
.byte 8
.byte 144
.byte 1
.align 3
Leh_frame_common_end:
.globl _main.eh
_main.eh:
Lset1 = Leh_frame_end1-Leh_frame_begin1
.long Lset1
Leh_frame_begin1:
Lset2 = Leh_frame_begin1-Leh_frame_common
.long Lset2
Ltmp3:
.quad Leh_func_begin1-Ltmp3
Lset3 = Leh_func_end1-Leh_func_begin1
.quad Lset3
.byte 0
.byte 4
Lset4 = Ltmp0-Leh_func_begin1
.long Lset4
.byte 14
.byte 16
.byte 134
.byte 2
.byte 4
Lset5 = Ltmp1-Ltmp0
.long Lset5
.byte 13
.byte 6
.align 3
Leh_frame_end1:

.subsections_via_symbols

And the original source code:

#include <stdio.h>
#include <string.h>

int main()
{
    const char *a = "a";
    const char *d = "d";
    printf("%d\n", strcmp("a", "d"));
    printf("%d\n", strcmp(a, "d"));
    printf("%d\n", strcmp(a, d));

    return 0;
}

And the output it generated (screenshot for having a better proof):

[Screenshot of the program's output]

Implement strcmp function in assembly 64 on linux

strcmp() only guarantees the sign of the result. Something probably got optimized in the second case. You don't need to care that the magnitude is different, so it would be best if you didn't.

The compiler would be within its rights to optimize

printf("strcmp = %d\n",strcmp("hella world", "hello world"));

to

printf("strcmp = %d\n",-1);

When will strcmp not return -1, 0 or 1?

In the C99 standard, §7.21.4.2 The strcmp function:

The strcmp function returns an integer greater than, equal to, or less than zero,
accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.

Emphasis added.

This means the standard does not guarantee -1, 0 or 1; the exact value may vary between implementations.

The value you are getting is the difference between 'h' and 'w', whose magnitude is 15.

In your case the strings are hello and world, so 'h' - 'w' = 104 - 119 = -15 < 0, and that is why strcmp returns -15.

Getting incompatible integer to pointer conversion error in program. Unsure how/why exactly this is occurring but looking for an explanation

The function strcmp has the following declaration

int strcmp(const char *s1, const char *s2);

As you can see, both of its parameters have the pointer type const char *.

But in your program, in this call of strcmp

if (strcmp(p[i], "-") == 0)

you supplied the first argument p[i] of the type char. It seems you want to compare two characters

if ( p[i] == '-' )

You could use the function strcmp, but then you have to supply a string as its first argument, something like

if ( strcmp( &p[i], "-" ) == 0 )

This call of strcmp is semantically correct, but the condition evaluates to true only in one case: when the string pointed to by the pointer expression &p[i] is itself equal to "-", that is, when the dash is the last character of the string. In all other cases the condition evaluates to false.
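A small sketch makes the behavior of that call concrete. The helper name tail_is_single_dash is hypothetical; it only wraps the expression discussed above and assumes i is a valid index into the string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: true only when the string starting at p[i] is
   exactly "-", i.e. p[i] is a dash and it is the last character. */
int tail_is_single_dash(const char *p, size_t i)
{
    assert(p != NULL && i <= strlen(p));
    return strcmp(&p[i], "-") == 0;
}
```

For "a-" and index 1 the result is true, but for "a-b" and index 1 it is false, because the remaining string is "-b".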

Note that the parameter of the function howmanyDash should have the qualifier const, because the passed string is not changed within the function. It is also not necessary to use any standard C string function (though you could use the standard function strchr).

The function can be declared and defined the following way.

size_t howmanyDash( const char s[] )
{
    size_t count = 0;

    for ( ; *s; ++s )
    {
        if ( *s == '-' )
        {
            ++count;
        }
    }

    return count;
}

And in main you can write

size_t dashCount = howmanyDash(word);
printf("Dashes: %zu\n", dashCount);

Using the function strchr, howmanyDash can be written the following way

size_t howmanyDash( const char s[] )
{
    size_t count = 0;
    const char c = '-';

    for ( ; ( s = strchr( s, c ) ) != NULL; ++s )
    {
        ++count;
    }

    return count;
}

Comparing 2 Strings, one in a struct other not C programming

int checkProduct (char nameCheck[100])

Note that the type signature is a lie. The signature should be

int checkProduct(char *nameCheck)

since the argument the function expects and receives is a pointer to char, or, to document for the user that the argument should point to the first element of a 0-terminated char array,

int checkProduct(char nameCheck[])

Arrays are never passed as arguments to functions. As function arguments, and in most other circumstances [the exceptions are when the array is the operand of sizeof, _Alignof or the address operator &], arrays are converted to pointers to their first element.
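The conversion can be observed with sizeof. In this sketch the function names are hypothetical; the parameter is written as char * because that is what a parameter declared as char nameCheck[100] is adjusted to.

```c
#include <assert.h>
#include <stddef.h>

/* A parameter declared as char nameCheck[100] is adjusted to char *,
   so sizeof inside the callee measures a pointer, not 100 chars. */
size_t size_seen_by_callee(char *nameCheck)
{
    return sizeof nameCheck;
}

/* In the caller, nameCheck is still a real array of 100 chars. */
void check_array_decay(void)
{
    char nameCheck[100];
    assert(sizeof nameCheck == 100);
    assert(size_seen_by_callee(nameCheck) == sizeof(char *));
}
```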

    {
product temp;
p.pName = nameCheck;

Arrays are not assignable. The only time you can have an array name on the left of a = is initialisation at the point where the array is declared.

You probably want

strcpy(p.pName, nameCheck);

there.
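Note that strcpy writes past the end of the destination if the source is longer than the array. A bounded copy is one way to guard against that. The sketch below assumes a product struct with a 100-char pName member, which is a guess based on the snippets shown here, and set_product_name is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NAME_SIZE 100   /* assumed, mirroring char nameCheck[100] */

typedef struct {
    char pName[NAME_SIZE];   /* hypothetical layout of product */
} product;

/* Copies name into p->pName, truncating if it does not fit.
   Returns 1 if the whole name fit, 0 if it was truncated. */
int set_product_name(product *p, const char *name)
{
    assert(p != NULL && name != NULL);
    int n = snprintf(p->pName, sizeof p->pName, "%s", name);
    return n >= 0 && (size_t)n < sizeof p->pName;
}
```

snprintf always terminates the destination with 0 when the size is non-zero, so pName remains a valid string even after truncation.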

        rewind (pfp);
while (fread(&temp,STRUCTSIZE,1,pfp)==1)
{
if (strcmp (temp.pName,p.pName))

strcmp returns a negative value if the first argument is lexicographically smaller than the second, 0 if both arguments are equal, and a positive value if the first is lexicographically larger than the second.

You probably want

if (strcmp(temp.pName, p.pName) == 0)

there.
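The == 0 test can be tried out on an in-memory array instead of a file. This is a sketch under assumptions: the product struct layout and the find_product helper are hypothetical and stand in for the file-reading loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char pName[100];   /* hypothetical layout of product */
} product;

/* Returns a pointer to the first product whose name equals name,
   or NULL if there is none. Equality is strcmp(...) == 0. */
const product *find_product(const product *items, size_t count,
                            const char *name)
{
    assert(items != NULL && name != NULL);
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(items[i].pName, name) == 0) {
            return &items[i];
        }
    }
    return NULL;
}
```

Writing if (strcmp(a, b)) without the == 0 would instead select every product whose name differs.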

    gets (nameCheck);

Never use gets. It is extremely unsafe (and was removed from the language in C11, yay). Use

fgets(nameCheck, sizeof nameCheck, stdin);

but that stores the newline in the buffer if there is enough space, so you have to overwrite that with 0 if present.

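If you are on a POSIX system and don't need to care about portability, you can use getline() to read a whole line into a buffer that is allocated and grown for you. Note that getline() also stores the trailing newline when one is read, so it still has to be removed.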

    checkProduct (nameCheck);

You check whether the product is known, but throw away the result. Store it in a variable.

    while (checkProduct == 1)

checkProduct is a function. In almost all circumstances, a function designator is converted into a pointer, hence the warning about the comparison between a pointer and an integer. You meant to compare to the value of the call you should have stored above.

    {
printf ("Product Already Exists!\n Enter another!\n");
while (getchar() !='\n')

You read in characters without storing them. So you will never change the contents of nameCheck, and then be trapped in an infinite loop.

        {
continue;
}

If the only statement in a loop body is continue;, you should leave the body empty.
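The draining loop quoted above also never checks for end of input, so it would spin forever if stdin is closed before a newline arrives. A possible replacement is sketched below; discard_line is a hypothetical helper name and it takes the stream as a parameter so it can be used with stdin or any other FILE.

```c
#include <assert.h>
#include <stdio.h>

/* Consumes characters up to and including the next newline.
   Returns the last value read: '\n', or EOF if input ended first. */
int discard_line(FILE *in)
{
    assert(in != NULL);
    int c;
    while ((c = getc(in)) != '\n' && c != EOF) {
        /* nothing to do: the character is simply thrown away */
    }
    return c;
}
```

After discard_line(stdin), the name still has to be read again, for example with fgets, before checkProduct is called on it.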

    }
p.pName = nameCheck;

Once again, you can't assign to an array.


Concerning the edit,

char *nameCheck;
nameCheck = "";
fgets(nameCheck,sizeof nameCheck, stdin);

you have changed nameCheck from an array to a pointer. That means that sizeof nameCheck now doesn't give the number of chars you can store in the array, but the size of a pointer to char, which is independent of what it points to (usually 4 on 32-bit systems and 8 on 64-bit systems).

And you let that pointer point to a string literal "", which is the reason for the crash. Attempting to modify string literals is undefined behaviour, and more often than not leads to a crash, since string literals are usually stored in a read-only segment of the memory nowadays.

You should have left it at

char nameCheck[100];
fgets(nameCheck, sizeof nameCheck, stdin);

and then you can use sizeof nameCheck to tell fgets how many characters it may read, or, alternatively, you could have a pointer and malloc some memory,

#define NAME_LENGTH 100
char *nameCheck = malloc(NAME_LENGTH);
if (nameCheck == NULL) {
    // malloc failed, handle it if possible, or
    exit(EXIT_FAILURE);
}
fgets(nameCheck, NAME_LENGTH, stdin);

Either way, after getting input, remove the newline if there is one:

size_t len = strlen(nameCheck);
if (len > 0 && nameCheck[len-1] == '\n') {
    nameCheck[--len] = 0;
    // If a '\r' was left in front of the '\n' (possible with Windows
    // line endings), remove it too
    if (len > 0 && nameCheck[len-1] == '\r') {
        nameCheck[--len] = 0;
    }
}
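A shorter alternative that is often used for the same job relies on strcspn. This is a sketch with a hypothetical helper name; unlike the code above, it cuts the string at the first '\r' or '\n' wherever it occurs, which is fine for a line returned by fgets but not for arbitrary text.

```c
#include <assert.h>
#include <string.h>

/* Truncates s at the first carriage return or newline, if any.
   strcspn returns the length of the prefix free of those characters,
   which equals strlen(s) when neither is present. */
void strip_line_ending(char *s)
{
    assert(s != NULL);
    s[strcspn(s, "\r\n")] = '\0';
}
```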

