Inconsistent strcmp() return value when passing strings as pointers or as literals
TL;DR: Use gcc -fno-builtin-strcmp so strcmp() isn't treated as equivalent to __builtin_strcmp(). With optimization disabled, GCC can only do constant propagation within a single statement, not across statements. The actual library version subtracts the differing characters; the compile-time evaluation probably normalizes the result to 1 / 0 / -1, which isn't required or guaranteed by ISO C.
You are most likely seeing the result of a compiler optimization. If we test the code using gcc on godbolt at the -O0 optimization level, we can see that for the first case it does not call strcmp:
movl $-1, %esi #,
movl $.LC0, %edi #,
movl $0, %eax #,
call printf #
Since you are using constants as arguments to strcmp, the compiler is able to perform constant folding: it evaluates a compiler intrinsic at compile time and generates the -1 directly, instead of calling strcmp at run time. The run-time strcmp is implemented in the standard library and will have a different implementation than the likely simpler compile-time strcmp.
In the second case it does generate a call to strcmp:
call strcmp #
movl %eax, %esi # D.2047,
movl $.LC0, %edi #,
movl $0, %eax #,
call printf #
This is consistent with the fact that gcc has a builtin for strcmp, which is what gcc will use during constant folding.
If we test further at the -O1 optimization level or greater, gcc is able to fold both cases, and the result will be -1 for both:
movl $-1, %esi #,
movl $.LC0, %edi #,
xorl %eax, %eax #
call printf #
movl $-1, %esi #,
movl $.LC0, %edi #,
xorl %eax, %eax #
call printf #
With more optimization options turned on, the optimizer is able to determine that a and b also point to constants known at compile time, so it can compute the result of strcmp at compile time for this case as well.
We can confirm that gcc is using the builtin function by building with the -fno-builtin flag and observing that a call to strcmp will be generated for all cases.
clang is slightly different in that it does not fold at all at -O0 but will fold both cases at -O1 and above.
Note that any negative result is entirely conformant, as we can see by going to the draft C99 standard, section 7.21.4.2 The strcmp function, which says (emphasis mine):
int strcmp(const char *s1, const char *s2);
The strcmp function returns an integer greater than, equal to, or less
than zero, accordingly as the string pointed to by s1 is greater than,
equal to, or less than the string pointed to by s2.
technosurus points out that strcmp is specified to treat the strings as if they were composed of unsigned char. This is covered in C99 under 7.21.1, which says:
For all functions in this subclause, each character shall be
interpreted as if it had the type unsigned char (and therefore every
possible object representation is valid and has a different value).
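A minimal sketch of what these two passages imply for calling code, assuming nothing beyond the standard library: only the sign of the result is portable, and bytes are compared as unsigned char. The helper name cmp_sign is hypothetical and not part of any library.

```c
#include <string.h>

/* Hypothetical helper: collapse any strcmp() result to -1, 0 or 1,
 * so the value is the same whether the call was folded at compile
 * time or executed by the library at run time. */
int cmp_sign(const char *s1, const char *s2)
{
    int r = strcmp(s1, s2);
    return (r > 0) - (r < 0);
}
```

With this, cmp_sign("a", "d") yields -1 in both the folded and the non-folded case, and cmp_sign("\xff", "a") yields 1 because the byte 0xFF is compared as the unsigned value 255, even where plain char is signed.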
Weird return value in strcmp
It looks like you didn't enable optimizations (e.g. -O2).
From my tests it looks like gcc always recognizes strcmp with constant arguments and optimizes it, even with -O0 (no optimizations). Clang needs at least -O1 to do so.
That's where the difference comes from: the code produced by clang calls strcmp twice, but the code produced by gcc just does printf("%d\n", 1) in the first case because it knows that 'h' > 'H' (ASCIIbetically, that is). It's just constant folding, really.
Live example: https://godbolt.org/z/8Hg-gI
As the other answers explain, any positive value will do to indicate that the first string is greater than the second, so the compiler optimizer simply chooses 1. The strcmp library function apparently uses a different value.
Why the returns of strcmp is different?
Why the output is different
Because all that matters is the sign (positive, negative or zero) of the return value. strcmp() is not required to return +1 or -1, nor does it have to return consistent values. I suspect that in the first and third case, the compiler optimizes away the call to strcmp() and puts -1 into the place of the return value. In the second case, I think the function is actually called.
what is the code of strcmp?
Deducing from the fact that it seemingly returns the difference between the character codes of the first differing character, I'd say this is glibc's strcmp():
int
strcmp (p1, p2)
     const char *p1;
     const char *p2;
{
  register const unsigned char *s1 = (const unsigned char *) p1;
  register const unsigned char *s2 = (const unsigned char *) p2;
  unsigned char c1, c2;

  do
    {
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      if (c1 == '\0')
        return c1 - c2;
    }
  while (c1 == c2);

  return c1 - c2;
}
Edit: @AndreyT doesn't believe me, so here's the assembly GCC 4.2 generated for me (OS X 10.7.5 64-bit Intel, default optimization level - no flags):
.section __TEXT,__text,regular,pure_instructions
.globl _main
.align 4, 0x90
_main:
Leh_func_begin1:
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
subq $32, %rsp
Ltmp2:
leaq L_.str(%rip), %rax
movq %rax, -16(%rbp)
leaq L_.str1(%rip), %rax
movq %rax, -24(%rbp)
movl $-1, %ecx ; <- THIS!
xorb %dl, %dl
leaq L_.str2(%rip), %rsi
movq %rsi, %rdi
movl %ecx, %esi
movq %rax, -32(%rbp)
movb %dl, %al
callq _printf ; <- no call to `strcmp()` so far!
movq -16(%rbp), %rax
movq %rax, %rdi
movq -32(%rbp), %rsi
callq _strcmp ; <- strcmp()
movl %eax, %ecx
xorb %dl, %dl
leaq L_.str2(%rip), %rdi
movl %ecx, %esi
movb %dl, %al
callq _printf ; <- printf()
movq -16(%rbp), %rax
movq -24(%rbp), %rcx
movq %rax, %rdi
movq %rcx, %rsi
callq _strcmp ; <- strcmp()
movl %eax, %ecx
xorb %dl, %dl
leaq L_.str2(%rip), %rdi
movl %ecx, %esi
movb %dl, %al
callq _printf ; <- printf()
movl $0, -8(%rbp)
movl -8(%rbp), %eax
movl %eax, -4(%rbp)
movl -4(%rbp), %eax
addq $32, %rsp
popq %rbp
ret
Leh_func_end1:
.section __TEXT,__cstring,cstring_literals
L_.str:
.asciz "a"
L_.str1:
.asciz "d"
L_.str2:
.asciz "%d\n"
.section __TEXT,__eh_frame,coalesced,no_toc+strip_static_syms+live_support
EH_frame0:
Lsection_eh_frame:
Leh_frame_common:
Lset0 = Leh_frame_common_end-Leh_frame_common_begin
.long Lset0
Leh_frame_common_begin:
.long 0
.byte 1
.asciz "zR"
.byte 1
.byte 120
.byte 16
.byte 1
.byte 16
.byte 12
.byte 7
.byte 8
.byte 144
.byte 1
.align 3
Leh_frame_common_end:
.globl _main.eh
_main.eh:
Lset1 = Leh_frame_end1-Leh_frame_begin1
.long Lset1
Leh_frame_begin1:
Lset2 = Leh_frame_begin1-Leh_frame_common
.long Lset2
Ltmp3:
.quad Leh_func_begin1-Ltmp3
Lset3 = Leh_func_end1-Leh_func_begin1
.quad Lset3
.byte 0
.byte 4
Lset4 = Ltmp0-Leh_func_begin1
.long Lset4
.byte 14
.byte 16
.byte 134
.byte 2
.byte 4
Lset5 = Ltmp1-Ltmp0
.long Lset5
.byte 13
.byte 6
.align 3
Leh_frame_end1:
.subsections_via_symbols
And the original source code:
#include <stdio.h>
#include <string.h>

int main()
{
    const char *a = "a";
    const char *d = "d";
    printf("%d\n", strcmp("a", "d"));
    printf("%d\n", strcmp(a, "d"));
    printf("%d\n", strcmp(a, d));
    return 0;
}
And the output it generated (screenshot for having a better proof):
Implement strcmp function in assembly 64 on linux
strcmp() only guarantees the sign of the result. Something probably got optimized in the second case. You don't need to care that the magnitude is different, so it would be best if you didn't.
The compiler would be within its rights to optimize
printf("strcmp = %d\n",strcmp("hella world", "hello world"));
to
printf("strcmp = %d\n",-1);
When will strcmp not return -1, 0 or 1?
In the C99 standard, §7.21.4.2 The strcmp function:
The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.
Emphasis added.
This means the standard doesn't guarantee that the result is -1, 0 or 1; the exact value may vary between implementations.
The value you are getting is the difference between w and h, which is 15.
In your case the strings are hello and world, so 'h' - 'w' = -15 < 0, and that's why strcmp returns -15.
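The arithmetic can be reproduced with a small byte-wise comparison. This is only a sketch of one plausible way a library could arrive at -15, not necessarily what your C library does; the name first_diff is hypothetical, and the concrete value assumes an ASCII-compatible character set ('h' is 104, 'w' is 119).

```c
/* Sketch of a byte-wise comparison that returns the difference of the
 * first pair of differing characters, read as unsigned char. */
int first_diff(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* Advance while the strings agree and neither has ended. */
    while (*p1 != '\0' && *p1 == *p2) {
        ++p1;
        ++p2;
    }
    return *p1 - *p2;
}
```

Under that assumption, first_diff("hello", "world") evaluates 104 - 119 and returns -15, while swapping the arguments returns 15. Only the sign is something portable code may rely on.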
Getting incompatible integer to pointer conversion error in program. Unsure how/why exactly this is occurring but looking for an explanation
The function strcmp has the following declaration
int strcmp(const char *s1, const char *s2);
As you can see, both of its parameters have the pointer type const char *.
But in your program, in this call of strcmp
if (strcmp(p[i], "-") == 0)
you supplied the first argument p[i], which has the type char. It seems you want to compare two characters
if ( p[i] == '-' )
You could use the function strcmp, but you have to supply a string as the first argument of the function, something like
if ( strcmp( &p[i], "-" ) == 0 )
This call of strcmp is semantically correct, but the condition will evaluate to true in only one case: when the string pointed to by the pointer expression &p[i] is also equal to the string literal "-". In all other cases the condition of the if statement will evaluate to false.
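The difference between the two conditions can be seen in a short sketch. The function names rest_is_dash and char_is_dash are hypothetical and serve only to put the two expressions side by side.

```c
#include <stddef.h>
#include <string.h>

/* True only when the string starting at p[i] is exactly "-",
 * i.e. the dash is the last character before the terminator. */
int rest_is_dash(const char *p, size_t i)
{
    return strcmp(&p[i], "-") == 0;
}

/* True whenever the single character p[i] is a dash. */
int char_is_dash(const char *p, size_t i)
{
    return p[i] == '-';
}
```

For the string "a-b" and i equal to 1, char_is_dash is true but rest_is_dash is false, because &p[1] points to the string "-b". For "a-" both are true.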
Note that the parameter of the function howmanyDash should have the qualifier const because the passed string is not changed within the function. There is also no need to use any standard C string function (though you could use the standard function strchr).
The function can be declared and defined the following way.
size_t howmanyDash( const char s[] )
{
    size_t count = 0;

    for ( ; *s; ++s )
    {
        if ( *s == '-' )
        {
            ++count;
        }
    }

    return count;
}
And in main you can write
size_t dashCount = howmanyDash(word);
printf("Dashes: %zu\n", dashCount);
Using the function strchr, the function howmanyDash can be written the following way
size_t howmanyDash( const char s[] )
{
    size_t count = 0;
    const char c = '-';

    for ( ; ( s = strchr( s, c ) ) != NULL; ++s )
    {
        ++count;
    }

    return count;
}
Comparing 2 Strings, one in a struct other not C programming
int checkProduct (char nameCheck[100])
Note that the type signature is a lie. The signature should be
int checkProduct(char *nameCheck)
since the argument the function expects and receives is a pointer to a char, or, to document for the user that the argument should be a pointer to the first element of a 0-terminated char array,
int checkProduct(char nameCheck[])
Arrays are never passed as arguments to functions. As function arguments, and in most other circumstances [the exceptions are when the array is the operand of sizeof, _Alignof or the address operator &], they are converted to pointers to their first element.
{
product temp;
p.pName = nameCheck;
Arrays are not assignable. The only time you can have an array name on the left of a = is initialisation at the point where the array is declared.
You probably want
strcpy(p.pName, nameCheck);
there.
rewind (pfp);
while (fread(&temp,STRUCTSIZE,1,pfp)==1)
{
if (strcmp (temp.pName,p.pName))
strcmp returns a negative value if the first argument is lexicographically smaller than the second, 0 if both arguments are equal, and a positive value if the first is lexicographically larger than the second.
You probably want
if (strcmp(temp.pName, p.pName) == 0)
there.
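Because 0 means "equal", a bare if (strcmp(...)) is taken exactly when the strings differ. One way to make the intent harder to get wrong is a small wrapper; the name same_name is hypothetical and not part of the original program.

```c
#include <string.h>

/* Hypothetical helper: returns 1 when both names are the same string,
 * 0 otherwise, hiding the "0 means equal" convention of strcmp(). */
int same_name(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

The loop condition would then read if (same_name(temp.pName, p.pName)), which is true only for a matching product name.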
gets (nameCheck);
Never use gets. It is extremely unsafe (and has been removed from the language in the last standard, yay). Use
fgets(nameCheck, sizeof nameCheck, stdin);
but that stores the newline in the buffer if there is enough space, so you have to overwrite that with 0 if present.
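If you are on a POSIX system and don't need to care about portability, you can use getline() to read in a line of arbitrary length. Note that getline() also stores the trailing newline in the buffer if one was read, so it has to be overwritten in the same way.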
checkProduct (nameCheck);
You check whether the product is known, but throw away the result. Store it in a variable.
while (checkProduct == 1)
checkProduct is a function. In almost all circumstances, a function designator is converted into a pointer, hence the warning about the comparison between a pointer and an integer. You meant to compare to the value of the call you should have stored above.
{
printf ("Product Already Exists!\n Enter another!\n");
while (getchar() !='\n')
You read in characters without storing them. So you will never change the contents of nameCheck, and then be trapped in an infinite loop.
{
continue;
}
If the only statement in a loop body is continue;, you should leave the body empty.
}
p.pName = nameCheck;
Once again, you can't assign to an array.
Concerning the edit,
char *nameCheck;
nameCheck = "";
fgets(nameCheck,sizeof nameCheck, stdin);
you have changed nameCheck from an array to a pointer. That means that sizeof nameCheck now doesn't give the number of chars you can store in the array, but the size of a pointer to char, which is independent of what it points to (usually 4 on 32-bit systems and 8 on 64-bit systems).
And you let that pointer point to a string literal "", which is the reason for the crash. Attempting to modify string literals is undefined behaviour, and more often than not leads to a crash, since string literals are usually stored in a read-only segment of the memory nowadays.
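The sizeof point can be checked with a short sketch. The function names array_capacity and pointer_size are hypothetical; the variable name nameCheck is reused only to mirror the two declarations discussed above.

```c
#include <stddef.h>

/* sizeof applied to a real array: the number of bytes it can hold. */
size_t array_capacity(void)
{
    char nameCheck[100];
    return sizeof nameCheck;
}

/* sizeof applied to a pointer: the size of the pointer itself,
 * the same no matter what it points to. */
size_t pointer_size(void)
{
    char *nameCheck = NULL;
    return sizeof nameCheck;
}
```

Here array_capacity() returns 100, while pointer_size() returns sizeof(char *), which is the value fgets was given as its buffer size in the edited code.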
You should have left it at
char nameCheck[100];
fgets(nameCheck, sizeof nameCheck, stdin);
and then you can use sizeof nameCheck to tell fgets how many characters it may read, or, alternatively, you could have a pointer and malloc some memory,
#define NAME_LENGTH 100
char *nameCheck = malloc(NAME_LENGTH);
if (nameCheck == NULL) {
    // malloc failed, handle it if possible, or
    exit(EXIT_FAILURE);
}
fgets(nameCheck, NAME_LENGTH, stdin);
Either way, after getting input, remove the newline if there is one:
size_t len = strlen(nameCheck);
if (len > 0 && nameCheck[len-1] == '\n') {
    // Drop the newline and shorten len accordingly
    nameCheck[--len] = 0;
    // Does windows also add a '\r' when reading from stdin?
    // Only strip it when it directly preceded the newline.
    if (len > 0 && nameCheck[len-1] == '\r') {
        nameCheck[--len] = 0;
    }
}