What Is the Proper Function for Comparing Two C-Style Strings

What is the proper function for comparing two C-style strings?

For general string comparisons, strcmp is the appropriate function. You should use strncmp to only compare some number of characters from a string (for example, a prefix), and memcmp to compare blocks of memory.
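As a rough illustration of the three functions mentioned above, here is a minimal sketch. The helper names (str_equal, has_prefix, bytes_equal) are made up for this example and are not part of any library; only strcmp, strncmp, memcmp and strlen come from <string.h>.

```c
#include <stdbool.h>
#include <string.h>

/* Whole-string equality: strcmp returns 0 when the strings match. */
bool str_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Prefix test: compare only the first strlen(prefix) characters. */
bool has_prefix(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Raw memory equality over exactly n bytes, '\0' bytes included. */
bool bytes_equal(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n) == 0;
}
```

With these, str_equal("exit", "exit") is true, has_prefix("--verbose", "--") is true, and bytes_equal keeps comparing past an embedded '\0', which the two string functions do not.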

That said, since you're using C++, you should avoid this altogether and use the std::string class, which is much easier to use and generally safer than C-style strings. You can compare two std::strings for equality easily by just using the == operator.

Hope this helps!

Using the equality operator == to compare two strings for equality in C

Because argv[1] (for instance) is actually a pointer to the string. So all you're doing is comparing pointers.

C string comparison

Because in C you have to use strcmp for string comparison.

In C a string is a sequence of characters that ends with the '\0'-terminating byte, whose value is 0.

The string "exit" looks like this in memory:

+-----+-----+-----+-----+------+
| 'e' | 'x' | 'i' | 't' | '\0' |
+-----+-----+-----+-----+------+

where 'e' == 101, 'x' == 120, etc.

The values of the characters are determined by the codes of the ASCII Table.

&command != "exit"

is just comparing pointers.

while(strcmp(command, "exit") != 0);

would be correct. strcmp returns 0 when both strings are equal, a non-zero
value otherwise. See

man strcmp

#include <string.h>

int strcmp(const char *s1, const char *s2);

DESCRIPTION

The strcmp() function compares the two strings s1 and s2. It returns an integer less than, equal to, or greater than zero if s1 is
found, respectively, to be less than, to match, or be greater than s2.

But you've made another error:

scanf("%c", &command);

Here you are reading only 1 character, so command is not a string.

scanf("%s", command);

would be correct.

The next error would be

char command[4];

This can hold strings with a maximal length of 3 characters, so "exit" doesn't
fit in the buffer.

Make it

char command[1024];

Then you can store a string with max. length of 1023 bytes.

In general, if you want to store a string of length n, you need a char array of
at least n+1 elements.
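To make the n+1 rule concrete, here is a small sketch that reads one word into a 1024-byte buffer with a field width, so the read can never overflow it. It uses sscanf on a line of text rather than scanf on stdin purely to keep the example easy to try; COMMAND_MAX, read_command and is_exit are names invented for this illustration.

```c
#include <stdio.h>
#include <string.h>

#define COMMAND_MAX 1023   /* longest command accepted, not counting '\0' */

/* Reads one whitespace-delimited word from `line` into `command`,
   which must have room for COMMAND_MAX + 1 bytes.
   The width 1023 in the format string must match COMMAND_MAX.
   Returns 1 if a word was read, 0 otherwise. */
int read_command(const char *line, char command[COMMAND_MAX + 1])
{
    return sscanf(line, "%1023s", command) == 1;
}

/* Compares the contents of the command with "exit". */
int is_exit(const char *command)
{
    return strcmp(command, "exit") == 0;
}
```

The same width works with scanf: scanf("%1023s", command) stores at most 1023 characters plus the terminating '\0'.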

How do I properly compare strings in C?

You can't (usefully) compare strings using != or ==, you need to use strcmp:

while (strcmp(check,input) != 0)

The reason is that != and == only compare the base addresses of those strings, not the contents of the strings themselves.
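A short sketch of the difference, assuming two separate arrays that happen to hold the same text (same_address and same_contents are names chosen for this example):

```c
#include <stdbool.h>
#include <string.h>

/* What == does on two char pointers: compares where they point. */
bool same_address(const char *a, const char *b)
{
    return a == b;
}

/* What you usually want: compares the characters they point to. */
bool same_contents(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Given char x[] = "exit"; and char y[] = "exit";, same_address(x, y) is false because x and y are distinct objects, while same_contents(x, y) is true.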

compare two c-style strings without strcmp() and operator overloading

If you are not allowed to use strcmp or the C++ string library,
you could either write the raw loop yourself:

bool string_compare(const char* a, const char* b)
{
    if (bool(a) != bool(b)) return false;
    if (!a && !b) return true;

    while (*a != '\0' && *b != '\0') {
        if (*a++ != *b++) return false;
    }

    return *a == *b;
}

or you could use memcmp if it is allowed:

#include <cstring>

bool string_compare(const char* a, const char* b)
{
    if (bool(a) != bool(b)) return false;
    if (!a && !b) return true;

    const auto a_size = std::strlen(a);
    const auto b_size = std::strlen(b);
    if (a_size != b_size) return false;

    return std::memcmp(a, b, a_size) == 0;
}

It is of course better to use the C++ string library than either of the above methods:

#include <string>

bool string_compare(const char* a, const char* b)
{
    // note: unlike the versions above, a and b must not be null here
    return std::string(a) == std::string(b); //for c++11
    //return std::string_view(a) == std::string_view(b); //for c++17
}

What is the fastest way to compare two strings in C?

I'm afraid your reference implementation for strcmp() is both inaccurate and irrelevant:

  • it is inaccurate because it compares characters using the char type instead of the unsigned char type as specified in the C11 Standard:

    7.24.4 Comparison functions

    The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the values of the first pair of characters (both interpreted as unsigned char) that differ in the objects being compared.

  • It is irrelevant because the actual implementation used by modern compilers is much more sophisticated, expanded inline using hand-coded assembly language.

Any generic implementation is likely to be less optimal, especially if coded to remain portable across platforms.
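For reference only, here is what a portable version that follows the unsigned char rule quoted above might look like. This is a sketch for illustration, not how any particular C library implements strcmp(); my_strcmp is a made-up name.

```c
/* Portable sketch of strcmp semantics: characters are compared as
   unsigned char, so bytes >= 0x80 sort after plain ASCII characters
   even on platforms where char is signed. */
int my_strcmp(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* advance while the characters match and s1 has not ended */
    while (*p1 != '\0' && *p1 == *p2) {
        ++p1;
        ++p2;
    }

    /* -1, 0 or 1 depending on the first differing pair */
    return (*p1 > *p2) - (*p1 < *p2);
}
```

With a signed char comparison, "\xff" could compare as less than "a"; with the unsigned char interpretation it compares as greater.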

Here are a few directions to explore if your program's bottleneck is comparing strings.

  • Analyze your algorithms and try to find ways to reduce the number of comparisons: for example, if you search for a string in an array, sorting that array and using a binary search will drastically reduce the number of comparisons.
  • If your strings are tokens used in many different places, allocate unique copies of these tokens and use those as scalar values. The strings will be equal if and only if the pointers are equal. I use this trick in compilers and interpreters all the time with a hash table.
  • If your strings have the same known length, you can use memcmp() instead of strcmp(). memcmp() is simpler than strcmp() and can be implemented even more efficiently in places where the strings are known to be properly aligned.
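The first suggestion in the list above can be sketched with the standard bsearch() function, assuming the array has already been sorted in strcmp() order (for instance with qsort()). The names compare_entries and sorted_contains are invented for this example.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* bsearch passes a pointer to the key and a pointer to an array element;
   here both point to a `const char *`. */
static int compare_entries(const void *key, const void *element)
{
    const char *const *k = key;
    const char *const *e = element;
    return strcmp(*k, *e);
}

/* Looks up `word` in an array that is already sorted in strcmp order.
   Needs about log2(count) string comparisons instead of up to count. */
bool sorted_contains(const char *const *sorted, size_t count, const char *word)
{
    return bsearch(&word, sorted, count, sizeof *sorted, compare_entries) != NULL;
}
```

For a sorted array such as {"apple", "banana", "cherry", "date"}, sorted_contains finds "cherry" and rejects "coconut" after at most three comparisons.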

EDIT: with the extra information provided, you could use a structure like this for your strings:

typedef struct string_t {
    size_t len;
    size_t hash;  // optional
    char str[];   // flexible array, use [1] for pre-c99 compilers
} string_t;

You allocate this structure this way:

string_t *create_str(const char *s) {
    size_t len = strlen(s);
    string_t *str = malloc(sizeof(*str) + len + 1);
    if (str == NULL)
        return NULL;  // allocation failed
    str->len = len;
    str->hash = hash_str(s, len);
    memcpy(str->str, s, len + 1);  // copies the '\0' too
    return str;
}

If you can use these str things for all your strings, you can greatly improve the efficiency of the matching by first comparing the lengths or the hashes. You can still pass the str member to your library function, it is properly null terminated.
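Here is one way the length-and-hash shortcut could look. The answer does not show hash_str(), so the version below is only a hypothetical stand-in (a simple multiply-by-33 hash), and string_t_equal is a name invented for this sketch.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct string_t {
    size_t len;
    size_t hash;
    char str[];   /* flexible array member (C99) */
} string_t;

/* Hypothetical stand-in for hash_str(); any reasonable hash would do. */
static size_t hash_str(const char *s, size_t len)
{
    size_t h = 5381;
    for (size_t i = 0; i < len; i++)
        h = h * 33 + (unsigned char)s[i];
    return h;
}

string_t *create_str(const char *s)
{
    size_t len = strlen(s);
    string_t *str = malloc(sizeof(*str) + len + 1);
    if (str == NULL)
        return NULL;
    str->len = len;
    str->hash = hash_str(s, len);
    memcpy(str->str, s, len + 1);
    return str;
}

/* Cheap integer checks first; memcmp only runs when both length and hash agree. */
bool string_t_equal(const string_t *a, const string_t *b)
{
    if (a->len != b->len || a->hash != b->hash)
        return false;
    return memcmp(a->str, b->str, a->len) == 0;
}
```

Most unequal pairs are rejected by the two integer comparisons, and since the lengths are known to match, memcmp() can replace strcmp() for the rest.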

Writing a string compare function

Your s_compare() function has undefined behaviour because it ends up dereferencing a NULL pointer if the caller passes a valid or empty string as str1 and NULL as str2, like this:

s_compare ("abc", NULL);
s_compare ("", NULL);

Both of these calls will result in undefined behaviour.

To cover these cases:

  • First string is longer.
  • Second string is longer.
  • Empty strings.

there is no need to call strlen() to get the lengths of the strings. In C, a string is a one-dimensional array of characters terminated by a null character '\0', so you can simply iterate over both strings together. Either the characters at some position differ, or the strings have different lengths and the shorter one is a prefix of the longer one; in that case, at the position where the shorter string has its null character, the longer string has some other character. In both cases, the strings are not the same.

Since s_compare() is not supposed to modify the strings passed as arguments, you should declare both pointer parameters const.
Implementation:

#include <stdbool.h>
#include <stddef.h>

bool s_compare (const char* str1, const char* str2) {

    // if it's the same pointer we can return immediately;
    // this also covers the case where both are null pointers
    if (str1 == str2) {
        return true;
    }

    // if one of them is null and the other is not, the strings are not the same
    if ((str1 == NULL) || (str2 == NULL)) {
        return false;
    }

    size_t i = 0;

    // iterate over each character of str1
    while (str1[i]) {
        // if the characters of str1 and str2 at position i
        // differ, the strings are not the same
        if (str1[i] != str2[i]) {
            return false;
        }
        ++i;
    }

    // reaching here means str1 was iterated up to its
    // null character, so str1[i] is the null character.
    // If str2[i] is also the null character,
    // then the strings are the same, otherwise not
    return str2[i] == '\0';
}

Driver program:

#include <stdio.h>
#include <string.h>

int main (void) {
    printf ("Compare NULL and NULL : %d\n", s_compare (NULL, NULL));
    printf ("Compare \"abc\" and NULL : %d\n", s_compare ("abc", NULL));
    printf ("Compare \"\" and NULL : %d\n", s_compare ("", NULL));
    printf ("Compare NULL and \"\" : %d\n", s_compare (NULL, ""));

    char s1[10] = {0};
    char s2[10] = {0};

    printf ("Compare \"%s\" and \"%s\" : %d\n", s1, s2, s_compare (s1, s2));

    strcpy (s1, "ABC");
    strcpy (s2, "ABC");

    printf ("Compare \"%s\" and \"%s\" : %d\n", s1, s2, s_compare (s1, s2));

    strcpy (s1, "ab");
    strcpy (s2, "ayz");

    printf ("Compare \"%s\" and \"%s\" : %d\n", s1, s2, s_compare (s1, s2));

    return 0;
}

Output:

Compare NULL and NULL : 1
Compare "abc" and NULL : 0
Compare "" and NULL : 0
Compare NULL and "" : 0
Compare "" and "" : 1
Compare "ABC" and "ABC" : 1
Compare "ab" and "ayz" : 0

Fastest way to compare strings in C

strcmp() - compare two strings.

const char *s1, *s2;

are the strings to be compared.

int i;
i = strcmp( s1, s2 );

gives the results of the comparison. i is zero if the strings are identical. i is positive if string s1 is greater than string s2, and is negative if string s2 is greater than string s1. "Greater than" and "less than" are decided by the first pair of characters that differ, compared as unsigned char values; on ASCII systems this is the ASCII order.

strcmp() compares the string s1 to the string s2. Both strings must be terminated by the usual '\0' character.


strncmp()

const char *s1, *s2;

are the strings to be compared.

size_t N;

gives the number of characters to be examined.

int i;
i = strncmp( s1, s2, N );

gives the results of the comparison. i is zero if the first N characters of the strings are identical. i is positive if string s1 is greater than string s2, and is negative if string s2 is greater than string s1. "Greater than" and "less than" are decided by the first pair of characters that differ, compared as unsigned char values; on ASCII systems this is the ASCII order.

strncmp() compares the first N characters of the string s1 to the first N characters of the string s2. If one or both of the strings is shorter than N characters (i.e. if strncmp() encounters a '\0'), comparisons will stop at that point. Thus N represents the maximum number of characters to be examined, not the exact number. (Note that if N is zero, strncmp() will always return zero -- no characters are checked, so no differences are found.)


memcmp()
const void *s1, *s2;

are the strings to be compared.
size_t N;

gives the number of characters to be examined.

int i;
i = memcmp( s1, s2, N );

gives the results of the comparison. i is zero if the first N bytes of the two blocks are identical. i is positive if s1 is greater than s2, and is negative if s2 is greater than s1. "Greater than" and "less than" are decided by the first pair of bytes that differ, compared as unsigned char values; on ASCII systems this is the ASCII order.

memcmp() compares the first N bytes of the block s1 to the first N bytes of the block s2.

Unlike the function strncmp(), memcmp() does not check for a '\0' terminating either string. Thus it examines a full N characters, even if the strings are not actually that long.
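The difference shows up when two buffers hold the same string but different leftover bytes after the terminator. A small sketch, where the buffers and both function names are invented for the example:

```c
#include <stdbool.h>
#include <string.h>

/* Both buffers hold the string "ab", but differ after the '\0'. */
static const char buf1[8] = { 'a', 'b', '\0', 'X', 'X', 'X', 'X', 'X' };
static const char buf2[8] = { 'a', 'b', '\0', 'Y', 'Y', 'Y', 'Y', 'Y' };

/* strncmp stops at the first '\0', so bytes after it are ignored. */
bool same_string(const char *a, const char *b, size_t n)
{
    return strncmp(a, b, n) == 0;
}

/* memcmp always examines all n bytes, terminator or not. */
bool same_bytes(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n) == 0;
}
```

Here same_string(buf1, buf2, sizeof buf1) is true while same_bytes(buf1, buf2, sizeof buf1) is false. For the same reason, calling memcmp() with N larger than the actual object reads out of bounds, so N has to be a size you know is valid for both arguments.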


wmemcmp()
int wmemcmp(const wchar_t *a1, const wchar_t *a2, size_t size);

The function wmemcmp() compares the size wide characters beginning at a1 against the size wide characters beginning at a2. The value returned is smaller than or larger than zero depending on whether the first differing wide character in a1 is smaller or larger than the corresponding character in a2.

If the contents of the two blocks are equal, wmemcmp() returns 0.

On arbitrary arrays, the memcmp() function is mostly useful for testing equality. It usually isn't meaningful to do byte-wise ordering comparisons on arrays of things other than bytes. For example, a byte-wise comparison on the bytes that make up floating-point numbers isn't likely to tell you anything about the relationship between the values of the floating-point numbers.


wcscmp()
int wcscmp(const wchar_t *ws1, const wchar_t *ws2);

The wcscmp() function compares the wide character string ws1 against ws2. The value returned is smaller than or larger than zero depending on whether the first differing wide character in ws1 is smaller or larger than the corresponding character in ws2.

If the two strings are equal, wcscmp() returns 0.

A consequence of the ordering used by wcscmp() is that if ws1 is an initial substring of ws2, then ws1 is considered to be “less than” ws2.

wcscmp() does not take sorting conventions of the language the strings are written in into account. To get that one has to use wcscoll.
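A brief sketch of wcscmp() in use, including the prefix ordering described above (wide_equal and wide_less are names made up for this example):

```c
#include <stdbool.h>
#include <wchar.h>

/* Equality of two wide strings. */
bool wide_equal(const wchar_t *a, const wchar_t *b)
{
    return wcscmp(a, b) == 0;
}

/* Code-point ordering only; no locale-aware collation is applied. */
bool wide_less(const wchar_t *a, const wchar_t *b)
{
    return wcscmp(a, b) < 0;
}
```

wide_less(L"ab", L"abc") is true because L"ab" is an initial substring of L"abc".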


wcscasecmp()
int wcscasecmp(const wchar_t *ws1, const wchar_t *ws2);

This function is like wcscmp(), except that differences in case are ignored. How uppercase and lowercase characters are related is determined by the currently selected locale. In the standard "C" locale the characters Ä and ä do not match but in a locale which regards these characters as parts of the alphabet they do match.


strcmpi()

int strcmpi(const char *string1, const char *string2);

strcmpi() compares string1 and string2 without sensitivity to case. All alphabetic characters in the two arguments string1 and string2 are converted to lowercase before the comparison.

The function operates on null-ended strings. The string arguments to the function are expected to contain a null character '\0' marking the end of the string.

strcmpi() returns a value indicating the relationship between the two strings, as follows:

Return value      Meaning
Less than 0       string1 less than string2
0                 string1 equivalent to string2
Greater than 0    string1 greater than string2
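Note that strcmpi() is a compiler extension rather than part of ISO C, so it may not be available everywhere. If that matters, the behaviour described above can be approximated with only standard functions. The following is a sketch under that assumption; compare_ignore_case is a name invented here, and tolower() follows the current locale for single-byte characters only.

```c
#include <ctype.h>

/* Case-insensitive comparison using only ISO C.
   Returns <0, 0 or >0 like strcmp, after lowercasing each character. */
int compare_ignore_case(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    for (;; ++p1, ++p2) {
        /* tolower needs a value representable as unsigned char */
        int c1 = tolower(*p1);
        int c2 = tolower(*p2);

        if (c1 != c2)
            return (c1 > c2) - (c1 < c2);
        if (c1 == '\0')
            return 0;   /* both strings ended together */
    }
}
```

For example, compare_ignore_case("Exit", "eXIT") returns 0.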


strcasecmp()
int strcasecmp(const char *s1, const char *s2);

This function is like strcmp(), except that differences in case are ignored. How uppercase and lowercase characters are related is determined by the currently selected locale. In the standard "C" locale the characters Ä and ä do not match but in a locale which regards these characters as parts of the alphabet they do match.


strncasecmp()
int strncasecmp(const char *s1, const char *s2, size_t n);

This function is like strncmp(), except that differences in case are ignored. Like strcasecmp(), it is locale dependent how uppercase and lowercase characters are related.


Which approach is best is certainly dependent upon your requirements.


