How Are Strings Compared

Why is one string greater than the other when comparing strings in JavaScript?

Because, as in many programming languages, strings are compared lexicographically.

You can think of this as a fancier version of alphabetical ordering. The difference is that alphabetical ordering only covers the 26 letters a through z, whereas lexicographic ordering covers every character by its numeric code.

This answer is in response to a Java question, but the logic is exactly the same. Another good one: String Compare "Logic".
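A minimal sketch in Python of what "lexicographic" means in practice. The sample strings are made up for illustration; each comparison is decided at the first position where the two strings differ, using the characters' numeric codes.

```python
# Each comparison is decided by the first position where the strings differ.
digit_strings = "10" < "9"               # '1' (49) < '9' (57), so True
upper_before_lower = "Zebra" < "apple"   # 'Z' (90) < 'a' (97), so True
prefix_is_smaller = "app" < "apple"      # a proper prefix sorts first, so True

print(digit_strings, upper_before_lower, prefix_is_smaller)
```

Note that "10" < "9" holds for strings even though 10 > 9 holds for numbers, because only the first characters are ever looked at.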

How are strings compared?

From the docs:

The comparison uses lexicographical ordering: first the first two items are compared, and if they differ this determines the outcome of the comparison; if they are equal, the next two items are compared, and so on, until either sequence is exhausted.

Also:

Lexicographical ordering for strings uses the Unicode code point number to order individual characters.

or on Python 2:

Lexicographical ordering for strings uses the ASCII ordering for individual characters.

As an example:

>>> 'abc' > 'bac'
False
>>> ord('a'), ord('b')
(97, 98)

The result False is returned as soon as a is found to be less than b. The further items are not compared (as you can see for the second items: b > a is True).
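The rule quoted from the docs can be sketched by hand. The helper name lex_less below is hypothetical and only mirrors the description; it is not how CPython implements the comparison internally.

```python
def lex_less(a, b):
    """Return True if sequence a sorts before sequence b (illustrative helper)."""
    for x, y in zip(a, b):
        if x != y:
            # The first differing pair of items decides the outcome.
            return x < y
    # One sequence is exhausted without a difference: the shorter one is smaller.
    return len(a) < len(b)


# Check the sketch against the built-in operator on a few sample pairs.
pairs = [("abc", "bac"), ("bac", "abc"), ("abc", "abd"), ("ab", "abc"), ("abc", "abc")]
for a, b in pairs:
    assert lex_less(a, b) == (a < b)
```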

Be aware of lower and uppercase:

>>> from string import ascii_lowercase as abc
>>> [(x, ord(x)) for x in abc]
[('a', 97), ('b', 98), ('c', 99), ('d', 100), ('e', 101), ('f', 102), ('g', 103), ('h', 104), ('i', 105), ('j', 106), ('k', 107), ('l', 108), ('m', 109), ('n', 110), ('o', 111), ('p', 112), ('q', 113), ('r', 114), ('s', 115), ('t', 116), ('u', 117), ('v', 118), ('w', 119), ('x', 120), ('y', 121), ('z', 122)]
>>> [(x, ord(x)) for x in abc.upper()]
[('A', 65), ('B', 66), ('C', 67), ('D', 68), ('E', 69), ('F', 70), ('G', 71), ('H', 72), ('I', 73), ('J', 74), ('K', 75), ('L', 76), ('M', 77), ('N', 78), ('O', 79), ('P', 80), ('Q', 81), ('R', 82), ('S', 83), ('T', 84), ('U', 85), ('V', 86), ('W', 87), ('X', 88), ('Y', 89), ('Z', 90)]
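Because every uppercase ASCII letter has a smaller code than every lowercase one, a plain sort puts capitalized words first. One common way around this, shown here as a sketch with a made-up list of names, is to sort with a case-insensitive key such as str.casefold.

```python
names = ["bob", "Zoe", "alice", "Carl"]  # hypothetical sample data

# Default ordering: 'C' (67) and 'Z' (90) come before 'a' (97) and 'b' (98).
default_order = sorted(names)

# Case-insensitive ordering: compare the casefolded form of each name.
caseless_order = sorted(names, key=str.casefold)

print(default_order)   # ['Carl', 'Zoe', 'alice', 'bob']
print(caseless_order)  # ['alice', 'bob', 'Carl', 'Zoe']
```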

How does string comparison work in JavaScript?

The result is determined by the Abstract Relational Comparison Algorithm in ECMAScript 5 (ECMA-262, 5th edition). The relevant part is quoted below.

4. Else, both px and py are Strings
    a) If py is a prefix of px, return false. (A String value p is a prefix 
       of String value q if q can be the result of concatenating p and some
       other String r. Note that any String is a prefix of itself, because 
       r may be the empty String.)
    b) If px is a prefix of py, return true.
    c) Let k be the smallest nonnegative integer such that the character 
       at position k within px is different from the character at position 
       k within py. (There must be such a k, for neither String is a prefix 
       of the other.)
    d) Let m be the integer that is the code unit value for the character 
       at position k within px.
    e) Let n be the integer that is the code unit value for the character 
       at position k within py.
    f) If m < n, return true. Otherwise, return false.
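The quoted steps can be modelled outside a JavaScript engine. The following Python sketch is an approximation under stated assumptions: the names utf16_code_units and js_less_than are hypothetical, the strings are converted to UTF-16 code units first (Python itself compares by code point), and lone surrogates are not handled.

```python
def utf16_code_units(s):
    """Split a string into its 16-bit UTF-16 code units (illustrative helper)."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def js_less_than(px, py):
    """Model steps a) to f) of the quoted algorithm for two strings."""
    ux, uy = utf16_code_units(px), utf16_code_units(py)
    if ux[:len(uy)] == uy:   # a) py is a prefix of px (includes px == py)
        return False
    if uy[:len(ux)] == ux:   # b) px is a prefix of py
        return True
    # c) smallest k at which the code units differ
    k = next(i for i, (m, n) in enumerate(zip(ux, uy)) if m != n)
    return ux[k] < uy[k]     # d), e), f)


print(js_less_than("abc", "abd"))  # True
print(js_less_than("abc", "ab"))   # False
```

Comparing by code unit and comparing by code point only disagree when characters outside the Basic Multilingual Plane are involved. For example, js_less_than("\uFF5E", "\U0001F600") is False because the surrogate 0xD83D is smaller than 0xFF5E, while Python's own "\uFF5E" < "\U0001F600" is True.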

How do I compare strings in Java?

== tests for reference equality (whether they are the same object).

.equals() tests for value equality (whether they are logically "equal").

Objects.equals() checks for null before calling .equals() so you don't have to (available as of JDK7; Guava offers the equivalent Objects.equal()).

Consequently, if you want to test whether two strings have the same value you will probably want to use Objects.equals().

// These two have the same value
new String("test").equals("test") // --> true 

// ... but they are not the same object
new String("test") == "test" // --> false 

// ... neither are these
new String("test") == new String("test") // --> false 

// ... but these are because literals are interned by 
// the compiler and thus refer to the same object
"test" == "test" // --> true 

// ... string literals are concatenated by the compiler
// and the results are interned.
"test" == "te" + "st" // --> true

// ... but you should really just call Objects.equals()
Objects.equals("test", new String("test")) // --> true
Objects.equals(null, "test") // --> false
Objects.equals(null, null) // --> true

You almost always want to use Objects.equals(). In the rare situation where you know you're dealing with interned strings, you can use ==.

From JLS 3.10.5. String Literals:

Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.

Similar examples can also be found in JLS 3.10.5-1.

Other Methods To Consider

String.equalsIgnoreCase() tests value equality while ignoring case. Beware, however, that this method can have unexpected results in various locale-related cases; see this question.

String.contentEquals() compares the content of the String with the content of any CharSequence (available since Java 1.5). It saves you from having to turn your StringBuffer, etc. into a String before doing the equality comparison, but leaves the null checking to you.

Why is comparing strings O(n), but comparing numbers O(1)?

Numbers in computers are usually handled in fixed-size units. An int might be 32 or 64 bits in any given language and/or compiler/platform combination, but it will never be variable-length.

Therefore you have a fixed number of bits to compare when comparing numbers. It's very easy to build a hardware circuit that compares that many bits at once (i.e. as "one action").

Strings, on the other hand, have inherently variable lengths, so just saying "string" doesn't tell you how many bits you'll have to compare.

There are exceptions, however. Variable-length numbers, usually called something like BigInteger or BigDecimal, behave very similarly to strings: comparing two BigDecimal values for equality might end up being O(n), where n is the length of the BigDecimals, not either of their numeric values.
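To make the O(n) claim concrete, here is a sketch that counts the character comparisons an equality check needs. The function equal_with_count is hypothetical and only models the loop; real runtimes typically compare many bytes per step, which changes the constant factor but not the linear growth.

```python
def equal_with_count(a, b):
    """Compare two strings; return (equal, number of character comparisons)."""
    if len(a) != len(b):
        # Different lengths can be rejected without looking at any character.
        return False, 0
    comparisons = 0
    for x, y in zip(a, b):
        comparisons += 1
        if x != y:
            return False, comparisons
    return True, comparisons


n = 1000
print(equal_with_count("a" * n, "a" * n))              # (True, 1000)
print(equal_with_count("a" * (n - 1) + "b", "a" * n))  # (False, 1000)
print(equal_with_count("b" + "a" * (n - 1), "a" * n))  # (False, 1)
```

The worst case (equal strings, or a difference only at the very end) touches every character, which is where the O(n) comes from; an early difference ends the loop after a single step.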

Compare two strings with '<' and '>' operators in JavaScript

As said above, the formal specification is in the standard: http://www.ecma-international.org/ecma-262/7.0/#sec-abstract-relational-comparison . In layman's terms, the logic is like this:

1) String vs String

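Split both strings into 16-bit code units and compare them numerically. Note that code units != characters, e.g. "cafe\u0300" < "caf\u00e8" is true (really), although both strings render as "cafè": the first spells è as e plus a combining grave accent, the second uses the single precomposed character.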

2) String vs other primitive

Convert both to numbers. If one of them is NaN, return false, otherwise compare numerically. +0 and -0 are considered equal, +/-Infinity is bigger/smaller than anything else.

3) String vs Object

Try to convert the object to a primitive, attempting, in order, [Symbol.toPrimitive]("number"), valueOf and toString. If the result is a string, proceed to 1), otherwise proceed to 2). For arrays specifically, this will invoke toString, which is the same as join.
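The "cafè" example from point 1) can be reproduced in Python as a sketch. Python compares by code point rather than by 16-bit code unit, but for characters in the Basic Multilingual Plane, as used here, the two give the same answer. The variable names are made up for the illustration.

```python
import unicodedata

# Two spellings of the same visible text "cafè".
decomposed = "cafe\u0300"                            # 'e' followed by a combining grave accent
composed = unicodedata.normalize("NFC", decomposed)  # single precomposed U+00E8

print(len(decomposed), len(composed))  # 5 4
print(decomposed == composed)          # False
print(decomposed < composed)           # True: 'e' (0x65) < 'è' (0xE8) at index 3

# Normalizing both sides first makes the comparison match what a reader sees.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
print(same_after_nfc)                  # True
```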

How are two strings compared?

The comparison logic is specified by the string's char traits, which for std::string is std::char_traits<char>::compare, which in turn specifies "lexicographic comparison". Each character is compared based on its numeric value, which is given by the encoding of the execution character set. On your platform, 'b' > 'a' is true, so s2 compares less than s1.
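As a rough model of that behaviour (not the C++ library code itself), a three-way comparison over byte values can be sketched in Python. The function compare_three_way and the sample values of s1 and s2 are hypothetical; the sketch assumes an ASCII-compatible encoding so that 'b' has a larger numeric value than 'a'.

```python
def compare_three_way(s1: bytes, s2: bytes) -> int:
    """Return a negative, zero or positive int, like a three-way string compare."""
    for c1, c2 in zip(s1, s2):
        if c1 != c2:
            # The first differing character value decides the result.
            return c1 - c2
    # No difference in the common part: the shorter string compares less.
    return len(s1) - len(s2)


s1 = "b".encode("ascii")  # hypothetical contents
s2 = "a".encode("ascii")

print(compare_three_way(s2, s1) < 0)  # True: s2 compares less than s1
```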

Comparing strings with < or > operators (C)

With the indirection operator * you are actually comparing the values of the characters the pointers point to at the moment of the dereference.

So in your code, it's comparing 's' to 'a' as follows 's' > 'a' which is true.

The values have char type and hence it's well defined to use the <, >, ==, >=, <=, != operators.

Be careful when declaring a pointer to a string literal: use the const qualifier to prevent accidentally modifying it, because modifying a string literal is undefined behavior.

How are strings compared in Haskell?

This is lexicographic ordering, which is what you should expect if you look something up in a dictionary.