C++ String::Find Complexity

Why doesn't the C++ standard library's implementation of string::substr() use the KMP algorithm (and run in O(N + M)) instead of running in O(N * M)?

I assume you mean find(), rather than substr(), which doesn't need to search and should run in linear time (and that only because it has to copy the result into a new string).

The C++ standard doesn't specify implementation details, and only specifies complexity requirements in some cases. The only complexity requirements on std::string operations are that size(), max_size(), operator[], swap(), c_str() and data() are all constant time. The complexity of anything else depends on the choices made by whoever implemented the library you're using.

The most likely reason for choosing a simple search over something like KMP is to avoid needing extra storage. Unless the string to be found is very long, and the string to search contains a lot of partial matches, the time taken to allocate and free that would likely be much more than the cost of the extra complexity.

Is that corrected in C++0x?

No, C++11 doesn't add any complexity requirements to std::string, and certainly doesn't add any mandatory implementation details.

If the complexity of the current substr is not O(N * M), what is it?

That's the worst-case complexity, when the string to search contains a lot of long partial matches. If the characters have a reasonably uniform distribution, then the average complexity would be closer to O(N). So by choosing an algorithm with better worst-case complexity, you may well make more typical cases much slower.
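To make that worst case concrete, here is a small sketch (the helper names are my own) that builds the kind of pathological input described above: a text of repeated characters and a pattern that almost matches at every position.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// A text of n identical characters: every starting position begins a
// long partial match of the pattern below.
std::string worst_case_text(std::size_t n) { return std::string(n, 'a'); }

// A pattern of m-1 'a's followed by 'b': a simple search compares almost
// m characters at each of the ~n starting positions before failing,
// giving the O(N * M) worst case. A text with varied characters instead
// fails each comparison immediately, which is why the average is ~O(N).
std::string worst_case_pattern(std::size_t m) {
    return std::string(m - 1, 'a') + 'b';
}
```

Calling `worst_case_text(n).find(worst_case_pattern(m))` always returns npos, but only after the quadratic amount of comparison work discussed above.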

Problem figuring out time and space complexity?

I am not able to figure out the time and space complexity of my code. [...] Is this correct?

std::string::length runs in constant time (since C++11). The comparison and the concatenation run in linear time. But the overall algorithm could run in a non-linear time.

Indeed, the C++ standard does not require any specific algorithm or guarantee a complexity for std::string::find. Consequently, it is not possible to give an answer independent of the standard-library implementation you use.
If the implementation is naive or uses the famous Boyer-Moore algorithm, the worst-case time complexity is likely to be O(n^2) in your case (where n is the size of the input string). This can happen with inputs such as a text made of many repeated 'a' characters and a pattern of the form "aa...ab". Although std::search provides stronger guarantees, it does not provide any search algorithm running in linear time. To ensure a linear-time complexity, you can use the KMP search algorithm (or better variants such as the two-way string-matching algorithm).

Thus, with the KMP algorithm, the complexity in time and space of your solution would be O(n). This is optimal as the input strings need to be read and stored somewhere (at least in your implementation).
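As a sketch of what such a linear-time search looks like, here is a minimal KMP implementation (`kmp_find` is my own name, not a standard function). Note the O(m) auxiliary failure table, which is exactly the extra storage that a simple search avoids allocating.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Knuth-Morris-Pratt search: returns the index of the first occurrence
// of `pat` in `txt`, or std::string::npos. Runs in O(n + m) time and
// uses O(m) extra space for the failure table.
std::size_t kmp_find(const std::string& txt, const std::string& pat) {
    if (pat.empty()) return 0;
    // failure[i] = length of the longest proper prefix of pat[0..i]
    // that is also a suffix of pat[0..i].
    std::vector<std::size_t> failure(pat.size(), 0);
    for (std::size_t i = 1, k = 0; i < pat.size(); ++i) {
        while (k > 0 && pat[i] != pat[k]) k = failure[k - 1];
        if (pat[i] == pat[k]) ++k;
        failure[i] = k;
    }
    for (std::size_t i = 0, k = 0; i < txt.size(); ++i) {
        while (k > 0 && txt[i] != pat[k]) k = failure[k - 1];
        if (txt[i] == pat[k]) ++k;
        if (k == pat.size()) return i + 1 - k;  // match ends at position i
    }
    return std::string::npos;
}
```

Unlike a naive scan, the text index `i` never moves backwards, which is what bounds the total work at O(n + m).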

Time complexity of finding a string in an unordered set of strings

std::unordered_set is implemented as a hash table. Its find method has an average-case complexity of O(1) and a worst-case complexity of O(set.size()) key comparisons as specified by [tab:container.hash.req].


By default, std::unordered_set<std::string> uses operator== to compare keys, so each key comparison runs in O(str.size()), as specified in [string.view.ops]: operator==(const std::string&, const std::string&) is defined to be equivalent to std::string_view(str1) == std::string_view(str2), which in turn is equivalent to std::string_view(str1).compare(std::string_view(str2)) == 0.
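Spelled out in code, that chain of equivalences looks like this (`keys_equal` is a hypothetical helper, not part of the standard; it just restates what operator== is specified to do):

```cpp
#include <cassert>
#include <string>
#include <string_view>

// String equality is specified in terms of a string_view comparison,
// which compares character by character and therefore costs O(str.size()).
bool keys_equal(const std::string& a, const std::string& b) {
    return std::string_view(a).compare(std::string_view(b)) == 0;
}
```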

For an std::unordered_set<std::string>, the container must calculate a hash of the string to find. By default it uses std::hash<std::string> for this. The standard doesn't specify any complexity requirements for std::hash<std::string>, but it's most likely O(str.size()).

What's the time complexity of std::string::contains in C++23?

Going by the most recent draft, contains is:

Equivalent to:

return basic_string_view<charT, traits>(data(), size()).contains(x);

With the string_view function being:

Equivalent to: return find(x) != npos;

Since comparing the returned index against basic_string_view::npos is a constant-time operation, the time complexity will be that of basic_string_view::find:

Member functions in this subclause have complexity O(size() * str.size()) at worst, although implementations should do better.
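For illustration, a pre-C++23 stand-in following the quoted wording might look like this (`contains_str` is a hypothetical name; C++23 provides the real member function):

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Mirrors the specified behavior: contains() is just find() compared
// against npos, so it inherits find()'s complexity of at worst
// O(size() * needle.size()).
bool contains_str(const std::string& haystack, std::string_view needle) {
    return std::string_view(haystack.data(), haystack.size()).find(needle)
           != std::string_view::npos;
}
```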

Time Complexity of find operation

The Standard, §21.4.7.2, doesn't give any guarantees as to the complexity.

You can reasonably assume std::basic_string::find takes linear time in the length of the string being searched in, though, as even the naïve algorithm (check each substring for equality) has that complexity, and it's unlikely that the std::string constructor will build a fancy index structure to enable anything faster than that.

The complexity in terms of the pattern being searched for may reasonably vary between linear and constant, depending on the implementation.
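The naive algorithm mentioned above can be sketched as follows (`naive_find` is my own name):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Try every starting position and compare character by character.
// Worst case O(n * m), but close to O(n) on typical inputs because
// mismatches usually occur within the first few characters.
std::size_t naive_find(const std::string& txt, const std::string& pat) {
    if (pat.size() > txt.size()) return std::string::npos;
    for (std::size_t i = 0; i + pat.size() <= txt.size(); ++i) {
        std::size_t j = 0;
        while (j < pat.size() && txt[i + j] == pat[j]) ++j;
        if (j == pat.size()) return i;  // full match starting at i
    }
    return std::string::npos;
}
```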

What is the BigO time complexity of modifying a string in C++?

That's right, it's constant. According to this:

Complexity

Constant.

That applies to C++11 and later; it's unspecified for C++98, but I wouldn't expect an implementation with a non-constant std::string::operator[] to be commonplace.

time complexity between a ordered map string search vs int search

Is the worst time complexity right for the map2?

What is right depends on what complexity you are analysing. The logarithmic complexity of map lookup describes the number of comparisons between elements of the map. This is the same regardless of the type of the element in the map.

If you analyse the complexity in terms of sub-operations of the comparison function, then the total complexity will be a product of the lookup complexity, and the worst case complexity of the comparison (assuming we are analysing worst case complexity, and not average).

The complexity (number of character comparisons) of a string comparison is linear in the length of the longest common initial substring of the two strings.

So, the complexity of comparison sub operations of a string map lookup is O(m * log n), where n is the size of the map, and m is the length of the longest common initial substring of the elements of the map. The upper bound for m is the length of the longest string.
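One way to make this observable is an instrumented comparator that counts character comparisons. This is a sketch using a hypothetical `CountingLess` (my own name, not anything from the standard library); plugged into a std::map<std::string, int, CountingLess>, it would reveal the m-per-comparison factor behind the O(m * log n) bound.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Behaves like operator< on strings, but counts every character
// comparison it performs: each call costs up to the length of the
// longest common prefix of its two arguments, plus one.
struct CountingLess {
    static inline std::size_t chars_compared = 0;  // requires C++17
    bool operator()(const std::string& a, const std::string& b) const {
        const std::size_t n = std::min(a.size(), b.size());
        for (std::size_t i = 0; i < n; ++i) {
            ++chars_compared;
            if (a[i] < b[i]) return true;
            if (b[i] < a[i]) return false;
        }
        return a.size() < b.size();  // shorter string orders first on tie
    }
};
```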

C++ Unordered Set string hash time complexity?

This is a great question and it hits at some nuances in the way we analyze the cost of operations on hash tables.

There are actually several different ways to think about this. The first one is to think about the runtimes of the operations on a hash table measured with a runtime perspective that purely focuses on the size of the table. From that perspective, the cost of an insertion, lookup, or deletion does not scale as a function of the number of table elements. That is, the cost of looking something up does not depend on the number of elements in the table, the cost of inserting an element doesn't depend on the number of elements in the table, etc. From that vantage point, if we let n denote the number of elements in the table, then the cost of an insertion, deletion, or lookup is O(1), because there is no dependency on n.

In this sense, the big-O notation here is meant to be interpreted as "if the only variable is n, the number of elements in the table, how will things scale?" But that leaves a lot to be desired, since it completely ignores the cost of comparing strings for equality, the cost of evaluating the hash function, etc.

If you factor those details in, then yes, you're correct - the cost of looking up, inserting, or deleting a string of length m from a hash table with n elements is O(m), not O(1).

The perspective I've always found most helpful is the following. When a hash table says that all operations run in time O(1), what it really means is that each operation requires only O(1) total hash evaluations and comparisons. In that sense, it means "the cost of looking something up, or inserting something, or deleting something is some constant number of hash evaluations and comparisons." To then work out the total cost, you then multiply O(1) by the cost of a hash or comparison, which in the case of strings of length m would be O(m). That gives you the overall runtime of O(m), which matches your intuition.
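Here is a sketch of that accounting, using a hypothetical `CountingHash` (my own name) that counts hash evaluations: a lookup in an unordered_set<std::string> performs O(1) hash evaluations (exactly one, for the searched key), while each evaluation itself reads the whole string and therefore costs O(m).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Wraps std::hash<std::string> and counts how many times it is invoked,
// so the "O(1) hash evaluations per operation" claim can be observed.
struct CountingHash {
    static inline std::size_t calls = 0;  // requires C++17
    std::size_t operator()(const std::string& s) const {
        ++calls;
        return std::hash<std::string>{}(s);  // itself O(s.size())
    }
};
```

Used as `std::unordered_set<std::string, CountingHash>`, each find() adds exactly one to the counter, whether or not the key is present.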


