Multiset, Map and Hash Map Complexity

C++: algorithmic complexity of std::next and std::nth_element for std::multiset

std::next is O(k) in the distance k you move: std::multiset iterators are bidirectional, not random-access, so advancing them is linear in the distance.

Practically speaking, there is no other way to get the median if all you have is a multiset.

It is possible to write a more efficient advance operation for node-based containers like this (an order-statistic tree can select the k-th element in O(log n)), but the standard library does not provide one.

You could keep two multisets and balance them using splicing operations such that the median is always the first element of the second container. This gives asymptotically good performance characteristics.

It can make other operations a bit more of a pain, though.
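A minimal sketch of that idea, assuming C++17 node extraction (extract/insert) as the splicing operation; the MedianTracker name and the int element type are just for illustration:

    #include <iterator>
    #include <set>

    // Invariant: `low` holds the smaller half, `high` the larger half,
    // and high.size() >= low.size(), so the (upper) median is *high.begin().
    struct MedianTracker {
        std::multiset<int> low, high;

        void insert(int x) {
            if (high.empty() || x >= *high.begin()) high.insert(x);
            else low.insert(x);
            rebalance();
        }

        int median() const { return *high.begin(); } // O(1); requires non-empty

    private:
        void rebalance() {
            // extract/insert splice nodes between the halves without
            // copying the elements
            if (high.size() > low.size() + 1)
                low.insert(high.extract(high.begin()));
            else if (low.size() > high.size())
                high.insert(low.extract(std::prev(low.end())));
        }
    };

Both insert and the rebalancing step are O(log n), and reading the median is O(1), which is the asymptotic win over calling std::next halfway through a single multiset.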

C++, multiset, Time complexity

Indeed, your algorithm is O(n log n), but that alone is no guarantee that you won't exceed a time limit. Remember that the constant factor hidden in a big-O complexity may be too big to keep the running time under a given limit.

You are using a multiset only to keep a count for each type of Pokemon. You lose a lot of time erasing from that multiset and counting from it again. You can do much better than the multiset:

  1. Use a map to keep and update the count for each type.

  2. Better yet, since Pokemon types are encoded as single chars, you can use an array of 256 elements to keep track of the counts. This avoids the log(n) factor of multiset (and map). Here's a refactored version of your code that should run much faster; moreover, it runs in O(n).

    #include <array>
    #include <cstdio>
    #include <iostream>
    #include <vector>

    int main()
    {
        int n;
        std::cin >> n;
        std::vector<char> st(n);
        std::array<int, 256> count{}; // we could also use a map if we had a bigger set of types...

        // the way you read the input can also be sped up,
        // but I want to focus on speeding up the algorithm
        getchar(); // consume the newline left behind by `cin >> n`
        for (int i = 0; i < n; ++i) {
            st[i] = getchar();
            ++count[st[i]];
        }

        // trim from the left: skip a pokemon if its type still occurs later on
        int pos1 = 0, pos2 = n - 1;
        for (int i = 0; i < n; ++i) {
            if (count[st[i]] == 1) {
                pos1 = i;
                break;
            } else --count[st[i]];
        }
        // now trim from the right symmetrically
        // (the counts already reflect the trimmed prefix)
        for (int i = n - 1; i >= 0; --i) {
            if (count[st[i]] == 1) {
                pos2 = i;
                break;
            } else --count[st[i]];
        }
        std::cout << pos2 - pos1 + 1;
    }

multiset & multimap - What's the point?

Some use cases (a short code sketch follows the lists):

multimap

  • With a ZIP code as the key, all people who have that ZIP code
  • With an account ID as the key, all open orders of that person/account
  • A dictionary, with several explanations per keyword

multiset

is in essence a map from a key to an integer count.

  • The inventory of a shop: every product has its key, and the amount
    still available is the value
  • Accumulated sales data of a shop: every time a product is sold, the
    product ID gets added to the multiset, thereby increasing the amount sold
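A minimal sketch of both; the ZIP codes, names, and product IDs are made up for illustration:

    #include <iostream>
    #include <map>
    #include <set>
    #include <string>

    int main() {
        // multimap: one ZIP code maps to many people
        std::multimap<std::string, std::string> residents;
        residents.insert({"10115", "Alice"});
        residents.insert({"10115", "Bob"});
        residents.insert({"20095", "Carol"});

        auto [first, last] = residents.equal_range("10115");
        for (auto it = first; it != last; ++it)
            std::cout << it->second << '\n'; // Alice, Bob

        // multiset: duplicate keys act as a per-key counter
        std::multiset<int> sales;
        sales.insert(42); // product 42 sold
        sales.insert(42); // ... and sold again
        std::cout << sales.count(42) << '\n'; // 2
    }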

What's the time complexity of iterating through a std::set/std::map?

In the draft C++11 standard N3337 the answer can be found in § 24.2.1 paragraph 8:

All the categories of iterators require only those functions that are
realizable for a given category in constant time (amortized).

Since each operation on an iterator must take (amortized) constant time, iterating through all n elements takes O(n) time. For the tree-based set and map, the amortization works out because a full in-order traversal crosses each edge of the tree at most twice.
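As a minimal illustration (the values are arbitrary):

    #include <iostream>
    #include <set>

    int main() {
        std::set<int> s{5, 1, 4, 2, 3};
        // The whole loop is O(n) even though a single ++it may climb
        // several levels of the tree.
        for (auto it = s.begin(); it != s.end(); ++it)
            std::cout << *it << ' '; // prints 1 2 3 4 5
    }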

Using Guava multisets or Collections.frequency()?

The Guava documentation for Multiset#count() has this to say:

Note that for an Object.equals(java.lang.Object)-based multiset, this gives the same result as Collections.frequency(java.util.Collection, java.lang.Object) (which would presumably perform more poorly).

So, yes, I suspect that performance is the issue here.

I think Multiset#count is more efficient because Collections#frequency iterates through the entire collection. For an object o whose frequency you're checking, it goes through all elements e in the collection and checks (o == null ? e == null : o.equals(e)).

For Multiset (which is an interface), the exact implementation of count depends on the class. If it is a HashMultiset, for example, then it is backed by a HashMap. For details about how that is more efficient than iterating through the whole collection, take a look at this answer: How does a Java HashMap handle different objects with the same hash code?.

The Guava code is as follows:

    public int count(@Nullable Object element) {
      Count frequency = Maps.safeGet(backingMap, element);
      return (frequency == null) ? 0 : frequency.get();
    }

Similarly, for a TreeMultiset, which maintains the ordering of its elements and is backed by an AVL tree, count can be obtained in O(log(n)) steps instead of O(n), where n is the size of the collection. The Guava code is as follows:

    public int count(@Nullable Object element) {
      try {
        @SuppressWarnings("unchecked")
        E e = (E) element;
        AvlNode<E> root = rootReference.get();
        if (!range.contains(e) || root == null) {
          return 0;
        }
        return root.count(comparator(), e);
      } catch (ClassCastException e) {
        return 0;
      } catch (NullPointerException e) {
        return 0;
      }
    }

What's the difference between set<pair> and map in C++?

Set elements cannot be modified while they are in the set; a set's iterator and const_iterator are equivalent types. Therefore, with set<pair<key_class,value_class> >, you cannot modify the value_class in-place. You must remove the old value from the set and add the new value. However, if value_class is a pointer, this doesn't prevent you from modifying the object it points to.

With map<key_class,value_class>, you can modify the value_class in-place, assuming you have a non-const reference to the map.
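A minimal sketch of that difference, assuming C++17 node extraction for the remove-and-reinsert step; the example types and values are arbitrary:

    #include <map>
    #include <set>
    #include <string>
    #include <utility>

    int main() {
        std::set<std::pair<int, std::string>> s{{1, "one"}};
        // s.begin()->second = "uno"; // won't compile: set elements are const

        // To "modify" an element you must take it out and put it back
        // (node extraction avoids copying):
        auto node = s.extract(s.begin());
        node.value().second = "uno";
        s.insert(std::move(node));

        std::map<int, std::string> m{{1, "one"}};
        m[1] = "uno"; // fine: the mapped value is mutable in place
    }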

Multimap with HashMultiset for values

I'm afraid this doesn't seem to be possible. You should file a feature request. I have a sneaking suspicion those crafty Google folks have a nifty kind of Multimap that they could potentially release, and it might help you.

Why is the complexity of the C++ STL map container O(log(n))?

The elements of a map or set are contained in a tree structure; every time you examine a node of the tree, you determine whether the element you're trying to find/insert is less than or greater than that node. The number of times you need to do this (for a properly balanced tree) is log2(N), because each comparison throws out half of the remaining possibilities.
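One way to make the log2(N) bound visible is to instrument the comparator; the CountingLess wrapper below is just for illustration:

    #include <iostream>
    #include <map>

    // A comparator that counts how often it is invoked.
    struct CountingLess {
        static inline long comparisons = 0;
        bool operator()(int a, int b) const { ++comparisons; return a < b; }
    };

    int main() {
        std::map<int, int, CountingLess> m;
        for (int i = 0; i < 1'000'000; ++i) m[i] = i;

        CountingLess::comparisons = 0;
        m.find(123456);
        // For a balanced tree with 10^6 nodes, expect on the order of
        // log2(10^6) ~ 20 comparisons.
        std::cout << CountingLess::comparisons << " comparisons\n";
    }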

Time complexity for hash tables when inserting and searching

Hash tables are used for more than just strings. The O(1) complexities for insertion and lookup are for hash tables in general, and they count only the operations the table itself performs.

Hashing and comparison are each counted as O(1), because something must always be done for them, even if you're just storing integers, but the analysis doesn't know what that something costs.

If you use a hash table with a data type (like strings) whose hashing and comparison are more expensive, that extra cost multiplies the overall complexity.

It is actually very important to keep this in mind when measuring the complexity of a concrete algorithm that uses hash tables. Many of the string-based algorithms on this site, for example, are given complexities based on the assumption that the lengths of the input strings are bounded by some constant. Thankfully, that is usually the case.
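A minimal sketch of how the key length sneaks into the bound (the key length 1000 is arbitrary):

    #include <iostream>
    #include <string>
    #include <unordered_set>

    int main() {
        std::unordered_set<std::string> seen;
        std::string key(1000, 'x'); // a key of length L = 1000

        // Both operations are "O(1)" in the number of stored elements, but
        // each must hash the whole key (and compare it on a match), so in
        // terms of the key length they are really O(L):
        seen.insert(key);
        std::cout << seen.count(key) << '\n'; // prints 1
    }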


