Is List::Size() Really O(N)?

Is list::size() really O(n)?

Pre-C++11 answer

You are correct that the standard does not state what the complexity of list::size() must be - however, it does recommend that it "should have constant complexity" (Note A in Table 65).

Here's an interesting article by Howard Hinnant that explains why some people think list::size() should be allowed to have O(N) complexity (basically because they believe that an O(1) list::size() forces list::splice() to have O(N) complexity) and why an O(1) list::size() is a good idea (in the author's opinion):

  • http://howardhinnant.github.io/On_list_size.html

I think the main points in the paper are:

  • there are only a few situations where maintaining an internal count (so that list::size() can be O(1)) forces a splice operation to become linear
  • there are probably many more situations where someone is unaware of the negative effects of calling an O(N) size() (such as his example where list::size() is called while holding a lock)
  • instead of permitting size() to be O(N), in the interest of 'least surprise', the standard should require any container that implements size() to implement it in an O(1) fashion. If a container cannot do this, it should not implement size() at all. In that case, the user of the container is made aware that size() is unavailable, and if they still want or need the number of elements they can use std::distance(container.begin(), container.end()) to get it - but then they are completely aware that it's an O(N) operation (a short sketch follows this list).

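To make that last point concrete, here is a minimal sketch of counting the elements explicitly with std::distance; the O(N) walk is visible right at the call site:

#include <iterator>
#include <list>
#include <iostream>

int main() {
    std::list<int> values{1, 2, 3, 4, 5};

    // Without a size() member, the caller has to count the elements
    // explicitly, and the O(N) traversal is obvious at the call site.
    auto count = std::distance(values.begin(), values.end());
    std::cout << count << '\n';   // prints 5
}
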
I think I tend to agree with most of his reasoning. However, I do not like his proposed addition to the splice() overloads. Having to pass in an n that must be equal to distance(first, last) to get correct behavior seems like a recipe for hard-to-diagnose bugs.

I'm not sure what should or could be done moving forward, as any change would have a significant impact on existing code. But as it stands, I think that existing code is already impacted - behavior can differ rather significantly from one implementation to another for something that should have been well-defined. Maybe onebyone's comment about having the size 'cached' and marked known/unknown would work well - you get amortized O(1) behavior, and the only time you get O(N) behavior is when the list has been modified by certain splice() operations. The nice thing about this is that implementors can do it today without a change to the standard (unless I'm missing something).
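
Here is a rough sketch of that caching idea, just to show the bookkeeping; the names and layout are mine, not taken from any actual library implementation:

#include <cstddef>

// Sketch: a singly linked node plus a list wrapper that caches its size.
struct Node { int value; Node* next; };

class CachedSizeList {
    Node*       head_ = nullptr;
    std::size_t size_ = 0;
    bool        size_known_ = true;

public:
    // An operation such as a range splice from another list would call
    // this instead of paying to count the nodes it moved.
    void mark_size_unknown() { size_known_ = false; }

    std::size_t size() {
        if (!size_known_) {                // only true after an invalidating operation
            size_ = 0;
            for (Node* p = head_; p != nullptr; p = p->next) ++size_;
            size_known_ = true;            // O(1) again until the next invalidation
        }
        return size_;
    }
};

Every other mutation (push, pop, erase, whole-list splice) can keep the cached count up to date cheaply, so only the splice cases that would otherwise force counting ever mark it unknown.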

As far as I know, C++0x is not changing anything in this area.

Is !list.isEmpty() and list.size() > 0 equal?

If in doubt, read the Javadoc:

Collection.isEmpty():

Returns true if this collection contains no elements.

Collection.size():

Returns the number of elements in this collection

So, assuming the collection is implemented correctly:

collection.isEmpty() <=> collection.size() == 0

Or, negating both sides:

!collection.isEmpty() <=> collection.size() != 0

Since the number of elements can never be negative, this means that:

!collection.isEmpty() <=> collection.size() > 0

So yes, the two forms are equivalent.

Caveat: actually, they're only equivalent if your collection isn't being modified from another thread at the same time.

This:

!substanceList.isEmpty() && (substanceList.size() > 0)

is equivalent to, by the logic I present above:

!substanceList.isEmpty() && !substanceList.isEmpty()

You can only simplify this to

!substanceList.isEmpty()

if you can guarantee that its value doesn't change in between evaluations of substanceList.isEmpty().

Practically, it is unlikely that you need to care about the difference between these cases, at least at this point in the code. You might need to care about the list being changed in another thread, however, if it can become empty before (or while) executing createAmountText. But that's not something that was introduced by this refactoring.

TL;DR: using if (!substanceList.isEmpty()) { does practically the same thing, and is clearer to read.

Why is the running time complexity of (List Object) list.size() O(1)?

You're specifically asking why List.of(1, 2, 3, 4, 5).size() is O(1).

While other answers have correctly said that implementations such as ArrayList and LinkedList explicitly store the list's size as a field, that doesn't actually answer your question, because List.of(E... elements) doesn't use those implementations.

List.of(E... elements) is varargs, which means that internally it is actually List.of(E[] elements) and the compiler builds a fixed array for you. In your case, that array is exactly 5 in size.

Although the implementation is quite different (it builds an immutable list), the E[] is turned into a list similarly to how Arrays.asList(T... a) does it, i.e. by wrapping the array in a List implementation that uses the array directly.

As such, the list's size is very well known, because the size is simply the length of the backing array.
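
The same idea can be sketched in C++ terms (an analogue only, not the Java implementation): a container backed by a fixed-size array answers size() from the stored length without visiting any elements.

#include <array>
#include <iostream>

int main() {
    // The length is part of the array's type, so size() simply returns
    // that constant; no counting of elements is involved.
    std::array<int, 5> a{1, 2, 3, 4, 5};
    std::cout << a.size() << '\n';   // prints 5, in O(1)
}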

Why isn't std::list.size() constant-time?

There is a conflict between constant time size() and constant time list.splice. The committee chose to favour splice.

When you splice nodes between two lists, you would have to count the nodes moved to update the sizes of the two lists. That takes away a lot of the advantage of splicing nodes by just changing a few internal pointers.


As noted in the comments, C++11 has changed this by giving up O(1) for some rare(?) uses of splice:

void splice(const_iterator position, list& x, const_iterator first, const_iterator last);
void splice(const_iterator position, list&& x, const_iterator first, const_iterator last);

Complexity: Constant time if &x == this; otherwise, linear time.
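
A small usage example of the rule quoted above (standard std::list calls only); only the range splice from a different list is allowed to be linear, because the moved nodes have to be counted to keep both sizes correct:

#include <iostream>
#include <iterator>
#include <list>

int main() {
    std::list<int> a{1, 2, 3};
    std::list<int> b{4, 5, 6, 7};

    // Whole-list splice: both new sizes are known without counting - O(1).
    a.splice(a.end(), b);

    // Range splice within the same list: the size does not change - O(1).
    a.splice(a.begin(), a, std::next(a.begin()), a.end());

    // Range splice from a different list: the implementation must count
    // the moved nodes to update both sizes, so this may be linear.
    std::list<int> c{8, 9};
    a.splice(a.end(), c, c.begin(), c.end());

    std::cout << a.size() << ' ' << b.size() << ' ' << c.size() << '\n';   // 9 0 0
}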

Is checking the size of a list linear?

I've seen many answers online saying that checking the size of a list is constant time.

This fallacy may originate in a fundamental flaw in the Python language: arrays are called lists in Python. As Python has gained popularity, the word 'list' has become ambiguous.

Computing the length of a linked list is an O(n) operation, unless the length has been stored separately and maintained properly.

Retrieving the size of an array is performed in constant time if the size is stored along with the array, as is the case in Python, so a=[1,2,3]; len(a) is indeed very fast.

Computing the length of an array may be an O(n) operation if the array must be scanned for a terminating value, such as a null pointer or a null byte. Thus strlen() in C, which computes the number of bytes in a C string (a null-terminated array of char), operates in linear time.
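
The contrast is easy to see in C++ (a small illustration): strlen() has to scan for the terminator, while std::string and std::vector keep their length with the data.

#include <cstring>
#include <iostream>
#include <string>
#include <vector>

int main() {
    const char* cstr = "hello";      // null-terminated: the length must be found by scanning
    std::string s = "hello";         // stores its length alongside the characters
    std::vector<int> v{1, 2, 3};     // knows its size without looking at the elements

    std::cout << std::strlen(cstr) << '\n';   // O(n): walks until the '\0' byte
    std::cout << s.size() << '\n';            // O(1): returns the stored length
    std::cout << v.size() << '\n';            // O(1)
}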

Should std::list::size have constant complexity in C++11?

This is not exactly a bug and you can read about it here:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49561

It's more of a case of compatibility with older versions of gcc.
Looks like they really don't want to add an additional "data member".

Quoting:

This patch made c++98 and c++11 code incompatible and is causing
serious problems for distros.

Here, "this patch" refers to the fix they implemented for gcc 4.7 (in which size() was O(1)).

Another quote:

maintaining ABI compatibility has been decided to be more important
for the current releases

Why is list.size() 0 slower than list.isEmpty() in Java?

Your testing code is flawed.

Just reverse the order, i.e. call isEmpty() first and size() > 0 second, and you'll get the opposite result. This is due to class loading, caching, etc.

Why doesn't the Scala List have a size field?

One cannot say the size field was dropped: lists without a size field have existed for 50 years, since LISP, where they are ubiquitous, and they are very common in ML and Haskell too, both influential on Scala.

The basic reason is that List is a recursive structure. A non-empty List is Cons(head: A, tail: List[A]), except that Cons is in fact called :: to allow a convenient infix notation. You can access the tail (the list without its head element), and that is a list too. And this is done just about all the time. So having the count in the list would not mean adding just one integer, but as many integers as there are elements, since every tail is itself a list. This is feasible, but certainly not free.
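
To see why, here is a rough C++-style sketch of a cons cell (an illustration only, not Scala's actual implementation): every tail is itself a complete list that other code may be holding onto, so a cached size would have to live in every cell.

#include <cstddef>

struct Cons {
    int value;
    const Cons* tail;        // the rest of the list; nullptr for the empty list
    // std::size_t size;     // caching a count here would mean one extra integer per cell
};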

If you compare with Java's LinkedList, LinkedList has a recursive implementation (based on Node, which is more or less like Cons, but with links in both directions). But a LinkedList is not a Node; it owns its nodes (and keeps their count). So while it has a recursive implementation, you cannot treat it recursively. If you wanted the tail of a LinkedList as a LinkedList, you would have to either remove the head and have your list changed, or else copy all the tail elements to a new LinkedList. So Scala's List and Java's LinkedList are very different structures.


