Cost of len() function
It's O(1) (constant time, not depending on the actual length of the collection, so very fast) on every type you've mentioned, plus set and others such as array.array.
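A quick illustrative check (a sketch; array.array is from the standard library) — len() reads a stored size field on each of these types rather than walking the container:

```python
import array

# len() works the same way on all of these built-in container types:
# it returns a stored size field, it does not traverse the elements.
objects = [
    [1, 2, 3],                    # list
    (1, 2, 3),                    # tuple
    {"a": 1, "b": 2},             # dict
    {1, 2, 3},                    # set
    "abc",                        # str
    array.array("i", [1, 2, 3]),  # array.array
]
for obj in objects:
    print(type(obj).__name__, len(obj))
```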
What is the secret behind Python's len() builtin time complexity of O(1)
I guess you are missing one concept: how a data structure can return its size in constant time, i.e. O(1). Roughly, think of a program like this:
void init() {
    // code to initialize (or allocate memory for) the container
    size = 0;
}

void add(Something something) {
    container.add(something);
    size++;
}

void remove(Something something) {
    // code to search for 'something'
    if (found) {
        container.remove(something);
        size--;
    }
}

int len() {
    return size;
}
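The same bookkeeping can be sketched in Python with a container that maintains its own counter (a hypothetical CountedContainer, not a real library class):

```python
class CountedContainer:
    """Toy container that updates a size counter on every mutation."""

    def __init__(self):
        self._items = []
        self._size = 0  # maintained incrementally, like 'size' above

    def add(self, something):
        self._items.append(something)
        self._size += 1

    def remove(self, something):
        if something in self._items:  # code to search 'something'
            self._items.remove(something)
            self._size -= 1

    def __len__(self):
        return self._size  # O(1): no traversal needed

c = CountedContainer()
c.add("a")
c.add("b")
c.remove("a")
print(len(c))  # → 1
```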
Now any time you call len(), it is ready to return the integral value without any need to traverse the container.
The reason strlen or plain C data structures don't work that way is the space overhead of maintaining a counter like size. But that doesn't mean you can't define one.
Hint: use a struct and maintain the size there.
Time Complexity of Python dictionary len() method
Inspecting the C source of dictobject.c shows that the structure maintains explicit bookkeeping fields (the item count is stored in ma_used on the dict object itself, and the keys object tracks dk_nentries), so len() never has to traverse the entries. The keys object has this layout:
+---------------+
| dk_refcnt |
| dk_size |
| dk_lookup |
| dk_usable |
| dk_nentries |
+---------------+
...
Thus len() on a dict runs in O(1) time.
What is the big-o notation for the `len()` function in Python?
A Python list knows its own length; len takes O(1) time. Python lists are actually arrays, not linked lists as in Lisp, where length takes linear time.
Understanding python's len() time complexity
len(obj) simply calls obj.__len__():
>>> [1, 2, 3, 4].__len__()
4
It is therefore not correct to say that len() is always O(1): calling len() on most objects (e.g. lists) is O(1), but an arbitrary object might implement __len__ in an arbitrarily inefficient way.
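To see that len() just defers to __len__, here is a deliberately inefficient (hypothetical) example: a toy linked list whose __len__ walks every node, making len() on it O(n):

```python
class LinkedList:
    """Toy singly linked list; __len__ traverses every node, so it is O(n)."""

    def __init__(self, iterable=()):
        self.head = None
        for item in reversed(list(iterable)):
            # each node is a (value, next) pair
            self.head = (item, self.head)

    def __len__(self):
        n, node = 0, self.head
        while node is not None:  # walk the whole chain
            n += 1
            node = node[1]
        return n

ll = LinkedList([10, 20, 30])
print(len(ll))  # → 3, but computed by visiting all three nodes
```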
max(obj) is a different story, because it doesn't call a single magic __max__ method on obj; instead it iterates over it, calling __iter__ once and then __next__ repeatedly. It does this n times (and also does a comparison each time to track the largest item seen so far), so it must always be at least O(n). (It can be slower if __next__ or the comparison methods are slow, although that would be very unusual.)
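Roughly what max() does can be sketched with the iterator protocol (a simplified my_max, ignoring max()'s key and default arguments):

```python
def my_max(obj):
    """Simplified max(): one pass over the iterable via __iter__/__next__."""
    it = iter(obj)           # calls obj.__iter__()
    try:
        best = next(it)      # calls it.__next__() for the first item
    except StopIteration:
        raise ValueError("my_max() arg is an empty sequence")
    for item in it:          # keeps calling __next__ until exhausted
        if item > best:      # one comparison per element -> O(n) overall
            best = item
    return best

print(my_max([3, 1, 4, 1, 5]))  # → 5
```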
For either of these, we don't count the time it took to build the collection as part of the cost of the operation itself. You might build a list once and then call len() on it many times, and it's useful to know that the len() call by itself is very cheap even if building the list was expensive.
Go: multiple len() calls vs performance?
There are two cases:
- Local slice: the length is cached and there is no overhead
- Global or passed (by reference) slice: the length cannot be cached and there is overhead
No overhead for local slices
For locally defined slices the length is cached, so there is no runtime overhead. You can see this in the assembly of the following program:
func generateSlice(x int) []int {
    return make([]int, x)
}

func main() {
    x := generateSlice(10)
    println(len(x))
}
Compiled with go tool 6g -S test.go, this yields, amongst other things, the following lines:
MOVQ "".x+40(SP),BX
MOVQ BX,(SP)
// ...
CALL ,runtime.printint(SB)
What happens here is that the first line retrieves the length of x by getting the value located 40 bytes from the beginning of x and, most importantly, caches this value in BX, which is then used for every occurrence of len(x). The reason for the offset is that an array has the following structure (source):
typedef struct { // must not move anything
    uchar array[8]; // pointer to data
    uchar nel[4];   // number of elements
    uchar cap[4];   // allocated number of elements
} Array;
nel is what is accessed by len(). You can see this in the code generation as well.
Global and referenced slices have overhead
For shared values, caching the length is not possible, since the compiler has to assume that the slice changes between calls. It therefore has to emit code that accesses the length attribute directly every time. Example:
func accessLocal() int {
    a := make([]int, 1000) // local
    count := 0
    for i := 0; i < len(a); i++ {
        count += len(a)
    }
    return count
}

var ag = make([]int, 1000) // global

func accessGlobal() int {
    count := 0
    for i := 0; i < len(ag); i++ {
        count += len(ag)
    }
    return count
}
Comparing the assembly of both functions yields the crucial difference that as soon as the variable is global the access to the nel
attribute is not cached anymore and there will be a runtime overhead:
// accessLocal
MOVQ "".a+8048(SP),SI // cache length in SI
// ...
CMPQ SI,AX // i < len(a)
// ...
MOVQ SI,BX
ADDQ CX,BX
MOVQ BX,CX // count += len(a)
// accessGlobal
MOVQ "".ag+8(SB),BX
CMPQ BX,AX // i < len(ag)
// ...
MOVQ "".ag+8(SB),BX
ADDQ CX,BX
MOVQ BX,CX // count += len(ag)
Are len(string) and len(slice) O(1) operations in Go?
A string header contains a pointer to the backing array and a length. The len() function returns the length field from string and slice headers; it's an O(1) operation.
Regarding builtin.go, the file says:
Package builtin provides documentation for Go's predeclared identifiers.
The items documented here are not actually in package builtin
but their descriptions here allow godoc to present documentation
for the language's special identifiers.