Bjarne Stroustrup Says We Must Avoid Linked Lists


Advantages of vector vs. linked list

The main advantage of a vector over a linked list is memory locality.

Usually, each element of a linked list is allocated separately. As a consequence, those elements are probably not next to each other in memory.
(Gaps between the elements in memory.)

A vector is guaranteed to store all contained elements contiguously. (Items next to each other, no gaps;)

Note: Oversimplifications may occur... ;)

Imo, the simplified key points about the superior performance of contiguous storage versus non-contiguous storage are

1. cache misses

Modern CPUs do not fetch single bytes from memory but slightly larger chunks. Thus, if your data object's size is less than the size of those chunks and storage is contiguous, you can get more than one element at a time, because multiple elements may fit in a single chunk.

Example:

A 64-byte block (the usual cache line size) fits sixteen 32-bit integers at a time. Therefore, a cache miss (data not in cache yet -> load from main memory required) occurs at the earliest after processing 16 elements from the moment the first one has been brought into the cache. If a linked list is used, the first element might well be the only one within its 64-byte block, and in theory a cache miss could occur for every single element of the list.

Concrete example:

#include <cstddef>
#include <cstdint>
#include <vector>

int main()
{
    std::vector<std::uint32_t> v;
    // somehow fill 64 values into v
    std::uint32_t s{};
    for (std::size_t i{0}; i < v.size(); ++i)
    {
        s += v[i];
    }
}

Imagine the contents of v not being cached yet.

What happens during the processing of the data in the for loop?

1) Check whether element v[0] is in cache. --> Nope.

2) Fetch 64 bytes starting at the address of v[0] from main memory into a cache line.

3) Load v[0] from cache and process it by adding its value to s.

4) Is element v[1] in cache? --> Yes, it was loaded by the previous fetch because it neighbours v[0].

5) Load v[1] from cache and process it by adding its value to s.

6) Is element v[2] in cache? --> Yes ...

7) Load v[2] from cache and process it by adding its value to s.

... etc ...

34) Is element v[16] in cache? --> Nope.

35) Fetch 64 bytes starting at the address of v[16] from main memory into a cache line.

36) Load v[16] from cache and process it by adding its value to s.

37) Is element v[17] in cache? --> Yes, it was loaded by the previous fetch because it neighbours v[16].

... etc ...

Fetching data from main memory into the cache takes much more time than loading data from the cache into processor registers and performing simple operations on it. Therefore, the fact that multiple values may reside in a single cache line can boost performance significantly.

Linked lists do not provide a contiguous-storage guarantee, so you cannot count on this performance boost. This is also the reason why random iteration (accessing elements in random order) performs worse than forward iteration (accessing elements in order) even for contiguous containers.

2. prefetching

The above effect is amplified by a CPU feature called "prefetcher".

If a chunk has been loaded from main memory, the prefetcher prepares the loading of the next chunk and may already put it into the cache, significantly reducing the penalty of loading data from that part of main memory.

This is, of course, effective if and only if you in fact need data from the next, prefetched chunk.
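To get a feel for both effects, here is a rough, unscientific sketch (my own example; the container size and the timing approach are arbitrary choices, not part of the discussion above) that sums the same values stored once contiguously in a std::vector and once node-by-node in a std::list. On typical hardware the contiguous traversal is noticeably faster, because cache lines and the prefetcher work in its favour:

#include <chrono>
#include <cstdint>
#include <iostream>
#include <list>
#include <numeric>
#include <vector>

int main()
{
    constexpr std::size_t n = 10'000'000;

    std::vector<std::uint32_t> vec(n, 1);
    std::list<std::uint32_t>   lst(vec.begin(), vec.end());

    // Time a simple sequential sum over any container.
    auto time_sum = [](const auto& container) {
        const auto start = std::chrono::steady_clock::now();
        const std::uint64_t sum =
            std::accumulate(container.begin(), container.end(), std::uint64_t{0});
        const auto stop = std::chrono::steady_clock::now();
        std::cout << "sum = " << sum << ", took "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
                  << " ms\n";
    };

    time_sum(vec); // contiguous: few cache misses, prefetcher helps
    time_sum(lst); // scattered nodes: typically much slower
}

The absolute numbers vary wildly between machines and compilers, but the ordering of the two timings rarely does.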

How do vectors usually work (internally)?

See: c++ Vector, what happens whenever it expands/reallocate on stack?

Why not use vectors and lists together?

Linked lists used to be more important because their non-contiguous storage was friendlier to memory management. Linked lists and vectors / arrays both have a search time complexity of O(n). Accessing an array element is only faster if you know the index in advance. Linked lists are for niche cases where you frequently insert elements at the beginning of the sequence: a linked list lets you do this in O(1) time, as opposed to an array's O(n), because in an array all the other elements need to be shifted (see the sketch below).
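As a small illustration (my own sketch, not from the original answer), inserting at the front makes that asymptotic difference visible:

#include <list>
#include <vector>

int main()
{
    std::vector<int> v;
    std::list<int>   l;

    for (int i = 0; i < 1000; ++i)
    {
        v.insert(v.begin(), i); // O(n): every existing element is shifted
        l.push_front(i);        // O(1): only a new node is linked in
    }
}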

Linked List vs Vector

Vector is another name for a dynamic array; it is the name used for the dynamic array data structure in C++. If you have experience in Java, you may know it as ArrayList. (Java also has an old collection class called Vector that is not used nowadays because of problems in how it was designed.)

Vectors are good for random read access and insertion and deletion in the back (takes amortized constant time), but bad for insertions and deletions in the front or any other position (linear time, as items have to be moved). Vectors are usually laid out contiguously in memory, so traversing one is efficient because the CPU memory cache gets used effectively.

Linked lists on the other hand are good for inserting and deleting items in the front or back (constant time), but not particularly good for much else: For example deleting an item at an arbitrary index in the middle of the list takes linear time because you must first find the node. On the other hand, once you have found a particular node you can delete it or insert a new item after it in constant time, something you cannot do with a vector. Linked lists are also very simple to implement, which makes them a popular data structure.
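Here is a minimal sketch (my own example) of that last point: finding the position in a std::list is linear, but once you hold an iterator, inserting and erasing next to it are constant time and leave the other elements untouched:

#include <algorithm>
#include <iostream>
#include <list>

int main()
{
    std::list<int> l{1, 2, 4, 5};

    // Finding the position is linear...
    auto it = std::find(l.begin(), l.end(), 4);

    // ...but once the iterator is known, insertion and erasure next to it
    // are O(1) and do not move or invalidate the other elements.
    l.insert(it, 3); // list is now 1 2 3 4 5
    l.erase(it);     // removes the 4 -> 1 2 3 5

    for (int x : l)
        std::cout << x << ' ';
    std::cout << '\n';
}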

How to implement Insertion Sorting on a Linked List String Alphabetically

There are some apparent issues:

if (this.lastName.CompareTo(another.LastName) < 0)
return -1;
else
if (this.lastName.CompareTo(another.LastName) == 0)
return this.firstName.CompareTo(another.FirstName);

Why are you starting by comparing last names if you want to sort primarily by first name?

sorted.val >= newnode.val

Why are you sorting by a value if you wanted to sort by name? Just call your comparison function if you want to compare nodes by the first/last name.
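For illustration only, here is a minimal comparison sketch in C++ (the question's code is C#, and the Person struct and compare function here are hypothetical names): compare the first names first and fall back to the last names only on a tie.

#include <string>

struct Person
{
    std::string firstName;
    std::string lastName;
};

// Sort primarily by first name, break ties with the last name.
int compare(const Person& a, const Person& b)
{
    const int byFirst = a.firstName.compare(b.firstName);
    if (byFirst != 0)
        return byFirst;
    return a.lastName.compare(b.lastName);
}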

The rest of the code looks OK for a learning exercise, as far as I can see. If you have issues, I would recommend that you:

  1. Write unit tests! It becomes much easier to find bugs when you can run several sets of test data through your algorithm, designed to hit various edge cases. This is especially true for something like sorting, where it is trivial to verify the result.
  2. Learn how to use the debugger. The behavior of the program becomes much easier to understand when you can stop at various points and verify that the variables match your expectations.

See How to debug small programs for more details.

Writing code like this can be very useful as a learning exercise, but please do not use code like this for anything serious. There are perfectly fine sorting functions built into the framework that will be both faster and easier to understand. Also note that linked lists are rarely used in real life; I do not think I have used one even once outside of school. See also we must avoid linked lists.

How to improve linked list searching. C++

Searching a linked list is linear: you need to iterate from the beginning, one element at a time, so it is O(n). Linked lists are not the best choice if you will mainly use them for searching; more suitable data structures such as binary search trees exist.

Ordering the elements does not help much, because you still need to iterate through the elements one by one anyway (a linked list has no random access, so you cannot binary-search it efficiently).

The Wikipedia article says:

In an unordered list, one simple heuristic for decreasing average search time is the move-to-front heuristic, which simply moves an element to the beginning of the list once it is found. This scheme, handy for creating simple caches, ensures that the most recently used items are also the quickest to find again.

Another common approach is to "index" a linked list using a more efficient external data structure. For example, one can build a red-black tree or hash table whose elements are references to the linked list nodes. Multiple such indexes can be built on a single list. The disadvantage is that these indexes may need to be updated each time a node is added or removed (or at least, before that index is used again).

So in the first case, you can slightly improve your search performance (under statistical assumptions) by moving previously found items closer to the beginning of the list. This assumes that previously found elements will be searched for more frequently.
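A minimal sketch of the move-to-front heuristic on a std::list (my own example; splice relinks the found node in O(1) without copying anything):

#include <algorithm>
#include <list>

// Move-to-front heuristic: after a successful search, splice the found node
// to the head of the list so that frequently requested values are found
// quickly on later lookups.
template <typename T>
bool find_move_to_front(std::list<T>& l, const T& value)
{
    auto it = std::find(l.begin(), l.end(), value);
    if (it == l.end())
        return false;
    l.splice(l.begin(), l, it); // O(1): relinks the node, no copy
    return true;
}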

The second method requires the use of other data structures.
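As a sketch of that second approach (my own, hypothetical example), one can keep a std::unordered_map of iterators into the list, so a lookup by key is O(1) on average instead of a linear scan; the index has to be kept in sync whenever the list changes:

#include <list>
#include <string>
#include <unordered_map>

int main()
{
    std::list<std::string> items{"alpha", "beta", "gamma"};

    // The "index": maps each value to the list node holding it.
    std::unordered_map<std::string, std::list<std::string>::iterator> index;
    for (auto it = items.begin(); it != items.end(); ++it)
        index[*it] = it;

    auto hit = index.find("beta");
    if (hit != index.end())
    {
        items.erase(hit->second); // O(1) erase via the stored iterator
        index.erase(hit);         // keep the index in sync with the list
    }
}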

If using linked lists is not a hard requirement, consider using hash tables, sorted arrays (random access) or balanced trees.

Move Semantics C++11 (Bjarne Stroustrup book, pg75)

First and foremost, there is no move construction here. NRVO is occurring (as per the comments on your question), but if you want to observe move construction, you should construct from a temporary. Try re-implementing your operator as follows:

// Note that the first parameter is taken by value.
Vector operator+(Vector a, const Vector &b)
{
    cout << "Intermediate result for sum created or moved from a" << endl;

    // NOTE: Could be implemented as operator+= and written a += b.
    if (a.size() != b.size()) {
        throw length_error{"Length of vectors must match for summing"};
    }

    for (long unsigned int i = 0; i < a.size(); i++) {
        a[i] += b[i];
    }
    // END NOTE

    return a;
}

Note that your operator now requires a local "copy" as its first parameter. On the first sum, an lvalue (the object has a name) is provided as the argument, so the copy constructor is called and creates an object with a new buffer (what you are doing with your result local variable). The result of that sum is a temporary, so on the second call to your operator+, a Vector&& binds to the first argument and the move constructor is called, effectively stealing the buffer of the temporary.

If you prevent copy elision as per the comments, you should be able to see some move constructors being called.
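If you want to watch this happen, here is a self-contained sketch with a minimal stand-in Vector (my own simplified class, not the book's code) whose copy and move constructors print when they run. With the by-value operator+, the first operand of the first addition is copied and everything after that is moved:

#include <cstddef>
#include <iostream>

// Minimal stand-in for the book's Vector, just enough to observe which
// constructor runs. Assignment operators are omitted for brevity.
class Vector
{
public:
    explicit Vector(std::size_t n) : sz{n}, elem{new double[n]{}} {}

    Vector(const Vector& v) : sz{v.sz}, elem{new double[v.sz]}
    {
        std::cout << "copy constructor\n";
        for (std::size_t i = 0; i < sz; ++i) elem[i] = v.elem[i];
    }

    Vector(Vector&& v) noexcept : sz{v.sz}, elem{v.elem}
    {
        std::cout << "move constructor\n";
        v.sz = 0;
        v.elem = nullptr;
    }

    ~Vector() { delete[] elem; }

    std::size_t size() const { return sz; }
    double& operator[](std::size_t i) { return elem[i]; }
    double  operator[](std::size_t i) const { return elem[i]; }

private:
    std::size_t sz;
    double* elem;
};

// By-value first parameter: copy-constructed from lvalues,
// move-constructed from temporaries.
Vector operator+(Vector a, const Vector& b)
{
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i)
        a[i] += b[i];
    return a; // returning the parameter moves it into the result
}

int main()
{
    Vector x(3), y(3), z(3);
    Vector r = x + y + z; // prints one copy (for x), then only moves
    (void)r;
}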

EDIT: A more detailed explanation of why you won't have a move in your case. The short version is that C++ does not do magic :)

The aim of a move in the case of chained additions is to avoid repeatedly creating the buffers of your Vectors due to the creation of temporaries. Instead of losing the buffers when the temporaries are destroyed, the aim is to reuse the buffer of a temporary to store the result.

In your case, you're not even trying to copy the buffer of the Vector. You're explicitly creating a new buffer on this line:

Vector result(a.size());

You're saying "Please create a vector of the right size for the result", and doing it once for every addition. What you would rather say is "Please steal the buffer of a", if you know that a will die soon. That is written:

Vector result(std::move(a));

But for this to work, a must refer to an object that will die soon, which means a must be of type Vector&&. Note that std::move is still required, because a named rvalue reference is itself an lvalue; without it the copy constructor would be chosen. When this is done, result already contains the values of a, so all you have to do is add the values of the elements of b.

for (long unsigned int i = 0; i < result.size(); i++) {
    result[i] += b[i];
}

(Expressing + in terms of += is idiomatic.)

So to observe move construction, you need a definition of your operator+ whose signature is:

Vector operator+(Vector&& a, const Vector &b)
{
    if (a.size() != b.size()) {
        throw length_error{"Length of vectors must match for summing"};
    }

    // std::move is needed here: the named rvalue reference a is itself an
    // lvalue, so without it the copy constructor would be chosen.
    Vector result(std::move(a));
    cout << "Intermediate result for sum moved from a" << endl;
    for (long unsigned int i = 0; i < result.size(); i++) {
        result[i] += b[i];
    }
    return result;
}

But this won't work with the first addition, since it is not adding a temporary to another named object, but two named objects. You should have an overload of your operator+ that copes with lvalue references:

Vector operator+(const Vector& a, const Vector &b)
{
    if (a.size() != b.size()) {
        throw length_error{"Length of vectors must match for summing"};
    }

    Vector result(a); // In this case, this is a copy
    cout << "Intermediate result for sum created" << endl;
    for (long unsigned int i = 0; i < result.size(); i++) {
        result[i] += b[i];
    }
    return result;
}

Notice that, apart from how result is initialized, both versions share the same text. This brings us to another idiomatic approach: let the compiler decide whether to move or to copy by taking the left-hand side of operator+ by value, so that only one overload is required. (This is the version at the beginning of my answer.)


