Strategy: Stop Using Linked-Lists
What data structure is more sacred than the link list? If we get rid of it what silly interview questions would we use instead? But not using linked-lists is exactly what Aater Suleman recommends in Should you ever use Linked-Lists?
In The Secret To 10 Million Concurrent Connections one of the important strategies is not scribbling data all over memory via pointers because following pointers increases cache misses which reduces performance. And there’s nothing more iconic of pointers than the link list.
Here are Aeter's reasons to be anti-linked-list:
- They reduce the benefit of out-of-order execution.
- They throw off hardware prefetching.
- They reduce DRAM and TLB locality.
- They cannot leverage SIMD.
- They are harder to send to GPUs.
He also demolishes the pros of linked-lists, finding arrays a better option in almost every case. Good discussion in the comment section as not everyone agrees.
Patrick Wyatt details how a linked-list threading bug repeatedly crashed Starcraft. Also a good comment discussion with a lot of disagreement.
KjellKod is also against linked-lists: Number crunching: Why you should never, ever, EVER use linked-list in your code again. The idea is mathematical complexity does not reflect real-life performance in situ on actual hardware:
Big-O notations tells nothing about how one (data structure with algorithm) fare against another. Big-O will only tell you how performance will degrade as n increases. So comparing one a data structure that is RAM intensive to another data structure that is cache friendly from an abstract Big-O point-of-view is just pointless.
The key as we've seen before is locality of reference. When memory is local and you can avoid cache misses and avoid paging then operations like copying are dirt cheap, when in the mind of the programmer they are expensive. When you don't have locality of reference the thing that programmers think is dirt cheap, following pointers, is extremely expensive. While the first part of the article is frustratingly vague, the last half is full of excellent explanatory detail. Well worth reading. And there are even more good comments in the comment section.
But people really like their link lists. Probably as much as people used to love goto statements. So if performance really matters to you then this is something to think about rather than immediately dismiss as heresy.
Related Articles
- Discuss on Hacker News and on Reddit
- On C Linked Lists (Profiling and Optimizing)
- High performance alternative to bounded queues for exchanging data between concurrent threads - linked lists make poor queues also. Ring buffers backed by arrays are much better (from jhawk28)
Reader Comments (10)
The argument is somewhat correct when applied to classical, naive linked lists. However, few of the linked lists I see in real software are designed this way and when they are it is because it offers a real advantage (e.g. when memory cannot be moved or the copy would be more expensive on average). Most are cache-aware implementations composed from or within array-like structures; the specifics depend on the use case and memory management. Compared to these, using a simple array is also a naive solution that requires e.g. an excessive number of expensive memory copies for operations like insertion.
There are a few cases where a simple linked list really is the best data structure given the characteristics of the use case. As with many things, it comes down to using the right tool for the job. Linked lists have relatively rare use cases compared to arrays but they are still a useful and necessary tool for some kinds of performance-sensitive code so discounting them completely would be incorrect.
I certainly don't care what you think. Everybody can decide for themselves. I don't use the STL but if I did, I might use compound arrays.
Why the fuck does every 20-year-old programmer consider himself a fucken guru?
You have typos in this article.
"Here’s Aeter's reasons to be anti-linked-list" -> "Here are Aeter's reasons to be anti-linked-list"
"They are harder to send of GPUs" -> "They are harder to send to GPUs"
"He also demolishes the pros of for linked-lists" -> "He also demolishes the pros of linked-lists"
"The idea is mathematical complexity do not reflect ..." -> " The idea is mathematical complexity does not reflect ..."
"They key as we've seen before ..." -> "The key as we've seen before ..."
"And there even more ..." -> "And there are even more ..."
Ironic. According to my Richie C book malloc manages memory via a linked list.
I will stop using linked lists when I decide, not when some pimple faced 'code ninja' tells me to.
Agreed that the cache hitting rate really matters. In the same line is a discussion on which basic data structure in C++ is better for an "obvious" example "Vector v.s. list" in Bjarne Stroustrup's talk http://www.youtube.com/watch?v=0iWb_qi2-uI (starting around 44:48).
OK, interesting, didn even thought about that.
But, all modern frameworks & APIs (and yes, IDEs) and laguages make it *fairly* easy to use linked-lists, because all functions are returing usually some type of linked list :-(
It's good advice that many people don't seem to have yet got and is worth repeating.
There are few situations where the performance of linked lists is better than a simple vector. Even the ones that it's supposed to be good for like inserting at the start a vector will often outperform a list.
The main time I use them is when I have to splice together multiple lists quickly. It still seems to have some advantage there,.
It's funny that there's so many people butthurt over the suggestion that Linked Lists aren't performant. Probably because none of them have ever designed a system at scale.
Age and pimple (or pimple free) has nothing to do with how good the idea is, and in this case, the author is correct.
LInked lists have a ton of pointers, and a ton of them are bad for the branching lookahead units. ArrayLists are always in contiguous memory, and pay an amortized growth penalty.
It's not hard to code up an example that thrashes both array types from multiple threads on multiple cores. Spend the 5 minutes and look for yourself.
Linked lists only really make sense when the node pointers are embedded in a larger contiguous data structure e.g. a large block of memory, a queue message etc.; unfortunately the Java Collections framework does not support factories for container elements or nodes, this immediately introduces a level of indirection, a Linked List just compounds this cost with a new node object containing three references. A map can be nearly as inefficient, especially a LinkedHashMap; however the trade-off makes sense there.