« Paper: Calvin: Fast Distributed Transactions for Partitioned Database Systems | Main | The Tumblr Architecture Yahoo Bought for a Cool Billion Dollars »
Wednesday
May222013

Strategy: Stop Using Linked-Lists

What data structure is more sacred than the link list? If we get rid of it what silly interview questions would we use instead? But not using linked-lists is exactly what Aater Suleman recommends in Should you ever use Linked-Lists?

In The Secret To 10 Million Concurrent Connections one of the important strategies is not scribbling data all over memory via pointers because following pointers increases cache misses which reduces performance. And there’s nothing more iconic of pointers than the link list.

Here are Aeter's reasons to be anti-linked-list:

  1. They reduce the benefit of out-of-order execution.
  2. They throw off hardware prefetching.
  3. They reduce DRAM and TLB locality.
  4. They cannot leverage SIMD.
  5. They are harder to send to GPUs.

He also demolishes the pros of linked-lists, finding arrays a better option in almost every case. Good discussion in the comment section as not everyone agrees.

Patrick Wyatt details how a linked-list threading bug repeatedly crashed Starcraft. Also a good comment discussion with a lot of disagreement.

KjellKod is also against linked-lists: Number crunching: Why you should never, ever, EVER use linked-list in your code again. The idea is mathematical complexity does not reflect real-life performance in situ on actual hardware:

Big-O notations tells nothing about how one (data structure with algorithm) fare against another. Big-O will only tell you how performance will degrade as n increases. So comparing one a data structure that is RAM intensive to another data structure that is cache friendly from an abstract Big-O point-of-view is just pointless.  

The key as we've seen before is locality of reference. When memory is local and you can avoid cache misses and avoid paging then operations like copying are dirt cheap, when in the mind of the programmer they are expensive. When you don't have locality of reference the thing that programmers think is dirt cheap, following pointers, is extremely expensive. While the first part of the article is frustratingly vague, the last half is full of excellent explanatory detail. Well worth reading. And there are even more good comments in the comment section.

But people really like their link lists. Probably as much as people used to love goto statements. So if performance really matters to you then this is something to think about rather than immediately dismiss as heresy.

Related Articles 

Reader Comments (10)

The argument is somewhat correct when applied to classical, naive linked lists. However, few of the linked lists I see in real software are designed this way and when they are it is because it offers a real advantage (e.g. when memory cannot be moved or the copy would be more expensive on average). Most are cache-aware implementations composed from or within array-like structures; the specifics depend on the use case and memory management. Compared to these, using a simple array is also a naive solution that requires e.g. an excessive number of expensive memory copies for operations like insertion.

There are a few cases where a simple linked list really is the best data structure given the characteristics of the use case. As with many things, it comes down to using the right tool for the job. Linked lists have relatively rare use cases compared to arrays but they are still a useful and necessary tool for some kinds of performance-sensitive code so discounting them completely would be incorrect.

May 22, 2013 | Unregistered CommenterJ. Andrew Rogers

I certainly don't care what you think. Everybody can decide for themselves. I don't use the STL but if I did, I might use compound arrays.

Why the fuck does every 20-year-old programmer consider himself a fucken guru?

May 22, 2013 | Unregistered CommenterTerry A. Davis

You have typos in this article.

"Here’s Aeter's reasons to be anti-linked-list" -> "Here are Aeter's reasons to be anti-linked-list"
"They are harder to send of GPUs" -> "They are harder to send to GPUs"
"He also demolishes the pros of for linked-lists" -> "He also demolishes the pros of linked-lists"
"The idea is mathematical complexity do not reflect ..." -> " The idea is mathematical complexity does not reflect ..."
"They key as we've seen before ..." -> "The key as we've seen before ..."
"And there even more ..." -> "And there are even more ..."

May 22, 2013 | Unregistered CommenterJon Forrest

Ironic. According to my Richie C book malloc manages memory via a linked list.

May 22, 2013 | Unregistered CommenterEdward Capriolo

I will stop using linked lists when I decide, not when some pimple faced 'code ninja' tells me to.

May 22, 2013 | Unregistered CommenterPatrick Vaginus

Agreed that the cache hitting rate really matters. In the same line is a discussion on which basic data structure in C++ is better for an "obvious" example "Vector v.s. list" in Bjarne Stroustrup's talk http://www.youtube.com/watch?v=0iWb_qi2-uI (starting around 44:48).

May 22, 2013 | Unregistered CommenterZhiqiang Ma

OK, interesting, didn even thought about that.
But, all modern frameworks & APIs (and yes, IDEs) and laguages make it *fairly* easy to use linked-lists, because all functions are returing usually some type of linked list :-(

May 23, 2013 | Unregistered CommenterVomitorium

It's good advice that many people don't seem to have yet got and is worth repeating.
There are few situations where the performance of linked lists is better than a simple vector. Even the ones that it's supposed to be good for like inserting at the start a vector will often outperform a list.

The main time I use them is when I have to splice together multiple lists quickly. It still seems to have some advantage there,.

May 23, 2013 | Unregistered CommenterJB

It's funny that there's so many people butthurt over the suggestion that Linked Lists aren't performant. Probably because none of them have ever designed a system at scale.

Age and pimple (or pimple free) has nothing to do with how good the idea is, and in this case, the author is correct.

LInked lists have a ton of pointers, and a ton of them are bad for the branching lookahead units. ArrayLists are always in contiguous memory, and pay an amortized growth penalty.

It's not hard to code up an example that thrashes both array types from multiple threads on multiple cores. Spend the 5 minutes and look for yourself.

May 23, 2013 | Unregistered CommenterDB

Linked lists only really make sense when the node pointers are embedded in a larger contiguous data structure e.g. a large block of memory, a queue message etc.; unfortunately the Java Collections framework does not support factories for container elements or nodes, this immediately introduces a level of indirection, a Linked List just compounds this cost with a new node object containing three references. A map can be nearly as inefficient, especially a LinkedHashMap; however the trade-off makes sense there.

May 30, 2013 | Unregistered CommenterInfernoz

PostPost a New Comment

Enter your information below to add a new comment.
Author Email (optional):
Author URL (optional):
Post:
 
Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>