# Queue data structure for Go [![GoDoc](https://godoc.org/github.com/phf/go-queue/queue?status.png)](http://godoc.org/github.com/phf/go-queue/queue) [![Go Report Card](https://goreportcard.com/badge/github.com/phf/go-queue)](https://goreportcard.com/report/github.com/phf/go-queue) A double-ended queue (aka "deque") built on top of a slice. All operations are (amortized) constant time. Benchmarks compare favorably to [container/list](https://golang.org/pkg/container/list/) as well as to Go's channels. Not safe for concurrent use. I tried to stick to the conventions established by [container/list](https://golang.org/pkg/container/list/) even though I disagree with them (see [`RANT.md`](https://github.com/phf/go-queue/blob/master/RANT.md) for details). In other words, this data structure is ready for the standard library (hah!). ## Background In 2013 I was hacking a breadth-first search in Go and needed a queue, but all I could find in the standard library was [container/list](https://golang.org/pkg/container/list/). Now in *principle* there's nothing wrong with [container/list](https://golang.org/pkg/container/list/), but I had just admonished my students to *always* think carefully about the number of memory allocations their programs make. In other words, it felt wrong for me to use a data structure that allocates memory for *every* single vertex we visit during a breadth-first search. After I got done with my project, I decided to clean up the queue code a little and to push it here to give everybody else what I really wanted to find in the standard library: A queue abstraction that doesn't allocate memory on every single insertion. ## Performance **Please read [`BENCH.md`](https://github.com/phf/go-queue/blob/master/BENCH.md) for some perspective. The numbers below are most likely "contaminated" in a way that makes our queues appear *worse* than they are.** Here are the numbers for my (ancient) home machine: ``` $ go test -bench=. -benchmem -count=10 >bench.txt $ benchstat bench.txt name time/op PushFrontQueue-2 97.7µs ± 1% PushFrontList-2 163µs ± 1% PushBackQueue-2 98.0µs ± 1% PushBackList-2 165µs ± 3% PushBackChannel-2 145µs ± 1% RandomQueue-2 172µs ± 1% RandomList-2 292µs ± 1% GrowShrinkQueue-2 121µs ± 1% GrowShrinkList-2 174µs ± 1% name alloc/op PushFrontQueue-2 40.9kB ± 0% PushFrontList-2 57.4kB ± 0% PushBackQueue-2 40.9kB ± 0% PushBackList-2 57.4kB ± 0% PushBackChannel-2 24.7kB ± 0% RandomQueue-2 45.7kB ± 0% RandomList-2 90.8kB ± 0% GrowShrinkQueue-2 57.2kB ± 0% GrowShrinkList-2 57.4kB ± 0% name allocs/op PushFrontQueue-2 1.03k ± 0% PushFrontList-2 2.05k ± 0% PushBackQueue-2 1.03k ± 0% PushBackList-2 2.05k ± 0% PushBackChannel-2 1.03k ± 0% RandomQueue-2 1.63k ± 0% RandomList-2 3.24k ± 0% GrowShrinkQueue-2 1.04k ± 0% GrowShrinkList-2 2.05k ± 0% $ go version go version go1.7.5 linux/amd64 $ cat /proc/cpuinfo | grep "model name" | uniq model name : AMD Athlon(tm) 64 X2 Dual Core Processor 6000+ ``` That's a [speedup](https://en.wikipedia.org/wiki/Speedup) of 1.45-1.70 over [container/list](https://golang.org/pkg/container/list/) and a speedup of 1.48 over Go's channels. We also consistently allocate less memory in fewer allocations than [container/list](https://golang.org/pkg/container/list/). (Note that the number of allocations seems off: since we grow by *doubling* we should only allocate memory *O(log n)* times.) The same benchmarks on one of our department's servers: ``` $ go test -bench=. -benchmem -count=10 >bench.txt $ benchstat bench.txt name time/op PushFrontQueue-8 88.8µs ± 3% PushFrontList-8 156µs ± 5% PushBackQueue-8 88.3µs ± 1% PushBackList-8 159µs ± 2% PushBackChannel-8 132µs ± 2% RandomQueue-8 156µs ± 7% RandomList-8 279µs ±10% GrowShrinkQueue-8 117µs ± 0% GrowShrinkList-8 164µs ± 4% name alloc/op PushFrontQueue-8 40.9kB ± 0% PushFrontList-8 57.4kB ± 0% PushBackQueue-8 40.9kB ± 0% PushBackList-8 57.4kB ± 0% PushBackChannel-8 24.7kB ± 0% RandomQueue-8 45.7kB ± 0% RandomList-8 90.8kB ± 0% GrowShrinkQueue-8 57.2kB ± 0% GrowShrinkList-8 57.4kB ± 0% name allocs/op PushFrontQueue-8 1.03k ± 0% PushFrontList-8 2.05k ± 0% PushBackQueue-8 1.03k ± 0% PushBackList-8 2.05k ± 0% PushBackChannel-8 1.03k ± 0% RandomQueue-8 1.63k ± 0% RandomList-8 3.24k ± 0% GrowShrinkQueue-8 1.04k ± 0% GrowShrinkList-8 2.05k ± 0% $ go version go version go1.7.5 linux/amd64 $ cat /proc/cpuinfo | grep "model name" |uniq model name : Intel(R) Xeon(R) CPU E5440 @ 2.83GHz ``` That's a [speedup](https://en.wikipedia.org/wiki/Speedup) of 1.76-1.80 over [container/list](https://golang.org/pkg/container/list/) and a speedup of 1.49 over Go's channels. The same benchmarks on a *different* department server: ``` $ go test -bench=. -benchmem -count=10 >bench.txt $ benchstat bench.txt name time/op PushFrontQueue-24 89.1µs ± 8% PushFrontList-24 176µs ± 8% PushBackQueue-24 86.8µs ± 5% PushBackList-24 178µs ± 6% PushBackChannel-24 151µs ±12% RandomQueue-24 180µs ±24% RandomList-24 334µs ± 7% GrowShrinkQueue-24 117µs ± 3% GrowShrinkList-24 187µs ± 6% name alloc/op PushFrontQueue-24 40.9kB ± 0% PushFrontList-24 57.4kB ± 0% PushBackQueue-24 40.9kB ± 0% PushBackList-24 57.4kB ± 0% PushBackChannel-24 24.7kB ± 0% RandomQueue-24 45.7kB ± 0% RandomList-24 90.8kB ± 0% GrowShrinkQueue-24 57.2kB ± 0% GrowShrinkList-24 57.4kB ± 0% name allocs/op PushFrontQueue-24 1.03k ± 0% PushFrontList-24 2.05k ± 0% PushBackQueue-24 1.03k ± 0% PushBackList-24 2.05k ± 0% PushBackChannel-24 1.03k ± 0% RandomQueue-24 1.63k ± 0% RandomList-24 3.24k ± 0% GrowShrinkQueue-24 1.04k ± 0% GrowShrinkList-24 2.05k ± 0% $ go version go version go1.7.4 linux/amd64 $ cat /proc/cpuinfo | grep "model name" |uniq model name : Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz ``` That's a [speedup](https://en.wikipedia.org/wiki/Speedup) of 1.86-**2.05** over [container/list](https://golang.org/pkg/container/list/) and a speedup of 1.74 over Go's channels. The same benchmarks on an old [Raspberry Pi Model B Rev 1](https://en.wikipedia.org/wiki/Raspberry_Pi): ``` $ benchstat bench.txt name time/op PushFrontQueue 788µs ±24% PushFrontList 2.74ms ±14% PushBackQueue 1.11ms ± 3% PushBackList 2.73ms ±14% PushBackChannel 1.25ms ± 3% RandomQueue 1.50ms ± 1% RandomList 4.92ms ± 6% GrowShrinkQueue 1.26ms ± 0% GrowShrinkList 2.88ms ± 2% name alloc/op PushFrontQueue 16.5kB ± 0% PushFrontList 33.9kB ± 0% PushBackQueue 16.5kB ± 0% PushBackList 33.9kB ± 0% PushBackChannel 8.45kB ± 0% RandomQueue 16.5kB ± 0% RandomList 53.4kB ± 0% GrowShrinkQueue 24.6kB ± 0% GrowShrinkList 33.9kB ± 0% name allocs/op PushFrontQueue 12.0 ± 0% PushFrontList 1.03k ± 0% PushBackQueue 12.0 ± 0% PushBackList 1.03k ± 0% PushBackChannel 1.00 ± 0% RandomQueue 12.0 ± 0% RandomList 1.63k ± 0% GrowShrinkQueue 20.0 ± 0% GrowShrinkList 1.03k ± 0% $ go version go version go1.3.3 linux/arm $ cat /proc/cpuinfo |grep "model name" model name : ARMv6-compatible processor rev 7 (v6l) ``` That's a [speedup](https://en.wikipedia.org/wiki/Speedup) of **2.46-3.48** over [container/list](https://golang.org/pkg/container/list/) but only a speedup of 1.13 over Go's channels. (Note that I had to manually repeat the benchmarks and then run `benchtest` elsewhere since those features/tools are not available for Go 1.3; however, the number of allocations seems to be correct here for the first time, maybe there's some breakage in the more recent benchmarking framework?) ### Go's channels as queues Go's channels *used* to beat our queue implementation by about 22% for `PushBack`. That seemed sensible considering that channels are built into the language and offer a lot less functionality: We have to size them correctly if we want to use them as a simple queue in an otherwise non-concurrent setting, they are not double-ended, and they don't support "peeking" at the next element without removing it. That all changed with [two](https://github.com/phf/go-queue/commit/5652cbe39198516d853918fe64a4e70948b42f1a) [commits](https://github.com/phf/go-queue/commit/aa6086b89f98eb5cfd8df918e57612271ae1c137) in which I replaced the "manual" loop when a queue has to grow with [copy](https://golang.org/ref/spec#Appending_and_copying_slices) and the `%` operations to wrap indices around the slice with equivalent `&` operations. (The code was originally written without these "hacks" because I wanted to show it to my "innocent" Java students.) Those two changes *really* paid off. (I used to call channels "*ridiculously* fast" before and recommended their use in situations where nothing but performance matters. Alas that may no longer be good advice. Either that, or I am just benchmarking incorrectly.) ## Kudos Hacking queue data structures in Go seems to be a popular way to spend an evening. Kudos to... - [Rodrigo Moraes](https://github.com/moraes) for posting [this gist](https://gist.github.com/moraes/2141121) which reminded me of Go's [copy](https://golang.org/ref/spec#Appending_and_copying_slices) builtin and a similar trick I had previously used in Java. - [Evan Huus](https://github.com/eapache) for sharing [his queue](https://github.com/eapache/queue) which reminded me of the old "replace % by &" trick I had used many times before. - [Dariusz Górecki](https://github.com/canni) for his [commit](https://github.com/eapache/queue/commit/334cc1b02398be651373851653017e6cbf588f9e) to [Evan](https://github.com/eapache)'s queue that simplified [Rodrigo](https://github.com/moraes)'s snippet and hence mine. If you find something in my code that helps you improve yours, feel free to run with it! ## Why use this queue over... - Your own? Well, I spent a reasonable amount of time on this one, making sure that it works well as a general-purpose queue data structure. But go ahead, you can probably do better. - https://github.com/eapache/queue is not double-ended, panics unlike the standard library, has a "strange" `get` that's not really a queue operation - https://gist.github.com/moraes/2141121 is not double-ended, doesn't shrink to free up memory, uses an "extra" `Node` type - https://github.com/ErikDubbelboer/ringqueue/ is not double-ended seems to shrink too early if I am reading the code correctly, uses `%` instead of `&` which (sadly) still seems to make a difference - https://github.com/Workiva/go-datastructures/tree/master/queue is full of semi-arcane concurrency stuff that a simple data structure doesn't need - https://github.com/oleiade/lane uses [container/list](https://golang.org/pkg/container/list/) and thus has the same performance problems, has a bunch of concurrency stuff that a simple data structure doesn't need