Fork from github.com/phf/go-queue with exposed internal list

Queue data structure for Go

A double-ended queue (aka "deque") built on top of a slice. All operations except pushes are constant-time; pushes are amortized constant-time. Benchmarks compare favorably to container/list as well as channels (see below).
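The core idea can be sketched as a slice used as a ring buffer that doubles in size when full. This is only an illustration of the technique, not the package's actual code; the type and method names here are made up for the example:

```go
package main

import "fmt"

// Deque sketches a double-ended queue on top of a slice used as a
// ring buffer. Pops and peeks are constant-time; pushes are amortized
// constant-time because the buffer grows by doubling.
type Deque struct {
	buf   []interface{}
	head  int // index of the front element
	count int
}

// grow doubles the buffer and unrolls the ring into the new slice
// with two copy calls (grow is only called when the buffer is full).
func (d *Deque) grow() {
	buf := make([]interface{}, 2*len(d.buf)+1)
	n := copy(buf, d.buf[d.head:])
	copy(buf[n:], d.buf[:d.head])
	d.buf, d.head = buf, 0
}

func (d *Deque) PushBack(v interface{}) {
	if d.count == len(d.buf) {
		d.grow()
	}
	d.buf[(d.head+d.count)%len(d.buf)] = v
	d.count++
}

func (d *Deque) PushFront(v interface{}) {
	if d.count == len(d.buf) {
		d.grow()
	}
	d.head = (d.head - 1 + len(d.buf)) % len(d.buf)
	d.buf[d.head] = v
	d.count++
}

func (d *Deque) PopFront() interface{} {
	if d.count == 0 {
		return nil
	}
	v := d.buf[d.head]
	d.buf[d.head] = nil // drop the reference so it can be collected
	d.head = (d.head + 1) % len(d.buf)
	d.count--
	return v
}

func main() {
	var d Deque
	d.PushBack(1)
	d.PushBack(2)
	d.PushFront(0)
	fmt.Println(d.PopFront(), d.PopFront(), d.PopFront()) // 0 1 2
}
```

Wrapping the head index modulo the buffer length is what makes both ends cheap: neither PushFront nor PopFront ever shifts the existing elements.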

I tried to stick close to the conventions container/list seems to follow even though I disagree with several of them (see RANT.md). In other words, it's ready for the standard library (hah!).

Background

In 2013 I was hacking a breadth-first search in Go and needed a queue, but all I could find in the standard library was container/list.

Now in principle there's nothing wrong with container/list, but I had just admonished my students to always think carefully about the number of memory allocations their programs make. So it felt wrong to use a data structure that allocates memory for every single vertex we visit during a breadth-first search.

After I finished my project, I cleaned up the queue code a little and pushed it here to give everybody else what I had really wanted to find in the standard library: a queue abstraction that doesn't allocate memory on every single insertion.
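The breadth-first search pattern that motivated all this looks roughly like the following. The graph representation and the plain-slice queue here are illustrative, not the package's API; the point is that, apart from the queue's occasional growth, nothing is allocated per visited vertex:

```go
package main

import "fmt"

// bfs returns the order in which vertices are visited starting from
// src. The queue is a plain slice: pushing a vertex only allocates
// when the slice has to grow, unlike container/list, which allocates
// a list node for every single PushBack.
func bfs(adj [][]int, src int) []int {
	visited := make([]bool, len(adj))
	queue := []int{src}
	visited[src] = true
	var order []int
	for len(queue) > 0 {
		v := queue[0]
		queue = queue[1:]
		order = append(order, v)
		for _, w := range adj[v] {
			if !visited[w] {
				visited[w] = true
				queue = append(queue, w)
			}
		}
	}
	return order
}

func main() {
	// Small undirected graph: 0-1, 0-2, 1-3.
	adj := [][]int{{1, 2}, {0, 3}, {0}, {1}}
	fmt.Println(bfs(adj, 0)) // [0 1 2 3]
}
```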

Performance comparison

The benchmarks are not very sophisticated, but we seem to be almost twice as fast as container/list (speedup of 1.85-1.93). We're also a bit faster than Go's channels (speedup of 1.38). Anyway, here are the (latest) numbers:

$ go test -bench . -benchmem
BenchmarkPushFrontQueue-2    	   20000	     85886 ns/op	   40944 B/op	    1035 allocs/op
BenchmarkPushFrontList-2     	   10000	    158998 ns/op	   57392 B/op	    2049 allocs/op
BenchmarkPushBackQueue-2     	   20000	     85189 ns/op	   40944 B/op	    1035 allocs/op
BenchmarkPushBackList-2      	   10000	    160718 ns/op	   57392 B/op	    2049 allocs/op
BenchmarkPushBackChannel-2   	   10000	    117610 ns/op	   24672 B/op	    1026 allocs/op
BenchmarkRandomQueue-2       	   10000	    144867 ns/op	   45720 B/op	    1632 allocs/op
BenchmarkRandomList-2        	    5000	    278965 ns/op	   90824 B/op	    3243 allocs/op
PASS
ok  	github.com/phf/go-queue/queue	12.472s
$ go version
go version go1.7.5 linux/amd64
$ uname -p
AMD Athlon(tm) 64 X2 Dual Core Processor 6000+
$ date
Sat Apr 22 11:26:40 EDT 2017

(The number of allocations seems off: since we grow by doubling, we should only allocate memory O(log n) times. The extra allocations presumably come from boxing each int into an interface{}, which costs a heap allocation per push on recent versions of Go.)
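As a sanity check on the O(log n) claim, a tiny helper (made up for this example) counts how many doublings a buffer needs to reach a given capacity:

```go
package main

import "fmt"

// doublings counts how often a buffer of capacity `from` must double
// to reach at least capacity `to`.
func doublings(from, to int) int {
	n := 0
	for from < to {
		from *= 2
		n++
	}
	return n
}

func main() {
	// Growing from 16 to a million elements takes only 16
	// reallocations, not a million.
	fmt.Println(doublings(16, 1000000)) // 16
}
```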

The same benchmarks on an old Raspberry Pi Model B Rev 1:

$ go test -bench . -benchmem
PASS
BenchmarkPushFrontQueue     2000            788316 ns/op           16469 B/op         12 allocs/op
BenchmarkPushFrontList      1000           2629835 ns/op           33904 B/op       1028 allocs/op
BenchmarkPushBackQueue      2000            776663 ns/op           16469 B/op         12 allocs/op
BenchmarkPushBackList       1000           2817162 ns/op           33877 B/op       1028 allocs/op
BenchmarkPushBackChannel    2000           1229474 ns/op            8454 B/op          1 allocs/op
BenchmarkRandomQueue        2000           1325947 ns/op           16469 B/op         12 allocs/op
BenchmarkRandomList          500           4929491 ns/op           53437 B/op       1627 allocs/op
ok      github.com/phf/go-queue/queue   17.798s
$ go version
go version go1.3.3 linux/arm
$ cat /proc/cpuinfo | grep model
model name      : ARMv6-compatible processor rev 7 (v6l)
$ date
Sat Apr 22 18:04:16 UTC 2017

Here we're over three times faster than container/list and almost 60% faster than Go's channels. (Also the number of allocations looks right here, presumably because this old Go version could still store a word-sized value directly in an interface without a heap allocation. And in terms of raw performance, Go's memory allocator seems to have improved quite a bit in later versions.)

Go's channels as queues

Go's channels used to beat our queue implementation by about 22% for PushBack. That seemed sensible considering that channels are built into the language and offer a lot less functionality: we have to size them correctly if we want to use them as a simple queue in an otherwise non-concurrent setting, they are not double-ended, and they don't support "peeking" at the next element without removing it. Apparently replacing the "manual" loop with the built-in copy when the queue has to grow has paid off.
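For comparison, here is what using a buffered channel as a plain FIFO queue looks like. The limitations mentioned above show up directly: the capacity is fixed at creation, there is no PushFront, and the only way to look at the next element is to receive it:

```go
package main

import "fmt"

func main() {
	// A buffered channel works as a simple FIFO queue even in
	// single-goroutine code, but it must be sized up front: a send
	// on a full channel with nobody receiving would deadlock.
	q := make(chan int, 8)
	q <- 1
	q <- 2
	q <- 3
	// len(q) reports how many elements are queued, so we can drain
	// it without blocking.
	for len(q) > 0 {
		fmt.Println(<-q)
	}
}
```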

(In fact I used to call channels "ridiculously fast" before and recommended their use in situations where nothing but performance matters. Alas that may no longer be good advice. Either that, or I am just benchmarking incorrectly.)

Kudos

Hacking queue data structures in Go seems to be a popular way to spend an evening. Kudos to...

If you find something in my code that helps you improve yours, feel free to run with it!