A fork of github.com/phf/go-queue with the internal list exposed.

# Queue data structure for Go

A double-ended queue (aka "deque") built on top of a slice. All operations except pushes are constant-time; pushes are amortized constant-time. Benchmarks compare favorably to container/list as well as to Go's channels.
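
To make the design concrete, here is a minimal sketch of a slice-backed deque in the same spirit: a ring buffer whose front and back wrap around, grown by doubling so pushes are amortized constant-time. The names and layout here are illustrative, not the package's actual API.

```go
package main

import "fmt"

// Deque is an illustrative slice-backed double-ended queue: a ring
// buffer that grows by doubling, so pushes are amortized O(1).
type Deque struct {
	buf   []interface{}
	front int // index of the first element
	n     int // number of elements
}

func NewDeque() *Deque {
	return &Deque{buf: make([]interface{}, 8)}
}

// grow doubles the buffer when it is full, unwrapping the ring
// into the front of the new slice.
func (d *Deque) grow() {
	if d.n < len(d.buf) {
		return
	}
	bigger := make([]interface{}, 2*len(d.buf))
	for i := 0; i < d.n; i++ {
		bigger[i] = d.buf[(d.front+i)%len(d.buf)]
	}
	d.buf, d.front = bigger, 0
}

func (d *Deque) PushBack(v interface{}) {
	d.grow()
	d.buf[(d.front+d.n)%len(d.buf)] = v
	d.n++
}

func (d *Deque) PushFront(v interface{}) {
	d.grow()
	d.front = (d.front - 1 + len(d.buf)) % len(d.buf)
	d.buf[d.front] = v
	d.n++
}

// PopFront returns nil when the deque is empty.
func (d *Deque) PopFront() interface{} {
	if d.n == 0 {
		return nil
	}
	v := d.buf[d.front]
	d.front = (d.front + 1) % len(d.buf)
	d.n--
	return v
}

func main() {
	d := NewDeque()
	d.PushBack("b")
	d.PushFront("a")
	d.PushBack("c")
	fmt.Println(d.PopFront(), d.PopFront(), d.PopFront()) // a b c
}
```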

I tried to stick close to the conventions container/list seems to follow even though I disagree with several of them (see RANT.md). In other words, it's ready for the standard library (hah!).

## Background

In 2013 I was hacking a breadth-first search in Go and needed a queue, but all I could find in the standard library was container/list.

Now in principle there's nothing wrong with container/list, but I had just admonished my students to always think carefully about the number of memory allocations their programs make. In other words, it felt wrong for me to use a data structure that allocates memory for every single vertex we visit during a breadth-first search.

After I got done with my project, I decided to clean up the queue code a little and to push it here to give everybody else what I really wanted to find in the standard library: A queue abstraction that doesn't allocate memory on every single insertion.

## Performance

The benchmarks are not very sophisticated but we seem to be almost twice as fast as container/list (speedup of 1.85-1.93). We're also a bit faster than Go's channels (speedup of 1.38). Anyway, here are some numbers from my old home machine:

```
$ go test -bench . -benchmem
BenchmarkPushFrontQueue-2    	   20000	     85886 ns/op	   40944 B/op	    1035 allocs/op
BenchmarkPushFrontList-2     	   10000	    158998 ns/op	   57392 B/op	    2049 allocs/op
BenchmarkPushBackQueue-2     	   20000	     85189 ns/op	   40944 B/op	    1035 allocs/op
BenchmarkPushBackList-2      	   10000	    160718 ns/op	   57392 B/op	    2049 allocs/op
BenchmarkPushBackChannel-2   	   10000	    117610 ns/op	   24672 B/op	    1026 allocs/op
BenchmarkRandomQueue-2       	   10000	    144867 ns/op	   45720 B/op	    1632 allocs/op
BenchmarkRandomList-2        	    5000	    278965 ns/op	   90824 B/op	    3243 allocs/op
PASS
ok  	github.com/phf/go-queue/queue	12.472s
$ go version
go version go1.7.5 linux/amd64
$ uname -p
AMD Athlon(tm) 64 X2 Dual Core Processor 6000+
$ date
Sat Apr 22 11:26:40 EDT 2017
```

(The number of allocations seems off: since we grow by doubling, we should only allocate memory O(log n) times. A likely culprit is that each pushed value is boxed into an interface{}, which costs one heap allocation per element; 1024 pushes plus roughly 11 doublings would account for the 1035 allocations.) The same benchmarks on a more recent laptop:

```
$ go test -bench=. -benchmem
PASS
BenchmarkPushFrontQueue-4 	   10000	    107377 ns/op	   40944 B/op	    1035 allocs/op
BenchmarkPushFrontList-4  	   10000	    205141 ns/op	   57392 B/op	    2049 allocs/op
BenchmarkPushBackQueue-4  	   10000	    107339 ns/op	   40944 B/op	    1035 allocs/op
BenchmarkPushBackList-4   	   10000	    204100 ns/op	   57392 B/op	    2049 allocs/op
BenchmarkPushBackChannel-4	   10000	    174319 ns/op	   24672 B/op	    1026 allocs/op
BenchmarkRandomQueue-4    	   10000	    190498 ns/op	   45720 B/op	    1632 allocs/op
BenchmarkRandomList-4     	    5000	    364802 ns/op	   90825 B/op	    3243 allocs/op
ok  	github.com/phf/go-queue/queue	11.881s
$ go version
go version go1.6.2 linux/amd64
$ cat /proc/cpuinfo | grep "model name" | uniq
model name	: AMD A10-4600M APU with Radeon(tm) HD Graphics
$ date
Fri Apr 28 17:20:57 EDT 2017
```

So that's a speedup of 1.90 over container/list and of 1.62 over Go's channels. The same benchmarks on an old Raspberry Pi Model B Rev 1:

```
$ go test -bench . -benchmem
PASS
BenchmarkPushFrontQueue     2000            788316 ns/op           16469 B/op         12 allocs/op
BenchmarkPushFrontList      1000           2629835 ns/op           33904 B/op       1028 allocs/op
BenchmarkPushBackQueue      2000            776663 ns/op           16469 B/op         12 allocs/op
BenchmarkPushBackList       1000           2817162 ns/op           33877 B/op       1028 allocs/op
BenchmarkPushBackChannel    2000           1229474 ns/op            8454 B/op          1 allocs/op
BenchmarkRandomQueue        2000           1325947 ns/op           16469 B/op         12 allocs/op
BenchmarkRandomList          500           4929491 ns/op           53437 B/op       1627 allocs/op
ok      github.com/phf/go-queue/queue   17.798s
$ go version
go version go1.3.3 linux/arm
$ cat /proc/cpuinfo | grep "model name"
model name      : ARMv6-compatible processor rev 7 (v6l)
$ date
Sat Apr 22 18:04:16 UTC 2017
```

So that's a speedup of 3.34-3.72 over container/list and of 1.58 over Go's channels. (Also, the allocation count here finally matches the O(log n) expectation. A plausible reason: this run used go1.3.3, and before Go 1.4 an interface could hold a word-sized value directly without a heap allocation, so only the grow steps allocate.)
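
The per-element allocations in the go1.6/go1.7 runs are consistent with interface boxing: converting an int to interface{} generally heap-allocates the value. A small sketch (my own, not from the package) makes this visible with testing.AllocsPerRun:

```go
package main

import (
	"fmt"
	"testing"
)

func main() {
	var sink []interface{}
	// Storing an int in an interface{} boxes it on the heap (for
	// values the runtime does not cache), so appending n ints to an
	// []interface{} costs about n allocations on top of the O(log n)
	// reallocations of the backing slice itself.
	allocs := testing.AllocsPerRun(100, func() {
		sink = sink[:0] // keep capacity so the slice doesn't regrow
		for i := 0; i < 64; i++ {
			sink = append(sink, i+1000) // values > 255 are boxed
		}
	})
	fmt.Println(allocs >= 64) // roughly one allocation per element
}
```

If this reading is right, the ~1035 allocs/op above are ~1024 boxed elements plus roughly 11 slice doublings, not a bug in the queue itself.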

## Go's channels as queues

Go's channels used to beat our queue implementation by about 22% for PushBack. That seemed sensible considering that channels are built into the language and offer a lot less functionality: We have to size them correctly if we want to use them as a simple queue in an otherwise non-concurrent setting, they are not double-ended, and they don't support "peeking" at the next element without removing it.
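
For the curious, this is roughly what using a channel as a queue in a non-concurrent setting looks like; the caveats above (fixed capacity, one-ended, no peeking) are visible directly:

```go
package main

import "fmt"

func main() {
	// A buffered channel works as a one-ended FIFO queue even without
	// goroutines, but it must be sized up front: in a single-goroutine
	// program, a send to a full buffer blocks forever. There is also
	// no way to peek at the next element without receiving it.
	q := make(chan int, 4)
	q <- 1
	q <- 2
	q <- 3
	fmt.Println(<-q, <-q, <-q) // 1 2 3
	fmt.Println(len(q))        // 0 elements still queued
}
```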

It all changed with two commits: one replaced the "manual" element-by-element loop used when the queue has to grow with a call to copy, and the other replaced the % operations that wrap indices around the slice with equivalent & operations. (The code was originally written without these "hacks" because I wanted to show it to my "innocent" Java students.) Those two changes really paid off.
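
The index-wrapping trick is simple to demonstrate. When the buffer length is a power of two, i % len can be replaced by i & (len-1): the mask keeps the low bits, which is exactly the remainder, without an integer division:

```go
package main

import "fmt"

func main() {
	const size = 8        // must be a power of two for the mask trick
	const mask = size - 1 // 0b0111
	for i := 0; i < 20; i++ {
		// The two expressions agree for every non-negative index.
		if i%size != i&mask {
			panic("mask and modulus disagree")
		}
	}
	fmt.Println(11%size, 11&mask) // 3 3
}
```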

(I used to call channels "ridiculously fast" before and recommended their use in situations where nothing but performance matters. Alas that may no longer be good advice. Either that, or I am just benchmarking incorrectly.)

## Kudos

Hacking queue data structures in Go seems to be a popular way to spend an evening. Kudos to...

If you find something in my code that helps you improve yours, feel free to run with it!