Fork from github.com/phf/go-queue with exposed internal list
A double-ended queue (aka "deque") built on top of a slice. All operations except pushes are constant-time; pushes are amortized constant-time. Benchmarks compare favorably to container/list as well as to Go's channels.
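The core idea can be sketched as a growable ring buffer on a slice. The following is an illustrative sketch under the same design (slice-backed, doubling growth), not the package's actual code:

```go
package main

import "fmt"

// Deque is a minimal slice-backed double-ended queue. The names and
// details here are illustrative, not the package's real implementation.
type Deque struct {
	buf   []interface{}
	front int // index of the first element in the ring
	n     int // number of elements currently stored
}

func NewDeque() *Deque {
	return &Deque{buf: make([]interface{}, 8)}
}

// grow doubles the backing slice when it is full, straightening the
// ring into the new slice so front becomes index 0 again.
func (d *Deque) grow() {
	if d.n < len(d.buf) {
		return
	}
	buf := make([]interface{}, 2*len(d.buf))
	k := copy(buf, d.buf[d.front:])
	copy(buf[k:], d.buf[:d.front])
	d.buf, d.front = buf, 0
}

func (d *Deque) PushBack(v interface{}) {
	d.grow()
	d.buf[(d.front+d.n)%len(d.buf)] = v
	d.n++
}

func (d *Deque) PushFront(v interface{}) {
	d.grow()
	d.front = (d.front - 1 + len(d.buf)) % len(d.buf)
	d.buf[d.front] = v
	d.n++
}

func (d *Deque) PopFront() interface{} {
	if d.n == 0 {
		return nil
	}
	v := d.buf[d.front]
	d.front = (d.front + 1) % len(d.buf)
	d.n--
	return v
}

func main() {
	d := NewDeque()
	d.PushBack(1)
	d.PushBack(2)
	d.PushFront(0)
	fmt.Println(d.PopFront(), d.PopFront(), d.PopFront()) // 0 1 2
}
```

Because the backing slice doubles when full, a run of n pushes triggers only O(log n) reallocations, which is where the amortized constant-time bound for pushes comes from.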
I tried to stick to the conventions container/list seems to follow, even though I disagree with them (see RANT.md for details). In other words, it's ready for the standard library (hah!).
In 2013 I was hacking a breadth-first search in Go and needed a queue, but all I could find in the standard library was container/list.
Now in principle there's nothing wrong with container/list, but I had just admonished my students to always think carefully about the number of memory allocations their programs make. In other words, it felt wrong for me to use a data structure that allocates memory for every single vertex we visit during a breadth-first search.
After I got done with my project, I decided to clean up the queue code a little and push it here, to give everybody else what I really wanted to find in the standard library: a queue abstraction that doesn't allocate memory on every single insertion.
The benchmarks are not very sophisticated but we seem to be almost twice as fast as container/list (speedup of 1.85-1.93). We're also a bit faster than Go's channels (speedup of 1.38). Here are the numbers from my (ancient) home machine:
```
$ go test -bench . -benchmem
BenchmarkPushFrontQueue-2      20000     85886 ns/op   40944 B/op   1035 allocs/op
BenchmarkPushFrontList-2       10000    158998 ns/op   57392 B/op   2049 allocs/op
BenchmarkPushBackQueue-2       20000     85189 ns/op   40944 B/op   1035 allocs/op
BenchmarkPushBackList-2        10000    160718 ns/op   57392 B/op   2049 allocs/op
BenchmarkPushBackChannel-2     10000    117610 ns/op   24672 B/op   1026 allocs/op
BenchmarkRandomQueue-2         10000    144867 ns/op   45720 B/op   1632 allocs/op
BenchmarkRandomList-2           5000    278965 ns/op   90824 B/op   3243 allocs/op
PASS
ok      github.com/phf/go-queue/queue   12.472s
$ go version
go version go1.7.5 linux/amd64
$ uname -p
AMD Athlon(tm) 64 X2 Dual Core Processor 6000+
$ date
Sat Apr 22 11:26:40 EDT 2017
```
(Note that the number of allocations seems off: since we grow by doubling we should only allocate memory O(log n) times.) The same benchmarks on my (slightly more recent) laptop:
```
$ go test -bench=. -benchmem
PASS
BenchmarkPushFrontQueue-4      10000    107377 ns/op   40944 B/op   1035 allocs/op
BenchmarkPushFrontList-4       10000    205141 ns/op   57392 B/op   2049 allocs/op
BenchmarkPushBackQueue-4       10000    107339 ns/op   40944 B/op   1035 allocs/op
BenchmarkPushBackList-4        10000    204100 ns/op   57392 B/op   2049 allocs/op
BenchmarkPushBackChannel-4     10000    174319 ns/op   24672 B/op   1026 allocs/op
BenchmarkRandomQueue-4         10000    190498 ns/op   45720 B/op   1632 allocs/op
BenchmarkRandomList-4           5000    364802 ns/op   90825 B/op   3243 allocs/op
ok      github.com/phf/go-queue/queue   11.881s
$ go version
go version go1.6.2 linux/amd64
$ cat /proc/cpuinfo | grep "model name" | uniq
model name      : AMD A10-4600M APU with Radeon(tm) HD Graphics
$ date
Fri Apr 28 17:20:57 EDT 2017
```
So that's a speedup of 1.90 over container/list and of 1.62 over Go's channels. The same benchmarks on an old Raspberry Pi Model B Rev 1:
```
$ go test -bench . -benchmem
PASS
BenchmarkPushFrontQueue         2000    788316 ns/op   16469 B/op     12 allocs/op
BenchmarkPushFrontList          1000   2629835 ns/op   33904 B/op   1028 allocs/op
BenchmarkPushBackQueue          2000    776663 ns/op   16469 B/op     12 allocs/op
BenchmarkPushBackList           1000   2817162 ns/op   33877 B/op   1028 allocs/op
BenchmarkPushBackChannel        2000   1229474 ns/op    8454 B/op      1 allocs/op
BenchmarkRandomQueue            2000   1325947 ns/op   16469 B/op     12 allocs/op
BenchmarkRandomList              500   4929491 ns/op   53437 B/op   1627 allocs/op
ok      github.com/phf/go-queue/queue   17.798s
$ go version
go version go1.3.3 linux/arm
$ cat /proc/cpuinfo | grep "model name"
model name      : ARMv6-compatible processor rev 7 (v6l)
$ date
Sat Apr 22 18:04:16 UTC 2017
```
So that's a speedup of 3.34-3.72 over container/list and of 1.58 over Go's channels. (Also the number of allocations seems to be correct here, maybe the memory allocator in older versions of Go was more sane if less performant?)
Go's channels used to beat our queue implementation by about 22% for PushBack. That seemed sensible considering that channels are built into the language and offer a lot less functionality: we have to size them correctly if we want to use them as a simple queue in an otherwise non-concurrent setting, they are not double-ended, and they don't support "peeking" at the next element without removing it.
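For the record, this is what pressing a buffered channel into service as a plain FIFO looks like in non-concurrent code, with exactly those limitations:

```go
package main

import "fmt"

// A buffered channel as a simple queue in single-goroutine code: the
// capacity is fixed at make time (a send on a full channel would block
// forever here), there is no front insertion, and the only inspection
// available without removing an element is len.
func main() {
	q := make(chan int, 4) // must guess the maximum size up front
	q <- 1
	q <- 2
	q <- 3
	fmt.Println(len(q)) // 3
	for len(q) > 0 {
		fmt.Println(<-q) // 1, then 2, then 3
	}
}
```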
That all changed with two commits in which I replaced the "manual" loop used when a queue has to grow with copy, and the % operations used to wrap indices around the slice with equivalent & operations. (The code was originally written without these "hacks" because I wanted to show it to my "innocent" Java students.) Those two changes really paid off.
(I used to call channels "ridiculously fast" before and recommended their use in situations where nothing but performance matters. Alas that may no longer be good advice. Either that, or I am just benchmarking incorrectly.)
Hacking queue data structures in Go seems to be a popular way to spend an evening. Kudos to...
If you find something in my code that helps you improve yours, feel free to run with it!