The ravings of a sane person.

Sometimes filled with information.

Python usually sticks to one implementation that works well for common cases. You call list.sort() and most people never need to stop and think about which sorting algorithm it uses. And of course list and dict are the Swiss Army knives of Python data structures, although the newer collections module shows there's room for some more specialized/optimized data types.
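For instance, collections.deque is one of those specialized types: a small sketch of where it beats the general-purpose list.

```python
from collections import deque

# deque supports O(1) appends and pops at BOTH ends, whereas a plain
# list pays O(n) to insert or pop at the front.
queue = deque()
queue.append("b")
queue.appendleft("a")    # O(1); list.insert(0, "a") would shift everything
first = queue.popleft()  # O(1); list.pop(0) would shift everything
```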

Certainly, people using Python tend to use what is available. But Python typically includes fairly optimized implementations of its data structures, so it's surprising that this module isn't better optimized.


You've got WAY too much time on your hands to be writing all this.

I wrote it at work, as a discussion about what I'm going to be doing with my 20% time that Google gives us to work on free software.

Also, I spend at least an hour outside every day. Riding my bicycle to/from work, surfing, and/or just plain walking on the beach.

I'd have to see actual runtimes to be sure, but I suspect that in a compiled programming language the "fast but not great" method (lazy deletion with periodic cleaning) beats the solution of using a binary heap together with a hash table to keep track of heap positions. A lot of data-structure research seems to be done out of an unnecessary fear of hash tables, but if you have to do a hash-table access every time you swap two items in your heap, that looks like a big slowdown. In Python, of course, a hash-table access costs about as much as any other single operation, so it's less of an issue.
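The lazy-deletion approach can be sketched in a few lines of Python on top of heapq (this is close to the priority-queue recipe in the heapq docs; the class and method names here are just illustrative). Removal only marks an entry dead; stale entries get discarded when they reach the top.

```python
import heapq
from itertools import count

REMOVED = object()  # sentinel marking a lazily-deleted entry

class LazyPQ:
    def __init__(self):
        self._heap = []     # entries: [priority, tiebreak, key]
        self._entries = {}  # key -> its live entry in the heap
        self._tick = count()  # tie-breaker so keys are never compared

    def push(self, priority, key):
        if key in self._entries:
            self.remove(key)  # lazy "decrease-key": old entry goes stale
        entry = [priority, next(self._tick), key]
        self._entries[key] = entry
        heapq.heappush(self._heap, entry)

    def remove(self, key):
        # O(1): mark the entry in place instead of repairing the heap
        self._entries.pop(key)[2] = REMOVED

    def pop(self):
        # Skip (and thereby clean out) stale entries at the top
        while self._heap:
            priority, _, key = heapq.heappop(self._heap)
            if key is not REMOVED:
                del self._entries[key]
                return priority, key
        raise KeyError("pop from an empty queue")
```

A production version would also rebuild the heap once stale entries outnumber live ones — the "periodic cleaning" part — but the skeleton above shows why the method is so simple.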

In real-world performance, using a hash table to keep track of positions is definitely a loss in comparison to "fast but not great". And in the case of Python, it wouldn't be so bad, except that the heapq module has a C implementation that only works with plain list objects as the heap, and is easily 4x faster than the Python implementation. Toss in the simplicity of the "fast but not great" method, and the choice of algorithm/structure becomes clear.
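For contrast, here's a sketch of the position-tracking alternative being argued against: a hand-rolled binary heap plus a dict from key to index (all names are illustrative). Note that every swap during sift-up/sift-down must also write to the dict — the per-operation overhead discussed above — and since the heap is no longer a bare list of entries, heapq's C implementation can't be used for it.

```python
class TrackedHeap:
    def __init__(self):
        self._heap = []  # (priority, key) pairs
        self._pos = {}   # key -> current index in self._heap

    def _swap(self, i, j):
        h = self._heap
        h[i], h[j] = h[j], h[i]
        self._pos[h[i][1]] = i  # two dict writes on EVERY swap
        self._pos[h[j][1]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self._heap[i] >= self._heap[parent]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        n = len(self._heap)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self._heap[child] < self._heap[smallest]:
                    smallest = child
            if smallest == i:
                break
            self._swap(i, smallest)
            i = smallest

    def push(self, priority, key):
        self._heap.append((priority, key))
        self._pos[key] = len(self._heap) - 1
        self._sift_up(len(self._heap) - 1)

    def remove(self, key):
        # O(1) position lookup, then O(log n) heap repair
        i = self._pos.pop(key)
        last = self._heap.pop()
        if i < len(self._heap):
            self._heap[i] = last
            self._pos[last[1]] = i
            self._sift_down(i)
            self._sift_up(i)

    def pop(self):
        priority, key = self._heap[0]
        self.remove(key)
        return priority, key
```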
