The clients use blocking sockets with an automatically refilled socket pool (for disconnects, increasing/decreasing thread counts, etc.).
I wrote two different servers, one using asynchronous sockets and one using blocking sockets. The async version uses an internal state machine for handling the different portions of the read, the threaded version uses sock.sendall(data) and a similar sock.get_bytes(count) (the latter being my addition). Both servers use internal callbacks to handle the case when the tuple you want isn't there.
(the below tests ran on my 1.3 ghz P3 mobile laptop with 768 megs PC2600 DDR, I'm up in San Jose doing work)
Asocket Tsocket local (numbers are per insertion/removal) insert size 1 120us 117us 9.87us insert size 9 230us 206us 118us remove exact 775us 373us 38.8us remove wildcard 914us 503us 158us
It seems as though insertions on a socket suffer around a 110 microsecond latency over the socket for the one-way push. Removals when the server is asynchronous seem to incur a round trip delay of around 740 microseconds, with threaded sockets running with a delay of 340 microseconds.
If one needed proof for socket latency, there you go. Is this still usable? That all depends on your application. Inserting reasonably sized tuples can be done on the order of 4,000-8,500 tuples/second on a machine like my laptop. Removing those same tuples run between 1,100-2,700 tuples/second on a machine like my laptop. A faster machine, multiple processors (server on one processor, tuple inserters/removers on another), etc., could increase speed significantly.
When doing both insertions and removals simultaneously (two processes), insertions generally run twice as fast as removals, unless you increase the priority of the removal process. Considering that the threaded version knows exactly when it needs to be reading and writing from/to the socket, and the asynchronous version checks both readability and writability on every pass, this is understandable.
For the tests I performed, I am seriously processor constrained; with zlib compressing the cPickled data, it ran at less than 1/5 the speed it does now without zlib.
The tuple space implementation I have is actually able to store tuples, lists, and dictionaries. You can match tuples exactly, but lists and dictionaries only by type. That is, you insert ('foo', [1,2,3]), but you can only match it via ('foo', list). Actually supporting this took a bit of thought, but ended up being quite easy to implement.
A custom poller with a readable/writable dictionary (instead of a per-pass call to readable()/writable()) could probably speed up the asynchronous version to near the speed of the threaded version.
Restricting data types may allow for a simplified encoder/decoder, which may be able to beat cPickle.
Unix domain sockets, polled mmap'd files (or other shared memory objects), or perhaps pipes may offer better speed on certain platforms.
The code right now is horribly ugly (I hacked it together this afternoon), but it is functional. Before I even consider offering it up to the general public, I'm going to clean it up significantly and add a bit of documentation.
I had a (not so short) conversation with my employer, and we got to talking about the running time and such. With a rewrite, I could get the insertion/deletion time down to O(len(tuple)), but it would result in only being able to either search for exact matches, or exact up to a prefix with anything afterwards...
(1,2,3) would be accessable via:
What could also be gained is a priority matching, where searching for (1,2) would find the object with an identical prefix, whose remainder has the 'highest priority', essentially being the value of the remaining portion of the tuple. So a search for (1,2) would return (1,2,3) before (1,2,4).
There is the option of not having priorities and various other things.
One interesting feature of this is that except for mutable objects, this offers one the exact same semantics as a python function with variable arguments and keywords.
Another interesting bit is that with "minor" work, I can get rid of that pesky O(time_to_hash_tuple*len(tuple)**2) insertion/deletion time, and replace it with O(time_to_hash_tuple + len(tuple)) insertion/deletion time with an iterative hash function. Whether or not this asymptotic speedup beats the fast (but asymptotically slower) built-in Python functions is another matter entirely.