Josiah Carlson (chouyu_31) wrote,

Tuple Spaces

After a bit of a conversation about multiprocessing/parallel processing/threading in Python on python-dev, I pointed out Linda/PyLinda as a good model for inclusion into the standard library.

Once I had read a bit more of the PyLinda documentation and perused the source code, I thought to myself, "Self, if his tuplespace is slow, why don't you make a fast one? That's a good question, Self, maybe I should!"

After hacking together a very simple prototype which only supports wildcard matches on the right side*, I'm pretty happy with it. There still needs to be a server component (easy) with support for pluggable persistent storage backends, and an interface which supports blocking, non-blocking, and timed tuple requests (I've already built the data structures necessary for such functionality in other projects). The client should be thread-safe, though multiplexing multiple threaded requests over a single async socket would be a pain in the ass... I'll probably use a connection pool. A work in progress, to be sure.

How does it perform? The test set is a random assortment of tuples of random lengths between 1 and 9 (average length for this sample: 5.00):
                      mine    PyLinda
Inserting:            6.09      5.89
Deleting (exact):     3.59      2.20
Deleting (wildcard):  8.83    972.2

Insert 1-tuple:       9.76us   1.34us
Insert 9-tuple:      96.6us   72.9us

PyLinda uses a trie, embedding each tuple one element at a time. Mine uses a two-level dictionary keyed on the original tuple, as well as on various pre-processed versions of the tuple for template matching*. If all you ever do is insert and remove exact tuples, PyLinda will probably be faster, since it only ever embeds the exact tuple. If, on the other hand, you use template matching to extract a call signature (tuple_space.get(('service_1', int, int, float))), PyLinda performs a DFS on the trie through matching and partially matching branches, which can be slow when more than a handful of entries are close matches. Mine is fast on extraction regardless of tuple length and the number of entries already in the table**.
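A minimal sketch of how I read that scheme (the class and method names here are mine for illustration, not the actual prototype): insertion stores the tuple under its exact form plus each right-side wildcard template, so a template lookup becomes a single dictionary access instead of a tree search.

```python
from collections import defaultdict

def _templates(tup):
    """The exact tuple, then every right-side wildcard version:
    (1, 2) -> (1, 2), (1, int), (int, int)."""
    out = [tup]
    for i in range(len(tup) - 1, -1, -1):
        out.append(tup[:i] + tuple(type(v) for v in tup[i:]))
    return out

class TupleSpace:
    """Two-level dictionary: first keyed by tuple length, then by
    each pre-processed template of the tuple."""
    def __init__(self):
        self.by_len = defaultdict(lambda: defaultdict(list))

    def insert(self, tup):
        bucket = self.by_len[len(tup)]
        for key in _templates(tup):
            bucket[key].append(tup)

    def get(self, template):
        # One dict lookup; no search through partial matches.
        matches = self.by_len[len(template)].get(template)
        if not matches:
            raise KeyError(template)
        tup = matches[-1]
        # Remove the tuple under every template it was stored under.
        bucket = self.by_len[len(tup)]
        for key in _templates(tup):
            bucket[key].remove(tup)
            if not bucket[key]:
                del bucket[key]
        return tup
```

The trade is extra work and memory on insert/delete in exchange for extraction that never degrades with the number of near-matching entries.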

One interesting thing is that PyLinda's insert time jumps by a factor of 50+ for a tuple of length 9 compared with a tuple of length 1. I would imagine this is caused by the relatively high number of new trie nodes that need to be constructed for a length-9 tuple compared with a length-1 tuple.

* The tuple (1, 2, 3, 4) is available via:
(1, 2, 3, 4)
(1, 2, 3, int)
(1, 2, int, int)
(1, int, int, int)
(int, int, int, int)

Sorry, those wildcards can't be just anywhere, only on the right side...
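One way to generate that list (a sketch, not the prototype's actual code):

```python
def templates(tup):
    """Yield the exact tuple first, then wildcard the suffix one
    element at a time from the right, ending all-wildcard."""
    yield tup
    for i in range(len(tup) - 1, -1, -1):
        yield tup[:i] + tuple(type(v) for v in tup[i:])

list(templates((1, 2, 3, 4)))
# -> [(1, 2, 3, 4), (1, 2, 3, int), (1, 2, int, int),
#     (1, int, int, int), (int, int, int, int)]
```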

** Technically, insertion and deletion are quadratic in the length of the tuple, resulting from the need to construct x tuples of length x when inserting or removing a tuple. With a little extra memory, one can reduce the effect of that quadratic term when there are many duplicate tuples. I personally don't see it as necessary, considering that the quadratic term has a very small constant, and that one is more likely to be dealing with a large number of tuples than with large tuples.
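As a rough count of that quadratic term, following the footnote's scheme: a length-x tuple yields x wildcarded templates, each of length x, so on the order of x² tuple elements are built per insert or delete.

```python
def elements_built(x):
    # x wildcard templates, each of length x (the exact tuple is
    # stored as-is, so it costs no new construction)
    return x * x

[elements_built(x) for x in (1, 3, 9)]  # -> [1, 9, 81]
```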
Tags: algorithms, code
