Josiah Carlson (chouyu_31) wrote,
Josiah Carlson

Tuple Spaces (part 3)

There is an O(len(tuple) + time(hash(tuple))) algorithm for both put and get for the following option:

(option 1):
(1,2,3,4) available via...

This offers exact prefix matching, without worrying about the length of the remaining portion of the tuple. One can further offer fifo or 'random' (dictionary) input/output orderings without increasing asymptotic running time. For priority scheduling based on first (int or float, or even long in a particular numeric range) wildcard, it would add O(len(tuple) * log(#tuples)) to insertion/removal times.

There are also options 2 and 3 (below), which with certain precautions, while not technically having the same asymptotic running time as option 1 above, for all practical purposes is equivalent (it relies on the uniqueness of 64+ bit* hashes)

(option 2 typed/untyped):
(1,2,3,4) available via...

(option 3 typed/untyped):
(1,2,3,4) available via...

Note: there may be other wildcard options which are possible to handle reasonably fast as the above, if someone suggests something reasonable, I'll see what I can do.

Option 2 is essentially that you can match a tuple specified length, with some prefix matched exactly and the last portion matched either by type or not matched at all. Option 3 offers you the ability for typed or untyped wildcards in any single position in a specified length tuple with exact matches everywhere else. Note that a suffix version of option 1 or 2 doesn't get you anything more than the prefix matching would.

There are also obvious algorithms for handling Kamaelia-style subscription channels (broadcast to all listeners), persistant tuples (sent to all new listeners of a channel when they subscribe, but persists until deleted via special mechanism), persistant tuples that are not removed by standard get() (must be deleted via special mechanism), and probably a few other useful things which don't spring to mind at the moment.

* It is actually two 32 bit hashes concatenated together with two particular parameters from the tuple. It turns out that after generating various sets of 1+ million random tuples (length 1...9, of integers in the range 1...1000), there tends to be only around 10-15 nonmatching tuple collisions when there are 500k tuples in the tuple space. This suggests that while the hash functions aren't perfect (possibly even unreasonable for birthday collisions), they do reasonably well even when the tuple space is pretty packed (25-35 at 1 million tuples, and 60-70 at 1.3 million tuples). These should probably be at least a factor of a thousand over what most people will have in their tuples space at any one time (it sucks up 700+ megs on my laptop of tuples of integers...). I figure we could call it good.

  • Post a new comment


    Comments allowed for friends only

    Anonymous comments are disabled in this journal

    default userpic

    Your reply will be screened

    Your IP address will be recorded