poetix

this time for sure

Tuple Spaces in Python

For the moment I’m pretending PyLinda doesn’t exist, but you don’t have to. If you want a working implementation of the ideas discussed here, it’s probably a good place to start.

I’ve been working on my own tuple space implementation in Python, and the results so far can be found here. Running the script will result in either (Bob, Alice) or (Alice, Bob) being printed to STDOUT. Not much to look at.

What does the script do? A high-level account would go something like this:

  • Create a tuple space - a common store for tuples, that can be searched using simple pattern-matching.
  • Put a tuple into the tuple space that consists of the word Users and an empty tuple: (Users, ())
  • Put a tuple in that consists of the words Userlist ready
  • Start two concurrent processes (let’s call them Bob and Alice).
  • Each process looks to see if there’s a tuple that matches the pattern (Userlist ready). It will block until there is such a tuple.
  • Each process then tries to take a tuple matching the pattern (Users, ANY) out of the tuple space. Because there is only one such tuple, only one will succeed. The other will block until some such tuple becomes available.
  • Whichever process successfully grabbed the Users tuple then puts a tuple into the tuple space that updates the Users list with its own name: (Users, (Bob, )) or (Users, (Alice, )) as may be.
  • Now that there is a tuple in the tuple space matching the pattern (Users, ANY), the second process can retrieve it. It does the same thing the first process did, writing a tuple back into the tuple space that updates the Users list. The list will now read either (Bob, Alice) or (Alice, Bob).
  • Just to round things off nicely, each process will put a tuple into the tuple space saying that it’s finished. The main thread will try to retrieve both of these tuples from the tuple space, and will block until both are available. It will then shut down the thread servicing tuple requests, and the script will terminate.

OK. Now, why?

Tuple spaces provide a co-ordination language that allow multiple processes to exchange information, synchronize with each other, maintain or service worklists and generally…co-ordinate their behaviour. The model I’ve implemented is very simple, and consists of three primitives: tupleOut which writes a tuple into the tuple space, tupleIn which removes a tuple from the tuple space, and tupleRead which gets a copy of a tuple leaving the original in place. There is a little magic involved in the pattern matching, which allows us to provide a template specifying the kind of tuple we want. Whatever tuples match the template provided, only one will be returned and we have no control over which one it will be. As the Alice and Bob example above shows, a degree of nondeterminism is characteristic of the system. At the same time, it’s possible to choreograph interactions between processes in entirely deterministic ways if so desired. We could with relatively little effort ensure that Bob could not add himself to the user list before Alice had done so.

My interest in tuple spaces was initially sparked by reading the discussion of dataflow variables in CTM, and thinking about how that approach to modelling concurrency could be applied in Python. Using tuple spaces for concurrency differs from a message-passing approach in that none of the tuples has an explicit recipient: any process that can access the space can read/remove any tuple in the space. The coupling between processes is thus extremely loose by default; and yet tighter coupling is possible without a great deal of extra effort.

As far as the implementation is concerned, my immediate priority is to improve the efficiency of the search algorithm, which currently finds all the tuples matching a query and then returns only one of them - very wasteful! Next up is improving the way blocking requests are matched with newly-added tuples. Once the core is robust, I want to start wrapping some protocols around it - it needs to start speaking HTTP, so that processes in separate locations can communicate with each other.