Categories
elixir patterns

Actor model

Motivation

One of my resolutions this year is to review my notes from each conference I visit and read/learn/build something. Quite recently I have been for the second time to Lambda Days in Kraków, where I got interested in Elixir. It is functional, what is dear to me but more importantly it is built for concurrent computation. This is achieved by shared nothing architecure and actor model. It means that each actor, or process in Elixir land, is independent and share no memory or disk storage. Communication between processes is achieved by sending messages. This allows building systems that are concurrent with less effort. Since it is so helpful I’ll try to understand what actor is and hopefully explain it here.

This is not a full explanation by any means but rather a primer before I implement my latest project using this computation model. It is possible that my knowledge will be revised along with a new post.

What is the Actor Model

Actor is a model of concurrent computation. It has the following properties or axioms. (I have shuffled them a bit to emphasise messaging as IMHO important part of this model).

  • Can designate how to handle next received message
  • Can create actors
  • Can send messages to actors

Let’s unpack those properties to make it more clear. "Can designate how to handle next received message", so actors communicate with messages. Each actor has an address as well, where messages can be send. And it is up to an actor how will it respond if at all.

"Can create actors" is pretty simple, each actor can spawn other actors if required by performed task.

"Can send messages to actors" as mentioned while describing first axiom communication is done via messages. Actors send messages to each other.

One actor is not really an actor model, as mentioned in one of the articles, actor come in systems.

This is short and simple, it is the jist of it with focus on most important parts of actor model. What I find valuable is the fact that this model brings software failures to the front and forces solution designer to appreciate and expect them.

I find it similar to OOP created by Alan Key and described here

OOP to me means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things. It can be done in Smalltalk and in LISP. There are possibly other systems in which this is possible, but I’m not aware of them.

When to use it?

If you are having a task that can be split into stages, may be linked or independent. In such case I find actors more palatable than locks and threading. This is kind of thing that Broadway library for Elixir is trying to solve. Actor model may also be used when thinking about OOP, it might not be possible to implement actors in such way that they are independent at the level this model expects, but thinking in such terms may improve resilience of the project.

Resources

I know I have skimmed this topic and if you are interested please have a look at resources I used to grasp the idea of actor model.

Categories
patterns python

Iterator fun

PyCon US 2017 finished more than a month ago. By the miracle of the technology everyone not able to attend can watch all the talks conveniently in ones own home. So I did, starting with a list of talks recommended in one of the recent episodes of Talk Python To Me podcast.

At this point it’s easy to guess that this post will be about PyCon and most probably about one of the talks I enjoyed. The talk I’d like to focus on is Instagram Keynote delivered by Lisa Guo and Hui Ding. Really good performance and really good slides, in my opinion top notch delivery.

Both speakers have been talking about migration of Instagram from Legacy Python (2.7.x) to Modern Python (>3.5). Massive user base and codebase, outdated third party packages without Python3 support, outdated Django (super old), and Legacy Python. Sounds like a typical software endeavour, maybe except user base being massive. When speaking of challenges Lisa mentioned something that really surprised me. I’m talking about the issue with iterators and builtin function any. Here is the slide she used to illustrate the problem and below is the code in question.

CYTHON_SOURCES = [a.pyx, b.pyx, c.pyx]
builds = map(BuildProcess, CYTHON_SOURCES)
while any(not build.done() for build in builds):
    pending = [build for build in builds if not build.started()]
    <do some work>

This piece of code works fine under Legacy Python but when run under Modern Python first element of builds list is lost in the while loop. builds list inside of while is missing an element. The cause of it is the change of return value of map function in Modern Python from list into iterator. This is something new to me and I really had to know the reason why first value is lost.

Why does it happen?

It would be much easier to reason about this with simpler code example.

Python 3.5.3

>>> m = map(str, [1, 2])
>>> any(m)
>>> print(list(m))

['2']

Puzzling, I was wondering if same thing would happen with Legacy Python. In order to make it work result of map has to be converted into iterator.

Python 2.7.13

>>> m = iter(map(str, [1, 2]))
>>> any(m)
>>> print(list(m))

['2']

This behaviour is shared among Python versions and most likely is intended behaviour, although not documented one I guess. But why does this happen? Answer to this question as usual requires us to look into how Python is implemented. Implementation of any function needs to be examined. So whip up Python/bltinmodule.c file out, it can be any version of Python. Below is mentioned function taken from source of Python 3.5.1 as it’s most recent source I have.

static PyObject *
builtin_any(PyModuleDef *module, PyObject *iterable)
{
    PyObject *it, *item;
    PyObject *(*iternext)(PyObject *);
    int cmp;

    it = PyObject_GetIter(iterable);
    if (it == NULL)
        return NULL;
    iternext = *Py_TYPE(it)->tp_iternext;

    for (;;) {
        item = iternext(it);
        if (item == NULL)
            break;
        cmp = PyObject_IsTrue(item);
        Py_DECREF(item);
        if (cmp < 0) {
            Py_DECREF(it);
            return NULL;
        }
        if (cmp == 1) {
            Py_DECREF(it);
            Py_RETURN_TRUE;
        }
    }
    Py_DECREF(it);
    if (PyErr_Occurred()) {
        if (PyErr_ExceptionMatches(PyExc_StopIteration))
            PyErr_Clear();
        else
            return NULL;
    }
    Py_RETURN_FALSE;
}

Looking at body of any we can tell that there is no difference if you call it with iterator or iterable. Both objects can be iterated on, difference is in what PyObject_GetIter returns when either list or iterator is used.

As we know iterators work tirelessly till exhausted, but list is not an iterator. List is an iterable. List when iterated returns fresh iterator each time, thus is never exhausted. Each for loop operating on a list gets fresh iterator with all the values. In our case important lines are these:

    it = PyObject_GetIter(iterable);  // This retrieves an iterator
    ...
    
    item = iternext(it);  // This consumes an element

any returns True if any element is true. When running on Modern Python any will consume each element up to one that is true. Cause of it is the loop and iternext call that subsequently removes elements from the iterator. This simple example illustrates the behaviour.

Python 3.5.3

>>> m = map(bool, [False, False, True, False])
>>> any(m)
True
>>> print(list(m))
[False]

One other function came to my mind. How would all behave? Mechanics of it seems to be the same as with any. Answer is pretty simple when we look at the source.

static PyObject *
builtin_all(PyModuleDef *module, PyObject *iterable)
{
    PyObject *it, *item;
    PyObject *(*iternext)(PyObject *);
    int cmp;

    it = PyObject_GetIter(iterable);
    if (it == NULL)
        return NULL;
    iternext = *Py_TYPE(it)->tp_iternext;

    for (;;) {
        item = iternext(it);
        if (item == NULL)
            break;
        cmp = PyObject_IsTrue(item);
        Py_DECREF(item);
        if (cmp < 0) {
            Py_DECREF(it);
            return NULL;
        }
        if (cmp == 0) {
            Py_DECREF(it);
            Py_RETURN_FALSE;
        }
    }
    Py_DECREF(it);
    if (PyErr_Occurred()) {
        if (PyErr_ExceptionMatches(PyExc_StopIteration))
            PyErr_Clear();
        else
            return NULL;
    }
    Py_RETURN_TRUE;
}

As we can see this function also loops over iterable using an iterator. Condition however is slightly different as it needs to verify if all elements are true. This will cause all to consume each element up to first false element. Again simple example will illustrate this best.

>>> m = map(bool, [True, True, True, False, True])
>>> all(m)
False
>>> print(list(m))
[True]

It has been bugging me since I watched the video, now I have my closure. I learned a bit while investigating this and I’m happy to share it.

P.S. Yes, it is Legacy Python.