Monthly Archives: August 2016

Comparing STM with optimal locking

Software Transactional Memory (STM) is a useful, but expensive tool. It eliminates whole classes of problems inherent to multi-threaded programming, making the developer’s work easier, but its overhead is too much for many projects. The cost of keeping track of access to transactional fields is high, particularly for performance-critical systems, or for all those whose safety can be ensured by simple locking. To better understand its potential applicability, it may be worthwhile to look more closely at how STM compares to a lock-based solution. This post will try to do that, specifically looking at the behavior of the Shielded STM library.

The run of an STM based program will, in effect, be equivalent to an optimally granular locking solution. If you observe the simplest case, with two threads “simultaneously” writing in one field, only one will succeed from the first try – the one that goes into the commit check first. (This is specific to commit-time locking variants of STM, like Shielded.) If we observe only those transaction repetitions that end successfully, we can see that they must execute serially. If their runtimes overlapped, and they were writing into the same field, the conflict would be detected by the STM and one of those overlapping runs would be discarded. The runs that leave an effect are thus clearly serialized.

How do we know STM is so optimal? It automatically detects conflicts, and only discards those runs that actually conflict with another, successful one. If two transactions access different fields, they will run with only minimal interactions, and commit from the first go. To achieve this with locking, you must do it with optimal granularity. If you lock too broadly, it will cause needless waiting.

The main comparative costs of the STM solution are the need for tracking access to fields, and, of course, the cost of the runs that end up discarded. Both of these are non-existent in a lock-based solution. Taking and releasing locks is generally fast, after which any access is roughly as fast as in non-concurrent code. And the discarded runs are gone, replaced with sleeping threads – a lock based solution allows just one thread to run at a time, while others wait for the locks to be released.

Since Shielded does MVCC and commit-time locking, Shielded is best compared to an optimal locking solution that employs read-write locks. This follows from the fact that reads generally proceed unobstructed (in MVCC, writes just create new versions…), and that read-only transactions always complete without repetitions. A locking solution can only come close to this by using read-write locks, allowing multiple readers to run in parallel, and blocking only when one of them wants to write. This will, of course, further complicate things. And it’s important to note that a read-write lock puts everyone to sleep when one thread wants to write, while Shielded can still allow parallel read-only transactions to proceed, with only minor blocks under some circumstances, necessary to guarantee consistent reading.

But the equivalence between STM and optimally granular locking does not actually hold, because a locking solution can only be as optimal as an STM solution, or worse. Naturally, I’m ignoring the cost of needless runs here, but bear with me – it is still interesting to note that achieving optimally granular locking is not only hard, it is sometimes impossible. The simplest example – a transaction that reads one field, and then decides whether to access another field based on the first field’s value. Locking at access time cannot be safely done “manually”. The existence of another transaction, which does the same but accesses the fields in the opposite order, will result in deadlocks. In practice, you will lock both fields up front, needlessly blocking some runs of these transactions which would not have actually conflicted, or perhaps resort to some optimistic approach, retrying the whole thing if you end up needing more locks than you took. Neither approach is as good at correctly minimally serializing these transactions as an STM would be.

To top it off, both of the above approaches suffer from the classic problems of lock based solutions – weak controls on developer behavior, allowing access to fields without taking all the needed locks, and the inability to easily compose smaller transactions into bigger ones, which is probably the most touted capability of STM. No matter how many smaller operations you stuff together in a big one, with an STM conflicts will be correctly determined and the transactions will run as parallel as possible.

The issue of needless runs is important, but locks have an issue of their own in sleeping threads. An interesting difference between Shielded transactions and ordinary lock-based ones is when you consider a simple example of several threads trying to run the same transaction in parallel. Since Shielded locks only at commit-time, they will all get to run. The one that enters the commit check first, wins. A lock-based solution takes locks for a longer span of time, typically taking them up front and holding until fully complete. Should the thread that holds the locks be blocked, all of them are blocked. The likelihood of this happening is much higher with a regular lock-based solution. Shielded does use locks (one global regular lock, and individual partial locks, version-based, per field), but for a shorter time, and never during a transaction run. So most of the time, exactly by allowing useless runs, STM gives advantage to threads that are currently running, which may increase overall performance. But this gain is significantly offset by the very time spent on the needless runs.

Very contentious systems will profit from utilizing CPU time more optimally when using locks, than when using Shielded transactions. This should be obvious since the needless runs, though maybe speeding the resolution of one conflict, are slowing the resolution of another if they crowd out non-conflicting, “useful” threads from the CPU. Other STM implementations, specifically the ones using encounter-time locking, do not suffer as much from this problem, but they cannot provide the same level of unobstructed reading, which directly depends on not taking locks for any longer time than needed.

In conclusion, the overall picture that all of this paints is not very favorable for STM. It isn’t the concurrent programming panacea it was once believed to be. But I hope this text illustrates some of the specifics of its behavior well enough to make clear the trade-offs that come with the decision to use STM. The advantages are clear – much easier writing of correct concurrent code, effortlessly achieving optimal parallelization. Complex multi-threaded systems may reap benefits in reduced development effort. Yet the cost is there – transactional code is slower, and the useless runs hurt performance under high contention. The sweet spot for STM are primarily systems which can afford a performance hit in exchange for easier and safer multi-threading, and also systems with complex transactions whose varying access patterns are difficult to predict. Perhaps anytime you find yourself doing canonic lock ordering, you might consider using an STM instead. All in all, it is a very useful tool, but one that should be applied only where it makes sense.