Tag Archives: STM

The power of STM semantics

Software Transactional Memory is a concept which offers many advantages when writing concurrent code. Most importantly, it removes the burden of writing correct multi-threaded code from the programmer, allowing her to focus more on the logic of the problem domain. STM will automatically take care of correctness. However, this comes with a cost – the bookkeeping. Tracking every access to a transactional field in order to properly detect and resolve conflicts introduces a cost penalty. But, at the same time, it also introduces the possibility to create and expose to the programmer some very interesting new semantics.

Shielded has for some time already contained one such semantic addition – the Shield.Conditional static method. With it one may define a transaction which should run whenever a certain condition is satisfied by the current, committed “state of the world”. By virtue of its bookkeeping, Shielded can determine the exact fields your condition depends on. This way it easily knows, when another transaction commits into one of those fields, that your condition might have changed, and will re-evaluate it. (It will even track your condition for any changes in its access pattern between calls. This is important since, due to boolean expressions’ short-circuit evaluation, it is very easy for a condition to change its access pattern, and that must be taken into account.)

This feature was inspired by the Haskell STM’s “retry” command, but it does not go as far as that. The Haskell “retry” can be called anywhere in a transaction, and it will block it until any of the fields, that it has accessed before calling retry, changes. Shielded does not do any such thing, it is a deliberate design decision to not include any threading or blocking constructs into the library. The idea is to allow Shielded to easily be combined with any such construct, depending on what the programmer wants to use, and not lock the programmer into a decision made by this library. And so, the Conditional does not even try executing straight away, but merely subscribes a transaction for future execution.

The power of Haskell’s retry and orElse can be achieved by using Reactive Extensions. Combining Shielded with Rx.NET should not be a difficult task – create an IObservable by creating a Conditional which will call methods on an IObserver (tip – make such calls to IObserver as SideEffects of the Conditional transaction, otherwise you might be triggering observers repeatedly, and with non-committed data!). Once you have such an IObservable, the powerful Rx.NET library allows you to easily create very complex time-based logic, which can easily achieve what Haskell’s “retry” and “orElse” offer, and much more. We might call the combination of these two libraries a Reactive Transactional Memory system!

A recent addition to the Shielded library is yet another powerful semantic tool – the Shield.PreCommit static method. It is very similar to Conditional, but with one very important difference – it will execute within a transaction that plans to affect its condition, and it will execute just before that transaction attempts to commit.

The PreCommit method creates a lot of interesting possibilities. For a couple of examples, have a look at the PreCommit tests. Most importantly, you get the ability to prevent the other transaction from committing. Like in the NoOdds test, for example, where a PreCommit check looks like this:

Shield.PreCommit(() => (x.Read & 1) == 1, () => {
    Shield.Rollback(false);
});

This will cause any transaction, that tries to commit an odd number into the field x (which is a Shielded<int>, of course), to roll back and fail.

The Rollback(false) method will quietly abort the transaction, which the code that started the transaction might not expect. It would probably be better to throw an exception, like in the Validation test in the same file. It throws an exception whenever a transaction tries to commit incorrect state, thus guaranteeing that an invariant will always hold!

Last, but by far not the least, this introduces the ability to easily and arbitrarily define what Shielded has so far been missing the most – prioritization. It is often pointed out that STM allows for easy prioritization schemes to be grafted on it, completely eliminating the so-called priority inversion problem. In order to achieve this before, you would have to use the Changed event of a Shielded<> object, and prevent lower-priority transactions from committing from that event’s handler. However, this would require Shielded to expose events for any single piece of information which may be interesting. Instead, by using Shield.PreCommit, you define conditions which can depend on any piece of transaction-aware information. Then, in the PreCommit body, you may check the priority and perhaps block a transaction from committing. Here is a snippet from the above linked tests:

var x = new Shielded<int>();
int ownerThreadId = -1;
Shield.PreCommit(() => { int a = x; return true; }, () => {
    var threadId = ownerThreadId;
    if (threadId > -1 && threadId != Thread.CurrentThread.ManagedThreadId)
        Shield.Rollback(true);
});

This is perhaps the simplest possible prioritization scheme. The condition of the PreCommit just reads the x field and returns true, meaning it will run in every transaction which makes any kind of write into x. It then checks the ownerThreadId, and if any thread has put its ManagedThreadId into that variable, then only that thread will be able to commit. All others will be repeatedly retried, until the field is released (or they stop trying to write into it). It would be possible to insert a SideEffect call before the Rollback, and have an onRollback lambda in it which would cause the thread to block until a certain signal is issued by the thread that owns the field. That would improve this simple solution to eliminate the busy waiting on non-privileged threads.

The beauty of these schemes is that, both with Conditional and PreCommit, the “other” transactions do not need to know anything about them. Their code is equal regardless of whether a conditional or pre-commit are defined or not. In the prioritization test you can see, for example, that the lower priority transactions have no knowledge about the prioritization scheme baked into them, and yet they fully comply with it.

These powerful semantics, the easily achieved composition and correctness, are arguably the biggest advantage of an STM. I believe that in a project of sufficient complexity, paying the price of STM bookkeeping is worth it.

UPDATE – After reading this post again, the problems with Shield.Rollback(false) calls started to seem a little to serious to allow, so the library no longer supports this method. Rollback receives no arguments, and will always do an immediate retry. This way, anyone can be sure that if the InTransaction method exits normally, the transaction has completed. (Of course, just because your InTransaction call returned, does not mean you are necessarily out of a transaction – nesting…)

An update on STM speed

This continues on the previous post, Benchmarking an STM.

After the last post and those benchmark results, I’ve been busy lately doing various optimizations on Shielded. In the post I announced that I would be changing the Shielded<T>.Assign method to no longer be commutable, due to the cost it incurs. But I’ve also managed to find many other improvements, which you can check out in the project commit log. By far the biggest improvement came from optimizing the process of triggering conditional transactions.

A small but interesting optimization was achieved by just simplifying the commit code. It used to copy the enlisted fields HashSet into a List, instead of iterating over it directly, so that in case of conflict it can iterate backwards, rolling back only the potentially locked fields, and then completing the roll-back out of lock. However, the list allocation and copying is actually slower than any gain there, particularly given that conflicts are not that common. (Seriously, they are quite uncommon. A test launching 10,000 Tasks each trying to increment one of 100 int fields completes with just ~10-15% repetitions due to conflicts.) A fine example of premature optimization.

Here are the new results, with some new data points not collected before (and again, Mono 2.10, Ubuntu 13.10 64 bit, i5-2430M @ 2.4 GHz, 4 GB RAM):

cost of empty transaction = 1.050 us
cost of the closure in InTransaction<T> = 0.190 us
cost of the first read = 0.720 us
cost of an additional read = 0.104 us
cost of Modify after read = 0.970 us
cost of Assign after read = 1.005 us
cost of the first Modify = 1.790 us
cost of an additional Modify = 0.051 us
cost of a Read after Modify = 0.042 us
cost of the first Assign = 1.635 us
cost of an additional Assign = 0.048 us
cost of the first commute = 3.975 us
cost of an additional commute = 0.904 us

Performance is much better on almost all points. Read-transactions are more or less the same, but reducing the calls to Shield.Enlist has reduced repeated access to an already modified field down to ~50 ns, and the cost of the write operations themselves is ~3x smaller. Even the commute is faster, although it is still more expensive.

For comparison, here are the results of the same test, on the same machine, but executed in a virtual Windows machine on Microsoft .NET 4.5:

cost of empty transaction = 0.485 us
cost of the closure in InTransaction<T> = 0.030 us
cost of the first read = 0.210 us
cost of an additional read = 0.120 us
cost of Modify after read = 1.135 us
cost of Assign after read = 1.235 us
cost of the first Modify = 1.320 us
cost of an additional Modify = 0.058 us
cost of a Read after Modify = 0.039 us
cost of the first Assign = 1.245 us
cost of an additional Assign = 0.055 us
cost of the first commute = 1.865 us
cost of an additional commute = 0.235 us

Some numbers are pretty similar, but there are also some striking differences. Empty transactions are twice as fast. Cost of the closure is ridiculously small compared to the same on Mono. And you may notice that every “first” operation on a field is much faster on MS.NET, while any second operation on the same field is roughly equally fast. It matters little whether the second operation is a read or a write, which indicates that the bookkeeping is probably causing the expense. Also interesting is that writes after a read are persistently a little slower on MS.NET, which I can’t explain (but note that the sum of empty trans + first read + a modification, the total time of one simple read-then-write transaction, is still better).

Bookkeeping and the closure are operations that involve allocating objects, so I presume that the fault for the slower score lies in Mono 2.10’s conservative garbage collector. I’m looking forward to Ubuntu 14.04, which will be packing Mono 3.2 with his generational garbage collector (which is included in 2.10 as an option, but is just as slow there, and in some tests much slower) and hopefully fixes for the bugs in concurrent collections.

So, a million empty transactions per second, several hundreds of thousands of simpler transactions per second, and the most complex test, the BetShop (note that it uses structs, it is older than the class proxy-generator feature), running at several tens of thousands. Plus, repeated access to a field is close to negligible. Given the benefits that Software Transactional Memory can bring to more complicates projects, I think this is OK. For now, at least.

Benchmarking an STM

While recently discussing with a colleague about a project we’re working on, I mentioned how it might be a good fit for using an STM. I actually think it is a perfect fit. Working on that project is what spurred my interest in the STM concept itself and inspired me to write Shielded. However, the colleague disagreed, and made a remark which struck me as significant. He’s a talented guy, very pragmatic, with a keen gift for smelling problems way before most programmers. His Titanic would see that fatal iceberg even before leaving Southampton. So, the remark?

“If we use it, every access to a field has to go through it.”

Now, granted, this is not big news. The fact that STM slows things down is well known and is probably what makes the concept unappealing to a lot of programmers. But what particularly struck me is the phrase “every access”. A complex system is sure to make a lot of reads and writes within one transaction, many of those hitting the same fields. Cost of the first access in STM should compare well with the cost of taking an ordinary lock, because an STM is roughly equivalent to perfectly granular locking. (And consider how difficult it is to do perfectly granular locking…) But in locking solutions, additional access to a field is negligible.

So, I decided I should measure and see how much it costs to access a field once, and how much to access it after that, in the same transaction. The test I wrote can be seen here. (Please forgive the mess, the ConsoleTests project is a quick test bed, I don’t bother much with keeping it tidy.) Here are the cost calculation results:

cost of empty transaction = 0.890 us
cost of the closure in InTransaction<T> = 0.270 us
cost of the first read = 0.920 us
cost of an additional read = 0.100 us
cost of the first Modify = 5.470 us
cost of an additional Modify = 0.084 us
cost of a Read after Modify = 0.068 us
cost of the first Assign = 8.370 us
cost of an additional Assign = 0.648 us
cost of the first degenerated Assign = 5.280 us
cost of an additional degenerated Assign = 0.563 us

This is all in microseconds. Only the first non-zero digit should be considered relevant, due to variations between test runs. The test was executed on my laptop – an i5-2430M processor ticking at 2.4 GHz, with 4 GB RAM, running Ubuntu 13.10 (64 bit) and Mono 2.10.8.1.

Empty transactions is a simple measure – just a NOP delegate passed into the InTransaction method. The “cost of the closure in InTransaction<T>” refers to a handy overload of the same method which takes a Func<T> instead of an Action, runs it transactionally and returns it’s final result. This comes in very handy. To do this, it has to create a closure, which it passes to the non-generic InTransaction. The measure indicates the additional cost of creating that closure. I’m not very happy with that score, but there’s nothing I can do about it, it is the result of how this is implemented in Mono. On the plus side, it means some of the key Shielded operations are close to the same order of magnitude as allocating a simple closure.

Further results indicate how much it costs to perform certain operations for the first time, and for every additional time, within a transaction. Note that this is not just the running time of the particular method call – adding a field to a transaction’s footprint means that some pre-commit checking will now also be done, depending on the operation.

The cost of an additional access is roughly an order of magnitude smaller than the cost of the first access. After a Modify call it is even better. This might surprise you, but bear in mind that a write prepares a field’s thread-local storage. This is probably the most expensive part of it, but any future read or write is then executed directly against that local storage, making it faster.

Overall, the results seem OK. Something, however, completely sticks out, and that’s the performance of Assign. As a commutable operation, introducing it into a transaction includes the commute sub-transaction mechanism, and the hit this introduced is big. Also, note how, since commutes involve delegates and the creation of closures (specifically, at least two closures will be allocated), additional Assign calls also remain slow when compared to additional Modify calls, whose delegates in the test were not closing over any local variables. It cannot go below the cost of a closure. When degenerated (when the field was read in the transaction, which disables commuting over it) it executes directly, and it’s first call performance is roughly equal to Modify, as would be expected. But additional calls again remain too slow – again, most likely, due to the closures.

To address this, Assign will be made non-commutable, an ordinary op, which should make it faster than Modify. The current method will remain, only bearing a different name, for use when commutability is actually needed. Commutables are very useful when they don’t degenerate, since the price of a needless conflict is far bigger than these microseconds here.

Let me know in the comments what you think about these results, how you feel about the costs, and if you see something here that I have missed. Would you consider using Shielded?

The Importance of Commutables

While developing the first more complicated tests for Shielded, at a time when the library had just the basics of transactional protection covered, I realized the importance of commutable operations for concurrency, even when not using an STM.

One of the most basic things a test needs – a counter of processed items – was automatically a horrible bottleneck. All the parallel transactions would read the current value, increment it, and store the result. But only one can succeed and write into the counter! All of them were in conflict with each other, fighting over that one field. The end result was that transactions would effectively execute serially. For an STM this meant a lot of useless repetitions, transactions working, then trying to make their commit, and then all but one of them failing and starting over. Not cool.

When using locks in a situation like this, the simplest approach is to wrap the counter incrementing together in the same lock block with the rest of your work, but you must use a global lock to protect a global counter, so then you also have pure serial code. Nobody (?) does that. Instead, you typically keep a more granular lock for the “payload” part of the job, and obtain a global counter lock just during incrementing. Or, if you know what you’re doing, you do this:

lock (itemLock)
{
<payload here>
Interlocked.Increment(ref _counter);
}

This works great. You leave your mark on the counter, regardless of how much competition you have. Perhaps you never noticed, but this works only because the individual increments are commutable – they can be freely reordered in time and still have the same net effect. With the introduction of a commutable, your transactions can now run in parallel (provided that they use different item locks). Congratulations, you’ve got pretty good concurrent code.

Naturally, I wanted to support this in Shielded. To be able to do something like this:

x.Commute((ref int n) => n++);

…and have the increment perform in parallel with other transactions, without causing your transaction to conflict with them. The main advantage of an STM is it’s simplicity of use, but not having something like this, and treating a simple, commutable increment as a conflict, makes it almost useless.

I had already encountered commutables in Clojure’s built-in STM. Clojure executes a commute under the global commit lock, by taking the value encountered in the field at that time and performing the op. This is perfectly safe – no one else is in that lock with you, no conflict possible, works great.

But, the Clojure implementation has two drawbacks. The first is, of course, running arbitrary code under the global lock. From my experience in experimenting with Shielded, small changes in time spent under that lock have noticeable effects on performance. The second drawback is more subtle, but maybe even more serious – if you read from the field after you have defined a commute op on it, Clojure throws an exception. The main advantage of STM, composability, is compromised – if you compose two operations into one, you better not compose one that commutes a field with another that reads from it. They work just fine by themselves, but together, boom.

In Shielded, I wanted to do this differently. For one thing, the commutes are executed just before entering the global lock! They will get a fresh reading stamp, so they can read the latest data, which reduces their chance of conflict. If they still conflict, only they get retried.

The other difference – if after defining a commute you read from that field, Shield detects this (as an STM, the library has to be notified of access to a field anyway) and automatically executes the commute sooner, right then and there, and you see it’s result. This guarantees safe composability. However, the commute is not a commute any more. It’s not a commutable increment if you know the result is 4. If another transaction jumps in and writes in 4 before you commit, your increment would be lost unless we repeat it. And since you read and used the number in your transaction, the entire transaction is compromised. The key is not to look – we can repeat just the commute, but only if it was not read.

Using Shielded, any kind of commutable operation can be defined, and the system takes care of consistency. If the rest of the transaction does not interfere with the operation, it will behave as a commute, minimizing conflicts. If the rest of the transaction does interfere, a commute is gracefully reduced into an ordinary part of your transaction, ensuring consistency.

For examples of various commutes, and how it all behaves in Shielded, check out the method ComplexCommute() in BasicTests.cs. You may notice how methods Append() and Clear() of a ShieldedSeq are both defined as commutables – they won’t conflict if you don’t read the sequence. That means no conflicts when adding things to queues.

Choosing the optimal granularity for locks, not forgetting to take a lock wherever it is needed, avoiding deadlocks – all of these problems disappear when using an STM. It just works, safely and consistently. And, with commutable operations support added in the mix, your code is automatically as concurrent as logically possible!