Benchmarking an STM

While recently discussing with a colleague about a project we’re working on, I mentioned how it might be a good fit for using an STM. I actually think it is a perfect fit. Working on that project is what spurred my interest in the STM concept itself and inspired me to write Shielded. However, the colleague disagreed, and made a remark which struck me as significant. He’s a talented guy, very pragmatic, with a keen gift for smelling problems way before most programmers. His Titanic would see that fatal iceberg even before leaving Southampton. So, the remark?

“If we use it, every access to a field has to go through it.”

Now, granted, this is not big news. The fact that STM slows things down is well known and is probably what makes the concept unappealing to a lot of programmers. But what particularly struck me is the phrase “every access”. A complex system is sure to make a lot of reads and writes within one transaction, many of those hitting the same fields. Cost of the first access in STM should compare well with the cost of taking an ordinary lock, because an STM is roughly equivalent to perfectly granular locking. (And consider how difficult it is to do perfectly granular locking…) But in locking solutions, additional access to a field is negligible.

So, I decided I should measure and see how much it costs to access a field once, and how much to access it after that, in the same transaction. The test I wrote can be seen here. (Please forgive the mess, the ConsoleTests project is a quick test bed, I don’t bother much with keeping it tidy.) Here are the cost calculation results:

cost of empty transaction = 0.890 us
cost of the closure in InTransaction<T> = 0.270 us
cost of the first read = 0.920 us
cost of an additional read = 0.100 us
cost of the first Modify = 5.470 us
cost of an additional Modify = 0.084 us
cost of a Read after Modify = 0.068 us
cost of the first Assign = 8.370 us
cost of an additional Assign = 0.648 us
cost of the first degenerated Assign = 5.280 us
cost of an additional degenerated Assign = 0.563 us

This is all in microseconds. Only the first non-zero digit should be considered relevant, due to variations between test runs. The test was executed on my laptop – an i5-2430M processor ticking at 2.4 GHz, with 4 GB RAM, running Ubuntu 13.10 (64 bit) and Mono 2.10.8.1.

Empty transactions is a simple measure – just a NOP delegate passed into the InTransaction method. The “cost of the closure in InTransaction<T>” refers to a handy overload of the same method which takes a Func<T> instead of an Action, runs it transactionally and returns it’s final result. This comes in very handy. To do this, it has to create a closure, which it passes to the non-generic InTransaction. The measure indicates the additional cost of creating that closure. I’m not very happy with that score, but there’s nothing I can do about it, it is the result of how this is implemented in Mono. On the plus side, it means some of the key Shielded operations are close to the same order of magnitude as allocating a simple closure.

Further results indicate how much it costs to perform certain operations for the first time, and for every additional time, within a transaction. Note that this is not just the running time of the particular method call – adding a field to a transaction’s footprint means that some pre-commit checking will now also be done, depending on the operation.

The cost of an additional access is roughly an order of magnitude smaller than the cost of the first access. After a Modify call it is even better. This might surprise you, but bear in mind that a write prepares a field’s thread-local storage. This is probably the most expensive part of it, but any future read or write is then executed directly against that local storage, making it faster.

Overall, the results seem OK. Something, however, completely sticks out, and that’s the performance of Assign. As a commutable operation, introducing it into a transaction includes the commute sub-transaction mechanism, and the hit this introduced is big. Also, note how, since commutes involve delegates and the creation of closures (specifically, at least two closures will be allocated), additional Assign calls also remain slow when compared to additional Modify calls, whose delegates in the test were not closing over any local variables. It cannot go below the cost of a closure. When degenerated (when the field was read in the transaction, which disables commuting over it) it executes directly, and it’s first call performance is roughly equal to Modify, as would be expected. But additional calls again remain too slow – again, most likely, due to the closures.

To address this, Assign will be made non-commutable, an ordinary op, which should make it faster than Modify. The current method will remain, only bearing a different name, for use when commutability is actually needed. Commutables are very useful when they don’t degenerate, since the price of a needless conflict is far bigger than these microseconds here.

Let me know in the comments what you think about these results, how you feel about the costs, and if you see something here that I have missed. Would you consider using Shielded?

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s