Comments on The Perils of Parallel: Transactional Memory in Intel Haswell: The Good, and a Possible Ugly

2- why the lock-based solution need extra copy. in...

2013-01-08T03:57:35.290-07:00

2- Why does the lock-based solution need an extra copy? In fact, I suspect the transactional one is the one with the extra copy. All you need is a 20-bit mask marking which fields are locked and which are not!?
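To make the 20-bit mask idea concrete, here is a minimal sketch, not anything taken from the post. The class name `FieldMaskLock` and its methods are hypothetical, and a `threading.Lock` stands in for what would be an atomic compare-and-swap on the mask word in real lock-based code.

```python
import threading


class FieldMaskLock:
    """Per-field locking for an object with up to 20 fields.

    Hypothetical illustration: bit i of the mask is set while field i is locked.
    """

    NUM_FIELDS = 20

    def __init__(self):
        self._held = 0                  # bit i set => field i is currently locked
        self._guard = threading.Lock()  # stand-in for an atomic CAS on the mask

    def _to_mask(self, fields):
        mask = 0
        for i in fields:
            if not 0 <= i < self.NUM_FIELDS:
                raise IndexError(f"field index {i} out of range")
            mask |= 1 << i
        return mask

    def try_lock(self, fields):
        """Lock all requested fields at once, or none if any is already held."""
        mask = self._to_mask(fields)
        with self._guard:
            if self._held & mask:
                return False
            self._held |= mask
            return True

    def unlock(self, fields):
        """Release the requested fields."""
        mask = self._to_mask(fields)
        with self._guard:
            self._held &= ~mask
```

A writer would call `try_lock([3, 7])`, mutate fields 3 and 7 in place, then call `unlock([3, 7])`. Note that this only protects against other code that also checks the mask; a reader that ignores it can still observe a half-finished update.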

1. Reads outside locked region would show inconsis...

2012-09-02T02:42:39.117-06:00

1. Reads outside the locked region could already see inconsistent state before. I think code that reads values outside "Transaction On!" is not guaranteed to NOT see B inconsistent with A. Too many nots? OK: only transactional code is required to see A and B consistent.

2. If I wanted to modify 2 out of 20 fields in an object, the lock-based solution would need a single thread to create a copy of the object with the 2 fields modified, then "swap in" the mutated object. The big win of transactional memory is that we no longer need to create new objects to represent the whole state of the system; we can mutate the state in place.
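The copy-then-swap pattern described in point 2 can be sketched roughly as follows. This is an illustration under stated assumptions, not the post's code: `SwapInStore` is a hypothetical name, writers are serialized by an ordinary lock, and the sketch relies on rebinding a reference being a single step in CPython.

```python
import threading


class SwapInStore:
    """Copy-and-swap updates: readers always see a complete snapshot.

    Hypothetical illustration of the lock-based approach described above.
    """

    def __init__(self, fields):
        self._current = tuple(fields)    # immutable snapshot of all fields
        self._writer = threading.Lock()  # serializes writers only

    def read(self):
        # Readers take no lock; they get either the old or the new snapshot,
        # never a mix (assumes reference reads are a single step in CPython).
        return self._current

    def update(self, changes):
        """Apply {field_index: new_value} by copying everything, then swapping."""
        with self._writer:
            new_fields = list(self._current)  # copies all fields, changed or not
            for index, value in changes.items():
                new_fields[index] = value
            self._current = tuple(new_fields)  # swap in the mutated copy
```

Changing 2 fields with `store.update({3: 1, 7: 2})` copies all 20, which is the cost that in-place mutation inside a transaction would avoid.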

I might be overlooking something obvious, but it s...

2012-03-19T16:12:21.379-06:00

I might be overlooking something obvious, but it seems to me that you could solve this problem with another state in your coherence protocol. There's no need for a pipeline flush -- anything already brought in from DRAM is still good to compute with, until it points to a memory location not yet brought into the processor, at which point it sees an invalidation and rolls out to the abort block. It seems that what you *can't* have is actual instruction retirement from a transactional region when another has acquired the xaction? Regardless, the cache coherence protocol -- as you note at the beginning -- ought to provide most of the mechanics for this.

Not sure why you assume that cache invalidates to t...

2012-02-23T10:48:35.876-07:00

Not sure why you assume that cache invalidates to take ownership have to happen at "commit" as opposed to when the write happens. That might cause excessive serialization at release but is not necessary.

BG/Q was designed to have only one chip per node, ...

2012-02-18T18:41:37.140-07:00

BG/Q was designed to have only one chip per node, so I don't see how it's a problem that TM only works in this context. Similarly, the question of cache coherence between multiple chips in a node is irrelevant because this configuration will never exist.

nice post, spectacular blog

2012-02-15T06:46:59.689-07:00

nice post, spectacular blog

BG/Q - I've not read any architecture document...

2012-02-14T20:45:32.649-07:00

BG/Q - I've not read any architecture documents on that, but my understanding is that it does, except that in BG/Q it's limited to working within a single chip, not across multiple chips in a node. This makes software exploitation more difficult.

However, I'm not certain that BG/Q has cache coherence across the multiple chips in a node, so transactional memory across the chips wouldn't be as useful anyway.

Does this work the same way as the transactional m...

2012-02-14T20:11:58.566-07:00

Does this work the same way as the transactional memory on the IBM BG/Q processor?