Monday, June 14, 2010

WNPoTs and the Conservatism of Hardware Development


There are some things about which I am undoubtedly considered a crusty old fogey, the abominable NO man, an ostrich with its head in the sand, and so on. Oh frabjous day! I now have a word for such things, courtesy of Charlie Stross, who wrote:

Just contemplate, for a moment, how you'd react to some guy from the IT sector walking into your place of work to evangelize a wonderful new piece of technology that will revolutionize your job, once everybody in the general population shells out £500 for a copy and you do a lot of hard work to teach them how to use it. And, on closer interrogation, you discover that he doesn't actually know what you do for a living; he's just certain that his WNPoT is going to revolutionize it. Now imagine that this happens (different IT marketing guy, different WNPoT, same pack drill) approximately once every two months for a five year period. You'd learn to tune him out, wouldn't you?
I've been through that pack drill more times than I can recall, and yes, I tune them out. The WNPoTs in my case were all about technology for computing itself, of course. Here are a few examples; they are sure to step on a number of toes:

  • Any new programming language existing only for parallel processing, or any reason other than making programming itself simpler and more productive (see my post 101 parallel languages)
  • Multi-node single system image (see my post Multi-Multicore Single System Image)
  • Memristors, a new circuit type. A key point here is that exactly one company (HP) is working on it. Good technologies instantly crystallize consortia around themselves. Also, HP isn't a silicon technology company in the first place.
  • Quantum computing. Primarily good for just one thing: Cracking codes.
  • Brain simulation and strong artificial intelligence (really "thinking," whatever that means). Current efforts were beautifully characterized by John Horgan, in a SciAm guest blog: 'Current brain simulations resemble the "planes" and "radios" that Melanesian cargo-cult tribes built out of palm fronds, coral and coconut shells after being occupied by Japanese and American troops during World War II.'
Of course, for the most part those aren't new. They get re-invented regularly, though, and drooled over by ahistorical evangelists who don't seem to understand that if something has already failed, you need to lay out what has changed sufficiently that it won't just fail again.

The particular issue of retread ideas aside, genuinely new and different things have to face up to what Charlie Stross describes above, in particular the part about not understanding what you do for a living. That point, for processor and system design, is a lot more important than one might expect, due to a seldom-publicized social fact: Processor and system design organizations are incredibly, insanely, conservative. They have good reason to be. Consider:

Those guys are building some of the most, if not the most, intricately complex structures ever created in the history of mankind. Furthermore, they can't be fixed in the field with an endless stream of patches. They have to just plain work – not exactly in the first run, although that is always sought, but in the second or, at most, third; beyond that, the money runs out.

The result they produce must also please, not just a well-defined demographic, but a multitude of masters from manufacturing to a wide range of industries and geographies. And of course it has to be cost- and performance-competitive when released, which entails a lot of head-scratching and deep breathing when the multi-year process begins.

Furthermore, each new design does it all over again. I'm talking about the "tock" phase for Intel; there's much less development work in the "tick" process shrink phase. Development organizations that aren't Intel don't get that breather. You don't "re-use" much silicon. (I don't think you ever re-use much code, either, with a few major exceptions; but that's a different issue.)

This is a very high stress operation. A huge investment can blow up if one of thousands of factors is messed up.

What they really do to accomplish all this is far from completely documented. I doubt it's even consciously fully understood. (What gets written down by someone paid from overhead to satisfy an ISO requirement is, of course, irrelevant.)

In this situation, is it any wonder the organizations are almost insanely conservative? Their members cannot even conceive of something except as a delta from both the current product and the current process used to create it, because that's what worked. And it worked within the budget. And they have their total intellectual capital invested in it. Anything not presented as a delta of both the current product and process is rejected out of hand. The process and product are intertwined in this; what was done (product) was, with no exceptions, what you were able to do in the context (process).

An implication is that they do not trust anyone who lacks the scars on their backs from having lived that long, high-stress process. You can't learn it from a book; if you haven't done it, you don't understand it. The introduction of anything new by anyone without the tribal scars is simply impossible. This is so true that I know of situations where taking a new approach to processor design required forming a new, separate organization. It began with a high-level corporate Act of God that created a new high-profile organization from scratch, dedicated to the new direction, staffed with a mix of outside talent and a few carefully-selected high-talent open-minded people pirated from the original organization. Then, very gradually, more talent from the old organization was siphoned off and blended into the new one until there was no old organization left other than a maintenance crew. The new organization had its own process, along with its own product.

This is why I regard most WNPoT announcements from a company's "research" arm as essentially meaningless. Whatever it is, it won't get into products without an "Act of God" like that described above. WNPoTs from academia or other outside research? Fuggedaboudit. Anything from outside is rejected unless it was originally nurtured by someone with deep, respected tribal scars, sufficiently so that that person thinks they completely own it. Otherwise it doesn't stand a chance.

Now I have a term to sum up all of this: WNPoT. Thanks, Charlie.

Oh, by the way, if you want a good reason why the Moore's Law half-death that flattened clock speeds produced multi- / many-core as a response, look no further. They could only do more of what they already knew how to do. It also ties into how the very different computing designs that are the other reaction to flat clocks came not from CPU vendors but outsiders – GPU vendors (and other accelerator vendors; see my post Why Accelerators Now?). They, of course, were also doing more of what they knew how to do, with a bit of Sutherland's Wheel of Reincarnation and DARPA funding thrown in for Nvidia. None of this is a criticism, just an observation.

5 comments:

Dave said...

Great blog Greg. One of the reasons for multicore ubiquity is software. It's mind boggling to program a functionally decomposed system with specific cores having differing functionalities. It was viewed as much more do-able to just kill problems in software with more general purpose identical cores. More cost effective too.

Greg Pfister said...

Thanks, Dave.

I agree with your comment about functionally-different cores, but "kill problems in software with more... cores" -- I don't think so, at least for clients. Major rework is needed to get any single program to make use of more than one core.
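Greg's point about "major rework" can be made concrete with a small sketch of my own (not from the post; the function names and the use of Python's `multiprocessing` are purely illustrative). Even for an embarrassingly parallel loop, the programmer has to restructure the code into independent, self-contained tasks before a second core helps at all:

```python
# Hypothetical illustration of the rework needed to use more than one core.
# The arithmetic is unchanged; what changes is the program's structure.
from multiprocessing import Pool

def work(x):
    # Stand-in for a CPU-bound kernel.
    return x * x

def serial(data):
    # The original single-core program: one loop, one core.
    return [work(x) for x in data]

def parallel(data, nprocs=4):
    # The reworked version: the loop is recast as independent tasks
    # handed to a pool of worker processes. The kernel must be a
    # top-level, picklable function for this to work at all.
    with Pool(nprocs) as p:
        return p.map(work, data)

if __name__ == "__main__":
    data = list(range(10))
    assert serial(data) == parallel(data)
```

And this is the easy case: no shared state, no ordering constraints, no load imbalance. Real single programs rarely decompose this cleanly, which is Greg's point.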

Commercial servers, with many separate transactions running, yes; piece of cake. (Well, roughly. Nothing's that easy.)

(I've a blog post on that, titled roughly "IT Department Should Not Fear Multicore".)

Curt Sampson said...

I don't buy that "[i]t's mind boggling to program a functionally decomposed system with specific cores having differing functionalities."

It's hard, yes, and something relatively new (to the commercial programming world, anyway) that we've not yet gotten very used to and is still not widely used. But games programmers have been dealing with this sort of thing to some degree for a decade or so, what with separate GPUs and the progression of programmable shaders. More recently Playstation 3 programmers have finally been making some quite good progress in utilising the Cell processor's SPUs for both graphical and non-graphical tasks. While the "the PS3 is harder to program than the XBox 360" meme is still widespread, in recent interviews the multi-platform game developers have more and more been saying that the two platforms are simply different: it's no longer terribly difficult to use the PS3's SPUs to make up for the PS3's graphics chip being less powerful than the XBox 360's.

To me, it seems that we're simply in a second great wave of software engineering (if you will forgive my use of that term) progress, where a small group is developing more sophisticated programming techniques that will, in a couple of decades, become common.

The first wave was of course the structured and OO programming revolution of the 60s and 70s. Their techniques really took until the '90s or so to start becoming unremarkable in the world at large.

Interestingly enough, both the first and second waves were and are driven by hardware changes, but in diametrically opposite ways.

The first came about because we finally were developing hardware powerful enough to allow easier ways of programming. However the old techniques continued to be able to take advantage of the more powerful hardware. In other words, that revolution was about being able to write and maintain programs more quickly and cheaply.

In this second one, our current programming techniques simply can't take real advantage of the better hardware, because it's too different from what came before. So this revolution is driven by wanting first to be able to effectively use the hardware at all, and second by then trying to bring the complexity of its use back down to something approaching what we had in the old, if I may use this term loosely, "single system" world.

(We won't go back to that world, of course, because the whole underlying model is different. But in the 90s we all started to understand about this sort of change, from NUMA--be it as simple as a cache for main memory or much more complex--to why RPC over a network is not the same as a function call.)

So, to come back to the quote to which I am responding: "[i]t's mind boggling to program a functionally decomposed system with specific cores having differing functionalities": it may seem so now. But imagine your typical commercial programmer of 1980 contemplating the use of, or even contemplating the idea of, Smalltalk or ML. That's more or less where we are now, except that rather than a small few having an Alto and everybody wanting one, everybody has one and nobody wants it because they can't figure out how to use it.

Andrew Richards said...

Another great post, Greg, thanks. You give a really good inside explanation of how chips are designed. I think there is a problem with processor design, and you really highlight why that is happening. If it isn't 100% clear what you should change, and why it's safe to change, then why change anything? And it isn't 100% clear what new processors should look like and why it would be safe to produce such a new processor. So, we are stuck with multiplying up components that worked in the past, using design processes that worked in the past.

As an engineer who now runs a small business, I have had to learn how to do sales. And it has been drummed into me how important it is to understand what someone's job is and what problems they have that they might consider buying a solution for. And one of the technologies we produce is number one on your list of things you won't buy!

Hi Dave, like Curt, I'm going to disagree with your comment that it's mind boggling to program a functionally decomposed system, but giving a different reason. If the cores (and the interconnects between them) match the application, then it's actually easier to program such a system than a system with lots of general-purpose cores. The reason is that if the original problem decomposes easily into a specific set of tasks, and if the cores match those individual tasks, and if the interconnects match the way the data is sent around the system, then the software and hardware designs match up nicely. My examples would be graphics (GPUs do graphics in a way which makes sense to graphics programmers), cell-phones (splitting the application processor from the modem and audio processors and DSPs separates tasks that need fixed performance and security into cores that have exactly the right OS and security for the task) and all the embedded processors out there that make a complex device have a nice simple interface (like a keyboard that has a processor in it to convert key presses into serial messages).

Whereas, lots of general purpose cores can be very difficult to program in many cases. Parallelism is hard to do, and achieving good performance on lots of general-purpose cores is much much harder and less predictable than achieving reasonable performance on 2 general-purpose cores.
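The point about performance on many general-purpose cores being harder and less predictable is the one quantified by Amdahl's law (my framing, not the commenter's): any serial fraction of the work caps the achievable speedup no matter how many cores are added. A rough sketch:

```python
# Amdahl's law: ideal speedup on `cores` cores when `serial_fraction`
# of the work cannot be parallelized. Real code does worse, since this
# ignores communication, synchronization, and load imbalance.

def amdahl_speedup(serial_fraction, cores):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# With just 10% serial work, 64 cores yield under 9x, and no core
# count can ever exceed 10x.
```

So going from reasonable performance on 2 cores to good performance on dozens means hunting down ever-smaller serial residues, which is exactly the "much much harder and less predictable" part.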

If the cores don't match the task, then you have an interesting, but probably fruitless, task on your hands.

Greg Pfister said...

Hi, Curt & Andrew.

I had a long reply put together in answer to your great comments.

But it was long enough and the issue important enough that I think I'll make it a standalone post. Watch for it fairly soon. I hope we can continue the discussion there.
