The Perils of Parallel: October 2008

Friday, October 31, 2008

Microsoft Azure Just Says NO to Multicore Apps in the Cloud

At their recent PDC’08, Microsoft unveiled their Azure Services Platform, Microsoft’s full-throttle venture into Cloud Computing. Apparently you shouldn’t bother with multithreading, since Azure doesn’t support multicore applications. It scales only “out,” using virtualization, as I said server code would generally do in IT Departments Should NOT Fear Multicore. I’ll give more details about that below; first, an aside about why this is important.

“Cloud Computing” is the most hyped buzzword in IT these days, all the way up to a multi-article special report in The Economist (recommended). So naturally its definition has been raped by the many people who apparently reason “Cloud computing is good. My stuff is good. Therefore my stuff is cloud computing.”

My definition, which I’m sure conflicts with many agendas: Cloud computing is hiring someone out on the web to host your computing, where “host your computing” can range across a wide spectrum: from providing raw iron (hardware provisioning), through providing building blocks of varying complexity, through providing standard commercial infrastructure, to providing the whole application. (See my cloud panel presentation at HPDC 2008.)

Clouds are good because they’re easy, cheap, fast, and can easily scale up and down. They’re easy because you don’t have to purchase anything to get started; just upload your stuff and go. They’re fast because you don’t have to go through your own procurement cycle, get the hardware, hire a sysadmin, upgrade your HVAC and power, etc. They’re cheap because you don’t have to shell out up front for hardware and licenses before getting going; you pay for what you use, when you use it. Finally, they scale because they’re on some monster compute center somewhere (think Google, Amazon, Microsoft – all cloud providers with acres of systems, and IBM’s getting in there too) that can populate servers and remove them very quickly – it’s their job, they’re good at it (or should be) – so if your app takes off and suddenly has huge requirements, you’re golden; and if your app tanks, all those servers can be given back. (This implicitly assumes “scale out,” not multicore, but that’s what everybody means by scale, anyway.)

It is possible, if you’re into such things, to have an interminable discussion with Grid Computing people about whether a cloud is a grid, grid includes cloud, a cloud is a grid with a simpler user interface, and so on. Foo. Similar discussions can revolve around terms like utility computing, SaaS (Software as a Service), PaaS (Platform as …), IaaS (Infrastructure …) and so on. Double foo – but with a nod to the antiquity of “utility computing.” Late 60s. Project MAC. Triassic computing.

Microsoft Azure Services slots directly into the spectrum of my definition at the “provide standard commercial infrastructure” point: Write your code using Microsoft .NET, Windows Live, and similar services; upload it to a Microsoft data center; and off you go. Its presentation is replete with assurances that people used to Microsoft’s development environment (.NET and related) can write the same kind of things for the Microsoft cloud. Code doesn’t port without change, since it will have to use different services – Azure’s storage services in particular look new, although SQL Services are there – but it’s the same kind of development process and code structure many people know and are comfortable with.

That sweeps up a tremendous number of potential cloud developers, and so in my estimation bodes very well for Microsoft doing a great hosting business over time. Microsoft definitely got out in front of the curve on this one. This assumes, of course, that the implementation works well enough. It’s all slideware right now, but a beta-ish Community Technology Preview platform is supposed to be available this fall (2008).

For more details, see Microsoft’s web site discussion and a rather well-written white paper.

So this is important, and big, and is likely to be widely used. Let’s get back to the multicore scaling issues.

That issue leaps out of the white paper with an illustration on page 13 (Figure 6) and a paragraph following. That wasn’t the intent of what was written, which was actually intended to show why you don’t have to manage or build your own Windows systems. But it suffices. Here’s the figure:

[Figure explanation: IIS is Microsoft’s web server (Internet Information Services) that receives web HTTP requests. The Web Role Instance is user code that initially processes that, and passes it off to the Worker Role Instance through queues via the Agents. This is all apparently standard .NET stuff (“apparently” because I can’t claim to be a .NET expert). So the two sets of VM boxes roughly correspond to the web tier (1^st tier), with IIS instead of Apache, and application tier (2^nd tier) in non-Microsoft lingo.]

Here’s the paragraph:

While this might change over time, Windows Azure’s initial release maintains a one-to-one relationship between a VM [virtual machine] and a physical processor core. Because of this, the performance of each application can be guaranteed—each Web role instance and Worker role instance has its own dedicated processor core. To increase an application’s performance, its owner can increase the number of running instances specified in the application’s configuration file. The Windows Azure fabric will then spin up new VMs, assign them to cores, and start running more instances of this application. The fabric also detects when a Web role or Worker role instance has failed, then starts a new one.

The scaling point is this: There’s a one-to-one relationship between a physical processor core and each of these VMs, therefore each role instance you write runs on one core. Period. “To increase an application’s performance, its owner can increase the number of running instances” each of which is a separate single-core virtual computer. This is classic scale out. It simply does not use multiple cores on any given piece of code.

There are weasel words up front about things possibly changing, but given the statement about how you increase performance, it’s clear that this refers to initially not sharing a single core among multiple VMs. That would be significantly cheaper, since most apps don’t use anywhere near 100% of even a single core’s performance; it’s more like 12%. Azure doesn’t share cores, at least initially, because they want to ensure performance isolation.

That’s very reasonable; performance isolation is a big reason people use separate servers (there are 5 or so other reasons). In a cloud megacenter, you don’t want your “instances” to be affected by another company’s stuff, possibly your competitor, suddenly pegging the meter. Sharing a core means relying on scheduler code to ensure that isolation, and, well, my experience of Windows systems doing that is somewhat spotty. The biggest benefit I’ve gotten out of dual core on my laptop is that when some application goes nuts and sucks up the CPU, I can still mouse around and kill it because the second core is not being used.

Why do this, when multicore is a known fact of life? I have a couple of speculations:

First, application developers in general shouldn’t have anything to do with parallelism, since it’s difficult, error-prone, and increases cost; developers who can do it don’t come cheap. That’s a lesson from multiple decades of commercial code development. Application developers haven’t dealt with parallelism since the early 1970s, with SMPs, where they wrote single-thread applications that ran under transaction monitors, instantiated just like Azure is planning (but not on VMs).

Second, it’s not just the applications that would have to be robustly multithreaded; there’s also the entire .NET and Azure services framework. That’s got to be multiple millions of lines of code. Making it all really rock for multicore – not just work right, but work fast – would be insanely expensive, get in the way of adding function that can be more easily charged for, and is likely unnecessary given the application developer point above.

Whatever the ultimate reasons, what this all means is that one of the largest providers of what will surely be one of the most used future programming platforms has just said NO! to multicore.

Bootnote: I’ve seen people on discussion lists pick up on the fact that “Azure” is the color of a cloudless sky. Cloudless. Hmmm. I’m more intrigued by its being a shade of blue: Awesome Azure replacing Big Blue?

Monday, October 20, 2008

XKCD on Twitter

Here.

Posted because anybody who enjoyed the type of humor in In Search of Clusters is sure to like that one. Enjoy.

Getting Back to Comments

It’s more than time to respond to some of the comments that have been posted. I’ve copied the shorter ones here, and summarized the longer ones. If your comment isn’t here, it’s because no response seemed needed, or I already responded with a comment of my own.

Steve, on IT Departments Should NOT Fear Multicore: I think you're spot on concerning why IT shouldn't fear multicore. Where they fear it, it's because they've had innovation beaten out of them.

I agree, Steve. Another possibility is that they are so very conservative they fear changing anything – which may be saying the same thing. It’s possible Claunch was trying to wake up the change-fearing crowd, but why he didn’t virtualize (or why Dignan didn’t repeat it) I just don’t understand, except in the most cynical possible way (“Oooh, it’s scary! Hire us to tell you what to do!”).

Several folks commented on Vive la (Killer App) Revolution!

Stu: isn't the killer app 'information retrieval' ? i.e. google search, or commoditized parallel data warehousing and mining?

Stu, sure those use multicore, but they’re server apps; see IT Departments Should NOT Fear Multicore for more about how server apps soak up multicore quite well. That article was looking for, and arguing that we need, client apps.

Steve: Embedded systems could soak up a lot of low power multiprocessor chips. At least a couple of start-ups are working on them. Intellasys has a very low power 40 core chip and XMOS has a 4 core chip with 8 hardware threads per core.

They surely can soak up some, Steve, but I don’t think enough. The high end, in things like MRI scanners, has really miniscule volumes. The low end, in things like dishwasher controllers, has enormous volume but only requires trivial capabilities; it doesn’t take much brains to run a dishwasher. The only high-volume fairly high-price case is PC clients, where new CPUs go for $100s – and possibly game systems. I don’t know the relative volumes of PCs and game systems. I suspect the game systems are pretty big, but not in the PC range, since they’re not on every desk in every business.

Philip Machanick: What is really needed is a killer app that addresses some real mass-market need. The history of computing tells us that technology packaged for low cost eventually overtakes technology packaged for speed, first on price:performance and eventually on performance. More at my blog.

Philip, I share much of your astonishment and unbelieving about multicore, but I think what you’re referring to with “low cost eventually overtakes” is the Disruptive Technology process of Clayton Christensen’s "The Innovator's Dilemma." I don’t think that’s what’s happening here, or can happen here. I think multicore is more like the process in which cars eventually just got fast enough for everybody, so to keep selling them manufacturers invented tail fins. I’m not sure that has a name other than “marketing.”

TerryMcIntyre: [My summary of his long comment: He thinks killer apps include crosses between games and tutors, managing your photo collection including face recognition and recognition-driven auto-tagging, optimizing use of applications, and smart implementations of Go (and the like).]

Those are all possibilities, Terry. One requirement I’d put on things, however, is that somebody know how to do it, meaning a practical algorithm is known. I think that rules out things like auto-tagging everything looking like the Grand Canyon from some examples. I’m not sure about Go.

Robert on Clarifying the Black Hole: Greg, With this sentence: "Unfortunately, servers alone don't produce the semiconductor volumes that keep an important fraction of this industry moving forward – a fraction, not all of the industry, but certainly the part that's arguably key, to say nothing of the loudest, and will make the biggest ruckus as it increasingly runs into trouble." you tap dance around this "important fraction of the industry" without actually saying who they are. Who, precisely, are these people who can't easily parallelize, and really need continuing performance gains, and constitute a significant fraction of the computing market? Before we can find new solutions for them we need to know who they are and what they are trying to accomplish.

Ok, Robert, a spade a spade: I believe that the technological trend of multicore hurts the traditional business models of Intel and Microsoft, and their derivative PC hardware and software vendors: HP, Dell, and company. That’s not to say they’ll have financial problems; finances are affected by a lot more than technology.

John on 101 Parallel Languages (Part 3, the last): [My summary of his long comment: My conclusion shouldn’t be so negative; we need to find new modes of expression expressing parallelism.]

John, all I can say is that’s not the evidence I see. Some language extensions can help; but I’m thinking here of relatively low-level support like, say, truly low-weight thread spawning in Java. But I think the attempts to provide parallelism in ordinary, programming-for-the-rest-of-us things are fundamentally misguided.

dvadell on Background: Me and This Blog: I'm very happy I found your blog. I can finally say: Thank you for "In search of clusters"! I couldn't get it here in Argentina so a friend of mine sent it to me from USA. I enjoyed (and laughed) a lot.

Well, thanks, dvadell. I’m glad you found it enjoyable. I hope I manage to get another out there for you in the foreseeable future.

Thursday, October 16, 2008

IT Departments Should NOT Fear Multicore

"Gartner analyst Carl Claunch ... argued Thursday that IT types should fear–yes fear–multicore technology."

This statement appears in "Does IT really need to fear multicore?" on ZDNet, to which I was alerted by Surendra and Sujana Byna's excellent Multicoreinfo.com. So this is a fourth-level indirection to the source: Here to Multicoreinfo.com to the summary by Larry Dignan in ZDNet of a talk by Gartner analyst Carl Claunch at Gartner's Symposium/ITxpo (whew).

I'm bringing up this family tree because I don't have access to the original, only to the ZDNet summary; and, as the title of this blog entry indicates, I don't agree. Unfortunately, I don't know for sure whether I'm disagreeing with Claunch, of Gartner; or disagreeing with Digman's (ZDNet) summary rendition of what Claunch said. So Carl, if what I say interprets what you said wrongly, I apologize.

Let's start off quoting Claunch of Gartner (quote lifted from ZDNet):

"We have had the easy fix for decades to deal with existing applications whose requirements have outgrown their current system — we refreshed the technology and the application ran better. This led to relatively long life cycles for applications, allowed operations to address application performance problems easily and required little advance notice of capacity shortfalls."

This is completely true.

In contrast, with multicore, the simple process above becomes more complex. Of the five bullet points detailing that added complexity, three make sense to me:

IT departments will need new coding skills
IT departments will need new development tools;
what you get out of it, the achieved performance increase, will vary, a lot, by application.

(I'll talk at the end about the ones that I consider flat-out wrong.)

I think this is entirely too alarmist.

Well. Given my past posts, you may be excused for thinking of bristly 300-lb hogs daintily hovering before tiny flowers, sipping nectar like hummingbirds; and of course the sky has fallen (thud!) (didn't even bounce). But I've not said multicore is a problem for server systems.

The issues brought up above are certainly problems if you have a single-thread production program and you need it to go faster. If that's your case, I'd be even more vehement about how you are in deep trouble, how much those skills are going to cost, etc.; see prior posts in this blog. Given the whole spectrum of business processing, I've no doubt such situations exist.

However: The vast majority of server workloads already scale out. They've been developed or recoded over the last decade or so to use multiple whole computers (clusters, farms) to provide greater performance. (Scale out is opposed to scale up, meaning using more CPUs in a single multicore system.) The price-performance of commodity rack systems is just too staggeringly better than the alternatives for anybody to do otherwise.

That server workloads scale out is not speculation. I and others have trudged through all the categories and sub-categories of International Data Corp's (IDC's) Server Workload taxonomy (see this report for a list of those categories, at the end, in Table 11), asking application experts whether their case scaled out: CRM? Check, scales out. Email? Check; got the diagrams. Data mining? Double-triple check, scales way out. And so on, through a dismaying long list.

(Those are sample applications in IDC's categories "Business Processing," "Collaboration,"and "Business Intelligence," respectively.)

In fact, there is so much scale out of commercial workloads that the issue has confused many in the cloud computing crowd completely: Are you talking about computing out in the Internet cloud somewhere, or are you talking about computing on a cloud of systems (cluster)? Usual implicit answer: Both. They assume scale out so deeply that it's built into some of the names of cloud facilities, like Amazon's "Elastic Computing Cloud"; the claimed elasticity arises only if you scale out. But hey, everything scales out. That's the assumption. In fact, in most software discussions, "scales" is used to mean "scales out."

Now, a few of the server workloads in IDC's taxonomy don't scale out without issues. For example:

OLTP -- short transaction processing, like ATM transactions -- is problematical, despite the best efforts of Oracle, IBM, and others (IBM comes closer) (or maybe Tandem).
Batch depends a whole lot on what you're batching; if what you're batching is totally single-thread, it can be a big problem.

Those two, however, amount to less than 10% of server revenue, and... hey, they hark back to old-fashioned mainframe days. Mainframes have been SMP (multicore) since the 1970s, so at least some adaptation has taken place; OLTP in particular scales like a banshee on multicore; all the top record-holders of the OLTP benchmark, TPC-C, are SMPs (= multiprocessors) (= multicores). Batch is not always terrible either, but that's a longer story.

So IT shops already scale out, a whole lot, and if they don't they probably scale up and directly use multicore already. Where does that lead us?

Well, if you already scale out, have I got a deal for you: Virtualization.

Just partition that 16-core monster into 16 single-core pussycats, each running as a separate, independent, system.

This isn't completely painless. You will need a large amount of memory per system, but probably not much more than would scale up due to the claimed increased performance of the multicore CPU set. As for IO, well, that will hopefully be relieved when PCIe 3.0 starts deploying. It can be a problem now, on some systems. And virtualization adds yet more pain to systems management, which we definitely do not need; this is probably the worst aspect of the virtualization deal. But it totally beats parallelizing all your code, hands down and match over, which is what the article says Claunch says you have to do. I don't think so.

I suppose virtualization isn't the best long-term solution, simply because it seems it would be much cleaner to use a multicore system as one parallel thing. Partitioning it up seems artificial and inelegant. However, it least buys some time for a beleagured IT manager, and, well, interim solutions that work always last longer than one would anticipate.

Now, about the other bullet points appearing in the ZDNet article:

There are multiple core designs that will affect your software.

This differs from the existing situation how? The next generation of hardware has always been had a design different from the previous one, requiring at least recompilation to tap the claimed performance. Moreover, with Intel apparently (we don't know for sure, but I'd bet on it) ditching their interconnect busses for point-to-point connections like AMD, multicore designs are converging, not diverging.

Your existing software licensing deals become more complicated.

Just to pick the key example they used: Oracle has been licensing its products on multicore/multiprocessor systems since at least the mid-1980s. See comments above about breaking records. It's not the same as licensing on multiple separate systems, true, and if that's all you've ever dealt with in your IT career, you probably have some education required. But that doesn't mean it's horrid. IT shops have dealt with it for decades.

Really, though, licensing points to a problem with my solution: Licensing across multiple virtual machines has been a quagmire of the first order. It's inching towards something more rational, but there are still problems. Fortunately, the particular case of Oracle (and DB2, and other databases) is irrelevant for the multicore case, since they scale up to many processors and have for a long time; so you don't need to play the virtualization game.

Overall, multicore is actually a pretty good fit for server workloads. That's one aspect of multicore that won't need the alarm bells ringing. If servers alone had the volumes to keep this train rolling, many of us would sleep better at night.

(By the way, apologies to all for neglecting this blog for a while. I've got a lot more to post; I just got tangled up for a while. Comments will be answered.)

((By the way)^2: If somebody knows of an equivalent of the IDC workload taxonomy but for clients, I'd be much obliged if they could give me a pointer to it.)