The Perils of Parallel: Impressions of a Newbie at Intel Developer Forum (IDF)

Out of the blue (which in this case is a pun), I received an invitation from an Intel representative to attend the 2011 Intel Developer Forum (IDF), in San Francisco, at Intel’s expense. Yes, I accepted. Thank you, Intel in general; and thank you in particular to the very nice lady who invited me and shepherded me through the process.

[There are some updates below, marked in square brackets.]

I’d never attended an IDF before, so I thought I’d spend an initial post on my overall impressions, describing the things that stood out to this wide-eyed IDF newbie. It may be boring to long-time IDF attendees – and there are very long-timers; a friend of mine has been to every domestic IDF for the last 12 years. But what the heck, they impressed me.

I do have some technical gee-whiz later in this post, but I’ll go into more technical detail primarily in subsequent posts. Those will include accounts of the three private interviews that were arranged for me with Intel HPC and MIC (Many-Integrated Core) executives (John Hengeveld, James Reinders, and Joe Curley), as well as other things I picked up along the way, mostly about MIC.

Here are my summary impressions: (1) Big. Very Big. (2) Incredibly slick and polished. (3) A fine attempt at Borgilation.

IDF is gigantic. It doesn’t surpass the mother of all trade shows, the Consumer Electronics Show, but I wouldn’t be surprised to find that it is the largest single-company trade show. The Moscone Center West, filled by IDF on all three floors, is almost 300,000 sq. ft. Justin Rattner (Intel Fellow & CTO) said in his keynote that there were over 5,000 attendees, and that hauling in the gear and exhibits required 500 semis. I believe it.

There was, of course, the usual massive collection of trade-show booths covering one huge exhibit area (see photo of the center aisle of the exhibit area, below). That alone completely filled 100,000 sq. ft. of exhibit space.

In addition, each of the large open areas had its own large, well-staffed pavilion dedicated to one thing or another. One had a bevy of ultrabooks (ultrabook = Intel’s push for a viable non-Apple MacBook Air) that you could play with. Another was an “Extreme Zone” with a battery of four high-end gaming systems (mostly playing what looked like a Wolfenstein-y game). Another was a multi-player racing game with several drivers’ seats with steering wheels, etc. Another demoed twenty or thirty different sizes and shapes of laptops (in addition to the displays in the exhibit area). Another was a contraption of pipes and random stuff spitting plastic balls onto pseudo-xylophones, cymbals, and so on, physically mimicking the famous YouTube video of several years back, to demonstrate industrial controllers run by Atom processors. It didn’t actually play the music, but the video is pure animation, so it’s one up on that. [Intel has a press release on this which seems to indicate that it actually played the music. It didn't seem like it to me, but it might have.]

Everywhere there was fanatical attention to production values, extending even to small details.

The keynotes were marvels of production; I’ve been to many IBM affairs, and nothing I saw over the years compared with these in slick, polished execution. Movies were theatre-quality cinematic productions (despite typical marketing-fluff plots with occasional cheesy humor), and every one was cued in at exactly the right instant, with no hiccups. Every on-stage demo went right on the money, and even when one crashed – a momentary screen showing a Windows driver crash – another was seamlessly switched in within what seemed like less than two seconds. I strongly suspect a hot backup, since there's no way Windows recovers that fast.

But smaller things had their share of attention, too. The technical sessions I attended all had fluent, personable speakers; meticulously designed slides; and perfect audio with nary a glitch in microphone use or (deity forbid) feedback. Even the backpacks handed out were high quality and custom-made. Simple customization is no big deal, but these came with Intel logos on the zipper pulls and a custom lining emblazoned with their chip-layout banner theme (see photos).

Speaking of that banner theme, it blared out at you over the entrance to each hall as a huge illustration at least 20 ft. high and 100 ft. long (photo again): You are a dull, chalky, dead white – until Intel’s silicon brings you to vibrant, colored life. Not exactly subtle symbolism, but that’s marketing.

And speaking of marketing, the unmistakable overall message was: We will dominate everything. Everything with a processor in it, that is. Servers, with volumes ever-increasing at huge rates? Check. High-end 10+ core major stompers? Check. Midrange? Check. Low end? Super checkety-check-check-check. Ultrabook (future) with 14-day standby. (Standby? Do we really care?) Even a cell phone, demoed, run by an Intel processor. It’s the little black rectangle at the center-right of this pic:

(I couldn’t get a better picture, since after every keynote there was a “photo opportunity” that produced a paparazzi-dense melee/feeding frenzy on the stage. This is, I'm told, an IDF tradition. I’m not sufficiently a press-banger to elbow my way through that wall of bodies.)

The low-power demo that impressed me, though, was of a two-watt processor in a system showing a squee-worthy kitty video (and something else, but who noticed?), powered by a small solar panel. This was a demo of the future potential of near-threshold voltage operation, also touted (not, I’m sure, by accident) (not at all) in the Intel Fellows’ panel the day before. They used an old Pentium to do it, undoubtedly for reasons I’m not enough of a circuit jockey to understand. There was even what appeared – horrors! – to be an on-stage ad lib (!!) about “dumpster diving” for it. (Hey, eBay! Did they just call you a dumpster? The perils of ad libbing.) Some blatant futurism followed this, talking about 100 GF in that same 2W envelope; no hint when, fantastic if it ever happens.

There are chinks in the armor, though. You have to look seriously to find them, or have some comparisons on your side.

A friend happened to note to me, for example, that this IDF was three keynotes short of the usual full house of six. There was Otellini’s (CEO) general keynote, and Mooly Eden’s laptop ultrabook keynote, and Justin Rattner’s “futures” presentation in which he laughs too much for my taste. Those are regulars at every IDF. However, there was no keynote specifically devoted to Servers; understandable, I suppose, because they’re between big releases and have nothing major to announce (but they said a whole lot about the next-gen Ivy Bridge and the future server market in a media-only briefing). There was also no keynote for Digital Home; they are wrapped up with Sony [and other partners] on that one, and likely it hasn’t any splashes to make at this time (or else everybody’s figured out that connecting your TV to the Internet isn’t yet a world-shaking idea). And… dang, there was a third one historically, but I’ve lost it. Sorry. [The third missing keynote was on software and services, traditionally performed by Renee James.] Takeaway: Ambitions seem a bit shrunken, but it may just be circumstances.

A big deal was made in a media briefing about how they were going to improve Intel's Atom SoCs (Systems-on-a-Chip) at double Moore’s Law. (I think you’re supposed to gasp now.) That sounds sexy, but I interpret it as meaning they figured out that Atom really needs to be done in their latest and greatest silicon technology, as opposed to lagging a couple of generations (nodes) behind the way it does now, particularly since their highest-end technologies are now focused on low power.

So they’re going to catch up. Everybody, including Atom, will be using the same 14nm technology in 2014. (That’s an estimated, forward-looking 2014, see their prospectus for caveats, etc.) Until then, well, there are iterations. I take “double Moore’s Law” to mean that they can’t steer the massive ship of microprocessor development fast enough to catch up in a single release; and/or (likely) their existing Atom customer base can’t wait without any new Atom products for as long as a single leap would take.

Will this put a dent in ARM's dominance of the low-power arena? Or MIPS's share? Maybe, in time.

Then there was that graph, also in a media briefing, of future server shipments. (Wish I had a pic; can’t find the pdf on the Intel web site.) They extended it to show some trebling or quadrupling of server shipments in the next few years, but…

Maybe they have some data I don’t have. To me, the actual past data on the graph seemed to say that the curve of shipment volumes had recently started flattening out. Judging purely from that graph, extrapolating from the slope that existed a couple of quarters or years ago doesn’t seem justified.

Hey, did I mention that I wuz a medium? I got in with media credentials, which was another personal first. (Thanks again!) Talk about being a newbie – I didn’t even know there was a special “media corridor” until halfway through the first day. Dang. I could have had a much better breakfast on the first day.

Now I have this itch to buy a fedora so I can put a press pass into the hatband.

More will come, but I’ve got a trip to Mesa Verde for the next few days, so it won’t be immediate. Sorry. The wait won’t be anywhere near as long as it has been between other recent posts.

6 comments:

Anonymous said...: A bit off topic, but I'm wondering if you have an opinion on the following. Most OOO (out-of-order) processors, at least the Intel/AMD ones, seem to have strong memory models, whereas the new ones (ARM/Tile/NVIDIA SMs) have weak memory models. Is there a relation between memory models and OOO?; September 18, 2011 at 4:58 PM
Greg Pfister said...: Others feel free to chime in -- my head tends to hurt when I think about memory models too much -- but my take is that OOO and memory model strength are pretty orthogonal.

Architectures with strong memory models have just been around a long time (x86, IBM 360 / zSeries). They generally date back to when memories were faster relative to processors, so strong memory ordering didn't hurt much and it made assembler programming simpler. Now nobody can get rid of them or they'll break legacy code.; September 18, 2011 at 5:55 PM
Anonymous said...: [anon_from_above]
Here's why I ask, and this is pure speculation/imagination on my part.

[a] Load reordering: Allowing loads to jump past each other is a means of bumping up ILP. Even without full-blown OOO, a non-blocking cache with hit-under-miss should let one hide load-miss latency by allowing a few non-dependent loads to be issued. If these later loads arrive earlier or hit in the cache, to the programmer it would look like the loads just got reordered.

[b] Store reordering: Likewise helps avoid stalls from stores.

An OOO processor can hide the fact that it is reordering loads/stores by doing commits in order. So to the programmer it looks like memory is strongly ordered, but behind the scenes it's doing memory reordering.

A processor like ARM/Tile/SM that doesn't have full-blown OOO logic has to either suck up the memory cost, try a threaded processor, or expose the memory reordering to the programmer and ask them to place fences where it's unacceptable.; September 18, 2011 at 8:03 PM
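To make the fence idea concrete, here is a minimal C++11 sketch of the classic message-passing pattern, assuming a producer that writes a payload and then raises a flag. The names (`payload`, `ready`, `producer`, `consumer`, `message_passing_once`) are hypothetical, and this only illustrates where the fences go; it is not how any particular ARM, Tile, or NVIDIA part is actually programmed. The release fence keeps the payload store ahead of the flag store, and the acquire fence keeps the flag load ahead of the payload load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names; the pattern is the classic "message passing" litmus test.
int payload = 0;                 // ordinary (non-atomic) data
std::atomic<bool> ready{false};  // flag; relaxed accesses by themselves impose no ordering

void producer() {
    payload = 42;
    // Release fence: the payload store may not become visible after the flag store.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {
        // spin until the flag is observed
    }
    // Acquire fence: the payload load may not be satisfied before the flag load.
    std::atomic_thread_fence(std::memory_order_acquire);
    assert(payload == 42);  // guaranteed once the two fences pair up through `ready`
    return payload;
}

// Runs one producer/consumer round and returns what the consumer saw.
int message_passing_once() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

On x86 both fences typically compile to no instruction at all, since TSO already keeps these accesses in order; on ARM they typically become `dmb` barrier instructions. Remove them and the program has a data race on `payload`, and a weakly ordered core could let the consumer read 0.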
Anonymous said...: Weak memory models are becoming more manageable due to better support in programming languages--in particular in the new C++11 memory model. Unless there is a breakthrough in sequentially consistent, or even just total-store-order, architectures (it doesn't seem likely), weak memory models will keep gaining a bigger and bigger performance advantage. For better or worse, though, Intel is carrying a big hump of backward compatibility. There was a precedent though-- moving from 8088 to x86, with its compatibility mode. So TSO could be preserved for compatibility, but a new processor would offer a weaker model alongside it. I wonder if this is a viable option.; September 25, 2011 at 3:05 PM
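The gap between sequential consistency and total store order mentioned here shows up in the classic store-buffering litmus test. Below is a minimal C++11 sketch with hypothetical names (`x`, `y`, `store_buffering_once`). Each thread stores 1 to its own variable and then loads the other's. With `memory_order_seq_cst` on all four operations, C++11 forbids both loads returning 0. With release stores and acquire loads, which x86 implements with ordinary moves, `r1 == r2 == 0` is allowed, and real x86 hardware can produce it because each store may still be sitting in its core's store buffer when the following load executes.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Store-buffering litmus test (hypothetical names).
std::atomic<int> x{0};
std::atomic<int> y{0};

// Runs one round and returns {r1, r2}. With seq_cst == false the stores use
// release and the loads use acquire, which permits r1 == r2 == 0.
std::pair<int, int> store_buffering_once(bool seq_cst) {
    const std::memory_order st = seq_cst ? std::memory_order_seq_cst : std::memory_order_release;
    const std::memory_order ld = seq_cst ? std::memory_order_seq_cst : std::memory_order_acquire;
    x.store(0);
    y.store(0);
    int r1 = -1;
    int r2 = -1;
    std::thread t1([&] { x.store(1, st); r1 = y.load(ld); });
    std::thread t2([&] { y.store(1, st); r2 = x.load(ld); });
    t1.join();
    t2.join();
    if (seq_cst) {
        // One total order over all four operations: the later load must see the other store.
        assert(!(r1 == 0 && r2 == 0));
    }
    return {r1, r2};
}
```

That is roughly the cost being weighed in this thread: on x86 a `seq_cst` store typically compiles to a locked `XCHG` (or a `MOV` plus `MFENCE`), while the release/acquire version needs no extra instructions.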
Anonymous said...: I agree: programming languages with explicit memory models are becoming more tolerant of weak consistency (WC). CUDA always was WC, Linux kernel code can tolerate WC, Java can too, and based on your statement, C++11 is moving in that direction too.

That said, I disagree that forgoing WC costs the processor performance. Power, perhaps; silicon space, absolutely. It has to do with whether or not you have an in-order retirement unit. Likewise for the imprecise-interrupt problem: should I cause a serializing event before invoking the SW trap handler?

Some comparisons you may find useful:
http://www.hotchips.org/uploads/archive22/HC22.24.719-1-Curran-IBMz196.pdf
Slide 7 (z196) In Order Completion
< I doubt they'll require weak consistency; with a few clever tweaks you can even fake SC without compromising performance or causing stalls. >

ARM Cortex A9
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0388g/CJHJCHII.html#Cacbcebh
< Missing In-Order Completion Unit .. possibly explains why they demand weak consistency >

Intel Nehalem
http://alphamike.tamu.edu/web_home/papers/perf_nehalem.pdf
Pg 10. RU (Retirement Unit is in-order); October 29, 2011 at 3:08 PM
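The in-order retirement point in this comment can be illustrated with a toy reorder buffer. This is a hypothetical sketch in C++, not a model of the z196, Cortex-A9, or Nehalem pipelines: `RobEntry`, `Retired`, and `retire_in_order` are made-up names, completion cycles are supplied as inputs, and the memory-ordering checks a real core needs are left out. Instructions may finish executing in any order, but they leave the buffer strictly in program order, which is what makes precise interrupts possible.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical toy model: each instruction finishes execution at its own cycle,
// possibly out of program order.
struct RobEntry {
    int seq;             // program-order sequence number
    int complete_cycle;  // cycle in which execution finishes
};

struct Retired {
    int seq;    // which instruction retired
    int cycle;  // cycle in which it retired
};

// Retires entries strictly from the head of the buffer, at most `width` per
// cycle, and only once the head has completed. An entry that finished early
// waits behind any older entry that has not.
std::vector<Retired> retire_in_order(std::deque<RobEntry> rob, int width) {
    assert(width > 0);  // otherwise nothing could ever retire
    std::vector<Retired> out;
    for (int cycle = 0; !rob.empty(); ++cycle) {
        int retired_this_cycle = 0;
        while (!rob.empty() && retired_this_cycle < width &&
               rob.front().complete_cycle <= cycle) {
            out.push_back({rob.front().seq, cycle});
            rob.pop_front();
            ++retired_this_cycle;
        }
    }
    return out;
}
```

For example, with a retire width of 4, if instruction 0 completes at cycle 9 while instructions 1 and 2 complete at cycles 2 and 3, the younger two wait and all three retire together at cycle 9.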
Skyline NJ said...: Nice recap. The Intel Developer Forum is a really great annual trade show. There is so much to learn.; April 15, 2014 at 12:30 PM

Sunday, September 18, 2011