I still have one IDF interview to transcribe (Joe Curley),
but I’m sick of doing transcriptions. So here are a few other random things I
observed at the 2011 Intel Developer Forum. It's nothing like comprehensive.
It’s also not yet the promised MIC dump; that will still come.
Exhibit Hall
I found very few products I had a direct interest in, but
then again I didn’t look very hard.
On the right, immediately as you enter, was a demo of a
Xeon/MIC combination clocking 600-700 GFLOPS (quite assuredly single precision) doing LU factorization. Questions
to the guys running the demo indicated: (1) They did part of it on the Xeon, and
there may have been two of those; they weren't sure (the diagram showed two).
(2) They really learned how to say “We don’t comment on competitors” and “We
don’t comment on unannounced products.”
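For context on how a number like that gets quoted: dense LU factorization of an n-by-n matrix costs roughly 2/3·n³ floating-point operations, and GFLOPS is just that count divided by wall-clock time. A minimal sketch of the arithmetic, with made-up illustrative values (the matrix size and timing below are not the demo's numbers):

    # Rough GFLOPS arithmetic for a dense LU factorization run.
    # The matrix size and time below are hypothetical, not the demo's figures.
    n = 20_000                       # matrix dimension
    seconds = 4.0                    # wall-clock time for the factorization
    flops = (2.0 / 3.0) * n ** 3     # standard LU operation count, ~2/3 * n^3
    gflops = flops / seconds / 1e9
    print(f"{gflops:.0f} GFLOPS")    # ~1333 for these made-up numbers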
A 6-legged robot controlled by an Atom, driven by a game
controller. I included this here only because it looked funky and I took a picture
(q. v.). Also, for some reason it was in constant slight motion, like it couldn’t
sit still, ever.
There were three things that were interesting to me in
the Intel Labs section:
One Tbit/sec memory stack: To understand why this is
interesting, you need to know that the semiconductor manufacturing processes
used to make DRAM and logic are quite different. Putting both on the same chip
requires compromises in one or the other. The logic that must exist on DRAM
chips isn’t quite as good as it could be, for example. In this project, they
separated the two onto separate chips in a stack: Logic is on one, the bottom
one, that interfaces with the outside world. On top of this are multiple pure
memory chips, multiple layers of pure DRAM, no logic. They connect by solder
bumps or something (I’m not sure), and there are many (thousands of) “through silicon
vias” that go all the way through the memory chips to allow connecting a whole
stack to the logic at the bottom with very high bandwidth. This whole idea eliminates
the need to compromise on semiconductor processes, so the DRAM can be dense
(and fast), and the logic can be fast (and low power). One result is that they
can suck 1 Tbit/sec of data out of one of these stacks. This just feels right
to me as a direction. Too bad they’re unlikely to use the new IBM/3M thermally conductive glue
to suck heat out of the stack.
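To put 1 Tbit/sec in perspective, here's the unit conversion plus a comparison of my own devising (the DDR3 channel yardstick is mine, not something from the demo):

    # Convert the stack's quoted bandwidth and compare it to a conventional
    # DRAM channel (the comparison point is mine, not Intel's).
    stack_bw = 1e12 / 8                 # 1 Tbit/s = 125 GB/s
    ddr3_1600_channel = 1600e6 * 8      # 1600 MT/s * 8 bytes/transfer = 12.8 GB/s
    print(stack_bw / 1e9)               # 125.0 GB/s
    print(stack_bw / ddr3_1600_channel) # ~9.8 channels' worth from one stack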
Stochastic Ray-Tracing:
What it says: Ray-tracing, but allows light to be probabilistically
scattered off surfaces, so, for example, shiny matte surfaces have realistically
blurred reflections on them, and produce more realistic color effects on other
surfaces to which they reflect. Shiny matte surfaces like the surface of the
golden dome in the center of the Austrian crown, reflecting the jewels in the
outer band, which was their demo image. I have a picture here, but it comes
nowhere near doing this justice. The large, high dynamic range monitor they
had, though – wow. Just wow. Spectacular. A guy was explaining this to me, pointing
to a normal monitor when I happened to glance up at the HDR one. I was like “shut
up already, I just want to look at that.” To run it they used a cluster of four Xeon-based nodes, each apparently about 4U high, and that was not in real time; several seconds were required per
update. But wow.
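If you want the gist of what "probabilistically scattered" means in code, here's a toy sketch of mine (certainly not Intel's renderer): instead of following one perfect mirror ray, you jitter many rays around the mirror direction, with the spread set by a roughness parameter, and average what they return. That averaging is what produces the blurred reflections.

    # Toy sketch of stochastic (glossy) reflection; my own illustration.
    import math, random

    def reflect(d, n):
        # Mirror-reflect direction d about unit normal n.
        dot = sum(di * ni for di, ni in zip(d, n))
        return tuple(di - 2.0 * dot * ni for di, ni in zip(d, n))

    def glossy_reflection(d, n, roughness, trace, samples=64):
        # Average the radiance of many rays jittered around the mirror direction.
        mirror = reflect(d, n)
        total = 0.0
        for _ in range(samples):
            jittered = tuple(m + roughness * random.uniform(-1.0, 1.0) for m in mirror)
            length = math.sqrt(sum(c * c for c in jittered)) or 1.0
            total += trace(tuple(c / length for c in jittered))  # trace() returns radiance along a ray
        return total / samples

    # Example with a fake trace() that returns 1.0 everywhere.
    print(glossy_reflection((0.0, 0.0, -1.0), (0.0, 0.0, 1.0), 0.1, lambda ray: 1.0))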
Real-Time Ray-Tracing: This has been done before; I saw
a demo of it on a Cell processor back in about 2006. This, however, was a much
more complex scene than I’d previously viewed. It had the usual shiny classic
car, but that was now in the courtyard of a much larger old palace-like
building, with lots of columns and crenellations and the like. It ran on a MIC,
of course – actually, several of them, all attached to the same Xeon system. Each had a complete copy of the scene
data in its memory, which is unrealistic but does serve to make the problem “pleasantly
parallel” (which is what I’m told is now the PC way to describe what used to be
called “embarrassingly parallel”). However, the demo was still fun. Here's a video of it I found. It was apparently shot at a different event, but it demonstrates the same technology. The intro is in Swedish, or something, but it switches to English at the demo. And yes, all the Intel Labs guys wore white lab coats. I teased them a bit about that.
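That "pleasantly parallel" structure is easy to see in miniature: if every worker has its own complete copy of the scene, the image just splits into tiles and nobody has to talk to anybody else until the pixels come back. A tiny sketch of that pattern (my illustration, nothing to do with Intel's actual code):

    # Each worker process gets its own copy of the (toy) scene; tiles are rendered
    # independently and only finished pixels come back. My illustration only.
    from multiprocessing import Pool

    SCENE = {"spheres": [((0.0, 0.0, -5.0), 1.0)]}   # stand-in for a full scene copy

    def render_tile(tile):
        x0, y0, x1, y1 = tile
        # A real renderer would trace a ray per pixel against SCENE; fake a flat shade here.
        return [(x, y, 0.5) for y in range(y0, y1) for x in range(x0, x1)]

    def render(width=64, height=64, tile=16, workers=4):
        tiles = [(x, y, x + tile, y + tile)
                 for y in range(0, height, tile) for x in range(0, width, tile)]
        with Pool(workers) as pool:   # each worker holds its own SCENE copy
            return [px for part in pool.map(render_tile, tiles) for px in part]

    if __name__ == "__main__":
        print(len(render()))          # 4096 pixels, with no inter-worker communication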
Keynotes
Otellini (CEO): Intel is going hot and heavy into
supporting the venerable Trusted Platform technology, a collection of
technologies which might well work, but upon which nobody has yet bitten. This security
emphasis clearly fits with the purchase of McAfee (everybody got a free
McAfee package at registration, good for three systems). “2011 may be the year the
industry got serious about security!” I remain unconvinced.
Mooley Eden (General Manager, Mobile Platforms): OK. Right
now, I have to say that this is the one time in the course of these IDF posts
that I am going to bow to Intel’s having paid for me to attend IDF, bite my
tongue rather than succumbing to my usual practice of biting the hand that
feeds me, and limit my comments to:
Mooley Eden must be an acquired taste.
To learn more of my personal opinions on this subject, you are going to
have to buy me a craft beer (dark & hoppy) in a very noisy bar. Since I don’t
like noisy bars, and that’s an unusual combination, I consider this unlikely.
Technically… Ultrabooks, ultrabooks, “beyond thin and
light.” More security. They had a lame Ninja-garbed guy on stage, trying to
hack into a trusted-platform-protected system, and of course failing. (Please see
this.) There was also a picture of a castle
with a moat, and a (deliberately) crude animation of knights trying to cross the moat and falling
in. (I mention this only because it’s relevant to something below.)
People never use hibernate, because it takes too long to
wake up. The solution is… to have the system wake up regularly. And run sync
operations. Eh what? Is this supposed to cause your wakeup to take less time
because the wakeup time is actually spent syncing? My own wakeup time is mostly wakeup. All I know is that
suspend/resume used to be really fast, reliable, and smart. Then it got transplanted to Windows from BIOS and has been unsatisfactory - slow and dumb - ever since.
This was my first time seeing Windows 8. It looks like
the Mango phone interface. Is making phones & PCs look alike supposed to help in
some way? (Like boost Windows Phone sales?) I’m quite a bit less than intrigued. It
means I’m going to have to buy another laptop before Win 8 becomes the
standard.
Justin Rattner (CTO): Some of his stuff I covered in my
first post on IDF. One I didn’t cover was the massive deal made of CERN and the
LHC (Large Hadron Collider) (“the largest machine human
beings have ever created”) (everybody please now go “ooooohhh”) using MICs. Look,
folks, the major high energy physics apps are embarrassingly parallel: You get
a whole lot, like millions or billions, of particle collisions, gather each one's
data, and do an astounding amount of floating-point computing on each completely independent
set of collision data. Separately. Hoping to find out that one is a Higgs boson or something. I saw people doing this in the late 1980s at
Fermilab on a homebrew parallel system. They even had a good software framework
for using it: Write your (serial) code for analyzing a collision your way, and
hand it to us; we run it many times in parallel, just handing out each event’s
data to an instance of your code. The only thing that would be interesting
about this would be if for some reason they actually couldn’t run HEP codes
very well indeed. But they can run them well. Which makes it a yawn for me. I’ve
no question that the LHC is ungodly impressive, of course. I just wish it were
in Texas and called something else.
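That framework pattern, by the way, is simple enough to sketch in a few lines: the physicist writes a serial analyze(event) function, and the framework just maps it over independent events in parallel. A minimal mock-up of the idea (mine, not Fermilab's or CERN's actual software):

    # Event-parallel analysis skeleton: the user supplies a serial per-event function,
    # and the framework fans events out across processes. My mock-up of the pattern.
    from concurrent.futures import ProcessPoolExecutor

    def analyze(event):
        # Stand-in for a physicist's serial analysis of one collision's data.
        return sum(event) > 10.0      # e.g. "is this event interesting?"

    def run_framework(events, workers=8):
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(analyze, events))

    if __name__ == "__main__":
        fake_events = [[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]]   # each list is one event's data
        print(run_framework(fake_events))                  # [False, True]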
Intel Fellows Panel
Some interesting questions asked and answered, many
questions lame. Like: “Will high-end Xeons pass mainframes?” Silly question.
Depends on what “pass” means. In the sense in which most people may mean, they
already have, and it doesn’t matter. Here are some others:
Q: Besides MIC, what else is needed for Exascale? A: We’re
having to go all the way down to device level. In particular, we’re looking at
subthreshold or near-threshold logic. We tried that before, but failed. Devices
turn out to be most efficient 20 mV above threshold. May have to run at 800 MHz.
[Implication: A whole lot of parallelism.] Funny how they talked about
near-threshold logic, and Justin Rattner just happened to have a demo of that
at the next day’s keynote.
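About that "whole lot of parallelism": the back-of-envelope version is that an exaflop machine clocked at 800 MHz needs on the order of a billion floating-point operations in flight every cycle.

    # How much concurrency an exaflop at 800 MHz implies (back-of-envelope).
    exaflops = 1e18          # target: operations per second
    clock = 800e6            # cycles per second
    print(exaflops / clock)  # 1.25e9 floating-point operations needed every cycle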
Q: Are you running out of rare metals? A: It’s a question
of cost. Yes, we always try to move off expensive materials. Rare earths
are needed, but not much; we only use them in layers like five atoms thick.
Q: Is Moore’s Law going to end? A: This was answered by Kelin
J. Kuhn, Fellow & Director of the Technology and Manufacturing Group – i.e.,
she really knows silicon. She noted that, by observation, at every given
generation it always looks like Moore’s Law ends in two generations. But it
never has. Every time we see a major impediment to the physics – several examples
given, going back to the 1980s and the end of Dennard scaling – something seems
to come along to avoid the problem. The exception seems to be right now: unlike prior eras, when it looked like the end was two generations out, there don't seem to be any clouds on this particular horizon at all. (While I personally know of no reason to dispute this, keep in mind that this is from Intel, whose whole existence seems tied to Moore's Law, and it's said by the woman who probably has the biggest responsibility to make it all come about.)
An aside concerning the question-taking woman with the microphone
on my side of the hall: I apparently reminded her of something she hates. She kept going elsewhere, even after standing right beside me for several minutes while I had my
hand raised. What I was going to ask was: This morning in the keynote we saw a
castle with a moat, and several knights dropping into the moat. The last two
days we also heard a lot about a knight which appears to take a ferry across
the moat of PCIe. Why are you strangling a TFLOP of computation with PCIe? Other accelerator vendors
don’t have a choice with their accelerators, but you guys own the whole
architecture. Surely something better could be done. Does this, perhaps, indicate
a lack of integration or commitment to the new architecture across the
organization?
Maybe she was fitted with a wiseass detection system.
Anyway, I guess I won’t find out this year.
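For the record, here is the arithmetic behind "strangling," assuming a PCIe 2.0 x16 link (the generation is my assumption): at roughly 8 GB/s per direction, a teraflop of single-precision compute gets fed only about one 4-byte operand for every 500 floating-point operations.

    # Bytes-per-flop across PCIe vs. on-card compute (PCIe 2.0 x16 assumed).
    pcie_bw = 8e9                  # ~8 GB/s per direction for PCIe 2.0 x16
    flops = 1e12                   # 1 TFLOP single precision
    bytes_per_flop = pcie_bw / flops
    print(bytes_per_flop)          # 0.008 bytes per flop
    print(4 / bytes_per_flop)      # ~500 flops per 4-byte operand delivered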