Monday, May 3, 2010

All Hail the GPU! – a Tweetstream

I recently attended a talk at Colorado State University (CSU) by Sharon Glotzer, and tweeted what was going on in real time. Someone listening said he missed the start, and suggested I blog it. So, here is a nearly zero-effort blog post of my literal tweetstream, complete with hashtags.

I did add a few comments at the end, and some in the middle [marked like this].

Value add: This is a CSU Information Science and Technology (ISTeC) Distinguished Lecture Series. The slides and a video of the lecture will soon appear on the page summarizing all of the lectures. Keep scrolling down to the bottom. Many prior lectures are there, too.

Starting Tweetstream:

At CSU talk by Sharon Glotzer UMich Ann Arbor "All Hail the GPU: How the Video Game Industry is Transforming Molecular & Materials Simulation [no hashtag because I hit the 140 character limit]

Right now everybody's waiting to get the projector working #HailGPU

At meetup b4 talk, She said they redid their code in CUDA and overnight got 100X speedup #HailGPU [talked to her for about 3 minutes]

Backup projector found & works, presentation starting soon, I hope. #HailGPU

None of her affiliations is CS or EE -- all in materials, fluids, etc. "Good to be talking about our new tools." #HailGPU

First code she ever wrote as a grad student was for the CM-2. 64K procs. #HailGPU

Image examples of game graphics 2003, 05, 08 - 03 looks like Second Life. #HailGPU

Also b4 talk, said when they moved to Fermi they got another 2X. Not bothered using CUDA; says "it's easy." #HailGPU

Just reviewing CPU vs. GPU arch now. #HailGPU

"Typical scientific apps running on GPUs are getting 75% of peak speed." Hoowha? #HailGPU [This is an almost impossibly large efficiency. Says more about her problems than about GPUs.]

"Huge infusion from DARPA to make GPGPUs" -- Huh? Again. Who? When? #HailGPU

"Today you can get 2GF for $500. That is ridiculous." #HailGPU [bold obviously added here, to better indicate what she said]

Answer to Q: Nvidia got huge funding from DARPA to develop GPGPU technology over last 5 years. #HailGPU [I didn't know that. It makes all kinds of sense.]

"If you've ever written MPI code, CUDA is easy. Summer school students do it productively. Docs 1st rate." #HailGPU [MPI? As in message-passing? Leads to CUDA, which is stream? Say what? Must be a statement unique to her problem domain.]

Who should use them? Folks with data-parallel problems. Yes, indeed. #HailGPU

She works on self-assembly of molecules. Like lipids self-assembled into membranes. #HailGPU

Her group doing materials that change (Terminator), multi-function & sensors (Iron Man), cloaking (illustration was a blank :-) #HailGPU [cloaking as in "Klingons"] [Bah.]

Said those kinds of things are "what the material science community is doing now." #HailGPU

Hm, not seeing tweets from anybody else. Is this thing working? // ra_livesey @gregpfister - it most certainly is, keep going [Just wanted some feedback; wasn't seeing anything else.]

Her prob, Molecular Dynamics, is F=ma across a bazillion particles a bazillion times. Yeah, data parallel. #HailGPU [The second bazillion is doing the first bazillion over a bazillion time steps.]
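[A rough sketch of that double bazillion, added for illustration. This is not her code or HOOMD-Blue. The name `compute_forces` is a hypothetical stand-in for whatever force model you use, and the simple Euler-style update is my assumption, picked for brevity rather than accuracy. One dimension only, to keep it short.]

```python
def step(positions, velocities, masses, compute_forces, dt):
    """Advance every particle by one time step using F = m*a.

    compute_forces(positions) is a hypothetical user-supplied function
    that returns one force value per particle.
    """
    forces = compute_forces(positions)
    new_positions, new_velocities = [], []
    # First "bazillion": the same arithmetic for every particle.
    # Each iteration is independent, which is what makes it data parallel.
    for x, v, m, f in zip(positions, velocities, masses, forces):
        a = f / m            # F = m*a, solved for a
        v = v + a * dt       # update velocity first ...
        x = x + v * dt       # ... then position (semi-implicit Euler)
        new_velocities.append(v)
        new_positions.append(x)
    return new_positions, new_velocities


def simulate(positions, velocities, masses, compute_forces, dt, n_steps):
    """Second "bazillion": repeat the per-particle update n_steps times."""
    for _ in range(n_steps):
        positions, velocities = step(
            positions, velocities, masses, compute_forces, dt
        )
    return positions, velocities
```

[The time steps have to run one after another; it's the loop over particles inside each step that a GPU can spread across its threads.]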

First generates neighbor list for each particle - what particles does each particle interact with? Mainly based on distance. #HailGPU
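[For concreteness, here is the most naive way to build such a list: check every pair against a cutoff distance. This is my own minimal sketch, not what her code does; the function name and the plain distance cutoff are assumptions.]

```python
import math


def neighbor_list(points, cutoff):
    """Return, for each particle, the indices of all particles within cutoff.

    Brute force: every pair is tested once, so the cost grows as N^2.
    """
    neighbors = [[] for _ in points]
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cutoff:
                # Interaction is symmetric, so record it in both lists.
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors
```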

Response to Q: Says can reduce neighbor calc from N^2 to less (but not "Barnes-Hut"), but no slides for that. #HailGPU
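[She had no slides on how, so this is a guess at the general idea rather than her method. One common way to get below N^2 is a cell list: drop the particles into a grid of cutoff-sized cells, then only compare each particle with those in its own and adjacent cells. A minimal 2D sketch under that assumption; all names are mine.]

```python
import math
from collections import defaultdict


def cell_neighbor_list(points, cutoff):
    """Neighbor list for 2D points using a grid of cutoff-sized cells."""
    # Bin every particle by the grid cell it falls in.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(math.floor(x / cutoff), math.floor(y / cutoff))].append(idx)

    neighbors = [[] for _ in points]
    for i, (x, y) in enumerate(points):
        cx, cy = math.floor(x / cutoff), math.floor(y / cutoff)
        # Anything within cutoff must sit in this cell or one of the 8 around it.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), ()):
                    if j != i and math.dist(points[i], points[j]) <= cutoff:
                        neighbors[i].append(j)
        neighbors[i].sort()
    return neighbors
```

[With roughly uniform density, each particle is compared against a bounded number of candidates, so the work grows about linearly with N instead of as N^2.]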

Typically have ~100 neighbors per particle. #HailGPU [Aha! This is where a chunk of the 100X speedup comes from! For each molecule or whatever, do exactly the same code in simple SIMD parallel for all 100 neighbors, at the same time, just varying their locations. That assumes they have 100 threads; I think they do, but would have to check. !Added in edit to this post!]

Says get same perf on $1200 GPU workstation as on $20,000 cluster. (whole MD code HOOMD-Blue) #HailGPU [I think I may have the numbers slightly wrong here – may have been $40,000, etc. – but the spirit is right; see the slides and presentation for what she exactly said.]

Most people would rewrite code for 3X speedup. For 100X, do it yesterday. #HailGPU

Done work on "patchy nanotetrahedra" forming strands that bundle together spontaneously. #HailGPU

"Monte Carlo not so data parallel" (I don't agree.) #HailGPU

Used to be a guy at IBM who did molecular dynamics on mainframes with attached vector procs. It is easy to parallelize. #HailGPU [Very, very easy. See "bazillions" above. In addition, lots of floating-point computing at each individual F=ma calculation.]

Guy at IBM was Enrico something-or-other. Forget last name. #HailGPU [Unfortunately, the only things after "Enrico" that come to my mind are "Fermi" -- which I know is wrong -- and, for some unknown psychological reason, "vermicelli." Also um, wrong. But tasty.]

Worked on how water molecules interacted. Thought massively parallel was trash. #myenemy #HailGPU

Trying to design material that, when you do something like turn on a light, changes: becomes opaque, starts flowing, etc. #HailGPU

Also studying Tetris as a "primitive model of complex patchy particles" Like crystal structures form. #HailGPU

Students named their software suite "Glotzilla". Uh huh. She doesn't object. Self-analysis code called Freud. #HailGPU

My general take: MD simulation is a field in need of massive compute capabilities, is pleasantly parallel, more FPUs=good. #HailGPU

Answer to post-talk Q: Her Monte Carlo affects state of the system, can't accept a move that isn't legal and affects others. Strange.

Limits of GPU usability relate to memory size. They can do 100K particles, with limited-range interaction. #HailGPU

So if you have a really large-scale problem, can't use GPUs without going off-chip and losing a LOT. #HailGPU

Talk over, insane volume of tweets will now cease. #HailGPU

End of Tweetstream.

I went to a lunch with her, but didn't get a chance to ask any meaningful questions. Well, this depends on what your definition of "meaningful" is; she grew up in NYC, and therefore thinks thin-crusted, soft, drippy pizza is the only pizza. As do I. But she folds it. Heresy! That muffles the flavor!

More (or less) seriously, molecular dynamics has always been an area in which it is really fairly simple to achieve tremendous parallel efficiency: Many identical calculations (except for the data), lots of floating-point for each calculation (electric charge force, van der Waals forces, etc.), not a whole lot of different data required for each calculation. I have no doubt whatsoever that she gets 75% efficiency; I wouldn't be surprised at even better results. But I think it would be a mistake to think it's easy to extend such results outside that area. It was probably well worth DARPA's investment, though, in terms of the materials science enabled. I mean, cloaking? Really?
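[To make "lots of floating-point, not a lot of data" concrete: below is the textbook Lennard-Jones pair force, a standard simple model for van der Waals-type interactions. Using it here is my assumption for illustration; I don't know which force fields her group actually uses. The default parameter values are arbitrary.]

```python
def lj_force(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones force magnitude at separation r (positive = repulsive).

    Derived from U(r) = 4*epsilon*((sigma/r)**12 - (sigma/r)**6),
    so F(r) = -dU/dr = 24*epsilon*(2*(sigma/r)**12 - (sigma/r)**6) / r.
    """
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r
```

[One input number per pair, a dozen or so floating-point operations, one output. Repeat for every neighbor of every particle at every time step.]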


VicDiesel said...

Enrico is probably Clementi.

And that's strange that he thought massive parallel is crap. Only time I saw him live (1986 or so) he was extolling the virtues of his ICL DAP 4096 processor box.

Greg Pfister said...

@Vic - Yes! Thanks! It definitely was Enrico Clementi.

The last time I saw him was back in the late 1980s, in dueling presentations mode. He quoted Seymour Cray's saying to me -- "I'd rather plow a field with four strong horses than with 512 chickens."

The change from Seymour's two oxen vs. 1024 chickens was mainly a not too subtle dig at the 512-way parallel system I was presenting (RP3), of which I was chief architect. He had four vector units strapped to an S/370 (or whatever they were then), hence the four vs. two.

I guess the tune changes depending on who's paying the piper. As is true of many of us.


Roland said...

I believe you mean

Greg Pfister said...

That's it! Thanks!

Commented on wrong post, but that's OK. I'll update the post to point to it.

Daryl said...

2 Gflops for $500 doesn't sound like a big deal. Maybe she meant 2 Tflops?
