Monday, May 3, 2010
All Hail the GPU! – a Tweetstream
I recently attended a talk at Colorado State University (CSU) by Sharon Glotzer, and tweeted what was going on in real time. Someone listening said he missed the start, and suggested I blog it. So, here is a nearly zero-effort blog post of my literal tweetstream, complete with hashtags.
I did add a few comments at the end, and some in the middle [marked like this]
Value add: This is a CSU Information Science and Technology (ISTeC) Distinguished Lecture Series. The slides and a video of the lecture will soon appear on the page summarizing all of the lectures. Keep scrolling down to the bottom. Many prior lectures are there, too.
At CSU talk by Sharon Glotzer UMich Ann Arbor "All Hail the GPU: How the Video Game Industry is Transforming Molecular & Materials Selection [no hashtag because I hit the 140 character limit]
Right now everybody's waiting to get the projector working #HailGPU
At meetup b4 talk, She said they redid their code in CUDA and overnight got 100X speedup #HailGPU [talked to her for about 3 minutes]
Backup projector found & works, presentation starting soon, I hope. #HailGPU
None of her affiliations is CS or EE -- all in materials, fluids, etc. "Good to be talking about our new tools." #HailGPU
First code she ever wrote as a grad student was for the CM-2. 64K procs. #HailGPU
Image examples of game graphics 2003, 05, 08 - 03 looks like Second Life. #HailGPU
Also b4 talk, said when they moved to Fermi they got another 2X. Not bothered by using CUDA; says "it's easy." #HailGPU
Just reviewing CPU vs. GPU arch now. #HailGPU
"Typical scientific apps running on GPUs are getting 75% of peak speed." Hoowha? #HailGPU [This is an almost impossibly large efficiency. Says more about her problems than about GPUs.]
"Huge infusion from DARPA to make GPGPUs" -- Huh? Again. Who? When? #HailGPU
"Today you can get 2GF for $500. That is ridiculous." #HailGPU [bold obviously added here, to better indicate what she said]
Answer to Q: Nvidia got huge funding from DARPA to develop GPGPU technology over last 5 years. #HailGPU [I didn't know that. It makes all kinds of sense.]
"If you've ever written MPI code, CUDA is easy. Summer school students do it productively. Docs 1st rate." #HailGPU [MPI? As in message-passing? Leads to CUDA, which is stream? Say what? Must be a statement unique to her problem domain.]
Who should use them? Folks with data-parallel problems. Yes, indeed. #HailGPU
She works on self-assembly of molecules. Like lipids self-assembled into membranes. #HailGPU
Her group doing materials that change (Terminator), multi-function & sensors (Iron Man), cloaking (illustration was a blank :-) #HailGPU [cloaking as in "Klingons"] [Bah.]
Said those kinds of things are "what the material science community is doing now." #HailGPU
Hm, not seeing tweets from anybody else. Is this thing working? // ra_livesey @gregpfister - it most certainly is, keep going [Just wanted some feedback; wasn't seeing anything else.]
Her prob, Molecular Dynamics, is F=ma across a bazillion particles a bazillion times. Yeah, data parallel. #HailGPU [The second bazillion is doing the first bazillion over a bazillion time steps.]
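The F = ma loop she describes is easy to sketch. Here's a hypothetical, minimal velocity-Verlet step in plain NumPy (names like `md_step` and `forces_fn` are mine, not hers or HOOMD's): the same arithmetic applied to every particle at once, repeated over many time steps.

```python
import numpy as np

def md_step(pos, vel, forces_fn, mass=1.0, dt=0.005):
    """One velocity-Verlet time step: the F = ma update, applied to
    every particle simultaneously. pos and vel are (N, 3) arrays."""
    f = forces_fn(pos)
    vel = vel + 0.5 * dt * f / mass       # first half-kick
    pos = pos + dt * vel                  # drift
    f_new = forces_fn(pos)
    vel = vel + 0.5 * dt * f_new / mass   # second half-kick
    return pos, vel

def run(pos, vel, forces_fn, n_steps):
    """The 'bazillion times' part: the same update, over many steps."""
    for _ in range(n_steps):
        pos, vel = md_step(pos, vel, forces_fn)
    return pos, vel
```

Every particle gets the identical update, differing only in its data -- which is why this maps so cleanly onto GPU hardware.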
First generates neighbor list for each particle - what particles does each particle interact with? Mainly based on distance. #HailGPU
Response to Q: Says can reduce neighbor calc from N^2 to less (but not "Barnes-Hut"), but no slides for that. #HailGPU
Typically have ~100 neighbors per particle. #HailGPU [Aha! This is where a chunk of the 100X speedup comes from! For each molecule or whatever, do exactly the same code in simple SIMD parallel for all 100 neighbors at the same time, just varying their locations. If they have 100 threads (I think they do, but I'd have to check). (Added in edit to this post.)]
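For the neighbor-list step, here's a hypothetical brute-force sketch in NumPy (my own names, not HOOMD's). This is the obvious O(N^2) version; the cell-list-style tricks she alluded to are what cut the cost below N^2.

```python
import numpy as np

def neighbor_list(pos, cutoff):
    """Brute-force O(N^2) neighbor list: for each particle, find every
    other particle within the interaction cutoff distance.
    pos is an (N, 3) array of particle positions."""
    diff = pos[:, None, :] - pos[None, :, :]   # (N, N, 3) displacements
    dist = np.sqrt((diff ** 2).sum(-1))        # (N, N) pair distances
    mask = (dist < cutoff) & ~np.eye(len(pos), dtype=bool)  # drop self-pairs
    return [np.flatnonzero(row) for row in mask]
```

With ~100 neighbors per particle, each particle's force sum is the same code run over 100 nearly identical pair interactions.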
Says get same perf on $1200 GPU workstation as on $20,000 cluster. (whole MD code HOOMD-Blue) #HailGPU [I think I may have the numbers slightly wrong here – may have been $40,000, etc. – but the spirit is right; see the slides and presentation for what she exactly said.]
Most people would rewrite code for 3X speedup. For 100X, do it yesterday. #HailGPU
Done work on "patchy nanotetrahedra" forming strands that bundle together spontaneously. #HailGPU
"Monte Carlo not so data parallel" (I don't agree.) #HailGPU
Used to be a guy at IBM who did molecular dynamics on mainframes with attached vector procs. It is easy to parallelize. #HailGPU [Very, very easy. See "bazillions" above. In addition, lots of floating-point computing at each individual F=ma calculation.]
Guy at IBM was Enrico something-or-other. Forget last name. #HailGPU [Unfortunately, the only things after "Enrico" that come to my mind are "Fermi" -- which I know is wrong -- and, for some unknown psychological reason, "vermicelli." Also um, wrong. But tasty.]
Worked on how water molecules interacted. Thought massively parallel was trash. #myenemy #HailGPU
Trying to design material that, when you do something like turn on a light, changes: becomes opaque, starts flowing, etc. #HailGPU
Also studying Tetris as a "primitive model of complex patchy particles." Like how crystal structures form. #HailGPU
Students named their software suite "Glotzilla". Uh huh. She doesn't object. Self-analysis code called Freud. #HailGPU
My general take: MD simulation is a field in need of massive compute capabilities, is pleasantly parallel, more FPUs=good. #HailGPU
Answer to post-talk Q: Her Monte Carlo moves affect the state of the system; she can't accept a move that isn't legal and affects others. Strange.
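To see the serial dependence she's pointing at, here's a hypothetical Metropolis sweep in plain Python (my sketch, not her code): each trial move is accepted or rejected against the state left behind by all the earlier moves, which is what makes naive Monte Carlo less data-parallel than the F = ma update.

```python
import math, random

def metropolis_sweep(x, energy, beta=1.0, step=0.5, rng=None):
    """One Metropolis sweep over a 1-D chain of particle coordinates x.
    energy(x) returns the total energy of the configuration."""
    rng = rng or random.Random(0)
    e = energy(x)
    for i in range(len(x)):
        old = x[i]
        x[i] = old + step * rng.uniform(-1.0, 1.0)  # trial move for particle i
        e_new = energy(x)
        # The accept/reject test runs against the state produced by all
        # earlier moves in this sweep -- a serial chain of dependencies.
        if e_new <= e or rng.random() < math.exp(-beta * (e_new - e)):
            e = e_new      # accept
        else:
            x[i] = old     # reject: roll the move back
    return x, e
```

Parallel Monte Carlo schemes exist, but they have to break or work around exactly this chain, which I suspect is what's behind her "not so data parallel" remark.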
Limits of GPU usability relate to memory size. They can do 100K particles, with limited-range interaction. #HailGPU
So if you have a really large-scale problem, can't use GPUs without going off-chip and losing a LOT. #HailGPU
Talk over, insane volume of tweets will now cease. #HailGPU
End of Tweetstream.
I went to a lunch with her, but didn't get a chance to ask any meaningful questions. Well, this depends on what your definition of "meaningful" is; she grew up in NYC, and therefore thinks thin-crusted, soft, drippy pizza is the only pizza. As do I. But she folds it. Heresy! That muffles the flavor!
More (or less) seriously, molecular dynamics has always been an area in which it is really fairly simple to achieve tremendous parallel efficiency: Many identical calculations (except for the data), lots of floating-point for each calculation (electric charge force, van der Waals forces, etc.), not a whole lot of different data required for each calculation. I have no doubt whatsoever that she gets 75% efficiency; I wouldn't be surprised at even better results. But I think it would be a mistake to think it's easy to extend such results outside that area. It was probably well worth DARPA's investment, though, in terms of the materials science enabled. I mean, cloaking? Really?
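As an illustration of "many identical calculations, lots of floating-point each," here's a hypothetical Lennard-Jones force kernel in NumPy (my sketch; real MD codes use neighbor lists and cutoffs rather than all pairs): every pair gets exactly the same arithmetic, just on different data.

```python
import numpy as np

def lj_forces(pos, eps=1.0, sigma=1.0):
    """Net Lennard-Jones force on every particle, computed over all
    pairs at once. pos is an (N, 3) array; returns an (N, 3) array."""
    diff = pos[:, None, :] - pos[None, :, :]        # (N, N, 3) displacements
    r2 = (diff ** 2).sum(-1) + np.eye(len(pos))     # pad diagonal to avoid /0
    inv6 = (sigma ** 2 / r2) ** 3                   # (sigma/r)^6
    # F_ij = 24 eps (2 (sigma/r)^12 - (sigma/r)^6) / r^2 * (r_i - r_j)
    coef = 24.0 * eps * (2.0 * inv6 ** 2 - inv6) / r2
    np.fill_diagonal(coef, 0.0)                     # no self-force
    return (coef[:, :, None] * diff).sum(axis=1)    # sum over partners
```

Identical floating-point work per pair, trivially independent across pairs: that, more than anything GPU-specific, is why efficiencies like 75% are believable in this domain.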