Tuesday, January 20, 2009

Multi-Multicore Single System Image / Cloud Computing. A Good Idea? (part 1)

Wouldn't it be wonderful if you could simply glue together several multicore systems with some software and make the result look like one bigger multicore system?

This sounds like something to make sysadmins salivate and venture capitalists trample each other in a rush to fund it. I spent a whole chapter of In Search of Clusters, both editions, explaining why I thought it was wonderful. Some of those reasons are echoed in today's Cloud Computing hubbub, particularly the ability to use a collection of computers as a seamless, single resource. The flavor is very different, and the implementation totally different, but the intent and the effects are the same.

Unfortunately, while this has been around for quite a while, it has never really caught on, despite numerous very competent attempts. One must therefore ask the embarrassing question: Why? What kind of computer halitosis does it have that it has never been picked up?

That's the subject of this series of posts. I'm going to start at the beginning, explaining what it really is (and why it's not unlike cloud computing); then describe some of the history of attempts to get it on the market; some of the technology that underlies it; and finally take a run at the central question: If it's so wonderful, can be implemented, and has appeared in products, why hasn't it taken over the world?

Doing this will span several posts; it's a long story. (This time I promise to link them in reading order.)

Note: This is about a purely software approach to multiplying the multi of multicore; it's done by distributing the operating system. Plain, ordinary, communications gear – Ethernet, likely, but whatever you like – is all that connects the systems. Using special hardware to pull the same trick will be the subject of later posts, as may techniques to distribute other entities to the same effect, like a JVM (Java Virtual Machine).

Next, a discussion of what this really is.


Anonymous said...

Hi Greg,

I enjoyed reading many of your comments about SSI not making it in the marketplace for various reasons. I do, however, want to point out that for most people in the High Performance Computing cluster world, SSI would be a huge win --- if/when the price comes down.

If you check with the regular HPC folks (not the Gov lab types), you will find everybody is hoping that a cost-effective SSI model comes about, because most everyone moved to clusters because they could not afford an SSI (SMP) machine large enough to meet their computing needs.



Greg Pfister said...

Hi, John.

Thanks for the comment. You may be correct. But there is OpenMosix, an open-source cluster SSI system. See the Wikipedia entry, which points to Moshe Bar's reasons for its end of life. If, as he said, it competed with multicores, I'd guess it didn't scale very far - but that could have been fixed, with motivation.

Greg Pfister

Cameron Bahar said...


I'm a fan of your book, and I read it after working at Locus Computing for a good number of years. I was interested to learn more about different cluster architectures and why Locus had essentially failed, even though we all thought it was so cool and great to have this technology. At Locus, working under the leadership of Jerry Popek and company, we did actually build an SSI Unix-flavored operating system for IBM called AIX-TCF (Transparent Computing Facility). IBM ran into trouble in 1992 and moved operations to Austin, Texas, and essentially disconnected from Locus over the years after Locus refused a buyout offer from IBM - which was a mistake. A lot of the Locus folks ended up at Interactive/Sun/Solaris-x86, and thus was born the Solaris x86 operating system. Locus's demise benefited SunSoft and Solaris x86, even though Sun didn't leverage the x86 platform - to its own demise! It will be interesting to see what Larry will do with Solaris.

Now to SSI and why it didn't take off. I think there are a number of reasons. Foremost, the concept and vision are certainly attractive. We had single-system semantics across nodes, and all OS services were distributed. We could migrate processes from one node to another. We had remote devices. We had a distributed replicating filesystem (I worked on that a little) and transparency all over the system. But I think the main problem was that Ethernet got us effectively 4-5 Mbits/sec on a 10 Mbit link, and we were a tightly coupled operating system passing messages for consistency and coherency between nodes. The slow network caused lots of problems, and things didn't always converge to a stable steady state. As you increase the number of nodes, the complexity grows, the number of messages increases, and scalability suffers. HA was also problematic and not well thought out, and we didn't have shared SANs to help keep data around. So a shared-nothing SSI cluster with a slow network is dead in its tracks.
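The scaling problem Cameron describes can be sketched with a toy back-of-envelope model (my own illustration, not Locus's actual protocol): if every node must notify every other node to keep kernel state coherent, total message traffic grows quadratically with node count, while each node's link bandwidth stays fixed at the ~4 Mbit/s he mentions. The message size and update rate below are arbitrary assumptions, chosen only to show the trend.

```python
# Toy model of all-to-all coherence traffic in a tightly coupled SSI
# cluster. All parameters (updates_per_node, msg_bits) are hypothetical.

def coherence_messages(nodes: int, updates_per_node: int = 100) -> int:
    """Total messages per round if each update goes to every other node."""
    return nodes * updates_per_node * (nodes - 1)

def per_link_load(nodes: int, updates_per_node: int = 100,
                  msg_bits: int = 8_000,
                  link_bits_per_sec: float = 4e6) -> float:
    """Fraction of a ~4 Mbit/s effective Ethernet link used per node.

    Each node sends to (nodes - 1) peers and receives from (nodes - 1)
    peers every round, so its own link load grows linearly with nodes
    even though total cluster traffic grows quadratically.
    """
    bits = 2 * (nodes - 1) * updates_per_node * msg_bits
    return bits / link_bits_per_sec

for n in (2, 8, 32):
    print(f"{n:2d} nodes: {coherence_messages(n):6d} msgs, "
          f"{per_link_load(n):.1f}x link capacity")
```

Under these made-up numbers, each node's link is already saturated somewhere below 8 nodes - consistent with the observation that a shared-nothing SSI cluster on slow Ethernet could not converge as node counts grew.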

I joined Teradata after Locus and there we had a shared nothing database with a dedicated fibre optic network with Gigabit links. Now this worked a lot better.

The distinction between Teradata and Locus is interesting, because Teradata is extremely successful and still leads in performance today.

What is the difference?

Teradata runs one application, and that is a parallel database for data mining. One function: the entire system is optimized for an RDBMS with a specific workload. They have a parallel application interface, but the whole thing always runs the database and is not GENERAL PURPOSE. So they pick a problem and solve it well.

With Locus, the idea was that this is GENERAL PURPOSE and all apps will run unmodified - it's the holy grail. As I've learned over the last 20 years, systems are built and optimized for certain workloads, and there's no one-size-fits-all system.

So HP/IBM/Intel/Tandem/Pyramid and all the others bought the Kool-Aid, but at the end of the day it just doesn't work and is not needed.

What worked was a Unix server and NFS for remote access to data. That model WON, and SSI lost. Why did this win? Because it was a loosely coupled collection of independent nodes, and the work was partitioned among the nodes with great autonomy between them, i.e., no message passing.

Jerry passed away last year, God rest his soul. He was a visionary and a great teacher. He also wrote the first paper on virtualization, which VMware is based on. At least the virtualization idea has caught on and has spawned the cloud computing era. Again: independent nodes with single operating systems providing service, no message passing.

A few years ago I founded a company called ParaScale (parallel scalability). We're building a cloud storage platform, which basically means application-layer software that runs on top of a number of Linux servers and federates them into a single cloud storage platform providing file services over standard protocols. Again, the idea here is loosely coupled, autonomous nodes, all operating with the guidance of a few master nodes that are highly available. By learning from the past and focusing on providing a specific service (in this case, storage services), we hope to achieve our goals of reliability, scalability, and virtualization.

I enjoyed reading your multi-part post on SSI and it brought back some good memories. We have a few Locus people working at ParaScale and still solving interesting distributed system problems.


Greg Pfister said...

Cameron, I have no idea why it took me so long to find and read your comment, but thank you!

You can add DEC to the list of people who bought the Kool-Aid. I was involved in that incarnation; it even had the AIX people ready to modify their kernel to include it (unheard of). Then Compaq intervened, I think. One way or the other, it didn't happen.

I agree NFS file servers won out, but I think you have to go a level deeper for why. The real why, I believe, was multiple things: HA, of course, and the application design / programming model I discussed above. That fit the NFS + servers model well, with no added magic technology.

Thanks again, and best of luck at ParaScale!

