What do people do when they have a difficult problem that is too big for one computer processor? They turn to a supercomputer or to distributed computing, one form of which is cloud computing.
- Processor Proliferation
- Why People Choose Super vs. Cloud
- Cloud as a Form of Distributed Computing
- Cloud is Not All Created Equal
Processor Proliferation
A computer contains a processor and memory. Essentially, the processor does the work and the memory holds information.
When the work is relatively basic, one processor is enough. When a problem involves many variables or large data sets, though, you sometimes need additional processors.
“Many applications in the public and private sector require massive computational resources,” explained Center for Data Innovation research analyst Travis Korte, “such as real-time weather forecasting, aerospace and biomedical engineering, nuclear fusion research and nuclear stockpile management.”
For those situations and many others, people need more sophisticated systems that can process the data faster and more efficiently. In order to achieve that, these types of systems integrate thousands of processors.
You can harness a large pool of processors in two basic ways. One is supercomputing. A supercomputer is very big and costly: it sits in one location with all of its many processors, and everything flows through the local network. The other is distributed computing, of which cloud computing is the most widely adopted form. In that model, the processors can sit in diverse geographical locations, with all communication passing over the Internet.
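The shared idea behind both approaches can be sketched in a few lines: split one big job across many workers. This is a toy illustration only (threads stand in for processors, and `simulate_cell` is a made-up placeholder for a heavy computation), not how either system is actually programmed.

```python
from concurrent.futures import ThreadPoolExecutor

def simulate_cell(cell_id):
    # Stand-in for one heavy slice of a larger computation.
    return cell_id * cell_id

def run_in_parallel(n_cells, n_workers=4):
    # Split one big job across many workers. In a supercomputer the
    # workers share one fast local network; in a cloud cluster they may
    # be servers in different locations talking over the Internet.
    # (Threads stand in for processors in this toy sketch.)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(simulate_cell, range(n_cells)))

print(run_in_parallel(8))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Either way, the results come back in order and are stitched together into one answer; the difference between the two architectures is how far, and how fast, the pieces of work have to travel.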
Why People Choose Super vs. Cloud
Since information moves so quickly between the processors in a supercomputer, they can all contribute to the same task, which makes supercomputers a great fit for applications that require real-time processing. The downside is that they are often prohibitively costly: they are built from the best processors available, fast memory, specially designed components, and elaborate cooling mechanisms. Nor is a supercomputer easy to scale: once the machine is built, adding processors becomes a project in itself.
In contrast, one reason that people choose the distributed computing of the cloud is that it is much more affordable. The design of a distributed network can be incredibly elaborate, but hardware components and cooling do not need to be high-end or specially designed. It scales seamlessly: processing power grows as additional servers (with their processors) are added to the network.
On the downside for the cloud, Korte commented that supercomputers have the advantage of sending data short distances through fast connections, while a distributed cloud architecture must send data through slower networks.
However, that is at odds with what supercomputing expert Geoffrey Fox of Indiana University (home of Big Red II) told the American Association of Medical Colleges: “Fox … says the cloud’s spare capacity often enables it to process a researcher’s data faster than a supercomputer, which can have long wait times.”
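Korte’s latency point can be made concrete with a back-of-envelope estimate. The numbers below are purely illustrative (a microsecond-scale, 100 Gbit/s local interconnect versus a 30 ms, 1 Gbit/s wide-area link), not measurements of any particular system.

```python
def transfer_time(num_bytes, latency_s, bandwidth_bps):
    # Time to move one message: fixed round-trip-style latency
    # plus the time to push the bits onto the wire.
    return latency_s + num_bytes * 8 / bandwidth_bps

msg = 1_000_000  # a 1 MB message
# Illustrative numbers only: a supercomputer interconnect vs. a
# wide-area Internet link between distant cloud servers.
local = transfer_time(msg, 1e-6, 100e9)  # ~1 us latency, 100 Gbit/s
wan = transfer_time(msg, 30e-3, 1e9)     # ~30 ms latency, 1 Gbit/s
print(f"interconnect: {local * 1000:.2f} ms, WAN: {wan * 1000:.2f} ms")
```

With numbers like these, the wide-area transfer takes hundreds of times longer, which is why latency-sensitive work favors the supercomputer while Fox’s point about queue times favors the cloud.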
When you check the weather ahead of time and are expecting clear skies, it’s easy to be irritated with the meteorologist. However, weather is extraordinarily complex and notoriously difficult to predict.
Weather forecasting systems often use supercomputers, said Korte. To properly determine how the weather might evolve in a given area, a supercomputer simulation works through huge datasets of temperature, wind, humidity, barometric pressure, sunlight, and so on, across time, and not just locally but globally. Getting reasonably accurate answers in real time means processing all of that data very quickly. Korte argued that real-time updates make a supercomputer necessary, although millions of real-time applications are hosted in the cloud.
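A rough, purely illustrative estimate shows why this adds up so fast: the work in a grid-based simulation grows with the number of grid points, the number of variables tracked at each point, and the number of time steps. Every number below is made up for the sake of the arithmetic.

```python
def forecast_cost(grid_points, variables, timesteps, ops_per_update=100):
    # Back-of-envelope operation count for a grid-based simulation.
    # All inputs are illustrative, not real model parameters.
    return grid_points * variables * timesteps * ops_per_update

# A coarse global grid: one million points, five variables (temperature,
# wind, humidity, pressure, sunlight), ten thousand time steps.
ops = forecast_cost(1_000_000, 5, 10_000)
print(f"{ops:.2e} operations")  # 5.00e+12 operations
```

Even this deliberately coarse setup lands in the trillions of operations per forecast run, and real models use far finer grids, which is why the data has to move between processors as fast as possible.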
Continuing this line of thinking, Korte said that distributed computing such as cloud is useful particularly for projects that “are not as sensitive to latency.” He continued, “For example, when NASA’s Jet Propulsion Laboratory (JPL) needed to process high volumes of image data collected by its Mars rovers, a computer cluster hosted on [a cloud provider] was a natural fit.”
Cloud as a Form of Distributed Computing
A Stack Overflow thread discussed the differences between cloud and distributed computing.
“[W]hat defines cloud computing is that the underlying compute resources … of cloud-based services and software are entirely abstracted from the consumer of the software / services,” commented elite user Nathan. “This means that the vendor of cloud based resources is taking responsibility for the performance / reliability / scalability of the computing environment.”
In other words, it’s easier since you don’t have to handle the maintenance and support.
Cloud is Not All Created Equal
We will continue this discussion in a second installment; before moving on, consider that describing these categories requires broad strokes. The truth is that quality varies widely between cloud systems. In fact, many “cloud” providers aren’t actually distributed, which also means they can’t offer true 100% high availability.
With Superb Internet, you benefit from InfiniBand (IB) technology and distributed storage, highly preferable to the centralized storage and Ethernet used by many providers. That technological upgrade will usually allow you to process data 300% faster than with Amazon or SoftLayer when measuring VMs with similar specs.
Note: Part Two will be coming soon…stay tuned!
By Kent Roberts