Big Data meets HPC: Convergence? (Part 1 of 2)

Data volume, velocity, and variety are increasing as consumer devices become more powerful. PCs, smartphones, and tablets are the instrumentation, along with the business applications that continually capture user input, usage patterns, and transactions. As devices become more powerful each year (indeed, every few months), both the volume of data generated and the speed of data flow increase with them. The variety of available applications and usage models for consumer devices is rapidly increasing as well.

Are the Big Data and HPC disciplines converging or diverging?

Holding more and more data in-memory, via in-memory databases and in-memory computing, is becoming increasingly important in Big Data and data management more broadly. HPC has always required very large memories due to both large data volumes and the complexity of the simulation models.

Image: Iguazu Falls. By Mario Roberto Duran Ortiz (Mariordo), own work, CC BY 3.0, via Wikimedia Commons

Volume and Velocity and Variety

As is often pointed out in the Big Data field, it is the analytics that matters. Collecting, classifying and sorting data is a necessary prerequisite. But until a proper analysis is done, one has only expended time, energy and money. Analytics is where the value extraction happens, and that must justify the collection effort.

Applications for Big Data include customer retention, fraud detection, cross-selling, direct marketing, portfolio management, risk management, underwriting, decision support, and algorithmic trading. Industries deploying Big Data applications include telecommunications, retail, finance, insurance, health care, and the pharmaceutical industry.

A wide variety of statistical methods and techniques is employed in the analytical phase. These can include higher-level AI or machine learning techniques such as neural networks, support vector machines, radial basis functions, and nearest neighbor methods. Such techniques require a large number of floating-point operations, which is characteristic of most HPC workloads.
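To make the floating-point cost concrete, here is a minimal sketch of a brute-force nearest neighbor search with a rough operation count. The function names, the cost model of three operations per dimension, and the workload sizes are all illustrative assumptions, not figures from any particular system.

```python
import math
import random

def nearest_neighbor(query, points):
    """Return (index, distance) of the point closest to `query` (brute force)."""
    best_index, best_sq = -1, math.inf
    for i, p in enumerate(points):
        # Squared Euclidean distance: one subtract, one multiply,
        # and one add per dimension.
        sq = sum((a - b) ** 2 for a, b in zip(query, p))
        if sq < best_sq:
            best_index, best_sq = i, sq
    return best_index, math.sqrt(best_sq)

def estimated_flops(num_queries, num_points, dims):
    """Rough estimate (assumed model): 3 operations per dimension, per point, per query."""
    return 3 * num_queries * num_points * dims

if __name__ == "__main__":
    rng = random.Random(0)
    # Small synthetic data set: 1,000 records with 8 features each.
    points = [[rng.random() for _ in range(8)] for _ in range(1000)]
    query = [rng.random() for _ in range(8)]
    print(nearest_neighbor(query, points))
    # Hypothetical workload: one million queries against one million
    # 100-dimensional records.
    print(f"{estimated_flops(10**6, 10**6, 100):.1e} floating-point operations")
```

Under this simple model, the hypothetical workload comes to about 3 × 10^14 floating-point operations for a single pass. Production systems would normally use indexing or approximate methods rather than brute force, but the arithmetic still grows with data volume and dimensionality.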

For one view on this, see a recent report and video on “Why HPC is so important to AI”.

With the right back-end applications and systems, it is possible to keep up with the growth in data and perform the deep analytics necessary to extract new insights about customers, their wants and desires, and their behavior and buying patterns. These back-end systems increasingly need to operate at the scale of HPC systems in order to keep up with ever more rapidly incoming data and to extract maximum value from it.

In Part 2 of this blog series, we’ll look at how Big Data and HPC environments differ, and at what they have in common.