Brian Hellman from LINBIT joins Dan Olds and Shahin Khan as their guest to discuss our recent blog and research paper, The Cost of Data Loss and How to Avoid It. The research goes beyond the traditional view of an “outage” of resources and looks at the value of the data that you are not collecting when there is downtime. In this podcast, the team looks at the consequences of data loss, the cost categories of data, and the solutions available to prevent data loss.
OrionX is a team of industry analysts, marketing executives, and demand generation experts. With a stellar reputation in Silicon Valley, OrionX is known for its trusted counsel, command of market forces, technical depth, and original content.
Data volumes, velocity, and variety are increasing as consumer devices become more powerful. PCs, smartphones and tablets are the instrumentation, along with the business applications that continually capture user input, usage patterns and transactions. As devices become more powerful each year (every few months!), the volume of data generated and the speed of data flow increase concomitantly. The variety of available applications and usage models for consumer devices is rapidly increasing as well.
Are the Big Data and HPC disciplines converging or diverging?
Holding more and more data in-memory, via in-memory databases and in-memory computing, is becoming increasingly important in Big Data and data management more broadly. HPC has always required very large memories due to both large data volumes and the complexity of the simulation models.
Iguazu Falls: By Mario Roberto Duran Ortiz Mariordo (Own work) CC BY 3.0, via Wikimedia Commons
Volume and Velocity and Variety
As is often pointed out in the Big Data field, it is the analytics that matters. Collecting, classifying and sorting data is a necessary prerequisite. But until a proper analysis is done, one has only expended time, energy and money. Analytics is where the value extraction happens, and that must justify the collection effort.
Applications for Big Data include customer retention, fraud detection, cross-selling, direct marketing, portfolio management, risk management, underwriting, decision support, and algorithmic trading. Industries deploying Big Data applications include telecommunications, retail, finance, insurance, health care, and the pharmaceutical industry.
A wide variety of statistical methods and techniques is employed in the analytical phase. These can include higher-level AI or machine learning techniques such as neural networks, support vector machines, radial basis functions, and nearest neighbor methods. Such techniques require a large number of floating point operations, which is also characteristic of most HPC workloads.
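To give a rough sense of why such methods are floating point intensive, here is a minimal sketch of a brute-force nearest neighbor search in Python. It is an illustration only, not a method taken from any particular Big Data product: the function names `nearest_neighbor` and `estimated_flops` are hypothetical, and the operation count assumes squared Euclidean distance with no indexing or approximation.

```python
import math


def nearest_neighbor(query, points):
    """Return (index, distance) of the point closest to `query`.

    Brute force: every stored point is compared against the query.
    Returns (-1, inf) if `points` is empty.
    """
    best_index, best_sq = -1, math.inf
    for i, point in enumerate(points):
        # Squared Euclidean distance between the query and this point.
        sq = sum((a - b) ** 2 for a, b in zip(query, point))
        if sq < best_sq:
            best_index, best_sq = i, sq
    return best_index, math.sqrt(best_sq)


def estimated_flops(num_queries, num_points, dims):
    """Rough floating point operation count for brute-force search.

    Assumption: each query/point pair costs `dims` subtractions,
    `dims` multiplications and `dims - 1` additions.
    """
    return num_queries * num_points * (3 * dims - 1)


if __name__ == "__main__":
    customers = [(3.0, 4.0), (1.0, 1.0), (5.0, 5.0)]
    print(nearest_neighbor((0.0, 0.0), customers))
    print(estimated_flops(10**6, 10**6, 100))
```

Under these assumptions, matching a million queries against a million records with 100 features each comes to roughly 3 x 10^14 floating point operations, which is why analytics at this scale starts to resemble an HPC workload.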
With the right back-end applications and systems, it is possible to keep up with the growth in data and perform the deep analytics necessary to extract new insights about customers, their wants and desires, and their behavior and buying patterns. These back-end systems increasingly need to be at the scale of HPC systems in order to stay on top of the ever more rapidly incoming data and to extract maximum value from it.
In Part 2 of this blog series, we’ll look at how Big Data and HPC environments differ, and at what they have in common.
Stephen Perrenod has lived and worked in Asia, the US, and Europe and possesses business experience across all major geographies in the Asia-Pacific region. He specializes in corporate strategy for market expansion and in cryptocurrency/blockchain, on a deep foundation of high performance computing (HPC), cloud computing and big data. He is a prolific blogger and author of a book on cosmology.