Vector processing was an unexpected topic to emerge from the International Supercomputing Conference (ISC), held last week.
On the Monday of the conference, a new leader on the TOP500 list was announced. The Sunway TaihuLight system uses a new processor architecture based on Single Instruction, Multiple Data (SIMD), with a pipeline that can perform eight 64-bit floating-point operations per cycle.
This got us thinking about vector processing, a time-honored system architecture that launched the supercomputing market. When microprocessors advanced enough to enable massively parallel processing (MPP) systems, and then Beowulf and scale-out clusters, the supercomputing industry moved away from vector processing and embraced the scale-out model.
Later that day, at the “ISC 2016 Vendor Showdown”, NEC presented its project “Aurora”, which aims to combine x86 clusters and NEC’s vector processors in the same high-bandwidth system. NEC has a long history of advanced vector processors with its SX architecture. Among many achievements, it built the Earth Simulator, a vector-parallel system that was #1 on the TOP500 list from 2002 to 2004. At its debut, it led the previous #1 by a wide margin of nearly 5x.
Close integration of accelerator technologies with the main CPU is, of course, a very desirable objective: it improves both programmability and efficiency. Along those lines, we should also mention the Convey system, which goes all the way: it extends the x86 instruction set and performs the computationally intensive tasks in an integrated FPGA.
A big advantage of vector processing is that it is part of the CPU, with full access to the memory hierarchy. In addition, compilers can do a good job of producing optimized code for it. For many codes, such as those used in climate modelling, vector processing is exactly the right architecture.
Vector-parallel systems extended the capability of vector processing and reigned supreme for many years, for very good reasons. But MPPs pushed vector processing back, and GPGPUs pushed it back further still. GPUs benefit from the high volumes of the graphics market and can deliver numerical acceleration with modest incremental engineering.
But as usual, when you scale more and more, you scale not just capability but also complexity! Little inefficiencies add up until they become a serious issue. At some point, you need to revisit the system and take steps, perhaps drastic ones. The Sunway TaihuLight system atop the TOP500 list is an example of this kind of rethinking. And there are new applications, such as deep learning, that look like they could use vectors to significant advantage.
There are other efforts towards building new exascale-class CPUs, such as the “Neo” processor that Rex Computing is developing.
Will vector processing re-emerge as a lightweight, high-performing architecture?
What is the likelihood?