The HPSS Collaboration between IBM and what are now five DOE National Laboratories (Lawrence Berkeley, Lawrence Livermore, Los Alamos, Oak Ridge, and Sandia) began in the fall of 1992. The goal was to produce a highly scalable, high-performance storage system.
The High Performance Storage System (HPSS) needed to provide scalable hierarchical storage management (HSM), archive, and file system services. No product meeting these requirements existed. When HPSS design and implementation began, scientific computing power and storage capabilities at a site such as a DOE national laboratory were measured as follows: computing power in a few tens of gigaflops, data archived in HSMs in a few tens of terabytes at most, data throughput rates to an HSM in a few megabytes/sec, and daily throughput with the HSM in a few gigabytes/day. At that time, the DOE national laboratory and IBM HPSS design team recognized that we were headed for a data storage explosion driven by computing power rising to teraflops/petaflops. That growth would require data stored in HSMs to rise to petabytes and beyond, data transfer rates with the HSM to rise to gigabytes/sec and higher, and daily throughput with an HSM to rise to tens of terabytes/day. Therefore, we set out to design and deploy a system that would scale by a factor of 1,000 or more and evolve from the base above toward these expected targets and beyond.
Anticipating Future Storage Demands
Because of the highly scalable HPSS architecture, these targets have been successfully met. We now recognize that computing power will rise to exaflops, with a corresponding need to scale storage in its various dimensions by another factor of 1,000. Further, other major application domains, such as real-time data collection, also require such extreme-scale storage. We believe the HPSS architecture and basic implementation, built around a scalable relational database management system (IBM’s Db2), make HPSS well suited to this challenge.