HPSS for GPFS at SC07

Are you having problems with your existing data management system? Are you having scaling issues? Are you bottlenecked at a single server? Are you finding it difficult to back up or HSM-manage your files because you have too many of them? Are you experiencing poor tape drive performance? If so, you really need to keep reading!

The IBM Billion File Demo showcased the policy scan performance of General Parallel File System (GPFS) Information Lifecycle Management (ILM), and served as the springboard for introducing the new GPFS/HPSS Interface (GHI) for the High Performance Storage System (HPSS). At IBM's Almaden Research Center, a pre-GA version of GPFS was able to scan a single GPFS file system containing a billion files in less than 15 minutes!
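To put the demo figure in perspective, here is a quick back-of-the-envelope calculation of the average scan rate it implies. Because the scan finished in under 15 minutes, this is a lower bound on the real average rate. The function name is only illustrative.

```python
def files_per_second(file_count, minutes):
    """Average rate needed to scan file_count files in the given number of minutes."""
    return file_count / (minutes * 60)

# Billion File Demo figures: 10**9 files in (under) 15 minutes.
rate = files_per_second(1_000_000_000, 15)
print(f"{rate:,.0f} files/s")  # 1,111,111 files/s
```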

GPFS/HPSS Interface (GHI)

Why is the speed of the GPFS ILM policy scan important? GHI uses the policy scan results to manage GPFS disk resources with HPSS, IBM's highly scalable Hierarchical Storage Management (HSM) system, and to back up the GPFS namespace to HPSS tape. The faster the file system can be scanned, the sooner GHI can start copying data between GPFS and HPSS tape. Faster scans can also run at more frequent intervals, which keeps the file system better managed.
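As a rough illustration of how scan results can drive data management, the sketch below filters a list of scan records down to migration candidates. Real GPFS ILM rules are written in GPFS's own SQL-like policy language, not in Python. The `ScanRecord` type, the 30-day idle threshold, and the size filter are all hypothetical choices made for the example, not GHI defaults.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class ScanRecord:
    """One row of a hypothetical policy-scan result list."""
    path: str
    size_bytes: int
    last_access: datetime

def migration_candidates(records, now, min_age_days=30, min_size_bytes=0):
    """Return files not accessed for min_age_days, oldest first.

    Plays the role of a rule such as "migrate files idle for 30 days";
    both thresholds are illustrative assumptions.
    """
    cutoff = now - timedelta(days=min_age_days)
    picked = [r for r in records
              if r.last_access < cutoff and r.size_bytes >= min_size_bytes]
    return sorted(picked, key=lambda r: r.last_access)

scan = [
    ScanRecord("/gpfs/proj/run1.dat", 2_000_000, datetime(2007, 9, 1)),
    ScanRecord("/gpfs/proj/run2.dat", 3_000_000, datetime(2007, 11, 10)),
]
print([r.path for r in migration_candidates(scan, datetime(2007, 11, 12))])
# ['/gpfs/proj/run1.dat']
```

A faster scan simply means this selection can be refreshed more often, so the candidate list stays closer to the file system's current state.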

The backup feature of GHI captures a point-in-time snapshot of the GPFS file system. If the GPFS file system should fail, GHI can help rebuild it: the restore feature re-populates the GPFS namespace from a point-in-time backup. Once the namespace has been restored, the file system is available for use, and file data are staged back from HPSS tape to GPFS as the files are accessed.
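The restore-then-stage-on-access idea can be sketched with a toy in-memory namespace. This assumes the backup holds only namespace metadata plus a pointer to each file's tape copy. `Entry`, `tape_copy`, and the `stage` callback are hypothetical names; this is not GHI's actual backup format or API.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    size: int
    tape_copy: str          # hypothetical identifier of the file's HPSS copy
    resident: bool = True   # True when the file data is on GPFS disk

def backup_namespace(fs):
    """Point-in-time copy of namespace metadata only (no file data)."""
    return {path: (e.size, e.tape_copy) for path, e in fs.items()}

def restore_namespace(snapshot):
    """Rebuild the namespace; every file starts out with its data on tape only."""
    return {path: Entry(size, copy, resident=False)
            for path, (size, copy) in snapshot.items()}

def open_file(fs, path, stage):
    """On first access, stage the data back from tape, then mark it resident."""
    entry = fs[path]
    if not entry.resident:
        stage(entry.tape_copy, path)
        entry.resident = True
    return entry

fs = {"/gpfs/run1.dat": Entry(4096, "hpss:/backup/run1.dat")}
restored = restore_namespace(backup_namespace(fs))
staged = []
open_file(restored, "/gpfs/run1.dat", lambda src, dst: staged.append(src))
print(staged)  # ['hpss:/backup/run1.dat']
```

Restoring only metadata is what makes the file system usable quickly: the namespace comes back at once, and the much larger data transfer is spread over subsequent accesses.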

The HSM feature of GHI manages the disk space of the GPFS file system, allowing you to store petabytes of GPFS files on terabytes of GPFS disk. As files are written to GPFS, they are copied to HPSS tape. As files age, their data are removed from the GPFS disks, leaving only the filename behind. To the user, the file appears unchanged: if the user accesses one of these files, GHI automatically recalls the file data from HPSS tape back to GPFS. GPFS ILM policy scans can also be used to bulk-stage a set of files from HPSS tape back to GPFS.
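The file lifecycle described above can be modeled as a small state machine: copy to tape, purge the disk data, recall on access. The state names follow common HSM terminology but are illustrative here. The high/low watermark purge is an assumed policy for deciding *when* aged files lose their disk data. It is not a description of GHI's internals.

```python
from enum import Enum

class State(Enum):
    RESIDENT = "data on GPFS disk only"
    PREMIGRATED = "data on GPFS disk and HPSS tape"
    MIGRATED = "data on HPSS tape only; name left on disk"

class HsmFile:
    def __init__(self, name, size):
        self.name, self.size, self.state = name, size, State.RESIDENT

    def copy_to_tape(self):
        # Newly written data is copied to tape; the disk copy stays.
        if self.state is State.RESIDENT:
            self.state = State.PREMIGRATED

    def purge(self):
        # Only files that already have a tape copy may lose their disk data.
        if self.state is State.PREMIGRATED:
            self.state = State.MIGRATED
            return self.size
        return 0

    def read(self):
        # Transparent recall: the user just opens the file.
        if self.state is State.MIGRATED:
            self.state = State.PREMIGRATED
        return self.state

def free_space(files_oldest_first, used, high_water, low_water):
    """Once usage exceeds high_water, purge the oldest files until usage <= low_water."""
    if used <= high_water:
        return used
    for f in files_oldest_first:
        if used <= low_water:
            break
        used -= f.purge()
    return used
```

Purging only files that are already on tape is what keeps the operation safe. Nothing is removed from disk unless a tape copy exists to recall it from.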

The file aggregation feature of GHI improves tape drive performance. On a typical file system, roughly 90% of the files occupy only 10% of the disk space: lots of small files. Copying small files to tape one at a time usually kills tape drive performance, because the per-file overhead keeps the drive from streaming. Not with GHI. To maximize tape drive performance, GHI bundles small files into large aggregates. At SC07, we bundled 10,000 small files into each aggregate and processed a dozen aggregates in parallel. So on a given policy scan, rather than writing 120,000 small files to tape, GHI wrote only twelve files. The result was tape write performance close to the limits of the tape drives!
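The grouping arithmetic and the parallel bundling step can be sketched as follows. Python's `tarfile` is used here only as a stand-in container format, because the source does not say what on-tape format GHI aggregates use. The function names are hypothetical, and the in-memory blobs stand in for what would be tape writes.

```python
import io
import tarfile
from concurrent.futures import ThreadPoolExecutor

def plan_aggregates(items, files_per_aggregate=10_000):
    """Split a list of small files into consecutive fixed-size groups."""
    return [items[i:i + files_per_aggregate]
            for i in range(0, len(items), files_per_aggregate)]

def build_aggregate(members):
    """Pack (name, bytes) pairs into a single in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def write_aggregates(files, files_per_aggregate=10_000, streams=12):
    """Build aggregates concurrently; each blob stands in for one large tape write."""
    groups = plan_aggregates(files, files_per_aggregate)
    with ThreadPoolExecutor(max_workers=streams) as pool:
        return list(pool.map(build_aggregate, groups))

# The SC07 numbers: 120,000 small files, 10,000 per aggregate.
print(len(plan_aggregates(list(range(120_000)))))  # 12
```

The design choice is simply to trade many small sequential writes for a few large ones. The tape drive then sees twelve long streaming writes instead of 120,000 short ones.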

As the HPSS Collaboration and our other customers know, HPSS also has no problem handling HUGE files. Do you need more performance than a single tape drive can offer? Perhaps you need to copy a 1 TB file to tape in less than 30 minutes, which works out to an average of about 556 MB/s. HPSS can stripe a file across multiple tapes to meet requirements like these, and the HPSS distributed Mover technology allows a single HPSS instance to achieve a very high aggregate system throughput.
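For the 1 TB in 30 minutes example, the minimum stripe width follows from dividing the required rate by the per-drive rate. The sketch below assumes decimal units (1 TB = 10^12 bytes) and an illustrative 120 MB/s per drive, a figure that does not come from the source. The round-robin block placement is a simplified model of striping and is not necessarily how HPSS lays out stripes.

```python
def stripe_width(file_bytes, seconds, drive_bytes_per_sec):
    """Smallest number of tape drives whose combined rate meets the deadline."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-file_bytes // (seconds * drive_bytes_per_sec))

def stripe_map(num_blocks, width):
    """Round-robin placement: block i goes to tape i mod width."""
    return [i % width for i in range(num_blocks)]

TB = 10**12               # decimal terabyte (assumption)
DRIVE_RATE = 120 * 10**6  # assumed 120 MB/s per drive, illustrative only
width = stripe_width(1 * TB, 30 * 60, DRIVE_RATE)
print(width)                 # 5
print(stripe_map(7, width))  # [0, 1, 2, 3, 4, 0, 1]
```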

Both GPFS and HPSS are distributed, parallel and highly scalable by design, and can move data at incredible speeds. That's why we say...

GPFS + HPSS = Extreme Storage Scalability!


What's New?
2017 HUF - The 2017 HPSS User Forum will be hosted by the High Energy Accelerator Research Organization (Kō Enerugī Kasoki Kenkyū Kikō, known as KEK) in Tsukuba, Japan, from October 16th through October 20th, 2017.

HPSS @ SC16 - SC16 is the 2016 international conference for high performance computing, networking, storage and analysis. SC16 will be in Salt Lake City, Utah, from November 14th through 17th. Come visit the HPSS folks at the IBM booth, and schedule an HPSS briefing at the IBM Executive Briefing Center.

2016 HUF - The 2016 HPSS User Forum will be hosted by Brookhaven National Laboratory in New York City, New York, from August 29th through September 2nd.

HPSS @ ISC16 - ISC16 is the 2016 International Supercomputing Conference for high performance computing, networking, storage and analysis. ISC16 will be in Frankfurt, Germany, from June 20th through 22nd. Come visit the HPSS folks at the IBM booth, and schedule an HPSS briefing at the IBM Executive Briefing Center.

Swift On HPSS - Leverage OpenStack Swift to provide an object interface to data in HPSS. With this OpenStack Swift Object Server implementation, directories of files and containers of objects can be accessed and shared across ALL interfaces. Contact us for more information, or download it now.

Capacity Leader - ECMWF (European Centre for Medium-Range Weather Forecasts) has a single HPSS namespace with 216 PB spanning 257 million files.

File-Count Leader - LLNL (Lawrence Livermore National Laboratory) has a single HPSS namespace with 62 PB spanning 940 million files.

ORNL - Oak Ridge National Laboratory cut its redundant-tape cost estimates by 75% with 4+P HPSS RAIT (Redundant Array of Independent Tapes) and enjoys large-file tape transfers reaching 872 MB/s.
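The 75% figure is consistent with simple parity arithmetic if the baseline was a full second (mirrored) tape copy. The item does not state the baseline, so that comparison is an assumption. Here is a minimal sketch:

```python
from fractions import Fraction

def parity_overhead(data_tapes, parity_tapes):
    """Redundant tape written per tape of user data."""
    return Fraction(parity_tapes, data_tapes)

mirror = parity_overhead(1, 1)   # assumed baseline: a full second copy
rait_4p = parity_overhead(4, 1)  # 4+P: one parity tape per four data tapes
reduction = 1 - rait_4p / mirror
print(reduction)  # 3/4
```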
Copyright 2015, IBM Corporation. All Rights Reserved.