HPSS for GPFS at SC07

Are you having problems with your existing data management system? Are you running into scaling issues, perhaps bottlenecked at a single server? Do you find it difficult to back up or HSM-manage your files because you have too many? Are you experiencing poor tape drive performance? If so, keep reading!

The IBM Billion File Demo showcased the scan performance of General Parallel File System (GPFS) Information Lifecycle Management (ILM) policies, and served as the springboard for introducing the new High Performance Storage System GPFS/HPSS Interface (GHI). At the Almaden Research Center, a pre-GA version of GPFS scanned a single GPFS file system containing a billion files in less than 15 minutes!
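A rough back-of-the-envelope calculation from those demo figures (a sketch, not a published benchmark):

```python
# Scan rate implied by the Billion File Demo:
# one billion files scanned in under 15 minutes.
files = 1_000_000_000
seconds = 15 * 60
rate = files / seconds
print(f"{rate:,.0f} files/second")  # roughly 1.1 million files/second
```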

GPFS/HPSS Interface (GHI)

Why is the speed of the GPFS ILM policy scan important? GHI uses the policy scan results to manage GPFS disk resources with HPSS, IBM's highly scalable Hierarchical Storage Management (HSM) system, and to back up the GPFS namespace to HPSS tape. The faster the file system can be scanned, the sooner GHI can begin copying data between GPFS and HPSS tape. Furthermore, policy scans can run at more frequent intervals, resulting in better management of the file system.

The backup feature of GHI captures a point-in-time snapshot of the GPFS file system. Should the file system fail, GHI can help rebuild it: the restore feature re-populates the GPFS namespace from a point-in-time backup. Once the namespace has been restored, the file system is available for use, and file data are staged back from HPSS tape to GPFS as files are accessed.

The HSM feature of GHI manages the disk space of the GPFS file system, allowing you to store petabytes of GPFS files on terabytes of GPFS disk. As files are written to GPFS, they are copied to HPSS tape. As files age, the file data are removed from the GPFS disks, leaving only the filename behind; to the user, the file appears unchanged. If the user accesses one of these files, GHI automatically recalls the file data from HPSS tape back to GPFS. GPFS ILM policy scans can also be used to bulk-stage a set of files back to GPFS from HPSS tape.
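Migration candidates of this kind are typically selected with GPFS ILM policy rules. The fragment below is purely illustrative: the pool names, threshold values, aging criterion, and external-pool script path are assumptions for the sketch, not GHI's documented configuration.

```
/* Hypothetical external pool representing HPSS-managed storage */
RULE EXTERNAL POOL 'hpss' EXEC '/opt/hpss/bin/ghi_migrate'

/* When the system pool passes 90% full, migrate the least recently
   accessed files (not touched in 30 days) until it drops to 70% */
RULE 'age_out' MIGRATE FROM POOL 'system'
     THRESHOLD(90,70)
     WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
     TO POOL 'hpss'
     WHERE (CURRENT_TIMESTAMP - ACCESS_TIME) > INTERVAL '30' DAYS
```

A scan driven by rules like these is exactly what benefits from the billion-file scan rates described above.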

The file aggregation feature of GHI improves tape drive performance. On most file systems, 90% of the files occupy only 10% of the disk space: lots of small files. Copying small files to tape one at a time usually kills tape drive performance, but not with GHI. To maximize tape drive performance, GHI bundles small files into large aggregates. At SC07, we bundled 10,000 small files into each aggregate and processed a dozen aggregates in parallel. Rather than writing 120,000 small files to tape on a given policy scan, GHI wrote only twelve aggregate files. The resulting tape write performance was close to the tape drive limits!
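A simple model shows why aggregation matters so much. The drive speed, file size, and per-tape-file overhead below are illustrative assumptions, not measured GHI or SC07 numbers:

```python
# Illustrative model: time to write many small files to tape,
# with and without aggregation. All numbers are assumptions.
files = 120_000
file_size_mb = 1.0            # assume 1 MB per small file
drive_mb_per_s = 120.0        # assumed native tape drive speed
per_file_overhead_s = 2.0     # assumed per-tape-file sync/mark overhead

def write_time(n_tape_files, total_mb):
    """Seconds to stream total_mb, paying a fixed cost per tape file."""
    return total_mb / drive_mb_per_s + n_tape_files * per_file_overhead_s

total_mb = files * file_size_mb
naive = write_time(files, total_mb)    # one tape file per small file
aggregated = write_time(12, total_mb)  # twelve 10,000-file aggregates

print(f"naive: {naive/3600:.1f} h, aggregated: {aggregated/3600:.2f} h")
```

Under these assumptions the naive approach spends almost all of its time on per-file overhead, while the aggregated writes run near the drive's streaming rate.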

As the HPSS Collaboration and our other customers know, HPSS also has no problem dealing with HUGE files. Do you need more performance than a single tape drive can offer? Perhaps you have a requirement to copy a 1 TB file to tape in less than 30 minutes. HPSS can stripe a file across multiple tape drives to meet these types of requirements, and the HPSS distributed Mover technology allows a single HPSS instance to achieve a very high total system throughput.

Both GPFS and HPSS are distributed, parallel and highly scalable by design, and can move data at incredible speeds. That's why we say...

GPFS + HPSS = Extreme Storage Scalability!


Copyright 2017, IBM Corporation. All Rights Reserved.