High Performance Storage System

About HPSS : Frequently Asked Questions
Tape storage in external vaults and off-site facilities
Question:

Does HPSS have the ability to designate backup copies of data for storage in an external vault or off-site facility?

Answer:

HPSS provides the capability to designate a set of tape volumes to be removed from the robotic tape library and stored in another facility.

Tape volumes containing backup copies of bitfiles can be selected for removal to a vault or off-site facility. The tape volumes must be in the End of Media (EOM) state in order to be a candidate for removal. HPSS provides a utility that allows the system administrator to specify the volumes to be removed. The utility then causes the tape library to eject the tapes.
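
As a rough illustration of that selection step, the following Python sketch filters a volume inventory down to backup-copy tapes in the EOM state. The volume fields and the eject action are hypothetical stand-ins for illustration, not the actual HPSS utility or its interfaces.

    # Hypothetical sketch: selecting tape volumes for off-site export.
    # The volume record, EOM state flag, and "eject" action are illustrative
    # stand-ins, not the real HPSS utility or its interfaces.
    from dataclasses import dataclass

    @dataclass
    class TapeVolume:
        label: str
        state: str          # e.g. "EOM" or "RW"
        is_backup_copy: bool

    def select_for_vault(volumes):
        """Return volumes holding backup copies that are safe to remove (EOM only)."""
        return [v for v in volumes if v.is_backup_copy and v.state == "EOM"]

    if __name__ == "__main__":
        inventory = [
            TapeVolume("A00001", "EOM", True),
            TapeVolume("A00002", "RW", True),    # still writable, not a candidate
            TapeVolume("A00003", "EOM", False),  # primary copy, keep in the library
        ]
        for vol in select_for_vault(inventory):
            print(f"eject {vol.label} for off-site vault")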

Devices and tape robots
Question:

Can HPSS support a mix of tape sizes (regular and extended length) and tape drives (enterprise and LTO)?

Answer:

Yes, these media and drive types are supported in any hardware-supported combination. Enterprise media are used with enterprise drives, and LTO media are used with LTO drives. HPSS favors mounting a tape volume in a drive that matches the configuration before trying other compatible drives. For example, if an LTO 6 tape needs to be mounted, HPSS will look for an available LTO 6 drive before trying to mount the tape in an LTO 7 drive. The same is true for enterprise drive configurations.
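
The mount preference can be pictured with the following Python sketch, which tries drives whose configured type exactly matches the tape generation before falling back to any compatible drive. The drive names and the compatibility table are assumptions for illustration only.

    # Conceptual sketch of the mount preference described above: exact-match
    # drives first, then any compatible drive. Names and the compatibility
    # table are assumptions, not HPSS configuration data.
    COMPATIBLE = {
        "LTO-6": ["LTO-6", "LTO-7"],   # LTO-7 drives can also handle LTO-6 media
        "LTO-7": ["LTO-7"],
    }

    def pick_drive(tape_type, free_drives):
        """free_drives: list of (drive_name, drive_type) currently available."""
        # First pass: exact generation match.
        for name, dtype in free_drives:
            if dtype == tape_type:
                return name
        # Second pass: any compatible drive.
        for name, dtype in free_drives:
            if dtype in COMPATIBLE.get(tape_type, []):
                return name
        return None  # no suitable drive free; the mount request waits

    print(pick_drive("LTO-6", [("drv1", "LTO-7"), ("drv2", "LTO-6")]))  # -> drv2
    print(pick_drive("LTO-6", [("drv1", "LTO-7")]))                     # -> drv1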


Question:

Is it possible to share a robotic tape library?

Answer:

The robotic library system can be shared with other management systems, but it is not recommended. Drives assigned to HPSS are assumed to be exclusively used by HPSS. However, by using the HPSS administrative interface, a drive may be locked and made available for use by other applications. This technique allows the drives to be shared for tasks such as off-hours backups.


Question:

How does HPSS accommodate replacing an old generation of tape drives and media with new generation technology?

Answer:

Tape cartridges in a storage class can be changed from one media type to another. Volumes with the old media type can be set to End of Media (EOM). Cartridges with the new media type can then be added, and new writes will go to the new tape cartridges.

As more advanced storage technology becomes available or old storage technology becomes obsolete, there may be a need to replace the existing tape technology used by HPSS. HPSS provides a capability to replace the currently used tape technology with another technology. With this capability, the old technology volumes are marked "Retired" and new technology volumes are then created in the same storage class. Files written to this storage class will be written to the new technology volumes while the old technology volumes are treated by HPSS as read-only. Attrition and "repacking" will move all of the files from the old technology volumes to the new technology volumes. When the old technology volumes are empty they can be removed from HPSS.

The repacking of old-technology cartridges can be done in the background. The repack process can run continuously and does not impact the day-to-day operation of HPSS.
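
The following Python sketch outlines the refresh workflow at a conceptual level: retire the old-generation volumes, add new-generation volumes to the same storage class, and repack until the retired volumes are empty. The classes and function names are hypothetical, not HPSS interfaces.

    # Illustrative outline of the technology-refresh workflow described above.
    # All names here are hypothetical; HPSS performs these steps through its
    # administrative interfaces and the repack process.
    class Volume:
        def __init__(self, label, generation, files=None):
            self.label = label
            self.generation = generation
            self.retired = False
            self.files = list(files or [])

    def retire_generation(volumes, old_gen):
        """Mark old-generation volumes retired; they are then treated as read-only."""
        for v in volumes:
            if v.generation == old_gen:
                v.retired = True

    def repack(volumes):
        """Move files from retired volumes onto writable ones (background repack)."""
        writable = [v for v in volumes if not v.retired]
        if not writable:
            return
        for v in volumes:
            if v.retired and v.files:
                writable[0].files.extend(v.files)
                v.files.clear()

    if __name__ == "__main__":
        sc = [Volume("OLD01", "gen5", ["fileA", "fileB"]), Volume("OLD02", "gen5", ["fileC"])]
        retire_generation(sc, "gen5")
        sc.append(Volume("NEW01", "gen6"))   # new-technology volumes join the class
        repack(sc)
        empties = [v.label for v in sc if v.retired and not v.files]
        print("retired volumes now empty and removable:", empties)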

File I/O
Question:

Please describe how a client is made aware of error conditions in the HPSS system (e.g., the data to be retrieved is inaccessible). How does HPSS handle error situations such as a communication abort or a crash during the writing of a new bitfile?

Answer:

Errors generated during reads or writes are returned to the user's client program. It is up to the client program to retry such operations. In the case of tape-only writes, the tape will eventually be marked End of Media (EOM) and a new tape will be used for writing data.

After a mount timeout, a tape will be marked End of Media and the drive error count is incremented. After a configurable number of drive errors, the drive will be disabled. This processing prevents continued failures associated with a bad tape volume or bad drives.

Errors generated during migration to tape are hidden from the user; the migration process repeats the operation until it succeeds. For read errors on staged files, the user will receive an error indicating the problem after a number of retries.

For drive or volume failures, HPSS alerts the administrator with messages in the Alarm and Event window.
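
A minimal Python sketch of the drive-error accounting described above, assuming a small illustrative threshold (the actual limit is configurable by the administrator):

    # Sketch of per-drive error counting and automatic disable after repeated
    # errors. The threshold value and class are illustrative only.
    DRIVE_ERROR_THRESHOLD = 3   # assumed configurable limit, not an HPSS default

    class DriveState:
        def __init__(self, name):
            self.name = name
            self.error_count = 0
            self.enabled = True

        def record_error(self, reason):
            self.error_count += 1
            print(f"{self.name}: error ({reason}), count={self.error_count}")
            if self.error_count >= DRIVE_ERROR_THRESHOLD:
                self.enabled = False
                print(f"{self.name}: disabled after repeated errors")

    drive = DriveState("tape_drive_07")
    for event in ["mount timeout", "I/O error", "mount timeout"]:
        drive.record_error(event)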

Scalability, Capacity, and Limits
Question:

What are the built-in limits in HPSS?

Answer:

The following Core Server limits are imposed in HPSS Release 6.2:

Subsystem
- Maximum number of HPSS subsystems: Unlimited
Storage Policy
- Total Accounting Policies: 1
- Total Migration Policies: 64
- Total Purge Policies: 64
Storage Characteristics
- Total Storage Classes: 192
- Total Storage Hierarchies: 64
- Total Classes Of Service: 64
- Maximum storage levels per hierarchy: 5
- Maximum storage classes per level: 2
- Maximum number of File Families: Unlimited
- Maximum number of copies of a bitfile: 4
- Maximum number of disk storage segments per bitfile: 10,000
- Minimum number of disk storage segments per bitfile: 1
Mover Device/PVL Drive
- Maximum Devices/Drives: Unlimited
- Total Devices per Mover Process: 64
(Note: It is possible to configure more than one Mover process per Mover node.)


Question:

What happens when the Core Server maximum number of requests is exceeded? Do the requests fail or are they queued?

Answer:

HPSS will queue requests if all Core Server request-handling threads are currently in use. If all of the configured I/O threads become busy, the Core Server returns HPSS_EBUSY. The Client API will retry requests that are rejected because the maximum number of I/O threads are in use.
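
The retry behavior can be pictured with the following hedged Python sketch, in which a client-side wrapper retries a request that is rejected while all I/O threads are busy. The HPSS_EBUSY value, the request function, and the retry parameters are placeholders for illustration, not the real Client API.

    # Sketch of retry-on-busy behavior. The error code, request stub, and retry
    # policy are illustrative stand-ins.
    import random
    import time

    HPSS_EBUSY = -16          # placeholder value for illustration only

    def send_request():
        """Stand-in for a client call that may be rejected while I/O threads are busy."""
        return HPSS_EBUSY if random.random() < 0.5 else 0

    def request_with_retry(max_retries=5, delay_seconds=0.1):
        for attempt in range(max_retries):
            rc = send_request()
            if rc != HPSS_EBUSY:
                return rc                  # success or a different error
            time.sleep(delay_seconds)      # back off while the Core Server is busy
        return HPSS_EBUSY                  # give up after the retry budget

    print("result:", request_with_retry())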


Question:

What are the theoretical and practical performance limitations with a single Core Server using DB2?

Answer:

The practical performance limitations are bound by the performance and size of the metadata disk and the size of the DB2 memory cache. Data movement should not be adversely affected as the number of files increases.


Question:

In an environment where millions of purge records have been generated (for example, because millions of files are kept in disk cache), is there a large overhead in tracking files' last access times in the purge records?

Answer:

We do not expect a significant performance impact in managing large numbers of purge records. HPSS development regularly looks to improve performance, including faster purge record handling.


Question:

Is there a way to restrict the number of concurrent writes into a storage class (different bitfiles, not striping)?

Answer:

Currently, there is no general way to strictly limit the number of writes into a storage class. For tape storage classes, there is a configurable limit on the number of active tape virtual volumes, which provides an upper bound on the number of concurrent writes that can be active in that storage class (since each write needs exclusive access to a tape virtual volume during the actual write operation). However, the same capability is not supported for disk storage classes.
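
As a conceptual analogy for that upper bound, the Python sketch below uses a semaphore sized to an assumed active-virtual-volume limit, so no more writers than that can hold a tape volume at once. This illustrates the bound, not how HPSS is implemented.

    # Analogy for the concurrency bound: each write must hold a tape virtual
    # volume, so the number of active virtual volumes caps concurrent writes.
    import threading
    import time

    ACTIVE_TAPE_VVS = 2                       # assumed configured limit
    vv_slots = threading.Semaphore(ACTIVE_TAPE_VVS)

    def tape_write(job_id):
        with vv_slots:                        # at most ACTIVE_TAPE_VVS writers at once
            print(f"job {job_id}: writing to a tape virtual volume")
            time.sleep(0.1)

    threads = [threading.Thread(target=tape_write, args=(i,)) for i in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()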


Question:

What are the limitations on the size of a bitfile? What are the restrictions on the number of tape volumes that a bitfile may span?

Answer:

HPSS has no practical limitation on the size of a bitfile, the number of bitfiles, or the size of a data set, since it uses unsigned 64-bit numbers to represent bitfile sizes and offsets, allowing for bitfiles up to one exabyte in size.

HPSS limits the number of disk storage segments per bitfile to 10,000, although in a properly configured HPSS system this limit should never be reached. There is no such hard limit for HPSS tape bitfiles, but experience has shown that problems arise in the underlying database when the number of segments grows beyond roughly 10,000 to 15,000. If you assume 10 TB tape cartridges and one tape segment per cartridge, the maximum practical size of an HPSS tape bitfile is about 100 PB.
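
The arithmetic behind that figure, as a small worked example in Python:

    # Worked version of the arithmetic above: with one tape segment per cartridge,
    # the practical segment limit times the cartridge capacity bounds the size of
    # a tape bitfile.
    segments = 10_000        # practical upper bound on tape segments per bitfile
    cartridge_tb = 10        # assumed 10 TB cartridges, as in the example above
    max_bitfile_pb = segments * cartridge_tb / 1_000
    print(f"maximum practical tape bitfile size: {max_bitfile_pb:.0f} PB")  # 100 PB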


Question:

Is there any mechanism that allows the number of requests performed by a given user or group of users to be monitored and limited, preferably dynamically?

Answer:

HPSS provides facilities to monitor and limit the number of requests performed. HPSS supports a Gatekeeper Service, implemented by a gatekeeper server, which receives RPCs from the HPSS client library in response to client bitfile creates, opens, closes, and explicit stages. These RPCs contain information on the identity of the end user, which allows the administrator to provide policy modules for monitoring and limiting resource usage. Sites can use and modify gatekeeper templates to implement these policies.
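
A hypothetical sketch of such a site policy is shown below: the gatekeeper sees each request together with the end user's identity and can reject requests once a per-user limit is reached. The function names and the limit are illustrative only, not the actual gatekeeper template interface.

    # Illustrative per-user request limiting of the kind a site policy could
    # implement. Names and the limit are assumptions, not HPSS interfaces.
    from collections import defaultdict

    MAX_ACTIVE_REQUESTS_PER_USER = 10    # assumed site-chosen limit

    active_requests = defaultdict(int)

    def site_policy_check(user, request_type):
        """Return True to admit the request, False to reject it."""
        if active_requests[user] >= MAX_ACTIVE_REQUESTS_PER_USER:
            print(f"reject {request_type} for {user}: too many active requests")
            return False
        active_requests[user] += 1
        return True

    def site_policy_complete(user):
        active_requests[user] = max(0, active_requests[user] - 1)

    print(site_policy_check("alice", "open"))   # admitted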

HPSS also has a basic Quota implementation using the Gatekeeper service. The system uses accounting data to limit the number of files and amount of data stored by users.

The FTP access file also allows limits on the number of concurrent FTP sessions.


Question:

How many removable media families does HPSS support?

Answer:

In HPSS, "File Family" is the term used for tape media family. HPSS supports an unlimited number of File Families.


Question:

Please explain the advantages of configuring two or more storage subsystems in a single HPSS instance versus multiple instances of HPSS in order to achieve scalability, separation, and growth.

Answer:

There are advantages in implementing two or more subsystems within one HPSS instance:

  • No performance degradation compared to two separate instances - There should be no performance impact of using subsystems versus two separate instances of HPSS. Subsystems allow the single instance to scale by adding processing resources as transaction performance requirements increase.
  • Single management interface - The complete system can be managed by one set of GUI screens and command line utilities. This will simplify the on-going operation, maintenance and administration of the system.
  • Use of a single core server machine - Both subsystems' core servers can be initially installed on the same machine, with provision to move them to separate machines in the future as the system grows. We would not recommend that two separate production instances of HPSS be installed on the same core server machine.
  • Sharing of tape drives between subsystems - Subsystems provide flexibility in sharing storage resources. Storage classes (disk and tape), tape drives, and tape libraries can be shared across subsystems or dedicated to a subsystem, as best fits the requirements.
Data Movers
Question:

How does HPSS select the optimum network path for data transfers to/from the clients?

Answer:

HPSS allows data transfers to be sent over the optimum delivery path by separating the control path from the data path. In selecting network interfaces for data transfers, HPSS allows the client to provide a network address that is used to determine which interface will be used for the data transfer. The client's network address and Mover routing information determine the path over which data transfers are directed. Additional network and routing information can be specified in the HPSS.conf file.
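
As a conceptual sketch of that address-based selection, the Python example below matches a client address against a simple routing table to choose a data interface. The table format is an assumption for illustration; the real rules are configured in HPSS.conf and the Mover configuration.

    # Conceptual sketch of choosing a data-path interface from the client's
    # network address. The routing table is hypothetical.
    import ipaddress

    # Hypothetical routing entries: (client network, data interface to use)
    ROUTING = [
        (ipaddress.ip_network("10.10.0.0/16"), "ib0"),     # clients on the IB fabric
        (ipaddress.ip_network("192.168.1.0/24"), "eth1"),  # lab LAN clients
    ]

    def data_interface_for(client_addr, default="eth0"):
        addr = ipaddress.ip_address(client_addr)
        for network, interface in ROUTING:
            if addr in network:
                return interface
        return default

    print(data_interface_for("10.10.3.7"))     # -> ib0
    print(data_interface_for("172.16.0.5"))    # -> eth0 (no match, default NIC)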

Metadata
Question:

How is the structure and content of the metadata used by HPSS made visible from the client point of view? Describe the utilities or mechanisms used to query and modify the metadata.

Answer:

HPSS provides a rich set of storage metadata and a robust set of APIs and administrative displays for viewing and modifying it. Attributes for the following metadata managed object types may be retrieved:

Bitfiles, Storage Segments, Storage Maps, Virtual Volumes, Physical Volumes, Mover Devices / Drives, Volumes, Cartridges, Filesets, Storage Classes, PVL Jobs / Queues, Log Files, and Server Configuration information.

In addition, APIs to set selected attributes of these managed objects are also supported.

Administrative displays for getting and setting attributes for the managed objects listed above are also provided.

In addition to the API and GUI interfaces described above, a set of utilities is provided for accessing metadata. Examples of these utilities are:

  • lshpss for listing the configuration information for Classes of Service, Hierarchies, Storage Classes, Migration Policies, Purge Policies, Physical Volumes, Mover Devices, PVL Drives, HPSS Servers, Movers, Accounting Policies, and Log Policies.
  • dump_sspvs to list the physical volumes known to an HPSS Storage Server.
  • dumpbf to display the storage characteristics of a particular bitfile.
  • lsvol to list the bitfiles that have storage segments on a particular volume.
  • dumppv_pvl to list the physical volumes managed by the Physical Volume Library (PVL).
  • dumppv_pvr to list the physical volumes managed by the Physical Volume Repository (PVR).
Repairing System Failures
Question:

How does HPSS handle the failure of single drive, volume or other storage device? If there is a failure of one drive or volume, are other drives/volumes affected?

Answer:

The failure of any single drive, volume, or other storage device affects only the data and bitfiles on that device, and any requests directly related to it. An alarm is raised, but other operations continue as normal.

The administrator may also lock drives or volumes from being accessed. When HPSS detects specific errors, it will automatically set a tape to End of Media or, potentially, lock a drive. For example, after a mount timeout, a tape will be marked End of Media and the drive error count is incremented. After a configurable number of drive errors, the drive will be disabled. This process prevents continued failures associated with a bad tape volume or bad drives.

Alarms are sent to the Alarm and Event window, and are also written to the HPSS log.


Question:

Please describe how HPSS isolates corrupted portions of the system for repair work, while normal operations continue on the remainder.

Answer:

The following provisions within HPSS allow corrupted portions of the system to be isolated and repaired while normal operation continues on the remainder.

An HPSS Mover can be taken offline when repairs are required.

A Virtual Volume can be marked "read-only." All subsequent attempts to allocate space on that Virtual Volume will fail (another VV in the same storage class will be selected, if available).

Locking a drive disallows its use by HPSS. Changing a drive's state to locked ensures that the drive will not be used for new mounts, but it does not force the dismount of any cartridge currently on the drive, nor does it affect an active job that has a volume mounted. Once that job completes and dismounts, the drive is unloaded and no further mounts take place. This can be useful when preventive maintenance is required on an operating drive.


Question:

Please describe the HPSS features that provide for system availability in event of failures, for example, continuing to provide the full complement of its services, possibly at reduced performance, after a hardware or software failure has been experienced.

Answer:

A number of features are provided by HPSS for high availability.

Servers may be configured to automatically respawn upon failure. HPSS servers register their address information in the Startup Daemon and connection contexts are maintained between HPSS servers. Should a server fail, any other server with connections to that server will re-establish its connection when the failed server restarts.

For reliability of metadata, DB2 logs are mirrored on separate physical storage units. DB2 HADR functionality is also used to replicate metadata to a standby core server. In addition, regular metadata backups are performed and archived for recovery.

Storage Subsystems allow the system to be partitioned into sub-units according to name space. A separate set of HPSS core software processes are associated with each Storage Subsystem. As a result, a failure in one Storage Subsystem should not impact the other.

An administrative interface for locking devices is supported. This allows devices with hardware errors to be taken offline. In addition, in response to selected errors, the HPSS software will automatically lock drives.


Question:

How does HPSS notify the operations staff that manual intervention is required?

Answer:

Although HPSS provides several windows that display the status of different components of the system, the Alarms and Events window displays the most information when manual intervention is required. The messages on this window are color-coded to indicate the severity level of the messages (red indicating the highest severity).

Events displayed in this window indicate that a significant event has just occurred which may be of interest to the operator, while alarms indicate that a server has detected an abnormal condition. If the operator left-clicks on a message, another window appears with complete information on that message. Using the message number associated with the message, the operator can locate additional information on each alarm in the HPSS Error Message Reference Manual.

