Cluster Administration and Storage Tools (CAST)

CAST comprises several open-source components:

Cluster System Management (CSM)

Cluster System Management (CSM) is a cognitive, self-learning system for managing and overseeing an HPC cluster. CSM interacts with a variety of open-source IBM tools for supporting and maintaining a cluster, covering areas such as:

  • Discovery and management of system resources
  • Database integration (PostgreSQL)
  • Job launch support (workload management, cluster, and allocation APIs)
  • Node diagnostics (diag APIs and scripts)
  • RAS events and actions
  • Infrastructure health checks
  • Python bindings for C APIs

Burst Buffer

The Burst Buffer is an I/O data caching technology that can improve I/O performance for a large class of high-performance computing applications without requiring intermediary hardware.

Burst Buffer provides:

  • A fast storage tier between compute nodes and the traditional parallel file system
  • Overlapping job stage-in and stage-out of data for checkpoint and restart
  • Scratch volumes
  • Extended memory I/O workloads
  • Usage and SSD endurance monitoring
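
The stage-in, checkpoint, and stage-out flow listed above can be illustrated with a minimal conceptual sketch. It does not use the Burst Buffer API: the function names (stage_in, write_checkpoint, stage_out, run_demo) and the "pfs" and "ssd" directory names are hypothetical, and two temporary directories stand in for the parallel file system and the fast storage tier. The copies here run sequentially, whereas a real burst buffer would overlap them with computation.

```python
import shutil
import tempfile
from pathlib import Path


def stage_in(pfs_dir: Path, fast_dir: Path, names: list) -> None:
    """Copy input files from the (simulated) parallel file system to the fast tier."""
    fast_dir.mkdir(parents=True, exist_ok=True)
    for name in names:
        shutil.copy2(pfs_dir / name, fast_dir / name)


def write_checkpoint(fast_dir: Path, step: int, state: str) -> Path:
    """Write a checkpoint to the fast tier so the job can return to computing quickly."""
    path = fast_dir / f"checkpoint_{step:04d}.txt"
    path.write_text(state)
    return path


def stage_out(fast_dir: Path, pfs_dir: Path, pattern: str = "checkpoint_*.txt") -> list:
    """Copy checkpoints from the fast tier back to the (simulated) parallel file system."""
    copied = []
    for src in sorted(fast_dir.glob(pattern)):
        shutil.copy2(src, pfs_dir / src.name)
        copied.append(src.name)
    return copied


def run_demo() -> list:
    """Run one stage-in, three checkpointed steps, and one stage-out."""
    with tempfile.TemporaryDirectory() as root:
        pfs = Path(root) / "pfs"   # hypothetical stand-in for the parallel file system
        fast = Path(root) / "ssd"  # hypothetical stand-in for the node-local fast tier
        pfs.mkdir()
        (pfs / "input.dat").write_text("initial state")

        stage_in(pfs, fast, ["input.dat"])
        state = (fast / "input.dat").read_text()
        for step in range(3):
            state = f"{state}+{step}"  # placeholder for real computation
            write_checkpoint(fast, step, state)
        return stage_out(fast, pfs)


if __name__ == "__main__":
    print(run_demo())
```

The point of the sketch is the ordering: the job reads and writes only against the fast tier while it runs, and the slower parallel file system is touched only during staging.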