Cluster System Management (CSM)
CSM is a cognitive, self-learning system for managing and overseeing an HPC cluster. CSM interacts with a variety of open-source IBM tools for supporting and maintaining a cluster, in areas such as:
- Discovery and management of system resources
- Database integration (PostgreSQL)
- Job launch support (workload management, cluster, and allocation APIs)
- Node diagnostics (diag APIs and scripts)
- RAS events and actions
- Infrastructure health checks
- Python bindings for C APIs
Table of Contents