Elasticsearch Guide

Slurm provides multiple Job Completion Plugins. These plugins are an orthogonal way to provide historical job accounting data for finished jobs.

In most installations, Slurm is already configured with an AccountingStorageType plugin — usually slurmdbd. In these situations, the information captured by a completion plugin is intentionally redundant.

The jobcomp/elasticsearch plugin can be used together with a web layer on top of the Elasticsearch server, such as Kibana, to visualize your finished jobs and the state of your cluster. Some of these visualization tools also let you create dashboards, diagrams, tables, and histograms, and apply customized filters when searching.

Prerequisites

The plugin requires additional libraries for compilation:

Configuration

The Elasticsearch instance should be running and reachable from every configured SlurmctldHost. Refer to the official Elasticsearch documentation for further details on setup and configuration.
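Reachability can be checked from each controller host before the plugin is enabled. The following is a minimal sketch, assuming Elasticsearch answers plain HTTP without authentication on localhost:9200 as in the examples later in this guide. The ES_URL variable and the es_reachable and es_report function names are illustrative and not part of Slurm or Elasticsearch.

```shell
# Assumption: base URL of the Elasticsearch server; override ES_URL as needed.
ES_URL="${ES_URL:-http://localhost:9200}"

# Return 0 if the server answers an HTTP request within 5 seconds.
# -s silences progress output, -f turns HTTP errors (status >= 400) into failures.
es_reachable() {
    curl -sf --max-time 5 "$ES_URL/" > /dev/null
}

# Print a one-line verdict for the host this is run on.
es_report() {
    if es_reachable; then
        echo "reachable: $ES_URL"
    else
        echo "unreachable: $ES_URL"
    fi
}
```

Running es_report on every SlurmctldHost, primary and backups alike, shows whether each of them can reach the server.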

There are three slurm.conf options related to this plugin:

Visualization

Once jobs are being indexed, it is a good idea to use a web visualization layer to analyze the data. Kibana is a recommended open-source data visualization plugin for Elasticsearch. After installing it, configure an Elasticsearch index name or pattern so that Kibana knows which data to retrieve. With the data loaded, you can create tables where each row is a finished job, ordered by any column you choose (the @end_time timestamp is suggested), as well as any dashboards, graphs, or other analysis of interest.

Testing and Debugging

For debugging purposes, you can use the curl command or any similar tool to perform REST requests against Elasticsearch directly. Some of the following examples using the curl tool may be useful.

Query information about the index, assuming it is named slurm. The output includes the document count (docs.count), which should equal the number of jobs indexed:

$ curl -XGET http://localhost:9200/_cat/indices/slurm?v
health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   slurm 103CW7GqQICiMQiSQv6M_g   5   1          9            0    142.8kb        142.8kb

Query all indexed jobs in the slurm index:

$ curl -XGET 'http://localhost:9200/slurm/_search?pretty=true&q=*:*' | less
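Dumping every document becomes unwieldy as the index grows. The sketch below builds a URI search that returns only the most recently finished jobs. It assumes the @end_time field named in the Visualization section is present in your documents and mapped as a date; if the sort is rejected, inspect a sample document from the query above to find the actual field name. The recent_jobs_url and recent_jobs helpers are hypothetical names; sort and size are standard Elasticsearch URI search parameters.

```shell
# Assumptions: server address and index name match the examples in this guide.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="${INDEX:-slurm}"

# Build a URI-search URL for the N most recently finished jobs (default 5),
# sorted by the assumed @end_time field in descending order.
recent_jobs_url() {
    n="${1:-5}"
    echo "$ES_URL/$INDEX/_search?pretty=true&sort=@end_time:desc&size=$n"
}

# Fetch and print the matching documents as pretty-printed JSON.
recent_jobs() {
    curl -s "$(recent_jobs_url "$1")"
}
```

For example, recent_jobs 10 | less pages through the ten latest documents.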

Delete the slurm index (caution!):

$ curl -XDELETE http://localhost:9200/slurm
{"acknowledged":true}

List the available _cat endpoints. More can be found in the official Elasticsearch documentation:

$ curl -XGET http://localhost:9200/_cat

Failure management

When the primary slurmctld is shut down, information about all completed but not yet indexed jobs held within the Elasticsearch plugin is saved to a file named elasticsearch_state, located in the StateSaveLocation. This lets the plugin restore the information when slurmctld is restarted; the jobs are then sent to the Elasticsearch database once the connection is restored.
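To confirm that the state file was written after a shutdown, its presence can be checked under the configured StateSaveLocation. This is a sketch under the assumption that scontrol show config prints a line of the form "StateSaveLocation = <path>"; the state_save_location and state_file_status function names are illustrative only.

```shell
# Extract the StateSaveLocation path from `scontrol show config` output
# read on standard input.
state_save_location() {
    awk -F' *= *' '$1 == "StateSaveLocation" { print $2 }'
}

# Report whether the elasticsearch_state file exists in the given directory.
# $1: the StateSaveLocation directory.
state_file_status() {
    if [ -e "$1/elasticsearch_state" ]; then
        echo "present: $1/elasticsearch_state"
    else
        echo "absent: $1/elasticsearch_state"
    fi
}
```

A typical invocation, run as a user allowed to read the StateSaveLocation (usually SlurmUser or root), would be:

$ state_file_status "$(scontrol show config | state_save_location)"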

Acknowledgments

The Elasticsearch plugin was created as part of Alejandro Sanchez's Master's Thesis.

Last modified 6 August 2021