TLS Certificate Manager

Overview

The certmgr plugin interface can be used alongside the tls plugin interface to dynamically create and renew signed certificates for slurmd nodes.

certmgr/script

The certmgr/script plugin uses scripts to perform the operations needed to validate node identity and generate signed certificates.

OpenSSL Example

This example uses the openssl CLI to generate certificate signing requests (CSRs) and to sign those requests to create signed certificates. It is not meant to be used in production, and is only meant to show the intended responsibilities of each script.

In this example, several things need to be in place on each machine before Slurm can do its certificate management.

slurmctld will need access to the CA certificate, and the CA certificate/key pair must be owned by SlurmUser (this is NOT recommended in a production setting). See the TLS page for more info on how to generate this certificate/key pair.

The following scripts need to be created and configured. See CertificateManagerParameters for more details on each script.

  • get_node_token_script
  • generate_csr_script
  • validate_node_token_script
  • sign_csr_script

slurmctld needs to be able to validate slurmd's certificate signing request. This is done via unique tokens that are retrieved on slurmd nodes using get_node_token_script, and validated on the slurmctld host using validate_node_token_script.

A unique token needs to be generated for each slurmd. Each token is stored on its respective slurmd host, as well as in a comprehensive list on the slurmctld host that contains all node tokens. slurmd sends this token to slurmctld along with the certificate signing request that it generates at runtime, and slurmctld validates the token before creating a signed certificate.

This is a simple example of how these tokens can be generated and stored:
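# NODENAME must be set to the Slurm node name, e.g. NODENAME=n1
# generate base64 32 character random token
base64 /dev/urandom | head -c 32 > "${NODENAME}_token.txt"

# add token to token list (echo supplies the newline the token file lacks)
echo "$(cat "${NODENAME}_token.txt")" >> node_token_list.txt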

Node "n1" needs to boot up with n1_token.txt and/or have n1_token.txt securely transferred to it. slurmctld needs to have secure access to node_token_list.txt in order to validate node tokens with validate_node_token_script.
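
For more than a handful of nodes, the same steps can be wrapped in a loop. The following is a minimal sketch and not part of Slurm: generate_node_tokens is a hypothetical helper, and the output directory and node names are placeholders. Instead of truncating a base64 stream, it reads 24 random bytes, which encode to exactly 32 base64 characters.

```shell
#!/bin/bash

# Hypothetical helper (not part of Slurm): create one token file per node
# plus a combined list for the slurmctld host.
# Usage: generate_node_tokens OUT_DIR NODE...
generate_node_tokens() {
    local out_dir=$1 node token old_umask
    shift
    mkdir -p "$out_dir" || return 1

    # Tokens are secrets: create files readable by the owner only.
    old_umask=$(umask)
    umask 077

    for node in "$@"; do
        # 24 random bytes encode to exactly 32 base64 characters.
        token=$(head -c 24 /dev/urandom | base64)
        # Per-node token file, to be transferred securely to that node.
        printf '%s' "$token" > "$out_dir/${node}_token.txt"
        # One token per line in the list kept on the slurmctld host.
        printf '%s\n' "$token" >> "$out_dir/node_token_list.txt"
    done

    umask "$old_umask"
}
```

For example, generate_node_tokens ./tokens n1 n2 writes ./tokens/n1_token.txt, ./tokens/n2_token.txt and ./tokens/node_token_list.txt.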

The get_node_token_script and generate_csr_script paths need to point to scripts that exist and are executable on slurmd nodes.

get_node_token_script example:

Print token to stdout. Return zero exit code for success, and non-zero exit code for error.

#!/bin/bash

# Slurm node name is passed in as arg $1
TOKEN_PATH=/etc/slurm/certmgr/${1}_token.txt

# Check if token file exists
if [ ! -f "$TOKEN_PATH" ]
then
    echo "$BASH_SOURCE: Failed to resolve token path '$TOKEN_PATH'"
    exit 1
fi

# Print token to stdout
cat "$TOKEN_PATH"

# Exit with exit code 0 to indicate success
exit 0
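
Before a script is wired into Slurm, its contract (output on stdout, zero exit code on success) can be checked by hand. The following is a minimal sketch under that assumption: check_script is a hypothetical helper, not a Slurm tool, and it only suits scripts that are expected to print something on success.

```shell
#!/bin/bash

# Hypothetical helper (not part of Slurm): run a certmgr script by hand and
# report whether it follows the stdout / exit code contract.
# Usage: check_script SCRIPT [ARG...]
check_script() {
    local script=$1 output rc
    shift

    if [ ! -x "$script" ]; then
        echo "check_script: '$script' is missing or not executable" >&2
        return 1
    fi

    # Capture stdout only; stderr passes through to the caller.
    output=$("$script" "$@")
    rc=$?

    if [ "$rc" -ne 0 ]; then
        echo "check_script: '$script' exited with code $rc" >&2
        return 1
    fi
    if [ -z "$output" ]; then
        echo "check_script: '$script' succeeded but printed nothing" >&2
        return 1
    fi

    echo "check_script: '$script' OK (${#output} characters on stdout)"
}
```

It would be called with the script path and the node name, for example check_script ./get_node_token.sh n1, where the script file name is a placeholder.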

generate_csr_script example:

Print certificate signing request to stdout. Return zero exit code for success, and non-zero exit code for error.

#!/bin/bash

# Slurm node name is passed in as arg $1
NODE_PRIVATE_KEY=/etc/slurm/certmgr/${1}_private_key.pem

# Check if node private key file exists
if [ ! -f "$NODE_PRIVATE_KEY" ]
then
    echo "$BASH_SOURCE: Failed to resolve node private key path '$NODE_PRIVATE_KEY'"
    exit 1
fi

# Generate CSR using node private key and print CSR to stdout
openssl req -new -key "$NODE_PRIVATE_KEY" \
    -subj "/C=XX/ST=StateName/L=CityName/O=CompanyName/OU=CompanySectionName/CN=$1"

# Check exit code from openssl
if [ $? -ne 0 ]
then
    echo "$BASH_SOURCE: Failed to generate CSR"
    exit 1
fi

# Exit with exit code 0 to indicate success
exit 0
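
The script above expects the node's private key to exist already. One possible way to create it, and to sanity-check the resulting CSR by hand, is sketched below. The helper names, the RSA 2048 key type and the shortened subject are assumptions for illustration, not requirements stated by Slurm.

```shell
#!/bin/bash

# Hypothetical helper: create an RSA private key readable by the owner only.
# Usage: make_node_key KEY_PATH
make_node_key() {
    local key_path=$1
    # Subshell so the restrictive umask does not leak into the caller.
    (
        umask 077
        openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 \
            -out "$key_path" 2>/dev/null
    )
}

# Hypothetical helper: print a CSR for NODE_NAME to stdout.
# Usage: make_node_csr KEY_PATH NODE_NAME
make_node_csr() {
    openssl req -new -key "$1" -subj "/CN=$2"
}

# Hypothetical helper: print the subject of a CSR read from stdin.
csr_subject() {
    openssl req -noout -subject
}
```

For example, make_node_key n1_private_key.pem followed by make_node_csr n1_private_key.pem n1 | csr_subject should print a subject line containing the node name.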

The validate_node_token_script and sign_csr_script paths need to point to scripts that exist and are executable on the slurmctld host.

validate_node_token_script example:

Return zero exit code for valid node tokens, and non-zero exit code for invalid node tokens or other errors.

#!/bin/bash

# Node's unique token is passed in as arg $1
NODE_TOKEN=$1
NODE_TOKEN_LIST_FILE=/etc/slurm/certmgr/node_token_list.txt

# Check if node token list file exists
if [ ! -f "$NODE_TOKEN_LIST_FILE" ]
then
    echo "$BASH_SOURCE: Failed to resolve node token list path '$NODE_TOKEN_LIST_FILE'"
    exit 1
fi

# Check if unique node token is in token list file
# (-x: match a whole line, -F: literal string, -q: print nothing)
grep -qxF -- "$NODE_TOKEN" "$NODE_TOKEN_LIST_FILE"

# Check exit code from grep to see if token was found
if [ $? -ne 0 ]
then
    echo "$BASH_SOURCE: Failed to validate token '$NODE_TOKEN'"
    exit 1
fi

# Exit with exit code 0 to indicate success (node token is valid)
exit 0
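
The token lookup can be tried outside Slurm. The following is a minimal sketch, assuming tokens are stored one per line as in the list built earlier. token_in_list is a hypothetical helper. It matches the whole line literally, so a partial token or a regular expression does not pass validation.

```shell
#!/bin/bash

# Hypothetical helper: succeed only if TOKEN matches a whole line of LIST_FILE.
# Usage: token_in_list TOKEN LIST_FILE
token_in_list() {
    local token=$1 list_file=$2
    # Reject empty tokens, which would otherwise match empty lines.
    [ -n "$token" ] || return 1
    [ -f "$list_file" ] || return 1
    # -F: literal string, -x: whole line, -q: no output, --: end of options
    grep -qxF -- "$token" "$list_file"
}
```

Matching whole lines is a deliberate choice here: a plain substring search would accept any fragment of a valid token.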

sign_csr_script example:

Print signed certificate to stdout. Return zero exit code for success, and non-zero exit code for error.

#!/bin/bash

# Certificate signing request is passed in as arg $1
CSR=$1
CA_CERT=/etc/slurm/certmgr/root_cert.pem
CA_KEY=/etc/slurm/certmgr/root_key.pem
# Expected octal mode of the CA private key
# (600 is an assumption for this example; adjust to site policy)
KEY_PERMISSIONS=600

# Check if CA certificate file exists
if [ ! -f "$CA_CERT" ]
then
    echo "$BASH_SOURCE: Failed to resolve CA certificate path '$CA_CERT'"
    exit 1
fi

# Check CA private key permissions (also fails if the key is missing)
if [ "$(stat -c "%a" "$CA_KEY" 2>/dev/null)" != "$KEY_PERMISSIONS" ]
then
    echo "$BASH_SOURCE: Bad permissions for CA private key at '$CA_KEY'. Permissions should be $KEY_PERMISSIONS"
    exit 1
fi

# Sign CSR using CA certificate and CA private key and print signed cert to stdout
openssl x509 -req -CA "$CA_CERT" -CAkey "$CA_KEY" 2>/dev/null <<< "$CSR"

# Check exit code from openssl
if [ $? -ne 0 ]
then
    echo "$BASH_SOURCE: Failed to generate signed certificate"
    exit 1
fi

# Exit with exit code 0 to indicate success
exit 0
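
The signing step can be exercised end to end without Slurm by using a throwaway CA in a temporary directory. This is only a sketch under stated assumptions: sign_and_verify_demo is a hypothetical helper, the subjects and the one-day validity are placeholders, and -CAcreateserial is added so that openssl releases which require a serial number file can create one.

```shell
#!/bin/bash

# Hypothetical helper: create a throwaway CA, sign a CSR for NODE_NAME with
# it, verify the result, and print the signed certificate to stdout.
# Usage: sign_and_verify_demo NODE_NAME
sign_and_verify_demo() {
    local node=$1 dir csr rc=0
    dir=$(mktemp -d) || return 1

    # Throwaway self-signed CA certificate and unencrypted CA key.
    openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
        -subj "/CN=Example Test CA" \
        -keyout "$dir/ca_key.pem" -out "$dir/ca_cert.pem" 2>/dev/null &&
    # Node key and CSR, standing in for generate_csr_script on slurmd.
    openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 \
        -out "$dir/node_key.pem" 2>/dev/null &&
    csr=$(openssl req -new -key "$dir/node_key.pem" -subj "/CN=$node") &&
    # Sign the CSR read from stdin, as sign_csr_script does on slurmctld.
    openssl x509 -req -CA "$dir/ca_cert.pem" -CAkey "$dir/ca_key.pem" \
        -CAcreateserial -days 1 -out "$dir/node_cert.pem" \
        2>/dev/null <<< "$csr" &&
    # Confirm the new certificate chains to the throwaway CA, then print it.
    openssl verify -CAfile "$dir/ca_cert.pem" "$dir/node_cert.pem" >/dev/null &&
    cat "$dir/node_cert.pem" || rc=1

    rm -rf "$dir"
    return "$rc"
}
```

Running sign_and_verify_demo n1 | openssl x509 -noout -subject -issuer shows the node name as the subject and the throwaway CA as the issuer.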

If everything is configured correctly, lines like the following should appear in the slurmd and slurmctld logs when DebugFlags=TLS is set.

slurmd:

slurmd: certmgr/script: certmgr_p_get_node_token: TLS: Successfully retrieved unique node token
slurmd: certmgr/script: certmgr_p_generate_csr: TLS: Successfully generated csr:
-----BEGIN CERTIFICATE REQUEST-----
. . .
-----END CERTIFICATE REQUEST-----

slurmctld:

slurmctld: certmgr/script: certmgr_p_sign_csr: TLS: Successfully validated node token
slurmctld: certmgr/script: certmgr_p_sign_csr: TLS: Successfully generated signed certificate:
-----BEGIN CERTIFICATE-----
. . .
-----END CERTIFICATE-----

slurmd:

slurmd: TLS: Successfully got signed certificate from slurmctld:
-----BEGIN CERTIFICATE-----
. . .
-----END CERTIFICATE-----

DebugFlags=AuditTLS can also be used to show less verbose logs of certificate renewal.

Last modified 25 May 2025