Volume Claim Policies
JobSet provides the VolumeClaimPolicies API to automatically create and manage shared PersistentVolumeClaims (PVCs) across multiple ReplicatedJobs within a JobSet. This enables stateful JobSets that require persistent storage for datasets, models, checkpoints, or intermediate results.
Basic Usage
To use VolumeClaimPolicies, define them in the volumeClaimPolicies field of your JobSet spec.
Each policy can contain one or more PVC templates.
This example demonstrates creating shared PVCs with different retention policies:
In this example:
- An initializer ReplicatedJob downloads a model to the initializer volume.
- A node ReplicatedJob reads the model and writes one checkpoint file per Pod, named after that Pod's completion index (JOB_COMPLETION_INDEX).
- The PVCs are created automatically with the naming convention <claim-name>-<jobset-name>:
  - initializer-volume-claim-trainjob (deleted when the JobSet is deleted)
  - checkpoints-volume-claim-trainjob (retained after the JobSet is deleted)
```yaml
# This example creates two shared PVCs for two ReplicatedJobs.
# The first PVC, initializer, is deleted when the JobSet is deleted.
# The second PVC, checkpoints, is retained after the JobSet is deleted.
# The second ReplicatedJob runs after the first one completes. After the JobSet completes
# or is deleted, you can create a simple busybox Pod that mounts the PVC
# checkpoints-volume-claim-trainjob. Listing its contents should show:
#   /workspace/checkpoints # ls
#   model_0.txt  model_1.txt  model_2.txt
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: volume-claim-trainjob
spec:
  volumeClaimPolicies:
    - templates:
        - metadata:
            name: initializer
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 1Gi
    - templates:
        - metadata:
            name: checkpoints
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 1Gi
      retentionPolicy:
        whenDeleted: Retain
  replicatedJobs:
    - name: initializer
      template:
        spec:
          template:
            spec:
              containers:
                - name: initializer
                  image: busybox
                  command:
                    - /bin/sh
                    - -c
                    - |
                      echo "Download pre-trained model into /workspace/model/qwen3.txt"
                      echo 'Qwen3-30b' > /workspace/model/qwen3.txt
                  volumeMounts:
                    - mountPath: /workspace/model
                      name: initializer
    - name: node
      dependsOn:
        - name: initializer
          status: Complete
      template:
        spec:
          parallelism: 3
          completions: 3
          template:
            spec:
              containers:
                - name: node
                  image: busybox
                  command:
                    - /bin/sh
                    - -c
                    - |
                      echo "Read pre-trained model" && cat /workspace/model/qwen3.txt
                      echo "Write model checkpoint to /workspace/checkpoints/model_${JOB_COMPLETION_INDEX}.txt"
                      echo "Checkpoint: model_${JOB_COMPLETION_INDEX}" > /workspace/checkpoints/model_${JOB_COMPLETION_INDEX}.txt
                  volumeMounts:
                    - mountPath: /workspace/model
                      name: initializer
                    - mountPath: /workspace/checkpoints
                      name: checkpoints
```
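The example above uses one policy per PVC because the two PVCs need different retention behavior. A policy can also hold several templates, so PVCs that should share a retention policy can be grouped under one policy.

The fragment below sketches that layout. The template names datasets and results and the sizes are hypothetical. It assumes that retentionPolicy applies to every template in its policy, as its placement next to templates suggests. Each template name still needs a matching volumeMount in at least one container.

```yaml
# Sketch: one policy holding two PVC templates.
# Assumption: both PVCs share the policy-level Retain setting.
# "datasets" and "results" are hypothetical template names.
volumeClaimPolicies:
  - templates:
      - metadata:
          name: datasets
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 5Gi
      - metadata:
          name: results
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 1Gi
    retentionPolicy:
      whenDeleted: Retain
```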
How Volumes Are Mounted
To mount a shared PVC in your pods:
- Define a volumeClaimPolicies template with a specific name (e.g., model-data).
- Add a volumeMount with the same name to your container.
- JobSet creates the appropriate PVC and automatically injects the matching volume into your Pod spec.
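A minimal sketch of this pairing, using the model-data name from the list, follows. The JobSet name model-data-demo, the worker ReplicatedJob, and the /data mount path are hypothetical. The Pod template has no volumes entry of its own because JobSet injects it. Following the <claim-name>-<jobset-name> convention, the created PVC would be named model-data-model-data-demo.

```yaml
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: model-data-demo            # hypothetical JobSet name
spec:
  volumeClaimPolicies:
    - templates:
        - metadata:
            name: model-data       # PVC template name
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 1Gi
  replicatedJobs:
    - name: worker                 # hypothetical ReplicatedJob
      template:
        spec:
          template:
            spec:
              containers:
                - name: worker
                  image: busybox
                  command: ["/bin/sh", "-c", "ls /data"]
                  volumeMounts:
                    - name: model-data   # must match the template name above
                      mountPath: /data
              # No "volumes" entry here: JobSet injects the PVC volume itself.
```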
Retention Policies
VolumeClaimPolicies support retention policies to control what happens to PVCs when the JobSet is deleted.
Delete (Default)
The PVC is automatically deleted when the JobSet is deleted. This is the default behavior when no retention policy is specified.
```yaml
volumeClaimPolicies:
  - templates:
      - metadata:
          name: temporary-data
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
    retentionPolicy:
      whenDeleted: Delete
```
Retain
The PVC is kept after the JobSet is deleted, allowing you to access the data later or use it in subsequent JobSets.
Note
If you want VolumeClaimPolicies to use an existing PVC, the template spec must be equal to the spec of the existing PVC.

```yaml
volumeClaimPolicies:
  - templates:
      - metadata:
          name: checkpoints
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
    retentionPolicy:
      whenDeleted: Retain # PVC survives JobSet deletion
```
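To illustrate the note about existing PVCs, here is a sketch of a pre-created PVC whose spec mirrors the checkpoints template field for field. The JobSet name my-jobset is hypothetical. The PVC name assumes that JobSet looks the claim up by the <claim-name>-<jobset-name> convention described in Basic Usage.

Fields that the cluster may fill in on its own, such as a default storageClassName, could also affect the comparison. That is an assumption worth checking on your cluster.

```yaml
# Hypothetical pre-created PVC for a JobSet named "my-jobset".
# Its spec mirrors the "checkpoints" template (ReadWriteOnce, 50Gi).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: checkpoints-my-jobset   # <claim-name>-<jobset-name> (assumed lookup name)
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 50Gi
```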
When using Retain, you can access the persisted data by:
- Creating a new JobSet with a volumeMount referencing the existing PVC name
- Mounting the PVC directly in a debug pod
- Using the PVC in other workloads
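For example, after the Basic Usage JobSet finishes, a debug Pod along these lines can mount the retained checkpoints-volume-claim-trainjob PVC. The Pod name checkpoints-debug and the sleep duration are arbitrary choices. Because PVCs are namespaced, the Pod must be created in the same namespace as the JobSet.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: checkpoints-debug          # arbitrary name for the debug Pod
spec:
  containers:
    - name: debug
      image: busybox
      command: ["sleep", "3600"]   # keep the Pod alive for inspection
      volumeMounts:
        - name: checkpoints
          mountPath: /workspace/checkpoints
  volumes:
    - name: checkpoints
      persistentVolumeClaim:
        claimName: checkpoints-volume-claim-trainjob   # PVC retained by the JobSet
```

Once the Pod is running, kubectl exec checkpoints-debug -- ls /workspace/checkpoints should list model_0.txt, model_1.txt, and model_2.txt.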
Custom Labels and Annotations
You can add custom labels and annotations to PVC templates for organization, monitoring,
or integration with other tools. These labels and annotations are preserved on the created PVCs
along with the automatically added jobset.sigs.k8s.io/jobset-name label.
```yaml
volumeClaimPolicies:
  - templates:
      - metadata:
          name: my-volume
          labels:
            team: ml-platform
            environment: production
            content-type: model
          annotations:
            backup-policy: "daily"
            retention-days: "30"
        spec:
          accessModes: ["ReadWriteMany"]
          resources:
            requests:
              storage: 100Gi
```
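As an illustration, for a JobSet named my-jobset (hypothetical), the PVC created from this template would be expected to carry metadata roughly like the following. This is an abbreviated sketch based on the naming convention and label behavior described on this page. The controller may add further fields that are not shown.

```yaml
# Abbreviated sketch of the resulting PVC metadata for JobSet "my-jobset".
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-volume-my-jobset                  # <claim-name>-<jobset-name>
  labels:
    team: ml-platform
    environment: production
    content-type: model
    jobset.sigs.k8s.io/jobset-name: my-jobset   # added automatically
  annotations:
    backup-policy: "daily"
    retention-days: "30"
```

The automatically added label also makes it easy to find every PVC that belongs to one JobSet, for example with kubectl get pvc -l jobset.sigs.k8s.io/jobset-name=my-jobset.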
Limitations
- Maximum of 50 volume claim templates per JobSet
- PVC templates cannot specify the namespace field (the namespace is inherited from the JobSet)
- ReplicatedJob templates must not define volumes with the same name as VolumeClaimPolicy templates
- At least one container or initContainer in the ReplicatedJobs must have a volumeMount matching each volume claim template name
- When VolumeClaimPolicies refers to an existing PVC, the template spec must be equal to the spec of the pre-created PVC
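To make the volume-name rule concrete, here is a sketch of a spec that would be expected to fail validation. The Pod template declares its own volumes entry named checkpoints, which collides with the checkpoints VolumeClaimPolicies template. The JobSet name, the trainer ReplicatedJob, and the emptyDir volume are illustrative.

```yaml
# Counter-example: violates the volume-name rule, so it is expected to be rejected.
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: name-collision-demo        # hypothetical
spec:
  volumeClaimPolicies:
    - templates:
        - metadata:
            name: checkpoints
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 1Gi
  replicatedJobs:
    - name: trainer
      template:
        spec:
          template:
            spec:
              containers:
                - name: trainer
                  image: busybox
                  command: ["/bin/sh", "-c", "ls /workspace/checkpoints"]
                  volumeMounts:
                    - name: checkpoints
                      mountPath: /workspace/checkpoints
              volumes:
                - name: checkpoints   # conflicts with the template name above
                  emptyDir: {}
```

Deleting the volumes entry and keeping only the volumeMount resolves the conflict, because JobSet then injects the PVC volume itself.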