Concepts

Core JobSet Concepts

JobSet

A JobSet creates one or more Jobs. It allows you to create sets of jobs of different or identical templates, and control their lifecycle together.

Conceptual Diagram

[Figure: JobSet conceptual diagram]

Running an Example JobSet

Here is an example JobSet. It runs a distributed PyTorch training workload.

apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: pytorch
spec:
  replicatedJobs:
    - name: workers
      template:
        spec:
          parallelism: 4
          completions: 4
          backoffLimit: 0
          template:
           spec:
            containers:
            - name: pytorch
              image: gcr.io/k8s-staging-jobset/pytorch-resnet:latest
              ports:
              - containerPort: 3389
              env:
              - name: MASTER_ADDR
                value: "pytorch-workers-0-0.pytorch"
              - name: MASTER_PORT
                value: "3389"
              command:
              - bash
              - -xc
              - |
                torchrun --nproc_per_node=1 --master_addr=$MASTER_ADDR --master_port=$MASTER_PORT resnet.py --backend=gloo
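
To run this example, save the manifest to a file and apply it to your cluster; the file name pytorch.yaml below is just an illustration:

kubectl apply -f pytorch.yaml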

To list all the jobs that belong to a JobSet, you can use a command like this:

kubectl get jobs --selector=jobset.sigs.k8s.io/jobset-name=pytorch

The output is similar to:

NAME                COMPLETIONS   DURATION   AGE
pytorch-workers-0   0/4           6m12s      6m12s

To list all the pods that belong to a JobSet, you can use a command like this:

kubectl get pods --selector=jobset.sigs.k8s.io/jobset-name=pytorch

The output is similar to:

NAME                        READY   STATUS    RESTARTS   AGE
pytorch-workers-0-0-tcngx   1/1     Running   0          13m
pytorch-workers-0-1-sbhxs   1/1     Running   0          13m
pytorch-workers-0-2-5m6d6   1/1     Running   0          13m
pytorch-workers-0-3-mn8c8   1/1     Running   0          13m

JobSet defaults for Jobs and Pods

JobSet labels

JobSet labels are prefixed with jobset.sigs.k8s.io/. JobSet sets the following labels on both the Jobs and the Pods:

  • jobset.sigs.k8s.io/jobset-name: .metadata.name
  • jobset.sigs.k8s.io/replicatedjob-name: .spec.replicatedJobs[*].name
  • jobset.sigs.k8s.io/replicatedjob-replicas: .spec.replicatedJobs[*].replicas
  • jobset.sigs.k8s.io/job-index: ordinal index of a job within a spec.replicatedJobs[*]
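
These labels can be combined in a label selector to list, for example, only the Pods of a single child Job (a sketch assuming the pytorch JobSet shown above):

kubectl get pods --selector=jobset.sigs.k8s.io/jobset-name=pytorch,jobset.sigs.k8s.io/job-index=0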

ReplicatedJob

The list .spec.replicatedJobs allows the user to define groups of Jobs with different templates.

Each entry of .spec.replicatedJobs defines a Job template in spec.replicatedJobs[*].template and the number of replicas that should be created in spec.replicatedJobs[*].replicas. When unset, replicas defaults to 1.

Each Job within a spec.replicatedJobs[*] entry gets a distinct job-index in the range 0 to .spec.replicatedJobs[*].replicas-1. The Job name has the following format: <jobSetName>-<replicatedJobName>-<jobIndex>.
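
For instance, here is a minimal sketch with two replicated Jobs of different templates (the JobSet name training and all field values are illustrative); following the naming format above, it would create the Jobs training-driver-0, training-workers-0 and training-workers-1:

apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: training
spec:
  replicatedJobs:
    - name: driver
      # replicas is unset here, so it defaults to 1; this creates Job training-driver-0
      template:
        spec:
          ...
    - name: workers
      replicas: 2   # creates Jobs training-workers-0 and training-workers-1
      template:
        spec:
          ...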

DNS hostnames for Pods

By default, JobSet configures DNS for Pods by creating a headless service for each entry in spec.replicatedJobs. The headless service name, which determines the subdomain, is <.metadata.name>-<.spec.replicatedJobs[*].name>.

To list all the headless services that belong to a JobSet, you can use a command like this:

kubectl get services --selector=jobset.sigs.k8s.io/jobset-name=pytorch

The output is similar to:

NAME              TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
pytorch-workers   ClusterIP   None         <none>        <none>    25m
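
Pods in the JobSet can reach each other through these stable DNS names; the PyTorch example above points MASTER_ADDR at the first worker Pod. A minimal sketch, assuming the pytorch JobSet above and a container image that provides getent:

# Resolve the first worker Pod's stable address (the same name used as MASTER_ADDR above)
getent hosts pytorch-workers-0-0.pytorch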

Exclusive Job to topology placement

The JobSet annotation alpha.jobset.sigs.k8s.io/exclusive-topology defines 1:1 job to topology placement. For example, consider the case where the nodes are assigned a rack label. To optimize for network performance, we want to assign each job exclusively to one rack. This can be done as follows:

apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: pytorch
  annotations:
    alpha.jobset.sigs.k8s.io/exclusive-topology: rack
spec:
  replicatedJobs:
    - name: workers
      template:
        spec:
          ...

JobSet termination

A JobSet is marked as successful when ALL the Jobs it created complete successfully.

A JobSet failure is counted when ANY of its child Jobs fails. spec.failurePolicy.maxRestarts defines how many times to automatically restart the JobSet; a restart is done by recreating all child Jobs.

A JobSet is terminally failed when the number of failures reaches spec.failurePolicy.maxRestarts.
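
For example, here is a minimal sketch (values are illustrative) that allows the JobSet to be restarted up to 3 times before it is marked as terminally failed:

apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: pytorch
spec:
  failurePolicy:
    maxRestarts: 3
  replicatedJobs:
    - name: workers
      template:
        spec:
          ...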

Coordinator

A specific Pod can be designated as the coordinator using spec.coordinator. If defined, a jobset.sigs.k8s.io/coordinator annotation and label containing the stable network endpoint of the coordinator Pod are added to all Jobs and Pods in the JobSet. Other Pods can use this label to discover the coordinator. For example:

apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: pytorch
spec:
  coordinator:
    replicatedJob: leader
    jobIndex: 0
    podIndex: 0
  replicatedJobs:
    - name: leader
      replicas: 1
      template:
        spec:
          parallelism: 1
          completions: 1
          ...
    - name: workers
      replicas: 1
      template:
        spec:
          parallelism: 8
          completions: 8
          template:
            spec:
              containers:
              - name: worker
                env:
                - name: LEADER_ADDRESS
                  valueFrom:
                    fieldRef:
                      fieldPath: "metadata.labels['jobset.sigs.k8s.io/coordinator']"
            ...

All Jobs and Pods in the JobSet will have a jobset.sigs.k8s.io/coordinator=pytorch-leader-0-0.pytorch label and annotation.

WARNING: if using Kueue with JobSet, ensure Kueue version v0.8.3+ or v0.9.0+ is installed. Prior versions are built against a JobSet API definition that does not include the coordinator field: https://github.com/kubernetes-sigs/kueue/issues/3495.
