Topology Aware Scheduling

Configure topology-aware scheduling for JobSet using the Kubernetes Workload Aware Scheduling (WAS) APIs.

Topology Aware Scheduling (TAS) ensures that all pods in a gang are placed within the same network topology domain (e.g., the same rack, block, or data center zone). This is critical for distributed training workloads where inter-pod communication latency directly impacts performance.

Prerequisites

In addition to the general WAS prerequisites, you must:

  1. Enable the TopologyAwareWorkloadScheduling feature gate. This feature gate is included in the kind cluster configuration and is required for topology-aware placement.

  2. Label your nodes with topology keys. The scheduler uses node labels to determine topology domains. Apply labels that represent your topology hierarchy — for example, rack, block, or zone:

    kubectl label node <node-name> topology.example.com/rack=rack-1
    

    Every node that should participate in topology-aware placement must carry the label key referenced in the PodGroup’s schedulingConstraints.topology field (in this guide, topology.example.com/rack).
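Labeling nodes one at a time gets tedious on larger clusters. The following is a minimal sketch of a helper that applies one rack value to several nodes. The function name label_rack and the node names in the usage comments are illustrative assumptions, not part of kubectl or JobSet; substitute the names reported by kubectl get nodes.

```shell
# label_rack <rack-value> <node>...
# Applies topology.example.com/rack=<rack-value> to every listed node.
# --overwrite lets the helper be re-run if a node already carries the label.
label_rack() {
  rack="$1"
  shift
  for node in "$@"; do
    kubectl label node "$node" "topology.example.com/rack=${rack}" --overwrite
  done
}

# Example usage (hypothetical node names):
#   label_rack rack-1 kind-worker kind-worker2
#   label_rack rack-2 kind-worker3 kind-worker4
#
# List nodes that still lack the label (the "!key" selector matches
# nodes where the key is absent):
#   kubectl get nodes -l '!topology.example.com/rack'
```

Nodes returned by the last command have no rack label yet and should be labeled before relying on topology-aware placement.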

How It Works

The PodGroup’s schedulingConstraints.topology field names the node label key that defines a topology domain: nodes sharing the same value for that key belong to the same domain. The scheduler looks for a single domain that can accommodate the entire gang and co-locates all pods within it.

For example, setting topology.example.com/rack as the topology key ensures all pods in the gang land on nodes within the same rack, minimizing network hops between them.
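To get a feel for which domains the scheduler can choose from, it can help to group nodes by their rack label. The sketch below only counts nodes per label value; it is a rough preview and not the scheduler's algorithm, which also has to check that the pods' resource requests fit. The function name nodes_per_rack is an illustrative assumption.

```shell
# Reads "<node> <rack>" pairs on stdin and prints "<rack> <node-count>",
# one line per rack. Nodes without a rack value are skipped.
nodes_per_rack() {
  awk 'NF >= 2 { count[$2]++ } END { for (r in count) print r, count[r] }' | sort
}

# Example usage against a live cluster (dots inside the label key are
# escaped for kubectl's jsonpath syntax):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.topology\.example\.com/rack}{"\n"}{end}' \
#     | nodes_per_rack
```

A rack with fewer nodes than the gang has pods is not necessarily ruled out, since several pods may fit on one node, but a rack that is missing from the output cannot be selected at all.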

Step 1: Create the Workload

The Workload references the JobSet and defines a pod group template with a gang scheduling policy:

apiVersion: scheduling.k8s.io/v1alpha2
kind: Workload
metadata:
  name: js-abc
spec:
  controllerRef:
    apiGroup: jobset.x-k8s.io
    kind: JobSet
    name: js
  podGroupTemplates:
  - name: workers
    schedulingPolicy:
      gang:
        minCount: 4

The controllerRef links the Workload to the JobSet named js. The podGroupTemplates entry defines a gang scheduling policy with minCount: 4, so the gang is placed only when all 4 pods can be scheduled together.

Step 2: Create the PodGroup

The PodGroup references the pod group template and adds topology constraints:

apiVersion: scheduling.k8s.io/v1alpha2
kind: PodGroup
metadata:
  name: js-abc-workers-def
  namespace: default
spec:
  podGroupTemplateRef:
    workload:
      workloadName: js-abc
      podGroupTemplateName: workers
  schedulingPolicy:
    gang:
      minCount: 4
  schedulingConstraints:
    topology:
    - key: topology.example.com/rack

The main addition compared to basic gang scheduling is schedulingConstraints.topology:

  • key: topology.example.com/rack: The scheduler will find a single rack (i.e., a set of nodes sharing the same topology.example.com/rack label value) that can fit all 4 pods and schedule them there.

Step 3: Create the JobSet

The JobSet pods reference the PodGroup through schedulingGroup.podGroupName:

apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: js
spec:
  failurePolicy:
    maxRestarts: 10
  replicatedJobs:
  - name: rj
    replicas: 2
    template:
      spec:
        completions: 2
        parallelism: 2
        backoffLimit: 0
        template:
          spec:
            terminationGracePeriodSeconds: 0
            schedulingGroup:
              podGroupName: js-abc-workers-def
            containers:
            - name: worker
              image: busybox
              command: ["sleep", "infinity"]
              resources:
                requests:
                  cpu: "500m"

This JobSet creates 4 pods in total: 2 replicas of the rj Job, each running 2 pods (parallelism: 2, completions: 2). All 4 pods reference the same PodGroup, so they are gang-scheduled onto nodes within the same topology.example.com/rack domain.
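Because the gang's minCount has to line up with the number of pods the JobSet actually runs, it can be worth recomputing that number from the JobSet spec. The sketch below sums replicas × parallelism over all replicated jobs. The function name gang_size is an illustrative assumption, and the sketch assumes both fields are set explicitly, as they are in the manifest above.

```shell
# Reads "<replicas> <parallelism>" pairs on stdin, one per replicated job,
# and prints the total number of pods that run concurrently.
gang_size() {
  awk 'NF == 2 { total += $1 * $2 } END { print total + 0 }'
}

# Example usage against a live cluster:
#   kubectl get jobset js -o jsonpath='{range .spec.replicatedJobs[*]}{.replicas}{" "}{.template.spec.parallelism}{"\n"}{end}' \
#     | gang_size
```

For the JobSet in this guide the result should match the minCount: 4 set in the Workload and the PodGroup.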

Verify Topology Placement

After applying all three resources, verify that the pods were co-located within the same topology domain:

kubectl get pods -l jobset.sigs.k8s.io/jobset-name=js -o wide

The NODE column shows where each pod landed. Check that those nodes share the same topology label value; the -L flag adds the label as an output column:

kubectl get nodes -L topology.example.com/rack
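Comparing the two outputs by eye works for a handful of pods. For larger gangs, the check can be scripted. The following is a minimal sketch: the function name same_domain is an illustrative assumption, and it expects one rack value per scheduled pod on stdin.

```shell
# Reads one rack value per line on stdin. Succeeds only if every
# non-blank line holds the same value.
same_domain() {
  racks=$(awk 'NF' | sort -u)
  count=$(printf '%s\n' "$racks" | awk 'NF { n++ } END { print n + 0 }')
  if [ "$count" -eq 1 ]; then
    echo "OK: all pods are in ${racks}"
  else
    echo "MISMATCH: pods span ${count} domains" >&2
    return 1
  fi
}

# Example usage against a live cluster: look up the rack label of the
# node behind every pod of the JobSet, then compare the values.
# Pods that are not yet scheduled have no nodeName and are left out.
#   for node in $(kubectl get pods -l jobset.sigs.k8s.io/jobset-name=js \
#       -o jsonpath='{.items[*].spec.nodeName}'); do
#     kubectl get node "$node" \
#       -o jsonpath='{.metadata.labels.topology\.example\.com/rack}{"\n"}'
#   done | same_domain
```

A MISMATCH result with 0 domains means no rack values were found at all, for example because the pods are still pending or the nodes are unlabeled.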