Prometheus Metrics
Prometheus metrics exported by Jobset
Prometheus Metrics
JobSet exposes prometheus metrics to monitor the health of the controller.
Installation Examples
The following example show how to install the Prometheus Operator for JobSet system.
JobSet controller health
Use the following metrics to monitor the health of the jobset controller:
Metric name | Type | Description | Labels |
---|---|---|---|
controller_runtime_reconcile_errors_total |
Counter | The total number of reconciliation errors encountered by each controller. | controller : name of controller (i.e. use value jobset to obtain metrics for jobset controller) |
controller_runtime_reconcile_time_seconds |
Histogram | The latency of a reconciliation attempt in seconds. | controller : name of controller (i.e. use value jobset to obtain metrics for jobset controller) |
JobSet metrics
Use the following metrics to monitor the health of the jobsets created by the jobset controller:
Metric name | Type | Description | Labels |
---|---|---|---|
jobset_failed_total |
Counter | The total number of failed JobSets. | jobset_name : name of jobset |
jobset_completed_total |
Counter | The total number of completed JobSets. | jobset_name : name of jobset |
Last modified July 30, 2024: docs: update metrics info for site (49e89c5)