Selecting series
Select latest sample for series with a given metric name:
node_cpu_seconds_total
Select 5-minute range of samples for series with a given metric name:
node_cpu_seconds_total[5m]
Only series with given label values:
node_cpu_seconds_total{cpu="0",mode="idle"}
Complex label matchers (=: equality, !=: non-equality, =~: regex match, !~: negative regex match):
node_cpu_seconds_total{cpu!="0",mode=~"user|system"}
Select data from one day ago and shift it to the current time:
process_resident_memory_bytes offset 1d
Rates of increase for counters
Per-second rate of increase, averaged over last 5 minutes:
rate(demo_api_request_duration_seconds_count[5m])
Per-second rate of increase, calculated over last two samples in a 1-minute time window:
irate(demo_api_request_duration_seconds_count[1m])
Absolute increase over last hour:
increase(demo_api_request_duration_seconds_count[1h])
Aggregating over multiple series
Sum over all series:
sum(node_filesystem_size_bytes)
Preserve the instance and job label dimensions:
sum by(job, instance) (node_filesystem_size_bytes)
Aggregate away the instance and job label dimensions:
sum without(instance, job) (node_filesystem_size_bytes)
Available aggregation operators: sum()
,min()
,max()
,avg()
,stddev()
,stdvar()
,count()
,count_values()
,group()
,bottomk()
,topk()
,quantile()
Math between series
Add all equally-labelled series from both sides:
node_memory_MemFree_bytes + node_memory_Cached_bytes
Add series, matching only on the instance and job labels:
node_memory_MemFree_bytes + on(instance, job) node_memory_Cached_bytes
Add series, ignoring the instance and job labels for matching:
node_memory_MemFree_bytes + ignoring(instance, job) node_memory_Cached_bytes
Explicitly allow many-to-one matching:
rate(demo_cpu_usage_seconds_total[1m]) / on(instance, job) group_left demo_num_cpus
Include the version label from "one" (right) side in the result:
node_filesystem_avail_bytes * on(instance, job) group_left(version) node_exporter_build_info
Available arithmetic operators: +, -, \*, /, %, ^
Filtering series by value
Only keep series with a sample value greater than a given number:
node_filesystem_avail_bytes > 10*1024*1024
Only keep series from the left-hand side whose sample values are larger than their right-hand-side matches:
go_goroutines > go_threads
Instead of filtering, return 0 or 1 for each compared series:
go_goroutines > bool go_threads
Match only on specific labels:
go_goroutines > bool on(job, instance) go_threads
Available comparison operators: ==, !=, >, <, >=, <=
Set operations
Include any label sets that are either on the left or right side:
up{job="prometheus"} or up{job="node"}
Include any label sets that are present both on the left and right side:
node_network_mtu_bytes and (node_network_address_assign_type == 0)
Include any label sets from the left side that are not present in the right side:
node_network_mtu_bytes unless (node_network_address_assign_type == 1)
Match only on specific labels:
node_network_mtu_bytes and on(device) (node_network_address_assign_type == 0)
Quantiles from histograms
90th percentile request latency over last 5 minutes, for every label dimension:
histogram_quantile(0.9, rate(demo_api_request_duration_seconds_bucket[5m]))
...for only the path and method dimensions:
histogram_quantile(
0.9,
sum by(le, path, method) (
rate(demo_api_request_duration_seconds_bucket[5m])
)
)
Changes in gauges
Per-second derivative using linear regression:
deriv(demo_disk_usage_bytes[1h])
Absolute change in value over last hour:
delta(demo_disk_usage_bytes[1h])
Predict value in 1 hour, based on last 4 hours:
predict_linear(demo_disk_usage_bytes[4h], 3600)
Aggregating over time
Average within each series over a 5-minute period:
avg_over_time(go_goroutines[5m])
Get the maximum for each series over a one-day period:
max_over_time(process_resident_memory_bytes[1d])
Count the number of samples for each series over a 5-minute period:
count_over_time(process_resident_memory_bytes[5m])
See all available xxx_over_time() aggregation functions.
Time
Get the Unix time in seconds at each resolution step:
time()
Get the age of the last successful batch job run:
time() - demo_batch_last_success_timestamp_seconds
Find batch jobs which haven't succeeded in an hour:
time() - demo_batch_last_success_timestamp_seconds > 3600
Dealing with missing data
Create one output series when the input vector is empty:
absent(up{job="some-job"})
Create one output series when the input range vector is empty for 5 minutes:
absent_over_time(up{job="some-job"}[5m])
Manipulating labels
Join the values of two labels with a - separator into a new endpoint label:
label_join(rate(demo_api_request_duration_seconds_count[5m]), "endpoint", " ", "method", "path")
Extract part of a label and store it in a new label:
label_replace(up, "hostname", "$1", "instance", "(.+):(\\d+)")
Subqueries
Calculate the 5-minute-averaged rate over a 1-hour period, at the default subquery resolution (= global rule evaluation interval):
rate(demo_api_request_duration_seconds_count[5m])[1h:]
Calculate the 5-minute-averaged rate over a 1-hour period, at a 15-second subquery resolution:
rate(demo_api_request_duration_seconds_count[5m])[1h:15s]
Using the subquery result to get the maximum rate over a 1-hour period:
max_over_time(
rate(
demo_api_request_duration_seconds_count[5m]
)[1h:]
)