Prometheus
2023-10-29
Selecting series
Select latest sample for series with a given metric name:
node_cpu_seconds_total
Select 5-minute range of samples for series with a given metric name:
node_cpu_seconds_total[5m]
Only series with given label values:
node_cpu_seconds_total{cpu="0",mode="idle"}
Complex label matchers (=: equality, !=: non-equality, =~: regex match, !~: negative regex match):
node_cpu_seconds_total{cpu!="0",mode=~"user|system"}
Select data from one day ago and shift it to the current time:
process_resident_memory_bytes offset 1d
Rates of increase for counters
Per-second rate of increase, averaged over last 5 minutes:
rate(demo_api_request_duration_seconds_count[5m])
Per-second rate of increase, calculated over last two samples in a 1-minute time window:
irate(demo_api_request_duration_seconds_count[1m])
Absolute increase over last hour:
increase(demo_api_request_duration_seconds_count[1h])
Aggregating over multiple series
Sum over all series:
sum(node_filesystem_size_bytes)
Preserve the instance and job label dimensions:
sum by(job, instance) (node_filesystem_size_bytes)
Aggregate away the instance and job label dimensions:
sum without(instance, job) (node_filesystem_size_bytes)
Available aggregation operators: sum(), min(), max(), avg(), stddev(), stdvar(), count(), count_values(), group(), bottomk(), topk(), quantile()
Math between series
Add all equally-labelled series from both sides:
node_memory_MemFree_bytes + node_memory_Cached_bytes
Add series, matching only on the instance and job labels:
node_memory_MemFree_bytes + on(instance, job) node_memory_Cached_bytes
Add series, ignoring the instance and job labels for matching:
node_memory_MemFree_bytes + ignoring(instance, job) node_memory_Cached_bytes
Explicitly allow many-to-one matching:
rate(demo_cpu_usage_seconds_total[1m]) / on(instance, job) group_left demo_num_cpus
Include the version label from "one" (right) side in the result:
node_filesystem_avail_bytes * on(instance, job) group_left(version) node_exporter_build_info
Available arithmetic operators: +, -, *, /, %, ^
Filtering series by value
Only keep series with a sample value greater than a given number:
node_filesystem_avail_bytes > 10*1024*1024
Only keep series from the left-hand side whose sample values are larger than their right-hand-side matches:
go_goroutines > go_threads
Instead of filtering, return 0 or 1 for each compared series:
go_goroutines > bool go_threads
Match only on specific labels:
go_goroutines > bool on(job, instance) go_threads
Available comparison operators: ==, !=, >, <, >=, <=
Set operations
Include any label sets that are either on the left or right side:
up{job="prometheus"} or up{job="node"}
Include any label sets that are present both on the left and right side:
node_network_mtu_bytes and (node_network_address_assign_type == 0)
Include any label sets from the left side that are not present in the right side:
node_network_mtu_bytes unless (node_network_address_assign_type == 1)
Match only on specific labels:
node_network_mtu_bytes and on(device) (node_network_address_assign_type == 0)
Quantiles from histograms
90th percentile request latency over last 5 minutes, for every label dimension:
histogram_quantile(0.9, rate(demo_api_request_duration_seconds_bucket[5m]))
...for only the path and method dimensions:
histogram_quantile(
  0.9,
  sum by(le, path, method) (
    rate(demo_api_request_duration_seconds_bucket[5m])
  )
)
Changes in gauges
Per-second derivative using linear regression:
deriv(demo_disk_usage_bytes[1h])
Absolute change in value over last hour:
delta(demo_disk_usage_bytes[1h])
Predict value in 1 hour, based on last 4 hours:
predict_linear(demo_disk_usage_bytes[4h], 3600)
Aggregating over time
Average within each series over a 5-minute period:
avg_over_time(go_goroutines[5m])
Get the maximum for each series over a one-day period:
max_over_time(process_resident_memory_bytes[1d])
Count the number of samples for each series over a 5-minute period:
count_over_time(process_resident_memory_bytes[5m])
See the Prometheus documentation for the full list of ..._over_time() aggregation functions.
Time
Get the Unix time in seconds at each resolution step:
time()
Get the age of the last successful batch job run:
time() - demo_batch_last_success_timestamp_seconds
Find batch jobs which haven't succeeded in an hour:
time() - demo_batch_last_success_timestamp_seconds > 3600
Dealing with missing data
Create one output series when the input vector is empty:
absent(up{job="some-job"})
Create one output series when the input range vector is empty for 5 minutes:
absent_over_time(up{job="some-job"}[5m])
Manipulating labels
Join the values of two labels with a - separator into a new endpoint label:
label_join(rate(demo_api_request_duration_seconds_count[5m]), "endpoint", "-", "method", "path")
Extract part of a label and store it in a new label:
label_replace(up, "hostname", "$1", "instance", "(.+):(\\d+)")
Subqueries
Calculate the 5-minute-averaged rate over a 1-hour period, at the default subquery resolution (= global rule evaluation interval):
rate(demo_api_request_duration_seconds_count[5m])[1h:]
Calculate the 5-minute-averaged rate over a 1-hour period, at a 15-second subquery resolution:
rate(demo_api_request_duration_seconds_count[5m])[1h:15s]
Using the subquery result to get the maximum rate over a 1-hour period:
max_over_time(
  rate(
    demo_api_request_duration_seconds_count[5m]
  )[1h:]
)
Python Subprocess
2023-09-27
Sometimes a program simply cannot avoid running system commands to get a job done.
For example, our system has a small internal feature that relies on rsync for file synchronization.
Recently we noticed that as soon as the data volume grew a little, rsync would get stuck, and we had no idea why.
After some digging, it turned out to be caused by incorrect use of the Popen module.
Setting up the environment
mkdir /tmp/aa /tmp/bb
# Create 10,000 files of 10 KB to 1,000 KB each
for i in {00001..10000}; do
  file_size=$((1 + $RANDOM % 100))
  dd if=/dev/urandom of=/tmp/aa/file$i.txt bs=10K count=$file_size
done
# du -sh /tmp/aa/
# 4.9G /tmp/aa/
The code
import logging
from subprocess import PIPE, STDOUT, Popen

logging.basicConfig(level=logging.INFO)
LOG = logging.getLogger(__name__)
src_dir = '/tmp/aa/'
tgt_dir = '/tmp/bb/'
# --remove-source-files
command = 'rsync -av %s %s' % (src_dir, tgt_dir)
p = Popen(command, stdin=PIPE, stdout=PIPE, stderr=STDOUT, shell=True)
p.wait()
if p.returncode == 0:
    LOG.info('rsync success')
else:
    LOG.warning('rsync error %d', p.returncode)
The transfer got stuck at file0670.txt, with 2.3 GB transferred in total.
Investigation
After investigating, we confirmed that the problem was in our own code.
The code captures stdout and stderr, but we never read from that pipe, so eventually the pipe buffer fills up and the program can no longer make progress.
Popen takes a pipesize argument at construction time; when it is set, Popen calls fcntl.fcntl(p2cwrite, fcntl.F_SETPIPE_SZ, self.pipesize) to resize the pipe buffer.
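To see what pipesize does in practice, here is a minimal sketch (not from the original post; it assumes Linux and Python 3.10+, where Popen accepts pipesize and fcntl exposes F_GETPIPE_SZ):

import fcntl
import subprocess

# Ask for a 1 MiB pipe for the child's stdout; the kernel may round the size up.
p = subprocess.Popen(['cat'], stdin=subprocess.DEVNULL,
                     stdout=subprocess.PIPE, pipesize=1024 * 1024)
# F_GETPIPE_SZ reports the capacity the kernel actually granted, e.g. 1048576.
print(fcntl.fcntl(p.stdout.fileno(), fcntl.F_GETPIPE_SZ))
p.stdout.close()
p.wait()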
From man fcntl:
Changing the capacity of a pipe
F_SETPIPE_SZ (int; since Linux 2.6.35)
Change the capacity of the pipe referred to by fd to be at least arg bytes. An unprivileged process can adjust the pipe capacity to any value between the system page size and the limit defined in /proc/sys/fs/pipe-max-size (see proc(5)). Attempts to set the pipe capacity below the page size are silently rounded up to the page size. Attempts by an unprivileged process to set the pipe capacity above the limit in /proc/sys/fs/pipe-max-size yield the error EPERM; a privileged process (CAP_SYS_RESOURCE) can override the limit.
When allocating the buffer for the pipe, the kernel may use a capacity larger than arg, if that is convenient for the implementation. (In the current implementation, the allocation is the next higher power-of-two page-size multiple of the requested size.) The actual capacity (in bytes) that is set is returned as the function result.
Attempting to set the pipe capacity smaller than the amount of buffer space currently used to store data produces the error EBUSY.
Note that because of the way the pages of the pipe buffer are employed when data is written to the pipe, the number of bytes that can be written may be less than the nominal size, depending on the size of the writes.
F_GETPIPE_SZ (void; since Linux 2.6.35)
Return (as the function result) the capacity of the pipe referred to by fd.
And from man 7 pipe | grep size -C10:
Pipe capacity
A pipe has a limited capacity. If the pipe is full, then a write(2) will block or fail, depending on whether the O_NONBLOCK flag is set (see below). Different implementations have different limits for the pipe capacity.
Applications should not rely on a particular capacity: an application should be designed so that a reading process consumes data as soon as it is available, so that a writing process does not remain blocked.
In Linux versions before 2.6.11, the capacity of a pipe was the same as the system page size (e.g., 4096 bytes on i386). Since Linux 2.6.11, the pipe capacity is 16 pages (i.e., 65,536 bytes in a system with a page size of 4096 bytes). Since Linux 2.6.35, the default pipe capacity is 16 pages, but the capacity can be queried and set using the fcntl(2) F_GETPIPE_SZ and F_SETPIPE_SZ operations. See fcntl(2) for more information.
The following ioctl(2) operation, which can be applied to a file descriptor that refers to either end of a pipe, places a count of the number of unread bytes in the pipe in the int buffer pointed to by the final argument of the call:
ioctl(fd, FIONREAD, &nbytes);
The FIONREAD operation is not specified in any standard, but is provided on many implementations.
In other words (a small demonstration follows this list):
- An unprivileged process can adjust the pipe buffer size anywhere between the page size and the limit in /proc/sys/fs/pipe-max-size
  - Requests below the page size are silently rounded up to the page size
  - Requests above pipe-max-size fail with EPERM
  - Privileged processes are not bound by that limit
- Default pipe buffer size:
  - Before 2.6.11: one system page
  - Since 2.6.11: 16 × the system page size
  - Since 2.6.35: can also be adjusted manually via fcntl
- If the pipe buffer is full, writes block, unless the file descriptor is in non-blocking mode (O_NONBLOCK)
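A small demonstration of those rules, again assuming Linux and Python 3.10+ (where fcntl exposes F_GETPIPE_SZ / F_SETPIPE_SZ):

import fcntl
import os

r, w = os.pipe()
# Default capacity: 16 pages, i.e. 65536 bytes on a system with 4 KB pages.
print(fcntl.fcntl(w, fcntl.F_GETPIPE_SZ))
# Ask for 100000 bytes; the kernel rounds up to the next power-of-two
# multiple of the page size, typically 131072.
fcntl.fcntl(w, fcntl.F_SETPIPE_SZ, 100000)
print(fcntl.fcntl(w, fcntl.F_GETPIPE_SZ))
os.close(r)
os.close(w)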
Check the current system's page size:
getconf PAGE_SIZE
4096
getconf PAGESIZE
4096
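The same value can be read from Python as well, if you prefer to check it in code:

import os
print(os.sysconf('SC_PAGE_SIZE'))  # 4096 here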
Verification
So the pipe buffer on this system should be 16 × 4 KB = 64 KB.
import subprocess
# 64 KiB of "a" plus a trailing "." is 65537 bytes, one more than the 64 KiB pipe holds
cmd = ['python', '-c', 'print("a" * 1024 * 64, end=".")']
# cmd = ['python', '-c', 'import time; time.sleep(10);']
print(1)
p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
print(2)
# stdout, stderr = p.communicate()
# print(repr([stdout, stderr]))
print(3)
p.wait()  # hangs here once the child has filled the pipe buffer
The child runs a python command. With exactly 64 KB of output it does not get stuck; one extra byte and it hangs at wait().
Uncomment the communicate() line and the program runs to completion.
Fixes
Have the child use the parent's stdin/stdout/stderr:
Popen(command, shell=True)
Redirect to DEVNULL:
Popen(command, stdout=subprocess.DEVNULL, shell=True)
# Python 2 has no subprocess.DEVNULL; open os.devnull yourself instead
devnull = os.open(os.devnull, os.O_RDWR)
Popen(command, stdout=devnull, shell=True)
os.close(devnull)  # os.open() returns a raw fd, so close it with os.close()
Read the child's stdout:
p = Popen(command, stdout=PIPE, stderr=STDOUT, shell=True)
# Blocks here, continuously reading the child's stdout so the pipe never fills up
p.communicate()
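If you would rather see rsync's output while it runs instead of collecting it all at the end, one possible variation (not in the original post; it reuses command and LOG from the earlier snippet) is to drain the pipe line by line:

p = Popen(command, stdout=PIPE, stderr=STDOUT, shell=True, universal_newlines=True)
# Keep reading so the child never blocks on a full pipe buffer
for line in p.stdout:
    LOG.info('rsync: %s', line.rstrip())
p.stdout.close()
p.wait()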
Use run():
# run() drains the pipes itself (it calls communicate internally), so capturing is safe
result = subprocess.run(command, shell=True, stdout=PIPE, stderr=PIPE)
print(result.returncode)
print(result.stdout)
print(result.stderr)