#2 Supervisor 最大打开文件数问题

2021-07-13

LOGO

线上服务日志中看到 Too many open files,于是检查了一番。

一、过程

确认:进程最大打开文件数

首先确认了系统没有限制最大打开文件数,但进程的最大打开文件数是限制的:

ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 30007
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 655350
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 327675
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

cat /proc/29749/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             30007                30007                processes
Max open files            1024                 1024                 files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       30007                30007                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

解决方法

最后网上搜索之后,有人说 supervisor 有限制,我一检查果然是 supervisor 服务器的最大打开文件数问题:

[supervisord]
...
minfds = 65535
...

改成 655350,之后再看 proc 信息,就好了。

PS: 好像 reloadrereadrestart 都没用,需要关闭 supervisor 重启服务才能生效。(有待进一步确认)

二、minfds 配置

官方解释:

The minimum number of file descriptors that must be available before supervisord will start successfully. A call to setrlimit will be made to attempt to raise the soft and hard limits of the supervisord process to satisfy minfds. The hard limit may only be raised if supervisord is run as root. supervisord uses file descriptors liberally, and will enter a failure mode when one cannot be obtained from the OS, so it’s useful to be able to specify a minimum value to ensure it doesn’t run out of them during execution. These limits will be inherited by the managed subprocesses. This option is particularly useful on Solaris, which has a low per-process fd limit by default.
Default: 1024
Required: No.
Introduced: 3.0

这个名字很有迷惑性,怎么看也想不到进程的最大打开文件数上面去,不过官方文档也能自洽:这是保证 supervisor 正常启动的一个最小值,相当于一个检查点。然后它会调用 setrlimit 设置子进程的最大打开文件数。

#1 Python Supervisor 的简单配置

2019-03-11
name =
script-path-and-args =
execute-path =
python-path =
[program:<name>]
; process_name=<name>       ; 默认就是 program 名称

command = python <script-path-and-args>
directory = <execute-path>  ; 执行路径
environment=PYTHONPATH=<python-path>    ; 设置环境变量,逗号隔开
; user=

; killasgroup = false       ; 没用过
stopasgroup = true          ; 杀掉子进程,文档说是包含 killasgroup
; stopsignal=TERM           ; TERM, HUP, INT, QUIT, KILL, USR1, or USR2 中的一个
stopwaitsecs = 15

autostart = true
autorestart = true          ; true, false, unexpected
; exitcodes=0,2             ; 允许的退出码,否则会进入自动重启判断
; startretries = 3          ; 重启次数

numprocs=1
numprocs_start=0

loglevel = debug            ; critical, error, warn, info, debug, trace, blather
redirect_stderr = true
stdout_logfile = /var/log/<name>.log
stderr_logfile = /var/log/<name>.log

; stdout_logfile_maxbytes=1MB
; stdout_logfile_backups=10
; stdout_capture_maxbytes=1MB
; stdout_events_enabled=false

; stderr_logfile_maxbytes=1MB
; stderr_logfile_backups=10
; stderr_capture_maxbytes=1MB
; stderr_events_enabled=false

; 没用过
; umask=022
; priority=999
; serverurl=AUTO

另外,文档中看到一处有意思的用法,配置可以这样通过参数传递给进程:

[program:example]
command=/usr/bin/example --loglevel=%(ENV_LOGLEVEL)s
  1. 简单多进程就让 supervisor 控制
  2. 进程日志由进程自己控制,supervisor 只记录本身的运行与监控日志和进程漏出来的错误日志等。