TOC

Tornado: max_clients limit reached, request queued

I noticed a lot of Tornado `max_clients limit reached, request queued.` lines in our logs.

2021-02-01 06:05:50,581 DEBUG [xTi35W6z] [tornado.general:fetch_impl#187] max_clients limit reached, request queued. 10 active, 9 queued requests.
2021-02-01 06:05:50,591 DEBUG [TID7iqKm] [tornado.general:fetch_impl#187] max_clients limit reached, request queued. 10 active, 10 queued requests.
2021-02-01 06:05:50,602 DEBUG [7r968v4s] [tornado.general:fetch_impl#187] max_clients limit reached, request queued. 10 active, 9 queued requests.
2021-02-01 06:05:50,613 DEBUG [4uXPmkjU] [tornado.general:fetch_impl#187] max_clients limit reached, request queued. 10 active, 6 queued requests.
2021-02-01 06:05:50,625 DEBUG [25JJR0KO] [tornado.general:fetch_impl#187] max_clients limit reached, request queued. 10 active, 4 queued requests.
2021-02-01 06:36:12,980 DEBUG [Y3Zsq1Hj] [tornado.general:fetch_impl#187] max_clients limit reached, request queued. 10 active, 1 queued requests.
2021-02-01 06:36:12,991 DEBUG [c3yC93Dz] [tornado.general:fetch_impl#187] max_clients limit reached, request queued. 10 active, 2 queued requests.
2021-02-01 06:36:13,001 DEBUG [dIh5ZfgW] [tornado.general:fetch_impl#187] max_clients limit reached, request queued. 10 active, 2 queued requests.
2021-02-01 08:47:37,389 DEBUG [8Yhg7dxW] [tornado.general:fetch_impl#187] max_clients limit reached, request queued. 10 active, 1 queued requests.
2021-02-01 08:47:37,400 DEBUG [VmJxW8C1] [tornado.general:fetch_impl#187] max_clients limit reached, request queued. 10 active, 2 queued requests.
2021-02-01 08:47:37,410 DEBUG [FhZNlvhE] [tornado.general:fetch_impl#187] max_clients limit reached, request queued. 10 active, 3 queued requests.
2021-02-01 08:47:37,421 DEBUG [hNlyHc7U] [tornado.general:fetch_impl#187] max_clients limit reached, request queued. 10 active, 4 queued requests.
2021-02-01 08:47:37,431 DEBUG [LERkqdWU] [tornado.general:fetch_impl#187] max_clients limit reached, request queued. 10 active, 5 queued requests.
2021-02-01 08:47:37,442 DEBUG [oxIJagaI] [tornado.general:fetch_impl#187] max_clients limit reached, request queued. 10 active, 6 queued requests.

So I dug into the source code.

All of our HTTP calls go through AsyncHTTPClient, and via the Configurable mechanism what we actually use is its SimpleAsyncHTTPClient implementation (the mechanism itself is simple but interesting, and a nice way to understand Python OOP better; I plan to write it up separately later).
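The core idea of that dispatch can be shown with a minimal self-contained sketch. The class names here (Client, SimpleClient, CurlClient) are made up for illustration and this is not Tornado's actual Configurable code; the real mechanism is richer, but the trick is the same: the base class overrides `__new__` and instantiates whichever implementation class is currently configured.

```python
# Minimal sketch of the Configurable dispatch idea.
# Hypothetical classes, NOT Tornado's actual implementation.
class Client:
    _impl = None  # implementation class chosen via configure()

    def __new__(cls, *args, **kwargs):
        # Only dispatch when instantiating the base class itself;
        # subclasses instantiate normally.
        if cls is Client:
            impl = cls._impl or SimpleClient  # default implementation
            return super().__new__(impl)
        return super().__new__(cls)

    @classmethod
    def configure(cls, impl):
        cls._impl = impl


class SimpleClient(Client):
    pass


class CurlClient(Client):
    pass


# By default, Client() actually builds a SimpleClient, just like
# AsyncHTTPClient() builds a SimpleAsyncHTTPClient by default.
print(type(Client()).__name__)   # SimpleClient
Client.configure(CurlClient)
print(type(Client()).__name__)   # CurlClient
```

In Tornado the equivalent knob is `AsyncHTTPClient.configure(...)`, which is how you would swap in `CurlAsyncHTTPClient` instead of the default.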

Here is the relevant part of SimpleAsyncHTTPClient:

    def fetch_impl(
        self, request: HTTPRequest, callback: Callable[[HTTPResponse], None]
    ) -> None:
        key = object()
        self.queue.append((key, request, callback))
        assert request.connect_timeout is not None
        assert request.request_timeout is not None
        timeout_handle = None
        if len(self.active) >= self.max_clients:
            timeout = (
                min(request.connect_timeout, request.request_timeout)
                or request.connect_timeout
                or request.request_timeout
            )  # min but skip zero
            if timeout:
                timeout_handle = self.io_loop.add_timeout(
                    self.io_loop.time() + timeout,
                    functools.partial(self._on_timeout, key, "in request queue"),
                )
        self.waiting[key] = (request, callback, timeout_handle)
        self._process_queue()
        if self.queue:
            gen_log.debug(
                "max_clients limit reached, request queued. "
                "%d active, %d queued requests." % (len(self.active), len(self.queue))
            )

    def _process_queue(self) -> None:
        while self.queue and len(self.active) < self.max_clients:
            key, request, callback = self.queue.popleft()
            if key not in self.waiting:
                continue
            self._remove_timeout(key)
            self.active[key] = (request, callback)
            release_callback = functools.partial(self._release_fetch, key)
            self._handle_request(request, release_callback, callback)

In short, AsyncHTTPClient caps the number of concurrent requests with max_clients (10 by default) and maintains a queue: every incoming request is appended to the queue first, and when the number of active requests has reached the limit while requests are still waiting in the queue, it prints the DEBUG line shown above.
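The queue/active bookkeeping in `fetch_impl` and `_process_queue` can be replayed with a stdlib-only toy model. `MiniClient` below is a hypothetical stand-in (not Tornado code, and it omits timeouts and callbacks); "handling" a request is simulated with `asyncio.sleep`, and we record how deep the queue gets when 30 requests hit a client whose max_clients is 10.

```python
import asyncio
from collections import deque


class MiniClient:
    # Toy model of SimpleAsyncHTTPClient's queue/active bookkeeping
    # (hypothetical, NOT Tornado code; timeouts omitted).
    def __init__(self, max_clients=10):
        self.max_clients = max_clients
        self.queue = deque()   # waiting (key, duration, future) tuples
        self.active = {}       # key -> future of in-flight requests
        self.max_queued = 0    # high-water mark of the queue

    def fetch(self, duration):
        fut = asyncio.get_running_loop().create_future()
        key = object()
        self.queue.append((key, duration, fut))  # enqueue first, always
        self._process_queue()
        if self.queue:
            # Same condition under which Tornado logs the DEBUG line.
            self.max_queued = max(self.max_queued, len(self.queue))
        return fut

    def _process_queue(self):
        # Promote queued requests while there is capacity.
        while self.queue and len(self.active) < self.max_clients:
            key, duration, fut = self.queue.popleft()
            self.active[key] = fut
            asyncio.ensure_future(self._handle(key, duration, fut))

    async def _handle(self, key, duration, fut):
        await asyncio.sleep(duration)  # pretend to do the HTTP request
        fut.set_result(None)
        del self.active[key]           # release the slot...
        self._process_queue()          # ...and pull the next waiter


async def main():
    client = MiniClient(max_clients=10)
    # Fire 30 "requests" at once against a slow upstream.
    await asyncio.gather(*[client.fetch(0.01) for _ in range(30)])
    return client.max_queued


print(asyncio.run(main()))  # 20: everything beyond max_clients piled up
```

This is exactly the scenario in the logs above: 10 active, the rest sitting in the queue until a slot frees up.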

The DEBUG log itself is harmless. What worries me is this: if we ignore AsyncHTTPClient's internal queue and keep generating requests at a fixed rate, say a thousand per second, and the downstream service stalls for a while, will the backlog of queued requests harm the current service (it also likely means a large number of local coroutines are blocked)?

And once the downstream service recovers, will the flood of queued requests put pressure on it, and will the corresponding flood of responses in turn harm the current service?

Dealing with this properly requires weighing the service's business logic and overall flow, but going forward it is best to keep the AsyncHTTPClient queue in mind and not let it become an uncontrolled factor.
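One simple way to keep that queue under control is to cap concurrency on the caller's side, for example with an `asyncio.Semaphore` sized at or below max_clients, so waiting happens in our own code where we can see and bound it rather than inside the client. A sketch under that assumption; `guarded_fetch` and `fake_fetch` are placeholders I made up, with `fake_fetch` standing in for the real `AsyncHTTPClient.fetch` call:

```python
import asyncio


async def guarded_fetch(sem, fetch, url):
    # Acquire before calling, so backpressure is applied here, under our
    # control, instead of piling up in AsyncHTTPClient's internal queue.
    async with sem:
        return await fetch(url)


async def main():
    # Placeholder for AsyncHTTPClient.fetch; any awaitable callable works.
    async def fake_fetch(url):
        await asyncio.sleep(0.01)
        return url

    # Size the semaphore at or below max_clients (default 10).
    sem = asyncio.Semaphore(10)
    return await asyncio.gather(
        *[guarded_fetch(sem, fake_fetch, f"/job/{i}") for i in range(30)]
    )


results = asyncio.run(main())
print(len(results))  # 30
```

With this pattern a stalled downstream makes callers block (or fail fast, if you add a timeout around the acquire) instead of silently accumulating an unbounded backlog that floods the downstream the moment it recovers.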