TOC

HTTP 超时相关的疑问

作者： catroll
日期： 2022-02-25
标签： Linux HTTP

今天发现一个奇怪的现象，相同的代码在 CentOS 7 服务器上发起 HTTP 请求 3 秒之后超时，报 “TimeoutError: [Errno 110] Connection timed out”。

在我本地就按我们的定义的超时时间 5 秒超时，报 “tornado.simple_httpclient.HTTPTimeoutError: Timeout while connecting”。

测试代码

import time
import traceback

from tornado import httpclient

c = httpclient.AsyncHTTPClient()


async def test():
    start_time = time.time()
    print(start_time)
    try:
        response = await c.fetch('https://www.google.com/', connect_timeout=5, request_timeout=5)
        print(response.body)
    except Exception as e:
        print(traceback.format_exc())
    end_time = time.time()
    cost_time = end_time - start_time
    print('cost time: %.2fs' % cost_time)


def main():
    import asyncio
    loop = asyncio.get_event_loop()
    loop.run_until_complete(test())


if __name__ == '__main__':
    main()

分析

看到服务器上的报错 Errno 110 就知道，肯定是系统的限制。应该是我们的系统参数做了什么配置优化。
不过我问了相关技术人员，似乎也没有得到什么有价值的信息。

我自己在 sysctl，/proc/sys 中找了半天也没有发现什么线索。

服务器上 curl 也是 3 秒超时：

$ time curl -vvv --trace - --trace-time http://www.google.com
Warning: --trace overrides an earlier trace/verbose option
14:39:47.125337 == Info: About to connect() to www.google.com port 80 (#0)
14:39:47.125449 == Info:   Trying 65.49.68.152...
14:39:50.132247 == Info: 连接超时
14:39:50.132300 == Info:   Trying 2001::c710:9c47...
14:39:50.132351 == Info: Failed to connect to 2001::c710:9c47: 网络不可达
14:39:50.132374 == Info: Failed connect to www.google.com:80; 网络不可达
14:39:50.132387 == Info: Closing connection 0
curl: (7) Failed to connect to 2001::c710:9c47: 网络不可达

real    0m3.018s
user    0m0.007s
sys     0m0.001s

在本地 curl 测试几次都是两分十秒左右（130 秒）超时：

$ time curl -vvv --trace - --trace-time http://google.com/
Warning: --trace overrides an earlier trace/verbose option
15:08:38.554031 == Info:   Trying 142.251.42.238:80...
15:10:48.526457 == Info: connect to 142.251.42.238 port 80 failed: 连接超时
15:10:48.526564 == Info: Failed to connect to google.com port 80: 连接超时
15:10:48.526620 == Info: Closing connection 0
curl: (28) Failed to connect to google.com port 80: 连接超时
curl -vvv --trace - --trace-time http://google.com/  0.00s user 0.02s system 0% cpu 2:10.14 total

查询 Linux 网络调优相关资料看到 tcp_syn_retries 相关信息, 和我看到的现象非常吻合。

然后，我又调整本地的 tcp_syn_retries 为 1，根据定义，超时时间应该变成 $2^0 + 2^1 = 3$ 秒。
测试和预期完全符合！

我调整本地的 tcp_syn_retries 为 2，根据定义，超时时间应该变成 $2^0 + 2^1 + 2^2 = 7$ 秒。
测试和预期完全符合！

破案了。

`tcp_syn_retries` 参数

TCP 连接发起之后，如果没有收到对方的握手包 (SYN + ACK), 会自动重试。
tcp_syn_retries 就是重试次数。

每次连接的超时时间是 2 的 n 次幂：

初始连接后等待 $2^0$ 秒超时，
第一次重试之后等待 $2^1$ 秒超时，
第二次重试之后等待 $2^2$ 秒超时，
第三次重试之后等待 $2^3$ 秒超时，
...

Linux 默认重试次数为 6，也就是应该总共等待 127 秒。
可能由于系统调度等因素，重试次数越大误差也就越大，等到第六次重试之后便大了三四秒。

总结

其实相关参数我之前学习 Linux 参数优化的时候肯定是有看过的，不过没有遇到实际的问题总是不能真的理解。
正所谓：纸上得来终觉浅，绝知此事要躬行。

参考资料与拓展阅读

微信公众号, 搬运工来架构, 记一次网络请求连接超时的事故

发布于码厩技术博客的所有文章，除注明转载外，均为作者原创，欢迎转载，但必须注明出处。
尊重他人劳动，共创开源社区！转载请注明以下信息：
转载来源：码厩技术博客 [https://www.markjour.com]
原文标题：HTTP 超时相关的疑问
原文地址：/article/20220225-http-timeout.html

如果你有魔法，你可以看到一个评论框～

一	二	三	四	五	六	日

HTTP 超时相关的疑问

测试代码

分析

tcp_syn_retries 参数

总结

参考资料与拓展阅读

`tcp_syn_retries` 参数