Python and HTTP/2

HTTP/2 has been out for three and a half years, and there is still no good, easy-to-use Python client library. Really.

Note: HTTP/3 is almost here as well, and it moves to UDP...
Related news: the next-generation HTTP/3 protocol will drop TCP in favor of QUIC

I looked into requests and found the following three issues:

issue#4156 http2 protocol support?
Some servers are moving http2 for better performance. What would it take to have that in the requests library?
Opened by kgk on 14 Jun 2017, 3 comments
2017/06/14 kennethreitz: In the works!
Sigh. Another year and a half has passed since then, and still nothing has happened.

issue#2452 Add HTTP2 support
The draft specification of HTTP 2.0 is out. As a long-term goal, there should be a way to make requests compatible with it.
Opened by AkshatM on 20 Feb 2015, 1 comment

issue#2082 Add HTTP 2.0 support

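requests is probably the most widely used third-party HTTP library. Perhaps its author is simply too busy with other things to give this enough attention.
Of course, there are also objective reasons:
First, HTTP/2's new features may call for some asynchronous support, which would mean a major overhaul of the existing synchronous architecture. These changes, and the incompatibilities they might introduce, have to be weighed carefully (and waited on, until a generational shift in the technology offers the right moment).
Second, rather than making the change at the requests layer, a better approach is to update it together with urllib3 (which has not added HTTP/2 support yet).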

HTTP/2

  • https://developers.google.com/web/fundamentals/performance/http2/?hl=zh-cn
  • https://juejin.im/post/5aaf51daf265da239376dfde

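Changes:

  1. Binary: no longer a plain-text protocol
  2. Frames (the binary framing layer)
  3. TLS: not required by the protocol, but already the de facto standard
  4. Stream prioritization
  5. Connection multiplexing
  6. Header compression
  7. Server push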
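To make items 1, 2 and 5 more concrete: per RFC 7540 §4.1, every HTTP/2 frame starts with a fixed 9-byte header (24-bit payload length, 8-bit type, 8-bit flags, then 1 reserved bit plus a 31-bit stream ID), and multiplexing works by tagging each frame with the stream it belongs to. Below is a minimal standard-library sketch of packing and unpacking that header, plus the client connection preface (a 24-byte magic string followed by a SETTINGS frame, which may be empty). The helper names are hypothetical and are not taken from h2 or any other library.

```python
import struct
from typing import NamedTuple

# A few frame type codes from RFC 7540 §6
HEADERS = 0x1
SETTINGS = 0x4

# Client connection preface (RFC 7540 §3.5): 24 octets of magic
CLIENT_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"


class FrameHeader(NamedTuple):
    length: int      # payload length, 24 bits
    frame_type: int  # 8 bits
    flags: int       # 8 bits
    stream_id: int   # 31 bits; 0 means "the connection as a whole"


def pack_frame_header(h: FrameHeader) -> bytes:
    """Encode the fixed 9-byte header that precedes every HTTP/2 frame."""
    if not 0 <= h.length < 1 << 24:
        raise ValueError("length must fit in 24 bits")
    if not (0 <= h.frame_type <= 0xFF and 0 <= h.flags <= 0xFF):
        raise ValueError("type and flags must fit in one byte each")
    if not 0 <= h.stream_id < 1 << 31:
        raise ValueError("stream_id must fit in 31 bits")
    # "!IBI": big-endian, no padding -> 4 + 1 + 4 = 9 bytes.
    # The first 32-bit word holds the 24-bit length followed by the type byte;
    # the last word is the reserved bit (sent as 0) plus the stream ID.
    return struct.pack("!IBI", (h.length << 8) | h.frame_type, h.flags, h.stream_id)


def unpack_frame_header(data: bytes) -> FrameHeader:
    """Decode the first 9 bytes of a frame, ignoring the reserved bit."""
    if len(data) < 9:
        raise ValueError("need at least 9 bytes")
    length_and_type, flags, stream_id = struct.unpack("!IBI", data[:9])
    return FrameHeader(length_and_type >> 8, length_and_type & 0xFF, flags, stream_id & 0x7FFFFFFF)


def client_opening_bytes() -> bytes:
    """Magic preface followed by an empty SETTINGS frame (no parameters)."""
    return CLIENT_PREFACE + pack_frame_header(FrameHeader(0, SETTINGS, 0x0, 0))
```

For example, a HEADERS frame on stream 1 with END_STREAM and END_HEADERS set (flags `0x5`) and a 13-byte payload packs to `00 00 0d 01 05 00 00 00 01`. A real client also has to send HPACK-encoded header blocks, answer the server's SETTINGS and handle flow control, which is the work a library such as h2 takes off your hands.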
curl -vsS 'https://cn.bing.com/' -H 'authority: cn.bing.com' -H 'upgrade-insecure-requests: 1' -H 'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36' -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8' -H 'accept-encoding: gzip, deflate' -H 'accept-language: zh-CN,zh;q=0.9' --compressed -o /tmp/cn.bing.com.html
*   Trying 202.89.233.100...
* TCP_NODELAY set
* Connected to cn.bing.com (202.89.233.100) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
} [5 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
} [209 bytes data]
* TLSv1.2 (IN), TLS handshake, Server hello (2):
{ [94 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [4555 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [300 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [37 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=www.bing.com
*  start date: Jul 20 17:47:08 2017 GMT
*  expire date: Jul 10 17:47:08 2019 GMT
*  subjectAltName: host "cn.bing.com" matched cert's "*.bing.com"
*  issuer: C=US; ST=Washington; L=Redmond; O=Microsoft Corporation; OU=Microsoft IT; CN=Microsoft IT TLS CA 5
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
* Using Stream ID: 1 (easy handle 0x556d97fb88e0)
} [5 bytes data]
> GET / HTTP/2
> Host: cn.bing.com
> authority: cn.bing.com
> upgrade-insecure-requests: 1
> user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36
> accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
> accept-encoding: gzip, deflate
> accept-language: zh-CN,zh;q=0.9
> 
{ [5 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
} [5 bytes data]
< HTTP/2 200 
< cache-control: private, max-age=0
< content-type: text/html; charset=utf-8
< content-encoding: gzip
< vary: Accept-Encoding
< p3p: CP="NON UNI COM NAV STA LOC CURa DEVa PSAa PSDa OUR IND"
< set-cookie: SRCHD=AF=NOFORM; domain=.bing.com; expires=Sun, 13-Dec-2020 03:22:48 GMT; path=/
< set-cookie: SRCHUID=V=2&GUID=3524B9B660A44C16ACC68814CD08E5B7&dmnchg=1; domain=.bing.com; expires=Sun, 13-Dec-2020 03:22:48 GMT; path=/
< set-cookie: SRCHUSR=DOB=20181213; domain=.bing.com; expires=Sun, 13-Dec-2020 03:22:48 GMT; path=/
< set-cookie: _SS=SID=2934048BFC616C332883084FFD4F6D93; domain=.bing.com; path=/
< strict-transport-security: max-age=31536000; includeSubDomains; preload
< report-to: {"group":"network-errors","max_age":604800,"endpoints":[{"url":"https://nel.bingparachute.com/api/report?cat=binghp"}]}
< nel: {"report_to":"network-errors","max_age":604800,"success_fraction":0.01,"failure_fraction":1.0}
< x-msedge-ref: Ref A: 7C446E8F1EF94EF48520FC233A975DFD Ref B: BJ1EDGE0310 Ref C: 2018-12-13T03:22:48Z
< set-cookie: _EDGE_S=F=1&SID=2934048BFC616C332883084FFD4F6D93; path=/; httponly; domain=bing.com
< set-cookie: _EDGE_V=1; path=/; httponly; expires=Tue, 07-Jan-2020 03:22:48 GMT; domain=bing.com
< set-cookie: MUID=2C049E938AEF6583179F92578BC1641B; path=/; expires=Tue, 07-Jan-2020 03:22:48 GMT; domain=bing.com
< set-cookie: MUIDB=2C049E938AEF6583179F92578BC1641B; path=/; httponly; expires=Tue, 07-Jan-2020 03:22:48 GMT
< date: Thu, 13 Dec 2018 03:22:47 GMT
< 
{ [2909 bytes data]
* Connection #0 to host cn.bing.com left intact
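The "ALPN, offering h2" and "ALPN, server accepted to use h2" lines in the log are where the protocol is chosen, inside the TLS handshake. A minimal sketch of the same negotiation with Python's standard `ssl` module follows; it only reports which protocol the server picked and does not speak HTTP/2 afterwards. `make_h2_context` and `negotiated_protocol` are illustrative names, not part of any library.

```python
import socket
import ssl
from typing import Optional


def make_h2_context() -> ssl.SSLContext:
    """Client TLS context that offers h2 first and falls back to http/1.1."""
    ctx = ssl.create_default_context()  # certificate and hostname verification on
    # RFC 7540 requires TLS 1.2 or newer for h2
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Same preference order as curl's "ALPN, offering h2" / "offering http/1.1"
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    return ctx


def negotiated_protocol(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Do only the TLS handshake and return the ALPN protocol the server chose."""
    ctx = make_h2_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            # None means the server did not select any protocol via ALPN
            return tls.selected_alpn_protocol()
```

`negotiated_protocol("cn.bing.com")` should return `'h2'` if the server still accepts HTTP/2, `'http/1.1'` if it falls back, or `None` if it ignores ALPN altogether.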

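Takeaways:

  1. Remove br from accept-encoding, otherwise:
    curl: (61) Unrecognized content encoding type. libcurl understands deflate, gzip content encodings.
  2. HTTP compression (accept-encoding), see Wikipedia:
    compress: from Unix, now obsolete
    gzip: GNU zip format, RFC 1952
    deflate: a zlib stream (RFC 1950) wrapping DEFLATE, a lossless compression algorithm that combines LZ77 and Huffman coding (RFC 1951)
    br: Brotli, standardized separately from HTTP/2 (RFC 7932, 2016)
  3. The "2909 bytes data" at the very bottom changes from run to run, anywhere from about 300 to 3000 bytes; the cause is unclear.
    Could it be caching?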
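Takeaways 1 and 2 can be reproduced with the standard library: `gzip` and `zlib` cover gzip and deflate, while br would need the third-party brotli package. Below is a minimal sketch of decoding a single Content-Encoding value; the function name `decode_body` is hypothetical, and the raw-DEFLATE fallback is an assumption about servers that ignore the zlib wrapper, not something copied from curl. Like the curl build in the log above, it refuses `br`.

```python
import gzip
import zlib


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Undo one Content-Encoding, roughly what curl --compressed does."""
    coding = content_encoding.strip().lower()
    if coding in ("", "identity"):
        return body
    if coding in ("gzip", "x-gzip"):
        return gzip.decompress(body)  # GZIP file format, RFC 1952
    if coding == "deflate":
        # HTTP "deflate" should be a zlib stream (RFC 1950) around DEFLATE
        # (RFC 1951); some servers send bare DEFLATE, so retry without the header.
        try:
            return zlib.decompress(body)
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)
    # "br" would need the third-party brotli package; reject it like curl error 61
    raise ValueError(f"unsupported content encoding: {content_encoding!r}")
```

The negative `wbits` value is how zlib is told to expect raw DEFLATE data with no header or checksum.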

python-hyper (h2)

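A quick web search turned up hyper, which seems to be fairly well known.

hyper: the client. I'm not sure whether it has been abandoned, but it hasn't been updated since 2016.
The code barely works and would need a thorough rework.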

  • Docs: https://hyper.readthedocs.io/en/latest/
  • Code: https://github.com/Lukasa/hyper

h2 and the other repositories: low-level libraries for the HTTP/2 protocol, under active development...

  • Docs: https://python-hyper.org
  • Code: https://github.com/python-hyper
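Header compression (item 6 in the list of changes) is HPACK, RFC 7541, and python-hyper maintains it as a separate hpack package alongside h2. To illustrate the kind of primitive such a low-level library implements, here is a standard-library sketch of HPACK's prefix-integer encoding (RFC 7541 §5.1). `encode_int` and `decode_int` are hypothetical names, not the hpack package's API.

```python
from typing import Tuple


def encode_int(value: int, prefix_bits: int, high_bits: int = 0) -> bytes:
    """HPACK prefix-integer encoding (RFC 7541 §5.1).

    `high_bits` carries the flag bits that sit above the prefix in the first
    byte (e.g. 0x80 for an indexed header field with a 7-bit prefix).
    """
    if value < 0 or not 1 <= prefix_bits <= 8:
        raise ValueError("invalid value or prefix size")
    max_prefix = (1 << prefix_bits) - 1
    if value < max_prefix:
        # Small values fit entirely in the prefix
        return bytes([high_bits | value])
    out = [high_bits | max_prefix]  # all-ones prefix: value continues
    value -= max_prefix
    while value >= 128:
        out.append((value % 128) | 0x80)  # low 7 bits + continuation flag
        value //= 128
    out.append(value)  # final byte has the continuation flag cleared
    return bytes(out)


def decode_int(data: bytes, prefix_bits: int) -> Tuple[int, int]:
    """Inverse of encode_int: return (value, number of bytes consumed)."""
    max_prefix = (1 << prefix_bits) - 1
    value = data[0] & max_prefix  # mask off the flag bits above the prefix
    if value < max_prefix:
        return value, 1
    shift = 0
    for i in range(1, len(data)):
        byte = data[i]
        value += (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, i + 1
    raise ValueError("truncated integer")
```

With a 5-bit prefix, 1337 encodes to `1f 9a 0a`, the worked example in RFC 7541 Appendix C.1.2, and `encode_int(2, 7, 0x80)` gives `0x82`, the indexed form of `:method: GET` from the HPACK static table.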

Other

Tornado HTTP/2 Client

  • https://github.com/vladmunteanu/th2c, based on the h2 library above
  • https://github.com/bdarnell/tornado_http2, based on pycurl