Repost: Asynchronous Python Programming and Databases

Title: Asynchronous Python and Databases
URL: https://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
Date: February 15, 2015 at 12:10 PM

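This is an article published in 2015 by the highly respected zzzeek, covering SQLAlchemy and asynchronous programming in detail. It cleared up several of my questions about how asynchronous programming is implemented.

I have read this article many times to better grasp what the author is saying. I think every Python user should read it.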

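  1. Both asyncio and gevent/eventlet are suited to high concurrency over slow networks.
    Example: PyMySQL + Eventlet, single-threaded, accepting a large number of requests at once, issuing non-blocking IO, and handling it through a user-level event loop.
  2. Event loop + non-blocking IO can effectively reduce latency and raise throughput,
    but on a fast network, a thread pool + blocking IO is the better way to raise throughput.
  3. The effect of the GIL is negligible for purely IO-type work.
  4. Within a single operation of a Python database driver, the network portion accounts for only a small share of the time (about one third over a network in the article's measurements).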

Note: the version of SQLAlchemy currently under development is about to provide asyncio support:
1. https://docs.sqlalchemy.org/en/14/changelog/migration_14.html#change-3414
1. https://docs.sqlalchemy.org/en/14/orm/extensions/asyncio.html

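This article drew a response from A. Jesse Jiryu Davis (a Staff Engineer at MongoDB, according to his Twitter profile): https://emptysqua.re/blog/response-to-asynchronous-python-and-databases/
OSChina translation: Response to "Asynchronous Python and Databases"
Following this discussion, Motor 0.7 dropped greenlet and switched to a thread pool to implement async (see the Motor changelog).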

Asynchronous Python and Databases

2015-02-15 12:10

The asynchronous programming topic is difficult to cover. These days, it's not just about one thing, and I'm mostly an outsider to it. However, because I deal a lot with relational databases and the Python stack's interaction with them, I have to field a lot of questions and issues regarding asynchronous IO and database programming, both specific to SQLAlchemy as well as towards Openstack.
For an ORM, async is easy to talk about but hard to do, because it involves the interaction between Python and the lower layers of different databases.

As I don't have a simple opinion on the matter, I'll try to give a spoiler for the rest of the blog post here. I think that the Python asyncio library is very neat, promising, and fun to use, and organized well enough that it's clear that some level of SQLAlchemy compatibility is feasible, most likely including most parts of the ORM. As asyncio is now a standard part of Python, this compatibility layer is something I am interested in producing at some point.
asyncio in the Python standard library is very good, and using it to make the ORM asynchronous is feasible.

All of that said, I still think that asynchronous programming is just one potential approach to have on the shelf, and is by no means the one we should be using all the time or even most of the time, unless we are writing HTTP or chat servers or other applications that specifically need to concurrently maintain large numbers of arbitrarily slow or idle TCP connections (where by "arbitrarily" we mean, we don't care if individual connections are slow, fast, or idle, throughput can be maintained regardless). For standard business-style, CRUD-oriented database code, the approach given by asyncio is never necessary, will almost certainly hinder performance, and arguments as to its promotion of "correctness" are very questionable in terms of relational database programming. Applications that need to do non-blocking IO on the front end should leave the business-level CRUD code behind the thread pool.
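That said, I think asynchronous programming should be used only where it fits: apart from programs such as HTTP or chat servers that must maintain large numbers of TCP connections concurrently, we should not reach for async by default.
For ordinary CRUD work, asyncio is simply not needed and will almost certainly reduce performance.
I am also skeptical of the claim that async operations promote correctness in relational database programming.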
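The advice to keep business-level CRUD code behind a thread pool can be sketched as follows. This is a minimal illustration, not the article's code: it assumes the standard library's `sqlite3` as a stand-in DBAPI, and `blocking_crud` and `handle_request` are hypothetical names.

```python
import asyncio
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def blocking_crud(name):
    """Ordinary blocking DBAPI code; it knows nothing about asyncio."""
    # The connection is created and used inside the worker thread.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        conn.commit()
        row = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()
        return row[0]
    finally:
        conn.close()


async def handle_request(pool, name):
    """Non-blocking front end: hand the CRUD work to the thread pool."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, blocking_crud, name)


async def main():
    with ThreadPoolExecutor(max_workers=4) as pool:
        return await asyncio.gather(
            *(handle_request(pool, n) for n in ("ann", "bob", "cy"))
        )


results = asyncio.run(main())
print(results)
```

The event loop stays free to tend connections while each database call blocks only its own worker thread.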

With my assumedly entirely unsurprising viewpoint revealed, let's get underway!

What is Asynchronous IO?

Asynchronous IO is an approach used to achieve concurrency by allowing processing to continue while responses from IO operations are still being waited upon. To achieve this, IO function calls are made to be non blocking, so that they return immediately, before the actual IO operation is complete or has even begun. A typically OS-dependent polling system (such as epoll) is used within a loop in order to query a set of file descriptors in search of the next one which has data available; when located, it is acted upon, and when the operation is complete, control goes back to the polling loop in order to act upon the next descriptor with data available.
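Asynchronous IO is a way to achieve concurrency: the program keeps running while IO operations are still being waited on.
It is implemented with non-blocking IO: once called, these functions return immediately without waiting for the actual result.
Most commonly, the operating system's polling facility (such as epoll) is used to find the connections that have new data, which are then processed.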
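One turn of such a polling loop can be sketched with the standard library's `selectors` module, which picks epoll, kqueue, or another mechanism depending on the platform. The socket pairs and the `on_readable` callback are assumptions made for illustration, standing in for real client connections.

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS, etc.
received = []


def on_readable(sock):
    """Callback run by the loop once `sock` has data available."""
    received.append(sock.recv(1024))


# Two connected socket pairs stand in for two client connections.
pairs = [socket.socketpair() for _ in range(2)]
for _client, server_side in pairs:
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ, data=on_readable)

# Only the second "client" sends anything; the first stays idle.
pairs[1][0].sendall(b"hello")

# One turn of the polling loop: ask which descriptors are ready, act on them.
for key, _events in sel.select(timeout=1):
    key.data(key.fileobj)

print(received)

for client, server_side in pairs:
    sel.unregister(server_side)
    client.close()
    server_side.close()
sel.close()
```

The idle connection costs nothing beyond its registration; no thread sits waiting on it.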

Non-blocking IO in its classical use case is for those cases where it's not efficient to dedicate a thread of execution towards waiting for a socket to have results. It's an essential technique for when you need to listen to lots of TCP sockets that are arbitrarily "sleepy" or slow - the best example is a chat server, or some similar kind of messaging system, where you have lots of connections connected persistently, only sending data very occasionally; e.g. when a connection actually sends data, we consider it to be an "event" to be responded to.
Non-blocking IO is useful when we have to wait for sockets to produce results but those connections respond slowly or only occasionally.

In recent years, the asynchronous IO approach has also been successfully applied to HTTP related servers and applications. The theory of operation is that a very large number of HTTP connections can be efficiently serviced without the need for the server to dedicate threads to wait on each connection individually; in particular, slow HTTP clients need not get in the way of the server being able to serve lots of other clients at the same time. Combine this with the renewed popularity of so-called long polling approaches, and non-blocking web servers like nginx have proven to work very well.
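In recent years, asynchronous IO has been very successful in HTTP-related work, helping many services handle large numbers of connections at once without dedicating a thread to each one as older designs did. The approach supports more connections with fewer resources and is endorsed by well-known servers such as nginx.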

Asynchronous IO and Scripting

Key point: this section stresses that Python is a scripting language and is not fast.

Asynchronous IO programming in scripting languages is heavily centered on the notion of an event loop, which in its most classic form uses callback functions that receive a call once their corresponding IO request has data available. A critical aspect of this type of programming is that, since the event loop has the effect of providing scheduling for a series of functions waiting for IO, a scripting language in particular can replace the need for threads and OS-level scheduling entirely, at least within a single CPU. It can in fact be a little bit awkward to integrate multithreaded, blocking IO code with code that uses non-blocking IO, as they necessarily use different programming approaches when IO-oriented methods are invoked.
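In a scripting language like Python, asynchronous programming is mainly implemented with an event loop that schedules tasks, combined with many callback functions that wait for activity on connections.
Once you program this way, you find that mixing multithreaded, blocking IO code with non-blocking IO code is quite awkward.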

The relationship of asynchronous IO to event loops, combined with its growing popularity for use in web-server oriented applications as well as its ability to provide concurrency in an intuitive and obvious way, found itself hitting a perfect storm of factors for it to become popular on one platform in particular, Javascript. Javascript was designed to be a client side scripting language for browsers. Browsers, like any other GUI app, are essentially event machines; all they do is respond to user-initiated events of button pushes, key presses, and mouse moves. As a result, Javascript has a very strong concept of an event loop with callbacks and, until recently, no concept at all of multithreaded programming.
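The approach described above is very popular in the JavaScript world. JavaScript was designed as a scripting language for the browser, which is a GUI application and therefore event-driven by nature: an event loop plus callback functions. Until recently it had no concept of multithreading.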

As an army of front-end developers from the 90's through the 2000's mastered the use of these client-side callbacks, and began to use them not just for user-initiated events but for network-initiated events via AJAX connections, the stage was set for a new player to come along, which would transport the ever growing community of Javascript programmers to a new place...
Front-end developers later carried this programming philosophy over to the back end.

The Server

Node.js is not the first attempt to make Javascript a server side language. However, a key reason for its success was that there were plenty of sophisticated and experienced Javascript programmers around by the time it was released, and that it also fully embraces the event-driven programming paradigm that client-side Javascript programmers are already well-versed in and comfortable with.
Node.js succeeded because a large pool of programmers was already familiar with the event-driven programming paradigm.

In order to sell this, it followed that the "non-blocking IO" approach needed to be established as appropriate not just for the classic case of "tending to lots of usually asleep or arbitrarily slow connections", but as the de facto style in which all web-oriented software should be written. This meant that any network IO of any kind now had to be interacted with in a non-blocking fashion, and this of course includes database connections - connections which are normally relatively few per process, with numbers of 10-50 being common, are usually pooled so that the latency associated with TCP startup is not much of an issue, and for which the response times for a well-architected database, naturally served over the local network behind the firewall and often clustered, are extremely fast and predictable - in every way, the exact opposite of the use case for which non-blocking IO was first intended. The Postgresql database supports an asynchronous command API in libpq, stating a primary rationale for it as - surprise! using it in GUI applications.
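In Node.js code, all logic is written in the asynchronous style: every HTTP connection, and every database connection too.
Database connections are pooled, typically 10 to 50 connections per process.
A well-architected database usually sits on the local network behind a firewall, is often clustered, and responds very quickly.
Network operations that complete very quickly are not what non-blocking IO was originally meant for.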
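The pooling idea mentioned above (a small, fixed set of connections opened once and reused) can be sketched with a queue. This is a hypothetical `SimplePool` written for illustration with `sqlite3` as a stand-in driver; it is not how SQLAlchemy or any particular driver implements pooling.

```python
import queue
import sqlite3
from contextlib import contextmanager


class SimplePool:
    """Hypothetical fixed-size pool: connections are opened once and reused."""

    def __init__(self, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a pooled connection be handed
            # to whichever thread checks it out next.
            self._idle.put(sqlite3.connect(":memory:", check_same_thread=False))

    @contextmanager
    def connection(self):
        conn = self._idle.get()      # blocks if every connection is in use
        try:
            yield conn
        finally:
            self._idle.put(conn)     # return it to the pool instead of closing

    def close(self):
        while not self._idle.empty():
            self._idle.get_nowait().close()


pool = SimplePool(size=2)
with pool.connection() as conn:
    answer = conn.execute("SELECT 1 + 1").fetchone()[0]
idle_after = pool._idle.qsize()
pool.close()
print(answer, idle_after)
```

Because connections are reused, the TCP startup latency is paid once per connection rather than once per request.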

node.js already benefits from an extremely performant JIT-enabled engine, so it's likely that despite this repurposing of non-blocking IO for a case in which it was not intended, scheduling among database connections using non-blocking IO works acceptably well. (authors note: the comment here regarding libuv's thread pool is removed, as this only regards file IO.)
Thanks to a high-performance JIT engine, Node.js performs acceptably even though it uses non-blocking IO where it is not needed.

The Spectre of Threads (On Multithreaded Programming)

Well before node.js was turning masses of client-side Javascript developers into async-only server side programmers, the multithreaded programming model had begun to make academic theorists complain that they produce non-deterministic programs, and asynchronous programming, having the side effect that the event-driven paradigm effectively provides an alternative model of programming concurrency (at least for any program with a sufficient proportion of IO to keep context switches high enough), quickly became one of several hammers used to beat multithreaded programming over the head, centered on the two critiques that threads are expensive to create and maintain in an application, being inappropriate for applications that wish to tend to hundreds or thousands of connections simultaneously, and secondly that multithreaded programming is difficult and non-deterministic. In the Python world, continued confusion over what the GIL does and does not do provided for a natural tilling of land fertile for the async model to take root more strongly than might have occurred in other scenarios.
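Before async IO took off, multithreaded programming was the mainstream approach to high concurrency, and people were already complaining about its non-determinism. Asynchronous programming has since become one of the main challengers to multithreading.
The critique of multithreading rests on two points:
1. Threads are expensive to create and maintain, which makes them unsuitable when hundreds or thousands of connections must be tended to at once
1. Multithreaded programming is difficult and non-deterministic (sometimes it comes down to luck)

In the Python world, confusion over what the GIL does and does not do helped async IO take root even more strongly.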

How do you like your Spaghetti? (Callback Hell)

The callback style of node.js and other asynchronous paradigms was considered to be problematic; callbacks organized for larger scale logic and operations made for verbose and hard-to-follow code, commonly referred to as callback spaghetti. Whether callbacks were in fact spaghetti or a thing of beauty was one of the great arguments of the 2000's, however I fortunately don't have to get into it because the async community has clearly acknowledged the former and taken many great steps to improve upon the situation.
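The callback style is seriously problematic: applied to complex logic, it makes code hard to follow. It is commonly called callback spaghetti (callback hell). The community has acknowledged this and has been working to improve it.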

In the Python world, one approach offered in order to allow for asynchronous IO while removing the need for callbacks is the "implicit async IO" approach offered by eventlet and gevent. These take the approach of instrumenting IO functions to be implicitly non-blocking, organized such that a system of green threads may each run concurrently, using a native event library such as libev to schedule work between green threads based on the points at which non-blocking IO is invoked. The effect of implicit async IO systems is that the vast majority of code which performs IO operations need not be changed at all; in most cases, the same code can literally be used in both blocking and non-blocking IO contexts without any changes (though in typical real-world use cases, certainly not without occasional quirks).
The Python world has one solution: the implicit async offered by eventlet and gevent, which makes the native IO operations non-blocking, wraps work in "green threads", and schedules them with an event library such as libev.
The benefit is that code hardly needs to change; the same code can run in both blocking and non-blocking contexts.

In contrast to implicit async IO is the very promising approach offered by Python itself in the form of the previously mentioned asyncio library, now available in Python 3. Asyncio brings to Python fully standardized concepts of "futures" and coroutines, where we can produce non-blocking IO code that flows in a very similar way to traditional blocking code, while still maintaining the explicit nature of when non blocking operations occur.
The other option is asyncio from the standard library, built on Futures and coroutines.
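For readers on current Python, here is a minimal sketch of the explicit style using the `async def` / `await` syntax (Python 3.5+), which superseded the `@asyncio.coroutine` / `yield from` form this 2015 article uses; the decorator was removed in Python 3.11. The `fetch` coroutine is a hypothetical stand-in for a non-blocking IO call.

```python
import asyncio


async def fetch(name, delay):
    """Stand-in for a non-blocking IO call; asyncio.sleep yields to the loop."""
    await asyncio.sleep(delay)   # the explicit point where other tasks may run
    return name.upper()


async def main():
    # Both coroutines are scheduled together; each `await` marks a switch point.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))


results = asyncio.run(main())
print(results)
```

The code reads like blocking code with ordinary return values, yet every place where control can pass to another task is visible as an `await`.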

SQLAlchemy? Asyncio? Yes?

Now that asyncio is part of Python, it's a common integration point for all things async. Because it maintains the concepts of meaningful return values and exception catching semantics, getting an asyncio version of SQLAlchemy to work for real is probably feasible; it will still require at least several external modules that re-implement key methods of SQLAlchemy Core and ORM in terms of async results, but it seems that the majority of code, even within execution-centric parts, can stay much the same. It no longer means a rewrite of all of SQLAlchemy, and the async aspects should be able to remain entirely outside of the central library itself. I've started playing with this. It will be a lot of effort but should be doable, even for the ORM where some of the patterns like "lazy loading" will just have to work in some more verbose way.
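Now that asyncio is a standard, everything async will converge on it. Because it keeps meaningful return values and exception-catching semantics, async support in SQLAlchemy becomes feasible; even so, several external modules are still needed to re-implement some key methods.
Most SQLAlchemy code does not involve IO and can stay largely unchanged.
The new async support will live outside the existing code.
Patterns such as lazy loading may have to work in a more verbose way.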

However. I don't know that you really would generally want to use an async-enabled form of SQLAlchemy.
However, I don't know whether people would really want to use an async-enabled SQLAlchemy.

Taking Async Web Scale

Key point: database operations in async web applications

As anticipated, let's get into where it's all going wrong, especially for database-related code.

Issue One - Async as Magic Performance Fairy Dust

Many (but certainly not all) within both the node.js community as well as the Python community continue to claim that asynchronous programming styles are innately superior for concurrent performance in nearly all cases. In particular, there's the notion that the context switching approaches of explicit async systems such as that of asyncio can be had virtually for free, and as the Python has a GIL, that all adds up in some unspecified/non-illustrated/apples-to-oranges way to establish that asyncio will totally, definitely be faster than using any kind of threaded approach, or at the very least, not any slower. Therefore any web application should as quickly as possible be converted to use a front-to-back async approach for everything, from HTTP request to database calls, and performance enhancements will come for free.
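Some people in both the JavaScript and Python communities believe asynchronous programming has an innate performance advantage in nearly all cases.
In particular, they claim that context switching in explicit async libraries such as asyncio is virtually free.
And because of the GIL, asyncio must be faster than threads, or at least no slower.
Therefore every web application should switch to async as soon as possible to gain performance.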

I will address this only in terms of database access. For HTTP / "chat" server styles of communication, either listening as a server or making client calls, asyncio may very well be superior as it can allow lots more sleepy/arbitrarily slow connections to be tended towards in a simple way. But for local database access, this is just not the case.
For HTTP connections this holds up, but for database connections I cannot agree.

1. Python is Very, Very Slow compared to your database

Now the real complaint: Python is too slow relative to the database connection.

Update - redditor Riddlerforce found valid issues with this section, in that I was not testing over a network connection. Results here are updated. The conclusion is the same, but not as hyperbolically amusing as it was before.
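A reader pointed out a problem with this section, so I reran the tests. The numbers are less extreme than before, but the conclusion is unchanged.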

Let's first review asynchronous programming's sweet spot, the I/O Bound application:
First, a look at the main use case for async IO, the IO-bound program:

I/O Bound refers to a condition in which the time it takes to complete a computation is determined principally by the period spent waiting for input/output operations to be completed. This circumstance arises when the rate at which data is requested is slower than the rate it is consumed or, in other words, more time is spent requesting data than processing it.
A great misconception I seem to encounter often is the notion that communication with the database takes up a majority of the time spent in a database-centric Python application. This perhaps is a common wisdom in compiled languages such as C or maybe even Java, but generally not in Python. Python is very slow, compared to such systems; and while Pypy is certainly a big help, the speed of Python is not nearly as fast as your database, when dealing in terms of standard CRUD-style applications (meaning: not running large OLAP-style queries, and of course assuming relatively low network latencies). As I worked up in my PyMySQL Evaluation for Openstack, whether a database driver (DBAPI) is written in pure Python or in C will incur significant additional Python-level overhead. For just the DBAPI alone, this can be as much as an order of magnitude slower. While network overhead will cause more balanced proportions between CPU and IO, just the CPU time spent by Python driver itself still takes up twice the time as the network IO, and that is without any additional database abstraction libraries, business logic, or presentation logic in place.
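Many people believe that communicating with the database takes up most of an application's time. In compiled languages (C, or even Java) that may be fair, but not in Python.
On a normal local network, routine database operations are very fast compared with Python's own execution speed.
When I evaluated PyMySQL for OpenStack, I found that whether a DBAPI is written in pure Python or in C makes a large difference in Python-level overhead; for the DBAPI alone, the pure-Python driver can be as much as an order of magnitude slower.
Although network overhead brings the CPU and IO proportions closer to balance (Markjour: I did not follow this part), the time spent in the Python driver itself is still twice the time spent on network IO, and that is before counting SQLAlchemy's other abstraction layers and any other logic.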

Markjour: This means most of the time is not spent on database communication, which in turn means async IO cannot improve performance much.

This script, adapted from the Openstack entry, illustrates a pretty straightforward set of INSERT and SELECT statements, and virtually no Python code other than the barebones explicit calls into the DBAPI.
Let's run a script with a simple set of INSERT and SELECT statements as a test.
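The original script is not reproduced in this repost. A rough sketch of the same kind of measurement follows, assuming the standard library's `sqlite3` as a stand-in DBAPI with no network involved, so its timings say nothing about MySQL. The `run_test` helper and the table layout are hypothetical.

```python
import cProfile
import io
import pstats
import sqlite3
import time


def run_test(dbapi, rows=2000):
    """Bare DBAPI calls only: a batch of INSERTs, then SELECT them back."""
    conn = dbapi.connect(":memory:")
    cursor = conn.cursor()
    cursor.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, x INTEGER, y TEXT)")
    for i in range(rows):
        # "?" is sqlite3's parameter style; other DBAPIs may use "%s".
        cursor.execute("INSERT INTO data (x, y) VALUES (?, ?)", (i, "row %d" % i))
    conn.commit()
    cursor.execute("SELECT id, x, y FROM data")
    fetched = len(cursor.fetchall())
    conn.close()
    return fetched


# Straight wall-clock time.
start = time.perf_counter()
fetched = run_test(sqlite3)
elapsed = time.perf_counter() - start
print("DBAPI (straight time): %s, total seconds %f" % (sqlite3.__name__, elapsed))

# Same run under cProfile, to see where the Python-level time goes.
profiler = cProfile.Profile()
profiler.enable()
run_test(sqlite3)
profiler.disable()
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Comparing the function-call count and cumulative time between two drivers is what separates driver overhead from time spent waiting on the socket.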

MySQL-Python, a pure C DBAPI, runs it like the following over a network:

DBAPI (cProfile):  <module 'MySQLdb'>
     47503 function calls in 14.863 seconds
DBAPI (straight time):  <module 'MySQLdb'>, total seconds 12.962214

With PyMySQL, a pure-Python DBAPI, and a network connection we're about 30% slower:

DBAPI (cProfile):  <module 'pymysql'>
     23807673 function calls in 21.269 seconds
DBAPI (straight time):  <module 'pymysql'>, total seconds 17.699732

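The two runs above show that over a network the C driver and the Python driver take roughly 13.0 s and 17.7 s (straight time), which the article rounds to the Python driver being about 30% slower.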

Running against a local database, PyMySQL is an order of magnitude slower than MySQLdb:

DBAPI:  <module 'pymysql'>, total seconds 9.121727

DBAPI:  <module 'MySQLdb'>, total seconds 1.025674

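Against a local database, the Python and C drivers compare at about 9:1. Combined with the network test above,
the Python driver's local-to-network ratio is about 9:17.7, and the C driver's is about 1:13.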

To highlight the actual proportion of these runs that's spent in IO, the following two RunSnakeRun displays illustrate how much time is actually for IO within the PyMySQL run, both for local database as well as over a network connection. The proportion is not as dramatic over a network connection, but in that case network calls still only take 1/3rd of the total time; the other 2/3rds is spent in Python crunching the results. Keep in mind this is just the DBAPI alone; a real world application would have database abstraction layers, business and presentation logic surrounding these calls as well:

Let's see it!
The two treemaps show where the time goes in the PyMySQL run, visualizing the share taken by IO.


Local connection - clearly not IO bound.
Local connection: almost no time is spent in IO.


Network connection - not as dramatic, but still not IO bound (8.7 sec of socket time vs. 24 sec for the overall execute)
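Over a network the effect is less dramatic than with a local connection, but IO still accounts for only about 1/3 of the time.
Again, this is just the driver, before any database abstraction, business logic, or presentation logic is considered.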
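The proportions quoted for the network run can be worked through as plain arithmetic. The speedup bound below is my own simplifying assumption (an Amdahl-style estimate in which all socket wait is hidden behind other work on a single core); it is not a figure from the article.

```python
# Figures quoted from the network-connection profile above.
socket_seconds = 8.7
total_seconds = 24.0

io_fraction = socket_seconds / total_seconds
python_fraction = 1.0 - io_fraction

# Assumption: a perfect scheduler hides *all* socket wait behind other work.
# Even then, the Python-side time remains, which caps the possible speedup.
best_case_speedup = 1.0 / python_fraction

print("IO share:      %.0f%%" % (io_fraction * 100))
print("Python share:  %.0f%%" % (python_fraction * 100))
print("Speedup bound: %.2fx" % best_case_speedup)
```

Under that assumption, overlapping IO could buy at most about 1.57x here, and a thread pool can overlap the same IO just as well.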

2. AsyncIO uses appealing, but relatively inefficient Python paradigms

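asyncio is largely syntactic sugar, and it is still relatively inefficient.
Markjour: zzzeek is comparing against an early version of asyncio here. I don't know how the current version fares; it would be worth running the test again.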

At the core of asyncio is that we are using the @asyncio.coroutine decorator, which does some generator tricks in order to have your otherwise synchronous looking function defer to other coroutines. Central to this is the yield from technique, which causes the function to stop its execution at that point, while other things go on until the event loop comes back to that point. This is a great idea, and it can also be done using the more common yield statement as well. However, using yield from, we are able to maintain at least the appearance of the presence of return values:

@asyncio.coroutine
def some_coroutine():
    conn = yield from db.connect()
    return conn

That syntax is fantastic, I like it a lot, but unfortunately, the mechanism of that return conn statement is necessarily that it raises a StopIteration exception. This, combined with the fact that each yield from call more or less adds up to the overhead of an individual function call separately, makes the approach comparatively expensive. I tweeted a simple demonstration of this, which I include here in abbreviated form:

def return_with_normal():
    """One function calls another normal function, which returns a value."""

    def foo():
        return 5

    def bar():
        f1 = foo()
        return f1

    return bar

def return_with_generator():
    """One function calls another coroutine-like function,
    which returns a value."""

    def decorate_to_return(fn):
        def decorate():
            it = fn()
            try:
                x = next(it)
            except StopIteration as y:
                return y.args[0]
        return decorate

    @decorate_to_return
    def foo():
        yield from range(0)
        return 5

    def bar():
        f1 = foo()
        return f1

    return bar

return_with_normal = return_with_normal()
return_with_generator = return_with_generator()

import timeit

print(timeit.timeit("return_with_generator()",
    "from __main__ import return_with_generator", number=10000000))
print(timeit.timeit("return_with_normal()",
    "from __main__ import return_with_normal", number=10000000))

The results we get are that the do-nothing yield from + StopIteration take about six times longer:

yield from: 12.52761328802444
normal: 2.110536064952612

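This compares a plain call with a generator-wrapped call in pure Python. Wrapping costs a great deal of efficiency: a startling 1:6.
Markjour: zzzeek wants to show that native asyncio, being built on generators, carries an efficiency cost of its own.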
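The mechanism behind that overhead can be seen directly: a `return` inside a generator surfaces as a `StopIteration` exception carrying the value, and `yield from` unwraps it. A minimal sketch, with hypothetical `inner` and `outer` functions:

```python
def inner():
    yield from range(0)   # yields nothing, as in the benchmark above
    return 5


gen = inner()
try:
    next(gen)
except StopIteration as stop:
    returned = stop.value   # the same object as stop.args[0]


def outer():
    result = yield from inner()   # `yield from` unwraps StopIteration.value
    return result * 2


try:
    next(outer())
except StopIteration as stop:
    doubled = stop.value

print(returned, doubled)
```

Every coroutine-style return therefore costs an exception being raised and caught, on top of the extra generator frames.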

To which many people said to me, "so what? Your database call is much more of the time spent". Never minding that we're not talking here about an approach to optimize existing code, but to prevent making perfectly fine code more slow than it already is. The PyMySQL example should illustrate that Python overhead adds up very fast, even just within a pure Python driver, and in the overall profile dwarfs the time spent within the database itself. However, this argument still may not be convincing enough.
The point is to avoid adopting a new programming style that makes perfectly fine code slower.

So, I will here present a comprehensive test suite which illustrates traditional threads in Python against asyncio, as well as gevent style nonblocking IO. We will use psycopg2 which is currently the only production DBAPI that even supports async, in conjunction with aiopg which adapts psycopg2's async support to asyncio and psycogreen which adapts it to gevent.
Another test, this time with psycopg2, a production-grade DBAPI that supports async, compared against gevent as well:
1. psycopg2 + aiopg -> asyncio
1. psycopg2 + psycogreen -> gevent

The purpose of the test suite is to load a few million rows into a Postgresql database as fast as possible, while using the same general set of SQL instructions, such that we can see if in fact the GIL slows us down so much that asyncio blows right past us with ease. The suite can use any number of connections simultaneously; at the highest I boosted it up to using 350 concurrent connections, which trust me, will not make your DBA happy at all.
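The test: load a few million rows into a database.
If the GIL really slowed programs down as much as some people imagine, asyncio should win by a wide margin.
Any number of connections can be used; I went as high as 350 concurrent connections (far more than best practice recommends).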
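The threaded side of such a load test has roughly the shape below. This is only a structural sketch under loud assumptions: the real suite used psycopg2 against PostgreSQL, whereas here each worker gets a private in-memory `sqlite3` database, so the rows/sec figure it prints is not comparable to the article's numbers. The constants and the `worker` function are hypothetical.

```python
import queue
import sqlite3
import threading
import time

NUM_WORKERS = 4          # the article's runs used 15 to 350 connections
BATCHES = 20
ROWS_PER_BATCH = 500

work = queue.Queue()
inserted = []            # collects one row count per worker


def worker():
    # One private connection per worker, standing in for one database
    # connection per thread. Each in-memory SQLite database is independent.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    count = 0
    while True:
        batch = work.get()
        if batch is None:          # sentinel: no more work
            break
        conn.executemany("INSERT INTO t (x) VALUES (?)", batch)
        conn.commit()
        count += len(batch)
    conn.close()
    inserted.append(count)


for b in range(BATCHES):
    work.put([(b * ROWS_PER_BATCH + i,) for i in range(ROWS_PER_BATCH)])
for _ in range(NUM_WORKERS):
    work.put(None)

start = time.perf_counter()
threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

total_rows = sum(inserted)
print("%d rows, %.0f rows/sec" % (total_rows, total_rows / elapsed))
```

An asyncio or gevent variant would keep the same queue-and-consumers structure and swap only the concurrency primitive, which is what makes the comparison fair.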

The results of several runs on different machines under different conditions are summarized at the bottom of the README. The best performance I could get was running the Python code on one laptop interfacing to the Postgresql database on another, but in virtually every test I ran, whether I ran just 15 threads/coroutines on my Mac, or 350 (!) threads/coroutines on my Linux laptop, threaded code got the job done much faster than asyncio in every case (including the 350 threads case, to my surprise), and usually faster than gevent as well. Below are the results from running 120 threads/processes/connections on the Linux laptop networked to the Postgresql database on a Mac laptop:
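The best run connected from one laptop to PostgreSQL on another.
The README summarizes the results of several runs on different machines under different conditions.
The conclusion: whether on Mac or Linux, with 15 concurrent workers or 350, the threaded code ran much faster than the coroutine code.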

Python2.7.8 threads (22k r/sec, 22k r/sec)
Python3.4.1 threads (10k r/sec, 21k r/sec)
Python2.7.8 gevent (18k r/sec, 19k r/sec)
Python3.4.1 asyncio (8k r/sec, 10k r/sec)

Above, we see asyncio significantly slower for the first part of the run (Python 3.4 seemed to have some issue here in both threaded and asyncio), and for the second part, fully twice as slow compared to both Python2.7 and Python3.4 interpreters using threads. Even running 350 concurrent connections, which is way more than you'd usually ever want a single process to run, asyncio could hardly approach the efficiency of threads. Even with the very fast and pure C-code psycopg2 driver, just the overhead of the aiopg library on top combined with the need for in-Python receipt of polling results with psycopg2's asynchronous library added more than enough Python overhead to slow the script right down.

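The conclusion is that asyncio does poorly here. Python 3.4 also seems to have some unexplained issue, which looks a little odd.
Markjour: I have not run this myself and have little feel for it, so I will withhold comment until I have tested it.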

Remember, I wasn't even trying to prove that asyncio is significantly slower than threads; only that it wasn't any faster. The results I got were more dramatic than I expected. We see also that an extremely low-latency async approach, e.g. that of gevent, is also slower than threads, but not by much, which confirms first that async IO is definitely not faster in this scenario, but also because asyncio is so much slower than gevent, that it is in fact the in-Python overhead of asyncio's coroutines and other Python constructs that are likely adding up to very significant additional latency on top of the latency of less efficient IO-based context switching.

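The test was not meant to prove that asyncio is worse than threads, only that using asyncio coroutines for database work is no faster than using threads.
Markjour: The gist seems to be: threads > gevent >> asyncio

Underneath, the less efficient IO-based context switching incurs a cost,
and on top of that, asyncio's coroutines and other Python constructs add further overhead.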

Issue Two - Async as Making Coding Easier

This is the flip side to the "magic fairy dust" coin. This argument expands upon the "threads are bad" rhetoric, and in its most extreme form goes that if a program at some level happens to spawn a thread, such as if you wrote a WSGI application and happen to run it under mod_wsgi using a threadpool, you are now doing "threaded programming", of the caliber that is just as difficult as if you were doing POSIX threading exercises throughout your code. Despite the fact that a WSGI application should not have the slightest mention of anything to do with in-process shared and mutable state within in it, nope, you're doing threaded programming, threads are hard, and you should stop.
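This is the other side of the "magic fairy dust" coin. The argument extends the "threads are bad" rhetoric.
In its most extreme form, if a program happens to spawn a thread at some level (say you write a WSGI application and run it under mod_wsgi with a thread pool), you are said to be doing "threaded programming" that is just as hard as doing POSIX threading throughout your code.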

The "threads are bad" argument has an interesting twist (ha!), which is that it is being used by explicit async advocates to argue against implicit async techniques. Glyph's Unyielding post makes exactly this point very well. The premise goes that if you've accepted that threaded concurrency is a bad thing, then using the implicit style of async IO is just as bad, because at the end of the day, the code looks the same as threaded code, and because IO can happen anywhere, it's just as non-deterministic as using traditional threads. I would happen to agree with this, that yes, the problems of concurrency in a gevent-like system are just as bad, if not worse, than a threaded system. One reason is that concurrency problems in threaded Python are fairly "soft" because already the GIL, as much as we hate it, makes all kinds of normally disastrous operations, like appending to a list, safe. But with green threads, you can easily have hundreds of them without breaking a sweat and you can sometimes stumble across pretty weird issues that are normally not possible to encounter with traditional, GIL-protected threads.
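The "threads are bad" argument has an interesting twist: advocates of explicit async use it to argue against implicit async.
This post by Glyph, the author of Twisted, makes that point very well.
The premise is that if you accept that threaded concurrency is bad, then implicit async is just as bad, because implicit async IO, like threads, is scheduled outside your control and is just as non-deterministic.
I agree with this. Threads are made somewhat safer by the GIL, while concurrency in a gevent-like system may be even worse than with threads: green threads are easy to start, but you can run into some strange problems.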
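The remark that the GIL makes operations like appending to a list safe can be checked with a small experiment. This sketch relies on CPython behavior, where a single `list.append` is atomic; it does not show that compound operations (check, then act) are safe, because they are not.

```python
import threading

shared = []
THREADS = 8
APPENDS_PER_THREAD = 10_000


def appender(tag):
    for i in range(APPENDS_PER_THREAD):
        shared.append((tag, i))   # a single list.append is atomic in CPython


threads = [threading.Thread(target=appender, args=(n,)) for n in range(THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))
```

No append is lost or duplicated even without a lock, which is the sense in which threaded Python concurrency problems are "soft".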

As an aside, it should be noted that Glyph takes a direct swipe at the "magic fairy dust" crowd:
Incidentally, Glyph takes a direct swipe at the "magic fairy dust" crowd:

Unfortunately, “asynchronous” systems have often been evangelized by emphasizing a somewhat dubious optimization which allows for a higher level of I/O-bound concurrency than with preemptive threads, rather than the problems with threading as a programming model that I’ve explained above. By characterizing “asynchronousness” in this way, it makes sense to lump all 4 choices together.
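Unfortunately, async systems have often been promoted by stressing a somewhat dubious optimization, namely a higher level of IO-bound concurrency than preemptive threads allow, rather than the problems with threads as a programming model.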
I’ve been guilty of this myself, especially in years past: saying that a system using Twisted is more efficient than one using an alternative approach using threads. In many cases that’s been true, but:
I have been guilty of this myself for years, saying that Twisted is more efficient than threads. That was often true, but:
1. the situation is almost always more complicated than that, when it comes to performance,
When it comes to performance, the situation is more complicated and cannot be summed up so simply
1. “context switching” is rarely a bottleneck in real-world programs, and
The cost of context switching is rarely a bottleneck in real-world programs
1. it’s a bit of a distraction from the much bigger advantage of event-driven programming, which is simply that it’s easier to write programs at scale, in both senses (that is, programs containing lots of code as well as programs which have many concurrent users).
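Overemphasizing efficiency distracts from the bigger advantage of event-driven programming: it is easier to write programs at scale (lots of code, and lots of concurrent users)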

People will quote Glyph's post when they want to talk about how you'll have fewer bugs in your program when you switch to asyncio, but continue to promise greater performance as well, for some reason choosing to ignore this part of this very well written post.
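People often cite this post to argue that asyncio means fewer bugs, yet they keep promising better performance as well, for some reason ignoring this part of it.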

Glyph makes a great, and very clear, argument for the twin points that both non-blocking IO should be used, and that it should be explicit. But the reasoning has nothing to do with non-blocking IO's original beginnings as a reasonable way to process data from a large number of sleepy and slow connections. It instead has to do with the nature of the event loop and how an entirely new concurrency model, removing the need to expose OS-level context switching, is emergent.
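Glyph argues that non-blocking IO should be used, and used explicitly. The reasoning has nothing to do with its original purpose of handling large numbers of slow connections; it is about the nature of the event loop and the emergence of an entirely new concurrency model that removes the need to expose OS-level context switching. (Markjour: I did not fully follow this.)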

While we've come a long way from writing callbacks and can now again write code that looks very linear with approaches like asyncio, the approach should still require that the programmer explicitly specify all those function calls where IO is known to occur. It begins with the following example:
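We have moved on from writing piles of callbacks to writing linear-looking code with asyncio, but the programmer must still explicitly mark every function call where IO is known to occur.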

def transfer(amount, payer, payee, server):
    if not payer.sufficient_funds_for_withdrawal(amount):
        raise InsufficientFunds()
    log("{payer} has sufficient funds.", payer=payer)
    payee.deposit(amount)
    log("{payee} received payment", payee=payee)
    payer.withdraw(amount)
    log("{payer} made payment", payer=payer)
    server.update_balances([payer, payee])

The concurrency mistake here from a threaded perspective is that if two threads both run transfer(), they may both withdraw from payer such that payer is overdrawn without InsufficientFunds being raised.
From a multithreaded point of view the code above is unsafe: the funds check and the withdrawal are not atomic, so a race is possible.
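As a minimal sketch of the in-process fix for this race, the check and the withdrawal can be placed under one lock. The `Account` class, `withdraw_if_sufficient()`, and `demo()` are hypothetical names for illustration and are not part of Glyph's example; this covers only threads within a single process, which the later sections argue is not enough for database code.

```python
import threading


class InsufficientFunds(Exception):
    pass


class Account:
    """Hypothetical in-memory account used only for this illustration."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw_if_sufficient(self, amount):
        # The check and the update run under the same lock, so no other
        # thread can withdraw between them.
        with self._lock:
            if self.balance < amount:
                raise InsufficientFunds()
            self.balance -= amount


def demo(start_balance=100, amount=30, n_threads=10):
    account = Account(start_balance)
    outcomes = []

    def worker():
        try:
            account.withdraw_if_sufficient(amount)
            outcomes.append(True)
        except InsufficientFunds:
            outcomes.append(False)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Number of successful withdrawals and the final balance.
    return outcomes.count(True), account.balance
```

Whatever order the threads run in, only as many withdrawals succeed as the balance allows, and the balance never goes negative.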

The explicit async version is then:

@coroutine
def transfer(amount, payer, payee, server):
    if not payer.sufficient_funds_for_withdrawal(amount):
        raise InsufficientFunds()
    log("{payer} has sufficient funds.", payer=payer)
    payee.deposit(amount)
    log("{payee} received payment", payee=payee)
    payer.withdraw(amount)
    log("{payer} made payment", payer=payer)
    yield from server.update_balances([payer, payee])

Where now, within the scope of the process we're in, we know that we are only allowing anything else to happen at the bottom, when we call yield from server.update_balances(). There is no chance that any other concurrent calls to payer.withdraw() can occur while we're in the function's body and have not yet reached the server.update_balances() call.
In the coroutine version, nothing else in the process can be scheduled until the final step, the yield from.
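The claim can be checked with a small runnable sketch. It uses the modern `async def` / `await` spelling rather than `@coroutine` / `yield from`, and the `Account` class, `update_balances()` stand-in, and `demo()` helper are assumptions made for illustration, not code from the original post.

```python
import asyncio


class InsufficientFunds(Exception):
    pass


class Account:
    """Hypothetical in-memory account used only for this illustration."""

    def __init__(self, balance):
        self.balance = balance


async def update_balances(accounts):
    # Stand-in for network IO: the only place where transfer() yields
    # control back to the event loop.
    await asyncio.sleep(0)


async def transfer(amount, payer, payee):
    # No await between the check and the withdrawal, so no other
    # coroutine can run in between.
    if payer.balance < amount:
        raise InsufficientFunds()
    payee.balance += amount
    payer.balance -= amount
    await update_balances([payer, payee])


async def run_transfers():
    payer, payee = Account(100), Account(0)
    results = await asyncio.gather(
        transfer(60, payer, payee),
        transfer(60, payer, payee),
        return_exceptions=True,
    )
    outcomes = [type(r).__name__ if isinstance(r, Exception) else r for r in results]
    return payer.balance, payee.balance, outcomes


def demo():
    return asyncio.run(run_transfers())
```

The second transfer only starts once the first has reached its single `await`, so it sees the reduced balance and is rejected instead of overdrawing the payer.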

He then makes a clear point as to why even the implicit gevent-style async isn't sufficient. In the above program, because payee.deposit() and payer.withdraw() are called without a yield from, we are assured that no future version of these calls can perform IO that would break into our scheduling and potentially run another transfer() before ours is complete.
That guarantee holds only as long as the earlier calls do not themselves yield.
This is why, for "coroutine safety", implicit gevent-style async is not enough: it gives no visible signal when a call starts doing IO.
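To illustrate what goes wrong once one of those calls does start doing IO, here is a sketch in which a hypothetical future `deposit()` yields to the event loop. It uses `async def` / `await` in place of `yield from`; `Account`, `deposit()`, and `demo()` are assumed names for illustration only. In the explicit style the change is at least visible at the call site, because the caller has to add `await`.

```python
import asyncio


class InsufficientFunds(Exception):
    pass


class Account:
    """Hypothetical in-memory account used only for this illustration."""

    def __init__(self, balance):
        self.balance = balance


async def deposit(account, amount):
    # Hypothetical later version of deposit() that now performs IO,
    # modelled here as a bare yield to the event loop.
    await asyncio.sleep(0)
    account.balance += amount


async def transfer(amount, payer, payee):
    if payer.balance < amount:
        raise InsufficientFunds()
    # New switch point between the funds check and the withdrawal.
    await deposit(payee, amount)
    payer.balance -= amount


async def run_transfers():
    payer, payee = Account(100), Account(0)
    await asyncio.gather(
        transfer(60, payer, payee),
        transfer(60, payer, payee),
    )
    return payer.balance, payee.balance


def demo():
    return asyncio.run(run_transfers())
```

Both transfers pass the funds check before either withdraws, so the payer ends up overdrawn and no InsufficientFunds is raised.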

(As an aside, I'm not actually sure, in the realm of "we had to type yield from and that's how we stay aware of what's going on", why the yield from needs to be a real, structural part of the program and not just, for example, a magic comment consumed by a gevent/eventlet-integrated linter that tests callstacks for IO and verifies that the corresponding source code has been annotated with special comments, as that would have the identical effect without impacting any libraries outside of that system and without incurring all the Python performance overhead of explicit async. But that's a different topic.)
Aside: why does yield from have to be a structural part of the program, rather than, say, a magic comment checked by a gevent/eventlet-integrated linter? (Markjour: I did not follow this part.)

Regardless of style of explicit coroutine, there are two flaws with this approach.
Coding style aside, both flaws are technical ones:

One is that asyncio makes it so easy to type out yield from that the idea that it prevents us from making mistakes loses a lot of its plausibility. A commenter on Hacker News made this great point about the notion of asynchronous code being easier to debug:
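Because asyncio makes yield from so easy to type, it ends up everywhere (coroutines are contagious), and the claim that it prevents mistakes becomes much less plausible.
A Hacker News comment makes this point well, regarding the notion that asynchronous code is easier to debug.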

It's basically, "I want context switches syntactically explicit in my code. If they aren't, reasoning about it is exponentially harder."
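In other words: "I want context switches to be syntactically explicit in my code; otherwise reasoning about it becomes exponentially harder."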
And I think that's pretty clearly a strawman. Everything the author claims about threaded code is true of any re-entrant code, multi-threaded or not. If your function inadvertently calls a function which calls the original function recursively, you have the exact same problem.
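That is a strawman. Everything the author says about threaded code applies to any re-entrant code, multi-threaded or not. If your function inadvertently calls something that recursively calls the original function, you have exactly the same problem.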
But, guess what, that just doesn't happen that often. Most code isn't re-entrant. Most state isn't shared.
But that rarely happens: most code is not re-entrant and most state is not shared.
For code that is concurrent and does interact in interesting ways, you are going to have to reason about it carefully. Smearing "yield from" all over your code doesn't solve that.
Code that really is concurrent has to be reasoned about carefully; scattering yield from through it is no substitute.
In practice, you'll end up with so many "yield from" lines in your code that you're right back to "well, I guess I could context switch just about anywhere", which is the problem you were trying to avoid in the first place.
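In practice there end up being so many yield from lines that a context switch could happen almost anywhere, which is exactly the problem you set out to avoid.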

In my benchmark code, one can see this last point is exactly true. Here's a bit of the threaded version:
The author's benchmark code bears out this last point.

cursor.execute(
    "select id from geo_record where fileid=%s and logrecno=%s",
    (item['fileid'], item['logrecno'])
)
row = cursor.fetchone()
geo_record_id = row[0]

cursor.execute(
    "select d.id, d.index from dictionary_item as d "
    "join matrix as m on d.matrix_id=m.id where m.segment_id=%s "
    "order by m.sortkey, d.index",
    (item['cifsn'],)
)
dictionary_ids = [
    row[0] for row in cursor
]
assert len(dictionary_ids) == len(item['items'])

for dictionary_id, element in zip(dictionary_ids, item['items']):
    cursor.execute(
        "insert into data_element "
        "(geo_record_id, dictionary_item_id, value) "
        "values (%s, %s, %s)",
        (geo_record_id, dictionary_id, element)
    )

Here's a bit of the asyncio version:

yield from cursor.execute(
    "select id from geo_record where fileid=%s and logrecno=%s",
    (item['fileid'], item['logrecno'])
)
row = yield from cursor.fetchone()
geo_record_id = row[0]

yield from cursor.execute(
    "select d.id, d.index from dictionary_item as d "
    "join matrix as m on d.matrix_id=m.id where m.segment_id=%s "
    "order by m.sortkey, d.index",
    (item['cifsn'],)
)
rows = yield from cursor.fetchall()
dictionary_ids = [row[0] for row in rows]

assert len(dictionary_ids) == len(item['items'])

for dictionary_id, element in zip(dictionary_ids, item['items']):
    yield from cursor.execute(
        "insert into data_element "
        "(geo_record_id, dictionary_item_id, value) "
        "values (%s, %s, %s)",
        (geo_record_id, dictionary_id, element)
    )

Notice how they look exactly the same? The fact that yield from is present is not in any way changing the code that I write, or the decisions that I make - this is because in boring database code, we basically need to do the queries that we need to do, in order. I'm not going to try to weave an intelligent, thoughtful system of in-process concurrency into how I call into the database or not, or try to repurpose when I happen to need database data as a means of also locking out other parts of my program; if I need data I'm going to call for it.
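The yield from does not change the code at all, because database work still has to run its queries in order. I am not going to design clever in-process concurrency around my database calls, or use the moment I need data as a way to lock out other parts of the program; when I need data, I ask for it.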

Whether or not that's compelling, it doesn't actually matter - using async or mutexes or whatever inside our program to control concurrency is in fact completely insufficient in any case. Instead, there is of course something we absolutely must always do in real world boring database code in the name of concurrency, and that is:
Either way, controlling concurrency inside the program with async, mutexes, or anything else is not sufficient on its own.

Database Code Handles Concurrency through ACID, Not In-Process Synchronization

That is, correctness comes from the database's own ACID guarantees rather than from synchronization inside the process.

Whether or not we've managed to use threaded code or coroutines with implicit or explicit IO and find all the race conditions that would occur in our process, that matters not at all if the thing we're talking to is a relational database, especially in today's world where everything runs in clustered / horizontal / distributed ways - the handwringing of academic theorists regarding the non-deterministic nature of threads is just the tip of the iceberg; we need to deal with entirely distinct processes, and regardless of what's said, non-determinism is here to stay.

For database code, you have exactly one technique to use in order to assure correct concurrency, and that is by using ACID-oriented constructs and techniques. These unfortunately don't come magically or via any known silver bullet, though there are great tools that are designed to help steer you in the right direction.

All of the example transfer() functions above are incorrect from a database perspective. Here is the correct one:

def transfer(amount, payer, payee, server):
    with transaction.begin():
        if not payer.sufficient_funds_for_withdrawal(amount, lock=True):
            raise InsufficientFunds()
        log("{payer} has sufficient funds.", payer=payer)
        payee.deposit(amount)
        log("{payee} received payment", payee=payee)
        payer.withdraw(amount)
        log("{payer} made payment", payer=payer)
        server.update_balances([payer, payee])

See the difference? Above, we use a transaction. To call upon the SELECT of the payer funds and then modify them using autocommit would be totally wrong. We then must ensure that we retrieve this value using some appropriate system of locking, so that from the time that we read it, to the time that we write it, it is not possible to change the value based on a stale assumption. We'd probably use a SELECT .. FOR UPDATE to lock the row we intend to update. Or, we might use "read committed" isolation in conjunction with a version counter for an optimistic approach, so that our function fails if a race condition occurs. But in no way does the fact that we're using threads, greenlets, or whatever concurrency mechanism in our single process have any impact on what strategy we use here; our concurrency concerns involve the interaction of entirely separate processes.
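The optimistic variant mentioned above can be sketched with the standard-library sqlite3 module. Everything here is an assumption made for illustration: the `account` table, the `version` column, and the helper names are hypothetical, and SQLite is used only so the sketch is self-contained (it has no SELECT .. FOR UPDATE, so the pessimistic variant is not shown). The key step is the UPDATE that matches on the version read earlier, so a write based on a stale read changes no rows and the transaction is rolled back.

```python
import sqlite3


class InsufficientFunds(Exception):
    pass


class StaleRead(Exception):
    pass


def make_db():
    # Hypothetical schema: one row per account plus a version counter.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table account ("
        "id integer primary key, balance integer not null, "
        "version integer not null)"
    )
    conn.executemany(
        "insert into account (id, balance, version) values (?, ?, 0)",
        [(1, 100), (2, 0)],
    )
    conn.commit()
    return conn


def read_account(conn, account_id):
    return conn.execute(
        "select balance, version from account where id = ?", (account_id,)
    ).fetchone()


def apply_transfer(conn, amount, payer_id, payee_id, payer_balance, payer_version):
    if payer_balance < amount:
        raise InsufficientFunds()
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "update account set balance = balance - ?, version = version + 1 "
            "where id = ? and version = ?",
            (amount, payer_id, payer_version),
        )
        if cur.rowcount != 1:
            # Someone else changed the payer row since we read it.
            raise StaleRead()
        conn.execute(
            "update account set balance = balance + ?, version = version + 1 "
            "where id = ?",
            (amount, payee_id),
        )


def demo():
    conn = make_db()
    balance, version = read_account(conn, 1)
    apply_transfer(conn, 60, 1, 2, balance, version)
    try:
        # Reuse the old read, as a racing process would.
        apply_transfer(conn, 30, 1, 2, balance, version)
        stale_rejected = False
    except StaleRead:
        stale_rejected = True
    return read_account(conn, 1), read_account(conn, 2), stale_rejected
```

The second call works from the balance and version read before the first transfer, which is what a competing process would hold; the version check rejects it regardless of whether that competitor is a thread, a coroutine, or a separate machine.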

Sum up!

Please note I am not trying to make the point that you shouldn't use asyncio. I think it's really well done, it's fun to use, and I still am interested in having more of a SQLAlchemy story for it, because I'm sure folks will still want this no matter what anyone says.
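To be clear, I am not saying you should not use asyncio; I think it is great.
I am still interested in providing asyncio support for SQLAlchemy, because people will want it regardless.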

My point is that when it comes to stereotypical database logic, there are no advantages to using it versus a traditional threaded approach, and you can likely expect a small to moderate decrease in performance, not an increase. This is well known to many of my colleagues (1, 2), but recently I've had to argue this point nonetheless.
Put plainly: for typical database work asyncio has no advantage over threads, and performance is likely to drop somewhat.
My colleagues know this, but I have recently had to argue the point anyway.

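Markjour: the author embeds one of his tweets and two replies here; the gist is that he could not show experimentally that asyncio or gevent outperforms threads.
Colleague A's reply: this should be told to the people who think async should be used everywhere for the sake of performance.
Colleague B's reply: in async systems, performance means throughput, not latency.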

An ideal integration situation if one wants to have the advantages of non-blocking IO for receiving web requests without needing to turn their business logic into explicit async is a simple combination of nginx with uWSGI, for example.
In other words, nginx handles the many slow client connections with non-blocking IO, while the application behind uWSGI stays ordinary blocking code and needs no async framework.