#117 Python UCS-Related Issues
Python character encoding · 2023-12-08
In older projects you can sometimes see the error: unichr() arg not in range(0x10000) (narrow Python build)
After some digging, this turns out to be related to Python's internal Unicode representation.
https://lp.jetbrains.com/zh-cn/python-developers-survey-2022/
59% are fully employed, 65% of them work as developers, most are 20–40 years old, and the majority work in teams of fewer than 10 people.
PS: Linux usage is down 4% compared to last year.
Around 95% of respondents are on Python 3. Environment-management tool usage breaks down as:
Name | Percent | Remark
---|---|---
venv | 43% | 
virtualenv | 37% | 
conda | 21% | environment.yml
poetry | 16% | poetry
pipenv | 14% | pipfile
virtualenvwrapper | 6% | 
hatch | 3% | 
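As a refresher on the most popular option above: `venv` ships with Python itself, so a typical workflow needs nothing extra installed (paths here are illustrative):

```shell
# Create an isolated environment in .venv using the stdlib venv module
python3 -m venv .venv

# Activate it (bash/zsh; on Windows: .venv\Scripts\activate)
. .venv/bin/activate

# pip now installs into .venv instead of the system site-packages,
# e.g.: pip install requests
python -c "import sys; print(sys.prefix)"
```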
Sometimes a program cannot avoid shelling out to system commands for certain tasks.
For example, one of our internal features relies on rsync for file synchronization.
Recently we noticed that with slightly larger data volumes, rsync would simply hang, for no obvious reason.
After investigation, the cause turned out to be incorrect use of Popen from the subprocess module.
```shell
mkdir /tmp/aa /tmp/bb
# Create 10,000 files of 10K ~ 1000K each
for i in {00001..10000}; do
    file_size=$((1 + $RANDOM % 100))
    dd if=/dev/urandom of=/tmp/aa/file$i.txt bs=10K count=$file_size
done
# du -sh /tmp/aa/
# 4.9G /tmp/aa/
```
```python
import logging
from subprocess import PIPE, STDOUT, Popen

LOG = logging.getLogger(__name__)

src_dir = '/tmp/aa/'
tgt_dir = '/tmp/bb/'
# --remove-source-files
command = 'rsync -av %s %s' % (src_dir, tgt_dir)
p = Popen(command, stdin=PIPE, stdout=PIPE, stderr=STDOUT, shell=True)
p.wait()
if p.returncode == 0:
    LOG.info('rsync success')
else:
    LOG.warning('rsync error %d', p.returncode)
```
The transfer got stuck at file0670.txt, after moving 2.3G in total.
After more digging, we confirmed the bug was in our own code.
The code captures stdout and stderr, but never reads them; eventually the pipe buffer fills up, the child blocks on write, and nothing makes progress.
Popen has an initialization parameter pipesize (Python 3.10+); if set, it calls fcntl.fcntl(p2cwrite, fcntl.F_SETPIPE_SZ, self.pipesize) to resize the pipe buffer.
From `man fcntl`:
Changing the capacity of a pipe
F_SETPIPE_SZ (int; since Linux 2.6.35)
Change the capacity of the pipe referred to by fd to be at least arg bytes. An unprivileged process can adjust the pipe capacity to any value between the system page size and the limit defined in /proc/sys/fs/pipe-max-size (see proc(5)). Attempts to set the pipe capacity below the page size are silently rounded up to the page size. Attempts by an unprivileged process to set the pipe capacity above the limit in /proc/sys/fs/pipe-max-size yield the error EPERM; a privileged process (CAP_SYS_RESOURCE) can override the limit.
When allocating the buffer for the pipe, the kernel may use a capacity larger than arg, if that is convenient for the implementation. (In the current implementation, the allocation is the next higher power-of-two page-size multiple of the requested size.) The actual capacity (in bytes) that is set is returned as the function result.
Attempting to set the pipe capacity smaller than the amount of buffer space currently used to store data produces the error EBUSY.
Note that because of the way the pages of the pipe buffer are employed when data is written to the pipe, the number of bytes that can be written may be less than the nominal size, depending on the size of the writes.
F_GETPIPE_SZ (void; since Linux 2.6.35)
Return (as the function result) the capacity of the pipe referred to by fd.
And from `man 7 pipe | grep size -C10`:
Pipe capacity
A pipe has a limited capacity. If the pipe is full, then a write(2) will block or fail, depending on whether the O_NONBLOCK flag is set (see below). Different implementations have different limits for the pipe capacity.
Applications should not rely on a particular capacity: an application should be designed so that a reading process consumes data as soon as it is available, so that a writing process does not remain blocked.
In Linux versions before 2.6.11, the capacity of a pipe was the same as the system page size (e.g., 4096 bytes on i386). Since Linux 2.6.11, the pipe capacity is 16 pages (i.e., 65,536 bytes in a system with a page size of 4096 bytes). Since Linux 2.6.35, the default pipe capacity is 16 pages, but the capacity can be queried and set using the fcntl(2) F_GETPIPE_SZ and F_SETPIPE_SZ operations. See fcntl(2) for more information.
The following ioctl(2) operation, which can be applied to a file descriptor that refers to either end of a pipe, places a count of the number of unread bytes in the pipe in the int buffer pointed to by the final argument of the call:
ioctl(fd, FIONREAD, &nbytes);
The FIONREAD operation is not specified in any standard, but is provided on many implementations.
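The FIONREAD operation above can be tried from Python as well; a small sketch (on Linux, the constant lives in the `termios` module):

```python
import array
import fcntl
import os
import termios  # FIONREAD is exposed via the termios module

r, w = os.pipe()
os.write(w, b"hello")

# Ask the kernel how many unread bytes are sitting in the pipe;
# ioctl writes the answer into the mutable buffer.
buf = array.array("i", [0])
fcntl.ioctl(r, termios.FIONREAD, buf)
print(buf[0])  # 5 bytes ("hello") are waiting to be read

os.close(r)
os.close(w)
```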
In other words: the upper limit of pipe capacity is controlled by /proc/sys/fs/pipe-max-size, and whether a write to a full pipe blocks or fails depends on the O_NONBLOCK flag.

```shell
$ getconf PAGE_SIZE
4096
$ getconf PAGESIZE
4096
```

So the default pipe buffer on this system should be 16 x 4K = 64K.
```python
import subprocess

# 64 KiB fills the pipe buffer exactly; the extra "." pushes it over the edge.
cmd = ['python', '-c', 'print("a" * 1024 * 64, end=".")']
# cmd = ['python', '-c', 'import time; time.sleep(10);']
print(1)
p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
print(2)
# stdout, stderr = p.communicate()
# print(repr([stdout, stderr]))
print(3)
p.wait()
```
The child runs a python command: writing exactly 64 KB does not block, but one more byte and the parent hangs at wait().
Uncomment the communicate() line and the program completes normally.
Several ways to fix it:

```python
import os
import subprocess
from subprocess import PIPE, STDOUT, Popen

# 1. Don't capture output at all; it goes to the parent's stdout/stderr.
Popen(command, shell=True)

# 2. Discard the output.
Popen(command, stdout=subprocess.DEVNULL, shell=True)

# Python 2 has no subprocess.DEVNULL; open os.devnull manually.
devnull = os.open(os.devnull, os.O_RDWR)
Popen(command, stdout=devnull, shell=True)
os.close(devnull)  # os.open returns a raw fd, so close it with os.close

# 3. Actually read the output.
p = Popen(command, stdout=PIPE, stderr=STDOUT, shell=True)
# Blocks while continuously draining the child's stdout.
p.communicate()

# 4. Use the higher-level subprocess.run (Python 3); it drains the pipes for you.
result = subprocess.run(command, shell=True, capture_output=True)
print(result.returncode)
print(result.stdout)
print(result.stderr)
```
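Another robust pattern, if you need the output incrementally (e.g. for progress logging), is to keep draining stdout in a loop so the pipe can never fill up. A sketch (the 128 KiB of output here is just for demonstration):

```python
import subprocess
import sys

# The child produces well over 64 KiB, which would deadlock p.wait() if unread.
p = subprocess.Popen(
    [sys.executable, "-c", 'print("a" * 1024 * 128)'],
    stdout=subprocess.PIPE,
)

total = 0
for line in p.stdout:  # read as the data arrives; the pipe never fills up
    total += len(line)

p.wait()
print(total, p.returncode)
```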
Anyone working with Python knows about the built-in HTTP server, which often comes in handy for quick one-offs.
Today I learned from https://til.simonwillison.net/python/stdlib-cli-tools that Python actually ships with quite a few small command-line tools; worth a look, they may come in handy.
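A few of the built-in tools mentioned there (availability of some varies by Python version):

```shell
# Serve the current directory over HTTP
python3 -m http.server 8000
# Pretty-print JSON from stdin
echo '{"a": 1}' | python3 -m json.tool
# Print a calendar for a whole year
python3 -m calendar 2024
```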
Python's GIL (Global Interpreter Lock) makes multithreaded Python development simpler and avoids most race conditions, but it prevents a Python process from using multiple CPU cores, limiting every workload that needs true parallelism.
The official advice: if you need multiple cores, use multiprocessing.
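For CPU-bound work, that advice looks like this minimal sketch: each worker runs in its own process with its own interpreter (and its own GIL).

```python
from multiprocessing import Pool

def square(n):
    # CPU-bound work runs in a separate process, so the parent
    # interpreter's GIL does not serialize these calls.
    return n * n

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        print(pool.map(square, range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
```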
When Python first appeared, multi-core CPUs did not exist; today they are the norm, with 4 or 8 cores everywhere.
So there has never been a shortage of calls to remove the GIL, but none has succeeded: doing it without badly breaking existing projects is next to impossible.
PS: A few years ago a project called Gilectomy (GIL-ectomy, surgical removal of the GIL) failed because it degraded Python's performance too much.
The biggest challenge is staying compatible with existing code and preserving Python's core appeal: easy to pick up, easy to develop in.
In 2021, Shannon Plan (Faster CPython) member Eric Snow submitted PEP 684, A Per-Interpreter GIL.
It looks like a fairly safe approach, though I doubt whether the cost/benefit works out.
PS: It does not conflict with the nogil approach mentioned below.
PEP 683 – Immortal Objects, Using a Fixed Refcount
PEP 554 – Multiple Interpreters in the Stdlib
In 2021, Meta (Facebook) developer Sam Gross created a nogil fork based on Python 3.9, which proved the approach technically feasible: unlike Gilectomy, single-core performance is barely affected, and scalability is good.
This is the closest we have ever been to removing the GIL.
If the proposal passes, then in a few years we will be able to use a GIL-free Python, built with the --disable-gil configure flag.

A while ago at $work, we had a performance issue with one of our core Python libraries.
This particular library forms the backbone of our 3D processing pipeline. It’s a rather big and complex library which uses NumPy and other scientific Python packages to do a wide range of mathematical and geometrical operations.
Our system also has to work on-prem with limited CPU resources, and while at first it performed well, as the number of concurrent physical users grew we started running into problems and our system struggled to keep up with the load.
We came to the conclusion that we had to make our system at least 50 times faster to handle the increased workload, and we figured that Rust could help us achieve that.
Because the performance problems we encountered are pretty common, we can recreate & solve them right here, in a (not-so-short) article.
So grab a cup of tea (or coffee) and I’ll walk you through (a) the basic underlying problem and (b) a few iterations of optimizations we can apply to solve this problem.
If you want to jump straight to the final code, just go to the summary.
Let’s create a small library, which will exhibit our original performance issues (but does completely arbitrary work).
Imagine you have a list of polygons and a list of points, all in 2D. For business reasons, we want to “match” each point to a single polygon.
In code, our imaginary library is going to look like this (the full code can be found here):
```python
from typing import List, Tuple
import numpy as np
from dataclasses import dataclass
from functools import cached_property

Point = np.array


@dataclass
class Polygon:
    x: np.array
    y: np.array

    @cached_property
    def center(self) -> Point: ...

    def area(self) -> float: ...


def find_close_polygons(polygon_subset: List[Polygon], point: Point, max_dist: float) -> List[Polygon]:
    ...


def select_best_polygon(polygon_sets: List[Tuple[Point, List[Polygon]]]) -> List[Tuple[Point, Polygon]]:
    ...


def main(polygons: List[Polygon], points: np.ndarray) -> List[Tuple[Point, Polygon]]:
    ...
```
The key difficulty (performance wise) is this mix of Python objects and numpy arrays.
We are going to analyze this in depth in a minute.
It’s worth noting that converting parts of / everything to vectorized numpy might be possible for this toy library, but will be nearly impossible for the real library while making the code much less readable and modifiable, and the gains are going to be limited (here’s a partially vertorized version, which is faster but far from the results we are going to achieve).
Also, using any JIT-based tricks (PyPy / numba) results in very small gains (as we will measure, just to make sure).
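For reference, a partial vectorization of the hot loop might look roughly like this (a hypothetical sketch, not the article's actual partially-vectorized version):

```python
import numpy as np

def find_close_polygons_vectorized(polygons, point, max_dist):
    # Stack all centers into one (N, 2) array and compute every
    # distance in a single numpy call instead of N small ones.
    centers = np.array([poly.center for poly in polygons])
    dists = np.linalg.norm(centers - point, axis=1)
    return [poly for poly, d in zip(polygons, dists) if d < max_dist]
```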
As compelling as a complete rewrite was, it had a few problems:
It is time to introduce our friend the profiler.
Python has a built-in profiler (`cProfile`), but in this case it’s not really the right tool for the job.
We are going to use `py-spy` (GitHub). `py-spy` is a sampling profiler which can see into native frames.
They also mercifully publish pre-built wheels to PyPI, so we can just `pip install py-spy` and get to work.
We also need something to measure.
```python
# measure.py
import time
import poly_match
import os

# Reduce noise, actually improve perf in our case.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

polygons, points = poly_match.generate_example()

# We are going to increase this as the code gets faster and faster.
NUM_ITER = 10

t0 = time.perf_counter()
for _ in range(NUM_ITER):
    poly_match.main(polygons, points)
t1 = time.perf_counter()

took = (t1 - t0) / NUM_ITER
print(f"Took an avg of {took * 1000:.2f}ms per iteration")
```
It’s not very scientific, but it’s going to take us very far.
> “Good benchmarking is hard. Having said that, do not stress too much about having a perfect benchmarking setup, particularly when you start optimizing a program.”
>
> ~ Nicholas Nethercote, in “The Rust Performance Book”
Running this script will give us our baseline:
```shell
$ python measure.py
Took an avg of 293.41ms per iteration
```
For the original library, we used 50 different examples to make sure all cases are covered.
This matched the overall system perf, meaning we can start working on crushing this number.
Side note: We can also measure using PyPy (we’ll also add a warmup to allow the JIT to do its magic).
```shell
$ conda create -n pypyenv -c conda-forge pypy numpy && conda activate pypyenv
$ pypy measure_with_warmup.py
Took an avg of 1495.81ms per iteration
```
So, let’s find out what is so slow here.
```shell
$ py-spy record --native -o profile.svg -- python measure.py
py-spy> Sampling process 100 times a second. Press Control-C to exit.

Took an avg of 365.43ms per iteration

py-spy> Stopped sampling because process exited
py-spy> Wrote flamegraph data to 'profile.svg'. Samples: 391 Errors: 0
```
Already, we can see that the overhead is pretty small. Just for comparison, using `cProfile` we get this:

```shell
$ python -m cProfile measure.py
Took an avg of 546.47ms per iteration
         7551778 function calls (7409483 primitive calls) in 7.806 seconds
...
```
We get this nice, reddish graph called a flamegraph:
Each box is a function, and we can see the relative time we spend in each function, including the functions it is calling (going down the graph/stack). Try clicking on the `norm` box to zoom into it.
Here, the main takeaways are:

- The vast majority of time is spent in `find_close_polygons`.
- Most of that is spent doing `norm`, which is a numpy function.

So, let’s have a look at `find_close_polygons`:
```python
def find_close_polygons(
    polygon_subset: List[Polygon], point: np.array, max_dist: float
) -> List[Polygon]:
    close_polygons = []
    for poly in polygon_subset:
        if np.linalg.norm(poly.center - point) < max_dist:
            close_polygons.append(poly)

    return close_polygons
```
We are going to rewrite this function in Rust.
Before diving into the details, it’s important to notice a few things here: the function is small and self-contained, but it mixes Python objects and numpy arrays (`Polygon`, `np.array`).

`pyo3` is a crate for interacting between Python and Rust. It has exceptionally good documentation, and they explain the basic setup here.
We are going to call our crate `poly_match_rs`, and add a function called `find_close_polygons`.
```shell
mkdir poly_match_rs && cd "$_"
pip install maturin
maturin init --bindings pyo3
maturin develop
```
Starting out, our crate is going to look like this:
```rust
use pyo3::prelude::*;

#[pyfunction]
fn find_close_polygons() -> PyResult<()> {
    Ok(())
}

#[pymodule]
fn poly_match_rs(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(find_close_polygons, m)?)?;
    Ok(())
}
```
We also need to remember to execute `maturin develop` every time we change the Rust library.
And that’s it! Let’s call our new function and see what happens.
```python
>>> poly_match_rs.find_close_polygons(polygons, point, max_dist)
E TypeError: poly_match_rs.poly_match_rs.find_close_polygons() takes no arguments (3 given)
```
We’ll start with matching the expected API.
PyO3 is pretty smart about Python to Rust conversions, so that’s going to be pretty easy:
```rust
#[pyfunction]
fn find_close_polygons(polygons: Vec<PyObject>, point: PyObject, max_dist: f64) -> PyResult<Vec<PyObject>> {
    Ok(vec![])
}
```
`PyObject` is (as the name suggests) a generic “anything goes” Python object. We’ll try to interact with it in a bit.
This should make the program run (albeit incorrectly).
I’m going to just copy and paste the original Python function, and fix the syntax.
```rust
#[pyfunction]
fn find_close_polygons(polygons: Vec<PyObject>, point: PyObject, max_dist: f64) -> PyResult<Vec<PyObject>> {
    let mut close_polygons = vec![];

    for poly in polygons {
        if norm(poly.center - point) < max_dist {
            close_polygons.push(poly)
        }
    }

    Ok(close_polygons)
}
```
Cool, but this won’t compile:
```shell
% maturin develop
...
error[E0609]: no field `center` on type `Py<PyAny>`
 --> src/lib.rs:8:22
  |
8 |         if norm(poly.center - point) < max_dist {
  |                      ^^^^^^ unknown field

error[E0425]: cannot find function `norm` in this scope
 --> src/lib.rs:8:12
  |
8 |         if norm(poly.center - point) < max_dist {
  |            ^^^^ not found in this scope

error: aborting due to 2 previous errors
```
We need three crates to implement our function:

```toml
# For Rust-native array operations.
ndarray = "0.15"

# For a `norm` function for arrays.
ndarray-linalg = "0.16"

# For accessing numpy-created objects, based on `ndarray`.
numpy = "0.18"
```
First, let’s turn the opaque and generic `point: PyObject` into something we can work with.
Just like we asked PyO3 for a “`Vec` of `PyObject`s”, we can ask for a numpy array, and it’ll auto-convert the argument for us.
```rust
use numpy::PyReadonlyArray1;

#[pyfunction]
fn find_close_polygons(
    // An object which says "I have the GIL", so we can access Python-managed memory.
    py: Python<'_>,
    polygons: Vec<PyObject>,
    // A reference to a numpy array we will be able to access.
    point: PyReadonlyArray1<f64>,
    max_dist: f64,
) -> PyResult<Vec<PyObject>> {
    // Convert to `ndarray::ArrayView1`, a fully operational native array.
    let point = point.as_array();
    ...
}
```
Because `point` is now an `ArrayView1`, we can actually use it. For example:
```rust
// Make the `norm` function available.
use ndarray_linalg::Norm;

assert_eq!((point.to_owned() - point).norm(), 0.);
```
Now we just need to get the center of each polygon, and “cast” it to an `ArrayView1`.
In PyO3, this looks like this:
```rust
let center = poly
    .getattr(py, "center")?                 // Python-style getattr, requires a GIL token (`py`).
    .extract::<PyReadonlyArray1<f64>>(py)?  // Tell PyO3 what to convert the result to.
    .as_array()                             // Like `point` before.
    .to_owned();                            // We need one of the sides of the `-` to be "owned".
```
It’s a bit of a mouthful, but overall the result is a pretty clear line-to-line translation of the original code:
```rust
 1  use pyo3::prelude::*;
 2
 3  use ndarray_linalg::Norm;
 4  use numpy::PyReadonlyArray1;
 5
 6  #[pyfunction]
 7  fn find_close_polygons(
 8      py: Python<'_>,
 9      polygons: Vec<PyObject>,
10      point: PyReadonlyArray1<f64>,
11      max_dist: f64,
12  ) -> PyResult<Vec<PyObject>> {
13      let mut close_polygons = vec![];
14      let point = point.as_array();
15      for poly in polygons {
16          let center = poly
17              .getattr(py, "center")?
18              .extract::<PyReadonlyArray1<f64>>(py)?
19              .as_array()
20              .to_owned();
21
22          if (center - point).norm() < max_dist {
23              close_polygons.push(poly)
24          }
25      }
26
27      Ok(close_polygons)
28  }
```
vs the original:
```python
def find_close_polygons(
    polygon_subset: List[Polygon], point: np.array, max_dist: float
) -> List[Polygon]:
    close_polygons = []
    for poly in polygon_subset:
        if np.linalg.norm(poly.center - point) < max_dist:
            close_polygons.append(poly)

    return close_polygons
```
We expect this version to have some advantage over the original function, but how much?
```shell
$ (cd ./poly_match_rs/ && maturin develop)
$ python measure.py
Took an avg of 609.46ms per iteration
```
So.. Is Rust just super slow? No! We just forgot to ask for speed! If we run with `maturin develop --release` we get much better results:
```shell
$ (cd ./poly_match_rs/ && maturin develop --release)
$ python measure.py
Took an avg of 23.44ms per iteration
```
Now that is a nice speedup!
We also want to see into our native code, so we are going to enable debug symbols in release. While we are at it, we might as well ask for maximum speed.
```toml
# added to Cargo.toml
[profile.release]
debug = true       # Debug symbols for our profiler.
lto = true         # Link-time optimization.
codegen-units = 1  # Slower compilation but faster code.
```
Now, using the `--native` flag in `py-spy` is going to show us both Python and our new native code.
Running `py-spy` again:

```shell
$ py-spy record --native -o profile.svg -- python measure.py
py-spy> Sampling process 100 times a second. Press Control-C to exit.
```

we get this flamegraph (non-red colors are added so we can refer to them):
Looking at the profiler output, we can see a few interesting things:

- We can see `find_close_polygons::...::trampoline` (the symbol Python directly calls) and `__pyfunction_find_close_polygons` (our actual implementation).
- The actual work (`if (center - point).norm() < max_dist { ... }`), which is `lib_v1.rs:22` (very small box on the right), is about 9% of the total runtime.
- Most of the time is spent in `lib_v1.rs:16`, which is `poly.getattr(...).extract(...)`, and if we zoom in we can see it’s really just `getattr` and getting the underlying array using `as_array`.

The conclusion here is that we need to focus on solving the 3rd point, and the way to do that is to rewrite `Polygon` in Rust.
Let’s look at our target:
```python
@dataclass
class Polygon:
    x: np.array
    y: np.array
    _area: float = None

    @cached_property
    def center(self) -> np.array:
        centroid = np.array([self.x, self.y]).mean(axis=1)
        return centroid

    def area(self) -> float:
        if self._area is None:
            self._area = 0.5 * np.abs(
                np.dot(self.x, np.roll(self.y, 1)) - np.dot(self.y, np.roll(self.x, 1))
            )
        return self._area
```
We’ll want to keep the existing API as much as possible, but we don’t really need area
to be that fast (for now).
The actual class might have additional complex stuff, like a merge
method which uses ConvexHull
from scipy.spatial
.
To cut costs (and limit the scope of this already long article), we will only move the “core” functionality of Polygon
to Rust, and subclass that from Python to implement the rest of the API.
Our `struct` is going to look like this:

```rust
// `Array1` is a 1d array, and the `numpy` crate will play nicely with it.
use ndarray::Array1;

// `subclass` tells PyO3 to allow subclassing this in Python.
#[pyclass(subclass)]
struct Polygon {
    x: Array1<f64>,
    y: Array1<f64>,
    center: Array1<f64>,
}
```
Now we need to actually implement it. We want to expose `poly.{x, y, center}` as read-only numpy attributes.
We also need a constructor so Python can create new `Polygon`s.
```rust
use numpy::{PyArray1, PyReadonlyArray1, ToPyArray};

#[pymethods]
impl Polygon {
    #[new]
    fn new(x: PyReadonlyArray1<f64>, y: PyReadonlyArray1<f64>) -> Polygon {
        let x = x.as_array();
        let y = y.as_array();
        let center = Array1::from_vec(vec![x.mean().unwrap(), y.mean().unwrap()]);

        Polygon {
            x: x.to_owned(),
            y: y.to_owned(),
            center,
        }
    }

    // The `Py<..>` in the return type is a way of saying "an Object owned by Python".
    #[getter]
    fn x(&self, py: Python<'_>) -> PyResult<Py<PyArray1<f64>>> {
        Ok(self.x.to_pyarray(py).to_owned()) // Create a Python-owned, numpy version of `x`.
    }

    // Same for `y` and `center`.
}
```
We need to add our new struct as a class to the module:
```rust
#[pymodule]
fn poly_match_rs(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<Polygon>()?; // new.
    m.add_function(wrap_pyfunction!(find_close_polygons, m)?)?;
    Ok(())
}
```
And now we can update the Python code to use it:
```python
class Polygon(poly_match_rs.Polygon):
    _area: float = None

    def area(self) -> float:
        ...
```
We can compile it and it’ll actually work, but it’ll be much slower! (Remember that `x`, `y`, and `center` will now need to create a new numpy array on each access.)
To actually improve performance, we need to `extract` our original Rust-based `Polygon` from the list of Python-`Polygon`s.
PyO3 is very flexible with this type of operation, so there are a few ways we could do it. One limit we have is that we also need to return Python-`Polygon`s, and we don’t want to do any cloning of the actual data.
It’s possible to manually call `.extract::<Polygon>(py)?` on each `PyObject`, but instead we can ask PyO3 to give us `Py<Polygon>` directly.
This is a reference to a Python-owned object, which we expect to contain an instance (or a subclass, in our case) of a native `pyclass` struct.
```rust
45  #[pyfunction]
46  fn find_close_polygons(
47      py: Python<'_>,
48      polygons: Vec<Py<Polygon>>,   // References to Python-owned objects.
49      point: PyReadonlyArray1<f64>,
50      max_dist: f64,
51  ) -> PyResult<Vec<Py<Polygon>>> { // Return the same `Py` references, unmodified.
52      let mut close_polygons = vec![];
53      let point = point.as_array();
54      for poly in polygons {
55          let center = poly.borrow(py).center // Need to use the GIL (`py`) to borrow the underlying `Polygon`.
56              .to_owned();
57
58          if (center - point).norm() < max_dist {
59              close_polygons.push(poly)
60          }
61      }
62
63      Ok(close_polygons)
64  }
```
Let’s see what we get using this code:
```shell
$ python measure.py
Took an avg of 6.29ms per iteration
```
We are nearly there! Just x2 to go!
Let’s fire up the profiler one more time.
select_best_polygon
, which now calls some Rust code (when it gets the x
& y
vectors)extract_argument
(under lib_v2.rs:48
), so we are still paying quite a lot on overhead!PyIterator::next
and PyTypeInfo::is_type_of
, which aren’t easy to fix.lib_v2.rs:58
is our if
, and we see drop_in_place
and to_owned
.Let’s tackle the last point.
This is our problematic snippet:

```rust
let center = poly.borrow(py).center
    .to_owned();

if (center - point).norm() < max_dist { ... }
```
What we want is to avoid that `to_owned`. But we need an owned object for `norm`, so we’ll have to implement that manually.
(The reason we can improve on `ndarray` here is that we know that our array is actually just 2 `f64`s.)
This would look like this:

```rust
use ndarray_linalg::Scalar;

let center = &poly.as_ref(py).borrow().center;

if ((center[0] - point[0]).square() + (center[1] - point[1]).square()).sqrt() < max_dist {
    close_polygons.push(poly)
}
```
But, alas, the borrow checker is unhappy with us:

```shell
error[E0505]: cannot move out of `poly` because it is borrowed
  --> src/lib.rs:58:33
   |
55 |         let center = &poly.as_ref(py).borrow().center;
   |                       ------------------------
   |                       |
   |                       borrow of `poly` occurs here
   |                       a temporary with access to the borrow is created here ...
...
58 |             close_polygons.push(poly);
   |                                 ^^^^ move out of `poly` occurs here
59 |         }
60 |     }
   |     - ... and the borrow might be used here, when that temporary is dropped and runs the `Drop` code for type `PyRef`
```
As usual, the borrow checker is correct: we are doing memory crimes.
The simpler fix is to Just Clone, and `close_polygons.push(poly.clone())` compiles.
This is actually a very cheap clone, because we only `incr` the reference count of the Python object.
However, in this case we can also shorten the borrow by doing a classic Rust trick:
```rust
let norm = {
    let center = &poly.as_ref(py).borrow().center;

    ((center[0] - point[0]).square() + (center[1] - point[1]).square()).sqrt()
};

if norm < max_dist {
    close_polygons.push(poly)
}
```
Because `poly` is only borrowed in the inner scope, once we reach `close_polygons.push` the compiler knows that we no longer hold that reference, and will happily compile the new version.
And finally, we have
```shell
$ python measure.py
Took an avg of 2.90ms per iteration
```
Which is a 100x improvement over the original code.
We started out with this Python code:
```python
@dataclass
class Polygon:
    x: np.array
    y: np.array
    _area: float = None

    @cached_property
    def center(self) -> np.array:
        centroid = np.array([self.x, self.y]).mean(axis=1)
        return centroid

    def area(self) -> float:
        ...


def find_close_polygons(
    polygon_subset: List[Polygon], point: np.array, max_dist: float
) -> List[Polygon]:
    close_polygons = []
    for poly in polygon_subset:
        if np.linalg.norm(poly.center - point) < max_dist:
            close_polygons.append(poly)

    return close_polygons


# Rest of file (main, select_best_polygon).
```
We profiled it using `py-spy`, and even our most naive, line-to-line translation of `find_close_polygons` resulted in more than a 10x improvement.
We did a few profile-rewrite-measure iterations until we finally gained a 100x improvement in runtime, while keeping the same API as the original library.
Version | Avg time per iteration (ms) | Multiplier
---|---|---
Baseline implementation (Python) | 293.41 | 1x
Naive line-to-line Rust translation of `find_close_polygons` | 23.44 | 12.50x
`Polygon` implementation in Rust | 6.29 | 46.53x
Optimized allocation implementation in Rust | 2.90 | 101.16x
The final Python code looks like this:

```python
import poly_match_rs
from poly_match_rs import find_close_polygons

class Polygon(poly_match_rs.Polygon):
    _area: float = None

    def area(self) -> float:
        ...

# Rest of file unchanged (main, select_best_polygon).
```
which calls this Rust code:
```rust
use pyo3::prelude::*;

use ndarray::Array1;
use ndarray_linalg::Scalar;
use numpy::{PyArray1, PyReadonlyArray1, ToPyArray};

#[pyclass(subclass)]
struct Polygon {
    x: Array1<f64>,
    y: Array1<f64>,
    center: Array1<f64>,
}

#[pymethods]
impl Polygon {
    #[new]
    fn new(x: PyReadonlyArray1<f64>, y: PyReadonlyArray1<f64>) -> Polygon {
        let x = x.as_array();
        let y = y.as_array();
        let center = Array1::from_vec(vec![x.mean().unwrap(), y.mean().unwrap()]);

        Polygon {
            x: x.to_owned(),
            y: y.to_owned(),
            center,
        }
    }

    #[getter]
    fn x(&self, py: Python<'_>) -> PyResult<Py<PyArray1<f64>>> {
        Ok(self.x.to_pyarray(py).to_owned())
    }

    // Same for `y` and `center`.
}

#[pyfunction]
fn find_close_polygons(
    py: Python<'_>,
    polygons: Vec<Py<Polygon>>,
    point: PyReadonlyArray1<f64>,
    max_dist: f64,
) -> PyResult<Vec<Py<Polygon>>> {
    let mut close_polygons = vec![];
    let point = point.as_array();
    for poly in polygons {
        let norm = {
            let center = &poly.as_ref(py).borrow().center;

            ((center[0] - point[0]).square() + (center[1] - point[1]).square()).sqrt()
        };

        if norm < max_dist {
            close_polygons.push(poly)
        }
    }

    Ok(close_polygons)
}

#[pymodule]
fn poly_match_rs(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<Polygon>()?;
    m.add_function(wrap_pyfunction!(find_close_polygons, m)?)?;
    Ok(())
}
```
Rust (with the help of pyo3) unlocks true native performance for everyday Python code, with minimal compromises.
Python is a superb API for researchers, and crafting fast building blocks with Rust is an extremely powerful combination.
Profiling is super interesting, and it pushes you to truly understand everything that’s happening in your code.
And finally: computers are crazy fast. The next time you wait for something to complete, consider firing up a profiler, you might learn something new 🚀
Today I learned about this library, built to make demos easy: you configure the UI (mainly inputs and outputs) in Python, the input values are passed to your handler function, and its return value is shown in the output.
PS: During installation you can see the library itself is about 14 MB, and it pulls in quite a few dependencies.
Here is just a simple example:
```python
import gradio as gr

def greet(name):
    return "Hello " + name + "!"

# demo = gr.Interface(fn=greet, inputs="text", outputs="text")
demo = gr.Interface(
    fn=greet,
    inputs=gr.Textbox(lines=2, placeholder="Name Here..."),
    outputs="text",
)

demo.launch()
```
```shell
$ python gradioTest.py
Running on local URL: http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
```
I’ll dig into it more when I actually need it.
Running a Python script under Cygwin, `datetime.now()` surprisingly returned UTC time.
The `date` command behaved the same way.
Running `tzselect` and choosing Asia, then China, then Beijing Time fixed it.
The command even hints that to make it permanent, you need to add `TZ='Asia/Shanghai'` to `~/.profile`.
The strange part is that TZ was already set:

```shell
$ export | grep TZ
declare -x TZ="Asia/Shanghai"
```
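The same TZ variable drives Python directly on Unix-like systems; a quick sketch of checking what the interpreter actually sees:

```python
import os
import time

os.environ["TZ"] = "Asia/Shanghai"
time.tzset()  # Unix-only: re-read TZ from the environment

# time.timezone is the offset west of UTC in seconds; UTC+8 gives -28800.
print(time.timezone)
```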
Today I hit a big Python 2.7 gotcha: on some point releases, json.loads() parses a JSON string into str, while on others it returns unicode.
```python
Python 2.7 (r27:82500, Apr 12 2016, 21:09:36)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import json; print(repr(json.loads('"a"')));
'a'

Python 2.7.15 (default, Jul 22 2019, 17:38:55)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-17)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import json; json.loads('"s"'); exit();
's'

Python 2.7.15 (default, Jun 26 2018, 11:17:18)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-17)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import json; json.loads('"s"'); exit();
u's'

Python 2.7.18 (default, Mar 23 2022, 15:07:54)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import json; json.loads('"s"'); exit();
u's'
```
I found a related issue, "json.loads() on str should return unicode, not str", tracked on GitHub as Issue#54247.
According to the discussion, this bug affected several Python 2.7 releases.
There is also "json.loads() returns str instead of unicode for empty strings", tracked on GitHub as Issue#56191.
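A quick way to check which behavior a given interpreter exhibits (the version branch is only there so the same snippet runs on both Python 2 and 3):

```python
import json
import sys

value = json.loads('"a"')

# On Python 3, json.loads always returns str for JSON strings.
# On Python 2 it should return unicode, but the affected 2.7 builds
# returned str for ASCII-only input.
expected_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821
print(type(value).__name__, isinstance(value, expected_type))
```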
pytest's xunit-style setup/teardown hooks:

- `setup_module(module)` / `teardown_module(module)` — run when the module is imported
- `setup_function(function)` / `teardown_function(function)`
- `setup()` / `teardown()` — run when the test module is loaded
- `setup_class(cls)` / `teardown_class(cls)`
- `setup_method(self, method)` / `teardown_method(self, method)`
- `setup(self)` / `teardown(self)` — nose syntax, overridden by the two methods above

Supported nose Idioms:

- `setup()` and `teardown()` at module/class/method level: any function or method called `setup` will be called during the setup phase for each test, same for `teardown`.
- `SkipTest` exceptions and markers
- setup/teardown decorators
- `__test__` attribute on modules/classes/functions
- general usage of nose utilities