#896 Repost: Making Python 100x faster with less than 100 lines of Rust

2023-06-28

A while ago at $work, we had a performance issue with one of our core Python libraries.

This particular library forms the backbone of our 3D processing pipeline. It’s a rather big and complex library which uses NumPy and other scientific Python packages to do a wide range of mathematical and geometrical operations.

Our system also has to work on-prem with limited CPU resources, and while at first it performed well, as the number of concurrent physical users grew we started running into problems and our system struggled to keep up with the load.

We came to the conclusion that we had to make our system at least 50 times faster to handle the increased workload, and we figured that Rust could help us achieve that.

Because the performance problems we encountered are pretty common, we can recreate & solve them right here, in a (not-so-short) article.

So grab a cup of tea (or coffee) and I’ll walk you through (a) the basic underlying problem and (b) a few iterations of optimizations we can apply to solve this problem.

If you want to jump straight to the final code, just go to the summary.

Our running example

Let’s create a small library, which will exhibit our original performance issues (but does completely arbitrary work).

Imagine you have a list of polygons and a list of points, all in 2D. For business reasons, we want to “match” each point to a single polygon.

Our imaginary library is going to:

  1. Start with an initial list of points and polygons (all in 2D).
  2. For each point, find a much smaller subset of polygons that are closest to it, based on distance from the center.
  3. Out of those polygons, select the “best” one (we are going to use “smallest area” as “best”).

In code, that’s going to look like this (The full code can be found here):

from typing import List, Tuple
import numpy as np
from dataclasses import dataclass
from functools import cached_property

Point = np.array

@dataclass
class Polygon:
    x: np.array
    y: np.array

    @cached_property
    def center(self) -> Point: ...
    def area(self) -> float: ...

def find_close_polygons(polygon_subset: List[Polygon], point: Point, max_dist: float) -> List[Polygon]:
    ...

def select_best_polygon(polygon_sets: List[Tuple[Point, List[Polygon]]]) -> List[Tuple[Point, Polygon]]:
    ...

def main(polygons: List[Polygon], points: np.ndarray) -> List[Tuple[Point, Polygon]]:
    ...
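
To make the three steps above concrete, here is a minimal, dependency-free sketch of the whole pipeline. It is not the article's implementation: it uses plain Python lists instead of NumPy arrays, and `match_points` and the sample `max_dist` value are names and numbers chosen only for this illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Polygon:
    x: List[float]
    y: List[float]

    @property
    def center(self) -> Point:
        # Centroid of the vertices (mean of x, mean of y).
        return (sum(self.x) / len(self.x), sum(self.y) / len(self.y))

    def area(self) -> float:
        # Shoelace formula over consecutive vertex pairs.
        n = len(self.x)
        twice_area = sum(
            self.x[i] * self.y[(i + 1) % n] - self.x[(i + 1) % n] * self.y[i]
            for i in range(n)
        )
        return abs(twice_area) / 2.0


def find_close_polygons(polygons: List[Polygon], point: Point, max_dist: float) -> List[Polygon]:
    # Step 2: keep only polygons whose center is within max_dist of the point.
    return [p for p in polygons if math.dist(p.center, point) < max_dist]


def match_points(polygons: List[Polygon], points: List[Point], max_dist: float) -> List[Tuple[Point, Polygon]]:
    # Step 3: among the close polygons, "best" means smallest area.
    matches = []
    for point in points:
        close = find_close_polygons(polygons, point, max_dist)
        if close:
            matches.append((point, min(close, key=Polygon.area)))
    return matches


small = Polygon([0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 1.0])
big = Polygon([0.0, 2.0, 2.0, 0.0], [0.0, 0.0, 2.0, 2.0])
result = match_points([small, big], [(0.6, 0.6), (10.0, 10.0)], max_dist=1.0)
print(result)
```

The first point is close to both squares and is matched to the smaller one; the second point has no polygon within `max_dist` and is simply left unmatched in this sketch.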

The key difficulty (performance wise) is this mix of Python objects and numpy arrays.

We are going to analyze this in depth in a minute.

It’s worth noting that converting parts of (or all of) this toy library to vectorized numpy might be possible, but doing the same for the real library would be nearly impossible, would make the code much less readable and modifiable, and would bring limited gains (here’s a partially vectorized version, which is faster but far from the results we are going to achieve).

Also, using any JIT-based tricks (PyPy / numba) results in very small gains (as we will measure, just to make sure).

Why not just Rewrite It (all) In Rust™?

As compelling as a complete rewrite was, it had a few problems:

  1. The library was already using numpy for a lot of its calculations, so why should we expect Rust to be better?
  2. It is big and complex and very business critical and highly algorithmic, so that would take ~months of work, and our poor on-prem server is dying today.
  3. A bunch of friendly researchers are actively working on said library, implementing better algorithms and doing a lot of experiments. They aren’t going to be very happy to learn a new programming language, waiting for things to compile and fighting with the borrow checker. They would appreciate us not moving their cheese too far.

Dipping our toes

It is time to introduce our friend the profiler.

Python has a built-in profiler (cProfile), but in this case it’s not really the right tool for the job:

  1. It’ll introduce a lot of overhead to all the Python code, and none for native code, so our results might be biased.
  2. We won’t be able to see into native frames, meaning we aren’t going to be able to see into our Rust code.
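
For reference, this is roughly what driving the built-in profiler from code looks like. It is a small sketch using only the standard library; `slow_sum` and `profile_call` are made-up names for this example, and the workload is arbitrary.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    """A deliberately naive pure-Python workload to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args) -> str:
    """Run `func` under cProfile and return the top of the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time and print only the first few entries.
    stats.sort_stats("cumulative").print_stats(5)
    return out.getvalue()


report = profile_call(slow_sum, 100_000)
print(report)
```

Every Python-level call is instrumented here, which is where the overhead described above comes from; time spent inside native extensions shows up only as a single opaque call.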

We are going to use py-spy (GitHub).

py-spy is a sampling profiler which can see into native frames.

They also mercifully publish pre-built wheels to pypi, so we can just pip install py-spy and get to work.

We also need something to measure.

# measure.py
import time
import poly_match
import os

# Reduce noise, actually improve perf in our case.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

polygons, points = poly_match.generate_example()

# We are going to increase this as the code gets faster and faster.
NUM_ITER = 10

t0 = time.perf_counter()
for _ in range(NUM_ITER):
    poly_match.main(polygons, points)
t1 = time.perf_counter()

took = (t1 - t0) / NUM_ITER
print(f"Took an avg of {took * 1000:.2f}ms per iteration")

It’s not very scientific, but it’s going to take us very far.

“Good benchmarking is hard. Having said that, do not stress too much about having a perfect benchmarking setup, particularly when you start optimizing a program.”

~ Nicholas Nethercote, in “The Rust Performance Book”

Running this script will give us our baseline:

$ python measure.py
Took an avg of 293.41ms per iteration

For the original library, we used 50 different examples to make sure all cases are covered.

This matched the overall system perf, meaning we can start working on crushing this number.

Side note: We can also measure using PyPy (we’ll also add a warmup to allow the JIT to do its magic).

$ conda create -n pypyenv -c conda-forge pypy numpy && conda activate pypyenv
$ pypy measure_with_warmup.py
Took an avg of 1495.81ms per iteration
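
The article does not show measure_with_warmup.py, so the following is only a guess at its shape: a few untimed warmup calls (so a JIT has a chance to compile the hot paths) followed by the same averaging loop as measure.py. The `measure` helper, its parameters, and the toy workload are assumptions made for this sketch.

```python
import time
from typing import Callable


def measure(func: Callable[[], object], warmup: int = 3, iterations: int = 10) -> float:
    """Return average wall-clock seconds per call, after `warmup` untimed calls."""
    for _ in range(warmup):
        func()  # Not timed: gives a JIT (e.g. PyPy) time to warm up.

    t0 = time.perf_counter()
    for _ in range(iterations):
        func()
    t1 = time.perf_counter()
    return (t1 - t0) / iterations


calls = []
avg = measure(lambda: calls.append(sum(range(1000))), warmup=2, iterations=5)
print(f"Took an avg of {avg * 1000:.2f}ms per iteration")
```

In the real script, the lambda would be replaced by a call such as `lambda: poly_match.main(polygons, points)`.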

Measure first

So, let’s find out what is so slow here.

$ py-spy record --native -o profile.svg -- python measure.py
py-spy> Sampling process 100 times a second. Press Control-C to exit.

Took an avg of 365.43ms per iteration

py-spy> Stopped sampling because process exited
py-spy> Wrote flamegraph data to 'profile.svg'. Samples: 391 Errors: 0

Already, we can see that the overhead is pretty small. Just for comparison, using cProfile we get this:

$ python -m cProfile measure.py
Took an avg of 546.47ms per iteration
         7551778 function calls (7409483 primitive calls) in 7.806 seconds
         ...

We get this nice, reddish graph called a flamegraph:

Each box is a function, and we can see the relative time we spend in each function, including the functions it calls (going down the graph/stack). Try clicking on the norm box to zoom into it.

Here, the main takeaways are:

  1. The vast majority of time is spent in find_close_polygons.
  2. Most of that time is spent in norm, which is a numpy function.

So, let’s have a look at find_close_polygons:

def find_close_polygons(
    polygon_subset: List[Polygon], point: np.array, max_dist: float
) -> List[Polygon]:
    close_polygons = []
    for poly in polygon_subset:
        if np.linalg.norm(poly.center - point) < max_dist:
            close_polygons.append(poly)

    return close_polygons

We are going to rewrite this function in Rust.

Before diving into the details, it’s important to notice a few things here:

  1. This function accepts & returns complex objects (Polygon, np.array).
  2. The size of the objects is non-trivial (so copying stuff might cost us).
  3. This function is called “a lot” (so overhead we introduce is probably going to matter).

My first Rust module

pyo3 is a crate for interacting between Python and Rust. It has exceptionally good documentation, and they explain the basic setup here.

We are going to call our crate poly_match_rs, and add a function called find_close_polygons.

mkdir poly_match_rs && cd "$_"
pip install maturin
maturin init --bindings pyo3
maturin develop

Starting out, our crate is going to look like this:

use pyo3::prelude::*;

#[pyfunction]
fn find_close_polygons() -> PyResult<()> {
    Ok(())
}

#[pymodule]
fn poly_match_rs(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(find_close_polygons, m)?)?;
    Ok(())
}

We also need to remember to execute maturin develop every time we change the Rust library.

And that’s it! Let’s call our new function and see what happens.

>>> poly_match_rs.find_close_polygons(polygons, point, max_dist)
E TypeError: poly_match_rs.poly_match_rs.find_close_polygons() takes no arguments (3 given)

v1 - A naive Rust translation

We’ll start with matching the expected API.

PyO3 is pretty smart about Python to Rust conversions, so that’s going to be pretty easy:

#[pyfunction]
fn find_close_polygons(polygons: Vec<PyObject>, point: PyObject, max_dist: f64) -> PyResult<Vec<PyObject>> {
    Ok(vec![])
}

PyObject is (as the name suggests) a generic “anything goes” Python object. We’ll try to interact with it in a bit.

This should make the program run (albeit incorrectly).

I’m going to just copy and paste the original Python function, and fix the syntax.

#[pyfunction]
fn find_close_polygons(polygons: Vec<PyObject>, point: PyObject, max_dist: f64) -> PyResult<Vec<PyObject>> {
    let mut close_polygons = vec![];

    for poly in polygons {
        if norm(poly.center - point) < max_dist {
            close_polygons.push(poly)
        }
    }

    Ok(close_polygons)
}

Cool, but this won’t compile:

% maturin develop
...

error[E0609]: no field `center` on type `Py<PyAny>`
 --> src/lib.rs:8:22
  |
8 |         if norm(poly.center - point) < max_dist {
  |                      ^^^^^^ unknown field


error[E0425]: cannot find function `norm` in this scope
 --> src/lib.rs:8:12
  |
8 |         if norm(poly.center - point) < max_dist {
  |            ^^^^ not found in this scope


error: aborting due to 2 previous errors ] 58/59: poly_match_rs

We need three crates to implement our function:

# For Rust-native array operations.
ndarray = "0.15"

# For a `norm` function for arrays.
ndarray-linalg = "0.16"

# For accessing numpy-created objects, based on `ndarray`.
numpy = "0.18"

First, let’s turn the opaque and generic point: PyObject into something we can work with.

Just like we asked PyO3 for a “Vec of PyObjects”, we can ask for a numpy-array, and it’ll auto-convert the argument for us.

use numpy::PyReadonlyArray1;

#[pyfunction]
fn find_close_polygons(
    // An object which says "I have the GIL", so we can access Python-managed memory.
    py: Python<'_>,
    polygons: Vec<PyObject>,
    // A reference to a numpy array we will be able to access.
    point: PyReadonlyArray1<f64>,
    max_dist: f64,
) -> PyResult<Vec<PyObject>> {
    // Convert to `ndarray::ArrayView1`, a fully operational native array.
    let point = point.as_array();
    ...
}

Because point is now an ArrayView1, we can actually use it. For example:

// Make the `norm` function available.
use ndarray_linalg::Norm;

assert_eq!((point.to_owned() - point).norm(), 0.);

Now we just need to get the center of each polygon, and “cast” it to an ArrayView1.

In PyO3, this looks like this:

let center = poly
  .getattr(py, "center")?                 // Python-style getattr, requires a GIL token (`py`).
  .extract::<PyReadonlyArray1<f64>>(py)?  // Tell PyO3 what to convert the result to.
  .as_array()                             // Like `point` before.
  .to_owned();                            // We need one of the sides of the `-` to be "owned".

It’s a bit of a mouthful, but overall the result is a pretty clear line-to-line translation of the original code:

 1  use pyo3::prelude::*;
 2
 3  use ndarray_linalg::Norm;
 4  use numpy::PyReadonlyArray1;
 5
 6  #[pyfunction]
 7  fn find_close_polygons(
 8      py: Python<'_>,
 9      polygons: Vec<PyObject>,
10      point: PyReadonlyArray1<f64>,
11      max_dist: f64,
12  ) -> PyResult<Vec<PyObject>> {
13      let mut close_polygons = vec![];
14      let point = point.as_array();
15      for poly in polygons {
16          let center = poly
17              .getattr(py, "center")?
18              .extract::<PyReadonlyArray1<f64>>(py)?
19              .as_array()
20              .to_owned();
21
22          if (center - point).norm() < max_dist {
23              close_polygons.push(poly)
24          }
25      }
26
27      Ok(close_polygons)
28  }

vs the original:

def find_close_polygons(
    polygon_subset: List[Polygon], point: np.array, max_dist: float
) -> List[Polygon]:
    close_polygons = []
    for poly in polygon_subset:
        if np.linalg.norm(poly.center - point) < max_dist:
            close_polygons.append(poly)

    return close_polygons

We expect this version to have some advantage over the original function, but how much?

$ (cd ./poly_match_rs/ && maturin develop)
$ python measure.py
Took an avg of 609.46ms per iteration

So... is Rust just super slow? No! We just forgot to ask for speed! If we run with maturin develop --release we get much better results:

$ (cd ./poly_match_rs/ && maturin develop --release)
$ python measure.py
Took an avg of 23.44ms per iteration

Now that is a nice speedup!

We also want to see into our native code, so we are going to enable debug symbols in release. While we are at it, we might as well ask for maximum speed.

# added to Cargo.toml
[profile.release]
debug = true       # Debug symbols for our profiler.
lto = true         # Link-time optimization.
codegen-units = 1  # Slower compilation but faster code.

v2 - Rewrite even more in Rust

Now, using the --native flag in py-spy is going to show us both Python and our new native code.

Running py-spy again

$ py-spy record --native -o profile.svg -- python measure.py
py-spy> Sampling process 100 times a second. Press Control-C to exit.

we get this flamegraph (non-red colors were added so we can refer to them):

Looking at the profiler output, we can see a few interesting things:

  1. The relative size of find_close_polygons::...::trampoline (the symbol Python directly calls) and __pyfunction_find_close_polygons (our actual implementation).
     • Hovering, they are 95% vs 88% of samples, so the overhead is pretty small.
  2. The actual logic (if (center - point).norm() < max_dist { ... }) which is lib_v1.rs:22 (very small box on the right), is about 9% of the total runtime.
     • So x10 improvement should still be possible!
  3. Most of the time is spent in lib_v1.rs:16, which is poly.getattr(...).extract(...) and which, if we zoom in, is really just getattr and getting the underlying array using as_array.

The conclusion here is that we need to focus on solving the 3rd point, and the way to do that is to Rewrite Polygon in Rust.

Let’s look at our target:

@dataclass
class Polygon:
    x: np.array
    y: np.array
    _area: float = None

    @cached_property
    def center(self) -> np.array:
        centroid = np.array([self.x, self.y]).mean(axis=1)
        return centroid

    def area(self) -> float:
        if self._area is None:
            self._area = 0.5 * np.abs(
                np.dot(self.x, np.roll(self.y, 1)) - np.dot(self.y, np.roll(self.x, 1))
            )
        return self._area

We’ll want to keep the existing API as much as possible, but we don’t really need area to be that fast (for now).

The actual class might have additional complex stuff, like a merge method which uses ConvexHull from scipy.spatial.

To cut costs (and limit the scope of this already long article), we will only move the “core” functionality of Polygon to Rust, and subclass that from Python to implement the rest of the API.

Our struct is going to look like this:

// `Array1` is a 1d array, and the `numpy` crate will play nicely with it.
use ndarray::Array1;

// `subclass` tells PyO3 to allow subclassing this in Python.
#[pyclass(subclass)]
struct Polygon {
    x: Array1<f64>,
    y: Array1<f64>,
    center: Array1<f64>,
}

Now we need to actually implement it. We want to expose poly.{x, y, center} as:

  1. Properties.
  2. numpy Arrays.

We also need a constructor so Python can create new Polygons.

use numpy::{PyArray1, PyReadonlyArray1, ToPyArray};

#[pymethods]
impl Polygon {
    #[new]
    fn new(x: PyReadonlyArray1<f64>, y: PyReadonlyArray1<f64>) -> Polygon {
        let x = x.as_array();
        let y = y.as_array();
        let center = Array1::from_vec(vec![x.mean().unwrap(), y.mean().unwrap()]);

        Polygon {
            x: x.to_owned(),
            y: y.to_owned(),
            center,
        }
    }

    // the `Py<..>` in the return type is a way of saying "an Object owned by Python".
    #[getter]
    fn x(&self, py: Python<'_>) -> PyResult<Py<PyArray1<f64>>> {
        Ok(self.x.to_pyarray(py).to_owned()) // Create a Python-owned, numpy version of `x`.
    }

    // Same for `y` and `center`.
}

We need to add our new struct as a class to the module:

#[pymodule]
fn poly_match_rs(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<Polygon>()?; // new.
    m.add_function(wrap_pyfunction!(find_close_polygons, m)?)?;
    Ok(())
}

And now we can update the Python code to use it:

class Polygon(poly_match_rs.Polygon):
    _area: float = None

    def area(self) -> float:
        ...

We can compile it and it’ll actually work, but it’ll be much slower! (Remember that x, y, and center will now need to create a new numpy array on each access).

To actually improve performance, we need to extract our original Rust-based Polygon from the list of Python-Polygons.

PyO3 is very flexible with this type of operation, so there are a few ways we could do it. One limit we have is that we also need to return Python-Polygons, and we don’t want to do any cloning of the actual data.

It’s possible to manually call .extract::<Polygon>(py)? on each PyObject, but we can ask PyO3 to give us Py<Polygon> directly.

This is a reference to a Python-owned object, which we expect to contain an instance (or a subclass, in our case) of a native pyclass struct.

45  #[pyfunction]
46  fn find_close_polygons(
47      py: Python<'_>,
48      polygons: Vec<Py<Polygon>>,             // References to Python-owned objects.
49      point: PyReadonlyArray1<f64>,
50      max_dist: f64,
51  ) -> PyResult<Vec<Py<Polygon>>> {           // Return the same `Py` references, unmodified.
52      let mut close_polygons = vec![];
53      let point = point.as_array();
54      for poly in polygons {
55          let center = poly.borrow(py).center // Need to use the GIL (`py`) to borrow the underlying `Polygon`.
56              .to_owned();
57
58          if (center - point).norm() < max_dist {
59              close_polygons.push(poly)
60          }
61      }
62
63      Ok(close_polygons)
64  }

Let’s see what we get using this code:

$ python measure.py
Took an avg of 6.29ms per iteration

We are nearly there! Just x2 to go!

v3 - Avoid allocations

Let’s fire up the profiler one more time.

  1. We start to see select_best_polygon, which now calls some Rust code (when it gets the x & y vectors).
     • We could fix that, but that’s a very small potential improvement (maybe 10%).
  2. We see we spend about 20% of the time on extract_argument (under lib_v2.rs:48), so we are still paying quite a lot on overhead!
     • But most of the time is in PyIterator::next and PyTypeInfo::is_type_of, which aren’t easy to fix.
  3. We see a bunch of time spent allocating stuff!
     • lib_v2.rs:58 is our if, and we see drop_in_place and to_owned.
     • The actual line is about 35% of the overall time, which is a lot more than we expect: this should be the “fast bit” with all the data in place.

Let’s tackle the last point.

This is our problematic snippet:

let center = poly.borrow(py).center
    .to_owned();

if (center - point).norm() < max_dist { ... }

What we want is to avoid that to_owned. But we need an owned object for norm, so we’ll have to implement that manually.

(The reason we can improve on ndarray here is that we know that our array is actually just 2 f64s).

This would look like this:

use ndarray_linalg::Scalar;

let center = &poly.as_ref(py).borrow().center;

if ((center[0] - point[0]).square() + (center[1] - point[1]).square()).sqrt() < max_dist {
    close_polygons.push(poly)
}

But, alas, the borrow checker is unhappy with us:

error[E0505]: cannot move out of `poly` because it is borrowed
  --> src/lib.rs:58:33
   |
55 |         let center = &poly.as_ref(py).borrow().center;
   |                       ------------------------
   |                       |
   |                       borrow of `poly` occurs here
   |                       a temporary with access to the borrow is created here ...
...
58 |             close_polygons.push(poly);
   |                                 ^^^^ move out of `poly` occurs here
59 |         }
60 |     }
   |     - ... and the borrow might be used here, when that temporary is dropped and runs the `Drop` code for type `PyRef`

As usual, the borrow checker is correct: we are doing memory crimes.

The simpler fix is to Just Clone, and close_polygons.push(poly.clone()) compiles.

This is actually a very cheap clone, because we only increment the reference count of the Python object.

However, in this case we can also shorten the borrow by doing a classic Rust trick:

let norm = {
    let center = &poly.as_ref(py).borrow().center;

    ((center[0] - point[0]).square() + (center[1] - point[1]).square()).sqrt()
};

if norm < max_dist {
    close_polygons.push(poly)
}

Because poly is only borrowed in the inner scope, once we reach close_polygons.push the compiler can know that we no longer hold that reference, and will happily compile the new version.

And finally, we have

$ python measure.py
Took an avg of 2.90ms per iteration

Which is a 100x improvement over the original code.
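
The idea behind the v3 change can also be shown in plain Python. The sketch below is only an analogy for the Rust code, using lists and tuples instead of ndarray: `generic_norm` builds a temporary difference container on every call, while `norm_2d` exploits the fact that the inputs always have exactly two elements. Both function names are made up for this illustration.

```python
import math
from typing import Sequence


def generic_norm(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance via a temporary difference list, like `(center - point).norm()`."""
    diff = [x - y for x, y in zip(a, b)]  # Allocates a new list on every call.
    return math.sqrt(sum(d * d for d in diff))


def norm_2d(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance specialised for 2-element inputs: no temporary container."""
    dx = a[0] - b[0]
    dy = a[1] - b[1]
    return math.sqrt(dx * dx + dy * dy)


center = (0.0, 0.0)
point = (3.0, 4.0)
print(generic_norm(center, point), norm_2d(center, point))
```

Both functions return the same distance; the second simply skips the intermediate allocation, which is what removing to_owned achieves in the Rust version.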

Summary

We started out with this Python code:

@dataclass
class Polygon:
    x: np.array
    y: np.array
    _area: float = None

    @cached_property
    def center(self) -> np.array:
        centroid = np.array([self.x, self.y]).mean(axis=1)
        return centroid

    def area(self) -> float:
        ...

def find_close_polygons(
    polygon_subset: List[Polygon], point: np.array, max_dist: float
) -> List[Polygon]:
    close_polygons = []
    for poly in polygon_subset:
        if np.linalg.norm(poly.center - point) < max_dist:
            close_polygons.append(poly)

    return close_polygons

# Rest of file (main, select_best_polygon).

We profiled it using py-spy, and even our most naive, line-to-line translation of find_close_polygons resulted in more than x10 improvement.

We did a few profile-rewrite-measure iterations until we finally gained x100 improvement in runtime, while keeping the same API as the original library.

Version | Avg time per iteration (ms) | Multiplier
Baseline implementation (Python) | 293.41 | 1x
Naive line-to-line Rust translation of find_close_polygons | 23.44 | 12.50x
Polygon implementation in Rust | 6.29 | 46.53x
Optimized allocation implementation in Rust | 2.90 | 101.16x

The final Python code looks like this:

import poly_match_rs
from poly_match_rs import find_close_polygons

class Polygon(poly_match_rs.Polygon):
    _area: float = None

    def area(self) -> float:
        ...

# Rest of file unchanged (main, select_best_polygon).

which calls this Rust code:

use pyo3::prelude::*;

use ndarray::Array1;
use ndarray_linalg::Scalar;
use numpy::{PyArray1, PyReadonlyArray1, ToPyArray};

#[pyclass(subclass)]
struct Polygon {
    x: Array1<f64>,
    y: Array1<f64>,
    center: Array1<f64>,
}

#[pymethods]
impl Polygon {
    #[new]
    fn new(x: PyReadonlyArray1<f64>, y: PyReadonlyArray1<f64>) -> Polygon {
        let x = x.as_array();
        let y = y.as_array();
        let center = Array1::from_vec(vec![x.mean().unwrap(), y.mean().unwrap()]);

        Polygon {
            x: x.to_owned(),
            y: y.to_owned(),
            center,
        }
    }

    #[getter]
    fn x(&self, py: Python<'_>) -> PyResult<Py<PyArray1<f64>>> {
        Ok(self.x.to_pyarray(py).to_owned())
    }

    // Same for `y` and `center`.
}

#[pyfunction]
fn find_close_polygons(
    py: Python<'_>,
    polygons: Vec<Py<Polygon>>,
    point: PyReadonlyArray1<f64>,
    max_dist: f64,
) -> PyResult<Vec<Py<Polygon>>> {
    let mut close_polygons = vec![];
    let point = point.as_array();
    for poly in polygons {
        let norm = {
            let center = &poly.as_ref(py).borrow().center;

            ((center[0] - point[0]).square() + (center[1] - point[1]).square()).sqrt()
        };

        if norm < max_dist {
            close_polygons.push(poly)
        }
    }

    Ok(close_polygons)
}

#[pymodule]
fn poly_match_rs(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<Polygon>()?;
    m.add_function(wrap_pyfunction!(find_close_polygons, m)?)?;
    Ok(())
}

Takeaways

  • Rust (with the help of pyo3) unlocks true native performance for everyday Python code, with minimal compromises.

  • Python is a superb API for researchers, and crafting fast building blocks with Rust is an extremely powerful combination.

  • Profiling is super interesting, and it pushes you to truly understand everything that’s happening in your code.

And finally: computers are crazy fast. The next time you wait for something to complete, consider firing up a profiler, you might learn something new 🚀

#894 Repost: Process model or thread model, PostgreSQL faces a choice

2023-06-20

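The process-oriented model is a database system architecture whose core idea is to assign different database services to different processes; each process runs independently, and they cooperate through inter-process communication (IPC).
This model is widely used in database systems, PostgreSQL among them.

As noted above, the process model lets PostgreSQL assign different services to multiple independently running processes, each responsible for a different task, such as query processing, concurrency control, and lock management.
The process model also helps keep the system stable and reliable: when one process fails, the others keep running normally, which improves availability.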

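These characteristics let PostgreSQL handle a large number of concurrent requests at once, improving performance and response time.
Beyond that, PostgreSQL can easily scale horizontally, adding more nodes to cope with higher load.
At the same time, they leave PostgreSQL with drawbacks: relatively high management and maintenance costs, the need for fairly complex inter-process communication and coordination mechanisms, and higher consumption of system resources.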

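In early June, Heikki Linnakangas posted a proposal to move PostgreSQL to a thread model.

The thread model is a database system architecture that, like the process-oriented model, assigns different database services to different workers, in this case threads; each thread runs independently, and they cooperate through inter-thread communication. The thread model is widely used in some lightweight database systems, such as SQLite.

The biggest difference between the thread model and the process model is that in the thread model all threads share the address space of a single process: each thread has its own stack, while the code and data segments are shared. This means threads can access the same memory directly, so inter-thread communication is relatively cheap, but it also means that sharing data between threads can introduce safety problems.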

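Pros and cons of moving from the process model to the thread model:

Pros

  1. More lightweight: the thread model is lighter than the process model and can use system resources more efficiently. In particular, when several instances run on one machine, the thread model can run them inside a single process, reducing the overhead of system calls and inter-process communication.
  2. Faster response: communication between threads is relatively cheap, so the thread model responds faster under high concurrency.
  3. Lower memory usage: threads share one address space, which avoids the process model's problem of the same code and data being loaded into memory repeatedly by multiple processes, and so saves system memory.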

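Cons

  1. Safety problems: threads share the same memory, which can lead to safety problems such as data races and lock contention.
  2. Reliability problems: in the thread model, a crash in one thread can affect the stability and reliability of the whole process.
  3. Multithreaded programming is harder: communication between threads requires synchronization and mutual exclusion, so multithreaded programs are comparatively difficult to write.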
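
The shared-memory property, and the synchronization it demands, can be shown with a short standard-library sketch. This only illustrates the general idea and has nothing to do with PostgreSQL's code; `SharedState` and `worker` are names invented for this example.

```python
import threading


class SharedState:
    """State living in the single address space that all threads share."""

    def __init__(self) -> None:
        self.counter = 0
        self.finished = []
        self.lock = threading.Lock()


def worker(state: SharedState, name: str, increments: int) -> None:
    for _ in range(increments):
        # Read-modify-write on shared memory must be guarded; without the
        # lock, concurrent updates could be lost (a data race).
        with state.lock:
            state.counter += 1
    with state.lock:
        state.finished.append(name)


state = SharedState()
threads = [
    threading.Thread(target=worker, args=(state, f"t{i}", 10_000)) for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(state.counter, sorted(state.finished))
```

No message passing is needed for the main thread to see the workers' results, which is the low communication cost listed under the pros; the lock is the price listed under the cons.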

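Andres Freund, a PostgreSQL developer and senior database architect at EnterpriseDB, points out:

I think the existing process model is starting to impose many limitations, and the problem is especially visible on large machines.
The overhead of a cross-process context switch is inherently higher than that of switching between threads in the same process, and I expect that overhead to keep growing.
With a large number of connections, the whole system will inevitably end up wasting a lot of time on TLB misses.
This is the unavoidable result of an inherent property of the process model: the TLB cannot be shared across processes.

For now this is only a proposal, and because PostgreSQL is widely used in production, the move to a thread model has to be made very carefully. The development team needs to test the new thread model without affecting existing production environments, to make sure it is stable and reliable. Even if the proposal is accepted, the conversion certainly cannot be completed in a single release. Judging from comments online, most people currently support the proposal.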

#893 Those international courts

2023-06-17

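International Court of Justice (ICJ)

One of the six principal organs of the United Nations and its principal judicial organ. It is a civil judicial body for disputes between the governments of sovereign states, established in June 1945 under the Charter of the United Nations.

Its predecessor was the League of Nations' Permanent Court of International Justice.

Because it sits in The Hague, the Netherlands, it is also called the Hague International Court.

Official site: https://icj-cij.org/ch

PS: The other five principal organs of the United Nations:

  1. The Secretariat
  2. The Security Council
  3. The General Assembly
  4. The Economic and Social Council
  5. The Trusteeship Council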

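Permanent Court of Arbitration (PCA)

The oldest existing international arbitration institution in the world. It was established under the 1899 Hague Convention for the Pacific Settlement of International Disputes (First Hague Peace Conference) and the 1907 Convention for the Pacific Settlement of International Disputes (Second Hague Peace Conference).
China signed during the Qing dynasty and became a contracting party.

It is headquartered at the Peace Palace in The Hague. PS: After the ICJ was established, it also took up offices in the Peace Palace. The two organizations are completely independent of each other.

It handles not only disputes between states; it can also be entrusted with disputes between private parties, or between a state and a private party.
In practice, though, the organization has taken on very few cases, and it only recently came to public attention because of the case the Philippines brought against China (the South China Sea arbitration).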

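On the South China Sea arbitration

  1. China's position: the Permanent Court of Arbitration has no jurisdiction over territorial issues, so China does not accept or participate in the arbitration (on 19 February 2013, China formally refused to take part in the case).
  2. On 12 July 2016, with China absent throughout, the tribunal issued its ruling: China has no historic title over the South China Sea, the nine-dash line is not lawful (it was an eleven-dash line in the Republic of China era), and China is required to stop its “land reclamation”.
  3. Four of the five arbitrators were current or former judges of the International Tribunal for the Law of the Sea. The Philippines appointed one arbitrator to represent it; because China did not participate, the remaining four were all appointed by Shunji Yanai, then president of the International Tribunal for the Law of the Sea.
  4. As expected, Europe, the US, Japan, Australia, and some ASEAN countries supported the arbitration, while China's core diplomatic partners backed China's position.
  5. During the arbitration there was a military standoff between China and the US in the South China Sea. After some time, the US aircraft carriers withdrew on their own.
  6. After the arbitration ended, the newly elected Philippine president immediately restored relations with China and did not bring up the arbitration.
  7. The United Nations, the International Court of Justice, and the International Tribunal for the Law of the Sea each issued statements saying that this arbitration had nothing to do with them.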

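International Criminal Court (ICC)

Established under the Rome Statute of the International Criminal Court, which entered into force on 1 July 2002.

It is also headquartered in The Hague, the Netherlands. Its main function is to prosecute and try individuals for genocide, crimes against humanity, war crimes, and the crime of aggression.

According to Wikipedia, 123 countries have signed and ratified the statute, becoming member states of the ICC. Another 31 countries have only signed.
Of the five permanent members of the Security Council, the UK and France are members, China has not signed, and Russia and the US signed and then promptly withdrew.

Because the ICC has issued an arrest warrant for Putin on war-crimes charges, member states are in principle obliged to arrest him.
This year's BRICS summit is hosted by South Africa, an ICC member state, so how it handles this issue is being closely watched.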

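International Criminal Police Organization (ICPO, INTERPOL)

The second-largest international organization after the United Nations.

It has no connection with the ICC. It is a platform for international police cooperation, whose main role is to share notices about large-scale, serious transnational crime such as tax evasion, terrorism, organized crime, drugs, arms smuggling, human smuggling, money laundering, child pornography, corruption, and high-tech crime. It has no local law-enforcement powers.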

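International Tribunal for the Law of the Sea (ITLOS)

A legal body established under the United Nations Convention on the Law of the Sea. Founded in 1996 and headquartered in Hamburg, Germany, it is an international body dedicated to hearing law-of-the-sea cases. It currently has observer status at the UN General Assembly.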

International Court of Arbitration

The arbitration arm of the International Chamber of Commerce (headquartered in Paris).
Not to be confused with the PCA.

#892 CPU architectures

2023-06-12

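A news item on OSChina, “Debian 13 "Trixie" will provide 64-bit RISC-V support”, mentions:

The newly released Debian 12 supports nine architectures: AMD64, AArch64, ARMEL, ARMHF, i386, MIPS, 64-bit MIPS, POWER, and IBM System Z.

These are probably the most popular CPU architectures today.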

  • x86_64
    • AMD64
    • i386
  • ARM
    • AArch64
    • ARMEL: little endian
    • ARMHF: hard float, uses hardware floating-point registers and instructions to improve performance
  • MIPS
    • MIPS
    • MIPS64
  • POWER
  • IBM System Z
  • RISC-V
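
To see which of these families the current machine belongs to, Python's standard library exposes the kernel-reported machine name. The sketch below maps a few common values to Debian port names; the `DEBIAN_PORTS` table and the `debian_port` helper are illustrative assumptions written for this note, and the table is deliberately incomplete.

```python
import platform

# Hypothetical, incomplete mapping from `platform.machine()` values
# to the names Debian uses for its ports.
DEBIAN_PORTS = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "i386": "i386",
    "i686": "i386",
    "aarch64": "arm64",
    "arm64": "arm64",
    "riscv64": "riscv64",
    "ppc64le": "ppc64el",
    "s390x": "s390x",
}


def debian_port(machine: str) -> str:
    """Translate a machine string into a Debian port name, or "unknown"."""
    return DEBIAN_PORTS.get(machine.lower(), "unknown")


machine = platform.machine()
print(machine, "->", debian_port(machine))
```

The exact string returned by `platform.machine()` depends on the operating system (for example, Windows reports AMD64 where Linux reports x86_64), which is why the lookup lowercases its input and lists several aliases.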

#891 Dad's Little Pigs Story

2023-06-11

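Once upon a time, there were three adorable little pigs, siblings from one family. The eldest, a brother, was called Pilu; the second, a sister, was called Pina; the youngest, a brother, was called Pixiu. They lived in a beautiful forest and spent their days happily together.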

#890 The queue on the head and the queue in the heart

2023-06-11

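Wigs

Recently there was news discussing how Hong Kong judges still keep the tradition (bad habit) of wearing wigs, inherited from the former colonial power.

First, a little of what I know about these judicial wigs.
In the 16th century, venereal disease was rampant among the European aristocracy, and one of its symptoms was severe hair loss. Over time, the upper-class custom of wearing wigs gradually turned into a symbol of status.
Once medicine advanced, the custom quickly died out; who would want to wear such a nuisance for no reason? Just thinking about it is uncomfortable.

However, Britain legislated dress requirements for legal professionals, and these include mandatory wigs (WTF).
One kind is shoulder-length and used in criminal courts; the other is shorter and used in civil courts.

PS: Search for pictures of judges in wigs; they look downright ugly, and I can't imagine anyone liking the thing.
PS: Thomas Jefferson, third president of the United States: “(English judges) look like rats peering out from under a wad of cotton.”

Because Britain invaded and colonized far and wide, it spread its own law to many places, including Hong Kong, China.
So the judiciary in some regions kept this tradition into modern times.

According to Wikipedia, the UK, Canada, Australia, and New Zealand have partially abolished the use of wigs in court. But Hong Kong's judiciary still insists that wigs must be worn in court.
I don't know much about judicial systems, but the wig is just an accessory and has nothing to do with the spirit of the law, so why must wearing it be enforced by laws and regulations?
How similar this is to the Qing loyalists who could not bear to part with their queues (braids)!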

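Queues

Speaking of queues, I can't help recalling Gu Hongming's classic line in the film about the founding of the Party: “My queue grows on the back of my head; those who laugh at me have queues growing in their hearts. The queue on this old man's head is visible, but the queues in your hearts, gentlemen, are invisible.”
There is actually some sense in this. Just because we are weaker than others, we feel we must learn everything from them and regard everything of our own as dross; is that not a queue growing in the heart?
The queue is a lack of confidence, the feeling of being inferior, the belief that the moon is rounder abroad.

The wig is an outward queue; a colonial-subject mentality is an inward one. And when we hope they will completely erase the traces of the former colonial power, is there not a bit of the queue in that too?
What I mean is: taking the long view, that ridiculous wig of theirs will disappear sooner or later. But are we being too impatient? Is this attitude too much brooding over past humiliation, and does it count as a lack of cultural confidence?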


#889 On self-hosted email

2023-06-10

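I saw an article recommended by the Tech Enthusiast Weekly (科技爱好者周刊) describing the current state of self-hosted email. The main problem is that mainstream mailbox service providers (MSPs) such as Gmail reject mail from self-hosted servers (or mark it as spam), which makes running a self-hosted mail service very difficult.

Email was born before the Internet existed. It is simple, open, and easy to develop for and use, and anyone can become a node in the email network. In practice, most people use the services of a few big MSPs, but some people (or organizations) run their own mail servers. They find that even after doing everything they should, such as SPF, DKIM, and DMARC, their mail still often fails to be delivered properly (rejected, rate-limited, and so on) or ends up in the recipient's spam folder.

To some extent the MSPs' behavior is understandable: spam is rampant and truly hard to defend against, because email itself has no barrier to entry. Perhaps only strict controls, something like real-name registration for mobile phones, could solve the problem.

The article's main takeaway is the question: under what circumstances is it worth running our own mail service?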

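  1. You are ready to invest a lot of time and effort in maintaining the system
  2. Setting up the system
  3. Keeping an eye on SPF and DMARC reports
  4. Server administration skills (Linux, Docker)
  5. An ISP that allows ports 25, 143, 465, 587, and 993 to be open
  6. A static IP plus permission to configure rDNS
  7. A suitable domain name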
#888 The gotip command

2023-05-30

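gotip is an official tool that pulls the code from the latest development branch and builds the Go toolchain from it.
If you want to try the newest features, once the build succeeds you can use gotip in place of the go command to run Go programs.
You can think of gotip as the development version of go.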

go install golang.org/dl/gotip@latest

gotip
gotip: not downloaded. Run 'gotip download' to install to C:\Users\nosch\sdk\gotip

gotip download
Cloning into 'C:\Users\nosch\sdk\gotip'...
...
Building Go cmd/dist using C:\Program Files\Go. (go1.20.5 windows/amd64)
Building Go toolchain1 using C:\Program Files\Go.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
Building Go toolchain3 using go_bootstrap and Go toolchain2.
Building packages and commands for windows/amd64.

On success:
---
Installed Go for windows/amd64 in C:\Users\nosch\sdk\gotip
Installed commands in C:\Users\nosch\sdk\gotip
Success. You may now run 'gotip'!

On failure:
# runtime/cgo
gcc_libinit_windows.c: In function '_cgo_beginthread':
gcc_libinit_windows.c:143:27: error: implicit declaration of function '_beginthread'; did you mean '_cgo_beginthread'? [-Werror=implicit-function-declaration]
  143 |                 thandle = _beginthread(func, 0, arg);
      |                           ^~~~~~~~~~~~
      |                           _cgo_beginthread
cc1: all warnings being treated as errors
go tool dist: FAILED: C:\Users\nosch\sdk\gotip\pkg\tool\windows_amd64\go_bootstrap install std: exit status 1
Success. You may now run 'gotip'!

#887 A set type in Go

2023-05-29

Go has no built-in set type, so we need to implement one ourselves:

https://go.dev/play/p/uVDCiN4Cbpt

package main

import "fmt"

type Set map[string]bool

func (s Set) Add(item string) {
    s[item] = true
}

func (s Set) Remove(item string) {
    delete(s, item)
}

func (s Set) Contains(item string) bool {
    _, exists := s[item]
    return exists
}

func main() {
    mySet := make(Set)
    mySet.Add("apple")
    mySet.Add("banana")
    mySet.Add("orange")

    for item := range mySet {
        fmt.Println(item)
    }

    fmt.Println(mySet.Contains("apple")) // Output: true
    fmt.Println(mySet.Contains("grape")) // Output: false

    mySet.Remove("banana")
    fmt.Println(mySet.Contains("banana")) // Output: false
}

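Notes:

  1. It uses a map as the underlying storage, so the resulting set is also unordered
  2. A map is not thread-safe; concurrent operations need a lock
  3. If you really want to use a set type, you should also add methods such as intersection and difference

Improvements

Following the design of https://github.com/deckarep/golang-set: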

https://go.dev/play/p/BKWT84lXfuz

package main

import "fmt"

type Set[T comparable] map[T]struct{}

func (s Set[T]) Add(item T) {
    s[item] = struct{}{}
}

func (s Set[T]) Remove(item T) {
    delete(s, item)
}

func (s Set[T]) Contains(item T) bool {
    _, exists := s[item]
    return exists
}

func main() {
    mySet := make(Set[string])
    mySet.Add("apple")
    mySet.Add("banana")
    mySet.Add("orange")

    for item := range mySet {
        fmt.Println(item)
    }

    fmt.Println(mySet.Contains("apple")) // Output: true
    fmt.Println(mySet.Contains("grape")) // Output: false

    mySet.Remove("banana")
    fmt.Println(mySet.Contains("banana")) // Output: false
}

Optimizations

  1. An empty struct takes up no space
  2. Generics make the code more reusable