Linux
2021-12-12
stat /tmp/edge_shutdown_crash.txt
  File: /tmp/edge_shutdown_crash.txt
  Size: 2          Blocks: 8          IO Block: 4096   regular file
Device: 801h/2049d Inode: 9306564     Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/markjour)   Gid: ( 1000/markjour)
Access: 2021-12-12 10:24:17.318846619 +0800
Modify: 2021-12-12 10:42:27.982887293 +0800
Change: 2021-12-12 10:42:27.982887293 +0800
 Birth: 2021-12-12 10:24:17.318846619 +0800
# %w time of file birth, human-readable; - if unknown
# %W time of file birth, seconds since Epoch; 0 if unknown
# %x time of last access, human-readable
# %X time of last access, seconds since Epoch
# %y time of last data modification, human-readable
# %Y time of last data modification, seconds since Epoch
# %z time of last status change, human-readable
# %Z time of last status change, seconds since Epoch
stat -c "%w" /tmp/edge_shutdown_crash.txt
2021-12-12 10:24:17.318846619 +0800
sudo debugfs -R 'stat /tmp/edge_shutdown_crash.txt' /dev/sda1 | cat
debugfs 1.46.3 (27-Jul-2021)
Inode: 9306564 Type: regular Mode: 0664 Flags: 0x80000
Generation: 3654863154 Version: 0x00000000:00000001
User: 1000 Group: 1000 Project: 0 Size: 2
File ACL: 0
Links: 1 Blockcount: 8
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x61b56193:ea56adf4 -- Sun Dec 12 10:42:27 2021
atime: 0x61b55d51:4c04da6c -- Sun Dec 12 10:24:17 2021
mtime: 0x61b56193:ea56adf4 -- Sun Dec 12 10:42:27 2021
crtime: 0x61b55d51:4c04da6c -- Sun Dec 12 10:24:17 2021
Size of extra inode fields: 32
Inode checksum: 0xd223c644
EXTENTS:
(0):39635628
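For reference, the same birth time (crtime) can also be read programmatically on Linux through the statx(2) system call. Below is a minimal Go sketch using golang.org/x/sys/unix; the path is the file from the stat output above, and whether crtime is actually reported depends on the kernel and filesystem:

package main

import (
    "fmt"
    "time"

    "golang.org/x/sys/unix"
)

func main() {
    var stx unix.Statx_t
    // Ask specifically for the birth time (crtime).
    err := unix.Statx(unix.AT_FDCWD, "/tmp/edge_shutdown_crash.txt", 0, unix.STATX_BTIME, &stx)
    if err != nil {
        fmt.Println("statx:", err)
        return
    }
    // The kernel sets this bit only if the filesystem reports a birth time.
    if stx.Mask&unix.STATX_BTIME == 0 {
        fmt.Println("birth time not available on this filesystem")
        return
    }
    birth := time.Unix(stx.Btime.Sec, int64(stx.Btime.Nsec))
    fmt.Println("birth time:", birth)
}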
Architecture Microservices etcd
2021-12-09
Introduction
etcd is a sub-project of CoreOS: a KV database originally designed to store shared cluster configuration data, open-sourced under the Apache license.
PS: CoreOS's distribution was renamed Container Linux, and after Red Hat acquired CoreOS in 2018 it was merged with Red Hat's Project Atomic to form the Fedora CoreOS project.
In essence it is a KV store, which from that angle makes it look a lot like Redis. But:
| Item | etcd | Redis |
| --- | --- | --- |
| Performance | High | Very high |
| Storage | Disk + in-memory cache | Memory |
| Persistence | boltdb | AOF + RDB |
| Data types | string | Rich |
| API | gRPC | RESP |
| Consistency | Raft | Replication |
PS: Performance: higher QPS + lower latency
PS: RESP: REdis Serialization Protocol (a plain-text protocol over TCP)
Uses
In essence it is a lightweight KV storage system; because it handles high concurrency, runs distributed with strong consistency, supports versioning, and supports real-time change notifications, it is commonly used for:
- Service discovery: helping services in a distributed system find each other and establish connections.
- Configuration management: centrally managing and dynamically updating the configuration of a distributed system.
- Distributed locks: ensuring that only one client at a time can hold a lock in a distributed environment, avoiding resource conflicts (a short Go sketch follows this list).
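To illustrate the distributed-lock use case, here is a minimal Go sketch built on the official client's concurrency helpers; the endpoint 127.0.0.1:2379 and the lock prefix /locks/demo are placeholder values for illustration:

package main

import (
    "context"
    "fmt"
    "time"

    clientv3 "go.etcd.io/etcd/client/v3"
    "go.etcd.io/etcd/client/v3/concurrency"
)

func main() {
    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"127.0.0.1:2379"},
        DialTimeout: 3 * time.Second,
    })
    if err != nil {
        fmt.Printf("etcd ConnectError: %v\n", err)
        return
    }
    defer cli.Close()

    // A session keeps a lease alive; if this client dies, the lock is released automatically.
    session, err := concurrency.NewSession(cli)
    if err != nil {
        fmt.Printf("etcd SessionError: %v\n", err)
        return
    }
    defer session.Close()

    mutex := concurrency.NewMutex(session, "/locks/demo")
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    if err := mutex.Lock(ctx); err != nil {
        fmt.Printf("etcd LockError: %v\n", err)
        return
    }
    fmt.Println("lock acquired, doing work...")
    // ... critical section ...
    if err := mutex.Unlock(context.Background()); err != nil {
        fmt.Printf("etcd UnlockError: %v\n", err)
    }
}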
Concepts
- Raft algorithm
  Raft is a consensus algorithm for managing a replicated log; the nodes reach a consistent state by replicating log entries among themselves.
  When a client issues a write request, the leader node accepts it and records it in its log, then replicates the log entry to the follower nodes.
  Only after a majority of nodes have successfully replicated the entry is it committed and the data update applied.
- Role-related concepts
  - Node / Member: a single server instance in the etcd cluster. Each node has a unique identity, stores some or all of the data replicas, and takes part in the cluster's consensus protocol.
  - Leader: the node that handles client write requests and coordinates log replication. There is only one leader in the cluster at any moment; it accepts client writes and broadcasts them to the follower nodes.
  - Follower: follows the leader's instructions, receiving and replicating the log entries the leader sends. Followers do not handle client writes directly; they only respond to the leader's requests.
  - Candidate: during a Raft election, a follower that has not received a heartbeat from the leader within a certain time becomes a candidate.
    The candidate starts a new round of voting and asks the other nodes for their votes; if it wins a majority, it becomes the new leader.
  - Peer: the other, equal-standing nodes in the cluster; nodes communicate with each other to replicate data, hold elections, and so on.
- Term
  A logical-clock concept that divides time into election periods; it keeps elections correct, prevents a stale leader from regaining power, and also serves as an important synchronization and coordination signal during log replication and message passing.
  Each term begins with an election and carries a unique, monotonically increasing number.
  Within a single term there is at most one leader.
  If an election fails or the leader crashes, a new term begins.
Structure


Comparison with Redis
- Both are key-value stores
- Redis is written in C, etcd in Go
- Redis supports many data types, etcd does not (values are just strings)
- Redis supports Lua scripting, etcd does not
- Redis supports simple access control; etcd v3 also offers role-based users and permissions (disabled by default)
- Redis speaks its own RESP protocol, while etcd uses gRPC or HTTP (JSON)
- Redis synchronizes data through replication, while etcd achieves strong consistency through Raft
- Redis persists data through RDB (snapshots) / AOF (incremental log)
Deployment (Ubuntu)
sudo apt install -y etcd
etcd --version
sudo systemctl start etcd
sudo systemctl status etcd
Deployment (Docker)
docker-compose.yml:
version: "3"
networks:
  etcd_net:
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: ${NETWORK_CONFIG_SUBNET}
services:
  etcd-0:
    networks:
      etcd_net:
        ipv4_address: ${ETCD_01_NETWORKS_ETCD_NET_ADDRESS}
    image: quay.io/coreos/etcd:latest
    ports:
      - ${ETCD_01_NETWORKS_ETCD_NET_ADDRESS}:4001:4001
      - ${ETCD_01_NETWORKS_ETCD_NET_ADDRESS}:2380:2380
      - ${ETCD_01_NETWORKS_ETCD_NET_ADDRESS}:2379:2379
    hostname: etcd-0
    environment:
      - GOMAXPROCS=2
    command: >-
      /usr/local/bin/etcd
      -name etcd-0
      -advertise-client-urls http://etcd-0:2379,http://etcd-0:4001
      -listen-client-urls http://${ETCD_01_NETWORKS_ETCD_NET_ADDRESS}:2379,http://${ETCD_01_NETWORKS_ETCD_NET_ADDRESS}:4001
      -initial-advertise-peer-urls http://etcd-0:2380
      -listen-peer-urls http://${ETCD_01_NETWORKS_ETCD_NET_ADDRESS}:2380
      -initial-cluster-token etcd-cluster
      -initial-cluster etcd-0=http://etcd-0:2380,etcd-1=http://etcd-1:2380,etcd-2=http://etcd-2:2380
      -initial-cluster-state new
  etcd-1:
    networks:
      etcd_net:
        ipv4_address: ${ETCD_02_NETWORKS_ETCD_NET_ADDRESS}
    image: quay.io/coreos/etcd:latest
    ports:
      - ${ETCD_02_NETWORKS_ETCD_NET_ADDRESS}:4001:4001
      - ${ETCD_02_NETWORKS_ETCD_NET_ADDRESS}:2380:2380
      - ${ETCD_02_NETWORKS_ETCD_NET_ADDRESS}:2379:2379
    hostname: etcd-1
    environment:
      - GOMAXPROCS=2
    command: >-
      /usr/local/bin/etcd
      -name etcd-1
      -advertise-client-urls http://etcd-1:2379,http://etcd-1:4001
      -listen-client-urls http://${ETCD_02_NETWORKS_ETCD_NET_ADDRESS}:2379,http://${ETCD_02_NETWORKS_ETCD_NET_ADDRESS}:4001
      -initial-advertise-peer-urls http://etcd-1:2380
      -listen-peer-urls http://${ETCD_02_NETWORKS_ETCD_NET_ADDRESS}:2380
      -initial-cluster-token etcd-cluster
      -initial-cluster etcd-0=http://etcd-0:2380,etcd-1=http://etcd-1:2380,etcd-2=http://etcd-2:2380
      -initial-cluster-state new
  etcd-2:
    networks:
      etcd_net:
        ipv4_address: ${ETCD_03_NETWORKS_ETCD_NET_ADDRESS}
    image: quay.io/coreos/etcd:latest
    ports:
      - ${ETCD_03_NETWORKS_ETCD_NET_ADDRESS}:4001:4001
      - ${ETCD_03_NETWORKS_ETCD_NET_ADDRESS}:2380:2380
      - ${ETCD_03_NETWORKS_ETCD_NET_ADDRESS}:2379:2379
    hostname: etcd-2
    environment:
      - GOMAXPROCS=2
    command: >-
      /usr/local/bin/etcd
      -name etcd-2
      -advertise-client-urls http://etcd-2:2379,http://etcd-2:4001
      -listen-client-urls http://${ETCD_03_NETWORKS_ETCD_NET_ADDRESS}:2379,http://${ETCD_03_NETWORKS_ETCD_NET_ADDRESS}:4001
      -initial-advertise-peer-urls http://etcd-2:2380
      -listen-peer-urls http://${ETCD_03_NETWORKS_ETCD_NET_ADDRESS}:2380
      -initial-cluster-token etcd-cluster
      -initial-cluster etcd-0=http://etcd-0:2380,etcd-1=http://etcd-1:2380,etcd-2=http://etcd-2:2380
      -initial-cluster-state new
etcdctl
etcdctl - A simple command line client for etcd.
ENDPOINTS=ip:port,ip:port,ip:port
# set
etcdctl --endpoints=$ENDPOINTS put test1 111
etcdctl --endpoints=$ENDPOINTS put test2 222
etcdctl --endpoints=$ENDPOINTS put test3 333
# get
etcdctl --endpoints=$ENDPOINTS get test1
etcdctl --endpoints=$ENDPOINTS get test1 --write-out="json"
etcdctl --endpoints=$ENDPOINTS get test --prefix
# delete
etcdctl --endpoints=$ENDPOINTS del test1
etcdctl --endpoints=$ENDPOINTS del test --prefix
# cluster info
etcdctl --write-out=table --endpoints=$ENDPOINTS endpoint status
etcdctl --write-out=table --endpoints=$ENDPOINTS member list
Example (Golang)
go get go.etcd.io/etcd/client/v3
package main

import (
    "context"
    "fmt"
    "time"

    etcd "go.etcd.io/etcd/client/v3"
)

func main() {
    cli, err := etcd.New(etcd.Config{
        Endpoints:   []string{"192.168.31.204:2379"},
        DialTimeout: 3 * time.Second,
    })
    if err != nil {
        fmt.Printf("etcd ConnectError: %v\n", err)
        return
    }
    defer cli.Close()

    ctx, cancel := context.WithTimeout(context.Background(), time.Second)
    resp, err := cli.Get(ctx, "test1")
    cancel()
    if err != nil {
        fmt.Printf("etcd GetError: %v\n", err)
        return // resp is nil on error, so stop here
    }
    for _, kv := range resp.Kvs {
        fmt.Printf("etcd KeyValue: %s, %s\n", kv.Key, kv.Value)
    }
}
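The real-time change notifications mentioned under "Uses" are exposed through Watch. Here is a small sketch in the same style as the Get example above (same placeholder endpoint and key); it prints every PUT/DELETE on test1 until the context times out:

package main

import (
    "context"
    "fmt"
    "time"

    etcd "go.etcd.io/etcd/client/v3"
)

func main() {
    cli, err := etcd.New(etcd.Config{
        Endpoints:   []string{"192.168.31.204:2379"},
        DialTimeout: 3 * time.Second,
    })
    if err != nil {
        fmt.Printf("etcd ConnectError: %v\n", err)
        return
    }
    defer cli.Close()

    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()

    // Each WatchResponse can carry several events (PUT / DELETE).
    for watchResp := range cli.Watch(ctx, "test1") {
        for _, ev := range watchResp.Events {
            fmt.Printf("etcd Event: %s %s -> %s\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
        }
    }
}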
Example (Python)
python3 -m pip install -U etcd3
import etcd3
etcd = etcd3.client(host='etcd-host-01', port=2379)
testkey = '/key'
testval = '1234'
etcd.put(testkey, testval)
# get() returns a (value, metadata) tuple, and the value comes back as bytes
value, meta = etcd.get(testkey)
print(value)
assert value == testval.encode()
etcd.delete(testkey)
References and further reading
Developer
2021-12-08
An article from the WeChat account 四猿外, "这 10 个程序员的好习惯,让我变强了" ("These 10 programmer habits made me stronger"), resonated with me, so here is a summary:
- Read the official documentation; material found elsewhere online is a mixed bag
- Be a reliable developer: test your own work before handing it over
- Design logging sensibly, so that when debugging you have the right information to locate the problem
- Master Git
- Get the feature working first; don't rush to optimize
  Premature optimization is the devil: optimize based on how the code actually runs
- Do the well-defined requirements first; push uncertain or vague ones back
- Be proactive: spot problems, propose solutions, solve them
- When estimating development time, leave slack to absorb the unexpected
- Learning to program is mostly about writing code, not just reading documents
- English: it doesn't need to be great, being able to read English documentation is enough
Science
2021-12-05

Architecture Logging Loki
2021-12-05
Loki
A logging system from Grafana Labs. It has only been around for a couple of years, so it is a relatively young project, but it has already built up a fair amount of recognition.
The best-known logging stack in the industry is ELK, which builds a full-text index over the logs: searching is the fastest and most flexible, but all that indexing makes storage comparatively expensive.
Loki instead splits each log entry into a timestamp, labels, and a body. Only the labels are indexed, and the bodies are stored as compressed chunks, which keeps the index small and the storage cost down.
Promtail
Promtail is the companion agent: it tails local log files, attaches labels to them, and pushes the entries to Loki.
Grafana
Grafana is a dashboarding tool, most commonly used for monitoring. It does not collect or store data itself; it works by connecting to other data sources.
Through built-in plugins, Grafana supports a variety of relational and time-series databases (Zabbix is usually paired with MySQL for storage, and Prometheus can itself be regarded as a time-series database), and it also supports data sources such as Loki and Elasticsearch.
Experiment
Install Loki & Promtail
# get the latest release tag
# LOKI_VERSION=$(curl -s https://api.github.com/repos/grafana/loki/releases/latest | jq -r .tag_name)
LOKI_VERSION=$(curl -s https://api.github.com/repos/grafana/loki/releases/latest | grep -Po '"tag_name": "\Kv[0-9.]+')
# download loki & promtail
curl -O -L "https://github.com/grafana/loki/releases/download/${LOKI_VERSION}/loki-linux-amd64.zip"
curl -O -L "https://github.com/grafana/loki/releases/download/${LOKI_VERSION}/promtail-linux-amd64.zip"
# loki : 18M -> 57M
# promtail: 21M -> 74M
# unpack & install
unzip loki-linux-amd64.zip && unzip promtail-linux-amd64.zip
sudo mv -n loki-linux-amd64 /usr/local/bin/loki
sudo mv -n promtail-linux-amd64 /usr/local/bin/promtail
# chmod a+x /usr/local/bin/{loki,promtail} # already 755
# download the default config files
sudo -E wget -qO /etc/loki.config.yaml "https://raw.githubusercontent.com/grafana/loki/${LOKI_VERSION}/cmd/loki/loki-local-config.yaml"
sudo -E wget -qO /etc/promtail.config.yaml "https://raw.githubusercontent.com/grafana/loki/${LOKI_VERSION}/clients/cmd/promtail/promtail-local-config.yaml"
ls -l /etc/{loki,promtail}.config.yaml
# start loki
loki -config.file /etc/loki.config.yaml
# check from another terminal
browse http://localhost:3100/metrics
# start promtail
promtail -config.file /etc/promtail.config.yaml
Install Grafana
Install on Debian or Ubuntu
sudo apt-get install -y apt-transport-https software-properties-common wget
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
# Beta channel
# echo "deb https://packages.grafana.com/oss/deb beta main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install -y grafana
# install warning: cannot create home directory "/usr/share/grafana"
# sudo systemctl daemon-reload
# sudo systemctl enable grafana-server
sudo systemctl start grafana-server
browse http://localhost:3000
References and further reading
Operating Systems Fedora
2021-12-04
What you give up: flexibility
What you gain: stability and security (a read-only file system is hard to corrupt and easy to roll back)
- Desktop use
  - Fedora Silverblue: based on Fedora, manages applications with Flatpak, supports atomic updates.
  - Vanilla OS: based on Debian Sid, uses the Ext4 file system, installs software through the Apx tool.
  - openSUSE Aeon: hides most of Btrfs's complexity, aimed at desktop users, combines Flatpak with a stable base system.
  - NixOS
  - Endless OS
- Server use
  - Fedora CoreOS: minimal, with automatic atomic updates; suited to container hosts or Kubernetes clusters.
  - Flatcar Container Linux: carries on the legacy of CoreOS Container Linux, focused on containerized environments.
  - openSUSE MicroOS (Server Edition): based on openSUSE Tumbleweed, offers transactional updates and an immutable Btrfs root file system.
  - AWS Bottlerocket
  - Talos Linux: focused on Kubernetes clusters.
- Others
  - Ubuntu Core
  - Photon OS
  - blendOS
Java Cloud Native
2021-12-03
Today, at 25 years old, Java is still the most dominant programming language: it has long held the top spot in language rankings, has an enormous community of twelve million developers, runs on forty-five billion physical devices worldwide, and powers more than twenty-five billion JVM process instances in the virtualized environments of cloud data centers (figures from an Oracle webcast).
These numbers testify to Java's enormous achievements over the past 25 years, and they are also the solid rampart behind which the Java ecosystem defends its place as the "number one" programming language. When Java competes with other languages, its confidence has never come from how advanced or pleasant its syntax and class libraries are, but from its massive user base and extremely mature software ecosystem, which cannot be shaken overnight. Yet this Java empire, which still looks unassailable today, is anything but secure; it would not be an exaggeration to say its dominance is beset by danger. A foreseeable threat, potent enough to shake its very foundations, is already brewing, and it arrives with the cloud-native era.
SSH Network Proxy Curl
2021-12-03
ssh user@host curl http://anyurl   # simplest: run curl on the remote host itself
Alternatively, set up an SSH SOCKS proxy first, then run curl through it:
ssh -D 19888 -CNfq user@host                       # -D: open a local SOCKS5 proxy on port 19888
curl -x socks5h://localhost:19888 http://anyurl    # socks5h: resolve DNS through the proxy as well
The -CNfq flags are easy to remember too (if you know, you know). For what they mean, see man ssh.
Nginx
2021-12-03
Background: the company runs two environments, each with a Web service that includes one fairly time-consuming endpoint.
Today we got a report that in environment B the endpoint sits with no response for ages and then dumps everything at once, while in environment A the output is streamed, appearing bit by bit.
The colleagues involved felt environment A was the better experience and wanted B to behave the same way.
After some checking I found that one of the two paths goes through an Nginx proxy, which immediately pointed to proxy buffering.
I asked ops to add proxy_buffering off; to the relevant server block, and that did indeed solve the problem.
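For reference, a hypothetical server block showing where the directive would go; the upstream name, port, and location are made-up placeholders, and proxy_buffering off; is the only line that actually matters here:

server {
    listen 80;
    location /slow-api/ {
        proxy_pass http://backend_app;
        # Stream the upstream response to the client instead of buffering it first.
        proxy_buffering off;
    }
}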
Developer Curl
2021-11-28
Here is a command I use often while measuring why an HTTP request is taking too long:
curl -L -w "time_namelookup: %{time_namelookup}
time_connect: %{time_connect}
time_appconnect: %{time_appconnect}
time_pretransfer: %{time_pretransfer}
time_redirect: %{time_redirect}
time_starttransfer: %{time_starttransfer}
time_total: %{time_total}
" https://example.com/
Here is the same command written as a one-liner, so that I can copy it easily from this page with a triple-click whenever I need it in future:
curl -L -w "time_namelookup: %{time_namelookup}\ntime_connect: %{time_connect}\ntime_appconnect: %{time_appconnect}\ntime_pretransfer: %{time_pretransfer}\ntime_redirect: %{time_redirect}\ntime_starttransfer: %{time_starttransfer}\ntime_total: %{time_total}\n" https://example.com/
Here is how the output of the above command typically looks:
$ curl -L -w "namelookup: %{time_namelookup}\nconnect: %{time_connect}\nappconnect: %{time_appconnect}\npretransfer: %{time_pretransfer}\nstarttransfer: %{time_starttransfer}\ntotal: %{time_total}\n" https://example.com/
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
...
</html>
time_namelookup: 0.001403
time_connect: 0.245464
time_appconnect: 0.757656
time_pretransfer: 0.757823
time_redirect: 0.000000
time_starttransfer: 0.982111
time_total: 0.982326
In the output above, I have omitted most of the HTML output and replaced the omitted part with ellipsis for the sake of brevity.
The list below provides a description of each number in the output above. This information is picked straight from the manual page of curl 7.20.0. Here are the details:
time_namelookup
: The time, in seconds, it took from the start until the name resolving was completed.
time_connect
: The time, in seconds, it took from the start until the TCP connect to the remote host (or proxy) was completed.
time_appconnect
: The time, in seconds, it took from the start until the SSL/SSH/etc connect/handshake to the remote host was completed. (Added in 7.19.0)
time_pretransfer
: The time, in seconds, it took from the start until the file transfer was just about to begin. This includes all pre-transfer commands and negotiations that are specific to the particular protocol(s) involved.
time_redirect
: The time, in seconds, it took for all redirection steps include name lookup, connect, pretransfer and transfer before the final transaction was started. time_redirect shows the complete execution time for multiple redirections. (Added in 7.12.3)
time_starttransfer
: The time, in seconds, it took from the start until the first byte was just about to be transferred. This includes time_pretransfer and also the time the server needed to calculate the result.
time_total
: The total time, in seconds, that the full operation lasted. The time will be displayed with millisecond resolution.
An important thing worth noting here is that the difference between the numbers for time_appconnect and time_connect tells us how much time is spent in the SSL/TLS handshake. For a cleartext connection without SSL/TLS, this number is reported as zero. Here is an example output that demonstrates this:
$ curl -L -w "time_namelookup: %{time_namelookup}\ntime_connect: %{time_connect}\ntime_appconnect: %{time_appconnect}\ntime_pretransfer: %{time_pretransfer}\ntime_redirect: %{time_redirect}\ntime_starttransfer: %{time_starttransfer}\ntime_total: %{time_total}\n" http://example.com/
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
...
</html>
time_namelookup: 0.001507
time_connect: 0.247032
time_appconnect: 0.000000
time_pretransfer: 0.247122
time_redirect: 0.000000
time_starttransfer: 0.512645
time_total: 0.512853
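Putting the two runs side by side: in the HTTPS output above, the TLS handshake accounts for roughly time_appconnect − time_connect = 0.757656 − 0.245464 ≈ 0.512 s, while in the cleartext run time_appconnect is simply reported as 0.000000.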
Also note that time_redirect
is zero in both outputs above. That is because no redirection occurs while visiting example.com. Here is another example that shows how the output looks when a redirection occurs:
$ curl -L -w "time_namelookup: %{time_namelookup}\ntime_connect: %{time_connect}\ntime_appconnect: %{time_appconnect}\ntime_pretransfer: %{time_pretransfer}\ntime_redirect: %{time_redirect}\ntime_starttransfer: %{time_starttransfer}\ntime_total: %{time_total}\n" https://susam.in/blog
<!DOCTYPE HTML>
<html>
...
</html>
time_namelookup: 0.001886
time_connect: 0.152445
time_appconnect: 0.465326
time_pretransfer: 0.465413
time_redirect: 0.614289
time_starttransfer: 0.763997
time_total: 0.765413
When faced with a potential latency issue in web services, this is often one of the first commands I run several times from multiple clients, because the results from this command help to get a quick sense of the layer that might be responsible for the latency issue.