#9 MySQL 编码字符集

2015-05-11

各种类型的编码

mysql> SHOW VARIABLES LIKE 'character%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | gbk                        |
| character_set_connection | gbk                        |
| character_set_database   | utf8mb4                    |
| character_set_filesystem | binary                     |
| character_set_results    | gbk                        |
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8mb3                    |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

5.7 及更早版本默认字符集和 Collation 是 latin1 和 latin1_swedish_ci
8.0 之后改成 utf8mb4 和 utf8mb4_0900_ai_ci

上面就有 6 种 全局的编码:

  • client 客户端编码
  • connection 连接编码
  • database 数据库编码,创建表时的默认编码
    load data, 以及创建
  • filesystem 文件系统编码
  • results 结果编码
  • server 服务端编码,创建数据库时的默认编码
  • system 系统编码

还有两种局部的编码:表的编码和列(字段)的编码。
列主要是 char, varchar, text 类型。

字符集

# 都可以 LIKE 搜索
SHOW CHARACTER SET;
SHOW CHARSET;
SHOW CHAR SET;
Charset Description Default collation Maxlen
armscii8 ARMSCII-8 Armenian armscii8_general_ci 1
ascii US ASCII ascii_general_ci 1
big5 Big5 Traditional Chinese big5_chinese_ci 2
binary Binary pseudo charset binary 1
cp1250 Windows Central European cp1250_general_ci 1
cp1251 Windows Cyrillic cp1251_general_ci 1
cp1256 Windows Arabic cp1256_general_ci 1
cp1257 Windows Baltic cp1257_general_ci 1
cp850 DOS West European cp850_general_ci 1
cp852 DOS Central European cp852_general_ci 1
cp866 DOS Russian cp866_general_ci 1
cp932 SJIS for Windows Japanese cp932_japanese_ci 2
dec8 DEC West European dec8_swedish_ci 1
eucjpms UJIS for Windows Japanese eucjpms_japanese_ci 3
euckr EUC-KR Korean euckr_korean_ci 2
gb18030 China National Standard GB18030 gb18030_chinese_ci 4
gb2312 GB2312 Simplified Chinese gb2312_chinese_ci 2
gbk GBK Simplified Chinese gbk_chinese_ci 2
geostd8 GEOSTD8 Georgian geostd8_general_ci 1
greek ISO 8859-7 Greek greek_general_ci 1
hebrew ISO 8859-8 Hebrew hebrew_general_ci 1
hp8 HP West European hp8_english_ci 1
keybcs2 DOS Kamenicky Czech-Slovak keybcs2_general_ci 1
koi8r KOI8-R Relcom Russian koi8r_general_ci 1
koi8u KOI8-U Ukrainian koi8u_general_ci 1
latin1 cp1252 West European latin1_swedish_ci 1
latin2 ISO 8859-2 Central European latin2_general_ci 1
latin5 ISO 8859-9 Turkish latin5_turkish_ci 1
latin7 ISO 8859-13 Baltic latin7_general_ci 1
macce Mac Central European macce_general_ci 1
macroman Mac West European macroman_general_ci 1
sjis Shift-JIS Japanese sjis_japanese_ci 2
swe7 7bit Swedish swe7_swedish_ci 1
tis620 TIS620 Thai tis620_thai_ci 1
ucs2 UCS-2 Unicode ucs2_general_ci 2
ujis EUC-JP Japanese ujis_japanese_ci 3
utf16 UTF-16 Unicode utf16_general_ci 4
utf16le UTF-16LE Unicode utf16le_general_ci 4
utf32 UTF-32 Unicode utf32_general_ci 4
utf8mb3 UTF-8 Unicode utf8mb3_general_ci 3
utf8mb4 UTF-8 Unicode utf8mb4_0900_ai_ci 4
SELECT * FROM information_schema.character_sets;
SELECT * FROM information_schema.character_sets WHERE CHARACTER_SET_NAME LIKE "%utf%";
CHARACTER_SET_NAME DEFAULT_COLLATE_NAME DESCRIPTION MAXLEN
utf8mb3 utf8mb3_general_ci UTF-8 Unicode 3
utf16 utf16_general_ci UTF-16 Unicode 4
utf16le utf16le_general_ci UTF-16LE Unicode 4
utf32 utf32_general_ci UTF-32 Unicode 4
utf8mb4 utf8mb4_0900_ai_ci UTF-8 Unicode 4

排序规则

SHOW COLLATION;

这就多了,两百多。

SHOW COLLATION LIKE "%ascii%";
Collation Charset Id Default Compiled Sortlen Pad_attribute
ascii_bin ascii 65 Yes 1 PAD SPACE
ascii_general_ci ascii 11 Yes Yes 1 PAD SPACE
SELECT * FROM information_schema.collations WHERE CHARACTER_SET_NAME = "utf8mb4" AND COLLATION_NAME LIKE "%zh%";
COLLATION_NAME CHARACTER_SET_NAME ID IS_DEFAULT IS_COMPILED SORTLEN PAD_ATTRIBUTE
utf8mb4_zh_0900_as_cs utf8mb4 308 Yes 0 NO PAD

命名规则

字符集名称,语言,通用后缀

  • ai Accent-insensitive 重音不敏感
  • as Accent-sensitive 重音敏感
  • ci Case-insensitive 大小写不敏感
  • cs Case-sensitive 大小写敏感
  • ks Kana-sensitive 假名敏感(日语)
  • bin 二进制

8.0 之后,很多编码中多了 0900 字样,表示 Unicode 9.0 规范。

参考资料与拓展阅读

#7 SQLAlchemy LIKE

2014-09-20

例子:搜索用户表 user 中字段 phone 包含 520 的行。

SQL

和 pymysql 等库一样的用:

keyword = '520'
conn.execute('select * from user where phone like "%%%s%%";' % keyword)
conn.execute('select * from user where phone like "%%%s%%";', keyword)

.like 方法

q = session.query(model.User.id, model.User.phone).filter(model.User.phone.like(f'%{keyword}%'))
qs = qs.all()
# print(qs.statement)
# SELECT "user".id, "user".phone
# FROM "user"
# WHERE "user".phone LIKE :phone_1

对应的大小写不敏感方法有 ilike (lower("user".phone) LIKE lower(:phone_1))
还有:not_like, not_ilike

.contains 方法

print(session.query(model.User.id, model.User.phone).filter(model.User.phone.contains(keyword)).statement)
# SELECT "user".id, "user".phone
# FROM "user"
# WHERE ("user".phone LIKE '%' || :phone_1 || '%')

.regexp_match 方法(1.4 新增)

对应的是 MySQL 支持的 REGEXP 操作符。

print(session.query(model.User.id, model.User.phone).filter(model.User.phone.regexp_match(keyword)).statement)
session.query(model.User.id, model.User.phone).filter(model.User.phone.regexp_match(keyword)).all()
# SELECT "user".id, "user".phone
# FROM "user"
# WHERE "user".phone <regexp> :phone_1

.startswith.endswith

print(session.query(model.User.id, model.User.phone).filter(model.User.phone.startswith(keyword)).statement)
# SELECT "user".id, "user".phone
# FROM "user"
# WHERE ("user".phone LIKE :phone_1 || '%')
print(session.query(model.User.id, model.User.phone).filter(model.User.phone.endswith(keyword)).statement)
# SELECT "user".id, "user".phone
# FROM "user"
# WHERE ("user".phone LIKE '%' || :phone_1)

.match 方法

对应的是数据库的 MATCH (col1,col2,...) AGAINST (expr [search_modifier]) 全文索引方法。
对单字段同样可用,不过需要先建立 FULLTEXT 索引。

print(session.query(model.User.id, model.User.phone).filter(model.User.phone.match(keyword)).statement)
session.query(model.User.id, model.User.phone).filter(model.User.phone.match(keyword)).all()
# SELECT "user".id, "user".phone
# FROM "user"
# WHERE "user".phone MATCH :phone_1

参考资料与拓展阅读

#6 MySQL set 类型

2014-08-20

顾名思义,就是集合类型。

set('a', 'b', ...)

每个字段可以是指定选项中的若干个(包含 0 个)。

#5 MySQL 时间

2014-03-06

几种形式

  1. 字符串,比如:
  2. varchar(14),存:yyyymmddHHMMSS 格式
  3. varchar(19),存:yyyy-mm-dd HH:MM:SS 格式
  4. varchar(23),存:yyyy-mm-dd HH:MM:SS.fff 格式
  5. varchar(26),存:yyyy-mm-dd HH:MM:SS.ffffff 格式
  6. varchar(24),存:yyyy-mm-ddTHH:MM:SS+0800yyyy-mm-dd HH:MM:SS 0800 格式
  7. 整形数
  8. int 1970 - 2038
    time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime((1 << 31) - 1))
    # 2038-01-19 03:14:07
    
    - int unsigned 1970 - 2106
    time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime((1 << 32) - 1))
    # 2106-02-07 06:28:15
    
    - bigint 可以用到世界毁灭
  9. 注意:也可以任意指定一个时间为起点,比如 2020-01-01 00:00:00
  10. TIMESTAMP 类型
  11. 相当于上面的 int 类型,2038 问题
  12. DATETIME 类型

PS: DATE, TIME, YEAR 三种类型:

  • DATE
  • TIME
  • YEAR

比较

1. 存储效率

2. 时间比较

3. 是否方便索引

#4 MySQL Timestamp

2014-03-03

优点

Timestamp 是按 UTC 时间进行存储。

2038 问题

img

MySQL 的 timestamp 类型挺好的,不过有个严重的 2038 问题,我不知道到时候这个类型会如何处理。

The TIMESTAMP data type is used for values that contain both date and time parts. TIMESTAMP has a range of '1970-01-01 00:00:01' UTC to '2038-01-19 03:14:07' UTC.

(1 << 31) / (3600 * 24 * 365)

根据时间范围,这应该是用的一个有符号的 32 位整型数实现的,为什么不用无符号数呢,不是又可以往后续 68 年么?

我预测,MySQL 的某一个新版本,将会增加一个 TIMESTAMP64 类型,采用 64 位有符号整形数存时间戳。
或直接将 TIMESTAMP 拓展到 64 位,先支持配置和命令行参数开启这个特性,然后后续版本将其视作默认设置。

参考资料与拓展阅读

#1 SQLite

2013-07-12

简介

  1. 嵌入式数据库
  2. 第一版 2000 年发布
  3. 公有领域
  4. 实现了大多数 SQL-92 标准,包括事务(ACID)
  5. 使用最广泛的 DB 引擎

命令

.help
.databases
.tables
.shema <tableName> # show sql
.fullschema
.quit

.mode list      # 默认,没有头部,竖线隔开
.mode tabs      # tab 隔开
.mode column    # 可读性强些
.mode line      # 像 MySQL \G
sqlite> .show
        echo: off
         eqp: off
     explain: auto
     headers: off
        mode: list
   nullvalue: ""
      output: stdout
colseparator: "|"
rowseparator: "\n"
       stats: off
       width:
    filename: /tmp/history.db

sqlite> .dbinfo main
database page size:  4096
write format:        2
read format:         2
reserved bytes:      0
file change counter: 3110
database page count: 6669
freelist page count: 0
schema cookie:       7
schema format:       4
default cache size:  0
autovacuum top root: 0
incremental vacuum:  0
text encoding:       1 (utf8)
user version:        2
application id:      0
software version:    3033000
number of tables:    4
number of indexes:   6
number of triggers:  0
number of views:     0
schema size:         785
data version         2

常用操作

查看表结构

.schema tablename
pragma dbname.table_info(tablename)
pragma dbname.table_xinfo(tablename) # 包含虚拟表中的隐藏列
SELECT * FROM sqlite_master WHERE tbl_name = 'tablename';

工具

  1. SQLite Database Browser
  2. heidisql
  3. 浏览器拓展
  4. Navicat for SQLite

附:sqlite help

sqlite> .help
.archive ...             Manage SQL archives
.auth ON|OFF             Show authorizer callbacks
.backup ?DB? FILE        Backup DB (default "main") to FILE
.bail on|off             Stop after hitting an error.  Default OFF
.binary on|off           Turn binary output on or off.  Default OFF
.cd DIRECTORY            Change the working directory to DIRECTORY
.changes on|off          Show number of rows changed by SQL
.check GLOB              Fail if output since .testcase does not match
.clone NEWDB             Clone data into NEWDB from the existing database
.connection [close] [#]  Open or close an auxiliary database connection
.databases               List names and files of attached databases
.dbconfig ?op? ?val?     List or change sqlite3_db_config() options
.dbinfo ?DB?             Show status information about the database
.dump ?OBJECTS?          Render database content as SQL
.echo on|off             Turn command echo on or off
.eqp on|off|full|...     Enable or disable automatic EXPLAIN QUERY PLAN
.excel                   Display the output of next command in spreadsheet
.exit ?CODE?             Exit this program with return-code CODE
.expert                  EXPERIMENTAL. Suggest indexes for queries
.explain ?on|off|auto?   Change the EXPLAIN formatting mode.  Default: auto
.filectrl CMD ...        Run various sqlite3_file_control() operations
.fullschema ?--indent?   Show schema and the content of sqlite_stat tables
.headers on|off          Turn display of headers on or off
.help ?-all? ?PATTERN?   Show help text for PATTERN
.import FILE TABLE       Import data from FILE into TABLE
.imposter INDEX TABLE    Create imposter table TABLE on index INDEX
.indexes ?TABLE?         Show names of indexes
.limit ?LIMIT? ?VAL?     Display or change the value of an SQLITE_LIMIT
.lint OPTIONS            Report potential schema issues.
.load FILE ?ENTRY?       Load an extension library
.log FILE|off            Turn logging on or off.  FILE can be stderr/stdout
.mode MODE ?TABLE?       Set output mode
.nonce STRING            Disable safe mode for one command if the nonce matches
.nullvalue STRING        Use STRING in place of NULL values
.once ?OPTIONS? ?FILE?   Output for the next SQL command only to FILE
.open ?OPTIONS? ?FILE?   Close existing database and reopen FILE
.output ?FILE?           Send output to FILE or stdout if FILE is omitted
.parameter CMD ...       Manage SQL parameter bindings
.print STRING...         Print literal STRING
.progress N              Invoke progress handler after every N opcodes
.prompt MAIN CONTINUE    Replace the standard prompts
.quit                    Exit this program
.read FILE               Read input from FILE
.recover                 Recover as much data as possible from corrupt db.
.restore ?DB? FILE       Restore content of DB (default "main") from FILE
.save FILE               Write in-memory database into FILE
.scanstats on|off        Turn sqlite3_stmt_scanstatus() metrics on or off
.schema ?PATTERN?        Show the CREATE statements matching PATTERN
.selftest ?OPTIONS?      Run tests defined in the SELFTEST table
.separator COL ?ROW?     Change the column and row separators
.session ?NAME? CMD ...  Create or control sessions
.sha3sum ...             Compute a SHA3 hash of database content
.shell CMD ARGS...       Run CMD ARGS... in a system shell
.show                    Show the current values for various settings
.stats ?ARG?             Show stats or turn stats on or off
.system CMD ARGS...      Run CMD ARGS... in a system shell
.tables ?TABLE?          List names of tables matching LIKE pattern TABLE
.testcase NAME           Begin redirecting output to 'testcase-out.txt'
.testctrl CMD ...        Run various sqlite3_test_control() operations
.timeout MS              Try opening locked tables for MS milliseconds
.timer on|off            Turn SQL timer on or off
.trace ?OPTIONS?         Output each SQL statement as it is run
.vfsinfo ?AUX?           Information about the top-level VFS
.vfslist                 List all available VFSes
.vfsname ?AUX?           Print the name of the VFS stack
.width NUM1 NUM2 ...     Set minimum column widths for columnar output