TOC

正则表达式

作者： catroll
日期： 2021-04-14
标签：正则表达式

https://zh.wikipedia.org/wiki/正则表达式
2021/12/19，正则表达式历史

正则表达式, Regular Expression, 简写: regex, regexp, 正则。
一般用来以指定模式对字符串进行查找，替换，切割。

基本知识

Linux 下, grep, find 等命令支持几种风格：

find -regextype help
# find: 未知的正则表达式类型 ‘help’；合法的类型是 ‘findutils-default’, ‘ed’, ‘emacs’, ‘gnu-awk’, ‘grep’, ‘posix-awk’, ‘awk’, ‘posix-basic’, ‘posix-egrep’, ‘egrep’, ‘posix-extended’, ‘posix-minimal-basic’, ‘sed’。
man grep | grep -E '\-regexp$'
#        -E, --extended-regexp
#        -G, --basic-regexp
#        -P, --perl-regexp
#        -w, --word-regexp
#        -x, --line-regexp

因为历史原因，大体上可以分为：

基本型正则表达式（Basic Regular Expression，BRE）
扩展型正则表达式（Extended Regular Express，ERE）
PCRE（Perl兼容正则表达式，Perl Compatible Regular Expressions）

我见过的编程语言中都是采用最为强大的 PCRE 风格：

PHP 是 pgeg_xxx 系列方法。
Python 是 re 包。
Go 是 regexp 包。
C 直接使用 regex.h (libc)
C 老版本也是 regex.h，或者使用第三方库（如 boost）, C 11 之后就内置正则支持了（std::regex, 默认 ECMAScript 风格）

语法

特殊字符

| Characters / constructs | Corresponding article |
| :------------------------------------------------------------------------------------------------------------------------------------------------- | :---------------------------- | ----------------- |
| \ , . , \cX , \d , \D , \f , \n , \r , \s , \S , \t , \v ,
\w , \W , \0 , \xhh , \uhhhh , \uhhhhh , [\b] | Character classes |
| ^ , $ , x(?=y) , x(?!y) , (?<=y)x , (?<!y)x , \b , \B | Assertions |
| (x) , (?:x) , (?<Name>x) , x | y,[xyz],[^xyz],\Number | Groups and ranges |
| * , + , ? , x{n} , x{n,} , x{n,m} | Quantifiers |
| \p{UnicodeProperty} , \P{UnicodeProperty} | Unicode property escapes |

范围

.
\w , \d , \s , \W , \D , \S
[abc] [^abc] [a-z]

分组

(abc)
\1
(?:abc)
(?=abc)
(?!abc)

量词

a* , a+ , a?
a{7,9} , a{7} , a{7,}
a+? , a{2,}?

其他

^ , $
ab|cd or
\b , \B word

修饰符

JavaScript:

g global
i case insensitive
m multiline
s single line (dotall)
u unicode
y sticky

Go:

Python:

re.A, re.ASCII
re.I, re.IGNORECASE
re.M, re.MULTILINE
re.S, re.DOTALL
re.DEBUG
re.X, re.VERBOSE

参考资料与拓展阅读

发布于码厩技术博客的所有文章，除注明转载外，均为作者原创，欢迎转载，但必须注明出处。
尊重他人劳动，共创开源社区！转载请注明以下信息：
转载来源：码厩技术博客 [https://www.markjour.com]
原文标题：正则表达式
原文地址：/article/20210414-regex.html

如果你有魔法，你可以看到一个评论框～

一	二	三	四	五	六	日