#26 HTTP Status Codes

2013-10-01

Status codes

Informational responses (100–199)

Informational

  • 100 Continue
  • 101 Switching Protocols
  • 102 Processing
  • 103 Early Hints

Successful responses (200–299)

Success

  • 200 OK
  • 201 Created
  • 202 Accepted
  • 203 Non-Authoritative Information
  • 204 No Content
  • 205 Reset Content
  • 206 Partial Content
  • 207 Multi-Status
  • 208 Already Reported
  • 226 IM Used

Redirects (300–399)

Redirection

  • 300 Multiple Choices
  • 301 Moved Permanently
  • 302 Found
  • 303 See Other
  • 304 Not Modified
  • 307 Temporary Redirect
  • 308 Permanent Redirect

Client errors (400–499)

Client errors

  • 400 Bad Request
  • 401 Unauthorized
  • 402 Payment Required
  • 403 Forbidden
  • 404 Not Found
  • 405 Method Not Allowed
  • 406 Not Acceptable
  • 407 Proxy Authentication Required
  • 408 Request Timeout
  • 409 Conflict
  • 410 Gone
  • 411 Length Required
  • 412 Precondition Failed
  • 413 Payload Too Large
  • 414 URI Too Long
  • 415 Unsupported Media Type
  • 416 Range Not Satisfiable
  • 417 Expectation Failed
  • 418 I'm a teapot
  • 421 Misdirected Request
  • 422 Unprocessable Entity
  • 423 Locked
  • 424 Failed Dependency
  • 425 Too Early
  • 426 Upgrade Required
  • 428 Precondition Required
  • 429 Too Many Requests
  • 431 Request Header Fields Too Large
  • 451 Unavailable For Legal Reasons

Server errors (500–599)

Server errors

  • 500 Internal Server Error
  • 501 Not Implemented
  • 502 Bad Gateway
  • 503 Service Unavailable
  • 504 Gateway Timeout
  • 505 HTTP Version Not Supported
  • 506 Variant Also Negotiates
  • 507 Insufficient Storage
  • 508 Loop Detected
  • 510 Not Extended
  • 511 Network Authentication Required

JSON

{
    "100": "Continue",
    "101": "Switching Protocols",
    "102": "Processing",
    "103": "Early Hints",
    "200": "OK",
    "201": "Created",
    "202": "Accepted",
    "203": "Non-Authoritative Information",
    "204": "No Content",
    "205": "Reset Content",
    "206": "Partial Content",
    "207": "Multi-Status",
    "208": "Already Reported",
    "226": "IM Used",
    "300": "Multiple Choices",
    "301": "Moved Permanently",
    "302": "Found",
    "303": "See Other",
    "304": "Not Modified",
    "307": "Temporary Redirect",
    "308": "Permanent Redirect",
    "400": "Bad Request",
    "401": "Unauthorized",
    "402": "Payment Required",
    "403": "Forbidden",
    "404": "Not Found",
    "405": "Method Not Allowed",
    "406": "Not Acceptable",
    "407": "Proxy Authentication Required",
    "408": "Request Timeout",
    "409": "Conflict",
    "410": "Gone",
    "411": "Length Required",
    "412": "Precondition Failed",
    "413": "Payload Too Large",
    "414": "URI Too Long",
    "415": "Unsupported Media Type",
    "416": "Range Not Satisfiable",
    "417": "Expectation Failed",
    "418": "I'm a teapot",
    "421": "Misdirected Request",
    "422": "Unprocessable Entity",
    "423": "Locked",
    "424": "Failed Dependency",
    "425": "Too Early",
    "426": "Upgrade Required",
    "428": "Precondition Required",
    "429": "Too Many Requests",
    "431": "Request Header Fields Too Large",
    "451": "Unavailable For Legal Reasons",
    "500": "Internal Server Error",
    "501": "Not Implemented",
    "502": "Bad Gateway",
    "503": "Service Unavailable",
    "504": "Gateway Timeout",
    "505": "HTTP Version Not Supported",
    "506": "Variant Also Negotiates",
    "507": "Insufficient Storage",
    "508": "Loop Detected",
    "510": "Not Extended",
    "511": "Network Authentication Required"
}
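
For a quick cross-check of the list, Python's standard library carries the same registry as http.HTTPStatus. The following is a minimal sketch, assuming Python 3; the names CLASSES and describe are illustrative and not part of the original note.

```python
from http import HTTPStatus

# Class labels follow the range headings used in this note.
CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirect",
    4: "Client error",
    5: "Server error",
}


def describe(code: int) -> str:
    """Return 'code phrase (class)' for a status code known to http.HTTPStatus."""
    status = HTTPStatus(code)  # raises ValueError for codes it does not know
    return f"{status.value} {status.phrase} ({CLASSES[code // 100]})"


print(describe(404))
```

The first digit alone decides the class, which is why code // 100 is enough to pick the label.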


#25 SEO: sitemap and robots.txt

2013-09-30

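Added sitemap.xml and robots.txt to this site.

Two commonly used SEO tools are the sitemap and the "file that tells crawlers what they must not fetch" (robots.txt).

Forgive me for giving robots.txt such a long name.

Sitemap

  1. UTF-8 encoding
  2. A single file must not exceed 50 MB or contain more than 50,000 URLs. If you exceed the limits, create several sitemaps and tie them together with a sitemap index file.
  3. Follow the XML rules: the ampersand (&), single quote ('), double quote ("), less-than sign (<), and greater-than sign (>) must be entity-escaped. This matters!

The structure looks like this: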

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>http://www.example.com/aaa</loc>
        <lastmod>2005-01-01</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.8</priority>
    </url>
    <url>
        <loc>http://www.example.com/bbb</loc>
        <lastmod>2005-01-01</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.7</priority>
    </url>
    <url>
        <loc>http://www.example.com/ccc</loc>
        <lastmod>2005-01-01</lastmod>
        <changefreq>daily</changefreq>
        <priority>0.7</priority>
    </url>
</urlset>

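PS: XML can be styled too. This site uses a simple stylesheet; open the sitemap page to see it.
It is just a stylesheet declaration placed before urlset: <?xml-stylesheet type="text/xsl" href="sitemap.xsl"?> (stylesheet link)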
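
The escaping rule is easy to get wrong when a sitemap is assembled by string concatenation. Here is a minimal sketch that builds the same structure with xml.etree.ElementTree, which escapes &, < and > in text for you. It assumes Python 3.8 or later (for the xml_declaration argument); build_sitemap and SITEMAP_NS are illustrative names, and the URL is made up.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML (UTF-8 bytes) from (loc, lastmod, changefreq, priority) tuples."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for loc, lastmod, changefreq, priority in entries:
        url = ET.SubElement(urlset, "url")
        for tag, value in (("loc", loc), ("lastmod", lastmod),
                           ("changefreq", changefreq), ("priority", priority)):
            # Assigning .text leaves the escaping to the serializer.
            ET.SubElement(url, tag).text = value
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


xml_bytes = build_sitemap([
    ("http://www.example.com/aaa?x=1&y=2", "2005-01-01", "monthly", "0.8"),
])
print(xml_bytes.decode("utf-8"))
```

The & in the query string comes out as &amp; in the serialized file, which is what rule 3 asks for.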

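Robots exclusion protocol

It tells crawlers not to fetch the specified paths.

It is not really a protocol, though, only an industry convention, and even the big companies do not always follow it strictly (see the news item "Baidu v. 360 Robots protocol case goes to court; Baidu claims 100 million yuan").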
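
A well-behaved crawler checks the rules before fetching. The sketch below uses Python's standard urllib.robotparser to show the idea. The note does not include a robots.txt file, so the rules, paths, and crawler name here are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: every crawler is asked to stay out of /admin/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

allowed = parser.can_fetch("ExampleBot", "http://www.example.com/aaa")
blocked = parser.can_fetch("ExampleBot", "http://www.example.com/admin/login")
print(allowed, blocked)
```

Nothing enforces the answer: the crawler has to choose to ask and to obey, which is why this is a convention and not a protocol.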

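Other SEO tools and methods

Backlinks

The following is from Wikipedia - Backlink

A backlink is a link from site A to site B, made through the domain name or anchor text, which raises the linked site's weight.

Ways to gain backlinks

  • Put the site name in your forum signature, so every post carries a link to the site
  • Start a blog and include the URL in new posts
  • Submit articles to relevant portals and include a link to the site
  • Buy links; this is not very reliable and is not recommended
  • Leave comments on other people's blogs
  • Exchange links with other sites

What backlinks do

For SEO, backlinks can help a site rank well, so the quality of its backlinks directly affects the site's overall weight and traffic.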


#20 FHS (Filesystem Hierarchy Standard)

2013-09-13

Filesystem Hierarchy Standard

Version  Release date
1.0      1994-02-14
1.1      1994-10-09
1.2      1995-03-28
2.0      1997-10-26
2.1      2000-04-12
2.2      2001-05-23
2.3      2004-01-29
3.0      2015-05-18

Basic layout

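Directory     CentOS 7      Ubuntu        Meaning
/                                         Root directory
/sys/                                     sysfs virtual filesystem; kernel information
/proc/                                    proc virtual filesystem; process information
/tmp/                                     Temporary files
/home/                                    User home directories
/root/
/boot/                                    Boot files: kernel + grub
/bin/         /usr/bin/     /usr/bin/     System programs
/sbin/        /usr/sbin/    /usr/sbin/    System programs (system administration)
/lib/         /usr/lib/     /usr/lib/     Shared libraries
/lib64/       /usr/lib64/   /usr/lib64/   Shared libraries
/usr/
/etc/                                     Configuration files
/var/
/opt/
/data/
/srv/
/dev/                                     Devices
/mnt/                                     Mount point
/media/                                   Mount point
/lost+found/  -                           Orphaned files recovered by fsck (e.g. deleted while still in use)
              /run/         /run/
                            /snap/
                            /cdrom/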

/etc/

/var/

Directory     Meaning
/var/log/  
/var/cache/  
/var/mail/  
/var/run/  

PS: In newer releases, /var/run/ has become a symlink to /run/.

/usr/

UNIX Software Resource (a backronym; the name originally stood for "user")

Directory     Meaning
/usr/bin/     Applications
/usr/sbin/    Applications (system administration)
/usr/lib/  
/usr/lib64/  
/usr/local/  
/usr/include/  
/usr/share/  
/usr/src/  

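PS: The three levels /, /usr/, and /usr/local/ share a similar directory layout.
PS: In newer releases, /bin/, /sbin/, /lib/, and /lib64/ have been merged into the directories of the same name under /usr/.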
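
To see how these merges look on a particular machine, you can resolve the paths and check where they land. This is a minimal sketch, assuming a Linux system; the helper name resolve is hypothetical, and the output depends on the distribution and release.

```python
import os


def resolve(path):
    """Return (is_symlink, real_path), or None if the path does not exist."""
    if not os.path.lexists(path):
        return None
    return os.path.islink(path), os.path.realpath(path)


for path in ("/bin", "/sbin", "/lib", "/lib64", "/var/run"):
    print(path, "->", resolve(path))
```

On a system with the merged layout, the top-level entries should report as symlinks resolving under /usr/, and /var/run should resolve to /run.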

User home directories


#19 SQLite

2013-07-12

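Overview

  1. Embedded database
  2. First released in 2000
  3. Public domain
  4. Implements most of the SQL-92 standard, including transactions (ACID)
  5. The most widely deployed database engine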
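
Points 1 and 4 can be seen together from Python, whose standard sqlite3 module embeds the engine in the process. The following is a minimal sketch with an in-memory database; the table name t and the simulated failure are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # no server: the engine runs in-process
conn.execute("CREATE TABLE t (n INTEGER)")
conn.commit()

# "with conn" commits on success and rolls back if the block raises.
try:
    with conn:
        conn.execute("INSERT INTO t VALUES (1)")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
rows_after_failure = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

with conn:
    conn.execute("INSERT INTO t VALUES (2)")
rows_after_success = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

print(rows_after_failure, rows_after_success)
conn.close()
```

The failed block leaves no row behind, which is the atomicity part of ACID.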

Commands

.help
.databases
.tables
.schema <tableName> # show the CREATE statements
.fullschema
.quit

.mode list      # default; no header, values separated by pipes
.mode tabs      # values separated by tabs
.mode column    # easier to read
.mode line      # like MySQL \G
sqlite> .show
        echo: off
         eqp: off
     explain: auto
     headers: off
        mode: list
   nullvalue: ""
      output: stdout
colseparator: "|"
rowseparator: "\n"
       stats: off
       width:
    filename: /tmp/history.db

sqlite> .dbinfo main
database page size:  4096
write format:        2
read format:         2
reserved bytes:      0
file change counter: 3110
database page count: 6669
freelist page count: 0
schema cookie:       7
schema format:       4
default cache size:  0
autovacuum top root: 0
incremental vacuum:  0
text encoding:       1 (utf8)
user version:        2
application id:      0
software version:    3033000
number of tables:    4
number of indexes:   6
number of triggers:  0
number of views:     0
schema size:         785
data version         2

Common operations

Inspecting a table's structure

.schema tablename
PRAGMA dbname.table_info(tablename);
PRAGMA dbname.table_xinfo(tablename);  -- also lists hidden columns of virtual tables
SELECT * FROM sqlite_master WHERE tbl_name = 'tablename';
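
The PRAGMA and sqlite_master queries also work from application code, where the dot-commands of the shell are not available. This is a minimal sketch using Python's sqlite3 module; the history table is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (id INTEGER PRIMARY KEY, cmd TEXT NOT NULL)")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(history)").fetchall()
column_names = [row[1] for row in columns]

# sqlite_master keeps the CREATE statement for each table and index.
create_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE tbl_name = ?", ("history",)
).fetchone()[0]

print(column_names)
print(create_sql)
conn.close()
```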

Tools

  1. SQLite Database Browser
  2. HeidiSQL
  3. Browser extensions
  4. Navicat for SQLite

Appendix: sqlite help

sqlite> .help
.archive ...             Manage SQL archives
.auth ON|OFF             Show authorizer callbacks
.backup ?DB? FILE        Backup DB (default "main") to FILE
.bail on|off             Stop after hitting an error.  Default OFF
.binary on|off           Turn binary output on or off.  Default OFF
.cd DIRECTORY            Change the working directory to DIRECTORY
.changes on|off          Show number of rows changed by SQL
.check GLOB              Fail if output since .testcase does not match
.clone NEWDB             Clone data into NEWDB from the existing database
.connection [close] [#]  Open or close an auxiliary database connection
.databases               List names and files of attached databases
.dbconfig ?op? ?val?     List or change sqlite3_db_config() options
.dbinfo ?DB?             Show status information about the database
.dump ?OBJECTS?          Render database content as SQL
.echo on|off             Turn command echo on or off
.eqp on|off|full|...     Enable or disable automatic EXPLAIN QUERY PLAN
.excel                   Display the output of next command in spreadsheet
.exit ?CODE?             Exit this program with return-code CODE
.expert                  EXPERIMENTAL. Suggest indexes for queries
.explain ?on|off|auto?   Change the EXPLAIN formatting mode.  Default: auto
.filectrl CMD ...        Run various sqlite3_file_control() operations
.fullschema ?--indent?   Show schema and the content of sqlite_stat tables
.headers on|off          Turn display of headers on or off
.help ?-all? ?PATTERN?   Show help text for PATTERN
.import FILE TABLE       Import data from FILE into TABLE
.imposter INDEX TABLE    Create imposter table TABLE on index INDEX
.indexes ?TABLE?         Show names of indexes
.limit ?LIMIT? ?VAL?     Display or change the value of an SQLITE_LIMIT
.lint OPTIONS            Report potential schema issues.
.load FILE ?ENTRY?       Load an extension library
.log FILE|off            Turn logging on or off.  FILE can be stderr/stdout
.mode MODE ?TABLE?       Set output mode
.nonce STRING            Disable safe mode for one command if the nonce matches
.nullvalue STRING        Use STRING in place of NULL values
.once ?OPTIONS? ?FILE?   Output for the next SQL command only to FILE
.open ?OPTIONS? ?FILE?   Close existing database and reopen FILE
.output ?FILE?           Send output to FILE or stdout if FILE is omitted
.parameter CMD ...       Manage SQL parameter bindings
.print STRING...         Print literal STRING
.progress N              Invoke progress handler after every N opcodes
.prompt MAIN CONTINUE    Replace the standard prompts
.quit                    Exit this program
.read FILE               Read input from FILE
.recover                 Recover as much data as possible from corrupt db.
.restore ?DB? FILE       Restore content of DB (default "main") from FILE
.save FILE               Write in-memory database into FILE
.scanstats on|off        Turn sqlite3_stmt_scanstatus() metrics on or off
.schema ?PATTERN?        Show the CREATE statements matching PATTERN
.selftest ?OPTIONS?      Run tests defined in the SELFTEST table
.separator COL ?ROW?     Change the column and row separators
.session ?NAME? CMD ...  Create or control sessions
.sha3sum ...             Compute a SHA3 hash of database content
.shell CMD ARGS...       Run CMD ARGS... in a system shell
.show                    Show the current values for various settings
.stats ?ARG?             Show stats or turn stats on or off
.system CMD ARGS...      Run CMD ARGS... in a system shell
.tables ?TABLE?          List names of tables matching LIKE pattern TABLE
.testcase NAME           Begin redirecting output to 'testcase-out.txt'
.testctrl CMD ...        Run various sqlite3_test_control() operations
.timeout MS              Try opening locked tables for MS milliseconds
.timer on|off            Turn SQL timer on or off
.trace ?OPTIONS?         Output each SQL statement as it is run
.vfsinfo ?AUX?           Information about the top-level VFS
.vfslist                 List all available VFSes
.vfsname ?AUX?           Print the name of the VFS stack
.width NUM1 NUM2 ...     Set minimum column widths for columnar output

#17 Python Iterables

2013-04-30

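An iterable is a data structure that can be traversed with a for loop. That covers built-in types such as strings, tuples, lists, dicts, and sets, and you can also build your own iterables by implementing a few methods.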

https://docs.python.org/3/glossary.html#term-iterable
https://docs.python.org/3/glossary.html#term-iterator

Iteration protocols and the sequence protocol

  • Iterator Protocol: an object with a __next__ method is an iterator.
    The next function pulls items from an iterator one at a time. Internally it calls __next__ to get one item, and when the items run out, __next__ raises StopIteration.
    That is what CPython actually checks. The language specification also requires an iterator to have an __iter__ method, but CPython does not enforce this. In practice, iterators define __iter__ and simply return self.

  • Iterable Protocol: an object with an __iter__ method (which returns an iterator object) is an iterable.
    It can be used in a for loop and with zip, map, filter, and anything else that expects an iterable.

  • Sequence Protocol: an object with a __getitem__ method (taking integer indexes starting at 0) and a __len__ method
    can perform some sequence operations, such as ...(?).
    Strings, tuples, and lists are sequences. A set is unordered and a dict is indexed by key, so neither is a sequence.

  • https://docs.python.org/3/c-api/abstract.html
  • Iterator Protocol https://docs.python.org/3/c-api/iter.html
  • Sequence Protocol https://docs.python.org/3/c-api/sequence.html
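
The differences between the three protocols show up most clearly with classes that implement only one of them. The following is a minimal sketch, assuming CPython; OnlyNext and Squares are hypothetical classes written for this illustration.

```python
class OnlyNext:
    """Has __next__ but no __iter__: works with next(), not with iter() or for."""

    def __init__(self):
        self.n = 0

    def __next__(self):
        if self.n >= 3:
            raise StopIteration
        self.n += 1
        return self.n


class Squares:
    """Sequence protocol only: __len__ plus __getitem__ with 0-based indexes."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        return i * i


counter = OnlyNext()
first = next(counter)          # next() only needs __next__
try:
    iter(counter)              # no __iter__ and no __getitem__
    accepted_by_iter = True
except TypeError:
    accepted_by_iter = False

squares = list(Squares(4))              # iterated through __getitem__
backwards = list(reversed(Squares(4)))  # reversed() uses __len__ and __getitem__
print(first, accepted_by_iter, squares, backwards)
```

Squares never defines __iter__, yet list() can walk it: Python falls back to calling __getitem__ with 0, 1, 2, ... until IndexError is raised.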

iter

iter(iterable)
iter(callable, sentinel) # sentinel: any value works

iter(1)
TypeError: 'int' object is not iterable
iter(1, 1)
TypeError: iter(v, w): v must be callable

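iter creates an iterator.

  • When a for loop runs, it creates an iterator by calling iter.
  • An object from which an iterator can be created must support the iterable protocol (have an __iter__() method),
    or support the sequence protocol (have a __getitem__() method that takes integer indexes starting at 0).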
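
Both points can be sketched in a few lines: what a for loop does with iter and next, and how the two-argument form iter(callable, sentinel) behaves. The helper name manual_for and the sample text are made up for illustration.

```python
import io


def manual_for(iterable):
    """Collect items the way a for loop would: iter() once, then next() until StopIteration."""
    items = []
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break
        items.append(item)
    return items


# Two-argument form: call stream.readline() repeatedly and stop as soon as it
# returns the sentinel, here "\n" (the first blank line).
stream = io.StringIO("alpha\nbeta\n\ngamma\n")
lines = list(iter(stream.readline, "\n"))

print(manual_for("ab"), lines)
```

The sentinel value itself is never yielded, so the blank line and everything after it are left unread.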

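How it works

Iterator

An iterator is one kind of iterable. It is an object with __iter__() and __next__() methods that returns the elements of an iterable one at a time.
The built-in types str, tuple, list, dict, and set are iterables, but none of them is an iterator.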
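
The distinction can be checked with the abstract base classes in collections.abc. This is a minimal sketch; the helper name classify is hypothetical.

```python
from collections.abc import Iterable, Iterator


def classify(obj):
    """Return (is_iterable, is_iterator) for obj."""
    return isinstance(obj, Iterable), isinstance(obj, Iterator)


for obj in ("abc", (1, 2), [1, 2], {"k": 1}, {1, 2}):
    iterator = iter(obj)
    # The container is only iterable; the object returned by iter() is an
    # iterator, and calling iter() on an iterator returns the iterator itself.
    print(type(obj).__name__, classify(obj), classify(iterator), iter(iterator) is iterator)
```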

PEP 234 – Iterators
https://peps.python.org/pep-0234/

Custom iterable

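class MyIterable:
    """An iterable that also acts as its own iterator."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        # Reset the position so the object can be looped over again.
        # Note: every loop over the same object shares this one position.
        self.index = 0
        return self

    def __next__(self):
        # Return the next item, or signal the end with StopIteration.
        if self.index < len(self.data):
            result = self.data[self.index]
            self.index += 1
            return result
        else:
            raise StopIteration

a = MyIterable(['apple', 'banana', 'cherry'])
for item in a:
    print(item)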
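
Because MyIterable returns itself from __iter__, two loops over the same object share one index, so nesting them does not behave like nesting loops over a list. One common alternative is to write __iter__ as a generator, so that every call hands out a fresh iterator. A minimal sketch of that variant; the class name Basket is hypothetical.

```python
class Basket:
    """Iterable whose __iter__ is a generator: each call returns a new iterator."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        for item in self.data:
            yield item


basket = Basket(['apple', 'banana'])

# Nested loops each get their own iterator, so all combinations appear.
pairs = [(a, b) for a in basket for b in basket]
print(pairs)
```

The trade-off is that Basket is no longer an iterator itself: next(basket) fails, and you call iter(basket) first when you need to step through it manually.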