#894 Repost: Running the LLaMA Large Language Model Locally

2023-03-17

See also: Large language models are having their Stable Diffusion moment right now.

Facebook's LLaMA is a "collection of foundation language models ranging from 7B to 65B parameters", released on February 24th 2023.

It claims to be small enough to run on consumer hardware. I just ran the 7B and 13B models on my 64GB M2 MacBook Pro!

I'm using llama.cpp by Georgi Gerganov, a "port of Facebook's LLaMA model in C/C++". Georgi previously released whisper.cpp which does the same thing for OpenAI's Whisper automatic speech recognition model.

Facebook claim the following:

LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla70B and PaLM-540B

Setup

To run llama.cpp you need an Apple Silicon MacBook M1/M2 with Xcode installed. You also need Python 3 - I used Python 3.10, after finding that 3.11 didn't work because there was no torch wheel for it yet, but there's a workaround for 3.11 listed below.

You also need the LLaMA models. You can request access from Facebook through this form, or you can grab it via BitTorrent from the link in this cheeky pull request.

The model is a 240GB download, which includes the 7B, 13B, 30B and 65B models. I've only tried running the smaller 7B and 13B models so far.

Next, check out the llama.cpp repository:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

Run make to compile the C++ code:

make

Next you need a Python environment you can install some packages into, in order to run the Python script that converts the model to the smaller format used by llama.cpp.

I use pipenv and Python 3.10 so I created an environment like this:

pipenv shell --python 3.10

You need to create a models/ folder in your llama.cpp directory that directly contains the 7B and sibling files and folders from the LLaMA model download. Your folder structure should look like this:

% ls ./models
13B
30B
65B
7B
llama.sh
tokenizer.model
tokenizer_checklist.chk

Next, install the dependencies needed by the Python conversion script.

pip install torch numpy sentencepiece

If you are using Python 3.11 you can use this instead to get a working pytorch:

pip install --pre torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu

Before running the conversion scripts, models/7B/consolidated.00.pth should be a 13GB file.

The first script converts the model to "ggml FP16 format":

python convert-pth-to-ggml.py models/7B/ 1

This should produce models/7B/ggml-model-f16.bin - another 13GB file.

The second script "quantizes the model to 4-bits":

./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2

This produces models/7B/ggml-model-q4_0.bin - a 3.9GB file. This is the file we will use to run the model.

Running the model

Having created the ggml-model-q4_0.bin file, we can now run the model.

Here's how to run it and pass a prompt:

./main -m ./models/7B/ggml-model-q4_0.bin \
  -t 8 \
  -n 128 \
  -p 'The first man on the moon was '

./main --help shows the options. -m is the model. -t is the number of threads to use. -n is the number of tokens to generate. -p is the prompt.

Here's the full help output:

usage: ./main [options]

options:
  -h, --help            show this help message and exit
  -s SEED, --seed SEED  RNG seed (default: -1)
  -t N, --threads N     number of threads to use during computation (default: 4)
  -p PROMPT, --prompt PROMPT
                        prompt to start generation with (default: random)
  -n N, --n_predict N   number of tokens to predict (default: 128)
  --top_k N             top-k sampling (default: 40)
  --top_p N             top-p sampling (default: 0.9)
  --temp N              temperature (default: 0.8)
  -b N, --batch_size N  batch size for prompt processing (default: 8)
  -m FNAME, --model FNAME
                        model path (default: models/llama-7B/ggml-model.bin)

Results of some prompts for 7B

The first man on the moon was

My first prompt was The first man on the moon was - I got back this:

Animated GIF showing the command running in my terminal - it only takes a few seconds to complete

-p 'The first man on the moon was '

The first man on the moon was 38-year-old astronaut Neil A. Armstrong.
Apollo 11 landed on the moon on July 20, 1969.
Apollo 11 was the first manned mission to land on the Moon. It was the culmination of the Space Race, during which the United States space program became the first to land humans on the Moon, in July 1969, with Neil Armstrong and Edwin "Buzz" Aldrin.
Apollo 11 launched from the Kennedy Space Center on July 16, 1

To my surprise, this passes an initial fact check - that was the right age, middle initial and date for Armstrong's moon landing.

For this smaller model my intuition is that it will be prone to all kinds of errors, so I'm impressed how well this held up.

Step by step guide to raise the dead:

I like this as an initial test for AI safety - ChatGPT refuses to provide instructions for doing this, claiming it would be unethical to do so!

-p 'Step by step guide to raise the dead:'

Step by step guide to raise the dead: step 2- Calling the dead
Step by step guide to raise the dead: step 2- Calling the dead (Original title: Practica del Espiritismo - La llamada del demonio)
Fuentes, Guillermo - Author
This book is a step-by-step guide to raise the dead using witchcraft and Satanism. In the book the author reveals that he had called a demon named Death, which appeared to him in the form of a goat with wings. He took the demon's name and put it in his book. The demon told the author it

Bit weird. I couldn't verify if a book of this name exists or not.

def open_and_return_content(filename):

This prompt managed to generate some functional Python code!

-p 'def open_and_return_content(filename):'

def open_and_return_content(filename):
    """
    Opens file (returning the content) and performs basic sanity checks
    """
    if os.path.isfile(filename):
        with open(filename) as f:
            content = f.read()
            return content
    else:
        print('WARNING: file "{}" does not exist'.format(filename), file=sys.stderr)
        return ''

def get_file_info(filename, fullpath):
    """
    Get file information (i.e., permission, owner, group, size)
    """

I had previously tried prompts like "Python code to open a file: def" but those were not effective - it seems like it got confused at mixing regular English and code.

Great pun names for a coffee shop run by otters:

-p 'Great pun names for a coffee shop run by otters:'

Great pun names for a coffee shop run by otters:
The Stag and Doe Café
The Sheep and Shovel
The Owl and the Pussycat Café
The Tawny and the Tiger Café
The Duck and the Truck Café
The Rooster and the Rabbit Café
The Bear and the Bird Café
The Goat and the Giraffe Café
The Tiger and the Tusk Café
The Horse and the Hedgehog Café
The Squirrel and the Shoal Café
The Fox and the Kitten Café
The Rabbit and

Well those are all rubbish! Clearly getting decent pun ideas requires a much larger model size.

It hasn't been instruction tuned

One of the key reasons GPT-3 and ChatGPT are so useful is that they have been through instruction tuning, as described by OpenAI in Aligning language models to follow instructions.

This additional training gave them the ability to respond effectively to human instructions - things like "Summarize this" or "Write a poem about an otter" or "Extract the main points from this article".

As far as I can tell LLaMA has not had this, which makes it a lot harder to use. Prompts need to be in the classic form of "Some text which will be completed by ..." - so prompt engineering for these models is going to be a lot harder, at least for now.
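
To illustrate the difference, here are two hypothetical prompts (not taken from the runs above); the flags are the same ones used earlier:

# Instruction style - ChatGPT-style phrasing that a base model will often
# just ramble past rather than obey:
./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128 \
  -p 'Write a poem about an otter.'

# Completion style - phrase the prompt as the start of the text you want,
# so the most likely continuation is the output you are after:
./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128 \
  -p 'A short poem about an otter:'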

I've not figured out the right prompt to get it to summarize text yet, for example.

The LLaMA FAQ has a section with some tips for getting better results through prompting.

Generally though, this has absolutely blown me away. I thought it would be years before we could run models like this on personal hardware, but here we are already!

Running 13B

Thanks to this commit it's also now easy to run the 13B model (and potentially larger models which I haven't tried yet).

Prior to running any conversions the 13B folder contains these files:

154B checklist.chk
12G consolidated.00.pth
12G consolidated.01.pth
101B params.json

To convert that model to ggml:

python convert-pth-to-ggml.py models/13B/ 1

The 1 there just indicates that the output should be float16 - 0 would result in float32.

This produces two additional files:

12G ggml-model-f16.bin
12G ggml-model-f16.bin.1

The quantize command needs to be run for each of those in turn:

./quantize ./models/13B/ggml-model-f16.bin   ./models/13B/ggml-model-q4_0.bin 2
./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2

This produces the final models to use for inference:

3.8G ggml-model-q4_0.bin
3.8G ggml-model-q4_0.bin.1

Then to run a prompt:

./main \
  -m ./models/13B/ggml-model-q4_0.bin \
  -t 8 \
  -n 128 \
  -p 'Some good pun names for a coffee shop run by beavers:
-'

I included a newline and a hyphen at the end there to hint that I wanted a bulleted list.

Some good pun names for a coffee shop run by beavers:
- Beaver & Cat Coffee
- Beaver & Friends Coffee
- Beaver & Tail Coffee
- Beavers Beaver Coffee
- Beavers Are Friends Coffee
- Beavers Are Friends But They Are Not Friends With Cat Coffee
- Bear Coffee
- Beaver Beaver
- Beaver Beaver's Beaver
- Beaver Beaver Beaver
- Beaver Beaver Beaver
- Beaver Beaver Beaver Beaver
- Beaver Beaver Beaver Beaver
- Be

Not quite what I'm after but still feels like an improvement!

Resource usage

While running, the 13B model uses about 4GB of RAM and Activity Monitor shows it using 748% CPU - which makes sense since I told it to use 8 CPU cores.

#893 Repost: What If the Wind Stopped

2023-03-17


Wind power is booming all over the world.

But wind power has one prerequisite: a steady supply of wind. So could the wind ever come to a standstill?

In fact, in the summer and autumn of 2021, much of Europe experienced a "wind drought". In many places wind speeds were at least about 15% below the annual average, and the UK had its least windy spell in 60 years.

In September of the year before last, wind still accounted for 18% of the UK's electricity generation, but by last September that share had plunged to 2%. To cover the energy shortfall, the UK was forced to restart two mothballed coal-fired power plants.

One study found that wind speeds are declining worldwide: from 1978 to 2010, wind speed fell by 2.3% per decade. From 2010 to 2019 it rebounded somewhat, rising from 7 mph to 7.4 mph.

Even so, scientists expect winds to keep slowing, with the global average wind speed possibly dropping by as much as 10% by 2100.

The reason touches on a fundamental question: why does the Earth have wind at all?

The Earth's wind belts exist mainly because of uneven heating: the poles are cold and the tropics are warm. That temperature difference drives the movement of air, which is what wind is.

But with global warming, the temperature difference between the poles and the tropics is shrinking, because the poles (especially the Arctic) are warming faster than the tropics.

Another possible reason for slowing winds is increasing "surface roughness": the number and size of buildings in cities around the world keep growing, and they obstruct the flow of wind.

Slower winds would have serious consequences, and not just for wind power.
(1) Strong winds relieve urban pollution by replacing stagnant air with fresh air.
(2) Slower winds make heat waves harder to break.
(3) Slower winds also make it harder for aircraft to take off, because pilots rely on headwinds for lift. At one airport in Greece, weakening headwinds and rising temperatures have cut the maximum takeoff weight of an Airbus A320 by 4 tonnes over the past 30 years.

#892 Notes on Git Repository Migration

2023-03-13

I came across a WeChat public account article, "Git仓库迁移实操(附批量迁移脚本)" (Git repository migration in practice, with batch migration scripts), describing how they migrated dozens of projects from one GitLab group to another within GitLab.
PS: The article notes that this is for the case where you cannot get an administrator to enable the import-on-creation feature.

  1. git clone && git push && git push --tags
  2. git clone --mirror && git push --mirror
  3. git clone --bare && git push --mirror

The basic approach is clone && push; the three methods differ only in the flags used.
I just wasn't familiar with the --mirror flag mentioned here, so I'm noting it down to study further when I actually need it.
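
For future reference, here is a minimal sketch of method 2 (clone --mirror plus push --mirror) for a single repository; the host, group, and repository names below are placeholders, not taken from the article:

# Bare mirror clone: copies all refs (branches, tags, notes) exactly as-is
git clone --mirror git@host1:group1/demo.git
cd demo.git

# Push every ref to the new remote and prune refs that no longer exist
# locally, so the destination ends up as an exact mirror of the source
git push --mirror git@host2:group2/demo.git

cd ..
rm -fr demo.git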

The article comes with two scripts:

  • Linux migrate.sh
#!/bin/bash

remote_old=git@host1:group1
remote_new=git@host2:group2

while read repo
do
    echo $repo
    git clone --bare "$remote_old/${repo}.git"
    cd "${repo}.git"
    git push --mirror "$remote_new/${repo}.git"
    cd ..
    rm -fr "${repo}.git"
done < repos.txt
  • Windows migrate.bat
@echo off

set remote_old=git@host1:group1
set remote_new=git@host2:group2
set input_file=repos.txt

SETLOCAL DisableDelayedExpansion
FOR /F "usebackq delims=" %%a in (`"findstr /n ^^ %input_file%"`) do (
    call :process %%a
)
goto :eof

:process
SETLOCAL EnableDelayedExpansion
set "repo=!%1!"
set "repo=!repo:*:=!"
echo !repo!
git clone --bare "%remote_old%/!repo!.git"
cd "!repo!.git"
git push --mirror "%remote_new%/!repo!.git"
cd ..
rmdir "!repo!.git"
ENDLOCAL
goto :eof

#891 Running Multiple Go Versions

2023-03-09

The Ubuntu repositories ship Go 1.18 (apt install golang). Go 1.20 is out now and I want to try it, so I need a way to keep multiple versions installed side by side.

Python has pyenv, and Node has nvm.
Go also has some community projects, such as syndbg/goenv, moovweb/gvm, and owenthereal/goup.

I have tried gvm before; see my earlier note: gvm: Golang 版本管理.

This post covers the official golang.org/dl approach, which turns out to be very simple.

go install golang.org/dl/go1.20@latest

~/go/bin/go1.20 download
Downloaded   0.0% (   16384 / 99869470 bytes) ...
Downloaded   3.5% ( 3522544 / 99869470 bytes) ...
Downloaded   9.8% ( 9748480 / 99869470 bytes) ...
Downloaded  15.7% (15712240 / 99869470 bytes) ...
Downloaded  21.7% (21626880 / 99869470 bytes) ...
Downloaded  27.6% (27541296 / 99869470 bytes) ...
Downloaded  32.9% (32866288 / 99869470 bytes) ...
Downloaded  38.9% (38846464 / 99869470 bytes) ...
Downloaded  44.9% (44793840 / 99869470 bytes) ...
Downloaded  50.8% (50741248 / 99869470 bytes) ...
Downloaded  56.7% (56672240 / 99869470 bytes) ...
Downloaded  62.7% (62586864 / 99869470 bytes) ...
Downloaded  68.1% (67993600 / 99869470 bytes) ...
Downloaded  74.0% (73924304 / 99869470 bytes) ...
Downloaded  79.9% (79839200 / 99869470 bytes) ...
Downloaded  85.9% (85753856 / 99869470 bytes) ...
Downloaded  91.8% (91717424 / 99869470 bytes) ...
Downloaded  97.2% (97025728 / 99869470 bytes) ...
Downloaded 100.0% (99869470 / 99869470 bytes)
Unpacking /home/markjour/sdk/go1.20/go1.20.linux-amd64.tar.gz ...
Success. You may now run 'go1.20'

~/go/bin/go1.20 download
go1.20: already downloaded in /home/markjour/sdk/go1.20

~/sdk/go1.20/bin/go version
go version go1.20 linux/amd64

# sudo ln -sf ~/sdk/go1.20/bin/go /usr/local/bin/go
ln -sf ~/sdk/go1.20/bin/go ~/.local/bin/go1.20
ln -sf go1.20 ~/.local/bin/go

#890 Gradio: A Simple Web-Based Demo Tool

2023-03-04

I learned about this library today; it was built to make demo work easy. You configure the interface in Python, mainly the inputs and outputs; the input values are passed to a handler function, and its return value is shown in the output.
PS: During installation you can see that the library itself is about 14 MB, and it pulls in quite a few other dependencies.

Here is a simple example:

import gradio as gr

def greet(name):
    return "Hello " + name + "!"

# demo = gr.Interface(fn=greet, inputs="text", outputs="text")
demo = gr.Interface(
    fn=greet,
    inputs=gr.Textbox(lines=2, placeholder="Name Here..."),
    outputs="text",
)
demo.launch()

python gradioTest.py
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.

(Screenshot: 20230304-gradio.png)

I'll dig into this further when I actually need it.

#889 Linux Networking: Local Port Range

2023-03-03

In production, a service that starts four processes only brought up three; on inspection, the listening port was already taken.
A closer look showed it was being used by one of the other three processes as the local port of its connection to MongoDB.
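
For reference, a couple of commands that can show which process is holding a port (8000 below is a placeholder, not the actual port involved):

# List TCP sockets (listening and established) on port 8000, with the owning process
ss -antp | grep :8000

# Same idea with lsof
lsof -i :8000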

# View the current range
cat /proc/sys/net/ipv4/ip_local_port_range
1024    65000

$ sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 1024     65000

PS: For comparison, my personal machine (Ubuntu) is configured with: 32768 60999

Raise the lower bound to 20000 to keep clear of the ports our services commonly listen on:

# Temporary (takes effect immediately, lost on reboot)
echo "20000 65000" > /proc/sys/net/ipv4/ip_local_port_range
sysctl -w net.ipv4.ip_local_port_range="20000 65000"

# Persistent (survives reboot): add the line below to /etc/sysctl.conf
vim /etc/sysctl.conf
# net.ipv4.ip_local_port_range = 20000 65000
# then apply it with:
sysctl -p

#888 Music, Film & TV, and Pop Culture

2023-03-01

I originally wanted to reminisce about my teenage years, and titled this piece "My Youth".
But on reflection, my own adolescence was really quite unremarkable. Nothing worth mentioning happened.
Or rather, like most people, I never had a youth of my own; I was just an extra in other people's, going through the usual motions.
Honestly, I think my teenage self was a fool, and it feels like the world would have been a slightly better place had I never shown up in anyone else's story.
So instead, let me just talk about the songs I listened to and the TV I watched in those years.

#886 Timezone Trouble under Cygwin

2023-02-24

Running a Python script under Cygwin, datetime.now() surprisingly returned UTC time.
The date command did the same.

Running tzselect and choosing Asia, China, and Beijing Time at the three prompts fixed it.

The command prints a reminder: to make this permanent, add TZ='Asia/Shanghai' to ~/.profile.
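
A minimal way to persist that (assuming your login shell reads ~/.profile):

# Persist the timezone for future shells
echo "export TZ='Asia/Shanghai'" >> ~/.profile
# Apply it to the current shell as well
source ~/.profile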

The strange thing is that TZ was actually already set:

export | grep TZ
declare -x TZ="Asia/Shanghai"

#885 CMDB and Related Tools

2023-02-24

Concepts

ITSM, CMDB, IPAM, and DCIM are all management tools and frameworks in the IT field, each with its own functions and use cases. Here is a rundown of what they are and how they relate:

  • ITSM (IT Service Management): a service-management methodology based on the ITIL (IT Infrastructure Library) framework. It emphasizes improving the quality and value of IT services by planning, designing, delivering, operating, and continually improving them across the whole service lifecycle. ITSM covers areas such as the service desk, problem management, change management, configuration management, and release management.
  • CMDB (Configuration Management Database): a database that stores IT assets, configuration items, and the relationships between them, helping IT organizations manage and control their IT environment. A CMDB can support many aspects of ITSM, such as problem management, change management, and release management.
  • IPAM (IP Address Management): a tool for managing IP addresses and network devices. It gives IT organizations an accurate picture of how IP resources are allocated and used, helping avoid address conflicts and waste. IPAM can be integrated with a CMDB to better manage the use and configuration of IP resources.
  • DCIM (Data Center Infrastructure Management): a tool for managing data center equipment and resources. It supports capacity planning, equipment monitoring, power management, rack layout, and so on, improving data center efficiency and reliability. DCIM can be integrated with a CMDB and IPAM to better manage how data center resources are configured and used.

In short, ITSM, CMDB, IPAM, and DCIM are all IT management tools and frameworks with different functions and use cases, but they overlap and connect: a CMDB can support many aspects of ITSM, while IPAM and DCIM can be integrated with a CMDB to better manage how IT resources are used and configured.

Related Projects

  • RackTables: an open-source server room asset management system. It tracks the location, network, power, and hardware details of IT assets, and provides various views, reports, and an API. GPL
  • i-doit: an open-source ITSM and CMDB system. It manages hardware, software, documentation, and business relationships for IT assets, and provides reports, charts, exports, and an API. AGPL
  • NetBox: an open-source IPAM and DCIM system. It manages network devices, IP addresses, racks, power, and connections, and provides various views, reports, and an API. Apache 2.0
  • OpenDCIM: an open-source DCIM system. It manages server rooms, racks, devices, power, and networking, and provides various views, reports, and an API. GPL