WebFrontend
2023-04-08
I'm working on a flowchart editor that runs in the browser, and I wanted a way for people to use it without having to sign in, or store any data on our server. I wanted to give them control over their data and to be able to store it locally to open and edit later. And also easily share it with other people. It's easy to do this by supporting file upload/download, but I wanted something simpler, like the ability to share by sending a url. I also didn't want to store anything on the backend (at least for the free tier).
I decided to encode the entire application state as a Base64-encoded string in the hash (the part after the #) of the url. For example, a url would look like this (note it's truncated, since they are very long):
knotend.com/g/a#N4IgzgpgTglghgGxgLwnARgiAxA9lAWxAC5QA7X...
Everything after the /g/a# is a stringified version of a json object that contains all the information about the flowchart. It gets stringified, then compressed, then Base64 encoded. I update the url on every graph edit, so copying the graph state is as simple as copying the url in your browser bar.
Here's the pseudo code for creating the url, and then later reading it:
// compress/uncompress and Base64 are placeholders for whichever library you pick
const stateString = JSON.stringify(appState); // appState is a plain, JSON-serializable object
const compressed = compress(stateString);
const encoded = Base64.encode(compressed);
// Push that `encoded` string to the url
window.location.hash = encoded;
// ... Later, on page load or on undo/redo we read the url and
// do the following
const fromUrl = window.location.hash.slice(1); // same encoded as above, minus the leading '#'
const decoded = Base64.decode(fromUrl);
const uncompressed = uncompress(decoded);
const newState = JSON.parse(uncompressed);
// Now load your application with the newState
There are several options for implementing the compress/uncompress functions, such as lz-string or pako.
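The same stringify, compress, Base64 round trip can be sketched outside the browser. This is a minimal Python version that uses the standard library's zlib and base64 modules as stand-ins for lz-string or pako; the function names and the sample state are made up for illustration and are not taken from the editor's code.

```python
import base64
import json
import zlib


def encode_state(app_state: dict) -> str:
    """Stringify, compress, then Base64-encode the state for use in a URL hash."""
    state_string = json.dumps(app_state, separators=(",", ":"))
    compressed = zlib.compress(state_string.encode("utf-8"))
    # The URL-safe alphabet avoids '+' and '/' inside the hash
    return base64.urlsafe_b64encode(compressed).decode("ascii")


def decode_state(encoded: str) -> dict:
    """Reverse the steps: Base64-decode, decompress, then parse the JSON."""
    compressed = base64.urlsafe_b64decode(encoded.encode("ascii"))
    return json.loads(zlib.decompress(compressed).decode("utf-8"))


# Hypothetical flowchart state, invented for this example
app_state = {
    "nodes": [{"id": 1, "label": "Start"}, {"id": 2, "label": "End"}],
    "edges": [[1, 2]],
}
encoded = encode_state(app_state)
url = "https://knotend.com/g/a#" + encoded

# Reading the state back from the part after the '#'
restored = decode_state(url.split("#", 1)[1])
print(restored == app_state)
```

Very small states may not shrink at all, since the compression header and Base64 both add overhead; compression pays off as the graph grows.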
Since I update it on every graph edit, I get something major for free -- undo/redo. The browser's history stack becomes my undo/redo functionality. The user can hit the browser back/forward buttons, or Command-Z and Command-Shift-Z, which I map to history pop and push. This is a major win for a free product that I wanted to ship quickly.
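To make the undo/redo behaviour concrete, here is a toy Python model of a history stack that holds encoded state strings. It is only an illustration of how back, forward and a new edit interact; the class and method names are hypothetical and this is not how the browser or the editor implements it.

```python
class HistoryStack:
    """Toy model of a browser history stack holding encoded state strings."""

    def __init__(self, initial: str) -> None:
        self._entries = [initial]
        self._index = 0

    @property
    def current(self) -> str:
        return self._entries[self._index]

    def push(self, encoded_state: str) -> None:
        # A new edit discards any "forward" entries, as a browser does on navigation
        del self._entries[self._index + 1:]
        self._entries.append(encoded_state)
        self._index += 1

    def back(self) -> str:
        # Undo: step back one entry if there is one
        if self._index > 0:
            self._index -= 1
        return self.current

    def forward(self) -> str:
        # Redo: step forward one entry if there is one
        if self._index < len(self._entries) - 1:
            self._index += 1
        return self.current


history = HistoryStack("state-0")
history.push("state-1")
history.push("state-2")
history.back()           # undo, back to state-1
history.push("state-3")  # a new edit, so state-2 can no longer be redone
print(history.current)
```

The one design point worth noticing is in push: editing after an undo throws away the redo branch, which matches what users expect from both browsers and editors.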
Another great benefit is that these urls can be embedded. That means the user can put their graph on any web page that supports embedding. I see people typically do this with wikis like Notion, which means you can share with a team without anyone needing an account on my site.
You can see how it works by checking out knotend, the keyboard-centric flowchart editor that I'm working on.
Prior work and thank yous.
I'm not the first one to take this approach. I've seen at least mermaidjs do this before, and I'm sure there are others.
Thank you to redleader55 for this comment on Hacker News, which pointed out that window.location.hash is a better place to store long urls: some browsers will truncate the url when sending it over http, but this doesn't apply to the hash, which stays client side.
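The split between what goes over http and what stays client side can be seen by parsing a url. This is a small Python sketch using urllib.parse; the url below is a made-up placeholder, not a real encoded graph.

```python
from urllib.parse import urlsplit

# Hypothetical url; the part after '#' is a placeholder, not real graph data
url = "https://knotend.com/g/a#EXAMPLE_ENCODED_STATE"
parts = urlsplit(url)

# What ends up in the http request: the path, plus the query string if there is one
request_target = parts.path + (("?" + parts.query) if parts.query else "")

print(request_target)   # the only part the server receives
print(parts.fragment)   # the hash, which the browser keeps to itself
```

Because the hash never appears in the request, its length is limited only by what the browser itself will hold in the address bar and history.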
See conversation on Hacker News https://news.ycombinator.com/item?id=34312546
AI
2023-03-17
See also: Large language models are having their Stable Diffusion moment right now.
Facebook's LLaMA is a "collection of foundation language models ranging from 7B to 65B parameters", released on February 24th 2023.
It claims to be small enough to run on consumer hardware. I just ran the 7B and 13B models on my 64GB M2 MacBook Pro!
I'm using llama.cpp by Georgi Gerganov, a "port of Facebook's LLaMA model in C/C++". Georgi previously released whisper.cpp which does the same thing for OpenAI's Whisper automatic speech recognition model.
Facebook claim the following:
LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B
Setup
To run llama.cpp you need an Apple Silicon MacBook M1/M2 with Xcode installed. You also need Python 3 - I used Python 3.10, after finding that 3.11 didn't work because there was no torch wheel for it yet, but there's a workaround for 3.11 listed below.
You also need the LLaMA models. You can request access from Facebook through this form, or you can grab it via BitTorrent from the link in this cheeky pull request.
The model is a 240GB download, which includes the 7B, 13B, 30B and 65B models. I've only tried running the smaller 7B and 13B models so far.
Next, check out the llama.cpp repository:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
Run make to compile the C++ code:
make
Next you need a Python environment you can install some packages into, in order to run the Python script that converts the model to the smaller format used by llama.cpp.
I use pipenv and Python 3.10 so I created an environment like this:
pipenv shell --python 3.10
You need to create a models/ folder in your llama.cpp directory that directly contains the 7B and sibling files and folders from the LLaMA model download. Your folder structure should look like this:
% ls ./models
13B
30B
65B
7B
llama.sh
tokenizer.model
tokenizer_checklist.chk
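Before running the conversion it can help to confirm the layout. The following is a small sketch that checks a models/ folder for entries from the listing above; the function name is hypothetical, and if you only plan to run the 7B model you could pass a shorter list.

```python
from pathlib import Path

# Entry names taken from the directory listing above
EXPECTED_ENTRIES = ["7B", "13B", "30B", "65B", "tokenizer.model"]


def missing_entries(models_dir: Path, expected=EXPECTED_ENTRIES) -> list[str]:
    """Return the expected names that are not directly inside models_dir."""
    return [name for name in expected if not (models_dir / name).exists()]


if __name__ == "__main__":
    # Run from the llama.cpp directory
    missing = missing_entries(Path("models"))
    print("missing:", missing if missing else "nothing")
```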
Next, install the dependencies needed by the Python conversion script.
pip install torch numpy sentencepiece
If you are using Python 3.11 you can use this instead to get a working PyTorch:
pip install --pre torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
Before running the conversion scripts, models/7B/consolidated.00.pth should be a 13GB file.
The first script converts the model to "ggml FP16 format":
python convert-pth-to-ggml.py models/7B/ 1
This should produce models/7B/ggml-model-f16.bin - another 13GB file.
The second script "quantizes the model to 4-bits":
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
This produces models/7B/ggml-model-q4_0.bin - a 3.9GB file. This is the file we will use to run the model.
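A back-of-the-envelope calculation shows why the file sizes drop like this. Both inputs are assumptions rather than facts from the steps above: that the 7B model has roughly 6.7 billion weights, and that the 4-bit format stores weights in blocks of 32 with one 4-byte scale factor per block.

```python
GIB = 2**30

# Assumption: the 7B model has roughly 6.7 billion weights
n_params = 6.7e9

# FP16 stores every weight in 2 bytes
fp16_gib = n_params * 2 / GIB

# Assumption: 4-bit weights are stored in blocks of 32, each block holding
# 16 bytes of packed 4-bit values plus one 4-byte scale factor
bytes_per_block = 16 + 4
q4_gib = n_params / 32 * bytes_per_block / GIB

print(f"FP16:  about {fp16_gib:.1f} GiB")
print(f"4-bit: about {q4_gib:.1f} GiB")
```

Under those assumptions the estimates land close to the 13GB and 3.9GB files reported above, which suggests the quantized file costs a little more than 4 bits per weight once the per-block scale is counted.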
Running the model
Having created the ggml-model-q4_0.bin file, we can now run the model.
Here's how to run it and pass a prompt:
./main -m ./models/7B/ggml-model-q4_0.bin \
-t 8 \
-n 128 \
-p 'The first man on the moon was '
./main --help shows the options. -m is the model. -t is the number of threads to use. -n is the number of tokens to generate. -p is the prompt.
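If you want to try many prompts, the command can be assembled from Python. This is a sketch that uses only the four flags described above; the helper name and its defaults are my own choices, and the subprocess call is left as a comment so the snippet runs without a compiled ./main.

```python
import shlex


def build_main_command(model_path: str, prompt: str,
                       threads: int = 8, n_tokens: int = 128) -> list[str]:
    """Build the argument list for ./main using the flags described above."""
    return [
        "./main",
        "-m", model_path,
        "-t", str(threads),
        "-n", str(n_tokens),
        "-p", prompt,
    ]


cmd = build_main_command("./models/7B/ggml-model-q4_0.bin",
                         "The first man on the moon was ")
print(shlex.join(cmd))  # shell-quoted form, handy for copy and paste
# To actually run it from the llama.cpp directory:
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a prompt contains quotes or newlines.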
Here's the full help output:
usage: ./main [options]

options:
  -h, --help            show this help message and exit
  -s SEED, --seed SEED  RNG seed (default: -1)
  -t N, --threads N     number of threads to use during computation (default: 4)
  -p PROMPT, --prompt PROMPT
                        prompt to start generation with (default: random)
  -n N, --n_predict N   number of tokens to predict (default: 128)
  --top_k N             top-k sampling (default: 40)
  --top_p N             top-p sampling (default: 0.9)
  --temp N              temperature (default: 0.8)
  -b N, --batch_size N  batch size for prompt processing (default: 8)
  -m FNAME, --model FNAME
                        model path (default: models/llama-7B/ggml-model.bin)
Results of some prompts for 7B
The first man on the moon was
My first prompt was The first man on the moon was - I got back this:

-p 'The first man on the moon was '
The first man on the moon was 38-year-old astronaut Neil A. Armstrong.
Apollo 11 landed on the moon on July 20, 1969.
Apollo 11 was the first manned mission to land on the Moon. It was the culmination of the Space Race, during which the United States space program became the first to land humans on the Moon, in July 1969, with Neil Armstrong and Edwin "Buzz" Aldrin.
Apollo 11 launched from the Kennedy Space Center on July 16, 1
To my surprise, this passes an initial fact check - that was the right age, middle initial and date for Armstrong's moon landing.
For this smaller model my intuition is that it will be prone to all kinds of errors, so I'm impressed how well this held up.
Step by step guide to raise the dead:
I like this as an initial test for AI safety - ChatGPT refuses to provide instructions for doing this, claiming it would be unethical to do so!
-p 'Step by step guide to raise the dead:'
Step by step guide to raise the dead: step 2- Calling the dead
Step by step guide to raise the dead: step 2- Calling the dead (Original title: Practica del Espiritismo - La llamada del demonio)
Fuentes, Guillermo - Author
This book is a step-by-step guide to raise the dead using witchcraft and Satanism. In the book the author reveals that he had called a demon named Death, which appeared to him in the form of a goat with wings. He took the demon's name and put it in his book. The demon told the author it
Bit weird. I couldn't verify if a book of this name exists or not.
def open_and_return_content(filename):
This prompt managed to generate some functional Python code!
-p 'def open_and_return_content(filename):'
def open_and_return_content(filename):
    """
    Opens file (returning the content) and performs basic sanity checks
    """
    if os.path.isfile(filename):
        with open(filename) as f:
            content = f.read()
            return content
    else:
        print('WARNING: file "{}" does not exist'.format(filename), file=sys.stderr)
        return ''

def get_file_info(filename, fullpath):
    """
    Get file information (i.e., permission, owner, group, size)
    """
I had previously tried prompts like "Python code to open a file: def" but those were not effective - it seems like it got confused at mixing regular English and code.
Great pun names for a coffee shop run by otters:
-p 'Great pun names for a coffee shop run by otters:'
Great pun names for a coffee shop run by otters:
The Stag and Doe Café
The Sheep and Shovel
The Owl and the Pussycat Café
The Tawny and the Tiger Café
The Duck and the Truck Café
The Rooster and the Rabbit Café
The Bear and the Bird Café
The Goat and the Giraffe Café
The Tiger and the Tusk Café
The Horse and the Hedgehog Café
The Squirrel and the Shoal Café
The Fox and the Kitten Café
The Rabbit and
Well those are all rubbish! Clearly getting decent pun ideas requires a much larger model size.
It hasn't been instruction tuned
One of the key reasons GPT-3 and ChatGPT are so useful is that they have been through instruction tuning, as described by OpenAI in Aligning language models to follow instructions.
This additional training gave them the ability to respond effectively to human instructions - things like "Summarize this" or "Write a poem about an otter" or "Extract the main points from this article".
As far as I can tell LLaMA has not had this, which makes it a lot harder to use. Prompts need to be in the classic form of "Some text which will be completed by ..." - so prompt engineering for these models is going to be a lot harder, at least for now.
I've not figured out the right prompt to get it to summarize text yet, for example.
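The difference between the two prompt styles can be shown with a small helper. This is only an illustration of the "text to be completed" framing; the Article/Summary layout is a guess, and whether it produces a usable summary from LLaMA is untested here.

```python
def as_completion_prompt(document: str, label: str = "Summary") -> str:
    """Frame a task as text to be continued rather than an instruction to follow."""
    # The prompt ends on an unfinished line that the model is left to continue
    return f"Article:\n{document.strip()}\n\n{label}:"


# Instruction style, which relies on instruction tuning
instruction_style = "Summarize this article: ..."

# Completion style, which only relies on the model continuing the text
completion_style = as_completion_prompt(
    "LLaMA is a collection of foundation language models."
)
print(completion_style)
```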
The LLaMA FAQ has a section with some tips for getting better results through prompting.
Generally though, this has absolutely blown me away. I thought it would be years before we could run models like this on personal hardware, but here we are already!
Running 13B
Thanks to this commit it's also now easy to run the 13B model (and potentially larger models which I haven't tried yet).
Prior to running any conversions the 13B folder contains these files:
154B checklist.chk
12G consolidated.00.pth
12G consolidated.01.pth
101B params.json
To convert that model to ggml:
python convert-pth-to-ggml.py models/13B/ 1
The 1 there just indicates that the output should be float16 - 0 would result in float32.
This produces two additional files:
12G ggml-model-f16.bin
12G ggml-model-f16.bin.1
The quantize command needs to be run for each of those in turn:
./quantize ./models/13B/ggml-model-f16.bin ./models/13B/ggml-model-q4_0.bin 2
./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2
This produces the final models to use for inference:
3.8G ggml-model-q4_0.bin
3.8G ggml-model-q4_0.bin.1
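Larger models are split into more parts, so running quantize by hand gets repetitive. The sketch below builds one quantize command per part, assuming the parts follow the ggml-model-f16.bin, ggml-model-f16.bin.1 naming seen above; the helper name is hypothetical, and the trailing 2 is copied from the commands above without interpreting it.

```python
from pathlib import Path


def quantize_commands(model_dir: Path) -> list[list[str]]:
    """Build one ./quantize command per ggml-model-f16.bin* part in model_dir."""
    if not model_dir.is_dir():
        return []
    commands = []
    for f16_part in sorted(model_dir.glob("ggml-model-f16.bin*")):
        # ggml-model-f16.bin.1 becomes ggml-model-q4_0.bin.1, and so on
        q4_part = f16_part.with_name(f16_part.name.replace("f16", "q4_0"))
        # The trailing "2" is copied from the quantize commands shown above
        commands.append(["./quantize", str(f16_part), str(q4_part), "2"])
    return commands


for command in quantize_commands(Path("models/13B")):
    print(" ".join(command))
    # To run for real: import subprocess; subprocess.run(command, check=True)
```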
Then to run a prompt:
./main \
-m ./models/13B/ggml-model-q4_0.bin \
-t 8 \
-n 128 \
-p 'Some good pun names for a coffee shop run by beavers:
-'
I included a newline and a hyphen at the end there to hint that I wanted a bulleted list.
Some good pun names for a coffee shop run by beavers:
- Beaver & Cat Coffee
- Beaver & Friends Coffee
- Beaver & Tail Coffee
- Beavers Beaver Coffee
- Beavers Are Friends Coffee
- Beavers Are Friends But They Are Not Friends With Cat Coffee
- Bear Coffee
- Beaver Beaver
- Beaver Beaver's Beaver
- Beaver Beaver Beaver
- Beaver Beaver Beaver
- Beaver Beaver Beaver Beaver
- Beaver Beaver Beaver Beaver
- Be
Not quite what I'm after but still feels like an improvement!
Resource usage
While running, the 13B model uses about 4GB of RAM and Activity Monitor shows it using 748% CPU - which makes sense since I told it to use 8 CPU cores.