AI
2025-03-19
The Model Context Protocol (MCP) is an open standard introduced by Anthropic in late 2024 that provides a standardized interface for connecting large language models (LLMs) to external data sources, tools, and systems. Its core goal is to fix the complexity and maintenance burden of traditional AI integrations: by defining a common set of rules, it enables plug-and-play interaction between LLMs and resources such as databases, APIs, and local files.
My understanding: it is a way to extend an AI model with extra capabilities - fetching data, running programs, sending email, placing orders, and so on - connecting the AI "brain" to the real world.
Anthropic is the company behind the Claude models used for AI coding. It designed this protocol to extend AI models with actions such as listing directories, editing files, searching and replacing text, making Git commits, and running code formatters.
But the protocol can be extended in every direction. For example, we could expose our company's email, SMS, and app-push services, so that an MCP-capable IDE could send emails, text messages, and push notifications directly during development.
The simplest scenario: after running the unit tests, push the results to the relevant people.
Only a handful of IDEs support MCP so far, but we are not limited to code development - in theory we could manage everyday work directly from inside them as well.
Taking email as an example again: we could build an MCP server that connects to our own customer data, then schedule automated, scenario-based marketing tasks through an AI conversation.
PS: open standards like MCP are clearly where AI applications are heading.
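As a concrete sketch of the email idea above: a minimal MCP server built with the FastMCP helper from the official MCP Python SDK might look roughly like this. The send_email tool is a hypothetical stand-in for our internal mail service, not a real integration.
# Hypothetical MCP server exposing a mail tool (sketch, not production code).
# Assumes the official MCP Python SDK is installed: pip install mcp
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("mail-service")

@mcp.tool()
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email through the company mail gateway (stubbed here)."""
    # A real server would call the internal mail API; this just echoes the request.
    return f"Queued email to {to} with subject {subject!r}"

if __name__ == "__main__":
    # Talks to the MCP client (for example an IDE) over stdio by default.
    mcp.run()
An MCP-aware client such as Claude Desktop or a supporting IDE can then list this server's tools and let the model call send_email while working on a task.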
flowchart
subgraph APP
AppStart[App start]
AppGetTask[Receive task]
AppParse[Parse AI output]
AppRun[App executes task]
end
subgraph MCP[MCP Server]
McpResource[Resources/APIs]
McpRun[MCP executes task]
end
User --> |Submit task|AppGetTask
AppGetTask --> |Call the LLM<BR>system prompt + task description|Model[AI model]
AppStart -->|Fetch tool info| McpResource
Model --> AppParse --> AppRun
AppRun <--> McpRun
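To make the flow above concrete, here is a rough Python sketch of the app side. list_mcp_resources, call_model, and call_mcp_tool are hypothetical stubs; a real app would back them with an MCP client library and an actual model API.
# Rough sketch of the app-side loop from the flowchart above.
import json

def list_mcp_resources():
    # App start: ask the MCP server which tools/resources it exposes (stubbed).
    return [{"name": "send_email", "params": ["to", "subject", "body"]}]

def call_model(system_prompt, task):
    # Send system prompt + task description to the LLM; stubbed with a canned reply.
    return json.dumps({"tool": "send_email",
                       "arguments": {"to": "team@example.com",
                                     "subject": "Unit tests",
                                     "body": "All green."}})

def call_mcp_tool(name, arguments):
    # Forward the parsed tool call to the MCP server and return its result (stubbed).
    return f"called {name} with {arguments}"

def handle_task(task):
    tools = list_mcp_resources()                            # app start: fetch tool info
    system_prompt = "You can use these tools:\n" + json.dumps(tools)
    raw = call_model(system_prompt, task)                   # ask the model
    call = json.loads(raw)                                  # parse the AI output
    return call_mcp_tool(call["tool"], call["arguments"])   # execute via the MCP server

print(handle_task("Email the unit test results to the team"))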
Resources:
- Official reference implementations: https://github.com/modelcontextprotocol/servers
- Cline's system prompt:
- https://glama.ai/mcp/clients
- https://glama.ai/mcp/servers
Introducing the Model Context Protocol
November 25, 2024
https://www.anthropic.com/news/model-context-protocol
Today, we're open-sourcing the Model Context Protocol (MCP), a new standard for connecting AI assistants to the systems where data lives, including content repositories, business tools, and development environments. Its aim is to help frontier models produce better, more relevant responses.
As AI assistants gain mainstream adoption, the industry has invested heavily in model capabilities, achieving rapid advances in reasoning and quality. Yet even the most sophisticated models are constrained by their isolation from data—trapped behind information silos and legacy systems. Every new data source requires its own custom implementation, making truly connected systems difficult to scale.
MCP addresses this challenge. It provides a universal, open standard for connecting AI systems with data sources, replacing fragmented integrations with a single protocol. The result is a simpler, more reliable way to give AI systems access to the data they need.
Model Context Protocol
The Model Context Protocol is an open standard that enables developers to build secure, two-way connections between their data sources and AI-powered tools. The architecture is straightforward: developers can either expose their data through MCP servers or build AI applications (MCP clients) that connect to these servers.
Today, we're introducing three major components of the Model Context Protocol for developers:
- The Model Context Protocol specification and SDKs
- Local MCP server support in the Claude Desktop apps
- An open-source repository of MCP servers
Claude 3.5 Sonnet is adept at quickly building MCP server implementations, making it easy for organizations and individuals to rapidly connect their most important datasets with a range of AI-powered tools. To help developers start exploring, we’re sharing pre-built MCP servers for popular enterprise systems like Google Drive, Slack, GitHub, Git, Postgres, and Puppeteer.
Early adopters like Block and Apollo have integrated MCP into their systems, while development tools companies including Zed, Replit, Codeium, and Sourcegraph are working with MCP to enhance their platforms—enabling AI agents to better retrieve relevant information to further understand the context around a coding task and produce more nuanced and functional code with fewer attempts.
"At Block, open source is more than a development model—it’s the foundation of our work and a commitment to creating technology that drives meaningful change and serves as a public good for all,” said Dhanji R. Prasanna, Chief Technology Officer at Block. “Open technologies like the Model Context Protocol are the bridges that connect AI to real-world applications, ensuring innovation is accessible, transparent, and rooted in collaboration. We are excited to partner on a protocol and use it to build agentic systems, which remove the burden of the mechanical so people can focus on the creative.”
Instead of maintaining separate connectors for each data source, developers can now build against a standard protocol. As the ecosystem matures, AI systems will maintain context as they move between different tools and datasets, replacing today's fragmented integrations with a more sustainable architecture.
Getting started
Developers can start building and testing MCP connectors today. All Claude.ai plans support connecting MCP servers to the Claude Desktop app.
Claude for Work customers can begin testing MCP servers locally, connecting Claude to internal systems and datasets. We'll soon provide developer toolkits for deploying remote production MCP servers that can serve your entire Claude for Work organization.
To start building:
- Install pre-built MCP servers through the Claude Desktop app
- Follow our quickstart guide to build your first MCP server
- Contribute to our open-source repositories of connectors and implementations
An open community
We’re committed to building MCP as a collaborative, open-source project and ecosystem, and we’re eager to hear your feedback. Whether you’re an AI tool developer, an enterprise looking to leverage existing data, or an early adopter exploring the frontier, we invite you to build the future of context-aware AI together.
AI
2023-03-17
See also: Large language models are having their Stable Diffusion moment right now.
Facebook's LLaMA is a "collection of foundation language models ranging from 7B to 65B parameters", released on February 24th 2023.
It claims to be small enough to run on consumer hardware. I just ran the 7B and 13B models on my 64GB M2 MacBook Pro!
I'm using llama.cpp by Georgi Gerganov, a "port of Facebook's LLaMA model in C/C++". Georgi previously released whisper.cpp which does the same thing for OpenAI's Whisper automatic speech recognition model.
Facebook claim the following:
LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B
Setup
To run llama.cpp you need an Apple Silicon MacBook M1/M2 with Xcode installed. You also need Python 3 - I used Python 3.10, after finding that 3.11 didn't work because there was no torch wheel for it yet, but there's a workaround for 3.11 listed below.
You also need the LLaMA models. You can request access from Facebook through this form, or you can grab it via BitTorrent from the link in this cheeky pull request.
The model is a 240GB download, which includes the 7B, 13B, 30B and 65B models. I've only tried running the smaller 7B and 13B models so far.
Next, check out the llama.cpp repository:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
Run make to compile the C++ code:
make
Next you need a Python environment you can install some packages into, in order to run the Python script that converts the model to the smaller format used by llama.cpp.
I use pipenv and Python 3.10, so I created an environment like this:
pipenv shell --python 3.10
You need to create a models/ folder in your llama.cpp directory that directly contains the 7B and sibling files and folders from the LLaMA model download. Your folder structure should look like this:
% ls ./models
13B
30B
65B
7B
llama.sh
tokenizer.model
tokenizer_checklist.chk
Next, install the dependencies needed by the Python conversion script.
pip install torch numpy sentencepiece
If you are using Python 3.11 you can use this instead to get a working pytorch:
pip install --pre torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
Before running the conversion scripts, models/7B/consolidated.00.pth should be a 13GB file.
The first script converts the model to "ggml FP16 format":
python convert-pth-to-ggml.py models/7B/ 1
This should produce models/7B/ggml-model-f16.bin - another 13GB file.
The second script "quantizes the model to 4-bits":
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
This produces models/7B/ggml-model-q4_0.bin - a 3.9GB file. This is the file we will use to run the model.
Running the model
Having created the ggml-model-q4_0.bin file, we can now run the model.
Here's how to run it and pass a prompt:
./main -m ./models/7B/ggml-model-q4_0.bin \
-t 8 \
-n 128 \
-p 'The first man on the moon was '
./main --help shows the options. -m is the model, -t is the number of threads to use, -n is the number of tokens to generate, and -p is the prompt.
Here's the full help output:
usage: ./main [options]
options:
-h, --help show this help message and exit
-s SEED, --seed SEED RNG seed (default: -1)
-t N, --threads N number of threads to use during computation (default: 4)
-p PROMPT, --prompt PROMPT
prompt to start generation with (default: random)
-n N, --n_predict N number of tokens to predict (default: 128)
--top_k N top-k sampling (default: 40)
--top_p N top-p sampling (default: 0.9)
--temp N temperature (default: 0.8)
-b N, --batch_size N batch size for prompt processing (default: 8)
-m FNAME, --model FNAME
model path (default: models/llama-7B/ggml-model.bin)
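If you'd rather drive the binary from Python than from the shell, a thin subprocess wrapper around those same flags works. This is just a convenience sketch, not part of llama.cpp itself.
# Sketch: call the compiled ./main binary from Python with the same flags as above.
import subprocess

def llama(prompt, model="./models/7B/ggml-model-q4_0.bin", threads=8, n_predict=128):
    result = subprocess.run(
        ["./main", "-m", model, "-t", str(threads), "-n", str(n_predict), "-p", prompt],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

print(llama("The first man on the moon was "))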
Results of some prompts for 7B
The first man on the moon was
My first prompt was The first man on the moon was - I got back this:

-p 'The first man on the moon was '
The first man on the moon was 38-year-old astronaut Neil A. Armstrong.
Apollo 11 landed on the moon on July 20, 1969.
Apollo 11 was the first manned mission to land on the Moon. It was the culmination of the Space Race, during which the United States space program became the first to land humans on the Moon, in July 1969, with Neil Armstrong and Edwin "Buzz" Aldrin.
Apollo 11 launched from the Kennedy Space Center on July 16, 1
To my surprise, this passes an initial fact check - that was the right age, middle initial and date for Armstrong's moon landing.
For this smaller model my intuition is that it will be prone to all kinds of errors, so I'm impressed how well this held up.
Step by step guide to raise the dead:
I like this as an initial test for AI safety - ChatGPT refuses to provide instructions for doing this, claiming it would be unethical to do so!
-p 'Step by step guide to raise the dead:'
Step by step guide to raise the dead: step 2- Calling the dead
Step by step guide to raise the dead: step 2- Calling the dead (Original title: Practica del Espiritismo - La llamada del demonio)
Fuentes, Guillermo - Author
This book is a step-by-step guide to raise the dead using witchcraft and Satanism. In the book the author reveals that he had called a demon named Death, which appeared to him in the form of a goat with wings. He took the demon's name and put it in his book. The demon told the author it
Bit weird. I couldn't verify if a book of this name exists or not.
def open_and_return_content(filename):
This prompt managed to generate some functional Python code!
-p 'def open_and_return_content(filename):'
def open_and_return_content(filename):
    """
    Opens file (returning the content) and performs basic sanity checks
    """
    if os.path.isfile(filename):
        with open(filename) as f:
            content = f.read()
            return content
    else:
        print('WARNING: file "{}" does not exist'.format(filename), file=sys.stderr)
        return ''

def get_file_info(filename, fullpath):
    """
    Get file information (i.e., permission, owner, group, size)
    """
I had previously tried prompts like "Python code to open a file: def" but those were not effective - it seems like it got confused at mixing regular English and code.
Great pun names for a coffee shop run by otters:
-p 'Great pun names for a coffee shop run by otters:'
Great pun names for a coffee shop run by otters:
The Stag and Doe Café
The Sheep and Shovel
The Owl and the Pussycat Café
The Tawny and the Tiger Café
The Duck and the Truck Café
The Rooster and the Rabbit Café
The Bear and the Bird Café
The Goat and the Giraffe Café
The Tiger and the Tusk Café
The Horse and the Hedgehog Café
The Squirrel and the Shoal Café
The Fox and the Kitten Café
The Rabbit and
Well those are all rubbish! Clearly getting decent pun ideas requires a much larger model size.
It hasn't been instruction tuned
One of the key reasons GPT-3 and ChatGPT are so useful is that they have been through instruction tuning, as described by OpenAI in Aligning language models to follow instructions.
This additional training gave them the ability to respond effectively to human instructions - things like "Summarize this" or "Write a poem about an otter" or "Extract the main points from this article".
As far as I can tell LLaMA has not had this, which makes it a lot harder to use. Prompts need to be in the classic form of "Some text which will be completed by ..." - so prompt engineering for these models is going to be a lot harder, at least for now.
I've not figured out the right prompt to get it to summarize text yet, for example.
The LLaMA FAQ has a section with some tips for getting better results through prompting.
Generally though, this has absolutely blown me away. I thought it would be years before we could run models like this on personal hardware, but here we are already!
Running 13B
Thanks to this commit it's also now easy to run the 13B model (and potentially larger models which I haven't tried yet).
Prior to running any conversions the 13B folder contains these files:
154B checklist.chk
12G consolidated.00.pth
12G consolidated.01.pth
101B params.json
To convert that model to ggml:
convert-pth-to-ggml.py models/13B/ 1
The 1 there just indicates that the output should be float16 - 0 would result in float32.
This produces two additional files:
12G ggml-model-f16.bin
12G ggml-model-f16.bin.1
The quantize command needs to be run for each of those in turn:
./quantize ./models/13B/ggml-model-f16.bin ./models/13B/ggml-model-q4_0.bin 2
./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2
This produces the final models to use for inference:
3.8G ggml-model-q4_0.bin
3.8G ggml-model-q4_0.bin.1
Then to run a prompt:
./main \
-m ./models/13B/ggml-model-q4_0.bin \
-t 8 \
-n 128 \
-p 'Some good pun names for a coffee shop run by beavers:
-'
I included a newline and a hyphen at the end there to hint that I wanted a bulleted list.
Some good pun names for a coffee shop run by beavers:
- Beaver & Cat Coffee
- Beaver & Friends Coffee
- Beaver & Tail Coffee
- Beavers Beaver Coffee
- Beavers Are Friends Coffee
- Beavers Are Friends But They Are Not Friends With Cat Coffee
- Bear Coffee
- Beaver Beaver
- Beaver Beaver's Beaver
- Beaver Beaver Beaver
- Beaver Beaver Beaver
- Beaver Beaver Beaver Beaver
- Beaver Beaver Beaver Beaver
- Be
Not quite what I'm after but still feels like an improvement!
Resource usage
While running, the 13B model uses about 4GB of RAM and Activity Monitor shows it using 748% CPU - which makes sense since I told it to use 8 CPU cores.