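Ollama is an open-source AI platform that exposes a simple interface to large language models and helps developers and teams quickly build, integrate, and manage AI applications. Because Ollama runs the models locally, those applications work without depending on external APIs.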

1. Ollama Windows

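First, download the ollama build that matches your system from https://ollama.com/ and double-click the installer. Then open cmd and run the ollama command. If it prints its usage help with the list of available commands, the installation succeeded.

The main commands in that list work as follows: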

# 1. List all local models
ollama list
NAME                       ID              SIZE      MODIFIED
deepseek-r1:8b             28f8fd6cdc67    4.9 GB    14 hours ago
nomic-embed-text:latest    0a109f422b47    274 MB    17 hours ago
deepseek-r1:1.5b           a42b25d8c10a    1.1 GB    20 hours ago

# 2. Show detailed information about a model
ollama show deepseek-r1:1.5b
  Model
    architecture        qwen2
    parameters          1.8B
    context length      131072
    embedding length    1536
    quantization        Q4_K_M

  Parameters
    stop    "<｜begin▁of▁sentence｜>"
    stop    "<｜end▁of▁sentence｜>"
    stop    "<｜User｜>"
    stop    "<｜Assistant｜>"

  License
    MIT License
    Copyright (c) 2023 DeepSeek


# 3. Copy a model
ollama cp deepseek-r1:1.5b new-deepseek-r1:1.5b
copied 'deepseek-r1:1.5b' to 'new-deepseek-r1:1.5b'

# 4. List the running models
ollama ps
NAME                       ID              SIZE      PROCESSOR    UNTIL
nomic-embed-text:latest    0a109f422b47    849 MB    100% GPU     2 minutes from now

# 5. Remove a model
ollama rm new-deepseek-r1:1.5b
deleted 'new-deepseek-r1:1.5b'

# 6. Pull a model from the model registry
ollama pull nomic-embed-text:latest
pulling manifest
pulling 970aa74c0a90... 100% ▕████████████████████████████████████████████████████████▏ 274 MB
pulling c71d239df917... 100% ▕████████████████████████████████████████████████████████▏  11 KB
pulling ce4a164fc046... 100% ▕████████████████████████████████████████████████████████▏   17 B
pulling 31df23ea7daa... 100% ▕████████████████████████████████████████████████████████▏  420 B
verifying sha256 digest
writing manifest
success

# 7. Run a model
ollama run deepseek-r1:8b

# 8. Stop a running model
ollama stop deepseek-r1:8b

# 9. Start the Ollama API service
# To change the bind IP and port, run this first: set OLLAMA_HOST=0.0.0.0:19988
ollama serve
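Once ollama serve is running, the API can be called over plain HTTP. The following is a minimal sketch using only the Python standard library. It assumes the server listens on the custom port 19988 from the comment above and that the deepseek-r1:8b model has been pulled; the helper names build_generate_request and generate are made up for this illustration.

```python
import json
import urllib.request


def build_generate_request(base_url, model, prompt):
    """Build a POST request for Ollama's /api/generate endpoint."""
    # stream=False asks the server for one JSON object instead of a chunk stream
    payload = {'model': model, 'prompt': prompt, 'stream': False}
    return urllib.request.Request(
        url=base_url.rstrip('/') + '/api/generate',
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def generate(base_url, model, prompt, timeout=120):
    """Send the request and return the generated text (needs a running server)."""
    request = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode('utf-8'))['response']
```

With the server up, generate('http://127.0.0.1:19988', 'deepseek-r1:8b', 'Hello') should return the reply text. If OLLAMA_HOST was left unset, use the default address http://127.0.0.1:11434 instead.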

2. Open WebUI

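GitHub: https://github.com/open-webui/open-webui. The installation steps are:

First, install Python 3.11 or later.
Then, install the web interface with pip install open-webui.
Next, start the service by running open-webui serve in cmd.
Finally, open http://localhost:8080 in a browser.

Note: the first time you open the page, you need to register a user.

By default, the model API served by ollama is reachable at 127.0.0.1:11434. If you changed that address, Open WebUI has to be configured with the new one.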
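One way to hand a custom address to Open WebUI is through its environment. The sketch below rests on two assumptions that are not from the original post: that Open WebUI reads the OLLAMA_BASE_URL environment variable (the connection settings in its admin panel are the alternative), and that OLLAMA_HOST was given in host:port form. The helper names are hypothetical.

```python
import os

DEFAULT_OLLAMA_URL = 'http://127.0.0.1:11434'


def client_url_from_host(ollama_host):
    """Turn an OLLAMA_HOST value such as '0.0.0.0:19988' into a URL a client can call."""
    if not ollama_host:
        return DEFAULT_OLLAMA_URL
    # Drop surrounding whitespace and any quotes picked up from the shell
    value = ollama_host.strip().strip('"\'')
    if '://' not in value:
        value = 'http://' + value
    # 0.0.0.0 means "listen on every interface"; a client on the same machine
    # connects through the loopback address instead.
    return value.replace('//0.0.0.0', '//127.0.0.1', 1)


def webui_environment(ollama_host, base_env=None):
    """Return a copy of the environment with OLLAMA_BASE_URL set (assumed variable name)."""
    env = dict(os.environ if base_env is None else base_env)
    env['OLLAMA_BASE_URL'] = client_url_from_host(ollama_host)
    return env
```

The service could then be started with subprocess.run(['open-webui', 'serve'], env=webui_environment('0.0.0.0:19988')).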

3. Ollama Python Library

The Ollama Python Library offers a simple way to integrate Ollama into projects running Python 3.8 or later. Install it with:

pip install ollama

https://github.com/ollama/ollama-python

3.1 Model operations

import ollama

def test():
    client = ollama.Client(host='http://127.0.0.1:11434')

    # 1. List the local models
    response = client.list()
    print('Local models:', [model.model for model in response.models])

    # 2. Show the details of one model
    response = client.show('deepseek-r1:1.5b')
    print('Model info:', response)

    # 3. List the running models
    response = client.ps()
    print('Running models:', [model.model for model in response.models])

    # 4. Copy, then delete, a local model
    response = client.copy(source='deepseek-r1:1.5b', destination='new-deepseek-r1:1.5b')
    print('Copy status:', response.status)
    response = client.delete(model='new-deepseek-r1:1.5b')
    print('Delete status:', response.status)
    

if __name__ == '__main__':
    test()

3.2 Content generation

from ollama import Client

# 1. Text input
def test01():
    client = Client(host='http://127.0.0.1:11434')
    response = client.generate(model='deepseek-r1:8b', prompt='Who is Song Jiang?', stream=True)
    for chunk in response:
        text = chunk.response
        # Trim the last newline from a chunk that carries two, to avoid blank lines
        if text.count('\n') == 2:
            text = text[:-1]
        print(text, end='', flush=True)
    print()


# 2. Image input
def test02():
    client = Client(host='http://127.0.0.1:11434')
    # Read the image as raw bytes; the file is closed when the block exits
    with open('demo.png', 'rb') as f:
        image = f.read()
    response = client.generate(model='llava:latest',
                               prompt='What digit is this? Answer in English.',
                               images=[image],
                               stream=False)
    print(response)


if __name__ == '__main__':
    test01()
    test02()
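Reasoning models such as deepseek-r1 may put their chain of thought inside a <think>...</think> block in the generated text. The sketch below joins streamed chunks and separates that block from the final answer. It assumes the block arrives inline in the response text, which depends on the Ollama version and model; collect_stream and split_thinking are names invented for this example.

```python
import re


def collect_stream(chunks):
    """Join the text pieces of a streamed generate() reply into one string."""
    return ''.join(chunk.response for chunk in chunks)


def split_thinking(text):
    """Return (thinking, answer) for a reply that may contain a <think> block."""
    match = re.search(r'<think>(.*?)</think>', text, flags=re.DOTALL)
    if match is None:
        return '', text.strip()
    # Everything outside the matched block is treated as the answer
    answer = text[:match.start()] + text[match.end():]
    return match.group(1).strip(), answer.strip()
```

For example, thinking, answer = split_thinking(collect_stream(response)) works on the streamed response from test01, at the cost of waiting for the whole reply before printing anything.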

3.3 Multi-turn chat

from ollama import Client
import re


def test():
    client = Client(host='http://127.0.0.1:11434')

    # Chat history
    messages = []
    options = {'temperature': 1}

    while True:
        user_input = input('Enter your message (type exit to quit): ')
        if user_input == 'exit':
            break

        messages.append({'role': 'user', 'content': user_input})
        # Run inference on the full history
        response = client.chat(model='deepseek-r1:8b', messages=messages, stream=False, options=options)
        # Strip the <think>...</think> reasoning block and print the reply
        reply = response.message.content
        reply = re.sub(r'<think>.*?</think>', '', reply, flags=re.DOTALL).strip()
        print('Assistant: ', reply)
        print('-' * 100)

        # Record the reply in the history
        messages.append({'role': 'assistant', 'content': reply})


if __name__ == '__main__':
    test()
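The chat loop sends the entire history on every turn, so a long conversation keeps growing until it no longer fits the model's context window. One simple way to bound it is sketched here. The trim_history helper and its max_messages parameter are hypothetical, and counting messages is only a crude stand-in for counting tokens.

```python
def trim_history(messages, max_messages=20):
    """Keep system messages plus the most recent max_messages other messages."""
    system = [m for m in messages if m['role'] == 'system']
    others = [m for m in messages if m['role'] != 'system']
    recent = others[-max_messages:] if max_messages > 0 else []
    # Drop a leading assistant reply so the kept window starts with a user turn
    if recent and recent[0]['role'] == 'assistant':
        recent = recent[1:]
    return system + recent
```

In the loop, passing messages=trim_history(messages) to client.chat would keep the request size bounded while the full history stays in messages.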

3.4 Text embeddings

from ollama import Client
import numpy as np


def test():
    client = Client(host='http://127.0.0.1:11434')
    response = client.embed(model='nomic-embed-text:latest', input=['I am Chinese!', 'I am from China!'])
    print(response)
    print(np.array(response.embeddings).shape)


if __name__ == '__main__':
    test()
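Embeddings are usually compared rather than printed. As an illustration, cosine similarity between two vectors can be computed as sketched below, using only the standard library; the cosine_similarity helper is not part of the Ollama library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same length')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Similarity is undefined for a zero vector; report 0.0 by convention
        return 0.0
    return dot / (norm_a * norm_b)
```

Applied to the example above, cosine_similarity(response.embeddings[0], response.embeddings[1]) gives a score where values closer to 1 mean the two sentences are closer in meaning.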

3.5 Tool calling

from ollama import chat

# Define the tool functions
def addition(a: int, b: int) -> int:
  return a + b

def subtraction(a: int, b: int) -> int:
  return a - b


# Describe each tool so the model knows its name and parameters
subtraction_tool = {
  'type': 'function',
  'function': {
    'name': 'subtraction',
    'description': 'Subtract two numbers',
    'parameters': {
      'type': 'object',
      'required': ['a', 'b'],
      'properties': {
        'a': {'type': 'integer', 'description': 'The first number'},
        'b': {'type': 'integer', 'description': 'The second number'},
      },
    },
  },
}

addition_tool = {
  'type': 'function',
  'function': {
    'name': 'addition',
    'description': 'Add two numbers',
    'parameters': {
      'type': 'object',
      'required': ['a', 'b'],
      'properties': {
        'a': {'type': 'integer', 'description': 'The first number'},
        'b': {'type': 'integer', 'description': 'The second number'},
      },
    },
  },
}

call_functions = {'addition': addition, 'subtraction': subtraction}

if __name__ == '__main__':
  messages = [{'role': 'user', 'content': 'What is 3 - 8?'}]
  response = chat('llama3.1:latest', messages=messages,
                  tools=[addition_tool, subtraction_tool], options={'temperature': 0})
  print(response)
  # Run every tool the model asked for and print its result
  if response.message.tool_calls:
    for tool in response.message.tool_calls:
      func_name = tool.function.name
      func_args = tool.function.arguments
      print(func_name, func_args)
      if func_name in call_functions:
        ret = call_functions[func_name](**func_args)
        print('ret = ', ret)
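The example stops after computing the tool result locally. To let the model phrase a final answer, the result is normally sent back as a message with the tool role. The sketch below shows one possible shape for that step. The run_tool_calls helper is hypothetical, and it assumes each tool call exposes function.name and function.arguments as in the loop above.

```python
def run_tool_calls(tool_calls, registry):
    """Execute each requested tool and return 'tool' role messages with the results."""
    results = []
    for call in tool_calls or []:
        name = call.function.name
        func = registry.get(name)
        if func is None:
            content = f'unknown tool: {name}'
        else:
            try:
                content = str(func(**call.function.arguments))
            except (TypeError, ValueError) as error:
                # Bad or missing arguments from the model should not crash the loop
                content = f'tool error: {error}'
        results.append({'role': 'tool', 'content': content})
    return results
```

A follow-up round could then append response.message to messages, extend messages with run_tool_calls(response.message.tool_calls, call_functions), and call chat('llama3.1:latest', messages=messages) again to get the answer in natural language.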
