
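How to Quickly Implement REST API Integration to Optimize Business Processes
curl http://localhost:11434/api/generate -d '{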
"model": "llama2",
"prompt": "What color is the sky at different times of the day? Respond using JSON",
"format": "json",
"stream": false
}'
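Advanced parameters can also be carried in the request, for example keep_alive. It defaults to 5 minutes: if the model sees no activity for 5 minutes, it is released from memory. Setting it to -1 keeps the model loaded in memory indefinitely.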
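A minimal sketch of how keep_alive travels in the request body, assuming the local server on the default port 11434. The helper name generate_payload is hypothetical. The values follow the behaviour described above (a duration string such as "5m", or -1 to stay loaded), plus 0, which the Ollama docs describe as unloading the model right away; with an empty prompt that request only unloads the model.

```python
import json

def generate_payload(model, prompt, keep_alive="5m", json_mode=False):
    """Build a non-streaming /api/generate request body (hypothetical helper)."""
    payload = {"model": model, "prompt": prompt, "stream": False, "keep_alive": keep_alive}
    if json_mode:
        # Ask the server to return the answer as valid JSON.
        payload["format"] = "json"
    return payload

# Keep the model resident indefinitely after this request.
pinned = generate_payload("llama2", "Why is the sky blue?", keep_alive=-1, json_mode=True)
# An empty prompt with keep_alive=0 asks the server to unload the model.
unload = generate_payload("llama2", "", keep_alive=0)
print(json.dumps(pinned))
```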
The response to the curl request above looks like this:
{
"model": "llama2",
"created_at": "2023-11-09T21:07:55.186497Z",
"response": "{\n\"morning\": {\n\"color\": \"blue\"\n},\n\"noon\": {\n\"color\": \"blue-gray\"\n},\n\"afternoon\": {\n\"color\": \"warm gray\"\n},\n\"evening\": {\n\"color\": \"orange\"\n}\n}\n",
"done": true,
"context": [1, 2, 3],
"total_duration": 4648158584,
"load_duration": 4071084,
"prompt_eval_count": 36,
"prompt_eval_duration": 439038000,
"eval_count": 180,
"eval_duration": 4196918000
}
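Because the request set "format": "json", the response field above is itself a JSON document serialized as a string, so it needs a second json.loads. A minimal sketch, assuming the reply has already been decoded into a dict; parse_generate_reply is a hypothetical helper name. The Ollama API docs report all durations in nanoseconds, which is why eval_duration is divided by 1e9 before computing tokens per second.

```python
import json

def parse_generate_reply(reply):
    """Decode a JSON-mode /api/generate reply and compute generation speed."""
    # With "format": "json", the model output is a JSON document stored as a string.
    payload = json.loads(reply["response"])
    # Durations are in nanoseconds, so convert eval_duration to seconds first.
    tokens_per_second = reply["eval_count"] / (reply["eval_duration"] / 1e9)
    return payload, tokens_per_second

# A trimmed copy of the sample response above.
sample = {
    "response": "{\n\"morning\": {\n\"color\": \"blue\"\n},\n\"noon\": {\n\"color\": \"blue-gray\"\n}\n}\n",
    "eval_count": 180,
    "eval_duration": 4196918000,
}
colors, tps = parse_generate_reply(sample)
print(colors["noon"]["color"], round(tps, 1))  # blue-gray 42.9
```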
Accessing the API from PowerShell uses this format:
(Invoke-WebRequest -Method POST -Body '{"model":"llama2", "prompt":"Why is the sky blue?", "stream": false}' -Uri http://localhost:11434/api/generate).Content | ConvertFrom-Json
Accessing the API from Python:
import json

import requests

url_generate = "http://localhost:11434/api/generate"

def get_response(url, data):
    # POST the request body as JSON and return only the generated text
    response = requests.post(url, json=data)
    response.raise_for_status()
    response_dict = json.loads(response.text)
    return response_dict["response"]

data = {
    "model": "gemma:7b",
    "prompt": "Why is the sky blue?",
    "stream": False
}
res = get_response(url_generate, data)
print(res)
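The Python version above calls the endpoint from ordinary program code, which makes it a good fit for batch jobs that generate results in bulk.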
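For batch jobs, a small loop over prompts is enough. A minimal sketch using only the standard library (urllib.request instead of requests), assuming the local server and the gemma:7b model from the example above; post_json and batch_generate are hypothetical names. The send parameter lets a fake transport stand in for the server, for example in tests.

```python
import json
import urllib.request

URL_GENERATE = "http://localhost:11434/api/generate"

def post_json(url, payload):
    """POST a JSON body and decode the single JSON object that comes back."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

def batch_generate(prompts, model="gemma:7b", send=post_json):
    """Run prompts one after another and pair each prompt with its answer."""
    results = []
    for prompt in prompts:
        # stream=False so each request returns exactly one JSON object
        reply = send(URL_GENERATE, {"model": model, "prompt": prompt, "stream": False})
        results.append((prompt, reply["response"]))
    return results
```

Called as batch_generate(["Why is the sky blue?", "Why is grass green?"]) without a send argument, it issues one real request per prompt, in order.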
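In an ordinary request, options is left out entirely. options can set many parameters, such as temperature, GPU usage (num_gpu, the number of layers offloaded to the GPU), and the context length (num_ctx). Here is a request that includes options: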
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt": "Why is the sky blue?",
"stream": false,
"options": {
"num_keep": 5,
"seed": 42,
"num_predict": 100,
"top_k": 20,
"top_p": 0.9,
"tfs_z": 0.5,
"typical_p": 0.7,
"repeat_last_n": 33,
"temperature": 0.8,
"repeat_penalty": 1.2,
"presence_penalty": 1.5,
"frequency_penalty": 1.0,
"mirostat": 1,
"mirostat_tau": 0.8,
"mirostat_eta": 0.6,
"penalize_newline": true,
"stop": ["\n", "user:"],
"numa": false,
"num_ctx": 1024,
"num_batch": 2,
"num_gqa": 1,
"num_gpu": 1,
"main_gpu": 0,
"low_vram": false,
"f16_kv": true,
"vocab_only": false,
"use_mmap": true,
"use_mlock": false,
"rope_frequency_base": 1.1,
"rope_frequency_scale": 0.8,
"num_thread": 8
}
}'
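In practice you rarely need the full list above: sending only the fields you want to change is enough, and unspecified options fall back to the model's defaults. A sketch with a hypothetical with_options helper and a hypothetical set of client-side defaults (CLIENT_DEFAULTS, values taken from the examples in this article). It fixes seed and sets temperature to 0, the usual way to make answers repeatable.

```python
import json

# Hypothetical client-side defaults, matching values used elsewhere in this article.
CLIENT_DEFAULTS = {"temperature": 0.8, "num_ctx": 2048}

def with_options(payload, **overrides):
    """Return a copy of a request body whose "options" merge defaults with overrides."""
    merged = {**CLIENT_DEFAULTS, **payload.get("options", {}), **overrides}
    return {**payload, "options": merged}

base = {"model": "llama2", "prompt": "Why is the sky blue?", "stream": False}
# temperature 0 plus a fixed seed asks for the same answer on every run
repeatable = with_options(base, temperature=0, seed=42)
print(json.dumps(repeatable["options"], sort_keys=True))
```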
Format:
POST /api/chat
This works much like the generate (completion) endpoint above.
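Parameters:
model: (required) the model name
messages: the chat messages; these can be used to keep chat memory
Each message object has the following fields:
role: the role of the message, either system, user, or assistant
content: the content of the message
images (optional): a list of images to include in the message (for multimodal models such as llava)
Advanced parameters (optional):
format: the format to return a response in; currently the only accepted value is json
options: additional model parameters listed in the Modelfile documentation, such as temperature
template: the prompt template to use (overrides what is defined in the Modelfile)
stream: if false, the response is returned as a single response object rather than a stream of objects
keep_alive: controls how long the model stays loaded in memory after the request (default: 5m)
Send a chat request: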
curl http://localhost:11434/api/chat -d '{
"model": "llama2",
"messages": [
{
"role": "user",
"content": "why is the sky blue?"
}
]
}'
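The difference from generate is that messages takes the place of prompt: prompt is followed directly by the text to send, whereas each entry in messages also carries a role, and a user message corresponds to the question being asked.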
Because this request leaves stream at its default, the reply is streamed; the final object looks like this:
{
"model": "llama2",
"created_at": "2023-08-04T19:22:45.499127Z",
"done": true,
"total_duration": 4883583458,
"load_duration": 1334875,
"prompt_eval_count": 26,
"prompt_eval_duration": 342546000,
"eval_count": 282,
"eval_duration": 4535599000
}
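With stream at its default of true, /api/chat returns newline-delimited JSON: each object carries a fragment of text in message.content, and the last one has done set to true plus the statistics shown above. A sketch of a hypothetical collect_stream helper that joins the fragments, assuming the lines have already been read from the HTTP response (with requests, response.iter_lines() on a request made with stream=True).

```python
import json

def collect_stream(lines):
    """Join the message fragments of a streamed /api/chat response."""
    parts = []
    final = None
    for line in lines:
        if not line.strip():
            continue  # ignore empty lines between objects
        chunk = json.loads(line)
        if chunk.get("done"):
            final = chunk  # the last object carries the timing statistics
        # each chunk's text lives in message.content (may be absent or empty)
        parts.append(chunk.get("message", {}).get("content", ""))
    return "".join(parts), final
```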
You can also send a request that includes chat history:
curl http://localhost:11434/api/chat -d '{
"model": "llama2",
"messages": [
{
"role": "user",
"content": "why is the sky blue?"
},
{
"role": "assistant",
"content": "due to rayleigh scattering."
},
{
"role": "user",
"content": "how is that different than mie scattering?"
}
]
}'
Generating a chat completion in Python:
import json

import requests

url_chat = "http://localhost:11434/api/chat"
data = {
    "model": "llama2",
    "messages": [
        {
            "role": "user",
            "content": "why is the sky blue?"
        }
    ],
    "stream": False
}
response = requests.post(url_chat, json=data)
response_dict = json.loads(response.text)
print(response_dict)
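The history example above works because the client resends the whole conversation each time; /api/chat keeps no state between requests, so the messages list is the memory. A minimal sketch of a hypothetical ChatSession class that keeps that list for you, using urllib.request in place of requests; send can be swapped for a fake in tests.

```python
import json
import urllib.request

URL_CHAT = "http://localhost:11434/api/chat"

def post_json(url, payload):
    """POST a JSON body and decode the single JSON object returned (stream must be False)."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

class ChatSession:
    """Keep the running message list so every request carries the whole conversation."""

    def __init__(self, model, system=None, send=post_json):
        self.model = model
        self.send = send
        self.messages = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def ask(self, text):
        self.messages.append({"role": "user", "content": text})
        payload = {"model": self.model, "messages": list(self.messages), "stream": False}
        reply = self.send(URL_CHAT, payload)
        # Store the assistant turn so the next question sees it as history.
        self.messages.append(reply["message"])
        return reply["message"]["content"]
```

Usage: chat = ChatSession("llama2"), then chat.ask("why is the sky blue?") followed by chat.ask("how is that different than mie scattering?") reproduces the three-message request above.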
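Format:
POST /api/create
Parameters:
name: the name of the model to create
modelfile (optional): the contents of the Modelfile
stream (optional): if false, the response is returned as a single response object rather than a stream of objects
path (optional): the path to the Modelfile
The modelfile field holds the Modelfile content itself, for example which model it is based on and what settings it carries. A request to create a model: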
curl http://localhost:11434/api/create -d '{
"name": "mario",
"modelfile": "FROM llama2\nSYSTEM You are mario from Super Mario Bros."
}'
This creates a model based on llama2 and sets its system role. The returned result is not covered in detail here.
Creating a model with Python:
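import json

import requests

url_create = "http://localhost:11434/api/create"
data = {
    "name": "mario",
    "modelfile": "FROM llama2\nSYSTEM You are mario from Super Mario Bros.",
    # return one JSON object instead of a stream of status lines, so json.loads works
    "stream": False
}
response = requests.post(url_create, json=data)
response_dict = json.loads(response.text)
print(response_dict)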
This Python code does the same thing as the curl request above.
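Writing the Modelfile as one escaped string gets awkward once it has several lines. A sketch of a hypothetical build_modelfile helper that assembles FROM, PARAMETER and SYSTEM lines (all standard Modelfile instructions) and places the result in the same request body as above, with stream set to False. The body shape follows this article; newer Ollama releases may have changed the /api/create fields, so check the API docs for the version you run.

```python
def build_modelfile(base, system=None, parameters=None):
    """Assemble Modelfile text from a base model, PARAMETER lines and a system prompt."""
    lines = [f"FROM {base}"]
    for name, value in (parameters or {}).items():
        lines.append(f"PARAMETER {name} {value}")
    if system:
        # Triple quotes let the system prompt span several lines.
        lines.append(f'SYSTEM """{system}"""')
    return "\n".join(lines)

body = {
    "name": "mario",
    "modelfile": build_modelfile("llama2", "You are mario from Super Mario Bros.", {"temperature": 1}),
    "stream": False,
}
print(body["modelfile"])
```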
Format:
GET /api/tags
Lists all local models.
Listing the models with Python:
import json

import requests

url_list = "http://localhost:11434/api/tags"

def get_list(url):
    response = requests.get(url)
    response_dict = json.loads(response.text)
    # collect the name of every local model
    names = [model["name"] for model in response_dict["models"]]
    # print all model names as a numbered list
    for idx, name in enumerate(names, start=1):
        print(f"{idx}. {name}")
    return names

get_list(url_list)
Result:
1. codellama:13b
2. codellama:7b-code
3. gemma:2b
4. gemma:7b
5. gemma_7b:latest
6. gemma_sumary:latest
7. llama2:7b
8. llama2:latest
9. llava:7b
10. llava:v1.6
11. mistral:latest
12. mistrallite:latest
13. nomic-embed-text:latest
14. qwen:1.8b
15. qwen:4b
16. qwen:7b
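A list this long is easier to scan when grouped by model family. A sketch, assuming the /api/tags reply has the shape the code above relies on (a models list whose entries have a name field); group_by_family is hypothetical, and a name without a tag is treated as latest, which is how Ollama resolves an untagged name.

```python
from collections import defaultdict

def group_by_family(tags_reply):
    """Group model names from /api/tags by the part before the colon."""
    groups = defaultdict(list)
    for model in tags_reply["models"]:
        family, _, tag = model["name"].partition(":")
        groups[family].append(tag or "latest")
    return dict(groups)

sample = {"models": [{"name": "gemma:2b"}, {"name": "gemma:7b"}, {"name": "llama2:latest"}, {"name": "qwen:1.8b"}]}
print(group_by_family(sample))  # {'gemma': ['2b', '7b'], 'llama2': ['latest'], 'qwen': ['1.8b']}
```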
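Format:
POST /api/show
Shows information about a model, including its details, Modelfile, template, parameters, license, and system prompt.
Parameters:
name: the name of the model to show
Request: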
curl http://localhost:11434/api/show -d '{
"name": "llama2"
}'
Showing model information with Python:
import json

import requests

url_show_info = "http://localhost:11434/api/show"

def show_model_info(url, model_name):
    data = {
        "name": model_name
    }
    response = requests.post(url, json=data)
    response_dict = json.loads(response.text)
    print(response_dict)

show_model_info(url_show_info, "gemma:7b")
The result:
{'license': 'Gemma Terms of Use \n\nLast modified: February 21, 2024\n\nBy using, reproducing, modifying, distributing, performing or displaying any portion or element of Gemma, Model Derivatives including via any Hosted Service, (each as defined below) (collectively, the "Gemma Services") or otherwise accepting the terms of this Agreement, you agree to be bound by this Agreement.\n\nSection 1: DEFINITIONS\n1.1 Definitions\n(a) "Agreement" or "Gemma Terms of Use" means these terms and conditions that govern the use, reproduction, Distribution or modification of the Gemma Services and any terms and conditions incorporated by reference.\n\n(b) "Distribution" or "Distribute" means any transmission, publication, or other sharing of Gemma or Model Derivatives to a third party, including by providing or making Gemma or its functionality available as a hosted service via API, web access, or any other electronic or remote means ("Hosted Service").\n\n(c) "Gemma" means the set of machine learning language models, trained model weights and parameters identified at ai.google.dev/gemma, regardless of the source that you obtained it from.\n\n(d) "Google" means Google LLC.\n\n(e) "Model Derivatives" means all (i) modifications to Gemma, (ii) works based on Gemma, or (iii) any other machine learning model which is created by transfer of patterns of the weights, parameters, operations, or Output of Gemma, to that model in order to cause that model to perform similarly to Gemma, including distillation methods that use intermediate data representations or methods based on the generation of synthetic data Outputs by Gemma for training that model. 
For clarity, Outputs are not deemed Model Derivatives.\n\n(f) "Output" means the information content output of Gemma or a Model Derivative that results from operating or otherwise using Gemma or the Model Derivative, including via a Hosted Service.\n\n1.2\nAs used in this Agreement, "including" means "including without limitation".\n\nSection 2: ELIGIBILITY AND USAGE\n2.1 Eligibility\nYou represent and warrant that you have the legal capacity to enter into this Agreement (including being of sufficient age of consent). If you are accessing or using any of the Gemma Services for or on behalf of a legal entity, (a) you are entering into this Agreement on behalf of yourself and that legal entity, (b) you represent and warrant that you have the authority to act on behalf of and bind that entity to this Agreement and (c) references to "you" or "your" in the remainder of this Agreement refers to both you (as an individual) and that entity.\n\n2.2 Use\nYou may use, reproduce, modify, Distribute, perform or display any of the Gemma Services only in accordance with the terms of this Agreement, and must not violate (or encourage or permit anyone else to violate) any term of this Agreement.\n\nSection 3: DISTRIBUTION AND RESTRICTIONS\n3.1 Distribution and Redistribution\nYou may reproduce or Distribute copies of Gemma or Model Derivatives if you meet all of the following conditions:\n\nYou must include the use restrictions referenced in Section 3.2 as an enforceable provision in any
.......
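The license text dominates this output, while the parameters field is usually the more useful part. A sketch of a hypothetical parse_parameters helper, assuming that field is a plain string with one name and value per line, separated by whitespace, with stop sequences in double quotes. Values are collected into lists because a name such as stop can appear several times.

```python
from collections import defaultdict

def parse_parameters(text):
    """Turn the 'parameters' string of an /api/show reply into {name: [values]}."""
    params = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The name comes first; everything after the first space is the value.
        name, _, value = line.partition(" ")
        value = value.strip()
        # Stop sequences are quoted, so drop one pair of surrounding double quotes.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        params[name].append(value)
    return dict(params)

# Typical use against a live reply (hypothetical):
# info = requests.post(url_show_info, json={"name": "gemma:7b"}).json()
# print(parse_parameters(info.get("parameters", "")))
```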
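Besides the above, you can also copy, delete, and pull models, and if you have an Ollama account, you can push models to the Ollama server.
Default storage location for Windows users:
C:\Users\<username>\.ollama\models
To change the default storage location, set the OLLAMA_MODELS environment variable to the desired directory.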
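A sketch of how a script could locate the model store, assuming the per-user default shown above (~/.ollama/models, i.e. C:\Users\<username>\.ollama\models on Windows); on Linux, installs that run Ollama as a system service may keep models elsewhere. ollama_models_dir is a hypothetical helper. The server reads OLLAMA_MODELS when it starts, so restart Ollama after changing the variable.

```python
import os
from pathlib import Path

def ollama_models_dir(env=None):
    """Return OLLAMA_MODELS if it is set, else the per-user default ~/.ollama/models."""
    env = os.environ if env is None else env
    override = env.get("OLLAMA_MODELS")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".ollama" / "models"

print(ollama_models_dir())
```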
You may also have GGUF models downloaded from HuggingFace; these can be imported by creating a model from a Modelfile. Create a file named Modelfile containing:
FROM ./mistral-7b-v0.1.Q4_0.gguf
Create a new model from this Modelfile:
ollama create example -f Modelfile
example is the name of the new model; to use it, just refer to that name.
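The two manual steps can be scripted. A sketch with a hypothetical import_gguf helper: it writes a one-line Modelfile, using an absolute path for the GGUF file (resolved against the current working directory) so the result does not depend on where the Modelfile is written, and only runs ollama create when run=True, which requires the ollama CLI on PATH and a running server.

```python
import subprocess
from pathlib import Path

def import_gguf(gguf_path, model_name, workdir=".", run=False):
    """Write a Modelfile for a local GGUF file and return the `ollama create` command."""
    modelfile = Path(workdir) / "Modelfile"
    # Absolute path, resolved against the current working directory.
    modelfile.write_text(f"FROM {Path(gguf_path).resolve()}\n", encoding="utf-8")
    cmd = ["ollama", "create", model_name, "-f", str(modelfile)]
    if run:
        # Requires the ollama CLI on PATH and a running server.
        subprocess.run(cmd, check=True)
    return cmd
```

Usage: import_gguf("./mistral-7b-v0.1.Q4_0.gguf", "example", run=True) does the same as the two steps above.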
When running a model normally you rarely set parameters, but each request can set them through options, for example the number of context tokens:
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt": "Why is the sky blue?",
"options": {
"num_ctx": 4096
}
}'
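The default is 2048; here it is changed to 4096. Other settings, such as whether to use the GPU, can likewise be passed in these parameters once the background service is running.
Ollama also exposes an OpenAI-compatible interface, so the openai package can call the Ollama background service directly: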
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    # required but ignored
    api_key='ollama',
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='llama2',
)
print(chat_completion)
Output:
ChatCompletion(id='chatcmpl-173', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='\nThe question " Why is the sky blue? " is a common one, and there are several reasons why the sky appears blue to our eyes. Here are some possible explanations:\n\n1. Rayleigh scattering: When sunlight enters Earth\'s atmosphere, it encounters tiny molecules of gases such as nitrogen and oxygen. These molecules scatter the light in all directions, but they scatter shorter (blue) wavelengths more than longer (red) wavelengths. This is known as Rayleigh scattering. As a result, the blue light is dispersed throughout the atmosphere, giving the sky its blue appearance.\n2. Mie scattering: In addition to Rayleigh scattering, there is also a phenomenon called Mie scattering, which occurs when light encounters much larger particles in the atmosphere, such as dust and water droplets. These particles can also scatter light, but they preferentially scatter longer (red) wavelengths, which can make the sky appear more red or orange during sunrise and sunset.\n3. Angel\'s breath: Another explanation for why the sky appears blue is due to a phenomenon called "angel\'s breath." This occurs when sunlight passes through a layer of cool air near the Earth\'s surface, causing the light to be scattered in all directions and take on a bluish hue.\n4. Optical properties of the atmosphere: The atmosphere has its own optical properties, which can affect how light is transmitted and scattered. For example, the atmosphere scatters shorter wavelengths (such as blue and violet) more than longer wavelengths (such as red and orange), which can contribute to the blue color of the sky.\n5. Perspective: The way we perceive the color of the sky can also be affected by perspective. From a distance, the sky may appear blue because our brains are wired to perceive blue as a color that is further away. 
This is known as the "Perspective Problem."\n\nIt\'s worth noting that the color of the sky can vary depending on the time of day, the amount of sunlight, and other environmental factors. For example, during sunrise and sunset, the sky may appear more red or orange due to the scattering of light by atmospheric particles.', role='assistant', function_call=None, tool_calls=None))], created=1710810193, model='llama2:7b', object='chat.completion', system_fingerprint='fp_ollama', usage=CompletionUsage(completion_tokens=498, prompt_tokens=34, total_tokens=532))
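The openai package is a convenience; the compatible endpoint is plain HTTP. A minimal standard-library sketch, assuming the /chat/completions route under the same base_url used above (the route the openai client appends for chat.completions.create); chat_completion_request and first_choice_text are hypothetical helper names. The Authorization header mirrors the placeholder api_key, which the server ignores.

```python
import json
import urllib.request

def chat_completion_request(messages, model="llama2", base_url="http://localhost:11434/v1/"):
    """Build (but do not send) a POST to the OpenAI-compatible /chat/completions route."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": "Bearer ollama"},
        method="POST",
    )

def first_choice_text(reply):
    """Pull the assistant text out of a decoded chat.completion object."""
    return reply["choices"][0]["message"]["content"]
```

To send it: open the request with urllib.request.urlopen(req), decode the body with json.load, and pass the result to first_choice_text; the text sits at choices[0].message.content, the same place as in the ChatCompletion object printed above.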
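Finally, let's build a translation assistant. These large models have been trained on plenty of Chinese and English text, so having one act as a free translator should be no problem. I like to look for English resources online, but sometimes they come without subtitles and my English isn't great; if a model can handle subtitle translation well, then learning about large models will have paid off. The Python code below calls Ollama and gives the model an identity as a translator: after that it only receives English text and outputs Chinese directly ("Translate the following into chinese and only show me the translated"). This is only a demo; subtitle extraction, reading, and translation should all be manageable. The text translated in the demo below is the introduction from the Grok web page, so we can see how well it translates.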
import requests
import json
text = """
We are releasing the base model weights and network architecture of Grok-1, our large language model. Grok-1 is a 314 billion parameter Mixture-of-Experts model trained from scratch by xAI.
This is the raw base model checkpoint from the Grok-1 pre-training phase, which concluded in October 2023. This means that the model is not fine-tuned for any specific application, such as dialogue.
We are releasing the weights and the architecture under the Apache 2.0 license.
To get started with using the model, follow the instructions at github.com/xai-org/grok.
Model Details
Base model trained on a large amount of text data, not fine-tuned for any particular task.
314B parameter Mixture-of-Experts model with 25% of the weights active on a given token.
Trained from scratch by xAI using a custom training stack on top of JAX and Rust in October 2023.
The cover image was generated using Midjourney based on the following prompt proposed by Grok: A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines.
"""
#"Describe the bug. When selecting to use a self hosted ollama instance, there is no way to do 2 things:Set the server endpoint for the ollama instance. in my case I have a desktop machine with a good GPU and run ollama there, when coding on my laptop i want to use the ollama instance on my desktop, no matter what value is set for cody.autocomplete.advanced.serverEndpoint, cody will always attempt to use http://localhost:11434, so i cannot sepcify the ip of my desktop machine hosting ollama.Use a different model on ollama - no matter what value is set for cody.autocomplete.advanced.model, for example when llama-code-13b is selected, the vscode output tab for cody always says: █ CodyCompletionProvider:initialized: unstable-ollama/codellama:7b-code "
url_generate = "http://localhost:11434/api/generate"
data = {
    "model": "mistral:latest",
    "prompt": text,
    # the system prompt casts the model as a translator
    "system": "Translate the following into chinese and only show me the translated",
    "stream": False
}

def get_response(url, data):
    response = requests.post(url, json=data)
    response_dict = json.loads(response.text)
    return response_dict["response"]

res = get_response(url_generate, data)
print(res)
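Since the goal is subtitles, here is one way to feed an .srt file through the same translator. A minimal sketch, assuming standard SRT layout (index line, timing line, text lines, blank line between cues); translate_srt is hypothetical, and translate can be any function that maps English text to Chinese, for example lambda t: get_response(url_generate, {**data, "prompt": t}) built from the code above. Multi-line cues are joined into one line before translation, a design choice that gives the model a full sentence to work with.

```python
import re

# Timing line of an SRT cue, e.g. "00:00:01,000 --> 00:00:03,000"
CUE_TIME = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")

def translate_srt(srt_text, translate):
    """Translate only the dialogue lines of an .srt file, keeping indices and timestamps."""
    out = []
    blocks = srt_text.replace("\r\n", "\n").strip().split("\n\n")
    for block in blocks:
        lines = block.splitlines()
        # A cue is: index line, timing line, then one or more text lines.
        if len(lines) >= 3 and CUE_TIME.match(lines[1]):
            text = " ".join(lines[2:])
            out.append("\n".join(lines[:2] + [translate(text)]))
        else:
            out.append(block)  # leave anything unexpected untouched
    return "\n\n".join(out) + "\n"
```

One request per cue is slow for long files; sending several cues per request would cut the number of calls, at the cost of having to split the reply back into cues.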
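That's a rough demonstration; the details can be tuned later. That's all for today.
This article is reposted from the WeChat official account @峰哥Python筆記.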