def recognize_image():
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What is in this image?"},
                    {
                        "type": "image_url",
                        "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                ],
            }
        ],
        max_tokens=300,
    )

    print(response.choices[0])

The returned content is as follows:

'This image shows a natural landscape: a wooden boardwalk running through a green meadow. The meadow contains different kinds of grasses and plants, with trees and a partly cloudy sky in the background. The whole scene feels calm and peaceful, likely a pleasant place for a walk. The colors are vivid, the blue of the sky contrasting sharply with the green of the grass.'
Recognizing a local image (Base64-encoded)

If you have one or more images locally, you can pass them to the model in base64-encoded form. This approach takes noticeably longer and the encoded string is very long, so passing a URL is recommended. The request still goes to the /v1/chat/completions endpoint.

import base64
import requests

client = OpenAI(api_key=api_key)

def recognize_encode_image():
    image_path = "img_2.png"
    with open(image_path, "rb") as image_file:
        base64_image = base64.b64encode(image_file.read()).decode('utf-8')
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }

    payload = {
        "model": "gpt-4-vision-preview",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What's in this image?"
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{base64_image}"
                        }
                    }
                ]
            }
        ],
        "max_tokens": 300
    }

    response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)

    print(response.json())

Input image:

Output description: The picture shows a cat and a dog very close together, looking intimate and friendly. One of the cat's front paws is stretched out, the pink pads clearly visible, and its gaze seems to convey curiosity or slight wariness. The dog's expression is more relaxed, its eyes looking straight into the camera.
The background is blurred due to the focal length, so no details can be made out. This kind of image is likely meant to show harmony between animals or to emphasize how adorable they are.

Recognizing multiple images

The chat API can accept multiple image inputs, either base64-encoded or as image URLs. The model processes each image and uses information from all of them to answer the question.

client = OpenAI(api_key=api_key)

def recognize_multiple_images():
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                    {
                        "type": "text",
                        "text": "What's in these images? Is there any difference between them?",
                    },
                    {
                        "type": "image_url",
                        "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    print(response.choices[0])

Image generation

OpenAI provides image generation from text prompts, backed by either DALL·E 3 or DALL·E 2. The results are quite good: details are faithful and image quality is high. By default, images are generated at standard quality; with DALL·E 3 you can set quality: "hd" for enhanced detail. Square, standard-quality images are generated fastest. Supported sizes are 1024×1024, 1024×1792, or 1792×1024 pixels.

client = OpenAI(api_key=api_key)

def generate_image():
    response = client.images.generate(
        model="dall-e-3",
        prompt="A Shiba Inu lying on a sofa",
        size="1024x1024",
        quality="standard",
        n=1,
    )

    image_url = response.data[0].url
    print('image_url:', image_url)

The generated image is returned as a URL similar to: https://oaidalleapiprodscus.blob.core.windows.net/private/xxx

Editing images

Besides generating images, the model can also create edited versions of an existing image by replacing certain regions according to a new text prompt; currently this is limited to the DALL·E 2 model. Image editing, also called "inpainting", lets you edit or extend an image by uploading an image and a mask that indicates which areas should be replaced. The transparent areas of the mask mark where the image should be edited, and the prompt should describe the complete new image, not just the erased region.

client = OpenAI(api_key=api_key)

def edit_image():
    response = client.images.edit(
        model="dall-e-2",  # only dall-e-2
        image=open("img.png", "rb"),
        mask=open("img_1.png", "rb"),
        prompt="A sunlit indoor lounge area with a pool containing a flamingo",
        n=1,
        size="1024x1024"
    )

    image_url = response.data[0].url
    print('image_url:', image_url)

Prompt: A sunlit indoor lounge area with a pool containing a flamingo:

The uploaded image and mask must both be square PNG files smaller than 4MB, and they must have identical dimensions. The non-transparent areas of the mask are not used when generating the output, so they do not need to match the original image as in the example above.
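Before uploading, it can save a failed round trip to check these constraints locally. A minimal sketch using only the standard library (the function name and the PNG-header parsing are illustrative, not part of the OpenAI SDK):

```python
import os
import struct

def validate_edit_input(path, max_bytes=4 * 1024 * 1024):
    """Check that a file meets the images.edit constraints:
    a square PNG smaller than 4MB. Returns (width, height)."""
    if os.path.getsize(path) >= max_bytes:
        raise ValueError("file must be smaller than 4MB")
    with open(path, "rb") as f:
        header = f.read(24)
    # PNG signature (8 bytes), then the IHDR chunk header (8 bytes),
    # then width and height as big-endian uint32
    if header[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("file is not a PNG")
    width, height = struct.unpack(">II", header[16:24])
    if width != height:
        raise ValueError(f"image must be square, got {width}x{height}")
    return width, height
```

Run the same check on both the image and the mask, and additionally verify the two return the same dimensions.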

Image variations

An image variation generates similar images that preserve the main subject of an existing image; currently only the DALL·E 2 model supports this.

client = OpenAI(api_key=api_key)

def change_image():
    response = client.images.create_variation(
        image=open("img_2.png", "rb"),
        n=2,
        size="1024x1024"
    )

    image_url = response.data[0].url
    print('image_url:', image_url)

The input and output images are shown below.

As with editing, the input image must be a square PNG smaller than 4MB.

Text

OpenAI's text generation models can understand natural language, code, and images, and produce text output in response to their input. The input to these models is also called a "prompt", and usually consists of instructions, or a few examples, showing how to complete the task successfully.

Chat

The most common use case is everyday text conversation: you send the user's request and the model returns its interpretation and answer. Currently the gpt-4 model produces the best results among the available models, but it is also more expensive.

from openai import OpenAI
client = OpenAI(api_key=api_key)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)
print('response:', response.choices[0].message.content)

The response has the following format:

{
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "The 2020 World Series was played in Texas at Globe Life Field in Arlington.",
                "role": "assistant"
            }
        }
    ],
    "created": 1677664795,
    "id": "chatcmpl-7QyqpwdfhqwajicIEznoc6Q47XAyW",
    "model": "gpt-3.5-turbo-0613",
    "object": "chat.completion",
    "usage": {
        "completion_tokens": 17,
        "prompt_tokens": 57,
        "total_tokens": 74
    }
}

Returning JSON content

Sometimes you need the model to return content in JSON format. To prevent malformed output and improve model performance, when calling gpt-4-1106-preview or gpt-3.5-turbo-1106 you can set the response_format parameter to { "type": "json_object" } to enable JSON mode. With JSON mode enabled, the model is constrained to only generate strings that parse as valid JSON.

client = OpenAI(api_key=api_key)

def return_json():
    response = client.chat.completions.create(
        model="gpt-4-1106-preview",  # or gpt-3.5-turbo-1106
        messages=[
            {"role": "system", "content": "You are a helpful assistant, return JSON format."},  # the word "JSON" is required
            {"role": "user", "content": "Who won the 2022 FIFA World Cup?"},
        ],
        response_format={"type": "json_object"}  # defaults to text
    )
    print('response:', response.choices[0].message.content)

Question: Who won the 2022 FIFA World Cup? The returned content:

{
    "winner": "Argentina",
    "event": "FIFA World Cup 2022",
    "runner_up": "France",
    "location": "Qatar",
    "date": "December 18, 2022"
}
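Even in JSON mode the reply can be cut off (for example by hitting max_tokens) and fail to parse, so it is prudent to guard the parse on the client side. A minimal, hypothetical helper:

```python
import json

def parse_json_reply(text):
    """Parse a JSON-mode reply defensively: a truncated
    completion is not valid JSON, so return None instead
    of raising on malformed input."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

Callers can then retry or fall back when the helper returns None.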

If response_format={"type": "json_object"} is enabled but no JSON-related wording appears in the messages, the API returns an error:

{'error': {'message': "'messages' must contain the word 'json' in some form, to use 'response_format' of type 'json_object'.", 'type': 'invalid_request_error', 'param': 'messages', 'code': None}}

Reproducible outputs

By default, chat completions are non-deterministic (model output can vary from request to request). OpenAI now offers some control over deterministic output via the seed request parameter and the system_fingerprint response field. Set the seed parameter to any integer of your choice and reuse the same value across requests where you want deterministic output. Make sure all other parameters (such as prompt and temperature) are exactly the same between requests.

client = OpenAI(api_key=api_key)

def reproduce_answer():
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=[
            {"role": "user", "content": "Generate a random string of length 10"},
        ],
        seed=10086
    )
    print('system_fingerprint:', response)  # has a value on gpt-3.5-turbo-1106
    print('response:', response.choices[0].message.content)

Sometimes determinism can be affected because OpenAI makes necessary changes to its model configuration. To help track these changes, the system_fingerprint field is exposed; if this value differs between requests, you may see different outputs due to changes on OpenAI's side. On the older gpt-3.5-turbo model, system_fingerprint returns None. In the example above, generating a random 10-character string twice produced identical results.
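The comparison logic can be sketched as a small helper (hypothetical, for illustration): two seeded responses should only be expected to match when their system_fingerprint values also agree.

```python
def check_determinism(fp_a, text_a, fp_b, text_b):
    """Compare two seeded responses. A mismatch in system_fingerprint
    means the backend changed, so differing text does not indicate
    a problem on the client side."""
    if fp_a != fp_b:
        return "backend changed"
    return "reproduced" if text_a == text_b else "not reproduced"
```

Pass in the system_fingerprint and message content from each of the two responses.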

Function calling

Not all model versions are trained with function-calling data; function calling, and in some cases parallel function calls, is supported by the newer GPT-3.5 Turbo and GPT-4 models (see OpenAI's model list for the current set).

In the example below, we ask about the weather in San Francisco, Tokyo, and Paris, where the weather API is a local function; the returned weather data is then passed back to the OpenAI endpoint to produce a summarized answer. This lets you extend the model's capabilities: by combining local functions or APIs with the model, you can implement more complex requirements.

from openai import OpenAI
import json

client = OpenAI(api_key=api_key)

# Example dummy function hard coded to return the same weather
# In production, this could be your backend API or an external API
def get_current_weather(location, unit="fahrenheit"):
    """Get the current weather in a given location"""
    if "tokyo" in location.lower():
        return json.dumps({"location": location, "temperature": "10", "unit": "celsius"})
    elif "san francisco" in location.lower():
        return json.dumps({"location": location, "temperature": "72", "unit": "fahrenheit"})
    else:
        return json.dumps({"location": location, "temperature": "22", "unit": "celsius"})

def run_conversation():
    # Step 1: send the conversation and available functions to the model
    messages = [{"role": "user", "content": "What's the weather like in San Francisco, Tokyo, and Paris?"}]
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }
    ]
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=messages,
        tools=tools,
        tool_choice="auto",  # auto is default, but we'll be explicit
    )
    response_message = response.choices[0].message
    tool_calls = response_message.tool_calls
    # Step 2: check if the model wanted to call a function
    if tool_calls:
        # Step 3: call the function
        # Note: the JSON response may not always be valid; be sure to handle errors
        available_functions = {
            "get_current_weather": get_current_weather,
        }  # only one function in this example, but you can have multiple
        messages.append(response_message)  # extend conversation with assistant's reply
        # Step 4: send the info for each function call and function response to the model
        for tool_call in tool_calls:
            function_name = tool_call.function.name
            function_to_call = available_functions[function_name]
            function_args = json.loads(tool_call.function.arguments)
            function_response = function_to_call(
                location=function_args.get("location"),
                unit=function_args.get("unit"),
            )
            messages.append(
                {
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "name": function_name,
                    "content": function_response,
                }
            )  # extend conversation with function response
        second_response = client.chat.completions.create(
            model="gpt-3.5-turbo-1106",
            messages=messages,
        )  # get a new response from the model where it can see the function response
        return second_response
print(run_conversation())

The returned content shows that the local weather function was called successfully and the weather for these locations was summarized:

Currently, the weather in San Francisco is 72°F and partly cloudy. In Tokyo, the weather is 10°C and cloudy. In Paris, the weather is 22°C and partly cloudy.

Generating embeddings

Text can be converted into embedding vectors. text-embedding-ada-002 is recommended in almost all cases: it is better, cheaper, and simpler to use. Example:

client = OpenAI(api_key=api_key)

def get_embedding(text, model="text-embedding-ada-002"):
    text = text.replace("\n", " ")
    # the v1 SDK returns an object; use attribute access, not dict keys
    return client.embeddings.create(input=[text], model=model).data[0].embedding
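Embeddings are typically compared with cosine similarity, e.g. to rank documents against a query vector. A minimal sketch in plain Python with no external dependencies:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors:
    dot(a, b) / (|a| * |b|), in the range [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

In practice you would call this on the vectors returned by get_embedding for two pieces of text; values closer to 1 mean more semantically similar.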

Audio

Text to speech

OpenAI provides an API for generating speech from text. Several preset voices are available (alloy, echo, fable, onyx, nova, and shimmer); you can listen to samples to find the tone that fits your audience. For real-time applications, the standard tts-1 model offers the lowest latency, at lower quality than the tts-1-hd model; because of how the audio is generated, tts-1 may also produce more static in some situations. The default response format is mp3, but other formats such as opus, aac, or flac are available.

A text-to-speech example:

from pathlib import Path

client = OpenAI(api_key=api_key)

def tts():
    speech_file_path = Path(__file__).parent / "speech.mp3"
    response = client.audio.speech.create(
        model="tts-1",  # or tts-1-hd
        voice="alloy",  # alloy, echo, fable, onyx, nova, and shimmer
        # input="Today is a wonderful day to build something people love!"
        input="在海外游戲行業中,用戶獲?。║ser Acquisition,UA)通常是指通過購買效果廣告來獲取流量,類似于國內的買量。海外投放面臨的主要挑戰是如何選擇適合的媒體和素材,以吸引有價值的用戶,并評估廣告投放的效果。"
    )

    response.stream_to_file(speech_file_path)

The speech API supports real-time audio streaming using chunked transfer encoding, which means audio can be played before the complete file has been generated. Playing it locally as below requires the FFmpeg executables; download: https://github.com/BtbN/FFmpeg-Builds/releases

import io
from pydub import AudioSegment
from pydub.playback import play

client = OpenAI(api_key=api_key)

def stream_and_play():
    text = '今天的天氣怎么樣?可以去公園玩嗎?'
    response = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=text,
    )
    # Convert the binary response content to a byte stream
    byte_stream = io.BytesIO(response.content)

    # Read the audio data from the byte stream
    audio = AudioSegment.from_file(byte_stream, format="mp3")

    # Play the audio
    play(audio)

Speech to text

By default, the response type is json containing the raw text; you can also set response_format to text (just the transcript, nothing else).

client = OpenAI(api_key=api_key)

def stt():
    audio_file = open("speech.mp3", "rb")
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        # response_format="text"
    )
    print(transcript)

The returned result shows that the transcription accuracy is very high.

# Original audio script:
在海外游戲行業中,用戶獲?。║ser Acquisition,UA)通常是指通過購買效果廣告來獲取流量,類似于國內的買量。海外投放面臨的主要挑戰是如何選擇適合的媒體和素材,以吸引有價值的用戶,并評估廣告投放的效果。

# Transcribed text:
在海外游戲行業中,用戶獲取User Acquisition UA,通常是指通過購買效果廣告來獲取流量,類似于國內的買量。 海外投放面臨的主要挑戰是如何選擇適合的媒體和素材,以吸引有價值的用戶并評估廣告投放的效果。

The translations API takes an audio file in any supported language as input and transcribes the audio into English where necessary. This differs from speech-to-text above: the output is not in the original input language but translated into English text.

client = OpenAI(api_key=api_key)

def translate_audio():
    audio_file = open("speech.mp3", "rb")
    transcript = client.audio.translations.create(
        model="whisper-1",
        file=audio_file
    )
    print(transcript)

Output:

# Original audio script:
在海外游戲行業中,用戶獲?。║ser Acquisition,UA)通常是指通過購買效果廣告來獲取流量,類似于國內的買量。海外投放面臨的主要挑戰是如何選擇適合的媒體和素材,以吸引有價值的用戶,并評估廣告投放的效果。

# Translated text:
In the overseas gaming industry, user acquisition, U.A., usually refers to the acquisition of traffic by purchasing effective advertising, similar to domestic purchasing.
The main challenge faced by overseas advertising is how to choose the right media and material to attract valuable users and assess the effect of advertising.

Assistants API

The Assistants API lets you build AI assistants into your own applications. An assistant can use models, tools, and knowledge to respond to user queries.

The Assistants API currently supports three types of tools: Code Interpreter, Retrieval, and function calling.

Calling the Assistants API requires passing a beta HTTP header. This is handled automatically if you use OpenAI's official Python or Node.js SDK.

OpenAI-Beta: assistants=v1

In most cases you can specify any GPT-3.5 or GPT-4 model, including fine-tuned models. The Retrieval tool requires the gpt-3.5-turbo-1106 or gpt-4-1106-preview model.
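If you call the REST endpoints directly instead of using an SDK, the beta header must be attached to every request. A small sketch (the helper name is illustrative; it assumes `api_key` is defined):

```python
def assistants_headers(api_key):
    """Headers for direct HTTP calls to the Assistants API.
    The OpenAI-Beta header is required; the official SDKs add it
    automatically, but raw requests must set it themselves."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "OpenAI-Beta": "assistants=v1",
    }
```

Pass the resulting dict as the headers argument of your HTTP client when calling, for example, the /v1/assistants endpoint.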

Usage steps

The following walks through the key steps of creating and using a Code Interpreter assistant. The relationships between the key objects involved are shown in the diagram below; their meaning and interactions become clear step by step.

Step 1: Create an assistant

You can specify the assistant's name, set its role via instructions, and enable the assistant type (such as code_interpreter) via the tools field.

assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Write and run code to answer math questions.",
    tools=[{"type": "code_interpreter"}],
    model="gpt-4-1106-preview"
)

Step 2: Create a thread

Threads have no size limit; you can pass any number of messages to a thread. The API uses relevant optimization techniques (such as truncation) to ensure requests to the model fit within the maximum context window.

thread = client.beta.threads.create()
Step 3: Add a message to the thread

A message contains the user's text (for example, a linear equation the user wants solved) and, optionally, any files the user uploads. Image files are not currently supported.

client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="I need to solve the equation 3x + 11 = 14. Can you help me?"
)
Step 4: Run the assistant

To make the assistant respond to the user message, you create a Run. The assistant then reads the thread and decides whether to call tools or simply use the model to best answer the query. As the run progresses, the assistant appends messages to the thread with role="assistant".

run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id
)
Step 5: Check the run result

This creates a run with a status. You can retrieve the Run periodically to check whether its status has moved to completed; the run status lifecycle is shown below.

After starting the run, the code to poll its status looks like this:

while True:
    run = client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id
    )
    if run.status == "completed":
        break
    time.sleep(2)

Once the status is completed, you can fetch the returned messages. The code iterates over the messages list in reverse because the newest message sits at the front of the array.

messages = client.beta.threads.messages.list(
    thread_id=thread.id
)

for msg in reversed(messages.data):
    if msg.content[0].type == "text":
        print(f"{msg.role}:", msg.content[0].text.value)
    else:
        print(f"{msg.role}:", msg.content[0])

Full example

The complete code for creating and running the assistant follows; the user asks the assistant to generate and run code that solves a linear equation.

def run_assistant():
    # step 1: create the assistant
    assistant = client.beta.assistants.create(
        name="Math Tutor",
        instructions="You are a personal math tutor. Write and run code to answer math questions.",
        tools=[{"type": "code_interpreter"}],
        model="gpt-4-1106-preview"
    )
    # step 2: create a thread
    thread = client.beta.threads.create()
    # step 3: add the user message
    client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content="I need to solve the equation 3x + 11 = 14. Can you help me?"
    )
    # step 4: run the assistant
    run = client.beta.threads.runs.create(
        thread_id=thread.id,
        assistant_id=assistant.id
    )
    # step 5: poll the run status
    while True:
        run = client.beta.threads.runs.retrieve(
            thread_id=thread.id,
            run_id=run.id
        )
        if run.status == "completed":
            break
        time.sleep(2)
    # step 6: print the result messages
    messages = client.beta.threads.messages.list(
        thread_id=thread.id
    )
    for msg in reversed(messages.data):
        if msg.content[0].type == "text":
            print(f"{msg.role}:", msg.content[0].text.value)
        else:
            print(f"{msg.role}:", msg.content[0])

The run result:

user: I need to solve the equation 3x + 11 = 14. Can you help me?
assistant: Of course. To solve the equation 3x + 11 = 14 for x, you would follow these steps:

1. Subtract 11 from both sides of the equation to get 3x = 14 - 11.
2. Simplify the right-hand side of the equation.
3. Divide both sides by 3 to isolate x.

Let's do that calculation.
assistant: The solution to the equation 3x + 11 = 14 is \( x = 1 \).

Supported tools

Code Interpreter

The assistant created above is a typical Code Interpreter use case: the model decides when to invoke the Code Interpreter based on the nature of the user's request. You can encourage this behavior via the assistant's instructions (for example, "write code to solve this problem"). Pass code_interpreter in the tools parameter of the Assistant object to enable Code Interpreter.

assistant = client.beta.assistants.create(
    instructions="You are a personal math tutor. When asked a math question, write and run code to answer the question.",
    model="gpt-4-1106-preview",
    tools=[{"type": "code_interpreter"}]
)

Input

Code Interpreter can analyze data in files. This is useful when you want to provide the assistant with a large amount of data or allow users to upload their own files for analysis. Files passed at the assistant level are accessible to all runs that use this assistant: pass the file ID in the file_ids field when creating the assistant.

# Upload a file with an "assistants" purpose
file = client.files.create(
    file=open("speech.py", "rb"),
    purpose='assistants'
)

# Create an assistant using the file ID
assistant = client.beta.assistants.create(
    instructions="You are a personal math tutor. When asked a math question, write and run code to answer the question.",
    model="gpt-4-1106-preview",
    tools=[{"type": "code_interpreter"}],
    file_ids=[file.id]
)

Files can also be passed at the thread level. Such files are only accessible within that specific thread: pass the file ID in the file_ids field as part of the message creation request:

thread = client.beta.threads.create(
    messages=[
        {
            "role": "user",
            "content": "I need to solve the equation 3x + 11 = 14. Can you help me?",
            "file_ids": [file.id]
        }
    ]
)

The maximum file size is 512 MB. Code Interpreter supports many file formats, including .csv, .pdf, .json, and more. For details on supported file extensions (and their corresponding MIME types), see the "Supported files" section below. To remove a file from an assistant, detach the file from it:

file_deletion_status = client.beta.assistants.files.delete(
    assistant_id=assistant.id,
    file_id=file.id
)

Output

Code Interpreter in the API supports output files such as generated charts, CSVs, and PDFs. When Code Interpreter generates an image, you can look up the file_id in the assistant's message response and download the file:

{
    "id": "msg_OHGpsFRGFYmz69MM1u8KYCwf",
    "object": "thread.message",
    "created_at": 1698964262,
    "thread_id": "thread_uqorHcTs46BZhYMyPn6Mg5gW",
    "role": "assistant",
    "content": [
        {
            "type": "image_file",
            "image_file": {
                "file_id": "file-WsgZPYWAauPuW4uvcgNUGcb"
            }
        }
    ]
    # ...
}

You can then download the file content by passing the file ID to the Files API:

content = client.files.with_raw_response.retrieve_content(file.id)

Logs

You can inspect the details of a Code Interpreter run by printing the input and outputs logs:

run_steps = client.beta.threads.runs.steps.list(
    thread_id=thread.id,
    run_id=run.id
)

Example output:

{
    "object": "list",
    "data": [
        {
            "id": "step_DQfPq3JPu8hRKW0ctAraWC9s",
            "object": "thread.run.step",
            "type": "tool_calls",
            "run_id": "run_kme4a442kme4a442",
            "thread_id": "thread_34p0sfdas0823smfv",
            "status": "completed",
            "step_details": {
                "type": "tool_calls",
                "tool_calls": [
                    {
                        "type": "code",
                        "code": {
                            "input": "# Calculating 2 + 2\nresult = 2 + 2\nresult",
                            "outputs": [
                                {
                                    "type": "logs",
                                    "logs": "4"
                                }
    ...
}

Full example

Upload an Excel file of product sales containing two columns (prefer a .csv suffix; .xlsx files sometimes fail to parse), and ask the model to analyze it and draw a chart:

def run_assistant_code_interpreter():
    # step 1: upload the file
    file = client.files.create(
        file=open("data.xlsx", "rb"),
        purpose='assistants'
    )
    # step 2: create the assistant
    assistant = client.beta.assistants.create(
        name="data analyst",
        instructions="You are a personal data analyst. Write and run code to answer data questions.",
        tools=[{"type": "code_interpreter"}],
        model="gpt-4-1106-preview",
        file_ids=[file.id]
    )
    # step 3: create a thread
    thread = client.beta.threads.create()
    # step 4: add the user question
    client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content="Analyze the recent product sales, draw a line chart, and give product suggestions based on the chart. Make sure Chinese characters display correctly in the chart."
    )
    # step 5: run the assistant
    run = client.beta.threads.runs.create(
        thread_id=thread.id,
        assistant_id=assistant.id,
        instructions="Make sure Chinese characters display correctly in the chart."
    )
    # step 6: poll the run result
    while True:
        run = client.beta.threads.runs.retrieve(
            thread_id=thread.id,
            run_id=run.id
        )
        print("status:", run.status)
        if run.status == "completed":
            break
        if run.status == "failed":
            print("failed error:", run.last_error)
        time.sleep(2)
    # step 7: print the result messages
    messages = client.beta.threads.messages.list(
        thread_id=thread.id
    )
    for msg in reversed(messages.data):
        if msg.content[0].type == "text":
            print(f"{msg.role}:", msg.content[0].text.value)
        elif msg.content[0].type == "image_file":
            print(f"{msg.role}: file id is", msg.content[0].image_file.file_id)
            # see issue: https://github.com/openai/openai-python/issues/699
            file_content = client.files.with_raw_response.retrieve_content(file_id=msg.content[0].image_file.file_id)
            with open(f"test_{msg.content[0].image_file.file_id}.png", "wb") as f:
                f.write(file_content.content)
        else:
            print(f"{msg.role}:", msg.content[0])

    # step 8 (optional): show the intermediate steps
    run_steps = client.beta.threads.runs.steps.list(
        thread_id=thread.id,
        run_id=run.id
    )
    i = 0
    for run_step in reversed(run_steps.data):
        if run_step.step_details.type == "tool_calls":
            i += 1
            for tool_call in run_step.step_details.tool_calls:
                print(f"step {i} input<<<<<<<<<<<<<<<:\n", tool_call.code_interpreter.input)
                if tool_call.code_interpreter.outputs[0].type == "logs":
                    print(f"step {i} outputs>>>>>>>>>>>>>>>>>>:\n", tool_call.code_interpreter.outputs[0].logs)
                elif tool_call.code_interpreter.outputs[0].type == "image":
                    print(f"step {i} outputs>>>>>>>>>>>>>>>>>>:\n", tool_call.code_interpreter.outputs[0].image.file_id)
                else:
                    print(f"step {i} outputs>>>>>>>>>>>>>>>>>>:\n", tool_call.code_interpreter.outputs[0])
        elif run_step.step_details.type != "message_creation":
            print(f"other middle step:", run_step.step_details)

The original Excel data:

時間          銷量
2023/11/1    50
2023/11/2    70
2023/11/3    60
2023/11/4    80
2023/11/5    40
2023/11/6    36
2023/11/7    90
2023/11/8    100
2023/11/9    150

Output:

user: Analyze the recent product sales, draw a line chart, and give product suggestions based on the chart.
assistant: First, I will open and inspect the file you uploaded to understand its data structure; then we can analyze the product sales further and try to plot a line chart of the data. After that, I will give some suggestions based on the chart. I will open the file now.
assistant: The file was read successfully. The data contains two columns: a date column ("時間") and the corresponding sales ("銷量"), which looks like a daily sales time series. Next, I will use this data to plot a line chart, which will help us better understand the sales trend over time.
assistant: file id is file-QPltLMyDx4RWUzt59PD72oY9

Chinese text in the chart often renders incorrectly, even when the prompt explicitly asks for it to display properly. The run also ended without summarizing the data or giving suggestions, which is worth watching for; you should be able to continue the conversation and have the model analyze based on the chat history, though that is not demonstrated here.

The intermediate step output is shown below; you can inspect the inputs and outputs of steps 1 through 3.

step 1 input<<<<<<<<<<<<<<<:
import pandas as pd

# Load the uploaded file to see its content and structure
file_path = '/mnt/data/file-oxr4e1AhUQcmRScQYlg1wVY9'

# We'll try to read the file with pandas assuming it might be a CSV or Excel file
# If this fails, we might need additional context or information about the file format

# First, let's try to read it as a CSV
try:
    sales_data = pd.read_csv(file_path)
except Exception as e:
    print(f"Error reading as CSV: {e}")

# Next, we'll try to read it as an Excel file
try:
    sales_data = pd.read_excel(file_path)
except Exception as e:
    print(f"Error reading as Excel: {e}")

# If successfully read, let's check the first few rows of the file
if 'sales_data' in locals():
    preview = sales_data.head()
else:
    preview = "Unable to read the file. It may not be a CSV or Excel file, or it could have a different structure."

preview
step 1 outputs>>>>>>>>>>>>>>>>>>:
Error reading as CSV: 'utf-8' codec can't decode byte 0x82 in position 16: invalid start byte
時間 銷量
0 2023-11-01 50
1 2023-11-02 70
2 2023-11-03 60
3 2023-11-04 80
4 2023-11-05 40

step 2 input<<<<<<<<<<<<<<<:
# Check the entire dataset for missing values or anomalies
sales_data_info = sales_data.info()
missing_values = sales_data.isnull().sum()

# Preparing data for plotting
# Ensuring the '時間' column is parsed as datetime if it isn't already
sales_data['時間'] = pd.to_datetime(sales_data['時間'])

# Sorting the data according to the '時間' to ensure correct plotting
sales_data_sorted = sales_data.sort_values('時間')

# Displaying data information and missing value count
sales_data_info, missing_values, sales_data_sorted.head()
step 2 outputs>>>>>>>>>>>>>>>>>>:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 時間 9 non-null datetime64[ns]
1 銷量 9 non-null int64
dtypes: datetime64[ns](1), int64(1)
memory usage: 272.0 bytes
(None,
時間 0
銷量 0
dtype: int64,
時間 銷量
0 2023-11-01 50
1 2023-11-02 70
2 2023-11-03 60
3 2023-11-04 80
4 2023-11-05 40)

step 3 input<<<<<<<<<<<<<<<:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Configuring matplotlib to display Chinese characters properly
plt.rcParams['font.sans-serif'] = ['SimHei'] # Or another font that supports Chinese
plt.rcParams['axes.unicode_minus'] = False

# Plotting the sales data
plt.figure(figsize=(10, 6))
plt.plot(sales_data_sorted['時間'], sales_data_sorted['銷量'], marker='o')

# Formatting the plot
plt.title('商品銷量趨勢圖')
plt.xlabel('時間')
plt.ylabel('銷量')
plt.grid(True)
plt.xticks(rotation=45)
plt.gca().xaxis.set_major_locator(mdates.DayLocator(interval=1)) # Ensure every day is marked
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))

# Showing the plot
plt.tight_layout()
plt.show()
step 3 outputs>>>>>>>>>>>>>>>>>>:
file-q2LglNbwgnM7VHc9PjxlgslD
Knowledge Retrieval

Retrieval augments the assistant with knowledge from outside its model, such as proprietary product information or documents provided by users. Once a file is uploaded and passed to the assistant, OpenAI automatically chunks your documents, indexes and stores the embeddings, and runs vector search to retrieve relevant content for answering user queries. Pass retrieval in the Assistant's tools parameter to enable Retrieval.

assistant = client.beta.assistants.create(
    instructions="You are a customer support chatbot. Use your knowledge base to best respond to customer queries.",
    model="gpt-4-1106-preview",
    tools=[{"type": "retrieval"}]
)

The model decides when to retrieve content based on the user's messages. The Assistants API automatically chooses between two retrieval techniques: passing the file content in the prompt for short documents, or running a vector search for longer documents.

As with Code Interpreter, files can be passed at the assistant level or at the thread level; the other operations are also similar, so refer to the Code Interpreter section.

Full example

See the Code Interpreter full example.

Function calling

Like the chat API, the Assistants API supports function calling. Function calling lets you describe functions to the assistant and have it intelligently return the functions to call along with their arguments. When calling functions, the Assistants API pauses execution during the Run; you provide the function call results to resume the Run.

Defining functions

Define the functions when creating the Assistant:

assistant = client.beta.assistants.create(
    instructions="You are a weather bot. Use the provided functions to answer questions.",
    model="gpt-4-1106-preview",
    tools=[{
        "type": "function",
        "function": {
            "name": "getCurrentWeather",
            "description": "Get the weather in location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "The city and state e.g. San Francisco"},
                    "unit": {"type": "string", "enum": ["c", "f"]}
                },
                "required": ["location", "unit"]
            }
        }
    }, {
        "type": "function",
        "function": {
            "name": "getNickname",
            "description": "Get the nickname of a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "The city and state e.g. San Francisco"},
                },
                "required": ["location"]
            }
        }
    }]
)
Reading the function information returned by the assistant

When the assistant runs and triggers the functions, the run enters the requires_action state and pauses, waiting for you to submit the function call results. Retrieving the run at this point returns content in the following format:

{
    "id": "run_3HV7rrQsagiqZmYynKwEdcxS",
    "object": "thread.run",
    "assistant_id": "asst_rEEOF3OGMan2ChvEALwTQakP",
    "thread_id": "thread_dXgWKGf8Cb7md8p0wKiMDGKc",
    "status": "requires_action",
    "required_action": {
        "type": "submit_tool_outputs",
        "submit_tool_outputs": {
            "tool_calls": [
                {
                    "id": "call_Vt5AqcWr8QsRTNGv4cDIpsmA",
                    "type": "function",
                    "function": {
                        "name": "getCurrentWeather",
                        "arguments": "{\"location\":\"San Francisco\"}"
                    }
                },
                {
                    "id": "call_45y0df8230430n34f8saa",
                    "type": "function",
                    "function": {
                        "name": "getNickname",
                        "arguments": "{\"location\":\"Los Angeles\"}"
                    }
                }
            ]
        }
    },
    ...
Submitting function outputs

Then call the functions locally based on the returned function names and argument values, and submit their results back to the assistant to resume the run; tool_call_id matches each output to its function call.

run = client.beta.threads.runs.submit_tool_outputs(
    thread_id=thread.id,
    run_id=run.id,
    tool_outputs=[
        {
            "tool_call_id": call_ids[0],
            "output": "22C",
        },
        {
            "tool_call_id": call_ids[1],
            "output": "LA",
        },
    ]
)

Full example

The complete code for creating and running an assistant with function calling follows. The use case asks the model for the weather and nicknames of San Francisco and Paris; both functions are implemented locally here, but they could equally call third-party APIs to extend the model's capabilities. The function call results must be submitted back to the model for summarization.

def run_assistant_function_call():
    # step 1: create the assistant with the function definitions
    assistant = client.beta.assistants.create(
        instructions="You are a weather bot. Use the provided functions to answer questions.",
        model="gpt-4-1106-preview",
        tools=[{
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the weather in location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "The city and state e.g. San Francisco"},
                        "unit": {"type": "string", "enum": ["c", "f"]}
                    },
                    "required": ["location", "unit"]
                }
            }
        }, {
            "type": "function",
            "function": {
                "name": "get_nickname",
                "description": "Get the nickname of a city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "The city and state e.g. San Francisco"},
                    },
                    "required": ["location"]
                }
            }
        }]
    )
    # step 2: create a thread
    thread = client.beta.threads.create()

    # step 3: add the user question
    client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content="What's the weather and nickname like in San Francisco and Paris?"
    )

    # step 4: run the assistant
    run = client.beta.threads.runs.create(
        thread_id=thread.id,
        assistant_id=assistant.id
    )

    # step 5: local function implementations; in practice these could call external APIs
    # Parameter names must match the names declared in tools,
    # or automatic argument filling will fail
    def get_current_weather(location: str, unit: str):
        if "paris" in location.lower():
            return json.dumps({"location": location, "temperature": "10", "unit": unit})
        elif "san francisco" in location.lower():
            return json.dumps({"location": location, "temperature": "72", "unit": unit})
        else:
            return json.dumps({"location": location, "temperature": "22", "unit": unit})

    def get_nickname(location: str):
        if "paris" in location.lower():
            return json.dumps({"location": location, "nickname": "City of Light"})
        elif "san francisco" in location.lower():
            return json.dumps({"location": location, "nickname": "City by the Bay"})
        else:
            return json.dumps({"location": location, "nickname": "Nice City"})

    available_functions = {
        "get_current_weather": get_current_weather,
        "get_nickname": get_nickname
    }

    # step 6: poll the result
    while True:
        # step 6.1: check the run status
        run = client.beta.threads.runs.retrieve(
            thread_id=thread.id,
            run_id=run.id
        )
        print("status:", run.status)

        # step 6.2: when the status becomes requires_action, read the function
        # names and arguments the model wants to call
        if run.status == "requires_action" and run.required_action.type == "submit_tool_outputs":
            tool_calls = run.required_action.submit_tool_outputs.tool_calls
            tool_outputs = []
            for tool_call in tool_calls:
                # step 6.2.1: look up each function
                function_name = tool_call.function.name
                function_to_call = available_functions[function_name]
                function_args = json.loads(tool_call.function.arguments)
                # step 6.2.2: fill in the arguments and call the function
                # (**function_args expands the argument dict into keyword arguments)
                function_response = function_to_call(**function_args)
                # step 6.2.3: collect the function result
                tool_outputs.append({
                    "tool_call_id": tool_call.id,
                    "output": function_response
                })
            # step 6.2.4: submit all function results back to GPT; the run
            # resumes and the status returns to in_progress
            client.beta.threads.runs.submit_tool_outputs(
                thread_id=thread.id,
                run_id=run.id,
                tool_outputs=tool_outputs
            )
        if run.status == "completed":
            break
        time.sleep(2)

    # step 7: print the results
    messages = client.beta.threads.messages.list(
        thread_id=thread.id
    )
    for msg in reversed(messages.data):
        if msg.content[0].type == "text":
            print(f"{msg.role}:", msg.content[0].text.value)
        else:
            print(f"{msg.role}:", msg.content[0])

The output:

user: What's the weather and nickname like in San Francisco and Paris?
assistant: In San Francisco, the current weather is 72°F, and it is nicknamed the "City by the Bay". In Paris, the current weather is 10°C, and it is known as the "City of Light".

Supported files

For text/ MIME types, the encoding must be one of utf-8, utf-16, or ascii.

Pricing

1,000 tokens is roughly 750 English words. Online token calculator: https://platform.openai.com/tokenizer
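This rule of thumb makes rough cost estimates easy. A hypothetical helper (the 750-words-per-1,000-tokens ratio is the approximation above, not an exact tokenizer):

```python
def estimate_cost(n_words, price_per_1k_tokens):
    """Rough cost estimate for English text using the rule of thumb
    that 1,000 tokens is about 750 words. For exact counts, use a
    real tokenizer such as the one at platform.openai.com/tokenizer."""
    tokens = n_words / 750 * 1000
    return tokens / 1000 * price_per_1k_tokens
```

For example, 750 words at $0.03 per 1K tokens comes to roughly $0.03; real usage is billed on actual token counts, so treat this only as a ballpark figure.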

GPT-4 Turbo

Model                        Input                  Output
gpt-4-1106-preview           $0.01 / 1K tokens      $0.03 / 1K tokens
gpt-4-1106-vision-preview    $0.01 / 1K tokens      $0.03 / 1K tokens

GPT-4

Model        Input                  Output
gpt-4        $0.03 / 1K tokens      $0.06 / 1K tokens
gpt-4-32k    $0.06 / 1K tokens      $0.12 / 1K tokens

GPT-3.5 Turbo

Model                     Input                  Output
gpt-3.5-turbo-1106        $0.0010 / 1K tokens    $0.0020 / 1K tokens
gpt-3.5-turbo-instruct    $0.0015 / 1K tokens    $0.0020 / 1K tokens

Fine-tuning models

Model            Training               Input usage            Output usage
gpt-3.5-turbo    $0.0080 / 1K tokens    $0.0030 / 1K tokens    $0.0060 / 1K tokens
davinci-002      $0.0060 / 1K tokens    $0.0120 / 1K tokens    $0.0120 / 1K tokens
babbage-002      $0.0004 / 1K tokens    $0.0016 / 1K tokens    $0.0016 / 1K tokens

Assistants API

Tool                Input
Code interpreter    $0.03 / session (free until 11/17/2023)
Retrieval           $0.20 / GB of assistant storage / day (free until 11/17/2023)

Embedding models

Model     Usage
ada v2    $0.0001 / 1K tokens

Base models

Model          Usage
davinci-002    $0.0020 / 1K tokens
babbage-002    $0.0004 / 1K tokens

This article is reposted from the WeChat official account 騰訊技術工程 (Tencent Technology and Engineering).
