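Qwen2.5 Model Features

The Qwen2.5 models have several notable features that set them apart in natural language processing:

Strong summarization and comprehension: handles a wide range of complex natural language tasks, not just text classification and generation.
Efficient inference: compares favorably with other models such as Llama-3.1-405B, with higher accuracy and speed.
Broad range of applications: suited to intelligent customer service, content generation, code generation, and other areas, making it a convenient tool for users.
Flexible customization: users can extend and customize the model for their specific needs to build tailored solutions.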

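Qwen2.5 Model Versions

The Qwen2.5 series comes in several versions to suit different task requirements:

Qwen2.5-14B: the base version, suited to tasks of moderate complexity.
Qwen2.5-32B: the enhanced version, for more complex and advanced tasks.
Qwen2.5-Plus: stronger reasoning and higher speed, suited to moderately complex tasks.
Qwen2.5-Turbo: the fastest and lowest-cost option, suited to simple tasks.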

Each version has its own use cases and strengths, so you can choose the one that best matches your needs.

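Environment Setup

Before using Qwen2.5, set up the environment: install the required Python libraries, obtain the model files, and install the Hugging Face Transformers library and its dependencies. The steps are as follows:

Install the Python libraries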

pip install torch
pip install transformers
pip install requests

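These libraries are:

torch: deep learning computation, with GPU acceleration.
transformers: loads a wide range of pretrained models.
requests: sends HTTP requests, for example to fetch model files.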
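
A quick way to confirm that the three installs above succeeded is to query the installed package versions. The sketch below uses only the standard library; the helper name `installed_versions` is illustrative and not part of any of these packages.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The three packages installed above; None means the pip install did not succeed.
report = installed_versions(["torch", "transformers", "requests"])
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as not installed can be reinstalled with the matching pip command before continuing.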

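Obtain the model files

Download a Qwen2.5 model from the Hugging Face model hub. The file below is a quantized GGUF build; the Transformers examples later in this guide download their own weights by model name.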

curl -LO https://huggingface.co/second-state/Qwen2.5-14B-Instruct-GGUF/resolve/main/Qwen2.5-14B-Instruct-Q5_K_M.gguf
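
The same download can be scripted in Python. The following is a minimal sketch using only the standard library; the helper names `hf_resolve_url` and `download` are illustrative, and the URL pattern is simply the one visible in the curl command above.

```python
import shutil
import urllib.request
from pathlib import Path


def hf_resolve_url(repo_id, filename, revision="main"):
    """Build a direct-download URL following the pattern of the curl command above."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def download(url, dest_dir="."):
    """Stream url into dest_dir via a temporary .part file and return the final path."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)  # copies in chunks, never holding the whole file in memory
    tmp.replace(dest)  # rename only after the download completed
    return dest


url = hf_resolve_url(
    "second-state/Qwen2.5-14B-Instruct-GGUF",
    "Qwen2.5-14B-Instruct-Q5_K_M.gguf",
)
print(url)
# download(url)  # uncomment to fetch the file; it is several gigabytes
```

Writing to a `.part` file first means an interrupted download never leaves a truncated file under the final name.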

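Verify the Transformers installation

Make sure a recent version of Transformers is installed (Qwen2.5 needs transformers 4.37.0 or later). If the following import succeeds, the library is available:

from transformers import AutoModelForCausalLM, AutoTokenizer

print("Transformers installed successfully!")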

Install other dependencies

pip install accelerate
pip install sentencepiece

These libraries help load and run Qwen2.5 more efficiently.

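Model Loading and Deployment

Before running Qwen2.5, load the model and tokenizer and place the model on the target device (CPU or GPU).

Load the model and tokenizer

Load the model and tokenizer with Transformers: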

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

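Deploy the model to a device

Because the model was loaded with device_map="auto", it is already placed on the GPU when one is available and on the CPU otherwise. Record that device so the inputs can be moved to it later:

import torch

# device_map="auto" already placed the weights; calling model.to() again is
# unnecessary and can fail when some layers are offloaded.
device = model.device
print(f"Model is on {device} (CUDA available: {torch.cuda.is_available()})")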

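Load the base and instruct models

The base model is a pretrained checkpoint for text completion and further fine-tuning; the instruct model is tuned to follow instructions and to chat. Load whichever one you need:

# Option 1: base model
model_name = "Qwen/Qwen2.5-7B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# Option 2: instruct model (used in the rest of this guide)
model_name = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")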

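Inference

Running inference with Qwen2.5 involves the following steps:

Build the input

Prepare the model input, consisting of a system instruction and the problem statement: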

prompt = "Find the value of $x$ that satisfies the equation $4x + 5 = 6x + 7$."
messages = [
    # "\\boxed{}" is a Python-escaped backslash, so the model sees \boxed{}
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": prompt}
]
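
Before generation, a chat template turns this message list into a single prompt string. Qwen2.5 uses a ChatML-style layout with `<|im_start|>` and `<|im_end|>` markers. The sketch below is a simplified illustration of that layout, not the tokenizer's actual template (which also handles a default system prompt and tool definitions); `render_chatml` is a hypothetical helper name.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Simplified ChatML rendering; the real template also handles defaults and tools."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its answer.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example_messages = [
    {"role": "system", "content": "Please reason step by step."},
    {"role": "user", "content": "Solve 4x + 5 = 6x + 7."},
]
rendered = render_chatml(example_messages)
print(rendered)
```

In real code, rely on `tokenizer.apply_chat_template` so that the prompt always matches the template shipped with the model.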

Generate the output

Pass the input to the model and generate the output:

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(device)
generated_ids = model.generate(**model_inputs, max_new_tokens=512)
generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
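
The list comprehension in the snippet above is needed because, for decoder-only models, `generate` returns the prompt tokens followed by the new tokens. The toy example below shows the same slicing on plain Python lists; the token IDs are made up for illustration and `strip_prompt` is a hypothetical helper name.

```python
def strip_prompt(input_ids_batch, output_ids_batch):
    """Drop the echoed prompt tokens from the front of each generated sequence."""
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(input_ids_batch, output_ids_batch)
    ]


# Made-up token IDs: each output row is the prompt followed by the continuation.
prompt_ids = [[101, 7, 8, 9]]
full_output = [[101, 7, 8, 9, 42, 43, 2]]
new_tokens = strip_prompt(prompt_ids, full_output)
print(new_tokens)  # [[42, 43, 2]]
```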

Process the output

Parse and process the model's result:

print(response)

Streaming generation with TextStreamer

For applications that need real-time feedback, use TextStreamer:

from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_special_tokens=True)
model.generate(**model_inputs, max_new_tokens=512, streamer=streamer)

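API Calls

The API offers a convenient way to interact with Qwen2.5. The steps are as follows:

Register an Alibaba Cloud account and create an API key

Register an account on the Alibaba Cloud website, create an API key, obtain the AccessKey ID and AccessKey Secret, and store them securely.

Set the API credentials

Set the credentials in your project: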

import os

os.environ['ALIYUN_ACCESS_KEY_ID'] = 'your_access_key_id'
os.environ['ALIYUN_ACCESS_KEY_SECRET'] = 'your_access_key_secret'
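
Hard-coding secrets in source files makes them easy to leak. A common alternative is to export the variables in the shell and have the program fail early when one is missing. The sketch below assumes that approach; `require_env` is a hypothetical helper, and the dummy value is set only so that the example runs on its own.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Demo only: a dummy value stands in for a key exported in the shell.
os.environ.setdefault("ALIYUN_ACCESS_KEY_ID", "dummy-id-for-demo")
access_key_id = require_env("ALIYUN_ACCESS_KEY_ID")
print("Access key ID loaded:", bool(access_key_id))
```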

Create the API client

Create a client object with the Alibaba Cloud SDK:

from aliyunsdkcore.client import AcsClient

client = AcsClient(
    os.environ['ALIYUN_ACCESS_KEY_ID'],
    os.environ['ALIYUN_ACCESS_KEY_SECRET'],
    'cn-hangzhou'
)

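Send a chat request

Send a request through the API and read the response:

from aliyunsdkcore.request import RpcRequest

# Product name, API version, and action name; verify them against the
# current Alibaba Cloud documentation before use.
request = RpcRequest('Qwen', '2023-09-01', 'Chat')
request.set_method('POST')

# Prompt text and generation settings
request.add_query_param('Prompt', 'Hello, Tongyi Qianwen!')
request.add_query_param('MaxTokens', '100')
request.add_query_param('Temperature', '0.7')

# Raises an exception on failure; on success returns the raw response body
response = client.do_action_with_exception(request)
print(response)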

Print the response

Parse and print the API response:

import json

response_json = json.loads(response)
print(json.dumps(response_json, ensure_ascii=False, indent=2))

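Deployment and Optimization

In real applications, deployment and optimization are critical: the model has to be served efficiently, and the right tools can improve its performance.

Using vLLM, SGLang, Ollama, and Transformers

These tools support offline inference, online inference, and multi-GPU distributed serving, which can significantly improve performance and efficiency.

Deploying the model with vLLM

Install vLLM and load the Qwen2.5 model: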

pip install vllm

from vllm import LLM, SamplingParams

llm = LLM(model="path/to/qwen2.5")

Generate text:

sampling_params = SamplingParams(temperature=0.8, top_p=0.9)
prompts = ["Hello, how are you?"]
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    # Each result holds a list of completions; print the first one
    print(output.outputs[0].text)

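Performance Evaluation

Evaluating Qwen2.5 requires several criteria, such as accuracy, coherence, diversity, speed, and resource consumption.

Evaluation criteria and methods

Combine human evaluation, automatic evaluation, benchmark tests, and performance tests to assess the model comprehensively.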
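
As one example of automatic evaluation, math answers can be scored by exact match. The system prompt in the inference section asks the model to put its final answer in a boxed expression; the sketch below assumes the LaTeX form `\boxed{...}` without nested braces. The helper names and the sample responses are hypothetical, written by hand for illustration.

```python
import re


def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None (no nested braces)."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def exact_match_accuracy(responses, references):
    """Fraction of responses whose boxed answer equals the reference string."""
    hits = sum(extract_boxed(r) == ref for r, ref in zip(responses, references))
    return hits / len(references)


# Hand-written sample outputs; 4x + 5 = 6x + 7 gives x = -1.
responses = [
    "Subtract 4x: 5 = 2x + 7, so 2x = -2 and x = \\boxed{-1}.",
    "The answer is \\boxed{1}.",
]
accuracy = exact_match_accuracy(responses, ["-1", "-1"])
print(accuracy)  # 0.5
```

String comparison is deliberately strict here; a fuller harness would normalize equivalent forms such as `-1.0` and `-1`.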

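Evaluation results

Accuracy: above 90%.
Coherence: 85 out of 100.
Diversity: 80 out of 100.
Speed: about 1,000 tokens generated per second.
Resource consumption: about 10 GB of memory, with CPU/GPU utilization around 50%.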
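
Speed figures like the one above depend heavily on hardware, model size, and batch size, so it is worth measuring them on your own setup. The sketch below times any callable that returns the number of tokens it generated; `tokens_per_second` and the stand-in `fake_generate` are hypothetical names, and the stand-in only sleeps instead of running a model.

```python
import time


def tokens_per_second(generate_fn, *args, **kwargs):
    """Time one call to generate_fn, which must return the number of new tokens."""
    start = time.perf_counter()
    n_tokens = generate_fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed


def fake_generate(n_tokens, delay=0.001):
    """Stand-in for a real model call: sleeps per token, then reports the count."""
    for _ in range(n_tokens):
        time.sleep(delay)
    return n_tokens


rate = tokens_per_second(fake_generate, 20)
print(f"{rate:.0f} tokens/s")
```

With a real model, the callable would wrap `model.generate` and return the number of newly generated tokens, meaning the output length minus the prompt length.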

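Input and Output Parameters

When calling Qwen2.5, you need to understand the input and return parameters so that the model output is handled correctly.

OpenAI Python SDK input parameters

Configure parameters such as the model name, the conversation history, and the nucleus sampling probability threshold (top_p).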

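from openai import OpenAI

# openai>=1.0 client. base_url must point at an OpenAI-compatible endpoint
# that serves Qwen2.5; the value below is a placeholder, not a real URL.
client = OpenAI(
    api_key="your_api_key_here",
    base_url="your_base_url_here",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
]

response = client.chat.completions.create(
    model="Qwen2.5-Math-72B-Instruct",
    messages=messages,       # conversation history
    top_p=0.9,               # nucleus sampling threshold
    temperature=0.7,
    presence_penalty=0.5,
    max_tokens=50,
    seed=42,
    stream=False,
    stop=["\n"]              # stop at the first newline
)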

print(response.choices[0].message.content)
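
To make the `top_p` parameter concrete, the toy function below applies nucleus sampling's filtering step to a small probability distribution: it keeps the most probable tokens until their cumulative probability reaches the threshold, then renormalizes. This is an illustrative sketch, not the code used by any particular inference library, and edge-case handling differs between implementations.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most-probable tokens whose total probability reaches top_p."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for index, p in ranked:
        kept.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    # Renormalize so the surviving probabilities sum to 1.
    return {index: p / total for index, p in kept}


# Toy distribution over four tokens.
filtered = top_p_filter([0.5, 0.3, 0.15, 0.05], top_p=0.9)
print(filtered)
```

With `top_p=0.9`, the least likely token (index 3) is dropped and sampling proceeds over the remaining three. Lower values make the output more focused, higher values more varied.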

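Function Calling and Example Code

Qwen2.5 supports function calling; Qwen-Agent and Hugging Face Transformers make inference more flexible and efficient.

Using Qwen-Agent

Install the Qwen-Agent library and configure the model and API endpoint: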

pip install -U qwen-agent

from qwen_agent.llm import get_chat_model

llm = get_chat_model({
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "model_server": "http://localhost:8000/v1",
    "api_key": "EMPTY",
})
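
Function calling has two halves: describing the available functions to the model, and executing whatever call the model asks for. The sketch below illustrates both without contacting a model. It assumes an OpenAI-style payload in which the call carries a function name and a JSON string of arguments; `get_current_temperature`, `FUNCTIONS`, `REGISTRY`, and `run_function_call` are hypothetical names, and the returned temperature is a fixed dummy value.

```python
import json


def get_current_temperature(location, unit="celsius"):
    """Hypothetical tool: a real one would query a weather service."""
    return {"location": location, "temperature": 26.1, "unit": unit}


# JSON-Schema style description passed to the model alongside the messages.
FUNCTIONS = [{
    "name": "get_current_temperature",
    "description": "Get the current temperature at a location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}]

# Maps function names the model may request to local Python callables.
REGISTRY = {"get_current_temperature": get_current_temperature}


def run_function_call(function_call):
    """Execute a call of the form {"name": ..., "arguments": "<JSON string>"}."""
    func = REGISTRY[function_call["name"]]
    arguments = json.loads(function_call["arguments"])
    return json.dumps(func(**arguments))


# A hand-written call standing in for what the model would emit.
result = run_function_call({
    "name": "get_current_temperature",
    "arguments": '{"location": "Hangzhou"}',
})
print(result)
```

In a full loop, the JSON result would be appended to the conversation as a function message so that the model can phrase the final answer.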

Example Code

The complete example below shows how to call Qwen2.5 from Python, from loading the model and tokenizer to generating the output.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=512)
generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)