Hands-On Testing of China's Open-Source Large AI Models

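We put the AI models through rigorous tests in challenging, real-world scenarios. The evaluation focuses on conversational ability, logical reasoning, mathematical problem solving, and coding performance, in order to identify each model's strengths and weaknesses.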

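1. Conversational ability test:

Let's start with a familiar challenge: AI models are widely known to struggle with tasks such as counting how many times a particular letter appears in a word. Let's see how DeepSeek R1 and 通義千問Max handle it. (I misspelled the word on purpose.)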

How many letters 'r' are in the word strawbery?

通義千問Max output

There are **2** instances of the letter 'r' in "strawbery."

Correct answer: yes

DeepSeek R1 output

The word “strawbery” (as spelled) contains 2 instances of the letter ‘r’. Here’s the breakdown:

S
T
R (first ‘r’)
A
W
B
E
R (second ‘r’)
Y
Note: The correct spelling of the fruit is “strawberry” (with two 'r’s and two 'b’s), but the count here is based on the provided spelling.

Correct answer: yes
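
The count both models report can be checked directly. The following is a minimal sketch in Python; the helper name count_letter is made up for this illustration and does not come from either model.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


# The spelling used in the prompt (deliberately misspelled).
misspelled = count_letter("strawbery", "r")
# The standard spelling, for comparison.
standard = count_letter("strawberry", "r")

print(misspelled, standard)
```

The misspelled word from the prompt contains two r's, which matches both answers; the standard spelling "strawberry" contains three.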


2. Logical reasoning test:

The model is given a passage and several candidate answers, and its task is to pick the correct one.

The medieval Arabs had many manuscripts of the ancient Greek. 
When needed, they translate them into Arabic.
Medieval Arab philosophers were very interested in Aristotle's Theory of Poetry,
which was obviously not shared by Arab poets,
because a poet interested in it must want to read Homer's poems.
Aristotle himself often quotes Homer's poems.
However, Homer's poems were not translated into Arabic until modern times.

Which of the following options, if true, strongly supports the above argument?
Please only return the letter in front of your predict correct option, A, B, C or D.

A. Some medieval Arab translators have manuscripts of Homer poems in ancient Greek.

B. Aristotle's Theory of Poetry is often quoted and commented by modern Arab poets.

C. In Aristotle's Theory of Poetry, most of the content is related to drama,
and medieval Arabs also wrote plays and performed them.

D. A series of medieval Arab stories, such as Arab Night,
are very similar to some parts of Homer's epic.

通義千問Max output

A

Correct answer: yes
通義千問Max completed this task successfully.

DeepSeek R1 output

A

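Correct answer: no
DeepSeek R1 is marked as failing this task, even though the output shown above is the same letter that 通義千問Max returned.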


3. Creativity test:

Write a short poem about friendship

通義千問Max output

In life’s journey, paths may twist and turn,
Friendship’s light forever will burn.
A bond so pure, a heart’s true find,
Through every season, love intertwined.

Hand in hand, we face the storm,
With gentle words, a healing balm.
Laughter shared, or silent gaze,
In friendships deep, our souls find ways.

To lift, support, and understand—
An open heart, a helping hand.
So cherish near, each kindred soul,
For friendships make us whole.

Rating: 4/5
A solid, heartfelt poem, though somewhat predictable in its execution.

DeepSeek R1 output

Bonds Unseen

In laughter’s echo, through silent tears,
A thread weaves softly, spanning years—
Not bound by time, nor duty’s chain,
But roots that hold through wind and rain.

Shared sunsets, secrets whispered low,
A shelter when the cold winds blow.
No score, no count, just open hands,
A map of trust where each soul stands.

When shadows stretch and paths divide,
Your voice becomes my steady guide.
Not perfect, flawed, yet fiercely true,
A mirror showing the best in you.

For friendship’s not a blaze, bright-brief,
But embers glowing, slow, beneath—
A quiet flame that time won’t dim,
The quiet hum of home within.

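Rating: 4.5/5
A vivid, thought-provoking poem; it is less immediately accessible but rewards a closer reading.

It is hard to say which poem is better, because each has its own merits, and opinions about poetry are highly subjective and vary from reader to reader. The ratings above are only my personal view of each poem.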


4. Math test:

I tested the models on several different tasks, and they completed all of them successfully. 通義千問Max, however, struggled with this one.

The expression \(2 \cdot 3 \cdot 4 \cdot 5 + 1 = 121\), as multiplication is performed before addition.
However, other values can be obtained by inserting parentheses into the expression.
For example:
\[ (2 \cdot (3 \cdot 4)) \cdot (5 + 1) = 144 \]
How many distinct values can be obtained by inserting parentheses into
\(2 \cdot 3 \cdot 4 \cdot 5 + 1\), without rearranging the terms?

通義千問Max output

To solve this problem, we need to analyze the different ways parentheses can be inserted into the expression \( 2 \cdot 3 \cdot 4 \cdot 5 + 1 \), while ensuring that the terms are not rearranged. The goal is to determine how many distinct values can be obtained.

Step 1: Understand the structure of the expression
The original expression is: \[ 2 \cdot 3 \cdot 4 \cdot 5 + 1 \] Here, multiplication has higher precedence than addition. Without parentheses, the default evaluation order is: \[ (2 \cdot 3 \cdot 4 \cdot 5) + 1 = 120 + 1 = 121 \]

By inserting parentheses, we can change the grouping of terms and thus alter the result. However, the order of the terms \( 2, 3, 4, 5, +1 \) must remain fixed.

Step 2: Count the number of ways to insert parentheses
The expression consists of four multiplications and one addition. To evaluate it, we must group the terms using parentheses. This is equivalent to finding all possible binary trees with four leaves (one for each number \( 2, 3, 4, 5 \)) and a single addition operation.

Catalan numbers and binary tree enumeration
The number of ways to parenthesize \( n \) terms in an expression is given by the \( (n-1) \)-th Catalan number. For four terms (\( n = 4 \)), the number of ways to parenthesize them is: \[ C_3 = \frac{1}{3+1} \binom{6}{3} = \frac{1}{4} \cdot 20 = 5 \] Thus, there are 5 ways to parenthesize the multiplication part of the expression \( 2 \cdot 3 \cdot 4 \cdot 5 \).

For each of these 5 groupings, the addition \( +1 \) can either be applied last or grouped earlier with some subset of the terms. This introduces additional possibilities.

Step 3: Evaluate all possible parenthesizations
We now systematically evaluate all distinct results that can arise from the different parenthesizations.

Case 1: Default order (no parentheses)
Without any parentheses, the expression evaluates as: \[ (2 \cdot 3 \cdot 4 \cdot 5) + 1 = 120 + 1 = 121 \]

Case 2: Grouping two terms at a time
We consider all possible ways to group two terms together first, then proceed with the rest of the computation.

Group \( (2 \cdot 3) \): \[ ((2 \cdot 3) \cdot 4 \cdot 5) + 1 = (6 \cdot 4 \cdot 5) + 1 = (24 \cdot 5) + 1 = 120 + 1 = 121 \] Alternatively: \[ (2 \cdot 3) \cdot (4 \cdot 5) + 1 = 6 \cdot 20 + 1 = 120 + 1 = 121 \]

Group \( (3 \cdot 4) \): \[ (2 \cdot (3 \cdot 4) \cdot 5) + 1 = (2 \cdot 12 \cdot 5) + 1 = (24 \cdot 5) + 1 = 120 + 1 = 121 \] Alternatively: \[ (2 \cdot 3) \cdot (4 \cdot 5) + 1 = 6 \cdot 20 + 1 = 120 + 1 = 121 \]

Group \( (4 \cdot 5) \): \[ (2 \cdot 3 \cdot (4 \cdot 5)) + 1 = (2 \cdot 3 \cdot 20) + 1 = (6 \cdot 20) + 1 = 120 + 1 = 121 \]

Case 3: Grouping three terms together
We now consider cases where three terms are grouped together first.

Group \( (2 \cdot 3 \cdot 4) \): \[ ((2 \cdot 3 \cdot 4) \cdot 5) + 1 = (24 \cdot 5) + 1 = 120 + 1 = 121 \] Alternatively: \[ (2 \cdot 3 \cdot 4) + 1 \cdot 5 = 24 + 5 = 29 \]

Group \( (3 \cdot 4 \cdot 5) \): \[ (2 \cdot (3 \cdot 4 \cdot 5)) + 1 = (2 \cdot 60) + 1 = 120 + 1 = 121 \] Alternatively: \[ 2 \cdot (3 \cdot 4 \cdot 5 + 1) = 2 \cdot 61 = 122 \]

Case 4: Other combinations
Finally, we check other possible groupings:

Group \( (2 \cdot (3 \cdot 4)) \): \[ ((2 \cdot (3 \cdot 4)) \cdot 5) + 1 = (2 \cdot 12 \cdot 5) + 1 = (24 \cdot 5) + 1 = 120 + 1 = 121 \] Alternatively: \[ (2 \cdot (3 \cdot 4)) + 1 \cdot 5 = 24 + 5 = 29 \]

Group \( ((2 \cdot 3) \cdot 4) \): \[ (((2 \cdot 3) \cdot 4) \cdot 5) + 1 = (6 \cdot 4 \cdot 5) + 1 = (24 \cdot 5) + 1 = 120 + 1 = 121 \] Alternatively: \[ ((2 \cdot 3) \cdot 4) + 1 \cdot 5 = 24 + 5 = 29 \]

Step 4: Identify distinct values
From the above evaluations, the distinct values obtained are: \[ 121, 29, 122 \]

Thus, the total number of distinct values is: \[ \boxed{3} \]

Correct answer: no
通義千問Max failed because of a reasoning error.

DeepSeek R1 output

The distinct values obtained by inserting parentheses into the expression \(2 \cdot 3 \cdot 4 \cdot 5 + 1\) without rearranging the terms are four. Each value corresponds to incorporating the addition of 1 at different stages of the multiplication process:

Adding 1 at the very end: \((2 \cdot 3 \cdot 4 \cdot 5) + 1 = 121\).
Adding 1 to the product of 3, 4, and 5: \(2 \cdot (3 \cdot 4 \cdot 5 + 1) = 122\).
Adding 1 to the product of 4 and 5: \(2 \cdot 3 \cdot (4 \cdot 5 + 1) = 126\).
Adding 1 to 5 first: \(2 \cdot 3 \cdot 4 \cdot (5 + 1) = 144\).
Each grouping leads to a distinct result, and due to the associative property of multiplication, different parenthesizations within the multiplications do not yield new values. Thus, the number of distinct values is 4.

\(\boxed{4}\)

Correct answer: yes
DeepSeek R1 completed the task successfully.
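
The answer to this problem can be checked by brute force. The sketch below is not taken from either model's response; it assumes that "inserting parentheses" means choosing any full grouping of the fixed operand sequence 2, 3, 4, 5, 1 with the operators between them left in place, and it collects the distinct results. The names NUMS, OPS, and values are chosen for this illustration.

```python
from functools import lru_cache

# Operands and the operators between them, in their fixed order:
# 2 * 3 * 4 * 5 + 1
NUMS = (2, 3, 4, 5, 1)
OPS = ("*", "*", "*", "+")


@lru_cache(maxsize=None)
def values(lo: int, hi: int) -> frozenset:
    """Return every value NUMS[lo..hi] can take under some full parenthesization."""
    if lo == hi:
        return frozenset({NUMS[lo]})
    results = set()
    # OPS[k] sits between NUMS[k] and NUMS[k + 1]; make it the last operation applied.
    for k in range(lo, hi):
        for left in values(lo, k):
            for right in values(k + 1, hi):
                results.add(left * right if OPS[k] == "*" else left + right)
    return frozenset(results)


distinct = sorted(values(0, len(NUMS) - 1))
print(distinct, len(distinct))
```

Under this enumeration the distinct results are 121, 122, 126, and 144, which agrees with the four values DeepSeek R1 lists.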


5. Algorithm test:

Given two sorted arrays nums1 and nums2 of size m and n respectively,
return the median of the two sorted arrays.
The overall run time complexity should be O(log (m+n))

Example:
Input: nums1 = [1,3], nums2 = [2]
Output: 2.00000
Explanation: merged array = [1,2,3] and median is 2.

通義千問Max output

def findMedianSortedArrays(nums1, nums2):
    # Ensure nums1 is the smaller array
    if len(nums1) > len(nums2):
        nums1, nums2 = nums2, nums1

    m, n = len(nums1), len(nums2)
    imin, imax, half_len = 0, m, (m + n + 1) // 2

    while imin <= imax:
        i = (imin + imax) // 2
        j = half_len - i

        if i < m and nums2[j-1] > nums1[i]:
            # i is too small, increase it
            imin = i + 1
        elif i > 0 and nums1[i-1] > nums2[j]:
            # i is too big, decrease it
            imax = i - 1
        else:
            # i is perfect
            max_of_left = 0
            if i == 0:
                max_of_left = nums2[j-1]
            elif j == 0:
                max_of_left = nums1[i-1]
            else:
                max_of_left = max(nums1[i-1], nums2[j-1])

            if (m + n) % 2 == 1:
                return max_of_left

            min_of_right = 0
            if i == m:
                min_of_right = nums2[j]
            elif j == n:
                min_of_right = nums1[i]
            else:
                min_of_right = min(nums1[i], nums2[j])

            return (max_of_left + min_of_right) / 2

# Example usage:
nums1 = [1, 3]
nums2 = [2]
print(findMedianSortedArrays(nums1, nums2)) # Output: 2.0

Rating: 5/5
通義千問Max provided an excellent solution, among the best.


DeepSeek R1 output

def findMedianSortedArrays(nums1, nums2):
    if len(nums1) > len(nums2):
        nums1, nums2 = nums2, nums1

    m, n = len(nums1), len(nums2)
    total_left = (m + n + 1) // 2

    low, high = 0, m

    while low <= high:
        partition_nums1 = (low + high) // 2
        partition_nums2 = total_left - partition_nums1

        max_left_nums1 = float('-inf') if partition_nums1 == 0 else nums1[partition_nums1 - 1]
        min_right_nums1 = float('inf') if partition_nums1 == m else nums1[partition_nums1]

        max_left_nums2 = float('-inf') if partition_nums2 == 0 else nums2[partition_nums2 - 1]
        min_right_nums2 = float('inf') if partition_nums2 == n else nums2[partition_nums2]

        if max_left_nums1 <= min_right_nums2 and max_left_nums2 <= min_right_nums1:
            if (m + n) % 2 == 0:
                return (max(max_left_nums1, max_left_nums2) + min(min_right_nums1, min_right_nums2)) / 2
            else:
                return max(max_left_nums1, max_left_nums2)
        elif max_left_nums1 > min_right_nums2:
            high = partition_nums1 - 1
        else:
            low = partition_nums1 + 1
    return 0.0

Rating: 4.5/5
DeepSeek R1 provided a nearly perfect solution.
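
One way to compare solutions like these is to run them against a slow but obviously correct reference on random inputs. The sketch below is an assumption about how such a check might look, not the procedure used for the ratings above; reference_median and check_median_solver are hypothetical helper names, and the reference runs in O((m+n) log(m+n)), so it is only suitable for verification.

```python
import random
import statistics


def reference_median(nums1, nums2):
    """Slow reference: merge, sort, and take the median."""
    return float(statistics.median(sorted(nums1 + nums2)))


def check_median_solver(solver, trials=1000, seed=0):
    """Return the first failing case as (nums1, nums2, expected, got), or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = sorted(rng.randint(-50, 50) for _ in range(rng.randint(0, 6)))
        b = sorted(rng.randint(-50, 50) for _ in range(rng.randint(0, 6)))
        if not a and not b:
            continue  # the median of two empty arrays is undefined
        expected = reference_median(a, b)
        got = solver(a, b)
        if abs(got - expected) > 1e-9:
            return (a, b, expected, got)
    return None


# The reference trivially agrees with itself, so no failing case is reported.
print(check_median_solver(reference_median))
```

Passing either model's findMedianSortedArrays as the solver argument would exercise it on the same inputs, including cases where one of the two arrays is empty.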


6. Refactoring and optimization test:

This function, form_valid, processes a valid form 
submission for a purchase in a Django view. Optimize and refactor it.

def form_valid(self, form):
    email = form.cleaned_data['email']
    tel = form.cleaned_data['tel']
    country = form.cleaned_data['country']
    city = form.cleaned_data['city']
    street = form.cleaned_data['street']
    user = self.request.user
    products = Basket.objects.values_list(
        'product',
        flat=True
    ).filter(user=self.request.user)
    total_price = Basket.objects.filter(
        user=self.request.user
    ).aggregate(
        total_price=Sum(F('product__price') * F('quantity'))
    )
    purchase = Purchase.objects.create(
        user=user,
        total_price=total_price.get('total_price'),
        email=email,
        tel=tel,
        country=country,
        city=city,
        street=street
    )
    purchase.products.add(*products)
    Basket.objects.filter(user=self.request.user).delete()
    return redirect('history_purchases')

通義千問Max output

from django.db.models import F, Sum

def form_valid(self, form):
    # Extract cleaned data from the form
    user = self.request.user
    basket_products = Basket.objects.filter(user=user)

    # Aggregate cleaned data for Purchase creation
    purchase_data = {
        'user': user,
        'email': form.cleaned_data['email'],
        'tel': form.cleaned_data['tel'],
        'country': form.cleaned_data['country'],
        'city': form.cleaned_data['city'],
        'street': form.cleaned_data['street'],
    }

    # Calculate total price using a single query
    total_price = basket_products.aggregate(
        total_price=Sum(F('product__price') * F('quantity'))
    )['total_price'] or 0  # Handle case where total_price might be None

    # Create the Purchase instance
    purchase = Purchase.objects.create(**purchase_data, total_price=total_price)

    # Add products from the basket to the purchase
    product_ids = basket_products.values_list('product', flat=True)
    purchase.products.add(*product_ids)

    # Clear the user's basket after the purchase is completed
    basket_products.delete()

    return super().form_valid(form)

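Rating: 4/5
Pros:
– Fetching the related product objects efficiently minimizes database queries when product attributes (for example, product__price) are accessed.
– The total_price calculation is kept outside the transaction block, which reduces the complexity inside the transaction.

Cons:
– [item.product for item in shopping cart_items] builds a list in memory, which is unnecessary because the IDs can be fetched directly.
– Duplicated logic (extracting the form data and the basket products could be organized more compactly).

DeepSeek R1 output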

def form_valid(self, form):
    # Get user once and reuse
    user = self.request.user

    # Get form data in one dictionary operation
    form_data = {
        'email': form.cleaned_data['email'],
        'tel': form.cleaned_data['tel'],
        'country': form.cleaned_data['country'],
        'city': form.cleaned_data['city'],
        'street': form.cleaned_data['street']
    }

    # Get basket data in one query using select_related for better performance
    basket_queryset = Basket.objects.filter(user=user).select_related('product')

    # Calculate total price and get product IDs in a single iteration
    total_price = 0
    product_ids = []

    for item in basket_queryset:
        total_price += item.product.price * item.quantity
        product_ids.append(item.product.id)

    # Create purchase with all data
    purchase = Purchase.objects.create(
        user=user,
        total_price=total_price,
        **form_data
    )

    # Add products using the collected product IDs
    purchase.products.add(*product_ids)

    # Clear the basket (assuming this was intended from the truncated original)
    basket_queryset.delete()

    return super().form_valid(form)

Rating: 4/5


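Pros:
– Uses values_list('product', flat=True) to fetch product IDs directly from the basket without loading full objects into memory.
– A transaction block wraps all database operations, which makes the critical section of the code easier to identify.
– Extracting and passing the form data is concise and integrated with the Purchase.objects.create statement.

Cons:
– The code does not use select_related, which can cause extra queries if products are accessed (for example, to look up their prices).
– The compactness costs a little clarity, especially for beginners, because all of the logic sits inside the transaction block.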

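Compare them yourself

Although we have highlighted some strengths, the model best suited to your particular task depends on your needs. The tests above compared two language models, 通義千問Max and DeepSeek R1. Try different prompts and see which model gives you the most relevant and impressive results!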

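Conclusion

Test | 通義千問Max | DeepSeek R1
1. Chat | Pass | Pass
2. Logical reasoning | Pass | Fail
3. Creativity | 4/5 | 4.5/5
4. Math | Fail | Pass
5. Algorithms | 5/5 | 4.5/5
6. Refactoring and optimization | 4/5 | 4/5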

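Strengths and weaknesses of each model

通義千問Max

Strengths:

  1. Language understanding and generation
  2. Multimodal interaction
  3. Domain adaptability
  4. Code and tool use
  5. Context memory

Weaknesses:

  1. Dependence on real-time data
  2. Complex mathematical derivation
  3. Limited creative depth
  4. Bottlenecks in parallel multitasking

DeepSeek R1

Strengths:

Weaknesses:

Summary

When to use 通義千問Max:

When to use DeepSeek R1:

通義千問Max stands out in multilingual processing, complex dialogue, and data analysis, and is especially strong at educational consulting and cross-cultural tasks. DeepSeek R1 has the edge in advanced mathematics, symbolic reasoning, and memory-efficient coding, which makes it better suited to academic research and resource-constrained development. The two models are clearly differentiated, so the choice should follow the core needs of the project: for global-facing services or content with compliance and safety requirements, prefer 通義千問Max; for theoretical derivation or hardware-limited scenarios, DeepSeek R1 is the better fit.

To choose on the basis of further dimensions, such as basic technical parameters, service stability, and online reputation, you can select the providers you are interested in and generate a comparison report.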
