Hands-On Testing of Chinese Open-Source AI Large Models

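We put AI models through rigorous tests in challenging real-world scenarios. Our evaluation focuses on key areas such as conversational ability, logical reasoning, mathematical problem solving, and coding performance, in order to assess each model's strengths and weaknesses.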

1. Conversational ability test:

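Let's start with a familiar challenge: many people know that AI models often struggle with tasks such as counting how many times a particular letter appears in a word. Let's see how DeepSeek R1 and 通義千問Max handle this task. (I misspelled the word on purpose.)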

How many letters 'r' are in the word strawbery?

通義千問Max output

There are **2** instances of the letter 'r' in "strawbery."

Correct answer: yes

DeepSeek R1 output

The word “strawbery” (as spelled) contains 2 instances of the letter ‘r’. Here’s the breakdown:



S

T

R (first ‘r’)

A

W

B

E

R (second ‘r’)

Y

Note: The correct spelling of the fruit is “strawberry” (with two 'r’s and two 'b’s), but the count here is based on the provided spelling.

Correct answer: yes
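The expected count is easy to confirm mechanically. The following is a minimal sketch, assuming that plain case-insensitive character matching is what the prompt intends; the helper name `count_letter` is made up for illustration.

```python
# Count a letter by plain character matching (case-insensitive).
# `count_letter` is a hypothetical helper, not part of either model's output.
def count_letter(word: str, letter: str) -> int:
    return word.lower().count(letter.lower())

misspelled = count_letter("strawbery", "r")   # the spelling used in the prompt
standard = count_letter("strawberry", "r")    # the dictionary spelling
print(misspelled, standard)
```

For the misspelled prompt the count is 2, which matches both model answers; the standard spelling would give 3.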


2. Logical reasoning test:

The model is given a passage and several possible answers. Its task is to pick the correct one.

The medieval Arabs had many manuscripts of the ancient Greek. 

When needed, they translate them into Arabic. 

Medieval Arab philosophers were very interested in Aristotle's Theory of Poetry, 

which was obviously not shared by Arab poets, 

because a poet interested in it must want to read Homer's poems. 

Aristotle himself often quotes Homer's poems. 

However, Homer's poems were not translated into Arabic until modern times.



Which of the following options, if true, strongly supports the above argument? 

Please only return the letter in front of your predict correct option, A, B, C or D.



A. Some medieval Arab translators have manuscripts of Homer poems in ancient Greek.



B. Aristotle's Theory of Poetry is often quoted and commented by modern Arab poets.



C. In Aristotle's Theory of Poetry, most of the content is related to drama, 

and medieval Arabs also wrote plays and performed them.



D. A series of medieval Arab stories, such as Arab Night, 

are very similar to some parts of Homer's epic.

通義千問Max output

Correct answer: yes
通義千問Max completed this task successfully

DeepSeek R1 output

Correct answer: no
DeepSeek R1 did not complete this task successfully


3. Creativity test:

Write a short poem about friendship

通義千問Max output

In life’s journey, paths may twist and turn,

Friendship’s light forever will burn.

A bond so pure, a heart’s true find,

Through every season, love intertwined.



Hand in hand, we face the storm,

With gentle words, a healing balm.

Laughter shared, or silent gaze,

In friendships deep, our souls find ways.



To lift, support, and understand—

An open heart, a helping hand.

So cherish near, each kindred soul,

For friendships make us whole.

Rating: 4/5
This is a solid, heartfelt poem, but its execution is somewhat predictable.

DeepSeek R1 output

Bonds Unseen



In laughter’s echo, through silent tears,

A thread weaves softly, spanning years—

Not bound by time, nor duty’s chain,

But roots that hold through wind and rain.



Shared sunsets, secrets whispered low,

A shelter when the cold winds blow.

No score, no count, just open hands,

A map of trust where each soul stands.



When shadows stretch and paths divide,

Your voice becomes my steady guide.

Not perfect, flawed, yet fiercely true,

A mirror showing the best in you.



For friendship’s not a blaze, bright-brief,

But embers glowing, slow, beneath—

A quiet flame that time won’t dim,

The quiet hum of home within.

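Rating: 4.5/5
This is a vivid, thought-provoking poem. It is less immediately accessible, but it rewards closer reading.

It is hard to judge which poem is better, since each has its own merits. Views on poetry are also highly subjective and vary from person to person. The ratings above are only my personal opinion of each poem.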


4. Math test:

I tested the models on several different tasks, and they completed all of them. However, 通義千問Max struggled with this one.

The expression 2⋅3⋅4⋅5+1=121, as multiplication is performed before addition. 

However, other values can be obtained by inserting parentheses into the expression. 

For example:

(2⋅(3⋅4))⋅(5+1)=144

How many distinct values can be obtained by inserting parentheses into 

2⋅3⋅4⋅5+1, without rearranging the terms?

通義千問Max output

To solve this problem, we need to analyze the different ways parentheses can be inserted into the expression \( 2 \cdot 3 \cdot 4 \cdot 5 + 1 \), while ensuring that the terms are not rearranged. The goal is to determine how many distinct values can be obtained.



Step 1: Understand the structure of the expression

The original expression is: \[ 2 \cdot 3 \cdot 4 \cdot 5 + 1 \] Here, multiplication has higher precedence than addition. Without parentheses, the default evaluation order is: \[ (2 \cdot 3 \cdot 4 \cdot 5) + 1 = 120 + 1 = 121 \]



By inserting parentheses, we can change the grouping of terms and thus alter the result. However, the order of the terms \( 2, 3, 4, 5, +1 \) must remain fixed.



Step 2: Count the number of ways to insert parentheses

The expression consists of four multiplications and one addition. To evaluate it, we must group the terms using parentheses. This is equivalent to finding all possible binary trees with four leaves (one for each number \( 2, 3, 4, 5 \)) and a single addition operation.



Catalan numbers and binary tree enumeration

The number of ways to parenthesize \( n \) terms in an expression is given by the \( (n-1) \)-th Catalan number. For four terms (\( n = 4 \)), the number of ways to parenthesize them is: \[ C_3 = \frac{1}{3+1} \binom{6}{3} = \frac{1}{4} \cdot 20 = 5 \] Thus, there are 5 ways to parenthesize the multiplication part of the expression \( 2 \cdot 3 \cdot 4 \cdot 5 \).



For each of these 5 groupings, the addition \( +1 \) can either be applied last or grouped earlier with some subset of the terms. This introduces additional possibilities.



Step 3: Evaluate all possible parenthesizations

We now systematically evaluate all distinct results that can arise from the different parenthesizations.



Case 1: Default order (no parentheses)

Without any parentheses, the expression evaluates as: \[ (2 \cdot 3 \cdot 4 \cdot 5) + 1 = 120 + 1 = 121 \]



Case 2: Grouping two terms at a time

We consider all possible ways to group two terms together first, then proceed with the rest of the computation.



Group \( (2 \cdot 3) \): \[ ((2 \cdot 3) \cdot 4 \cdot 5) + 1 = (6 \cdot 4 \cdot 5) + 1 = (24 \cdot 5) + 1 = 120 + 1 = 121 \] Alternatively: \[ (2 \cdot 3) \cdot (4 \cdot 5) + 1 = 6 \cdot 20 + 1 = 120 + 1 = 121 \]



Group \( (3 \cdot 4) \): \[ (2 \cdot (3 \cdot 4) \cdot 5) + 1 = (2 \cdot 12 \cdot 5) + 1 = (24 \cdot 5) + 1 = 120 + 1 = 121 \] Alternatively: \[ (2 \cdot 3) \cdot (4 \cdot 5) + 1 = 6 \cdot 20 + 1 = 120 + 1 = 121 \]



Group \( (4 \cdot 5) \): \[ (2 \cdot 3 \cdot (4 \cdot 5)) + 1 = (2 \cdot 3 \cdot 20) + 1 = (6 \cdot 20) + 1 = 120 + 1 = 121 \]



Case 3: Grouping three terms together

We now consider cases where three terms are grouped together first.



Group \( (2 \cdot 3 \cdot 4) \): \[ ((2 \cdot 3 \cdot 4) \cdot 5) + 1 = (24 \cdot 5) + 1 = 120 + 1 = 121 \] Alternatively: \[ (2 \cdot 3 \cdot 4) + 1 \cdot 5 = 24 + 5 = 29 \]



Group \( (3 \cdot 4 \cdot 5) \): \[ (2 \cdot (3 \cdot 4 \cdot 5)) + 1 = (2 \cdot 60) + 1 = 120 + 1 = 121 \] Alternatively: \[ 2 \cdot (3 \cdot 4 \cdot 5 + 1) = 2 \cdot 61 = 122 \]



Case 4: Other combinations

Finally, we check other possible groupings:



Group \( (2 \cdot (3 \cdot 4)) \): \[ ((2 \cdot (3 \cdot 4)) \cdot 5) + 1 = (2 \cdot 12 \cdot 5) + 1 = (24 \cdot 5) + 1 = 120 + 1 = 121 \] Alternatively: \[ (2 \cdot (3 \cdot 4)) + 1 \cdot 5 = 24 + 5 = 29 \]



Group \( ((2 \cdot 3) \cdot 4) \): \[ (((2 \cdot 3) \cdot 4) \cdot 5) + 1 = (6 \cdot 4 \cdot 5) + 1 = (24 \cdot 5) + 1 = 120 + 1 = 121 \] Alternatively: \[ ((2 \cdot 3) \cdot 4) + 1 \cdot 5 = 24 + 5 = 29 \]



Step 4: Identify distinct values

From the above evaluations, the distinct values obtained are: \[ 121, 29, 122 \]



Thus, the total number of distinct values is: \[ \boxed{3} \]

Correct answer: no
通義千問Max failed because of a reasoning error

DeepSeek R1 output

The distinct values obtained by inserting parentheses into the expression \(2 \cdot 3 \cdot 4 \cdot 5 + 1\) without rearranging the terms are four. Each value corresponds to incorporating the addition of 1 at different stages of the multiplication process:



Adding 1 at the very end: \((2 \cdot 3 \cdot 4 \cdot 5) + 1 = 121\).

Adding 1 to the product of 3, 4, and 5: \(2 \cdot (3 \cdot 4 \cdot 5 + 1) = 122\).

Adding 1 to the product of 4 and 5: \(2 \cdot 3 \cdot (4 \cdot 5 + 1) = 126\).

Adding 1 to 5 first: \(2 \cdot 3 \cdot 4 \cdot (5 + 1) = 144\).

Each grouping leads to a distinct result, and due to the associative property of multiplication, different parenthesizations within the multiplications do not yield new values. Thus, the number of distinct values is 4.



\(\boxed{4}\)

Correct answer: yes
DeepSeek completed the task successfully
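The answer of 4 can be checked by brute force. The sketch below enumerates every full parenthesization of 2⋅3⋅4⋅5+1 with the operands and operators kept in their original order, and collects the resulting values. The names `OPERANDS`, `OPERATORS`, and `values` are chosen here for illustration and do not come from either model's output.

```python
# Operands and operators of 2*3*4*5+1, in their fixed left-to-right order.
OPERANDS = (2, 3, 4, 5, 1)
OPERATORS = ("*", "*", "*", "+")  # OPERATORS[k] sits between operand k and k+1

def values(lo: int, hi: int) -> set:
    """Return every value obtainable from OPERANDS[lo..hi] (inclusive)
    by choosing which operator is applied last, recursively."""
    if lo == hi:
        return {OPERANDS[lo]}
    results = set()
    for k in range(lo, hi):  # operator k is the last one evaluated
        for left in values(lo, k):
            for right in values(k + 1, hi):
                if OPERATORS[k] == "*":
                    results.add(left * right)
                else:
                    results.add(left + right)
    return results

distinct = sorted(values(0, len(OPERANDS) - 1))
print(distinct, len(distinct))
```

The enumeration yields 121, 122, 126, and 144, so there are four distinct values. The value 29 from the failed attempt never appears, because reaching it would require moving the 5 past the +1.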


5. Algorithm test:

Given two sorted arrays nums1 and nums2 of size m and n respectively,

return the median of the two sorted arrays. 

The overall run time complexity should be O(log (m+n))



Example:

Input: nums1 = [1,3], nums2 = [2]

Output: 2.00000

Explanation: merged array = [1,2,3] and median is 2.

通義千問Max output

def findMedianSortedArrays(nums1, nums2):

    # Ensure nums1 is the smaller array

    if len(nums1) > len(nums2):

        nums1, nums2 = nums2, nums1



    m, n = len(nums1), len(nums2)

    imin, imax, half_len = 0, m, (m + n + 1) // 2



    while imin <= imax:

        i = (imin + imax) // 2

        j = half_len - i



        if i < m and nums2[j-1] > nums1[i]:

            # i is too small, increase it

            imin = i + 1

        elif i > 0 and nums1[i-1] > nums2[j]:

            # i is too big, decrease it

            imax = i - 1

        else:

            # i is perfect

            max_of_left = 0

            if i == 0:

                max_of_left = nums2[j-1]

            elif j == 0:

                max_of_left = nums1[i-1]

            else:

                max_of_left = max(nums1[i-1], nums2[j-1])



            if (m + n) % 2 == 1:

                return max_of_left



            min_of_right = 0

            if i == m:

                min_of_right = nums2[j]

            elif j == n:

                min_of_right = nums1[i]

            else:

                min_of_right = min(nums1[i], nums2[j])



            return (max_of_left + min_of_right) / 2



# Example usage:

nums1 = [1, 3]

nums2 = [2]

print(findMedianSortedArrays(nums1, nums2))  # Output: 2.0

Rating: 5/5
通義千問Max delivered an excellent solution, among the best


DeepSeek R1 output

def findMedianSortedArrays(nums1, nums2):

    if len(nums1) > len(nums2):

        nums1, nums2 = nums2, nums1



    m, n = len(nums1), len(nums2)

    total_left = (m + n + 1) // 2



    low, high = 0, m



    while low <= high:

        partition_nums1 = (low + high) // 2

        partition_nums2 = total_left - partition_nums1



        max_left_nums1 = float('-inf') if partition_nums1 == 0 else nums1[partition_nums1 - 1]

        min_right_nums1 = float('inf') if partition_nums1 == m else nums1[partition_nums1]



        max_left_nums2 = float('-inf') if partition_nums2 == 0 else nums2[partition_nums2 - 1]

        min_right_nums2 = float('inf') if partition_nums2 == n else nums2[partition_nums2]



        if max_left_nums1 <= min_right_nums2 and max_left_nums2 <= min_right_nums1:

            if (m + n) % 2 == 0:

                return (max(max_left_nums1, max_left_nums2) + min(min_right_nums1, min_right_nums2)) / 2

            else:

                return max(max_left_nums1, max_left_nums2)

        elif max_left_nums1 > min_right_nums2:

            high = partition_nums1 - 1

        else:

            low = partition_nums1 + 1

    return 0.0

Rating: 4.5/5
DeepSeek delivered a near-perfect solution
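One way to check either solution independently is to compare it with a brute-force oracle on random inputs. This is a rough sketch rather than the method used for the ratings above: `reference_median`, `two_pointer_median`, and `agrees_with_reference` are hypothetical names, and the linear-time `two_pointer_median` is only a stand-in candidate that does not meet the O(log (m+n)) requirement.

```python
import random
from statistics import median

def reference_median(nums1, nums2):
    """Brute-force oracle: sort the concatenation and take its median."""
    return median(sorted(nums1 + nums2))

def two_pointer_median(nums1, nums2):
    """Stand-in candidate using a linear merge; swap in the function under test."""
    merged, i, j = [], 0, 0
    while i < len(nums1) and j < len(nums2):
        if nums1[i] <= nums2[j]:
            merged.append(nums1[i])
            i += 1
        else:
            merged.append(nums2[j])
            j += 1
    merged.extend(nums1[i:])
    merged.extend(nums2[j:])
    mid = len(merged) // 2
    if len(merged) % 2 == 1:
        return float(merged[mid])
    return (merged[mid - 1] + merged[mid]) / 2

def agrees_with_reference(candidate, trials=500, seed=0):
    """Return True if candidate matches the oracle on random sorted inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        m, n = rng.randint(0, 6), rng.randint(0, 6)
        if m + n == 0:
            continue  # the median of no elements is undefined
        a = sorted(rng.randint(-20, 20) for _ in range(m))
        b = sorted(rng.randint(-20, 20) for _ in range(n))
        if candidate(a, b) != reference_median(a, b):
            return False
    return True

print(agrees_with_reference(two_pointer_median))
```

To check one of the solutions above, pass its `findMedianSortedArrays` function as `candidate`. The small array sizes make empty arrays and duplicate values common, which is where partition-based implementations tend to go wrong.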


6. Refactoring and optimization test:

This function, form_valid, processes a valid form 

submission for a purchase in a Django view. Optimize and refactor it.



def form_valid(self, form):

        email = form.cleaned_data['email']

        tel = form.cleaned_data['tel']

        country = form.cleaned_data['country']

        city = form.cleaned_data['city']

        street = form.cleaned_data['street']

        user = self.request.user

        products = Basket.objects.values_list(

        'product', 

        flat=True

        ).filter(user=self.request.user)

        total_price = Basket.objects.filter(

            user=self.request.user

        ).aggregate(

            total_price=Sum(F('product__price') * F('quantity'))

        )

        purchase = Purchase.objects.create(

            user=user, 

            total_price=total_price.get('total_price'), 

            email=email, 

            tel=tel,

            country=country,

            city=city, 

            street=street

        )

        purchase.products.add(*products)

        Basket.objects.filter(user=self.request.user).delete()

        return redirect('history_purchases')

通義千問Max output

from django.db.models import F, Sum



def form_valid(self, form):

    # Extract cleaned data from the form

    user = self.request.user

    basket_products = Basket.objects.filter(user=user)



    # Aggregate cleaned data for Purchase creation

    purchase_data = {

        'user': user,

        'email': form.cleaned_data['email'],

        'tel': form.cleaned_data['tel'],

        'country': form.cleaned_data['country'],

        'city': form.cleaned_data['city'],

        'street': form.cleaned_data['street'],

    }



    # Calculate total price using a single query

    total_price = basket_products.aggregate(

        total_price=Sum(F('product__price') * F('quantity'))

    )['total_price'] or 0  # Handle case where total_price might be None



    # Create the Purchase instance

    purchase = Purchase.objects.create(**purchase_data, total_price=total_price)



    # Add products from the basket to the purchase

    product_ids = basket_products.values_list('product', flat=True)

    purchase.products.add(*product_ids)



    # Clear the user's basket after the purchase is completed

    basket_products.delete()



    return super().form_valid(form)

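Rating: 4/5
Pros:
– Efficiently fetching the related product objects minimizes database queries when product attributes (for example, product__price) are accessed.
– The total_price calculation stays outside the transaction block, which reduces the complexity inside the transaction.

Cons:
– [item.product for item in basket_items] builds a list in memory, which is unnecessary because the IDs can be fetched directly.
– Duplicated logic (extracting the form data and the basket products could be organized more compactly).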
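The `or 0` fallback in the refactored total_price line guards against an empty basket: an SQL SUM over zero rows yields NULL, which reaches Python as None. The sketch below reproduces that behavior with the standard-library sqlite3 module instead of Django, so the table layout, the sample rows, and the `basket_total` helper are all assumptions made for illustration.

```python
import sqlite3

# In-memory database with a hypothetical two-table layout (not the real models).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE basket (user_id INTEGER, product_id INTEGER, quantity INTEGER);
    INSERT INTO product VALUES (1, 10.0), (2, 2.5);
    INSERT INTO basket VALUES (7, 1, 2), (7, 2, 4);
""")

def basket_total(user_id):
    """Sum price * quantity in the database, as the aggregate query does."""
    row = conn.execute(
        "SELECT SUM(p.price * b.quantity) "
        "FROM basket b JOIN product p ON p.id = b.product_id "
        "WHERE b.user_id = ?",
        (user_id,),
    ).fetchone()
    return row[0]  # None when the user has no basket rows

filled = basket_total(7)          # user with two basket rows
empty = basket_total(99)          # user with no basket rows
safe_total = basket_total(99) or 0
print(filled, empty, safe_total)
```

Without the fallback, an empty basket would pass None on as the purchase total. Whether that is acceptable depends on how the total_price field is defined, which the original snippet does not show.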

DeepSeek R1 output

def form_valid(self, form):

    # Get user once and reuse

    user = self.request.user



    # Get form data in one dictionary operation

    form_data = {

        'email': form.cleaned_data['email'],

        'tel': form.cleaned_data['tel'],

        'country': form.cleaned_data['country'],

        'city': form.cleaned_data['city'],

        'street': form.cleaned_data['street']

    }



    # Get basket data in one query using select_related for better performance

    basket_queryset = Basket.objects.filter(user=user).select_related('product')



    # Calculate total price and get product IDs in a single iteration

    total_price = 0

    product_ids = []



    for item in basket_queryset:

        total_price += item.product.price * item.quantity

        product_ids.append(item.product.id)



    # Create purchase with all data

    purchase = Purchase.objects.create(

        user=user,

        total_price=total_price,

        **form_data

    )



    # Add products using the collected product IDs

    purchase.products.add(*product_ids)



    # Clear the basket (assuming this was intended from the truncated original)

    basket_queryset.delete()



    return super().form_valid(form)

Rating: 4/5


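Pros:
– Uses values_list('product', flat=True) to fetch product IDs directly from the basket without loading full objects into memory.
– The transaction block wraps all database operations, which makes the critical part of the code easier to identify.
– Extracting and passing the form data is concise and integrated into the Purchase.objects.create call.

Cons:
– The code does not use select_related, which can cause extra queries if the products are accessed (for example, to look up their prices).
– The compactness costs a little clarity, especially for beginners, because all of the logic sits inside the transaction block.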

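Compare for yourself

While we have highlighted some strengths, the model that best fits your particular task depends on your needs. The tests above compared two language models: 通義千問Max and DeepSeek R1. Try different prompts and see which model gives you the most relevant and impressive results!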

Conclusion

Test	通義千問Max	DeepSeek R1
1. Chat	Correct	Correct
2. Logical reasoning	Correct	Incorrect
3. Creativity	4/5	4.5/5
4. Math	Incorrect	Correct
5. Algorithms	5/5	4.5/5
6. Refactoring and optimization	4/5	4/5

Strengths and weaknesses of each model

通義千問Max

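Strengths:

Language understanding and generation:

Performs very well in multilingual text processing (such as translation and summarization), and in Chinese contexts in particular captures fine semantic distinctions accurately.
Generates natural, fluent long-form content (such as reports and stories) that is logically coherent and adapts well to the requested style.

Multimodal interaction:

Supports analysis and description of mixed image-and-text input, and responds accurately in visual question answering (VQA) tasks.

Domain adaptability:

Integrates specialist terminology and knowledge bases well in vertical domains (such as law and medicine), producing output that follows industry norms.

Code and tool use:

Calls APIs and writes scripts (Python, SQL, and so on) proficiently, and can draw on external tools (such as Wolfram Alpha) to strengthen its computation.

Context memory:

Maintains good contextual consistency across long conversations, with less repetition or drift off topic.

Weaknesses:

Dependence on real-time data:

May lack up-to-date information on events after 2024 (such as news and technology developments) and has to rely on external retrieval to fill the gap.

Complex mathematical derivation:

May skip steps or make verification errors in advanced mathematics (such as topology and nonlinear equations).

Limits on creative depth:

In artistic work (such as music composition or describing abstract paintings), its originality falls somewhat short of human experts.

Multitask bottleneck:

When handling several heterogeneous tasks at once (such as real-time translation plus code generation), response speed or accuracy may drop.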

Deepseek R1

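Strengths:

Mathematical problem solving: handles advanced mathematical tasks well and shows strong reasoning ability.
Graduate-level logic: excels at tasks that need careful logical reasoning (for example, the GPQA benchmark).
Coding efficiency: uses memory-optimized approaches in database interactions and provides clean, effective solutions.
Creativity: produces thought-provoking, reflective creative work that emphasizes depth and originality.

Weaknesses:

Web development: adds features such as buttons, but their implementation or functionality sometimes fails.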

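Summary

When to use 通義千問Max:

Multi-turn dialogue and context understanding: good at complex conversations that need long-term memory, such as in-depth consultation or continuous question answering.
Multilingual translation and processing: supports high-accuracy cross-language tasks, including real-time translation and adaptation to cultural context.
Data analysis and visualization: suited to interpreting structured data, statistical inference, and chart generation.
Educational support and knowledge retrieval: answers academic questions efficiently and provides step-by-step learning resources or solution approaches.
Safety and compliance content review: reliably identifies sensitive information and generates responses that follow ethical norms. (Note: as a general-purpose large model, 通義千問Max covers a wider range of capabilities; the scenarios above are where it stands out most.)

When to use DeepSeek R1:

Mathematical problem solving: the first choice for advanced mathematical reasoning and problem solving.
Graduate-level logic: well suited to nuanced tasks that need deep logical reasoning.
Creative output: suited to generating deep, reflective, thought-provoking content.
Efficient coding: suited to tasks that need memory optimization and concise implementations.

通義千問Max stands out in multilingual processing, complex dialogue, and data analysis, and is especially strong in educational consulting and cross-cultural tasks. DeepSeek R1 has the edge in advanced mathematics, symbolic reasoning, and memory-efficient coding, and is better suited to academic research and resource-sensitive development. The two are clearly positioned, and the choice should follow the core needs of the project: for globalized services or compliant, safe content, prefer 通義千問Max; for theoretical derivation or hardware-constrained scenarios, DeepSeek R1 is the better fit.

If you want to choose based on further dimensions such as basic technical parameters, service stability, and online reputation, you can select the providers you are interested in and generate a comparison report yourself.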