
To start, let's use a simple example to explain how the Transformer works.

Suppose you are reading this sentence: "Xiaoming went to the store today and bought a book."

The problem with traditional approaches

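Traditional models such as recurrent neural networks (RNNs) read the sentence in order. They see "Xiaoming" first, then "went to the store", then "today", and so on; after each word they update an internal memory (the hidden state) that summarizes everything read so far. This works, but in a long sentence the information from early words tends to fade, that is, it gets "forgotten". If you want to know what Xiaoming actually bought, the word "book" matters most, yet it sits far away from "Xiaoming".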
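To make "one word at a time" concrete, here is a minimal sketch of a vanilla RNN forward pass in NumPy. The weight names `W_x` and `W_h` and all sizes are illustrative assumptions, and the weights are random rather than trained. The point is only that step t cannot start until step t-1 has finished.

```python
import numpy as np

def rnn_forward(X, W_x, W_h, b):
    """Vanilla RNN: h_t = tanh(x_t @ W_x + h_{t-1} @ W_h + b).

    X has shape (seq_len, d_in); returns all hidden states, shape (seq_len, d_h).
    """
    h = np.zeros(W_h.shape[0])          # initial hidden state
    states = []
    for t in range(X.shape[0]):         # strictly sequential: each step needs the previous h
        h = np.tanh(X[t] @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
seq_len, d_in, d_h = 7, 8, 16           # e.g. 7 tokens, toy dimensions
X = rng.normal(size=(seq_len, d_in))    # stand-in word vectors (random, hypothetical)
W_x = rng.normal(scale=0.1, size=(d_in, d_h))
W_h = rng.normal(scale=0.1, size=(d_h, d_h))
b = np.zeros(d_h)

H = rnn_forward(X, W_x, W_h, b)
print(H.shape)  # (7, 16)
```

Information about the first word reaches the last hidden state only by passing through every intermediate update, which is one intuition for why early words fade in long sequences.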

So how does the Transformer solve this?

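Instead of reading the sentence word by word, the Transformer looks at the whole sentence at once and considers how each word relates to every other word.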

For example:

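1. Self-attention: the Transformer asks itself, "How does each word in the sentence relate to the other words?" For example:

"Xiaoming" is related to "bought", because Xiaoming is the one doing the buying.
"bought" is related to "book", because the book is what was bought.
"today" is related to "went to the store", because today is when the trip to the store happened.

2. Weights: these relationships carry "weights", meaning some relationships matter more than others. For example, the relationship between "bought" and "book" is more important than the one between "bought" and "today", because we care more about what was bought. The weights are not set by hand; the model computes them from the words' vector representations, using parameters learned during training.

3. Parallel processing: the Transformer does not process the sentence step by step like traditional models; it processes all the words in parallel. Its self-attention mechanism "sees" every word in the sentence at once and quickly identifies the important relationships between them.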
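The three points above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions: the word vectors are random stand-ins (a real model learns them) and there are no learned projections yet. It shows that one matrix product scores every word against every other word at once, and that a softmax turns each row of scores into weights that sum to 1.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

words = "Xiaoming went to the store today and bought a book".split()
rng = np.random.default_rng(42)
d = 16
E = rng.normal(size=(len(words), d))   # hypothetical random word vectors, one row per word

# One matrix product compares every word with every other word, all in parallel
scores = E @ E.T / np.sqrt(d)          # shape (10, 10)
weights = softmax(scores, axis=1)      # each row sums to 1

# Which words does "bought" attend to most? (random vectors, so the ranking is arbitrary)
i = words.index("bought")
top = np.argsort(weights[i])[::-1][:3]
print([words[j] for j in top])
```

Without learned projections, each word tends to attend mostly to itself, because a vector is usually most similar to itself. The learned query, key and value matrices in a real Transformer are what let "bought" learn to attend to "book".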

Here is another everyday analogy:

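Imagine the Transformer is a schoolchild listening to a friend tell a story. The story is long, and remembering only the beginning is not enough to understand the whole thing. So, while listening, the child quickly builds connections in their head between characters, places, and events:

They know "Xiaoming" is the main character, so they pay special attention whenever Xiaoming does something.
When they hear "bought a book", they immediately connect it to Xiaoming as the one who bought it.
They also know that "store" is related to "buying things", so the store matters too.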

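This way, even when the story is complicated, the child can still grasp its meaning quickly by understanding these connections. That is the idea behind the Transformer.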

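In short, the Transformer's strength is that it processes all the words in a sentence in parallel and uses self-attention to understand how each word relates to the others. As a result, even for long sentences it can capture the meaning quickly and accurately.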

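Next, let's go through the principles, the formulas, and a code example in detail. We won't use an off-the-shelf deep learning framework; the focus is on understanding how it works.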

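Deriving the Transformer formulas

The core of the Transformer is the self-attention mechanism together with positional encoding.

Self-attention (Scaled Dot-Product Attention)

Self-attention is the most important part of the Transformer. Its goal is to let each word (or feature) build connections with every other word. The process consists of several steps: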
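In the standard formulation (the symbols here are our choice of notation), the input matrix X has one row per word. Learned matrices W^Q, W^K and W^V project it into queries, keys and values; every query is compared with every key, the scores are scaled and normalized with a row-wise softmax, and the resulting weights are used to average the values:

```latex
\begin{aligned}
Q &= X W^Q, \qquad K = X W^K, \qquad V = X W^V \\
S &= \frac{Q K^\top}{\sqrt{d_k}} \\
A &= \operatorname{softmax}(S) \quad \text{(applied row by row)} \\
\operatorname{Attention}(Q, K, V) &= A V = \operatorname{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V
\end{aligned}
```

Here d_k is the dimension of the key vectors. Dividing by the square root of d_k keeps the dot products from growing with the dimension, which would otherwise push the softmax into regions where its gradients are very small.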

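Positional encoding

Because the Transformer does not process its input in order the way an RNN does, it uses positional encoding to add information about each word's position. The positional encoding formula is as follows: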
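A common choice, used in the original Transformer, is the sinusoidal encoding: for position pos and dimension index i, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). Below is a minimal NumPy sketch, assuming an even d_model; the resulting matrix is added element-wise to the word embeddings.

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encoding, shape (max_len, d_model). Assumes d_model is even."""
    pos = np.arange(max_len)[:, np.newaxis]            # (max_len, 1)
    i = np.arange(d_model // 2)[np.newaxis, :]         # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)  # (max_len, d_model/2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions use cosine
    return pe

pe = positional_encoding(max_len=50, d_model=64)
print(pe.shape)   # (50, 64)
print(pe[0, :4])  # position 0: sin(0)=0, cos(0)=1 -> [0. 1. 0. 1.]
```

Each dimension oscillates at a different frequency, so every position gets a distinct pattern, and all values stay in [-1, 1], the same scale regardless of sentence length.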

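Building the example

We use a simplified Transformer for a text classification task. To avoid high-level frameworks and make the principles easier to follow, we implement the self-attention mechanism and the model training from scratch.

Kaggle dataset

We use the "IMDb Movie Reviews" dataset from Kaggle for a text classification task (positive/negative sentiment).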


First, we load the dataset and preprocess it:

```python
import re
from collections import Counter

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the IMDb dataset
df = pd.read_csv('IMDB Dataset.csv')

# Preprocessing (simple cleanup)
def clean_text(text):
    # Replace HTML tags such as <br /> with a space so adjacent words don't merge
    text = re.sub(r'<.*?>', ' ', text)
    # Keep only letters and whitespace
    text = re.sub(r'[^a-zA-Z\s]', '', text)
    return text.lower()

df['clean_review'] = df['review'].apply(clean_text)
df['label'] = df['sentiment'].map({'positive': 1, 'negative': 0})

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    df['clean_review'], df['label'], test_size=0.2, random_state=42)

# Build a simple vocabulary from the training set only
vocab = Counter()
for text in X_train:
    vocab.update(text.split())

vocab_size = 5000
vocab = dict(vocab.most_common(vocab_size))

# Map each word to an index; index 0 is reserved for unknown words
word2idx = {word: idx for idx, (word, _) in enumerate(vocab.items(), 1)}
word2idx['<UNK>'] = 0

# Convert text to a sequence of indices
def text_to_sequence(text):
    return [word2idx.get(word, 0) for word in text.split()]

X_train_seq = [text_to_sequence(text) for text in X_train]
X_test_seq = [text_to_sequence(text) for text in X_test]
```
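If the Kaggle CSV is not at hand, the same cleaning and indexing steps can be tried on two made-up reviews. This self-contained sketch also pads or truncates every sequence to a fixed length so a batch can be stacked into one NumPy array for attention. The helper `pad_sequence`, the value of `max_len`, and padding with 0 (which here shares its index with `<UNK>` for simplicity) are assumptions, not part of the original code.

```python
import re
from collections import Counter

import numpy as np

def clean_text(text):
    text = re.sub(r'<.*?>', ' ', text)        # HTML tags -> space
    text = re.sub(r'[^a-zA-Z\s]', '', text)   # keep letters and whitespace only
    return text.lower()

# Two made-up reviews standing in for the IMDb data
reviews = ["A great movie!<br /><br />Loved it.",
           "Terrible plot, terrible acting."]
cleaned = [clean_text(r) for r in reviews]

vocab = Counter()
for text in cleaned:
    vocab.update(text.split())
word2idx = {w: i for i, (w, _) in enumerate(vocab.most_common(5000), 1)}
word2idx['<UNK>'] = 0

def text_to_sequence(text):
    return [word2idx.get(w, 0) for w in text.split()]

def pad_sequence(seq, max_len, pad_value=0):
    # Hypothetical helper: truncate long sequences, right-pad short ones
    seq = seq[:max_len]
    return seq + [pad_value] * (max_len - len(seq))

max_len = 6  # assumption; real reviews would need a much larger value
batch = np.array([pad_sequence(text_to_sequence(t), max_len) for t in cleaned])
print(cleaned[0].split())
print(batch.shape)  # (2, 6)
```

Because the tags become spaces, "movie" and "loved" stay separate tokens instead of merging into "movieloved".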

Implementing self-attention

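```python
import numpy as np

class ScaledDotProductAttention:
    def __init__(self, d_k):
        self.d_k = d_k

    def attention(self, Q, K, V):
        # Scaled similarity scores between every query and every key
        scores = np.dot(Q, K.T) / np.sqrt(self.d_k)
        # Row-wise softmax; subtracting the row max avoids overflow in np.exp
        exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
        attention_weights = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
        # Each output row is a weighted average of the rows of V
        return np.dot(attention_weights, V)

# Example input, assuming we already have word vectors
Q = np.random.rand(10, 64)  # 10 words, 64 dimensions
K = np.random.rand(10, 64)
V = np.random.rand(10, 64)

attention = ScaledDotProductAttention(64)
output = attention.attention(Q, K, V)  # shape (10, 64)
```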
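In the example above, Q, K and V are random. In a model they come from the words themselves: each token index is looked up in an embedding table, the positional encoding is added, and three projection matrices produce Q, K and V. The sketch below wires these steps together for one sequence. The embedding table and the matrices `W_q`, `W_k`, `W_v` are randomly initialized stand-ins for parameters that training would learn, the token indices are made up, and the function also returns the attention weights so they can be inspected.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def positional_encoding(max_len, d_model):
    pos = np.arange(max_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2], pe[:, 1::2] = np.sin(angles), np.cos(angles)
    return pe

def self_attention(token_ids, emb, W_q, W_k, W_v):
    # 1) Embedding lookup + positional encoding -> X, shape (seq_len, d_model)
    X = emb[token_ids] + positional_encoding(len(token_ids), emb.shape[1])
    # 2) Project into queries, keys and values
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    # 3) Scaled dot-product attention
    d_k = K.shape[1]
    weights = softmax(Q @ K.T / np.sqrt(d_k), axis=1)
    return weights @ V, weights

rng = np.random.default_rng(0)
vocab_size, d_model, d_k = 5001, 64, 64  # 5000 words + index 0 for <UNK>
emb = rng.normal(scale=0.1, size=(vocab_size, d_model))  # hypothetical, untrained
W_q, W_k, W_v = (rng.normal(scale=0.1, size=(d_model, d_k)) for _ in range(3))

token_ids = np.array([12, 7, 431, 0, 56])  # made-up indices, as text_to_sequence would produce
out, weights = self_attention(token_ids, emb, W_q, W_k, W_v)
print(out.shape, weights.shape)  # (5, 64) (5, 5)
```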

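Visualization

Now let's do some visual analysis, such as the word-frequency distribution of the dataset and the model's training loss. The following shows how to produce colorful plots: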

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Word-frequency distribution: the 20 most common words in the vocabulary
word_counts = pd.DataFrame(list(vocab.items()), columns=['word', 'count']).sort_values(by='count', ascending=False).head(20)

plt.figure(figsize=(12, 6))
sns.barplot(x='count', y='word', data=word_counts, palette='rainbow')
plt.title('Top 20 Words by Frequency')
plt.show()

# Loss over training (placeholder: random numbers, not a real loss history)
losses = np.random.rand(100)  # stand-in for losses from 100 epochs
plt.figure(figsize=(12, 6))
plt.plot(losses, color='magenta')
plt.title('Training Loss Over Time')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.show()
```
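The loss curve above plots random placeholder numbers. One way to get a real curve without a deep learning framework, sketched here under strong simplifying assumptions, is to keep the embeddings and attention matrices fixed (random), mean-pool the attention output into one vector per sequence, and train only a logistic-regression head with hand-written gradient descent. The function `encode`, the learning rate, and the synthetic token data are hypothetical; positional encoding is omitted for brevity, and training the attention parameters as well would require backpropagating through the softmax, which is not shown.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def encode(token_ids, emb, W_q, W_k, W_v):
    # Frozen self-attention followed by mean pooling -> one feature vector per sequence
    X = emb[token_ids]
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=1)
    return (A @ V).mean(axis=0)

rng = np.random.default_rng(0)
vocab_size, d = 50, 16
emb = rng.normal(size=(vocab_size, d))
W_q, W_k, W_v = (rng.normal(scale=0.3, size=(d, d)) for _ in range(3))

# Synthetic data (hypothetical): label 1 draws tokens from ids 0-24, label 0 from ids 25-49
n, seq_len = 200, 8
y = rng.integers(0, 2, size=n)
seqs = [rng.integers(0, 25, size=seq_len) + 25 * (1 - label) for label in y]
features = np.stack([encode(s, emb, W_q, W_k, W_v) for s in seqs])  # (n, d)
features = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)

# Logistic-regression head trained with full-batch gradient descent
w, b, lr = np.zeros(d), 0.0, 0.1
losses = []
for epoch in range(100):
    p = 1 / (1 + np.exp(-(features @ w + b)))  # predicted P(label = 1)
    eps = 1e-12
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    losses.append(loss)
    grad = p - y                                # d(loss)/d(logit) per example
    w -= lr * features.T @ grad / n
    b -= lr * grad.mean()

acc = np.mean((p > 0.5) == y)
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}, train acc {acc:.2f}")
```

The resulting `losses` list can replace the random placeholder in the plotting code above. Standardizing the features keeps the loss smooth enough that this small learning rate decreases it at every step.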

Full code:

```python
import re
from collections import Counter

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split

# Load the IMDb dataset
df = pd.read_csv('dataset/IMDB Dataset.csv')

# Preprocessing (simple cleanup)
def clean_text(text):
    # Replace HTML tags such as <br /> with a space so adjacent words don't merge
    text = re.sub(r'<.*?>', ' ', text)
    # Keep only letters and whitespace
    text = re.sub(r'[^a-zA-Z\s]', '', text)
    return text.lower()

df['clean_review'] = df['review'].apply(clean_text)
df['label'] = df['sentiment'].map({'positive': 1, 'negative': 0})

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    df['clean_review'], df['label'], test_size=0.2, random_state=42)

# Build a simple vocabulary from the training set only
vocab = Counter()
for text in X_train:
    vocab.update(text.split())

vocab_size = 5000
vocab = dict(vocab.most_common(vocab_size))

# Map each word to an index; index 0 is reserved for unknown words
word2idx = {word: idx for idx, (word, _) in enumerate(vocab.items(), 1)}
word2idx['<UNK>'] = 0

# Convert text to a sequence of indices
def text_to_sequence(text):
    return [word2idx.get(word, 0) for word in text.split()]

X_train_seq = [text_to_sequence(text) for text in X_train]
X_test_seq = [text_to_sequence(text) for text in X_test]


class ScaledDotProductAttention:
    def __init__(self, d_k):
        self.d_k = d_k

    def attention(self, Q, K, V):
        # Scaled similarity scores between every query and every key
        scores = np.dot(Q, K.T) / np.sqrt(self.d_k)
        # Row-wise softmax; subtracting the row max avoids overflow in np.exp
        exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
        attention_weights = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
        # Each output row is a weighted average of the rows of V
        return np.dot(attention_weights, V)

# Example input, assuming we already have word vectors
Q = np.random.rand(10, 64)  # 10 words, 64 dimensions
K = np.random.rand(10, 64)
V = np.random.rand(10, 64)

attention = ScaledDotProductAttention(64)
output = attention.attention(Q, K, V)  # shape (10, 64)

# Word-frequency distribution: the 20 most common words in the vocabulary
word_counts = pd.DataFrame(list(vocab.items()), columns=['word', 'count']).sort_values(by='count', ascending=False).head(20)

plt.figure(figsize=(12, 6))
sns.barplot(x='count', y='word', data=word_counts, palette='rainbow')
plt.title('Top 20 Words by Frequency')
plt.show()

# Loss over training (placeholder: random numbers, not a real loss history)
losses = np.random.rand(100)  # stand-in for losses from 100 epochs
plt.figure(figsize=(12, 6))
plt.plot(losses, color='magenta')
plt.title('Training Loss Over Time')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.show()
```