
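Against this background, D. Lai et al., in an article published in Engineering Applications of Artificial Intelligence (2024), used Taylor diagrams to evaluate several machine learning models (XGBoost, ANN, GPR, and NGBoost) on regression tasks. They measured model performance by comparing the standard deviation of the predictions with that of the observed data, together with the correlation between the two. Inspired by that work, this article applies the same principle to several mainstream regression models, namely XGBoost, Random Forest, and CatBoost, and compares their performance on a Taylor diagram as a way to explore visual evaluation of multiple models.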

Purpose

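A Taylor diagram gives a direct visual comparison of model performance. Each model is placed according to its standard deviation and its correlation coefficient with the observations, and the diagram also carries RMSE contour lines.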

The final Taylor diagram compares each model against the observed data, giving a direct view of the models' statistical characteristics.
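The reason these three quantities fit on one diagram is a law-of-cosines relation between them. The sketch below checks that relation numerically on synthetic data (an assumption for illustration, not the article's dataset). It also shows that the quantity tied to the diagram's geometry is the centered RMS difference, which equals the ordinary RMSE only when the predictions carry no mean bias. The function name `centered_rms_difference` is hypothetical.

```python
import numpy as np

def centered_rms_difference(obs, pred):
    """RMS difference after removing each series' own mean."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return np.sqrt(np.mean(((pred - pred.mean()) - (obs - obs.mean())) ** 2))

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
obs = rng.normal(loc=10.0, scale=2.0, size=500)
pred = 0.8 * obs + rng.normal(scale=1.0, size=500) + 1.5  # biased, noisy "model"

sigma_o = np.std(obs)   # population standard deviation (ddof=0)
sigma_p = np.std(pred)
r = np.corrcoef(obs, pred)[0, 1]

# Law-of-cosines form: E'^2 = sigma_p^2 + sigma_o^2 - 2 * sigma_p * sigma_o * R
from_identity = np.sqrt(sigma_p**2 + sigma_o**2 - 2 * sigma_p * sigma_o * r)
direct = centered_rms_difference(obs, pred)

# Ordinary RMSE additionally contains the mean bias: RMSE^2 = E'^2 + bias^2
rmse = np.sqrt(np.mean((pred - obs) ** 2))
bias = pred.mean() - obs.mean()

print(np.isclose(from_identity, direct))
print(np.isclose(rmse**2, direct**2 + bias**2))
```

Both checks should agree up to floating-point error. In practice this means a model with a constant offset can sit close to the reference point on the diagram while still having a large RMSE, so it is worth reporting RMSE alongside the plot.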

Code implementation

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

plt.rcParams['font.family'] = 'Times New Roman'
plt.rcParams['axes.unicode_minus'] = False

df = pd.read_excel('2024-10-27-公眾號Python機器學習AI.xlsx')

# Separate the features from the target variable
X = df.drop(['待預測變量Y'], axis=1)
y = df['待預測變量Y']

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

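This reads the data from the Excel file, separates the features from the target variable, and then uses train_test_split to divide the dataset into a training set (80%) and a test set (20%).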

import xgboost as xgb
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

# XGBoost model parameters
params_xgb = {
    'learning_rate': 0.02,            # Step size of each boosting step; smaller values help prevent overfitting. Typical range: 0.01 - 0.1
    'booster': 'gbtree',              # Booster type; here, gradient-boosted trees
    'objective': 'reg:squarederror',  # Loss function; here, squared error
    'max_leaves': 127,                # Maximum number of leaves per tree; larger values increase model complexity but may cause overfitting
    'verbosity': 1,                   # Logging level: 0 = silent, 1 = warnings
    'seed': 42,                       # Random seed, for reproducible results
    'nthread': -1,                    # Number of parallel threads; -1 uses all available CPU cores
    'colsample_bytree': 0.6,          # Fraction of features sampled for each tree; improves generalization
    'subsample': 0.7                  # Fraction of samples drawn at each iteration; improves generalization
}

model_xgb = xgb.XGBRegressor(**params_xgb)

# Parameter grid for the grid search
param_grid = {
    'n_estimators': [100, 200],   # Number of trees
    'max_depth': [3, 4],          # Maximum tree depth; limits complexity to prevent overfitting
    'min_child_weight': [1, 2],   # Minimum sum of instance weight in a child; larger values make the algorithm more conservative
}

grid_search = GridSearchCV(
    estimator=model_xgb,
    param_grid=param_grid,
    scoring='neg_root_mean_squared_error',
    cv=5,
    n_jobs=-1,
    verbose=1
)

# Train the model
grid_search.fit(X_train, y_train)
xgboost = grid_search.best_estimator_

# Predict
y_pred = xgboost.predict(X_test)

# Standard deviations
# y_test is converted to an ndarray so that both values use NumPy's population
# standard deviation (ddof=0); pandas' own std defaults to ddof=1
std_dev_pred = np.std(y_pred)
std_dev_obs = np.std(np.asarray(y_test))

# Correlation coefficient
correlation = np.corrcoef(y_test, y_pred)[0, 1]

# Root mean squared error (RMSE)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

# R2
r2 = r2_score(y_test, y_pred)

# Store the metrics in a DataFrame
metrics_df = pd.DataFrame({
    'Model': ['XGBoost'],
    'Standard Deviation (Pred)': [std_dev_pred],
    'Standard Deviation (Observed)': [std_dev_obs],
    'Correlation': [correlation],
    'RMSE': [rmse],
    'R2 Score': [r2]
})
metrics_df

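A grid search tunes the XGBoost hyperparameters. The best model found is then used to predict on the test set X_test, and the following evaluation metrics are computed and saved: the standard deviation of the predictions, the standard deviation of the observations, the correlation coefficient between the two, the RMSE, and R2.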

This demonstration visualizes the test set only; the reference paper does so for the training, validation, and test sets.
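Because the same five metrics are recomputed for every model, they can be gathered into a single helper. The sketch below is one possible way to do that using only NumPy; the function name `regression_summary` is hypothetical and the toy numbers are made up for illustration. It returns a dict with the same column names as metrics_df.

```python
import numpy as np

def regression_summary(model_name, y_true, y_pred):
    """Return the metrics used in this article as a one-row dict."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    ss_res = np.sum(residual ** 2)                   # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return {
        'Model': model_name,
        'Standard Deviation (Pred)': np.std(y_pred),      # ddof=0
        'Standard Deviation (Observed)': np.std(y_true),  # ddof=0
        'Correlation': np.corrcoef(y_true, y_pred)[0, 1],
        'RMSE': np.sqrt(np.mean(residual ** 2)),
        'R2 Score': 1.0 - ss_res / ss_tot,
    }

# Toy values, for illustration only
demo = regression_summary('Demo', [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(demo)
```

With a helper like this, one dict per model could be collected in a list and passed to pd.DataFrame in a single call, instead of concatenating one-row frames.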

from sklearn.ensemble import RandomForestRegressor

# Create the random forest regressor
rf_regressor = RandomForestRegressor(
    random_state=42,
    min_samples_split=2,
    min_samples_leaf=1,
    criterion='squared_error'  # Default for regression: mean squared error
)

# Parameter grid for the grid search
param_grid = {
    'n_estimators': [100, 200],  # Number of trees in the forest
    'max_depth': [None, 10],     # Maximum depth of each tree
}

# Grid search with k-fold cross-validation
grid_search_rf = GridSearchCV(
    estimator=rf_regressor,
    param_grid=param_grid,
    scoring='neg_mean_squared_error',  # A common scoring metric for regression
    cv=5,        # 5-fold cross-validation
    n_jobs=-1,   # Run in parallel
    verbose=1    # Print progress information
)

# Train the model
grid_search_rf.fit(X_train, y_train)

# Report the best parameters
print("Best parameters found: ", grid_search_rf.best_params_)
print("Best negative mean squared error score: ", grid_search_rf.best_score_)

# Model refitted with the best parameters
RF = grid_search_rf.best_estimator_

# Predict
y_pred_RF = RF.predict(X_test)

# Evaluation metrics for the RF model
std_dev_pred_RF = np.std(y_pred_RF)
correlation_RF = np.corrcoef(y_test, y_pred_RF)[0, 1]
rmse_RF = np.sqrt(mean_squared_error(y_test, y_pred_RF))
r2_RF = r2_score(y_test, y_pred_RF)

# New one-row DataFrame holding the RF metrics
new_row = pd.DataFrame({
    'Model': ['RF'],
    'Standard Deviation (Pred)': [std_dev_pred_RF],
    'Standard Deviation (Observed)': [std_dev_obs],  # Same test set, so same value
    'Correlation': [correlation_RF],
    'RMSE': [rmse_RF],
    'R2 Score': [r2_RF]
})

# Append the new row to metrics_df with pd.concat
metrics_df = pd.concat([metrics_df, new_row], ignore_index=True)
metrics_df

All evaluation metrics are stored in a single DataFrame, which simplifies the visualization that follows.

CatBoost

# Import the required library
from catboost import CatBoostRegressor

# CatBoost model parameters
params_catboost = {
    'learning_rate': 0.02,    # Step size of each boosting step
    'depth': 6,               # Tree depth; controls model complexity
    'loss_function': 'RMSE',  # Loss function; RMSE is common for regression
    'verbose': 100,           # Controls how often CatBoost prints progress
    'random_seed': 42,        # Random seed, for reproducible results
    'thread_count': -1,       # Number of parallel threads
    'subsample': 0.7,         # Fraction of samples drawn at each iteration; improves generalization
    'l2_leaf_reg': 3.0        # L2 regularization coefficient; helps prevent overfitting
}

# Initialize the CatBoost regressor
model_catboost = CatBoostRegressor(**params_catboost)

# Parameter grid for the grid search (these values override the defaults above)
param_grid_catboost = {
    'iterations': [100, 200],       # Number of boosting iterations
    'depth': [3, 4],                # Tree depth
    'learning_rate': [0.01, 0.02],  # Learning rate
}

# Grid search with k-fold cross-validation
grid_search_catboost = GridSearchCV(
    estimator=model_catboost,
    param_grid=param_grid_catboost,
    scoring='neg_mean_squared_error',  # Scored by negative mean squared error
    cv=5,        # 5-fold cross-validation
    n_jobs=-1,   # Run in parallel
    verbose=1    # Print progress information
)

# Train the model
grid_search_catboost.fit(X_train, y_train)

# Report the best parameters
print("Best parameters found: ", grid_search_catboost.best_params_)
print("Best RMSE score: ", (-grid_search_catboost.best_score_)**0.5)

# Model refitted with the best parameters
catboost = grid_search_catboost.best_estimator_

# Predict on the test set with the best CatBoost model
y_pred_catboost = catboost.predict(X_test)

# Evaluation metrics for the CatBoost model
std_dev_pred_catboost = np.std(y_pred_catboost)
correlation_catboost = np.corrcoef(y_test, y_pred_catboost)[0, 1]
rmse_catboost = np.sqrt(mean_squared_error(y_test, y_pred_catboost))
r2_catboost = r2_score(y_test, y_pred_catboost)

# New one-row DataFrame holding the CatBoost metrics
new_row_catboost = pd.DataFrame({
    'Model': ['CatBoost'],
    'Standard Deviation (Pred)': [std_dev_pred_catboost],
    'Standard Deviation (Observed)': [std_dev_obs],  # Same test set, so same value
    'Correlation': [correlation_catboost],
    'RMSE': [rmse_catboost],
    'R2 Score': [r2_catboost]
})

# Append the new row to metrics_df with pd.concat
metrics_df = pd.concat([metrics_df, new_row_catboost], ignore_index=True)
metrics_df

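Before the visualization, two metrics deserve particular attention: the correlation coefficient, which measures how closely the predictions follow the observations linearly, and the standard deviation, which measures how much the predictions vary compared with the observations.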

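The best model is therefore one with a high correlation coefficient (close to 1) and a predicted standard deviation close to the observed standard deviation. The visualization that follows uses these metrics on a Taylor diagram to show each model's performance at a glance.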
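To make the "high correlation and matching standard deviation" criterion concrete, the sketch below maps a model's metrics to a position on a Taylor diagram (angle from the correlation, radius from the standard-deviation ratio) and measures its distance to the ideal reference point. The function name `taylor_coordinates` and all numbers are hypothetical; they are not results from this article.

```python
import numpy as np

def taylor_coordinates(std_obs, std_pred, correlation):
    """Map metrics to (theta, radius) and the distance to the reference point."""
    theta = np.arccos(correlation)   # angle: 0 means perfect correlation
    radius = std_pred / std_obs      # normalized standard deviation
    # Distance from (theta, radius) to the reference point (theta=0, radius=1)
    distance = np.sqrt(1.0 + radius**2 - 2.0 * radius * np.cos(theta))
    return theta, radius, distance

# Hypothetical (std_obs, std_pred, correlation) triples, for illustration only
hypothetical = {
    'Model A': (2.0, 1.9, 0.95),
    'Model B': (2.0, 1.2, 0.97),
    'Model C': (2.0, 2.1, 0.80),
}

for name, (s_obs, s_pred, corr) in hypothetical.items():
    theta, radius, distance = taylor_coordinates(s_obs, s_pred, corr)
    print(f"{name}: angle={np.degrees(theta):.1f} deg, radius={radius:.2f}, distance={distance:.3f}")
```

In this made-up example, Model B has the highest correlation but strongly underestimates the spread of the data, so Model A ends up closest to the reference point. This is the kind of trade-off that a single metric hides and the diagram exposes.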

Initial visualization

from mpl_toolkits.axisartist import floating_axes
from mpl_toolkits.axisartist.grid_finder import FixedLocator, DictFormatter
from matplotlib.projections import PolarAxes

def set_tayloraxes(fig, location):
    # Note: the apply_theta_transforms argument requires a recent Matplotlib release
    trans = PolarAxes.PolarTransform(apply_theta_transforms=False)

    # Angular axis: correlation values, placed at theta = arccos(correlation)
    r1_locs = np.hstack((np.arange(1, 10) / 10.0, [0.95, 0.99]))
    t1_locs = np.arccos(r1_locs)
    gl1 = FixedLocator(t1_locs)
    tf1 = DictFormatter(dict(zip(t1_locs, map(str, r1_locs))))

    # Radial axis: normalized standard deviation
    r2_locs = np.arange(0, 2, 0.25)
    r2_labels = ['0', '0.25', '0.50', '0.75', '1.00', '1.25', '1.50', '1.75']
    gl2 = FixedLocator(r2_locs)
    tf2 = DictFormatter(dict(zip(r2_locs, r2_labels)))

    ghelper = floating_axes.GridHelperCurveLinear(
        trans,
        extremes=(0, np.pi / 2, 0, 1.75),
        grid_locator1=gl1, tick_formatter1=tf1,
        grid_locator2=gl2, tick_formatter2=tf2
    )
    ax = floating_axes.FloatingSubplot(fig, location, grid_helper=ghelper)
    fig.add_subplot(ax)

    ax.axis["top"].set_axis_direction("bottom")
    ax.axis["top"].toggle(ticklabels=True, label=True)
    ax.axis["top"].major_ticklabels.set_axis_direction("top")
    ax.axis["top"].label.set_axis_direction("top")
    ax.axis["top"].label.set_text("Correlation")
    ax.axis["top"].label.set_fontsize(14)
    ax.axis["left"].set_axis_direction("bottom")
    ax.axis["left"].label.set_text("Standard deviation")
    ax.axis["left"].label.set_fontsize(14)
    ax.axis["right"].set_axis_direction("top")
    ax.axis["right"].toggle(ticklabels=True)
    ax.axis["right"].major_ticklabels.set_axis_direction("left")
    ax.axis["bottom"].set_visible(False)
    ax.grid(True)

    # Auxiliary polar axes on which the data are drawn
    polar_ax = ax.get_aux_axes(trans)

    # Contours of the normalized RMS difference from the reference point (radius 1, theta 0)
    rs, ts = np.meshgrid(np.linspace(0, 1.75, 100), np.linspace(0, np.pi / 2, 100))
    rms = np.sqrt(1 + rs**2 - 2 * rs * np.cos(ts))
    CS = polar_ax.contour(ts, rs, rms, colors='gray', linestyles='--')
    plt.clabel(CS, inline=1, fontsize=10)
    return polar_ax

def plot_taylor(ax, std_obs, std_pred, correlation, **kwargs):
    theta = np.arccos(correlation)
    radius = std_pred / std_obs
    # 'label' is no longer passed positionally; it arrives only through the keyword arguments
    ax.plot(theta, radius, **kwargs)

fig = plt.figure(figsize=(8, 8), dpi=1200)
ax = set_tayloraxes(fig, 111)

# Plot one point per model on the Taylor diagram
for i, row in metrics_df.iterrows():
    plot_taylor(ax, row['Standard Deviation (Observed)'], row['Standard Deviation (Pred)'],
                row['Correlation'], marker='o', markersize=8, label=row['Model'])

# Add the legend
ax.legend(loc='upper right', bbox_to_anchor=(1.2, 1.1))
plt.savefig("2024-10-27-公眾號Python機器學習AI——1.pdf", format='pdf', bbox_inches='tight')
plt.show()

Refined visualization

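Building on the original Taylor diagram, a red dashed line is added at a standard deviation of 1.0. Because the radial axis is the predicted standard deviation divided by the observed one, this line serves as a reference for quickly judging whether a model's predicted standard deviation is close to that of the observations. This matches the presentation in the reference paper.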
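The code for this refinement is not included above, so the following is only a minimal sketch of how such a reference line might be drawn, not the author's implementation. The helper name `add_reference_arc` is hypothetical, and a plain Matplotlib polar axes stands in for the diagram so that the example runs on its own; with the article's code, the axes returned by set_tayloraxes(fig, 111) would be passed instead.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; drop this line in a notebook
import matplotlib.pyplot as plt

def add_reference_arc(ax, radius=1.0, n_points=200, **line_kwargs):
    """Draw a quarter-circle at a fixed normalized standard deviation."""
    theta = np.linspace(0.0, np.pi / 2, n_points)
    style = {'color': 'red', 'linestyle': '--', 'linewidth': 1.5}
    style.update(line_kwargs)  # caller-supplied options override the defaults
    (line,) = ax.plot(theta, np.full_like(theta, radius), **style)
    return line

# Stand-in polar axes (an assumption for this sketch only)
fig = plt.figure(figsize=(5, 5))
demo_ax = fig.add_subplot(111, projection='polar')
demo_ax.set_thetamax(90)  # show only the first quadrant, as in the Taylor diagram
arc = add_reference_arc(demo_ax, radius=1.0, label='Reference (std = 1.0)')
demo_ax.legend()
```

Points that fall on the arc have the same spread as the observations; points inside it underestimate the variability and points outside it overestimate it.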
