# Features

```python
# Load the classification data set (the occupancy data; example path)
import pandas as pd
data = pd.read_csv("occupancy.csv")

# Specify the features of interest and the classes of the target
features = ["temperature", "relative humidity", "light", "C02", "humidity"]
classes = ["unoccupied", "occupied"]

# Extract the instances and target
X = data[features]
y = data.occupancy

# Import the visualizer
from yellowbrick.features import Rank2D

# Instantiate the visualizer (Rank2D shown as an example)
visualizer = Rank2D(features=features, algorithm='covariance')

visualizer.fit(X, y)      # Fit the data to the visualizer
visualizer.transform(X)   # Transform the data
visualizer.poof()         # Draw/show/poof the data
```

### Rank 1D

```python
from yellowbrick.features import Rank1D

# Instantiate the 1D visualizer with the Shapiro ranking algorithm
visualizer = Rank1D(features=features, algorithm='shapiro')

visualizer.fit(X, y)                # Fit the data to the visualizer
visualizer.transform(X)             # Transform the data
visualizer.poof()                   # Draw/show/poof the data
```

### PCA Projection

The PCA decomposition visualizer uses principal component analysis to project a high-dimensional dataset into two or three dimensions, so that each instance can be plotted in a scatter plot. Using PCA means the projected dataset can be analyzed along its axes of principal variation, and can be interpreted to determine whether spherical distance metrics can be utilized.
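Since Yellowbrick's PCA visualizer wraps scikit-learn's decomposition, the projection step itself can be sketched with scikit-learn directly. This is a minimal sketch: the synthetic dataset and model choices below are illustrative, not from the original example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic high-dimensional classification data
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           random_state=0)

# Scale, then project onto the first two principal components
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(X_2d.shape)  # (200, 2) -- each instance now plottable in a scatter plot
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```

The two columns of `X_2d` are the scatter-plot coordinates the visualizer draws; the explained-variance ratio indicates how faithfully the 2D projection represents the original space.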

### Biplot

The PCA projection can be enhanced into a biplot, whose points are the projected instances and whose vectors represent the structure of the data in the high-dimensional space. By setting the `proj_features=True` flag, a vector for each feature in the dataset is drawn on the scatter plot in the direction of that feature's maximum variance. These structures can be used to analyze the importance of a feature to the decomposition, or to find features of correlated variance for further analysis.

```python
# Load the regression data set (the concrete data; example path)
import pandas as pd
data = pd.read_csv("concrete.csv")

# Specify the features of interest and the target
target = "strength"
features = [
    'cement', 'slag', 'ash', 'water', 'splast', 'coarse', 'fine', 'age'
]

# Extract the instance data and the target
X = data[features]
y = data[target]

# Instantiate the visualizer; proj_features=True draws the biplot vectors
from yellowbrick.features.pca import PCADecomposition
visualizer = PCADecomposition(scale=True, proj_features=True)

visualizer.fit_transform(X, y)
visualizer.poof()
```

### Feature Importance

```python
import matplotlib.pyplot as plt

from sklearn.ensemble import GradientBoostingClassifier
from yellowbrick.features.importances import FeatureImportances

# Create a new matplotlib figure
fig = plt.figure()
ax = fig.add_subplot()

# Instantiate the visualizer (GradientBoostingClassifier as an example model)
viz = FeatureImportances(GradientBoostingClassifier(), ax=ax)
viz.fit(X, y)
viz.poof()
```

### Recursive Feature Elimination

RFE requires a specified number of features to keep, but it is usually not known in advance how many features are valid. To find the optimal number of features, cross-validation is used with RFE to score different feature subsets and select the best-scoring collection of features. The RFECV visualizer plots the number of features in the model along with their cross-validated test scores and variability, and visualizes the selected number of features.

```python
from sklearn.svm import SVC
from sklearn.datasets import make_classification

from yellowbrick.features import RFECV

# Create a dataset with only 3 informative features
X, y = make_classification(
    n_samples=1000, n_features=25, n_informative=3, n_redundant=2,
    n_repeated=0, n_classes=8, n_clusters_per_class=1, random_state=0
)

# Create RFECV visualizer with linear SVM classifier
viz = RFECV(SVC(kernel='linear', C=1))
viz.fit(X, y)
viz.poof()
```

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

from yellowbrick.features import RFECV

# Specify the features and the target from a loaded DataFrame `data`
target = 'default'
features = [col for col in data.columns if col != target]

X = data[features]
y = data[target]

# Use stratified 5-fold cross-validation with a weighted F1 score
cv = StratifiedKFold(5)
oz = RFECV(RandomForestClassifier(), cv=cv, scoring='f1_weighted')

oz.fit(X, y)
oz.poof()
```

### Residuals Plot

```python
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from yellowbrick.regressor import ResidualsPlot

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Instantiate the linear model and visualizer
ridge = Ridge()
visualizer = ResidualsPlot(ridge)

visualizer.fit(X_train, y_train)  # Fit the training data to the model
visualizer.score(X_test, y_test)  # Evaluate the model on the test data
visualizer.poof()                 # Draw/show/poof the data
```

### Alpha Selection

```python
import numpy as np

from sklearn.linear_model import LassoCV
from yellowbrick.regressor import AlphaSelection

# Create a list of alphas to cross-validate against
alphas = np.logspace(-10, 1, 400)

# Instantiate the linear model and visualizer
model = LassoCV(alphas=alphas)
visualizer = AlphaSelection(model)

visualizer.fit(X, y)
g = visualizer.poof()
```

### Class Prediction Error

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from yellowbrick.classifier import ClassPredictionError

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Instantiate the classification model and visualizer
visualizer = ClassPredictionError(
    RandomForestClassifier(), classes=classes
)

# Fit the training data to the visualizer
visualizer.fit(X_train, y_train)

# Evaluate the model on the test data
visualizer.score(X_test, y_test)

# Draw visualization
g = visualizer.poof()
```

### Discrimination Threshold

```python
from sklearn.linear_model import LogisticRegression
from yellowbrick.classifier import DiscriminationThreshold

# Instantiate the classification model and visualizer
logistic = LogisticRegression()
visualizer = DiscriminationThreshold(logistic)

visualizer.fit(X, y)  # Fit the training data to the visualizer
visualizer.poof()     # Draw/show/poof the data
```

### Elbow Method

The KElbowVisualizer implements the "elbow" method, helping data scientists select the optimal number of clusters by fitting the model over a range of values of K. If the line chart resembles an arm, then the "elbow" (the point of inflection on the curve) is a good indication that the underlying model fits best at that point.

```python
from sklearn.datasets import make_blobs

# Create synthetic dataset with 8 random clusters
X, y = make_blobs(centers=8, n_features=12, shuffle=True, random_state=42)

from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer

# Instantiate the clustering model and visualizer
model = KMeans()
visualizer = KElbowVisualizer(model, k=(4, 12))

visualizer.fit(X)    # Fit the data to the visualizer
visualizer.poof()    # Draw/show/poof the data
```

### Intercluster Distance Maps

```python
from sklearn.datasets import make_blobs

# Make 12 blobs dataset
X, y = make_blobs(centers=12, n_samples=1000, n_features=16, shuffle=True)

from sklearn.cluster import KMeans
from yellowbrick.cluster import InterclusterDistance

# Instantiate the clustering model and visualizer
visualizer = InterclusterDistance(KMeans(9))

visualizer.fit(X) # Fit the training data to the visualizer
visualizer.poof() # Draw/show/poof the data
```

### Model Selection: Learning Curve

A learning curve answers two questions:

1. Will the model keep improving as the amount of data increases?
2. Is the model more sensitive to bias or to variance?
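Yellowbrick's LearningCurve visualizer plots scores computed over increasing training-set sizes; the underlying computation can be sketched with scikit-learn's `learning_curve` directly. The synthetic dataset and GaussianNB model below are illustrative, not from the original example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import GaussianNB

# Synthetic classification data
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Score the model at 5 increasing training sizes with 5-fold cross-validation
sizes, train_scores, test_scores = learning_curve(
    GaussianNB(), X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5
)

# One row per training size, one column per cross-validation fold
print(train_scores.shape)  # (5, 5)
```

Plotting the mean of `train_scores` and `test_scores` against `sizes` gives the two curves: a persistent gap between them suggests a variance problem, while two low, converged curves suggest a bias problem.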

### Model Selection: Validation Curve

```python
import numpy as np

from sklearn.tree import DecisionTreeRegressor
from yellowbrick.model_selection import ValidationCurve

# Specify features of interest and the target
# (`data` is a loaded DataFrame; `targets` is a list of target column names)
features = [col for col in data.columns if col not in targets]

# Extract the instances and target
X = data[features]
y = data[targets[0]]

# Sweep max_depth from 1 to 10 with 10-fold cross-validation, scored by R^2
viz = ValidationCurve(
    DecisionTreeRegressor(), param_name="max_depth",
    param_range=np.arange(1, 11), cv=10, scoring="r2"
)

# Fit and poof the visualizer
viz.fit(X, y)
viz.poof()
```

# Summary

https://www.scikit-yb.org/en/...
