Early stopping of Stochastic Gradient Descent

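Stochastic Gradient Descent (SGD) is an optimization technique that minimizes a loss function in a stochastic fashion, performing a gradient descent step sample by sample. In particular, it is a very efficient method for fitting linear models. Because of this stochasticity, the loss function does not necessarily decrease at each iteration, and convergence is guaranteed only in expectation. Monitoring convergence on the loss function can therefore be difficult.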
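To make the non-monotonic behaviour concrete, here is a minimal NumPy sketch of per-sample SGD on a least-squares problem. The synthetic data, the learning rate, and the helper name mse are assumptions chosen for illustration; this is not how scikit-learn implements SGD internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear regression problem (illustrative data only).
n_samples, n_features = 200, 5
X = rng.normal(size=(n_samples, n_features))
true_w = rng.normal(size=n_features)
y = X @ true_w + 0.5 * rng.normal(size=n_samples)


def mse(w):
    """Mean squared error over the full dataset (hypothetical helper)."""
    return float(np.mean((X @ w - y) ** 2))


w = np.zeros(n_features)
learning_rate = 0.05  # assumed constant step size
losses = [mse(w)]

# One pass over the data: one gradient step per sample, in random order.
for i in rng.permutation(n_samples):
    residual = X[i] @ w - y[i]
    w -= learning_rate * residual * X[i]  # gradient of 0.5 * residual**2
    losses.append(mse(w))

# Count the steps after which the full-data loss went up.
increases = sum(b > a for a, b in zip(losses, losses[1:]))
print(f"initial loss: {losses[0]:.3f}, final loss: {losses[-1]:.3f}")
print(f"steps where the full loss increased: {increases} of {n_samples}")
```

The loss falls over the pass as a whole, yet many individual steps raise it, which is why a stopping rule based on the raw loss curve needs some tolerance and patience.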

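Another approach is to monitor convergence on a validation score. In this case, the input data is split into a training set and a validation set. The model is then fitted on the training set, and the stopping criterion is based on the prediction score computed on the validation set. This makes it possible to find the smallest number of iterations sufficient to build a model that generalizes well to unseen data, and it reduces the chance of overfitting the training data.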
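The idea can be sketched by hand with partial_fit, which runs one pass of SGD over the data it receives. The synthetic dataset and the names patience, min_improvement, and max_epochs are assumptions made for this illustration; the loop mimics validation-based stopping and is not a copy of scikit-learn's internal logic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic data, split into training and validation sets.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = SGDClassifier(random_state=0)
classes = np.unique(y_train)

# Hypothetical stopping parameters for this sketch.
patience, min_improvement, max_epochs = 3, 1e-4, 100
best_score = -np.inf
epochs_without_improvement = 0
n_epochs_run = 0

rng = np.random.default_rng(0)
for epoch in range(max_epochs):
    # One epoch of SGD over a fresh shuffle of the training set.
    order = rng.permutation(len(X_train))
    clf.partial_fit(X_train[order], y_train[order], classes=classes)
    n_epochs_run += 1

    # Score on the held-out validation set, never on the training set.
    score = clf.score(X_val, y_val)
    if score > best_score + min_improvement:
        best_score = score
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1

    # Stop once the validation score has stalled for `patience` epochs.
    if epochs_without_improvement >= patience:
        break

print(f"stopped after {n_epochs_run} epochs, best validation accuracy {best_score:.3f}")
```

The validation samples are never used for gradient steps, so the model sees less training data than it would without a held-out set.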

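This early stopping strategy is activated if early_stopping=True; otherwise the stopping criterion uses only the training loss on the entire input data. To better control the early stopping strategy, we can specify a parameter validation_fraction, which sets the fraction of the input dataset that is held out to compute the validation score. The optimization continues until the validation score has failed to improve by at least tol during the last n_iter_no_change iterations. The actual number of iterations is available in the attribute n_iter_.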
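As a short usage sketch, the parameters named above can be passed directly to SGDClassifier. The synthetic dataset and the specific values are assumptions picked for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Illustrative synthetic data.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

clf = SGDClassifier(
    early_stopping=True,      # stop based on a held-out validation score
    validation_fraction=0.2,  # hold out 20% of the data passed to fit
    n_iter_no_change=3,       # number of stalled iterations tolerated
    tol=1e-4,                 # minimum improvement that counts as progress
    max_iter=1000,            # upper bound on the number of epochs
    random_state=0,
)
clf.fit(X, y)

# n_iter_ reports how many epochs were actually run before stopping.
print(f"epochs run: {clf.n_iter_}")
```

With early_stopping=True the validation split is made inside fit, so no separate validation set has to be passed in.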

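This example illustrates how early stopping can be used in the SGDClassifier model to achieve almost the same accuracy as a model built without early stopping. This can significantly reduce training time. Note that the scores differ between the stopping criteria even from the early iterations, because part of the training data is held out with the validation stopping criterion.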

Code example

import sys
import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from sklearn import linear_model
from sklearn.datasets import fetch_openml
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
from sklearn.utils._testing import ignore_warnings


def load_mnist(n_samples=None, class_0="0", class_1="8"):
    """Load MNIST, select two classes, shuffle and return only n_samples."""
    # Load data from http://openml.org/d/554
    mnist = fetch_openml("mnist_784", version=1, as_frame=False)

    # Keep only two classes to get a binary classification problem
    mask = np.logical_or(mnist.target == class_0, mnist.target == class_1)

    X, y = shuffle(mnist.data[mask], mnist.target[mask], random_state=42)
    if n_samples is not None:
        X, y = X[:n_samples], y[:n_samples]
    return X, y


@ignore_warnings(category=ConvergenceWarning)
def fit_and_score(estimator, max_iter, X_train, X_test, y_train, y_test):
    """Fit the estimator on the train set and score it on both sets."""
    estimator.set_params(max_iter=max_iter)
    estimator.set_params(random_state=0)

    start = time.time()
    estimator.fit(X_train, y_train)

    fit_time = time.time() - start
    n_iter = estimator.n_iter_
    train_score = estimator.score(X_train, y_train)
    test_score = estimator.score(X_test, y_test)

    return fit_time, n_iter, train_score, test_score


# Define the estimators to compare
estimator_dict = {
    "No stopping criterion": linear_model.SGDClassifier(n_iter_no_change=3),
    "Training loss": linear_model.SGDClassifier(
        early_stopping=False, n_iter_no_change=3, tol=0.1
    ),
    "Validation score": linear_model.SGDClassifier(
        early_stopping=True, n_iter_no_change=3, tol=0.0001, validation_fraction=0.2
    ),
}

# Load the dataset
X, y = load_mnist(n_samples=10000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

results = []
for estimator_name, estimator in estimator_dict.items():
    print(estimator_name + ": ", end="")
    for max_iter in range(1, 50):
        print(".", end="")
        sys.stdout.flush()

        fit_time, n_iter, train_score, test_score = fit_and_score(
            estimator, max_iter, X_train, X_test, y_train, y_test
        )

        results.append(
            (estimator_name, max_iter, fit_time, n_iter, train_score, test_score)
        )
    print("")

# Transform the results into a pandas dataframe for easy plotting
columns = [
    "Stopping criterion",
    "max_iter",
    "Fit time (sec)",
    "n_iter_",
    "Train score",
    "Test score",
]
results_df = pd.DataFrame(results, columns=columns)

# Define what to plot
lines = "Stopping criterion"
x_axis = "max_iter"
styles = ["-.", "--", "-"]

# First plot: train and test scores
fig, axes = plt.subplots(nrows=1, ncols=2, sharey=True, figsize=(12, 4))
for ax, y_axis in zip(axes, ["Train score", "Test score"]):
    for style, (criterion, group_df) in zip(styles, results_df.groupby(lines)):
        group_df.plot(x=x_axis, y=y_axis, label=criterion, ax=ax, style=style)
    ax.set_title(y_axis)
    ax.legend(title=lines)
fig.tight_layout()

# Second plot: n_iter_ and fit time
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 4))
for ax, y_axis in zip(axes, ["n_iter_", "Fit time (sec)"]):
    for style, (criterion, group_df) in zip(styles, results_df.groupby(lines)):
        group_df.plot(x=x_axis, y=y_axis, label=criterion, ax=ax, style=style)
    ax.set_title(y_axis)
    ax.legend(title=lines)
fig.tight_layout()

plt.show()