Generating Gaussian Blobs

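In machine learning, generating synthetic datasets is an important step in testing and validating algorithms. scikit-learn provides the make_blobs function, which generates isotropic Gaussian blobs of points for clustering. The function is flexible: the number of samples, the number of features, the number of cluster centers, and several other settings are all configurable.

The key parameters of make_blobs are: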
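As a minimal sketch of that testing workflow, the snippet below generates three well-separated blobs and checks how closely a clustering algorithm recovers the known labels. The choice of KMeans, the center coordinates, and the sample count are assumptions made for illustration; they are not part of the original text.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Hypothetical, well-separated centers chosen only for this illustration.
true_centers = np.array([[-8.0, -8.0], [0.0, 8.0], [8.0, -8.0]])

# Synthetic data with known ground-truth labels y.
X, y = make_blobs(n_samples=300, centers=true_centers, cluster_std=1.0, random_state=0)

# Cluster the data without looking at y.
predicted = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# The adjusted Rand index compares two labelings and ignores label permutations;
# a value close to 1.0 means the clusters were recovered almost perfectly.
score = adjusted_rand_score(y, predicted)
print(score)
```

Because the ground truth is known, the score gives a direct measure of algorithm quality, which is the main reason to use synthetic data here.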


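                n_samples: int or array-like, default=100. If an int, the total number of points, divided equally among the clusters. If array-like, each element gives the number of samples in the corresponding cluster. Changed in version 0.20: array-likes can be passed to n_samples.

                n_features: int, default=2. The number of features for each sample.

                centers: int or array-like of shape (n_centers, n_features), default=None. The number of centers to generate, or the fixed center locations. If n_samples is an int and centers is None, 3 centers are generated. If n_samples is array-like, centers must be either None or an array whose length equals the length of n_samples.

                cluster_std: float or array-like of float, default=1.0. The standard deviation of the clusters. A single float applies to every cluster; an array-like gives one value per cluster.

                center_box: tuple of float (min, max), default=(-10.0, 10.0). The bounding box for each cluster center when the centers are generated at random.

                shuffle: bool, default=True. Whether to shuffle the samples.

                random_state: int, RandomState instance or None, default=None. Determines the random number generation used to create the dataset. Pass an int for reproducible output across multiple function calls. See the random_state entry in the scikit-learn Glossary.

                return_centers: bool, default=False. If True, the centers of each cluster are also returned. Added in version 0.23.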
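To make the interaction between centers and cluster_std concrete, here is a small sketch that fixes the center locations, gives each cluster its own standard deviation, and then compares the empirical statistics with the requested ones. The specific centers, spreads, and sample counts are hypothetical values picked for the example.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Hypothetical values chosen for illustration only.
fixed_centers = np.array([[-5.0, 0.0], [0.0, 5.0], [5.0, 0.0]])
spreads = [0.5, 1.0, 2.0]  # one standard deviation per cluster

# n_samples is array-like, so centers must have the same length (3 here).
X, y = make_blobs(
    n_samples=[2000, 2000, 2000],
    centers=fixed_centers,
    cluster_std=spreads,
    random_state=42,
)

# Each cluster's empirical mean and std should sit close to the requested values.
for label in range(len(fixed_centers)):
    points = X[y == label]
    print(label, points.mean(axis=0), points.std(axis=0))
```

The printed means and standard deviations are estimates from a finite sample, so they approximate the requested values rather than match them exactly.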

Return values:


                X: ndarray of shape (n_samples, n_features). The generated samples.

                y: ndarray of shape (n_samples,). The integer labels for the cluster membership of each sample.

                centers: ndarray of shape (n_centers, n_features). The centers of each cluster. Only returned if return_centers=True.
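The shapes listed above can be checked directly. The sketch below assumes scikit-learn 0.23 or later (for return_centers) and uses arbitrary illustrative sizes.

```python
from sklearn.datasets import make_blobs

# With return_centers=True the function returns three arrays instead of two.
X, y, centers = make_blobs(
    n_samples=12, centers=4, n_features=3, return_centers=True, random_state=0
)

print(X.shape)        # (12, 3)
print(y.shape)        # (12,)
print(centers.shape)  # (4, 3)
```

Since the centers here are generated at random, each coordinate falls inside the default center_box of (-10.0, 10.0).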

The following example uses make_blobs to generate clusters:


                from sklearn.datasets import make_blobs

                # An int n_samples is split across the 3 centers.
                X, y = make_blobs(n_samples=10, centers=3, n_features=2, random_state=0)
                print(X.shape)  # Output: (10, 2)
                print(y)  # Output: [0 0 1 0 2 2 2 1 1 0]

                # An array-like n_samples sets the size of each cluster explicitly.
                X, y = make_blobs(n_samples=[3, 3, 4], centers=None, n_features=2, random_state=0)
                print(X.shape)  # Output: (10, 2)
                print(y)  # Output: [0 1 2 0 2 2 2 1 1 0]
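Two properties of the first call above are worth verifying: a fixed random_state makes the output reproducible, and when an int n_samples does not divide evenly by the number of centers, the leftover samples go to the first clusters. A short sketch using the same arguments as the example:

```python
import numpy as np
from sklearn.datasets import make_blobs

# Two calls with the same seed should produce identical data.
X1, y1 = make_blobs(n_samples=10, centers=3, random_state=0)
X2, y2 = make_blobs(n_samples=10, centers=3, random_state=0)
print(np.array_equal(X1, X2) and np.array_equal(y1, y2))  # True

# 10 samples over 3 centers: the remainder of 1 goes to cluster 0.
print(np.bincount(y1))  # [4 3 3]
```

This matches the label array printed earlier, where label 0 appears four times and labels 1 and 2 appear three times each.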
