Applying Manifold Learning Techniques to a Spherical Dataset

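This example applies different manifold learning techniques to a spherical dataset, using dimensionality reduction to build some intuition about how each method behaves. The poles are cut off the sphere, as is a thin slice down its side. This lets the manifold learning techniques "spread it open" while projecting it onto two dimensions. For a similar example in which the methods are applied to the S-curve dataset, see "Comparison of Manifold Learning methods".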
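
The severing described above can be sketched as a small reusable function. This is a minimal sketch that mirrors the sampling in the script below; the name `make_severed_sphere` and its parameters `pole_cut` and `slice_width` are hypothetical, introduced only for illustration. Because the polar angle t is drawn uniformly from [0, π) and only π/8 < t < 7π/8 is kept, about three quarters of the samples are expected to survive the pole cut.

```python
import numpy as np


def make_severed_sphere(n_samples=1000, pole_cut=np.pi / 8, slice_width=0.55, seed=0):
    """Hypothetical helper: sample points on the unit sphere with the poles
    and a thin longitudinal slice removed, mirroring the script below."""
    rng = np.random.RandomState(seed)
    # Longitude p stops short of 2*pi, leaving a gap of `slice_width` radians.
    p = rng.rand(n_samples) * (2 * np.pi - slice_width)
    # Polar angle t in [0, pi); the caps near t = 0 and t = pi are discarded.
    t = rng.rand(n_samples) * np.pi
    keep = (t > pole_cut) & (t < np.pi - pole_cut)
    p, t = p[keep], t[keep]
    X = np.column_stack([np.sin(t) * np.cos(p),
                         np.sin(t) * np.sin(p),
                         np.cos(t)])
    return X, p  # the longitude doubles as the colour value


X, color = make_severed_sphere()
# Expected surviving fraction is 3/4, since t is uniform on [0, pi).
print(X.shape[0])
print(np.allclose(np.linalg.norm(X, axis=1), 1.0))  # True: every point is on the unit sphere
```

Returning the longitude alongside the coordinates mirrors the script, which colors each point by `p`, so a method that successfully spreads the sphere open shows a smooth rainbow gradient across the flattened strip.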

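Note that the purpose of multidimensional scaling (MDS) is to find a low-dimensional representation of the data (here 2D) in which the distances closely match the distances in the original high-dimensional space. Unlike the other manifold learning algorithms, MDS does not seek an isotropic representation of the data in the low-dimensional space. The manifold problem here is fairly similar to that of drawing a flat map of the Earth, as in map projection.

Timings printed by one run of the script:
standard: 0.064 sec
ltsa: 0.1 sec
hessian: 0.17 sec
modified: 0.11 sec
ISO: 0.2 sec
MDS: 0.78 sec
Spectral Embedding: 0.046 sec
t-SNE: 3.7 sec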
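
To make the distance-preserving goal of MDS concrete, here is a sketch of classical (Torgerson) MDS in plain NumPy. Note that this is a swapped-in technique: scikit-learn's `manifold.MDS`, used in the script, minimizes stress iteratively with the SMACOF algorithm, so its embeddings will not match these. The helpers `classical_mds` and `pairwise_dist` are hypothetical names. Points that already lie on a plane keep their distances exactly, whereas the severed sphere cannot be flattened without distortion, which is the same difficulty map projections face.

```python
import numpy as np


def pairwise_dist(Y):
    """Euclidean distance matrix between the rows of Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def classical_mds(X, n_components=2):
    """Classical (Torgerson) MDS: a closed-form embedding whose Euclidean
    distances approximate those of X as closely as possible."""
    D2 = pairwise_dist(X) ** 2
    n = D2.shape[0]
    # Double centering turns squared distances into a Gram matrix:
    # B = -1/2 * J D2 J with J = I - (1/n) * ones.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # The leading eigenpairs of B give the low-dimensional coordinates.
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_components]
    vals = np.clip(vals[order], 0.0, None)  # guard against tiny negative eigenvalues
    return vecs[:, order] * np.sqrt(vals)


rng = np.random.RandomState(0)

# Points lying on a plane in 3-D: their distances survive exactly in 2-D.
planar = np.column_stack([rng.rand(50), rng.rand(50), np.zeros(50)])
Y = classical_mds(planar)
print(np.allclose(pairwise_dist(Y), pairwise_dist(planar)))  # True

# Points on a severed sphere: some distortion is unavoidable.
t = rng.uniform(np.pi / 8, 7 * np.pi / 8, 200)
p = rng.uniform(0, 2 * np.pi - 0.55, 200)
sphere = np.column_stack([np.sin(t) * np.cos(p), np.sin(t) * np.sin(p), np.cos(t)])
D = pairwise_dist(sphere)
err = pairwise_dist(classical_mds(sphere)) - D
stress = np.sqrt(np.sum(err ** 2) / np.sum(D ** 2))
print(f"normalized stress: {stress:.3f}")
```

The normalized stress printed at the end (the square root of the summed squared distance errors over the summed squared original distances) is one simple way to quantify how far a 2D layout is from preserving the original distances.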

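The following code, written by Jaques Grobler (jaques.grobler@inria.fr) and released under the BSD 3-Clause License, demonstrates these techniques. It first imports the required libraries (time, matplotlib.pyplot, numpy, and scikit-learn's manifold module, among others) and sets the manifold learning parameters: the number of neighbors and the number of samples. It then samples points on a sphere, leaving out a thin slice down its side and cutting off the poles. Finally, it plots the dataset in 3D with matplotlib and embeds it in two dimensions with four variants of locally linear embedding, Isomap, MDS, spectral embedding, and t-SNE, timing each method.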


from time import time
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d  # noqa: F401
import numpy as np
from matplotlib.ticker import NullFormatter
from sklearn import manifold
from sklearn.utils import check_random_state

# Variables for manifold learning.
n_neighbors = 10
n_samples = 1000

# Create our sphere; longitude p stops 0.55 rad short of 2*pi, leaving a thin slice open.
random_state = check_random_state(0)
p = random_state.rand(n_samples) * (2 * np.pi - 0.55)
t = random_state.rand(n_samples) * np.pi

# Sever the poles from the sphere (keep pi/8 < t < 7*pi/8).
indices = (t < (np.pi - (np.pi / 8))) & (t > (np.pi / 8))
colors = p[indices]
x, y, z = (np.sin(t[indices]) * np.cos(p[indices]),
           np.sin(t[indices]) * np.sin(p[indices]),
           np.cos(t[indices]))

# Plot our dataset.
fig = plt.figure(figsize=(15, 8))
plt.suptitle("Manifold Learning with %i points, %i neighbors" % (n_samples, n_neighbors), fontsize=14)
ax = fig.add_subplot(251, projection="3d")
ax.scatter(x, y, z, c=colors, cmap=plt.cm.rainbow)
ax.view_init(40, -10)

sphere_data = np.array([x, y, z]).T

# Perform Locally Linear Embedding manifold learning (four LLE variants).
methods = ["standard", "ltsa", "hessian", "modified"]
labels = ["LLE", "LTSA", "Hessian LLE", "Modified LLE"]
for i, method in enumerate(methods):
    t0 = time()
    trans_data = (
        manifold.LocallyLinearEmbedding(
            n_neighbors=n_neighbors,
            n_components=2,
            method=method,
            random_state=42,
        )
        .fit_transform(sphere_data)
        .T
    )
    t1 = time()
    print("%s: %.2g sec" % (methods[i], t1 - t0))
    ax = fig.add_subplot(252 + i)
    plt.scatter(trans_data[0], trans_data[1], c=colors, cmap=plt.cm.rainbow)
    plt.title("%s (%.2g sec)" % (labels[i], t1 - t0))
    ax.xaxis.set_major_formatter(NullFormatter())
    ax.yaxis.set_major_formatter(NullFormatter())
    plt.axis("tight")

# Perform Isomap manifold learning.
t0 = time()
trans_data = (
    manifold.Isomap(n_neighbors=n_neighbors, n_components=2)
    .fit_transform(sphere_data)
    .T
)
t1 = time()
print("ISO: %.2g sec" % (t1 - t0))
ax = fig.add_subplot(257)
plt.scatter(trans_data[0], trans_data[1], c=colors, cmap=plt.cm.rainbow)
plt.title("Isomap (%.2g sec)" % (t1 - t0))
ax.xaxis.set_major_formatter(NullFormatter())
ax.yaxis.set_major_formatter(NullFormatter())
plt.axis("tight")

# Perform multidimensional scaling.
t0 = time()
mds = manifold.MDS(2, max_iter=100, n_init=1, random_state=42)
trans_data = mds.fit_transform(sphere_data).T
t1 = time()
print("MDS: %.2g sec" % (t1 - t0))
ax = fig.add_subplot(258)
plt.scatter(trans_data[0], trans_data[1], c=colors, cmap=plt.cm.rainbow)
plt.title("MDS (%.2g sec)" % (t1 - t0))
ax.xaxis.set_major_formatter(NullFormatter())
ax.yaxis.set_major_formatter(NullFormatter())
plt.axis("tight")

# Perform Spectral Embedding.
t0 = time()
se = manifold.SpectralEmbedding(n_components=2, n_neighbors=n_neighbors, random_state=42)
trans_data = se.fit_transform(sphere_data).T
t1 = time()
print("Spectral Embedding: %.2g sec" % (t1 - t0))
ax = fig.add_subplot(259)
plt.scatter(trans_data[0], trans_data[1], c=colors, cmap=plt.cm.rainbow)
plt.title("Spectral Embedding (%.2g sec)" % (t1 - t0))
ax.xaxis.set_major_formatter(NullFormatter())
ax.yaxis.set_major_formatter(NullFormatter())
plt.axis("tight")

# Perform t-distributed stochastic neighbor embedding.
t0 = time()
tsne = manifold.TSNE(n_components=2, random_state=0)
trans_data = tsne.fit_transform(sphere_data).T
t1 = time()
print("t-SNE: %.2g sec" % (t1 - t0))
ax = fig.add_subplot(2, 5, 10)
plt.scatter(trans_data[0], trans_data[1], c=colors, cmap=plt.cm.rainbow)
plt.title("t-SNE (%.2g sec)" % (t1 - t0))
ax.xaxis.set_major_formatter(NullFormatter())
ax.yaxis.set_major_formatter(NullFormatter())
plt.axis("tight")
plt.show()

Total running time of the script: 0 minutes 5.743 seconds.
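
The script fits each estimator in a separate, largely repeated block. A minimal sketch of an alternative is to loop over a dictionary of estimators and, besides timing them, score each embedding with `sklearn.manifold.trustworthiness`, which measures how well local neighborhoods are preserved. The helper names `severed_sphere` and `compare_embeddings`, the choice of three estimators, and the smaller sample size (500) are assumptions made to keep the example quick; timings will vary by machine.

```python
from time import time

import numpy as np
from sklearn import manifold


def severed_sphere(n_samples=500, seed=0):
    """Hypothetical helper reproducing the script's dataset: a unit sphere
    with the poles and a thin slice down its side removed."""
    rng = np.random.RandomState(seed)
    p = rng.rand(n_samples) * (2 * np.pi - 0.55)
    t = rng.rand(n_samples) * np.pi
    keep = (t > np.pi / 8) & (t < np.pi - np.pi / 8)
    p, t = p[keep], t[keep]
    return np.column_stack(
        [np.sin(t) * np.cos(p), np.sin(t) * np.sin(p), np.cos(t)]
    )


def compare_embeddings(X, n_neighbors=10):
    """Hypothetical helper: fit several estimators and return
    {name: (embedding, seconds, trustworthiness)}."""
    estimators = {
        # The dense eigensolver sidesteps possible ARPACK failures and is
        # affordable at this sample size.
        "LLE": manifold.LocallyLinearEmbedding(
            n_neighbors=n_neighbors, n_components=2, eigen_solver="dense"
        ),
        "Isomap": manifold.Isomap(n_neighbors=n_neighbors, n_components=2),
        "Spectral Embedding": manifold.SpectralEmbedding(
            n_components=2, n_neighbors=n_neighbors, random_state=42
        ),
    }
    results = {}
    for name, estimator in estimators.items():
        t0 = time()
        Y = estimator.fit_transform(X)
        seconds = time() - t0
        # Trustworthiness lies in [0, 1]; 1 means every point's nearest
        # neighbors in the embedding were also near it in the original space.
        score = manifold.trustworthiness(X, Y, n_neighbors=5)
        results[name] = (Y, seconds, score)
    return results


X = severed_sphere()
for name, (Y, seconds, score) in compare_embeddings(X).items():
    print(f"{name}: {seconds:.2g} sec, trustworthiness={score:.3f}")
```

A dictionary keeps the fitting, timing, and scoring logic in one place; the plotting calls from the script could be driven from the same loop with `fig.add_subplot`.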

沪ICP备2024098111号-1

上海秋旦网络科技中心：上海市奉贤区金大公路8218号1幢联系电话：17898875485
