Deploying Machine Learning Models with a Django Application

Machine Learning Code

First, let's walk through the machine learning code. Start by importing the required libraries:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pickle

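Next, load the data and combine the training and test sets so that the whole dataset can be preprocessed in one pass. Copy the output column to a separate variable, then remove that column from the data.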

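train_data = pd.read_csv('train_.csv')
test_data = pd.read_csv('test_.csv')

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
data = pd.concat([train_data, test_data], ignore_index=True)

# Keep the target column separately, then remove it from the features
y = data['is_promoted']
data = data.drop(['is_promoted'], axis=1)
print(data)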

Now preprocess the whole dataset. Take a look at the following code.

from sklearn.preprocessing import LabelBinarizer

# Inspect category frequencies (for exploration only)
dept_counts = data['department'].value_counts()
region_count = data['region'].value_counts()

# Strip letters and underscores so that only the region number remains.
# regex=True is required in pandas 2.0+, where the default is a literal match.
data['region'] = data['region'].str.replace(r'[a-zA-Z_]', '', regex=True)
region = data['region'].astype(int)

# One-hot encode the categorical columns and drop one level of each
data = pd.get_dummies(data, columns=['gender'])
data = data.drop(['gender_f'], axis=1)
data = pd.get_dummies(data, columns=['education'])
data = data.drop(['education_Below Secondary'], axis=1)
data = pd.get_dummies(data, columns=['recruitment_channel'])
data = data.drop(['recruitment_channel_referred'], axis=1)

# Binarize the department labels into a separate array
lb_style = LabelBinarizer()
lb = lb_style.fit_transform(data['department'])

# Fill missing ratings with the median rating
data['previous_year_rating'] = data['previous_year_rating'].fillna(data['previous_year_rating'].median())

# Replace the text columns with their numeric versions
data = data.drop(['department'], axis=1)
data.insert(1, 'Region', region)  # insert() modifies data in place and returns None
data = data.drop(['region'], axis=1)

# Count the NaN values that remain in each column
count_ofall_nan = data.isna().sum()

# Feature matrix: the first 14 columns plus the department indicators
X = data.iloc[:, 0:14].values
X = np.hstack((X, lb))
data = data.astype(np.int64)

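If you have some experience solving machine learning problems, the preprocessing is easy to follow. Essentially, it converts categorical variables to numeric ones and fills NaN values with the median or the mean; here the median is used. pandas provides the get_dummies function for the encoding step, and scikit-learn offers LabelBinarizer. Only basic preprocessing is done here; study the dataset carefully, since better techniques may improve accuracy.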
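To make the three preprocessing steps concrete, here is a minimal sketch on a toy DataFrame. The rows and values are invented for illustration; only the column names mirror the ones used above.

```python
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

# Hypothetical toy data; the values are made up for this sketch.
toy = pd.DataFrame({
    "gender": ["m", "f", "m"],
    "department": ["Sales", "HR", "Technology"],
    "previous_year_rating": [3.0, None, 5.0],
})

# One-hot encode gender, then drop one level so the column is not redundant.
toy = pd.get_dummies(toy, columns=["gender"]).drop(["gender_f"], axis=1)

# Fill the missing rating with the median of the observed ratings.
median_rating = toy["previous_year_rating"].median()
toy["previous_year_rating"] = toy["previous_year_rating"].fillna(median_rating)

# Binarize the department labels: one indicator column per class,
# ordered alphabetically (HR, Sales, Technology).
dept = LabelBinarizer().fit_transform(toy["department"])
```

With three distinct departments, LabelBinarizer returns one column per class. With only two classes it returns a single 0/1 column instead, which is worth keeping in mind when stacking the result onto other features.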

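With preprocessing complete, split the dataset back into training and test data. The output column also has to be paired with the training data again, because the model needs it to learn. Save the test data as a .csv file. To do this...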

# Divide into train and test
train = X[:len(train_data), :]
test = X[len(train_data):, :]

# X is a NumPy array, so wrap the test rows in a DataFrame before saving
pd.DataFrame(test).to_csv('test_preprocessed.csv')
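The combine-then-split round trip can be sketched on a tiny example. The frames below are hypothetical, and the sketch assumes, as the text suggests, that only the training file carries the is_promoted label.

```python
import pandas as pd

# Hypothetical miniature train and test sets; only the training set has a label.
train_toy = pd.DataFrame({"age": [30, 41, 25], "is_promoted": [0, 1, 0]})
test_toy = pd.DataFrame({"age": [38, 52]})

# Stack the two frames so that preprocessing is applied to both at once.
combined = pd.concat([train_toy, test_toy], ignore_index=True)
labels = combined["is_promoted"]  # NaN for the rows that came from the test set
features = combined.drop(["is_promoted"], axis=1).values

# The first len(train_toy) rows are the training rows, the rest are test rows.
n_train = len(train_toy)
X_train, X_test = features[:n_train, :], features[n_train:, :]
y_train = labels[:n_train]
```

Because the rows keep their order through concatenation, slicing by the original training length recovers both parts, and the matching label slice contains no missing values.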

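Next, fit several models to the training data. Only three models are used here; you can try other models and tune them to find the one with the best accuracy. The models need to be saved because Django will use them to predict outputs from the website. To save a model, use pickle and its dump function.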

from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.ensemble import RandomForestClassifier

# Fit on the training rows only; this assumes the test rows carry no label
y_train = y[:len(train)]

# Gaussian naive Bayes
nb = GaussianNB()
nb.fit(train, y_train)
with open('gNB.sav', 'wb') as f:
    pickle.dump(nb, f)

# Random forest classifier
random = RandomForestClassifier(n_estimators=100)
random.fit(train, y_train)
with open('random_forest.sav', 'wb') as f:
    pickle.dump(random, f)

# Multinomial naive Bayes
classifier_multi = MultinomialNB()
classifier_multi.fit(train, y_train)
with open('classifier_multi_NB.sav', 'wb') as f:
    pickle.dump(classifier_multi, f)
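As a quick sanity check on the save step, the sketch below dumps a fitted model to a temporary file, loads it back, and compares predictions. The toy data and the file name toy_gnb.sav are made up for illustration.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical, clearly separable toy data.
X_toy = np.array([[0.0], [0.2], [0.9], [1.1]])
y_toy = np.array([0, 0, 1, 1])

model = GaussianNB().fit(X_toy, y_toy)

# Save the fitted model; the context manager closes the file afterwards.
path = os.path.join(tempfile.mkdtemp(), "toy_gnb.sav")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Load it back, as the Django view does later, and compare predictions.
with open(path, "rb") as f:
    restored = pickle.load(f)

same = np.array_equal(model.predict(X_toy), restored.predict(X_toy))
```

Two practical notes: pickle files should only be loaded from trusted sources, and a pickled scikit-learn model is generally best loaded with the same scikit-learn version that created it.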

Django Application

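Now that the models are saved with pickle, move on to Django to make predictions from the website. On the front end, a form tag contains three buttons that interact with Django. The form action points to the link 'download', which is covered later. Here is that part of the code.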

<form action="download" method="POST">
    {% csrf_token %}
    <input type="submit"
        name="gNB"
        value="Gaussian Naive Bayes" class="btn btn-success">
    <input type="submit"
        name="multiNB"
        value="Multinomial Naive Bayes" class="btn btn-success">
    <input type="submit"
        name="rf"
        value="Random Forest" class="btn btn-success">
</form>
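A form with several named submit buttons sends only the name and value of the button that was actually clicked. The sketch below illustrates this with the standard library alone; the request body and the token value are hypothetical stand-ins for what a browser might send.

```python
from urllib.parse import parse_qs

# Hypothetical form body after the user clicks the "Random Forest" button;
# the CSRF token value is made up.
body = "csrfmiddlewaretoken=abc123&rf=Random+Forest"

# Decode the URL-encoded body into a dict of field name -> list of values.
fields = parse_qs(body)

# Only the clicked button's name is present among the submitted fields.
clicked = [name for name in ("gNB", "multiNB", "rf") if name in fields]
```

This is why a membership test on the submitted field names is enough to tell which model the user asked for.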

Next, go to the views.py file and first load the test data so that it can be used.

import pandas as pd

test_data_preprocessed = pd.read_csv('test_preprocessed.csv')
# Drop the index column that to_csv wrote without a header
test_data_preprocessed = test_data_preprocessed.drop(['Unnamed: 0'], axis=1)
test_data_preprocessed = test_data_preprocessed.iloc[:, :].values
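The 'Unnamed: 0' column appears because to_csv writes the index as a first column with an empty header by default. The sketch below reproduces this in memory and shows one possible alternative, writing with index=False, which the original code does not use.

```python
import io

import pandas as pd

frame = pd.DataFrame([[1, 2], [3, 4]])

# Default behaviour: the index is written as a first column without a header.
with_index = frame.to_csv()
reloaded = pd.read_csv(io.StringIO(with_index))

# Alternative: omit the index on write, so there is nothing to drop on read.
without_index = frame.to_csv(index=False)
reloaded_clean = pd.read_csv(io.StringIO(without_index))
```

Either approach yields the same feature values; the difference is only whether the extra column has to be dropped after reading.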

In views.py, create a function named home that renders the page with the three buttons and the rest of the HTML content.

from django.shortcuts import render

def home(request):
    return render(request, "index.html")

Add the following code to the urls.py file.

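from django.urls import path
from . import views  # assumes urls.py lives in the same app as views.py

urlpatterns = [
    path('', views.home, name='home'),
]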

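Now handle what the buttons do. Back in views.py, create a function named models. In the HTML above, each button was given a name (its name attribute). Those names are used here to find out which button the user clicked, and the view then predicts values with the corresponding model. Take a look at the following code.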

import os
import pickle

from django.http import HttpResponse, HttpResponseBadRequest

def models(request):
    # Gaussian naive Bayes button
    if 'gNB' in request.POST:
        gaussian = pickle.load(open('gNB.sav', 'rb'))
        y_pred = gaussian.predict(test_data_preprocessed)
        output = pd.DataFrame(y_pred)
        output.to_csv('gaussianNB.csv')
        filename = 'gaussianNB.csv'
        response = HttpResponse(open(filename, 'rb').read(), content_type='text/csv')
        response['Content-Length'] = os.path.getsize(filename)
        response['Content-Disposition'] = 'attachment; filename=%s' % filename
        return response

    # Multinomial naive Bayes button
    if 'multiNB' in request.POST:
        multi = pickle.load(open('classifier_multi_NB.sav', 'rb'))
        y_pred = multi.predict(test_data_preprocessed)
        output = pd.DataFrame(y_pred)
        output.to_csv('multi_NB.csv')
        filename = 'multi_NB.csv'
        response = HttpResponse(open(filename, 'rb').read(), content_type='text/csv')
        response['Content-Length'] = os.path.getsize(filename)
        response['Content-Disposition'] = 'attachment; filename=%s' % filename
        return response

    # Random forest button
    if 'rf' in request.POST:
        rf = pickle.load(open('random_forest.sav', 'rb'))
        y_pred = rf.predict(test_data_preprocessed)
        output = pd.DataFrame(y_pred)
        output.to_csv('rf.csv')
        filename = 'rf.csv'
        response = HttpResponse(open(filename, 'rb').read(), content_type='text/csv')
        response['Content-Length'] = os.path.getsize(filename)
        response['Content-Disposition'] = 'attachment; filename=%s' % filename
        return response

    # No known button name was submitted; a Django view must not return None
    return HttpResponseBadRequest('Unknown model')

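Each if statement checks which button name is present in the POST data, loads the corresponding pickled model, and calls predict on the test data loaded earlier. The predictions are converted to a DataFrame and written to a CSV file. The main task, however, is to download that file: Django's HttpResponse sends the file to the browser so that the user can download it as an attachment. That is how the prediction file gets downloaded.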
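Since the three branches differ only in file names, the selection logic could be expressed as a lookup table. The following is a minimal sketch of that idea, not the author's implementation; select_model and predictions_to_csv are hypothetical helper names, and only the button and file names come from the view above.

```python
import pandas as pd

# Button name -> (pickled model file, download file name), as used in the view.
MODEL_FILES = {
    "gNB": ("gNB.sav", "gaussianNB.csv"),
    "multiNB": ("classifier_multi_NB.sav", "multi_NB.csv"),
    "rf": ("random_forest.sav", "rf.csv"),
}


def select_model(post_data):
    """Return (model_file, csv_name) for the first known button, else None."""
    for name, files in MODEL_FILES.items():
        if name in post_data:
            return files
    return None


def predictions_to_csv(y_pred):
    """Serialize predictions to CSV bytes in memory, without writing to disk."""
    return pd.DataFrame(y_pred).to_csv().encode("utf-8")
```

In a view, the bytes returned by predictions_to_csv could be passed directly to HttpResponse with content_type='text/csv' and the same Content-Disposition header, which would avoid writing an intermediate file on the server.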

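Finally, register the download route in urls.py so that the form action reaches the models view:

from django.urls import path
from . import views  # assumes urls.py lives in the same app as views.py

urlpatterns = [
    path('', views.home, name='home'),
    path('download', views.models),
]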