
An Unreliable Forecast: This Year's Double 11 Sales Will Be 621.3 Billion Yuan

Posted by CDA Data Analyst (CDA資料分析師) in Games on 2022-12-02


Produced by CDA Data Analyst

Author: Cao Xin (曹鑫)

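This year marks the 13th Double 11 (Singles' Day). Every year people watch the running total on the screen with anticipation as it breaks the record yet again. The 2019 Tmall Double 11 sales figure, however, was predicted seven months in advance by a Weibo user using curve fitting. His predictions were 267.537 or 268.900 billion yuan, and the actual turnover was 268.4 billion yuan. The closer prediction was off by only 0.5 billion yuan, a relative error of about 0.2%.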
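As a quick check of the error figures quoted above, the relative error of each prediction can be computed directly. This is a minimal sketch: the variable names are illustrative, and the values are the ones quoted in the text, in units of 100 million yuan.

```python
# Values quoted in the text, in units of 100 million yuan
actual = 2684.0
predictions = [2675.37, 2689.00]

# Relative error = |predicted - actual| / actual
rel_errors = [abs(p - actual) / actual for p in predictions]

for p, err in zip(predictions, rel_errors):
    print(f"predicted={p:.2f}  abs_error={abs(p - actual):.2f}  rel_error={err:.2%}")
```

The closer prediction (2689.00) is off by about 0.19%, and the other (2675.37) by about 0.32%.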

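But if you use the same method to forecast 2020, the prediction is 328.2 billion yuan while the actual figure reached 498.2 billion yuan. The rules changed in 2020: the reported figure covers sales from November 1 to November 11, so strictly speaking it can no longer be combined with the historical data for forecasting. We will do it anyway for fun, mainly to practice polynomial regression and visualization in Python.

Here is the forecast up front: this year's Double 11 sales will be 902.9688 billion yuan! Let's wait for Double 11, and readers are welcome to come back and prove me wrong.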

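NO.01 Collecting historical Double 11 sales data

The historical Taobao/Tmall Double 11 sales figures were collected from the web, in units of 100 million yuan (億元). Pandas is used to organize them into a DataFrame, and an extra column '年份int' (the year as an integer index) is added for the later calculations.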

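import pandas as pd

# Data collected from the web: historical Taobao/Tmall Double 11 sales,
# in units of 100 million yuan, for demonstration only
double11_sales = {
    '2009年': [0.50],
    '2010年': [9.36],
    '2011年': [34],
    '2012年': [191],
    '2013年': [350],
    '2014年': [571],
    '2015年': [912],
    '2016年': [1207],
    '2017年': [1682],
    '2018年': [2135],
    '2019年': [2684],
    '2020年': [4982],
}

# One row per year: column '年份' is the year label, '銷量' is the sales figure
df = pd.DataFrame(double11_sales).T.reset_index()
df.rename(columns={'index': '年份', 0: '銷量'}, inplace=True)

# Year index 1..12, each wrapped in a list so the column can be passed
# to scikit-learn as 2-D input
df['年份int'] = [[i] for i in range(1, len(df['年份']) + 1)]
df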


NO.02 Drawing a scatter plot

Using the plotly package, plot sales against year as a scatter plot. The 2020 figure clearly jumps sharply.

# Scatter plot
import plotly.graph_objs as go

year = df['年份']
sales = df['銷量']

trace = go.Scatter(x=year, y=sales, mode='markers')
data = [trace]
layout = go.Layout(title='Tmall/Taobao Double 11 sales by year, 2009-2020')
fig = go.Figure(data=data, layout=layout)
fig.show()

NO.03 Building the model with Scikit-Learn

Univariate polynomial regression

Let's first look back at how nicely the 2009-2019 data behaves. Select only the 2009-2019 data:

df_2009_2019 = df[:-1]
df_2009_2019


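The following code generates the quadratic features:

from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree=2)
X_ = poly_reg.fit_transform(list(df_2009_2019['年份int']))

1. The first line imports PolynomialFeatures, the class used to add polynomial terms.

2. The second line sets the highest degree to 2, in preparation for generating the quadratic term (x squared).

3. The third line transforms the original X into a new two-dimensional array X_, which contains the newly generated quadratic term (x squared) and the original linear term (x).

X_ is the two-dimensional array shown below. The first column is the constant term (x to the power 0); it carries no special meaning and does not affect the fit. The second column is the original linear term (x), and the third column is the newly generated quadratic term (x squared).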

array([[  1.,   1.,   1.],
       [  1.,   2.,   4.],
       [  1.,   3.,   9.],
       [  1.,   4.,  16.],
       [  1.,   5.,  25.],
       [  1.,   6.,  36.],
       [  1.,   7.,  49.],
       [  1.,   8.,  64.],
       [  1.,   9.,  81.],
       [  1.,  10., 100.],
       [  1.,  11., 121.]])
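To make the column layout concrete, the same matrix can be built by hand with NumPy and compared against the scikit-learn output. This is only an illustrative sketch; the names `x`, `manual` and `sk` are not from the original code.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Year index 1..11 as a column vector (one feature per row)
x = np.arange(1, 12).reshape(-1, 1)

# Build the columns [x^0, x^1, x^2] by hand
manual = np.hstack([x ** 0, x ** 1, x ** 2]).astype(float)

# The same features generated by scikit-learn
sk = PolynomialFeatures(degree=2).fit_transform(x)

print(manual.shape)             # (11, 3)
print(np.allclose(manual, sk))  # True
```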

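from sklearn.linear_model import LinearRegression
regr = LinearRegression()
regr.fit(X_, list(df_2009_2019['銷量']))

LinearRegression()

1. The first line imports LinearRegression, the linear regression class, from Scikit-Learn.

2. The second line constructs an initial linear regression model and names it regr.

3. The third line fits the model with the fit() function; after this, regr is a fitted linear regression model.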
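Once the model is fitted, its coefficients can be read back to see the quadratic it represents. The sketch below is self-contained and rebuilds the 2009-2019 data from the table in NO.01; the names `b0`, `c0`, `b1`, `b2`, `manual` and `model_pred` are illustrative, not part of the original code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# 2009-2019 sales from the table in NO.01, in units of 100 million yuan
sales = [0.50, 9.36, 34, 191, 350, 571, 912, 1207, 1682, 2135, 2684]
x = np.arange(1, len(sales) + 1).reshape(-1, 1)  # year index 1..11

poly = PolynomialFeatures(degree=2)
regr = LinearRegression().fit(poly.fit_transform(x), sales)

# coef_ lines up with the columns [x^0, x^1, x^2]. LinearRegression also fits
# its own intercept_, so the x^0 coefficient is redundant; it is added to the
# constant term here so the hand evaluation stays exact either way.
c0, b1, b2 = regr.coef_
b0 = regr.intercept_ + c0
print(f"y = {b0:.4f} + {b1:.4f} * x + {b2:.4f} * x^2")

# Evaluating the polynomial by hand at x = 12 agrees with regr.predict
manual = b0 + b1 * 12 + b2 * 12 ** 2
model_pred = regr.predict(poly.transform([[12]]))[0]
print(f"{manual:.2f} {model_pred:.2f}")
```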

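NO.04 Making predictions with the model

Now the fitted model regr can be used to make predictions. Suppose the independent variable is 12 (that is, the year 2020); the predict() function then gives the corresponding dependent variable y. The code is as follows: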

# Reuse the already fitted transformer instead of refitting it
XX_ = poly_reg.transform([[12]])

XX_

array([[  1.,  12., 144.]])

y = regr.predict(XX_)
y

array([3282.23478788])
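The transform-then-predict steps above can also be wrapped in a single scikit-learn pipeline, so the raw year index is passed straight to predict(). This is an alternative sketch, not the code used in the article; it rebuilds the 2009-2019 data from NO.01, and `model` and `pred_2020` are illustrative names.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# 2009-2019 sales from the table in NO.01, in units of 100 million yuan
sales = [0.50, 9.36, 34, 191, 350, 571, 912, 1207, 1682, 2135, 2684]
x = np.arange(1, len(sales) + 1).reshape(-1, 1)  # year index 1..11

# The pipeline expands x into [1, x, x^2] and then fits the linear model
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, sales)

# Year index 12 corresponds to 2020
pred_2020 = model.predict([[12]])[0]
print(f"{pred_2020:.2f}")  # 3282.23
```

Keeping the feature expansion inside the pipeline avoids having to remember to transform new inputs by hand before each prediction.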

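This is the forecast for 2020 if the 2009-2019 trend had continued: 328.2 billion yuan (3282 in the data's unit of 100 million yuan). The actual figure was 498.2 billion yuan (4982), because, as mentioned above, the 2020 figure merges the sales from November 1 to November 11, so the amount suddenly jumped. Plotted, it looks like this: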

# Scatter plot of actual sales with the fitted curve
import plotly.graph_objs as go

year = list(df['年份'])
sales = df['銷量']

trace1 = go.Scatter(
    x=year,
    y=sales,
    mode='markers',
    name="Actual sales"  # first legend entry
)

# Evaluate the 2009-2019 fit on year indices 1..12, one value per year label
XX_ = poly_reg.transform(list(df['年份int']))
regr = LinearRegression()
regr.fit(X_, list(df_2009_2019['銷量']))

trace2 = go.Scatter(
    x=year,
    y=regr.predict(XX_),
    mode='lines',
    name="Fitted values",  # second legend entry
)

data = [trace1, trace2]
layout = go.Layout(title='Tmall/Taobao Double 11 sales by year',
                   xaxis_title='Year',
                   yaxis_title='Sales')
fig = go.Figure(data=data, layout=layout)
fig.show()


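NO.05 Forecasting 2021 sales

Since the data has deviated so much, let's not dig deeper and just brute-force it. Using the same method, include the actual 2020 figure, fit again without further ado, and see what comes out: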

from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree=5)
X_ = poly_reg.fit_transform(list(df['年份int']))

# Fit on 2009-2020 in order to forecast 2021
regr = LinearRegression()
regr.fit(X_, list(df['銷量']))

LinearRegression()

# Features for year indices 1..13 (index 13 is 2021)
XXX_ = poly_reg.transform(list(df['年份int']) + [[13]])

# Scatter plot of actual sales, the fitted curve, and the 2021 forecast
import plotly.graph_objs as go

year = list(df['年份'])
sales = df['銷量']

trace1 = go.Scatter(
    x=year,
    y=sales,
    mode='markers',
    name="Actual sales"  # first legend entry
)

# XXX_ has 13 rows (2009-2021), so the x axis needs 13 labels as well
trace2 = go.Scatter(
    x=year + ['2021年'],
    y=regr.predict(XXX_),
    mode='lines',
    name="Predicted sales"  # second legend entry
)

trace3 = go.Scatter(
    x=['2021年'],
    y=[regr.predict(XXX_)[-1]],
    mode='markers',
    name="Predicted 2021 sales"  # third legend entry
)

data = [trace1, trace2, trace3]
layout = go.Layout(title='Tmall/Taobao Double 11 sales by year',
                   xaxis_title='Year',
                   yaxis_title='Sales')
fig = go.Figure(data=data, layout=layout)
fig.show()

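NO.06 How to choose the degree of the polynomial

To choose the degree of the model, you can write a loop that computes the prediction error for each degree and then pick the degree based on the results.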

df_new = df.copy()
df_new['年份int'] = df['年份int'].apply(lambda x: x[0])  # unwrap [i] into i
df_new


# Choosing the degree for polynomial regression:
# compute the MSE of an m-degree polynomial regression and plot it
import matplotlib.pyplot as plt
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

# NOTE: with 12 rows, training uses the first 11 rows and testing uses
# rows 6 onward, so the two sets overlap
train_df = df_new[:int(len(df) * 0.95)]
test_df = df_new[int(len(df) * 0.5):]

# Independent and dependent variables for training and testing
train_x = train_df['年份int'].values
train_y = train_df['銷量'].values
test_x = test_df['年份int'].values
test_y = test_df['銷量'].values

train_x = train_x.reshape(len(train_x), 1)
test_x = test_x.reshape(len(test_x), 1)
train_y = train_y.reshape(len(train_y), 1)

mse = []  # MSE for each maximum degree
m = 1  # initial degree
m_max = 10  # highest degree to try

while m <= m_max:
    model = make_pipeline(PolynomialFeatures(m, include_bias=False), LinearRegression())
    model.fit(train_x, train_y)  # train the model
    pre_y = model.predict(test_x)  # predict on the test set
    mse.append(mean_squared_error(test_y, pre_y.flatten()))  # compute the MSE
    m = m + 1

print("MSE results: ", mse)

# Plot the MSE against the degree
plt.plot([i for i in range(1, m_max + 1)], mse, 'r')
plt.scatter([i for i in range(1, m_max + 1)], mse)

# Title and axis labels
plt.title("MSE of m degree of polynomial regression")
plt.xlabel("m")
plt.ylabel("MSE")

MSE results: [1088092.9621201046, 481951.27857828484, 478840.8575107471, 477235.9140442428, 484657.87153138855, 509758.1526412842, 344204.1969956556, 429874.9229308078, 8281846.231771571, 146298201.8473966]
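In the split used above, the test rows overlap the training rows, so these MSE values partly measure fit rather than forecast quality. One possible variation, sketched here under the assumption that the last three years are held out (`n_test = 3` is an arbitrary choice, and `mse_by_degree` is an illustrative name), keeps the two sets disjoint and in chronological order:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# 2009-2020 sales from the table in NO.01, in units of 100 million yuan
sales = np.array([0.50, 9.36, 34, 191, 350, 571, 912, 1207, 1682, 2135, 2684, 4982])
x = np.arange(1, len(sales) + 1).reshape(-1, 1)  # year index 1..12

n_test = 3  # assumption: hold out the most recent three years
train_x, test_x = x[:-n_test], x[-n_test:]
train_y, test_y = sales[:-n_test], sales[-n_test:]

mse_by_degree = {}
for m in range(1, 6):
    model = make_pipeline(PolynomialFeatures(m, include_bias=False), LinearRegression())
    model.fit(train_x, train_y)  # fit on the earlier years only
    mse_by_degree[m] = mean_squared_error(test_y, model.predict(test_x))

for m, mse in mse_by_degree.items():
    print(f"degree={m}  held-out MSE={mse:.1f}")
```

With only 12 data points and a rule change in 2020, any such split is rough; the point is only that the held-out years are never seen during fitting.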


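The errors show that for degrees 2 through 8 the MSE is basically stable, with no obvious further decrease. But if you try it yourself, you will find that degree 3 predicts sales of 621.3 billion yuan while degree 5 predicts 902.9 billion yuan. For a sales figure, that is already a very wide range. I will be bold and guess 902.9 billion yuan, which is as far as my nerve goes; breaking one trillion would be too much. Bolder readers are welcome to leave their own forecasts, and let's wait and see on November 11.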
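To see how strongly the forecast depends on the degree, the fit can be repeated for several degrees and the year-13 (2021) prediction printed for each. This is a sketch that rebuilds the 2009-2020 data from NO.01; `predictions` is an illustrative name, the range of degrees is an arbitrary choice, and the printed values are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# 2009-2020 sales from the table in NO.01, in units of 100 million yuan
sales = np.array([0.50, 9.36, 34, 191, 350, 571, 912, 1207, 1682, 2135, 2684, 4982])
x = np.arange(1, len(sales) + 1).reshape(-1, 1)  # year index 1..12

predictions = {}
for degree in range(1, 6):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x, sales)
    # Year index 13 corresponds to 2021
    predictions[degree] = float(model.predict([[13]])[0])

for degree, pred in predictions.items():
    print(f"degree={degree}  predicted 2021 sales={pred:.1f}")
```

Extrapolating one step beyond the data is where polynomials of different degrees diverge most, which is why the spread between degrees matters more than the in-sample error.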

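NO.07 Summary

I hope this article gives readers who are not yet familiar with polynomial regression in Python and Plotly visualization a chance to practice.

Produced by: CDA Data Analyst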
