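Weather prediction with Python + sklearn (machine learning)

Open the website from the previous step and you will find controls for choosing a city and a time period (PS: the link in the top-right corner switches the site to English). Choose the city you want to forecast and any date, for example Guangzhou (PS: typing the first few characters of the city name into the selection box jumps straight to it), then click the button. You are taken to a page whose address is where the data we want lives, although the dates are not yet the right ones. Next, let's work out the rules behind this URL, for example: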

http://www.meteomanz.com/sy2?l=1&cou=2250&ind=59287&d1=13&m1=12&y1=2020&d2=13&m2=12&y2=2020

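In this URL,

the part before the ? consists of the http:// protocol, the www.meteomanz.com domain and the /sy2 path. What we mainly care about are the parameters after the ?:
l is the language parameter; l=1 means English
cou and ind are the region and city codes
d1, m1, y1 are the day, month and year where the period starts
d2, m2, y2 are the day, month and year where the period ends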

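So

http://www.meteomanz.com/sy2?l=1&cou=2250&ind=59287&d1=02&m1=02&y1=2019&d2=13&m2=12&y2=2020

requests Guangzhou's daily weather from 2019/2/2 to 2020/12/13. Don't forget that the month and day must be zero-padded. Once you open the page, though, you will see that the site returns at most 30 days of data, so later on we never request that many days at once.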
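The standard library can serve as a quick check of the rules above. It splits such a URL into its query parameters and turns d1/m1/y1 and d2/m2/y2 back into dates. This is a minimal sketch. The helper name `parse_range` is illustrative. The 30-day limit comes from the observation above, not from any documented API of the site.

```python
from datetime import date
from urllib.parse import urlsplit, parse_qs

URL = ("http://www.meteomanz.com/sy2?l=1&cou=2250&ind=59287"
       "&d1=02&m1=02&y1=2019&d2=13&m2=12&y2=2020")


def parse_range(url):
    """Return (city code, start date, end date) encoded in a meteomanz URL."""
    # parse_qs maps each parameter name to a list of values; keep the first one
    q = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    start = date(int(q["y1"]), int(q["m1"]), int(q["d1"]))
    end = date(int(q["y2"]), int(q["m2"]), int(q["d2"]))
    return q["ind"], start, end


city, start, end = parse_range(URL)
span = (end - start).days + 1  # number of days requested, both ends included
print(city, start, end, span)
# The site returns at most 30 days, so a span this long gets cut short
print("within the 30-day limit:", span <= 30)
```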

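4. Analyzing the page structure

With the URL rules in hand, we now analyze the page structure so the scraper can match the data. Press F12 to open the developer tools. As the figure shows, each td inside the second tr tag in tbody holds one data value. (PS: click the first small button at the top left of the developer tools panel, the element picker, and then click the data on the page.)

Once we know these rules, we can write a scraper that collects the matching data.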
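A small example helps make the tbody → tr → td layout concrete. The sketch below uses Python's built-in html.parser on a made-up fragment and has no dependencies; the article itself uses BeautifulSoup later. The class name `TbodyCells` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class TbodyCells(HTMLParser):
    """Collect the text of every <td>, row by row, inside <tbody>."""

    def __init__(self):
        super().__init__()
        self.rows = []         # one list of cell strings per <tr>
        self.in_tbody = False
        self.cell = None       # text pieces of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self.in_tbody = True
        elif self.in_tbody and tag == "tr":
            self.rows.append([])
        elif self.in_tbody and tag == "td":
            self.cell = []

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False
        elif tag == "td" and self.cell is not None:
            self.rows[-1].append("".join(self.cell).strip())
            self.cell = None

    def handle_data(self, data):
        # Text inside nested tags such as <a> also belongs to the current cell
        if self.cell is not None:
            self.cell.append(data)


# Hypothetical page fragment with the same tbody > tr > td layout
SAMPLE = """
<table><tbody>
<tr><td><a href="#">07/12/2019</a></td><td>14.8</td><td>20.8</td></tr>
<tr><td><a href="#">06/12/2019</a></td><td>15.2</td><td>-</td></tr>
</tbody></table>
"""

parser = TbodyCells()
parser.feed(SAMPLE)
print(parser.rows)
```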

Errata

Thanks to "Gbilibili" for pointing this out. The URL-building snippet below should be changed from

# Build the scraping URL
url = ("http://www.meteomanz.com/sy2?l=1&cou=2250&ind=59287&d1=" +
       str(week_ago.day).zfill(2) +
       "&m1=" + str(week_ago.month).zfill(2) +
       "&y1=" + str(week_ago.month) +                     # wrong: month used as the year
       "&d2=" + str(week_pre.day - years[0]).zfill(2) +   # wrong: years subtracted from the day
       "&m2=" + str(week_pre.month).zfill(2) +
       "&y2=" + str(week_pre.year - years[1]))

to

# Build the scraping URL
url = ("http://www.meteomanz.com/sy2?l=1&cou=2250&ind=59287&d1=" +
       str(week_ago.day).zfill(2) +
       "&m1=" + str(week_ago.month).zfill(2) +
       "&y1=" + str(week_ago.year - years[0]) +
       "&d2=" + str(week_pre.day).zfill(2) +
       "&m2=" + str(week_pre.month).zfill(2) +
       "&y2=" + str(week_pre.year - years[1]))

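Thanks to @L-Zzxnn for the reminder. The code now supports negative temperatures. Treat the code on GitHub as the latest version.

5. The scraper

For scraping in general, you can refer to an earlier article of mine.

5.1 Working out the URL to scrape

First, the period we mainly want to scrape runs from half a month before today's date last year up to today's date last year. Using the URL rules worked out in the previous part, we get the following (PS: the real link contains no line breaks):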

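http://www.meteomanz.com/sy2?l=1&cou=2250&ind=59287
&d1=day of the date half a month before today's date last year
&m1=month of the date half a month before today's date last year
&y1=last year
&d2=day of today's date
&m2=month of today's date
&y2=last year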

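Why last year, and why half a month? Last year's weather is more similar to current conditions than weather from two or more years ago, which reduces error. Half a month is better than a week, because more data reduces error. A full month is not used because of the site's limit, and in experiments a month also added a little error. So in the end I settled on last year and a half-month window.

If we only wanted to predict for today once, the URL above could be filled in by hand. To avoid manual editing, we use Python's datetime library as follows: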

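import datetime as DT

# b = [days before today for the start date, days after today for the end date]
b = [15, 0]  # example: from half a month ago up to today
# Current date and time
today = DT.datetime.now()
# Date b[0] days ago
week_ago = (today - DT.timedelta(days=b[0])).date()
# Date b[1] days from now
week_pre = (today + DT.timedelta(days=b[1])).date()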

Passing in b = [15, 0] puts the date half a month ago into week_ago and today's date into week_pre. We can then build the URL we need with this line:

# years = [years back for the start date, years back for the end date]; [1, 1] means last year
years = [1, 1]
# Build the scraping URL
url = ("http://www.meteomanz.com/sy2?l=1&cou=2250&ind=59287&d1=" +
       str(week_ago.day).zfill(2) +
       "&m1=" + str(week_ago.month).zfill(2) +
       "&y1=" + str(week_ago.year - years[0]) +
       "&d2=" + str(week_pre.day).zfill(2) +
       "&m2=" + str(week_pre.month).zfill(2) +
       "&y2=" + str(week_pre.year - years[1]))

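.zfill(2) pads a string with zeros to 2 characters. For example, "1" becomes "01" and "12" stays "12". With the URL in hand, the next step is to fetch the page and pick the data out of its elements.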
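Computing the same window with a pinned date makes the result easy to check. Below is a minimal sketch that assumes the parameter layout described above. `window_params` is a hypothetical helper, and it uses urlencode instead of string concatenation. As in the article's code, the year is shifted after the day arithmetic.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

BASE = "http://www.meteomanz.com/sy2"


def window_params(today, years, b, cou="2250", ind="59287"):
    """Query parameters for the date window used in this article.

    years = [years back for the start, years back for the end]
    b     = [days before today for the start, days after today for the end]
    """
    start = today - timedelta(days=b[0])
    end = today + timedelta(days=b[1])
    return {
        "l": "1", "cou": cou, "ind": ind,
        # f"{n:02d}" pads to two digits, like str(n).zfill(2)
        "d1": f"{start.day:02d}", "m1": f"{start.month:02d}",
        "y1": str(start.year - years[0]),
        "d2": f"{end.day:02d}", "m2": f"{end.month:02d}",
        "y2": str(end.year - years[1]),
    }


# A pinned "today" keeps the result reproducible
params = window_params(date(2020, 12, 13), years=[1, 1], b=[15, 0])
url = BASE + "?" + urlencode(params)
print(url)
```

There is one edge case. The year is subtracted from a plain number rather than from a date. So a window that touches 29 February would put a non-existent day such as 29/02/2019 into the URL. How the site reacts to that is untested here.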

5.2 Fetching the page

First comes the fetching part. It is simple: just a small GetData class.

# -*- coding: utf-8 -*-
# @Time: 2020/12/16
# @Author: Eritque arcus
# Purpose: fetch data
import urllib3


class GetData:
    url = ""
    headers = ""

    def __init__(self, url, header=""):
        """
        :param url: URL to fetch
        :param header: request headers; a built-in default is used when empty
        """
        self.url = url
        if header == "":
            self.headers = {
                'Connection': 'Keep-Alive',
                'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,'
                          '*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
                'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
                'Accept-Encoding': 'gzip, deflate',
                'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, '
                              'like Gecko) Chrome/87.0.4280.66 Mobile Safari/537.36 ',
                'Host': 'www.meteomanz.com'
            }
        else:
            self.headers = header

    def Get(self):
        """
        :return: content of the page at the URL (bytes)
        """
        http = urllib3.PoolManager()
        return http.request('GET', self.url, headers=self.headers).data

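This uses the urllib3 library with a GET request. headers holds the request headers. To find them, press F12 to open the developer tools, go to the Network tab, click any request and scroll down. You can also simply use mine. Request headers are part of the HTTP protocol; search for them if you want to know more.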
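If you would rather avoid the urllib3 dependency, you can sketch the same GET request with the standard library. This is an alternative, not the article's code. The helper names and the User-Agent string are placeholders. There is one difference worth knowing. urllib3 transparently decompresses gzip responses, but urllib.request returns the raw bytes. So when this sketch advertises Accept-Encoding: gzip, it decompresses the body itself. It also sets a timeout so a stalled request does not hang forever.

```python
import gzip
import urllib.request

HEADERS = {
    "Accept-Encoding": "gzip",
    "User-Agent": "Mozilla/5.0 (compatible; weather-tutorial)",  # placeholder UA
}


def build_request(url, headers=HEADERS):
    """A GET request carrying the given headers (nothing is sent yet)."""
    return urllib.request.Request(url, headers=headers, method="GET")


def decode_body(body, content_encoding):
    """Undo gzip compression, which urllib.request leaves in place."""
    if content_encoding == "gzip":
        return gzip.decompress(body)
    return body


def fetch(url, timeout=10):
    """Fetch the page bytes; raises urllib.error.URLError on network failure."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return decode_body(resp.read(), resp.headers.get("Content-Encoding"))
```

Usage would be `html = fetch(url)` with the URL from 5.1, and the bytes can be handed to BeautifulSoup just like the result of GetData(url).Get().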

5.3 Extracting the data from the page

This part uses the BeautifulSoup library:

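import re
from bs4 import BeautifulSoup
from GetData import GetData

# url is the address built in 5.1
g = GetData(url).Get()
# Parse the page with BeautifulSoup
soup = BeautifulSoup(g, "html5lib")
# Take the <tbody> element
tb = soup.find(name="tbody")
# Take its <tr> rows
past_tr = tb.find_all(name="tr")
for tr in past_tr:
    # Take every <td> cell of the row
    text = tr.find_all(name="td")
    flag = False
    for i in range(0, len(text)):
        if i == 0:
            text[i] = text[i].a.string
            # Site bug: a request spanning two months also returns a "day 0" for each month, which is empty
            # because the date does not exist, e.g. 00/11/2020 (day/month/year), so drop that row by hand
            if "00/" in text[i]:
                flag = True
        elif i == 8:
            # Strip the "/8" suffix (a display format on the page)
            text[i] = text[i].string.replace("/8", "")
        elif i == 5:
            # Strip the " Hpa" unit
            text[i] = text[i].string.replace(" Hpa", "")
        elif i == 6:
            # Regex: strip the degree sign, parentheses and compass letters from the wind direction
            text[i] = re.sub(u"[º(.*?|N|W|E|S)]", "", text[i].string)
        else:
            # Take the text of the cell
            text[i] = text[i].string
        # Replace missing data with 2 (a crude approach); this gave MAE=3.6021
        # Compare the whole cell instead of using str.replace, so negative values like "-3.2" survive
        text[i] = "2" if text[i] == "-" else text[i]
        text[i] = "2" if text[i] == "Tr" else text[i]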

If anything is unclear, ask in the comments.
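The cleaning rules are easier to test when you pull them out into a pure function on strings. This sketch assumes the cell texts look like the hypothetical samples in the tests. `clean_cell` and `is_day_zero` are illustrative names. The sketch also shows why the missing-value check compares whole cells: a blanket str.replace("-", "2") would turn "-3.2" into "23.2".

```python
import re


def clean_cell(i, s):
    """Clean the raw text of column i the way the loop above does.

    Columns: 0 Time, 1 Ave_t, 2 Max_t, 3 Min_t, 4 Prec,
             5 SLpress, 6 Winddir, 7 Windsp, 8 Cloud
    """
    if i == 8:
        s = s.replace("/8", "")        # cloud cover shown as e.g. "3/8"
    elif i == 5:
        s = s.replace(" Hpa", "")      # drop the pressure unit
    elif i == 6:
        # Same character class as above, plus strip() for leftover spaces
        s = re.sub("[º(.*?|N|W|E|S)]", "", s).strip()
    # Missing markers: compare the whole cell so "-3.2" is kept intact
    if s in ("-", "Tr"):
        s = "2"
    return s


def is_day_zero(date_text):
    """Rows dated 00/mm/yyyy are the site's empty 'day 0' rows."""
    return date_text.startswith("00/")


print(clean_cell(1, "-3.2"), clean_cell(6, "331º (NNW)"), clean_cell(4, "Tr"))
```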

5.4 Writing to a CSV file

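import csv

# c is the csv file name, text is one row parsed in 5.3
# Create the file object
f = open(c, 'w', encoding='utf-8', newline='')

# Build a csv writer on top of the file object
csv_writer = csv.writer(f)
# Write the contents of the text list as one row
csv_writer.writerow(text)
# Close the file
f.close()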
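Here is a slightly tidier variant of the same steps. It uses a with block, so the file is closed even if an exception is raised, and writerows to write several rows at once. The sketch writes to a temporary file. `write_rows` is a hypothetical helper, and the row is copied from the sample output further down.

```python
import csv
import os
import tempfile

HEADER = ["Time", "Ave_t", "Max_t", "Min_t", "Prec", "SLpress", "Winddir", "Windsp", "Cloud"]


def write_rows(path, rows):
    """Write the header plus all rows; `with` closes the file even on errors."""
    # newline="" as the csv docs recommend (otherwise Windows gets extra blank lines)
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(rows)


path = os.path.join(tempfile.mkdtemp(), "sample.csv")
write_rows(path, [["07/12/2019", "14.8", "20.8", "8.8", "0.0", "1026.3", "331", "11", "0"]])

# Read it back to check what was written
with open(path, encoding="utf-8", newline="") as f:
    back = list(csv.reader(f))
print(back)
```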

5.6 Wrapping it up into modules

Write.py

# -*- coding: utf-8 -*-
# @Time: 2020/12/16
# @Author: Eritque arcus
import re
from bs4 import BeautifulSoup
from GetData import GetData
import datetime as DT
import csv


def a(t):
    return t.replace(" - ", "0")


# Purpose: write the csv
# Named Write (capital W) because ProcessData.py does `from Write import Write`
def Write(years, b, c):
    """
    :param years: [years between now and the start date, years between now and the end date]
    :param b: [days between today and the start date, days between today and the end date]
    :param c: csv file name
    :return: None
    """
    # 1. Create the file object
    f = open(c, 'w', encoding='utf-8', newline='')

    # 2. Build a csv writer on top of the file object
    csv_writer = csv.writer(f)

    # 3. Write the header row
    # , "negAve", "negMax", "negMin"
    csv_writer.writerow(["Time", "Ave_t", "Max_t", "Min_t", "Prec", "SLpress", "Winddir", "Windsp", "Cloud"])
    # Current date
    today = DT.datetime.now()
    # Date b[0] days ago
    week_ago = (today - DT.timedelta(days=b[0])).date()
    # Date b[1] days from now
    week_pre = (today + DT.timedelta(days=b[1])).date()
    # City id: Guangzhou 59287, Qingdao 54857
    city_id = "59287"
    # Build the scraping URL
    url = ("http://www.meteomanz.com/sy2?l=1&cou=2250&ind=" + city_id +
           "&d1=" + str(week_ago.day).zfill(2) +
           "&m1=" + str(week_ago.month).zfill(2) +
           "&y1=" + str(week_ago.year - years[0]) +
           "&d2=" + str(week_pre.day).zfill(2) +
           "&m2=" + str(week_pre.month).zfill(2) +
           "&y2=" + str(week_pre.year - years[1]))
    # Show the URL of the dataset being fetched
    print(url)
    g = GetData(url).Get()
    # Parse the page with BeautifulSoup
    soup = BeautifulSoup(g, "html5lib")
    # Take the <tbody> element
    tb = soup.find(name="tbody")
    # Take its <tr> rows
    past_tr = tb.find_all(name="tr")
    for tr in past_tr:
        # Take every <td> cell of the row
        text = tr.find_all(name="td")
        flag = False
        negA = negMax = negMin = False
        for i in range(0, len(text)):
            if i == 0:
                text[i] = text[i].a.string
                # Site bug: the site returns a "day 0" for each month, e.g. 00/11/2020, so drop it
                if "00/" in text[i]:
                    flag = True
            elif i == 8:
                # Strip the "/8" suffix (page display format)
                text[i] = text[i].string.replace("/8", "")
            elif i == 5:
                # Strip the unit
                text[i] = text[i].string.replace(" Hpa", "")
            elif i == 6:
                # Strip the degree sign, parentheses and compass letters from the wind direction
                text[i] = re.sub(u"[º(.*?|N|W|E|S)]", "", text[i].string)
            else:
                # Take the text of the cell
                text[i] = text[i].string
            # Replace missing data with 2 (a crude approach); this gave MAE=3.6021
            text[i] = "2" if text[i] == "-" else text[i]
            text[i] = "2" if text[i] == "Tr" else text[i]
        text = text[0:9]
        # text += [str(int(negA)), str(int(negMax)), str(int(negMin))]
        # 4. Write the row to the csv file
        if not flag:
            csv_writer.writerow(text)
    # 5. Close the file
    f.close()

GetData.py is identical to the GetData class listed in 5.2.

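With that in place, a single call fetches the weather data. For example, the following gets the weather from 20 days before today's date last year up to today's date last year:

from Write import Write

# Use data from recent years as the training set
# e.g. [1, 1], [20, 0] uses the data from 20 days before today's date in 2019 up to today's date in 2019
# Write to csv
Write([1, 1], [20, 0], "weather_train_train.csv")

The resulting weather_train_train.csv: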

Time,Ave_t,Max_t,Min_t,Prec,SLpress,Winddir,Windsp,Cloud

07/12/2019,14.8,20.8,8.8,0.0,1026.3,331,11,0
06/12/2019,15.2,19.8,10.7,0.0,1026.6,344,15,0
05/12/2019,14.5,20.4,8.6,2,1026.2,346,13,8
04/12/2019,13.8,20.4,7.1,0.0,1024.7,335,16,2
03/12/2019,13.0,18.9,7.1,0.0,1024.8,330,10,0
02/12/2019,18.2,24.9,11.5,0.0,1024.8,347,18,3
01/12/2019,18.1,24.9,11.4,0.0,1020.9,332,16,1
30/11/2019,17.5,23.6,11.4,0.0,1020.5,352,8,3
29/11/2019,15.8,20.1,11.5,0.0,1023.6,349,11,4
28/11/2019,20.4,27.1,13.8,0.0,1024.5,337,19,3
27/11/2019,21.9,27.1,16.6,0.0,1021.3,336,12,0
26/11/2019,22.2,28.4,16.1,0.0,1021.1,356,6,6
25/11/2019,22.2,29.3,15.2,0.0,1020.8,344,13,3
24/11/2019,21.4,29.3,13.6,0.0,1018.5,346,5,0
23/11/2019,20.7,28.4,13.0,0.0,1017.2,352,5,1
22/11/2019,19.6,27.6,11.6,0.0,1017.3,331,6,0
21/11/2019,18.4,25.1,11.6,0.0,1019.1,323,9,1
20/11/2019,18.3,24.2,12.4,0.0,1020.3,338,7,0
19/11/2019,19.1,25.4,12.8,0.0,1020.5,342,11,0
18/11/2019,22.2,28.8,15.7,0.0,1018.8,342,17,0
17/11/2019,22.2,28.8,15.7,0.0,1015.2,358,7,3

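6. Data preprocessing

Before we train on the data above, we still need some preprocessing, because the data we get is sometimes incomplete. When that happens, we either drop the affected column or fill in the missing values with the mean or some other method.

Since I have already decided to replace every missing item with 2, I only list the possible approaches below and don't rely on them.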
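Here is what the two approaches mentioned above, dropping a column or filling it with the mean, look like on a toy DataFrame. This is a pandas-only sketch with made-up numbers. The key point is that the fill values are learned from the training rows and then reused for the rows we predict on. That is also what sklearn's SimpleImputer (default strategy="mean") does when you call fit on the training set and transform on other sets.

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({"Ave_t": [14.8, np.nan, 14.5], "Prec": [np.nan, np.nan, np.nan]})
test = pd.DataFrame({"Ave_t": [np.nan, 13.0], "Prec": [0.0, np.nan]})

# Option 1: drop columns that are missing in every training row
empty = [c for c in train.columns if train[c].isnull().all()]
train_dropped = train.drop(columns=empty)
test_dropped = test.drop(columns=empty)

# Option 2: fill gaps with the column mean computed on the training data only,
# then reuse those same means for the test data
means = train_dropped.mean()
train_filled = train_dropped.fillna(means)
test_filled = test_dropped.fillna(means)
print(train_filled, test_filled, sep="\n")
```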

Create ProcessData.py with a ProcessData function that returns the data:

# -*- coding: utf-8 -*-
# @Time: 2020/12/16
# @Author: Eritque arcus
from Write import Write
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
import seaborn as sns
import matplotlib.pyplot as plt


# Purpose: data preprocessing
def ProcessData():
    """
    :return:
        [X_train  training features,
        X_valid  validation features,
        y_train  training targets,
        y_valid  validation targets,
        imputed_X_test  features to predict on]
    """
    # Use data from recent years as the training set
    # e.g. [1, 1], [20, 0] uses the data from 20 days before today's date in 2019 up to today's date in 2019
    # Write the csv files
    Write([1, 1], [20, 0], "weather_train_train.csv")
    Write([1, 1], [0, 20], "weather_train_valid.csv")
    Write([0, 0], [20, 0], "weather_test.csv")
    X_test = pd.read_csv("weather_test.csv", index_col="Time", parse_dates=True)
    # Read the training features (X) and targets (y)
    X = pd.read_csv("weather_train_train.csv", index_col="Time", parse_dates=True)
    y = pd.read_csv("weather_train_valid.csv", index_col="Time", parse_dates=True)
    # Dropping every all-missing column raised the MAE to 3.7, so it is disabled
    # dxtcol = [col for col in X_test.columns
    #           if X_test[col].isnull().all()]
    # dxcol = [col for col in X.columns
    #          if X[col].isnull().all()]
    # dycol = [col for col in y.columns
    #          if y[col].isnull().all()]
    # for a1 in [dxtcol, dxcol, dycol]:
    #     for a2 in a1:
    #         if a2 in X_test.columns:
    #             X_test = X_test.drop(a2, axis=1)
    #         if a2 in X.columns:
    #             X = X.drop(a2, axis=1)
    #         if a2 in y.columns:
    #             y = y.drop(a2, axis=1)
    # Normalization/standardization: the predictions could not be converted back, so it is not used
    # scaler = preprocessing.StandardScaler()
    # pars = [cols for cols in X.columns if cols != "Time"]
    # for data in [X, y, X_test]:
    #     for par in pars:
    #         data[par] = scaler.fit_transform(data[par].values.reshape(-1, 1))
    #         # temp = scaler.fit(data[par].values.reshape(-1, 1))
    #         # data[par] = scaler.fit_transform(data[par].values.reshape(-1, 1), temp)

    # Fill missing values with the column mean (SimpleImputer's default); the effect is unclear
    # Separate imputers for X and y, each fitted on the training split only
    x_imputer = SimpleImputer()
    y_imputer = SimpleImputer()
    X_train, X_valid, y_train, y_valid = train_test_split(X, y, train_size=0.8, test_size=0.2, random_state=0)
    imputed_X_train = pd.DataFrame(x_imputer.fit_transform(X_train), columns=X_train.columns)
    imputed_X_valid = pd.DataFrame(x_imputer.transform(X_valid), columns=X_valid.columns)
    imputed_y_train = pd.DataFrame(y_imputer.fit_transform(y_train), columns=y_train.columns)
    imputed_y_valid = pd.DataFrame(y_imputer.transform(y_valid), columns=y_valid.columns)
    # The test set reuses the imputer fitted on X_train instead of being refitted on itself
    imputed_X_test = pd.DataFrame(x_imputer.transform(X_test), columns=X_test.columns)

    # Line plots
    # sns.lineplot(data=X)
    # plt.show()
    # sns.lineplot(data=y)
    # plt.show()
    # sns.lineplot(data=X_test)
    # plt.show()
    # Return the split datasets
    return [imputed_X_train, imputed_X_valid, imputed_y_train, imputed_y_valid, imputed_X_test]

from sklearn.ensemble import RandomForestRegressor  # random forest model
import joblib  # save the model as a .pkl file
from sklearn.metrics import mean_absolute_error  # MAE metric
from ProcessData import ProcessData  # get the data

# Get the data
[X_train, X_valid, y_train, y_valid, X_test] = ProcessData()

7.2 Building the model

# An XGBoost model was tried, but it had a bug
# modelX = XGBRegressor(n_estimators=1000, learning_rate=0.05, random_state=0, n_jobs=4)
# # model.fit(X_train_3, y_train_3)
# # model.fit(X_train_2, y_train_2)
# col = ["Ave_t", "Max_t", "Min_t", "Prec","SLpress", "Winddir", "Windsp", "Cloud"]
# modelX.fit(X_train, y_train,
#           early_stopping_rounds=5,
#           eval_set=[(X_valid, y_valid)],
#           verbose=False)
# Random forest model
model = RandomForestRegressor(random_state=0, n_estimators=1001)
# Train the model
model.fit(X_train, y_train)

You can choose n_estimators yourself. After several rounds of tuning, 1001 gave the lowest MAE.
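A small loop can do the search behind a number like 1001. Fit one forest per candidate value and keep the one with the lowest validation MAE. This sketch runs on synthetic data; a real run would use X_train, X_valid, y_train and y_valid from ProcessData. The candidate list is arbitrary. With this little data, a single train/validation split can be noisy, so treat the winner as a rough guide.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the weather table: 8 input columns, 8 output columns
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = X @ rng.normal(size=(8, 8)) + rng.normal(scale=0.1, size=(60, 8))
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=0)


def valid_mae(n):
    """Validation MAE of a random forest with n trees."""
    model = RandomForestRegressor(n_estimators=n, random_state=0)
    model.fit(X_train, y_train)
    return mean_absolute_error(y_valid, model.predict(X_valid))


candidates = [10, 50, 101]
scores = {n: valid_mae(n) for n in candidates}
best = min(scores, key=scores.get)  # candidate with the smallest MAE
print(scores, "best:", best)
```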

7.3 Evaluating the model

# Predict on the validation set
preds = model.predict(X_valid)
# Evaluate with MAE
score = mean_absolute_error(y_valid, preds)
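MAE is the mean of |actual - predicted|. Here y has several columns, so by default (multioutput="uniform_average") sklearn's mean_absolute_error computes the MAE of each column and then averages them. The resulting score therefore mixes temperatures, pressure, wind and the other columns, each in its own unit. Below is a numpy sketch of that computation on made-up numbers; `mae` is an illustrative name.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error, averaged over all output columns with equal weight.

    For 2-D targets this mirrors sklearn's mean_absolute_error with its default
    multioutput="uniform_average": per-column MAE first, then their mean.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    per_column = np.abs(y_true - y_pred).mean(axis=0)
    return per_column.mean()


# Two days, three outputs (e.g. Ave_t, Max_t, Min_t); the values are made up
y_true = [[14.8, 20.8, 8.8], [15.2, 19.8, 10.7]]
y_pred = [[15.8, 20.8, 7.8], [15.2, 21.8, 10.7]]
print(mae(y_true, y_pred))
```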

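7.4 Saving the model with joblib

A saved model is easy to share and can be reused many times. There is little need for that in the current setup, but I included it anyway.

# Model file name (GetModel uses "Model.pkl" by default)
a = "Model.pkl"
# Save the model to disk
joblib.dump(model, a)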

7.5 Wrapping it up

GetModel.py

# -*- coding: utf-8 -*-
# @Time: 2020/12/16
# @Author: Eritque arcus
from sklearn.ensemble import RandomForestRegressor
import joblib
from sklearn.metrics import mean_absolute_error
from ProcessData import ProcessData


# Train and save the model
def GetModel(a="Model.pkl"):
    """
    :param a: model file name
    :return:
        [score: MAE on the validation set,
        X_test: dataset to predict on]
    """
    # Get the data
    [X_train, X_valid, y_train, y_valid, X_test] = ProcessData()
    # An XGBoost model was tried, but it had a bug
    # modelX = XGBRegressor(n_estimators=1000, learning_rate=0.05, random_state=0, n_jobs=4)
    # # model.fit(X_train_3, y_train_3)
    # # model.fit(X_train_2, y_train_2)
    # col = ["Ave_t", "Max_t", "Min_t", "Prec","SLpress", "Winddir", "Windsp", "Cloud"]
    # modelX.fit(X_train, y_train,
    #           early_stopping_rounds=5,
    #           eval_set=[(X_valid, y_valid)],
    #           verbose=False)
    # Random forest model
    model = RandomForestRegressor(random_state=0, n_estimators=1001)
    # Train the model
    model.fit(X_train, y_train)
    # Predict on the validation split
    preds = model.predict(X_valid)
    # Evaluate with MAE
    score = mean_absolute_error(y_valid, preds)
    # Save the model to disk
    joblib.dump(model, a)
    # Return the MAE and the prediction dataset
    return [score, X_test]

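8 Main program

8.1 Code

These articles produced several scattered modules. So we write one main file, the entry point, that ties them together and prints the results to the console: Main.py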

# -*- coding: utf-8 -*-
# @Time: 2020/12/16
# @Author: Eritque arcus
import joblib
import datetime as DT
from GetModel import GetModel
import matplotlib.pyplot as plt


# Train and save the model, and get the MAE back
r = GetModel()
print("MAE:", r[0])
# Load the saved model
model = joblib.load('Model.pkl')

# Final predictions
preds = model.predict(r[1])
# Inverse normalization/standardization: it had a bug, so it is not used
# for cols in range(0, len(preds)):
#     preds[cols] = scaler.inverse_transform(preds[cols])
# sns.lineplot(data=preds)
# plt.show()
# Print the results to the console
# Prediction columns follow the csv header:
# 0 Ave_t, 1 Max_t, 2 Min_t, 3 Prec, 4 SLpress, 5 Winddir, 6 Windsp, 7 Cloud
print("Forecast for the next 7 days")
print(preds)
days = range(1, 8)
all_ave_t = []
all_high_t = []
all_low_t = []
today = DT.datetime.now()
for a in days:
    day = (today + DT.timedelta(days=a)).date()
    print(day.year, '/', day.month, '/', day.day,
          ': mean temperature', preds[a][0],
          'max temperature', preds[a][1],
          'min temperature', preds[a][2],
          "precipitation", preds[a][3],
          "wind speed", preds[a][6])
    all_ave_t.append(preds[a][0])
    all_high_t.append(preds[a][1])
    all_low_t.append(preds[a][2])
temp = {"ave_t": all_ave_t, "high_t": all_high_t, "low_t": all_low_t}
# Draw the line chart
plt.plot(days, temp["ave_t"], color="green", label="ave_t")
plt.plot(days, temp["high_t"], color="red", label="high_t")
plt.plot(days, temp["low_t"], color="blue", label="low_t")
plt.legend()  # show the legend
plt.ylabel("Temperature(°C)")
plt.xlabel("day")
# Show the chart
plt.show()
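Indexing preds by position makes it easy to print the wrong column; for example, index 4 is SLpress, not wind. One option is to wrap the prediction array in a DataFrame that uses the csv column names. Below is a pandas sketch. `label_predictions` is a hypothetical helper, and the array is a placeholder for model.predict(r[1]). The date index simply follows Main.py's assumption that row a belongs to today plus a days.

```python
import datetime as DT

import numpy as np
import pandas as pd

# Column order of the csv files (Time is the index, so it is not a column)
COLUMNS = ["Ave_t", "Max_t", "Min_t", "Prec", "SLpress", "Winddir", "Windsp", "Cloud"]


def label_predictions(preds, start):
    """Wrap model.predict output in a DataFrame with named columns and a date index."""
    dates = [start + DT.timedelta(days=a) for a in range(len(preds))]
    return pd.DataFrame(preds, columns=COLUMNS, index=dates)


# Placeholder for model.predict(r[1]): 8 rows of zeros with the Windsp column set to 11.0
fake = np.zeros((8, len(COLUMNS)))
fake[:, COLUMNS.index("Windsp")] = 11.0
table = label_predictions(fake, DT.date(2020, 12, 16))

# Rows 1..7 are the same rows Main.py prints; columns are picked by name
print(table.loc[:, ["Ave_t", "Max_t", "Min_t", "Prec", "Windsp"]].iloc[1:8])
```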

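8.2 Usage

Run pre_weather/Main.py with Python directly, and the forecast is printed to the console:

python pre_weather/Main.py

or

load the generated model with joblib in your own Python code, then feed it your data to get predictions

(PS: the data used to train the model depends on the date you predict for. So I don't recommend predicting with a model that was not trained on the same day, because the error may be larger.)

For example, the following code (the model-loading part of Main.py, right after r = GetModel()):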

import joblib

# Load the saved model
model = joblib.load('Model.pkl')

# Final predictions; r is the return value of GetModel(), as in Main.py
preds = model.predict(r[1])

Here, r[1] is the data to predict on.
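The PS above advises against reusing a model trained on another day. Following that advice, a caller could check the model file's modification date before loading it and retrain otherwise. Below is a standard-library sketch. `model_is_from_today` is a hypothetical helper. Using the file's mtime as the training date is an assumption, since anything that rewrites the file also changes it.

```python
import datetime as DT
import os
import tempfile


def model_is_from_today(path, today=None):
    """True if the model file exists and was last written on `today` (local time)."""
    if not os.path.exists(path):
        return False
    today = today or DT.date.today()
    written = DT.datetime.fromtimestamp(os.path.getmtime(path)).date()
    return written == today


# Demo with an empty placeholder file standing in for Model.pkl
path = os.path.join(tempfile.mkdtemp(), "Model.pkl")
print(model_is_from_today(path))  # the file does not exist yet
open(path, "wb").close()
print(model_is_from_today(path))
```

In Main.py this could decide between calling GetModel() again and going straight to joblib.load('Model.pkl').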

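or

write your own entry-point file for your needs, using Main.py as a reference.

9 Final result

This is the end of the series. The code in the GitHub project PYWeatherReport is authoritative and may be optimized and updated there from time to time.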

-END-

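Project repository

Series contents

0. Workflow overview

1. Environment setup

a. python

1.1 Machine-learning libraries used

1.1.1 sklearn

1.1.2 pandas

1.1.3 seaborn

1.1.4 joblib

2. Finding a data source

3. Analyzing the data source's URL rules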