One-click Python: download and replace the image URLs in a hexo blog
2024-02-09 17:13:58

The starting point: all image references in the hexo blog posts are remote URLs. The script downloads each image to a local folder and then replaces the original remote URL with the local path.
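For illustration (the file name and URL below are made up), this is what the script's regex picks up and what the reference looks like after rewriting:

import re

line = "![screenshot](https://example.com/uploads/a.png)"  # hypothetical remote reference
# the same pattern the script below uses to find ![tag](url) references with a protocol:// URL
print(re.findall(r"\!\[[^\s].*\]\([a-zA-z]+:\/\/[^\s]*\)", line))
# -> ['![screenshot](https://example.com/uploads/a.png)']
# after the script runs, the post would instead contain something like:
# ![screenshot](/images/my-post.md-0.png)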

# -*- coding: utf-8 -*-
# @Time: 2021/3/31
# @Update: 2024/02/09
# @Author: Eritque arcus
# @File: downloadImage.py
import mimetypes, os, pathlib, re, requests

if __name__ == '__main__':
    p = pathlib.Path("blog/source")
    image_p = p / "images"
    post_p = p / "_posts"
    if not post_p.exists():
        print(f"{post_p.absolute()} not found")
        exit(1)
    else:
        os.chmod(post_p, 0o755)
    image_p.mkdir(exist_ok=True)
    os.chmod(p, 0o755)
    # loop over every .md file in the _posts directory
    for file in post_p.glob("*.md"):
        print(file)
        # read the post as UTF-8
        f = open(file, "r", encoding='utf8', errors='ignore')
        i = 0
        content = f.read()
        f.close()
        # match markdown images, i.e. ![tag](url) with a protocol:// URL
        for text in re.findall(r"\!\[[^\s].*\]\([a-zA-z]+:\/\/[^\s]*\)", content):
            print(text)
            # extract the ![tag] part
            tag = re.findall(r"\!\[[^\s].*\](?=\([a-zA-z]+:\/\/[^\s]*\))", text)[0]
            # everything after the tag is the (url) part
            urldata = text[len(tag):]
            # strip the surrounding parentheses
            u = urldata[1:len(urldata) - 1]
            # image name = current post's file name plus a running index
            name = file.name + "-" + str(i)
            # download the image
            response = requests.get(u)
            # use the content-type response header to pick an extension such as .png/.jpg
            content_type = response.headers.get('content-type', '').split(';')[0]
            extension = mimetypes.guess_extension(content_type) or ""
            # response body is the raw image data
            img = response.content
            # write the image file
            with open(image_p / (name + extension), 'wb') as f:
                f.write(img)
            # new local reference path
            new_u = "/images/" + name + extension
            # replace the remote reference in the post body
            content = content.replace(text, tag + "(" + new_u + ")")
            i += 1
        # write the post back to disk
        f = open(file, "w", encoding="utf8")
        f.write(content)
        f.close()
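Since the script rewrites the posts in place, it can be worth doing a dry run first that only lists what would be downloaded; a minimal sketch using the same hard-coded paths and the same regex as above:

import pathlib, re

post_p = pathlib.Path("blog/source/_posts")
for file in post_p.glob("*.md"):
    content = file.read_text(encoding="utf8", errors="ignore")
    matches = re.findall(r"\!\[[^\s].*\]\([a-zA-z]+:\/\/[^\s]*\)", content)
    if matches:
        print(file.name, "->", len(matches), "remote image(s)")
        for text in matches:
            print("   ", text)

Either way, the main script needs the requests package installed (pip install requests) and has to be run from the directory that contains blog/, since the paths are hard-coded.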