Chapter 9: Crawling Through Forms and Logins

  • The POST method pushes information to a web server for storage and analysis.

9.1 The Python Requests Library

  • Requests: a third-party Python library that excels at handling complicated HTTP requests, cookies, headers, and more.

9.2 Submitting a Basic Form

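import requests

# Keys must match the name attributes of the form's input fields
params = {'firstname': 'Ryan', 'lastname': 'Mitchell'}
# POST to the URL that processes the form, passing the fields as form data
r = requests.post('http://pythonscraping.com/files/processing.php', data=params)
print(r.text)

Output:
Hello there, Ryan Mitchell!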
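
To see what the POST above actually puts on the wire, the request can be built without being sent. This is a minimal sketch that assumes only that Requests is installed; `build_form_request` is a helper name invented for this illustration, not part of the library.

```python
import requests


def build_form_request(url, fields):
    """Prepare (but do not send) a form POST so its encoding can be inspected."""
    return requests.Request('POST', url, data=fields).prepare()


prepared = build_form_request(
    'http://pythonscraping.com/files/processing.php',
    {'firstname': 'Ryan', 'lastname': 'Mitchell'},
)
print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded
print(prepared.body)                     # firstname=Ryan&lastname=Mitchell
```

The dictionary passed as `data` is URL-encoded into the request body, which is the same encoding a browser uses for an ordinary HTML form.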

9.3 Radio Buttons, Checkboxes, and Other Inputs
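
Whatever the input type, a form submission reduces to name/value pairs. The sketch below is one way this might look with Requests; the field names and the `http://example.com/order` URL are hypothetical placeholders, and the request is only prepared, never sent.

```python
import requests

# Hypothetical form fields: one radio group, one checkbox group, one dropdown.
fields = {
    'size': 'large',               # radio button: exactly one value is sent
    'toppings': ['ham', 'olive'],  # checkboxes: one entry per ticked box
    'crust': 'thin',               # select element: value of the chosen option
}

prepared = requests.Request(
    'POST', 'http://example.com/order', data=fields
).prepare()
print(prepared.body)  # size=large&toppings=ham&toppings=olive&crust=thin
```

A list value repeats the field name once per item, which is how a browser submits several ticked checkboxes that share a name.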

9.4 Submitting Files and Images
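
File uploads go through the `files` argument rather than `data`. The following is a rough sketch under stated assumptions: the field name `uploadFile`, the file name `logo.png`, and the `http://example.com/upload` URL are placeholders, and a few in-memory bytes stand in for a real image so nothing is read from disk or sent over the network.

```python
import requests

# Placeholder bytes standing in for a real image read with open(path, 'rb').
fake_image = b'\x89PNG\r\n\x1a\n'

# Each entry maps a form field name to (filename, content, MIME type).
files = {'uploadFile': ('logo.png', fake_image, 'image/png')}

prepared = requests.Request(
    'POST', 'http://example.com/upload', files=files
).prepare()

content_type = prepared.headers['Content-Type']
print(content_type.split(';')[0])                # multipart/form-data
print(b'filename="logo.png"' in prepared.body)   # True
```

With a real file, the tuple's second element would typically be a file object opened in binary mode, and the prepared request would be replaced by a direct `requests.post(url, files=files)` call.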

9.5 Handling Logins and Cookies

import requests

# Log in by posting the credentials to the welcome page
params = {'username': 'Ryan', 'password': 'password'}
r = requests.post('http://pythonscraping.com/pages/cookies/welcome.php', data=params)
print('Cookie is set to:')
print(r.cookies.get_dict())
print('--------')
print('Going to profile page...')
# Pass the cookies from the login response along by hand
r = requests.get('http://pythonscraping.com/pages/cookies/profile.php', cookies=r.cookies)
print(r.text)

Output:
Cookie is set to:
{'loggedin': '1', 'username': 'Ryan'}
--------
Going to profile page...
Hey Ryan! Looks like you're still logged into the site!
import requests

# A Session object keeps track of cookies across requests automatically
session = requests.Session()

params = {'username': 'username', 'password': 'password'}
s = session.post('http://pythonscraping.com/pages/cookies/welcome.php', data=params)
print('Cookie is set to:')
print(s.cookies.get_dict())
print('----------')
print('Going to profile page...')
# No cookies argument needed: the session sends the stored cookies itself
s = session.get('http://pythonscraping.com/pages/cookies/profile.php')
print(s.text)

Output:
Cookie is set to:
{'loggedin': '1', 'username': 'username'}
----------
Going to profile page...
Hey username! Looks like you're still logged into the site!
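
The difference between the two approaches can be checked without touching the network. In this sketch the cookies are set by hand as a stand-in for what a real login response would provide, and the requests are only prepared, not sent.

```python
import requests

session = requests.Session()
# Stand-in for the cookies a real login response would set.
session.cookies.set('loggedin', '1')
session.cookies.set('username', 'Ryan')

request = requests.Request(
    'GET', 'http://pythonscraping.com/pages/cookies/profile.php'
)
with_session = session.prepare_request(request)  # session cookies merged in
without_session = request.prepare()              # no cookie jar involved

print(with_session.headers.get('Cookie'))
print(without_session.headers.get('Cookie'))     # None
```

Only the request prepared through the session carries a `Cookie` header, which is why the session-based version never has to pass `cookies=` explicitly.
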
import requests
from requests.auth import AuthBase
from requests.auth import HTTPBasicAuth

# HTTP basic access authentication: credentials go in the auth argument, not in form data
auth = HTTPBasicAuth('rtan', 'password')
r = requests.post('http://pythonscraping.com/pages/auth/login.php', auth=auth)
print(r.text)

Output:
<p>Hello rtan.</p><p>You entered password as your password.</p>
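
What `HTTPBasicAuth` does to a request can be inspected offline. The sketch below prepares the same request without sending it and compares the resulting header with a manually built value. The comparison is an illustration added here, not something taken from the original notes.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

auth = HTTPBasicAuth('rtan', 'password')
prepared = requests.Request(
    'POST', 'http://pythonscraping.com/pages/auth/login.php', auth=auth
).prepare()

# Basic auth is just "user:password", base64-encoded, in one header.
expected = 'Basic ' + base64.b64encode(b'rtan:password').decode('ascii')
print(prepared.headers['Authorization'] == expected)  # True
```

Because base64 is an encoding rather than encryption, basic authentication only protects credentials when the connection itself uses HTTPS.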