Chapter 4: Using APIs

4.1 API Overview

For example, freegeoip.net resolves an IP address to location data with a single GET request:

http://freegeoip.net/json/50.78.253.58

{"ip":"50.78.253.58","country_code":"US","country_name":"United States",
"region_code":"MA","region_name":"Massachusetts","city":"Chelmsford",
"zipcode":"01824","latitude":42.5879,"longitude":-71.3498,
"metro_code":"506","area_code":"978"}

4.2 General API Rules

4.2.1 Methods

The four HTTP methods used to request information from a web service:

GET: retrieve information, e.g., visiting a website by entering its URL in the browser

POST: fill out a form or submit information to a server

PUT: update an existing object or record

DELETE: delete an object
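
These four verbs map directly onto `urllib.request.Request`; a minimal sketch of how each method is selected (the URL is a placeholder, and no request is actually sent):

```python
from urllib.request import Request

# Construct a Request for each HTTP verb; nothing is sent over the network here.
get_req = Request('http://example.com/resource')                # GET is the default
post_req = Request('http://example.com/resource', data=b'a=1')  # attaching data implies POST
put_req = Request('http://example.com/resource', data=b'a=1', method='PUT')
delete_req = Request('http://example.com/resource', method='DELETE')

for req in (get_req, post_req, put_req, delete_req):
    print(req.get_method())  # GET, POST, PUT, DELETE
```

Passing the request to `urlopen()` would then actually send it with the chosen method.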

4.2.2 Authentication

Many APIs authenticate requests with an API key passed as a URL parameter, as in this Echo Nest example:

http://developer.echonest.com/api/v4/artist/songs?api_key=<你的api_key>
%20&name=guns%20n%27%20roses&format=json&start=0&results=100
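
Rather than hand-encoding each `%20`, a query string like this can be assembled with the standard library; a minimal sketch (the `api_key` value is a placeholder):

```python
from urllib.parse import urlencode

# urlencode() percent-encodes each value; 'YOUR_API_KEY' is a placeholder.
params = {
    'api_key': 'YOUR_API_KEY',
    'name': "guns n' roses",
    'format': 'json',
    'start': 0,
    'results': 100,
}
url = 'http://developer.echonest.com/api/v4/artist/songs?' + urlencode(params)
print(url)
```

Note that `urlencode` encodes spaces as `+` by default; passing `quote_via=urllib.parse.quote` produces `%20` instead, matching the URL shown above.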

4.3 Server Responses

  • JSON is more popular than XML
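
One reason for JSON's popularity is how little code it takes to parse; a small comparison sketch using only the standard library (the sample record is invented for illustration):

```python
import json
import xml.etree.ElementTree as ET

# The same record in both formats (illustrative data, not from a real API)
json_text = '{"ip": "50.78.253.58", "country_code": "US"}'
xml_text = '<response><ip>50.78.253.58</ip><country_code>US</country_code></response>'

json_obj = json.loads(json_text)           # one call yields a plain dict
xml_root = ET.fromstring(xml_text)         # parse, then navigate the element tree

print(json_obj['country_code'])            # US
print(xml_root.find('country_code').text)  # US
```

The JSON result is an ordinary dictionary, while the XML result must be navigated node by node.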

4.4 The Echo Nest

4.5 Twitter API

4.6 Google APIs

4.7 Parsing JSON Data

import json
from urllib.request import urlopen

def getCountry(ipAddress):
    response = urlopen('http://freegeoip.net/json/'+ipAddress).read().decode('utf-8')
    responseJson = json.loads(response)
    return responseJson.get('country_code')

print(getCountry('50.78.253.58'))
US
import json
jsonString = '{"arrayOfNums":[{"number":0},{"number":1},{"number":2}],"arrayOfFruits":[{"fruit":"apple"},{"fruit":"banana"},{"fruit":"pear"}]}'
jsonObj = json.loads(jsonString)
print(jsonObj.get('arrayOfNums'))
print(jsonObj.get('arrayOfNums')[1])
print(jsonObj.get('arrayOfNums')[1].get('number') +
      jsonObj.get('arrayOfNums')[2].get('number'))
print(jsonObj.get('arrayOfFruits')[2].get('fruit'))
[{'number': 0}, {'number': 1}, {'number': 2}]
{'number': 1}
3
pear

4.8 Back to the Topic

  • Find out where Wikipedia contributors come from
import json
import datetime
import random
import re
from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

random.seed(datetime.datetime.now())

def getLinks(articleUrl):
    html = urlopen('http://en.wikipedia.org'+articleUrl)
    bsObj = BeautifulSoup(html, 'lxml')
    return bsObj.find('div', {'id':'bodyContent'}).findAll('a',
        href=re.compile('^(/wiki)((?!:).)*$'))

def getHistoryIPs(pageUrl):
    # Revision-history pages have URLs of the form:
    # http://en.wikipedia.org/w/index.php?title=Title_in_URL&action=history
    pageUrl = pageUrl.replace('/wiki/', '')
    historyUrl = 'http://en.wikipedia.org/w/index.php?title='+pageUrl+'&action=history'
    print('history url is: '+historyUrl)
    html = urlopen(historyUrl)
    bsObj = BeautifulSoup(html, 'lxml')
    # Find links whose class attribute is 'mw-anonuserlink';
    # these show IP addresses instead of usernames
    ipAddresses = bsObj.findAll('a', {'class':'mw-anonuserlink'})
    addressList = set()
    for ipAddress in ipAddresses:
        addressList.add(ipAddress.get_text())
    return addressList

def getCountry(ipAddress):
    try:
        response = urlopen('http://freegeoip.net/json/'+ipAddress).read().decode('utf-8')
    except HTTPError:
        return None
    responseJson = json.loads(response)
    return responseJson.get('country_code')

links = getLinks('/wiki/Python_(programming_language)')

while(len(links) > 0):
    for link in links:
        print('-----------')
        historyIPs = getHistoryIPs(link.attrs['href'])
        for historyIP in historyIPs:
            country = getCountry(historyIP)
            if country is not None:
                print(historyIP+' is from '+country)
    newLink = links[random.randint(0, len(links)-1)].attrs['href']
    links = getLinks(newLink)
-----------
history url is: http://en.wikipedia.org/w/index.php?title=Programming_paradigm&action=history
2605:a601:e0c:6300:996d:68c0:fb03:af2c is from US
168.216.130.133 is from US
92.115.222.143 is from MD
113.162.8.249 is from VN

(The crawl loops indefinitely through random article links; this run was stopped manually, raising a KeyboardInterrupt.)