Chapter 4: Using APIs

4.1 API Overview

For example, freegeoip.net resolves an IP address to location data with a single GET request:

http://freegeoip.net/json/50.78.253.58

{"ip":"50.78.253.58","country_code":"US","country_name":"United States",
"region_code":"MA","region_name":"Massachusetts","city":"Chelmsford",
"zipcode":"01824","latitude":42.5879,"longitude":-71.3498,
"metro_code":"506","area_code":"978"}

4.2 General API Rules

4.2.1 Methods

The four HTTP methods used to request information from a web service:

GET: retrieve information, e.g., visiting a website by entering its URL in the browser

POST: fill out a form or submit information to a server

PUT: update an existing object or record

DELETE: delete an object
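
These four verbs map directly onto `urllib.request.Request`; a minimal sketch of how each method is selected (the URL is a placeholder, and no request is actually sent):

```python
from urllib.request import Request

# Construct a Request for each HTTP verb; nothing is sent over the network here.
get_req = Request('http://example.com/resource')                # GET is the default
post_req = Request('http://example.com/resource', data=b'a=1')  # attaching data implies POST
put_req = Request('http://example.com/resource', data=b'a=1', method='PUT')
delete_req = Request('http://example.com/resource', method='DELETE')

for req in (get_req, post_req, put_req, delete_req):
    print(req.get_method())  # GET, POST, PUT, DELETE
```

Passing the request to `urlopen()` would then actually send it with the chosen method.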

4.2.2 Authentication

Many APIs authenticate requests with an API key passed as a URL parameter, as in this Echo Nest example:

http://developer.echonest.com/api/v4/artist/songs?api_key=<你的api_key>
%20&name=guns%20n%27%20roses&format=json&start=0&results=100
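
Rather than hand-encoding each `%20`, a query string like this can be assembled with the standard library; a minimal sketch (the `api_key` value is a placeholder):

```python
from urllib.parse import urlencode

# urlencode() percent-encodes each value; 'YOUR_API_KEY' is a placeholder.
params = {
    'api_key': 'YOUR_API_KEY',
    'name': "guns n' roses",
    'format': 'json',
    'start': 0,
    'results': 100,
}
url = 'http://developer.echonest.com/api/v4/artist/songs?' + urlencode(params)
print(url)
```

Note that `urlencode` encodes spaces as `+` by default; passing `quote_via=urllib.parse.quote` produces `%20` instead, matching the URL shown above.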

4.3 Server Responses

  • JSON is more popular than XML
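
One reason for JSON's popularity is how little code it takes to parse; a small comparison sketch using only the standard library (the sample record is invented for illustration):

```python
import json
import xml.etree.ElementTree as ET

# The same record in both formats (illustrative data, not from a real API)
json_text = '{"ip": "50.78.253.58", "country_code": "US"}'
xml_text = '<response><ip>50.78.253.58</ip><country_code>US</country_code></response>'

json_obj = json.loads(json_text)           # one call yields a plain dict
xml_root = ET.fromstring(xml_text)         # parse, then navigate the element tree

print(json_obj['country_code'])            # US
print(xml_root.find('country_code').text)  # US
```

The JSON result is an ordinary dictionary, while the XML result must be navigated node by node.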

4.4 The Echo Nest

4.5 Twitter API

4.6 Google APIs

4.7 Parsing JSON Data

import json
from urllib.request import urlopen

def getCountry(ipAddress):
    response = urlopen('http://freegeoip.net/json/'+ipAddress).read().decode('utf-8')
    responseJson = json.loads(response)
    return responseJson.get('country_code')

print(getCountry('50.78.253.58'))
US
import json
jsonString = '{"arrayOfNums":[{"number":0},{"number":1},{"number":2}],"arrayOfFruits":[{"fruit":"apple"},{"fruit":"banana"},{"fruit":"pear"}]}'
jsonObj = json.loads(jsonString)
print(jsonObj.get('arrayOfNums'))
print(jsonObj.get('arrayOfNums')[1])
print(jsonObj.get('arrayOfNums')[1].get('number') +
      jsonObj.get('arrayOfNums')[2].get('number'))
print(jsonObj.get('arrayOfFruits')[2].get('fruit'))
[{'number': 0}, {'number': 1}, {'number': 2}]
{'number': 1}
3
pear

4.8 Back to the Topic

  • Find out where Wikipedia contributors come from
import json
import datetime
import random
import re
from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

random.seed(datetime.datetime.now())

def getLinks(articleUrl):
    html = urlopen('http://en.wikipedia.org'+articleUrl)
    bsObj = BeautifulSoup(html, 'lxml')
    return bsObj.find('div', {'id':'bodyContent'}).findAll('a',
        href=re.compile('^(/wiki)((?!:).)*$'))

def getHistoryIPs(pageUrl):
    # Revision-history pages have URLs of the form:
    # http://en.wikipedia.org/w/index.php?title=Title_in_URL&action=history
    pageUrl = pageUrl.replace('/wiki/', '')
    historyUrl = 'http://en.wikipedia.org/w/index.php?title='+pageUrl+'&action=history'
    print('history url is: '+historyUrl)
    html = urlopen(historyUrl)
    bsObj = BeautifulSoup(html, 'lxml')
    # Find links whose class attribute is 'mw-anonuserlink';
    # these show IP addresses instead of usernames
    ipAddresses = bsObj.findAll('a', {'class':'mw-anonuserlink'})
    addressList = set()
    for ipAddress in ipAddresses:
        addressList.add(ipAddress.get_text())
    return addressList

def getCountry(ipAddress):
    try:
        response = urlopen('http://freegeoip.net/json/'+ipAddress).read().decode('utf-8')
    except HTTPError:
        return None
    responseJson = json.loads(response)
    return responseJson.get('country_code')

links = getLinks('/wiki/Python_(programming_language)')

while(len(links) > 0):
    for link in links:
        print('-----------')
        historyIPs = getHistoryIPs(link.attrs['href'])
        for historyIP in historyIPs:
            country = getCountry(historyIP)
            if country is not None:
                print(historyIP+' is from '+country)
    newLink = links[random.randint(0, len(links)-1)].attrs['href']
    links = getLinks(newLink)
-----------
history url is: http://en.wikipedia.org/w/index.php?title=Programming_paradigm&action=history
2605:a601:e0c:6300:996d:68c0:fb03:af2c is from US
168.216.130.133 is from US
92.115.222.143 is from MD
113.162.8.249 is from VN

(The crawl loops indefinitely through random article links; this run was stopped manually, raising a KeyboardInterrupt.)