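When accessing pages that require cookies, Python generally relies on three modules: urllib2, urllib, and cookielib. The urllib2 module requests the page, the urllib module encodes the POST data, and the cookielib module handles the cookie information.
The cookielib module is usually combined with the urllib2 module like this: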
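    import cookielib
    import urllib2

    # Store cookies in a MozillaCookieJar and let every urllib2 request use it.
    mcj = cookielib.MozillaCookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(mcj))
    urllib2.install_opener(opener)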
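With the code above in place, the program no longer has to deal with cookies itself; the cookielib module takes care of them, which is very convenient. Yesterday, however, I ran into a special requirement while writing a program: I needed the actual cookie values after a successful login, not just the successful login itself, because another program adds the post-login cookies to its request header in order to fetch content that is only available after logging in.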
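After some research I found two ways to get the cookie values after a successful login:
- Convert the cookielib FileCookieJar/MozillaCookieJar/LWPCookieJar object to a string, then parse that string to extract the cookies.
- Read ._cookies.values() from the cookielib FileCookieJar/MozillaCookieJar/LWPCookieJar object (note that _cookies is a private attribute), which holds the cookies, and convert the result into a standard cookie string.
The process is described in detail below, using the cookies of 189 Mail (189.cn) as an example.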
1. Method 1
Printing a FileCookieJar/MozillaCookieJar/LWPCookieJar object from the cookielib module gives the following:
<cookielib.FileCookieJar[<Cookie ACCOUNT=13541295162@189.cn for .189.cn/>, <Cookie SESSION_ID=000000155790240-20131211091907124367-020 for .189.cn/>, <Cookie SSONKEY=76add719b0af7a2fc80b95bb436bfb4a0ae869f6171d2177f438366a951d3b9b60ca45e15c71143eea4c7d9f72a1d911f33c466662972fa3d97f83956627e79438911703cc2f9d09badeece1dd73ec606b85e040bb1c0d19753f22f49fbb4761505319fa67c68ca7e590582dda831d648a7d51f669902c7583f83bedf730e9fb2d49dc363122a48485dfa19af45d8f6af076d7fba9922c4dcd6e20cdeb23817ed712e89f318fe1e74128095f6d948e89e3963dbe9d15eeb10bc6112cc029c82535aa3f5c7bac0daf89b40f5270e2151b01c035816ef23428 for .189.cn/>, <Cookie VERIFY_LOGON=7e0763f479ce9f4a98cba921d38659c2 for .189.cn/>, <Cookie SSON=57effa525080063a774c0b063df3844dde36dbf3ae12389fa35a9bc0a8f4af063005f929b94898e8cfd0e7140cf0e2aec6b6c4741705b2f3df979e4387a75ed82e640250917828c0 for .e.189.cn/>, <Cookie JSESSIONID=abcwaGbTt1gd9biA98Glu for open.e.189.cn/>, <Cookie JSESSIONID=aw9_oF8LCWyb8s98Gl for webmail14.189.cn/>]>
So you can convert the object to a string with str() and then extract the cookies with a regular expression.
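The original post does not show the regular expression, so here is a minimal sketch of Method 1 under a few assumptions: the helper names make_cookie and cookies_from_str, the pattern itself, and the example.com cookie values are all made up for illustration. The import fallback lets the same code run on Python 3, where cookielib is called http.cookiejar.

```python
import re

try:
    import cookielib  # Python 2
except ImportError:
    import http.cookiejar as cookielib  # Python 3 name of the same module


def make_cookie(name, value, domain, path="/"):
    """Build a session cookie by hand (hypothetical helper for this demo)."""
    dotted = domain.startswith(".")
    return cookielib.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=dotted, domain_initial_dot=dotted,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# Each cookie is rendered as "<Cookie NAME=VALUE for DOMAIN/PATH>".
COOKIE_RE = re.compile(r"<Cookie ([^=\s]+)=(\S*) for ([^>\s]+)>")


def cookies_from_str(jar):
    """Return (name, value, domain+path) tuples parsed from str(jar)."""
    return COOKIE_RE.findall(str(jar))


jar = cookielib.CookieJar()
jar.set_cookie(make_cookie("ACCOUNT", "user@example.com", ".example.com"))
jar.set_cookie(make_cookie("JSESSIONID", "abc123", "mail.example.com"))

pairs = cookies_from_str(jar)
cookie_str = "; ".join("%s=%s" % (name, value) for name, value, _ in pairs)
print(cookie_str)
```

This approach depends on the text format that str() happens to produce, and the pattern assumes that cookie values contain no spaces, so treat it as fragile.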
2. Method 2
The content of ._cookies.values() on a FileCookieJar/MozillaCookieJar/LWPCookieJar object looks like this:
[{'/': {'JSESSIONID': Cookie(version=0, name='JSESSIONID', value='aQsFGM9Ku1sdLbXSFl', port=None, port_specified=False, domain='webmail5.189.cn', domain_specified=False, domain_initial_dot=False, path='/', path_specified=True, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={}, rfc2109=False)}}, {'/': {'ACCOUNT': Cookie(version=0, name='ACCOUNT', value='13541295162@189.cn', port=None, port_specified=False, domain='.189.cn', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={}, rfc2109=False), 'VERIFY_LOGON': Cookie(version=0, name='VERIFY_LOGON', value='7e0763f479ce9f4a98cba921d38659c2', port=None, port_specified=False, domain='.189.cn', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={}, rfc2109=False), 'SSONKEY': Cookie(version=0, name='SSONKEY', value='76add719b0af7a2fc80b95bb436bfb4a0ae869f6171d2177f438366a951d3b9b60ca45e15c71143eea4c7d9f72a1d911f33c466662972fa3d97f83956627e79438911703cc2f9d09badeece1dd73ec606b85e040bb1c0d19753f22f49fbb4761505319fa67c68ca7e590582dda831d648a7d51f669902c7583f83bedf730e9fb2d49dc363122a48485dfa19af45d8f6af076d7fba9922c4dcd6e20cdeb23817ed712e89f318fe1e74128095f6d948e897e5cfae642cdbd9ebc235f33fcdbffd382eb2f31cd3d3117cbda44a57f5f3e4e37af9c88f13ba75c', port=None, port_specified=False, domain='.189.cn', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={}, rfc2109=False), 'SESSION_ID': Cookie(version=0, name='SESSION_ID', value='000002683296688-20131211032845368117-005', port=None, port_specified=False, domain='.189.cn', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={}, 
rfc2109=False)}}, {'/': {'SSON': Cookie(version=0, name='SSON', value='57effa525080063a774c0b063df3844dde36dbf3ae12389fa35a9bc0a8f4af06936ad8a9fdb4727c64a34d285b7d0ce1bbf4814917563439df979e4387a75ed82e640250917828c0', port=None, port_specified=False, domain='.e.189.cn', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={}, rfc2109=False)}}, {'/': {'JSESSIONID': Cookie(version=0, name='JSESSIONID', value='abc81jZBd0Ogx4epXSFlu', port=None, port_specified=False, domain='open.e.189.cn', domain_specified=False, domain_initial_dot=False, path='/', path_specified=True, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={}, rfc2109=False)}}]
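As you can see, the content is a list with one item per domain. Each item is a dictionary whose keys are cookie paths and whose values are themselves dictionaries that map each cookie name to a Cookie object. The attributes of a Cookie object are as follows (see section 20.21.5, Cookie Objects, at http://docs.python.org/2/library/cookielib.html):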
- Cookie.version: Integer or None. Netscape cookies have version 0. RFC 2965 and RFC 2109 cookies have a version cookie-attribute of 1. However, note that cookielib may 'downgrade' RFC 2109 cookies to Netscape cookies, in which case version is 0.
- Cookie.name: Cookie name (a string).
- Cookie.value: Cookie value (a string), or None.
- Cookie.port: String representing a port or a set of ports (e.g. '80', or '80,8080'), or None.
- Cookie.path: Cookie path (a string, e.g. '/acme/rocket_launchers').
- Cookie.secure: True if cookie should only be returned over a secure connection.
- Cookie.expires: Integer expiry date in seconds since epoch, or None. See also the is_expired() method.
- Cookie.discard: True if this is a session cookie.
- Cookie.comment: String comment from the server explaining the function of this cookie, or None.
- Cookie.comment_url: URL linking to a comment from the server explaining the function of this cookie, or None.
- Cookie.rfc2109: True if this cookie was received as an RFC 2109 cookie (i.e. the cookie arrived in a Set-Cookie header, and the value of the Version cookie-attribute in that header was 1). This attribute is provided because cookielib may 'downgrade' RFC 2109 cookies to Netscape cookies, in which case version is 0. New in version 2.5.
- Cookie.port_specified: True if a port or set of ports was explicitly specified by the server (in the Set-Cookie / Set-Cookie2 header).
- Cookie.domain_specified: True if a domain was explicitly specified by the server.
- Cookie.domain_initial_dot: True if the domain explicitly specified by the server began with a dot ('.').
Cookies may have additional non-standard cookie-attributes. These may be accessed using the following methods:
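- Cookie.has_nonstandard_attr(name): Return true if cookie has the named cookie-attribute.
- Cookie.get_nonstandard_attr(name, default=None): If cookie has the named cookie-attribute, return its value. Otherwise, return default.
- Cookie.set_nonstandard_attr(name, value): Set the value of the named cookie-attribute.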
The Cookie class also defines the following method:
- Cookie.is_expired(now=None): True if cookie has passed the time at which the server requested it should expire. If now is given (in seconds since the epoch), return whether the cookie has expired at the specified time.
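To make the attributes and methods listed above more concrete, here is a small sketch that builds one Cookie object by hand and queries it. The cookie itself (name, value, domain, the HttpOnly and SameSite entries) is invented for the example, and the import fallback is only there so the code also runs on Python 3, where the module is named http.cookiejar.

```python
import time

try:
    import cookielib  # Python 2
except ImportError:
    import http.cookiejar as cookielib  # Python 3 name of the same module

# A hand-made cookie that expires one hour from now (illustrative values).
cookie = cookielib.Cookie(
    version=0, name="SESSION_ID", value="abc123",
    port=None, port_specified=False,
    domain=".example.com", domain_specified=True, domain_initial_dot=True,
    path="/", path_specified=True,
    secure=False, expires=int(time.time()) + 3600, discard=False,
    comment=None, comment_url=None,
    rest={"HttpOnly": None},  # non-standard cookie-attributes go here
)

# Plain attributes.
print("%s=%s (domain %s, path %s)" % (cookie.name, cookie.value,
                                      cookie.domain, cookie.path))

# Non-standard cookie-attributes.
print(cookie.has_nonstandard_attr("HttpOnly"))
print(cookie.get_nonstandard_attr("Missing", "fallback"))
cookie.set_nonstandard_attr("SameSite", "Lax")
print(cookie.get_nonstandard_attr("SameSite"))

# Expiry check: now, and one second after the expiry time.
print(cookie.is_expired())
print(cookie.is_expired(cookie.expires + 1))
```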
So the detailed code for converting the cookies in a CookieJar object into a cookie string that can be added to a request header is as follows:
    cookie_parts = []
    for content in mcj._cookies.values():        # one dict per domain
        for path, value in content.items():      # path -> {name: Cookie}
            for name, cookie in value.items():
                # Skip cookies whose domain contains 'e.189.cn'.
                if 'e.189.cn' not in cookie.domain:
                    cookie_parts.append(cookie.name + "=" + cookie.value)
    cookie_str = "; ".join(cookie_parts)
After this processing, the cookies in the CookieJar object have been converted into a cookie string that can be used in a request, as shown here:
JSESSIONID=aQsFGM9Ku1sdLbXSFl; ACCOUNT=13541295162@189.cn; VERIFY_LOGON=7e0763f479ce9f4a98cba921d38659c2; SSONKEY=76add719b0af7a2fc80b95bb436bfb4a0ae869f6171d2177f438366a951d3b9b60ca45e15c71143eea4c7d9f72a1d911f33c466662972fa3d97f83956627e79438911703cc2f9d09badeece1dd73ec606b85e040bb1c0d19753f22f49fbb4761505319fa67c68ca7e590582dda831d648a7d51f669902c7583f83bedf730e9fb2d49dc363122a48485dfa19af45d8f6af076d7fba9922c4dcd6e20cdeb23817ed712e89f318fe1e74128095f6d948e897e5cfae642cdbd9ebc235f33fcdbffd382eb2f31cd3d3117cbda44a57f5f3e4e37af9c88f13ba75c; SESSION_ID=000002683296688-20131211032845368117-005
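The post does not show how the other program consumes this string, so the following is only a sketch of one way to do it: put the string into the Cookie header of a request. The URL and the cookie values are placeholders, and the request is built but not sent. On Python 3 the same Request class lives in urllib.request, hence the import fallback.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same classes

# Placeholder cookie string in the same "name=value; name=value" format.
cookie_str = "JSESSIONID=abc123; ACCOUNT=user@example.com"

# Attach the string as the Cookie header of a request (hypothetical URL).
request = urllib2.Request("http://mail.example.com/inbox",
                          headers={"Cookie": cookie_str})
print(request.get_header("Cookie"))

# To actually fetch the page one would call:
#     html = urllib2.urlopen(request).read()
```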
———————————————————————————————————————————————–
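While researching, I happened to come across the article 《Python利用CookieJar自动处理Cookies》 ("Using CookieJar in Python to handle cookies automatically"). The author's Python code is much simpler than mine, which I really admire. The code is as follows: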
    cookie = ''
    for c in mcj:
        cookie += c.name + '=' + c.value + '; '
    # Print the merged cookie string and compare it with the output above
    # to check that the cookies really were merged.
    print cookie
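In other words, a CookieJar can be iterated over directly and yields its Cookie objects, so simply looping over it and reading each Cookie object's attributes is enough to convert the CookieJar into a cookie string.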
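As a self-contained sketch of this iteration approach, the function below joins the cookies without a trailing separator and can optionally skip a domain, as the Method 2 code does. The names jar_to_header, make_cookie, and skip_domain, as well as the example.com cookies, are assumptions made for this illustration; the import fallback covers Python 3, where cookielib is called http.cookiejar.

```python
try:
    import cookielib  # Python 2
except ImportError:
    import http.cookiejar as cookielib  # Python 3 name of the same module


def make_cookie(name, value, domain, path="/"):
    """Build a session cookie by hand (hypothetical helper for this demo)."""
    dotted = domain.startswith(".")
    return cookielib.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=dotted, domain_initial_dot=dotted,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def jar_to_header(jar, skip_domain=None):
    """Join the cookies in jar into a 'name=value; name=value' string."""
    parts = []
    for c in jar:  # iterating a CookieJar yields Cookie objects
        if skip_domain and skip_domain in c.domain:
            continue
        # A cookie may have no value; in that case emit the bare name.
        parts.append(c.name if c.value is None
                     else "%s=%s" % (c.name, c.value))
    return "; ".join(parts)


jar = cookielib.CookieJar()
jar.set_cookie(make_cookie("ACCOUNT", "user@example.com", ".example.com"))
jar.set_cookie(make_cookie("JSESSIONID", "abc123", "mail.example.com"))
jar.set_cookie(make_cookie("SSON", "deadbeef", ".sso.example.com"))

header = jar_to_header(jar, skip_domain="sso.example.com")
print(header)
```

Iterating the jar uses only its public interface, so it avoids depending on the private _cookies attribute.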
When reposting, please credit: jinglingshu's blog » Converting cookies in the cookielib module into a cookie string that can be used in a header