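requests is a popular HTTP library for Python. It offers richer features and a friendlier interface than the built-in urllib, but it is not always as flexible. Here is one case where that matters.
The problem
Take an image-processing request to Tencent Cloud's 数据万象 (Cloud Infinite) as an example: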
http://examples-1251000004.picsh.myqcloud.com/sample.jpeg?imageMogr2/sharpen/55|imageView2/1/w/200/h/300/q/85
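This request sharpens the specified image and then compresses it. Issued normally, it works fine. When the image was downloaded through requests, however, the processing was not applied. Comparing packet captures showed that requests had URL-encoded the request URL, turning it into:
http://examples-1251000004.picsh.myqcloud.com/sample.jpeg?imageMogr2/sharpen/55%7CimageView2/1/w/200/h/300/q/85
As you can see, the pipe operator | became %7C, which broke the pipeline operation.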
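Coming back empty-handed
What? Is there a way to turn this off? A library as well made as requests must surely have left a switch for it. Hoping for the best, I opened the requests source: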
    def request(self, method, url,
            params=None, data=None, headers=None, cookies=None, files=None,
            auth=None, timeout=None, allow_redirects=True, proxies=None,
            hooks=None, stream=None, verify=None, cert=None, json=None):
        """Constructs a :class:`Request <Request>`, prepares it and sends it.
        Returns :class:`Response <Response>` object.

        :param method: method for the new :class:`Request` object.
        :param url: URL for the new :class:`Request` object.
        :param params: (optional) Dictionary or bytes to be sent in the query
            string for the :class:`Request`.
        :param data: (optional) Dictionary, list of tuples, bytes, or file-like
            object to send in the body of the :class:`Request`.
        :param json: (optional) json to send in the body of the
            :class:`Request`.
        :param headers: (optional) Dictionary of HTTP Headers to send with the
            :class:`Request`.
        :param cookies: (optional) Dict or CookieJar object to send with the
            :class:`Request`.
        :param files: (optional) Dictionary of ``'filename': file-like-objects``
            for multipart encoding upload.
        :param auth: (optional) Auth tuple or callable to enable
            Basic/Digest/Custom HTTP Auth.
        :param timeout: (optional) How long to wait for the server to send
            data before giving up, as a float, or a :ref:`(connect timeout,
            read timeout) <timeouts>` tuple.
        :type timeout: float or tuple
        :param allow_redirects: (optional) Set to True by default.
        :type allow_redirects: bool
        :param proxies: (optional) Dictionary mapping protocol or protocol and
            hostname to the URL of the proxy.
        :param stream: (optional) whether to immediately download the response
            content. Defaults to ``False``.
        :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``.
        :param cert: (optional) if String, path to ssl client cert file (.pem).
            If Tuple, ('cert', 'key') pair.
        :rtype: requests.Response
        """
No such luck: there is no such switch. Worse, requests URL-encodes the entire URI, so whether you pass the query through params or append it to the URL yourself, it steps in either way:
    def prepare_url(self, url, params):
        """Prepares the given HTTP URL."""
        #: Accept objects that have string representations.
        #: We're unable to blindly call unicode/str functions
        #: as this will include the bytestring indicator (b'')
        #: on python 3.x.
        #: https://github.com/requests/requests/pull/2238

        # ... (a lot of code omitted here) ...

        enc_params = self._encode_params(params)
        if enc_params:
            if query:
                query = '%s&%s' % (query, enc_params)
            else:
                query = enc_params

        url = requote_uri(urlunparse([scheme, netloc, path, None, query, fragment]))
        self.url = url


    def requote_uri(uri):
        """Re-quote the given URI.

        This function passes the given URI through an unquote/quote cycle to
        ensure that it is fully and consistently quoted.

        :rtype: str
        """
        safe_with_percent = "!#$%&'()*+,/:;=?@[]~"
        safe_without_percent = "!#$&'()*+,/:;=?@[]~"
        try:
            # Unquote only the unreserved characters
            # Then quote only illegal characters (do not quote reserved,
            # unreserved, or '%')
            return quote(unquote_unreserved(uri), safe=safe_with_percent)
        except InvalidURL:
            # We couldn't unquote the given URI, so let's try quoting it, but
            # there may be unquoted '%'s in the URI. We need to make sure they're
            # properly quoted so they do not cause issues elsewhere.
            return quote(uri, safe=safe_without_percent)
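To see why the pipe is affected while characters such as / and ? are not, the quoting step of requote_uri can be reproduced with the standard library alone. This is a minimal sketch, not the actual code path inside requests: it skips the unquote_unreserved step and only reuses the safe_with_percent string quoted above. The variable names are my own.

```python
from urllib.parse import quote

# Same "safe" set as safe_with_percent in requote_uri above.
SAFE_WITH_PERCENT = "!#$%&'()*+,/:;=?@[]~"

raw_url = (
    "http://examples-1251000004.picsh.myqcloud.com/sample.jpeg"
    "?imageMogr2/sharpen/55|imageView2/1/w/200/h/300/q/85"
)

# '|' is neither an unreserved character nor in the safe set,
# so quote() percent-encodes it.
quoted_url = quote(raw_url, safe=SAFE_WITH_PERCENT)
print(quoted_url)

# '%' is in the safe set, so an already-encoded URL passes through unchanged.
requoted_url = quote(quoted_url, safe=SAFE_WITH_PERCENT)
print(requoted_url == quoted_url)
```

Because | is simply absent from the safe set, no argument passed to requests can keep it from being encoded at this step.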
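And so it was conquered
Out of options and backed into a corner by requests, I had to find my own way. How could the problem be solved generically without modifying the requests source? It may be a niche problem: after Google and Baidu both came up empty, I started reading the source. Since no parameter controls the encoding, perhaps the request's url could be modified directly. As shown below, the URL you pass in is stored in req.url, and the URL-encoded version is produced inside the prepare_request function: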
    def request(self, method, url,
            params=None, data=None, headers=None, cookies=None, files=None,
            auth=None, timeout=None, allow_redirects=True, proxies=None,
            hooks=None, stream=None, verify=None, cert=None, json=None):
        """Constructs a :class:`Request <Request>`, prepares it and sends it.
        Returns :class:`Response <Response>` object.

        :rtype: requests.Response
        """
        # Create the Request.
        req = Request(
            method=method.upper(),
            url=url,
            headers=headers,
            files=files,
            data=data or {},
            json=json,
            params=params or {},
            auth=auth,
            cookies=cookies,
            hooks=hooks,
        )
        prep = self.prepare_request(req)

        proxies = proxies or {}

        settings = self.merge_environment_settings(
            prep.url, proxies, stream, verify, cert
        )

        # Send the request.
        send_kwargs = {
            'timeout': timeout,
            'allow_redirects': allow_redirects,
        }
        send_kwargs.update(settings)
        resp = self.send(prep, **send_kwargs)

        return resp
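The two stages in the excerpt above can be observed without sending anything. The sketch below assumes requests is installed and uses only Request, Session.prepare_request, and the url attribute of the prepared request; raw_url is the example URL from earlier, and the other variable names are my own.

```python
import requests

raw_url = (
    "http://examples-1251000004.picsh.myqcloud.com/sample.jpeg"
    "?imageMogr2/sharpen/55|imageView2/1/w/200/h/300/q/85"
)

session = requests.Session()
req = requests.Request(method="GET", url=raw_url)

# The Request object still holds the URL exactly as it was given.
print(req.url)

# prepare_request() runs prepare_url(), which is where the encoding happens.
prep = session.prepare_request(req)
prepared_url = prep.url
print(prepared_url)

# url is an ordinary attribute on the prepared request: assigning to it
# does not trigger another round of encoding.
prep.url = raw_url
print(prep.url)
```

Nothing is sent here; the point is only that the encoded URL first appears on prep, not on req.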
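The URL can therefore only be changed after prepare_request has run, and the only places that have access to the prepared request (prep) are the request function itself and the send function. request carries too much logic, so why not take over send instead? No sooner said than done.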
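    import requests


    class TrickUrlSession(requests.Session):
        # Default, so that send() also works if setUrl() was never called.
        _trickUrl = None

        def setUrl(self, url):
            self._trickUrl = url

        def send(self, request, **kwargs):
            # Swap the URL-encoded URL for the raw one just before sending.
            if self._trickUrl:
                request.url = self._trickUrl
            return requests.Session.send(self, request, **kwargs)


    # Usage
    url = 'http://examples-1251000004.picsh.myqcloud.com/sample.jpeg?imageMogr2/sharpen/55|imageView2/1/w/200/h/300/q/85'
    session = TrickUrlSession()
    session.setUrl(url)
    session.get(url)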
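Whether the override takes effect can be checked offline by mounting a stand-in transport adapter that records the URL it is handed instead of opening a connection. This is a sketch under assumptions: RecordingAdapter is a hypothetical helper written for this check, not part of requests, and the fake Response it returns carries only the fields needed here. It shows what the session passes to the adapter; what finally goes on the wire still depends on the real adapter and the urllib3 version underneath.

```python
import requests
from requests.adapters import BaseAdapter


class TrickUrlSession(requests.Session):
    _trickUrl = None

    def setUrl(self, url):
        self._trickUrl = url

    def send(self, request, **kwargs):
        if self._trickUrl:
            request.url = self._trickUrl
        return requests.Session.send(self, request, **kwargs)


class RecordingAdapter(BaseAdapter):
    """Hypothetical test adapter: records URLs and never touches the network."""

    def __init__(self):
        super().__init__()
        self.seen_urls = []

    def send(self, request, **kwargs):
        self.seen_urls.append(request.url)
        response = requests.Response()
        response.status_code = 200
        response.url = request.url
        response.request = request
        response._content = b""  # empty body, so nothing is read from a socket
        return response

    def close(self):
        pass


raw_url = (
    "http://examples-1251000004.picsh.myqcloud.com/sample.jpeg"
    "?imageMogr2/sharpen/55|imageView2/1/w/200/h/300/q/85"
)

adapter = RecordingAdapter()
session = TrickUrlSession()
session.mount("http://", adapter)  # replaces the default HTTP adapter

session.setUrl(raw_url)
session.get(raw_url)
print(adapter.seen_urls[-1])
```

The recorded URL still contains the raw | character, which confirms that the swap in send happens after prepare_request has done its encoding.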
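The subclass achieves the goal at minimal cost and is easy to use. With multiple threads, however, each thread needs its own session, which defeats connection-pool sharing. A small change fixes that: the threads share one session, and each thread keeps its own trickUrl: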
    import threading

    import requests

    localData = threading.local()


    class TrickUrlSession(requests.Session):
        def send(self, request, **kwargs):
            # Each thread only sees the trickUrl that it set itself.
            if hasattr(localData, 'trickUrl') and localData.trickUrl:
                request.url = localData.trickUrl
            return requests.Session.send(self, request, **kwargs)


    # Usage (url is the raw, unencoded URL from the example above)
    session = TrickUrlSession()
    localData.trickUrl = url
    session.get(url)
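One thing to keep in mind with the thread-local variant: localData.trickUrl stays set until it is changed, so every later request made on that thread through the session would also have its URL replaced. A small context manager can scope the override to a block. The sketch below is stdlib-only and purely illustrative: trick_url is a hypothetical helper name, and the example.invalid URLs are placeholders.

```python
import threading
from contextlib import contextmanager

localData = threading.local()


@contextmanager
def trick_url(url):
    """Set localData.trickUrl for the current thread, then restore it."""
    previous = getattr(localData, "trickUrl", None)
    localData.trickUrl = url
    try:
        yield
    finally:
        localData.trickUrl = previous


seen = {}


def worker(name, url):
    with trick_url(url):
        # A session.get(url) on a shared TrickUrlSession would go here.
        seen[name] = localData.trickUrl
    # Outside the block the override is gone again for this thread.
    seen[name + "-after"] = localData.trickUrl


urls = {
    "t1": "http://example.invalid/a.jpeg?x/1|y/2",
    "t2": "http://example.invalid/b.jpeg?x/3|y/4",
}
threads = [threading.Thread(target=worker, args=item) for item in urls.items()]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(seen)
```

Each worker sees only its own URL inside the with block, and the value is cleared afterwards, so an unrelated request on the same thread is not redirected by accident.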
Problem solved: images can now be downloaded from 数据万象 across multiple threads while sharing a single connection pool.
https://cloud.tencent.com/developer/article/1394648