So requests is a great library, and I often use it like so:
payload = {
# ...
}
results = requests.get(some_url, params=payload)
and requests encodes all the key/value pairs into the URL and goes ahead and makes the GET request.
Is there a way to construct the url of results.url without having to call .get?
Yes, but you will need to use the "raw" Request object and call its prepare method. Then you will be able to grab the prepared request's url attribute.
r = requests.Request('get', 'http://url', params={'a': 1, 'b': 2})
prepared_r = r.prepare()
print(prepared_r.url)
# http://url/?a=1&b=2
To make the request you will need a Session object:
s = requests.Session()
s.send(prepared_r)
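If all you want is the URL string and you never intend to send the request, the same machinery is available one step lower via PreparedRequest.prepare_url. A minimal sketch of that alternative, reusing the example URL and params from above:
from requests.models import PreparedRequest
p = PreparedRequest()
p.prepare_url('http://url', {'a': 1, 'b': 2})
print(p.url)
# http://url/?a=1&b=2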
I'm writing a short Python program to request a JSON file using a REST API call. The API limits me to a relatively small result set (50 or so) and I need to retrieve several thousand result sets. I've implemented a while loop to achieve this and it's working fairly well, but I can't figure out the logic for continuing the while loop until there are no more results to retrieve. Right now I've implemented a hard number value but would like to replace it with a conditional that stops the loop if no more results come back. The 'offset' field is the parameter that the API forces you to use to specify which page of 50 results you want. My logic looks something like...
import requests
import json
from time import sleep
url = "https://someurl"
offsetValue = 0
PARAMS = {'limit': 50, 'offset': offsetValue}
headers = {
    "Accept": "application/json"
}

while offsetValue <= 1000:
    PARAMS['offset'] = offsetValue  # update the offset so the next page is actually requested
    response = requests.request(
        "GET",
        url,
        headers=headers,
        params=PARAMS
    )
    testfile = open("testfile.txt", "a")
    testfile.write(json.dumps(json.loads(response.text), sort_keys=True, indent=4, separators=(",", ": ")))
    testfile.close()
    offsetValue = offsetValue + 1
    sleep(1)
So I want to change the conditional that controls the while loop from a fixed number to a check that the result set for the GET request is empty. Hopefully this makes sense.
Your loop can be while True. After each fetch, convert the payload to a dict. If the number of results is 0, then break.
Depending on how the API works, there may be other signals that there’s nothing more to fetch, e.g. some HTTP error, not necessarily the result count — you’ll have to discover the API’s logic for that.
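A rough sketch of that pattern applied to your snippet (the "results" key and the page size of 50 are assumptions; substitute whatever field your API actually uses to report the page of results):
import json
from time import sleep

import requests

url = "https://someurl"
headers = {"Accept": "application/json"}
offsetValue = 0

with open("testfile.txt", "a") as testfile:
    while True:
        response = requests.get(url, headers=headers,
                                params={"limit": 50, "offset": offsetValue})
        payload = response.json()
        # Stop as soon as the API reports an empty page.
        if not payload.get("results"):  # assumed key; adapt to your API's response shape
            break
        testfile.write(json.dumps(payload, sort_keys=True, indent=4,
                                  separators=(",", ": ")))
        offsetValue += 50  # advance by the page size
        sleep(1)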
I am trying to mock a simple POST request that creates a resource from the request body, and returns the resource that was created. For simplicity, let's assume the created resource is exactly as passed in, but given an ID when created. Here is my code:
def test_create_resource(requests_mock):
    # Helper function to generate dynamic response
    def get_response(request, context):
        context.status_code = 201
        # I assumed this would contain the request body
        response = request.json()
        response['id'] = 100
        return response

    # Mock the response
    requests_mock.post('test-url/resource', json=get_response)
    resource = function_that_creates_resource()
    assert resource['id'] == 100
I end up with runtime error JSONDecodeError('Expecting value: line 1 column 1 (char 0)'). I assume this is because request.json() does not contain what I am looking for. How can I access the request body?
I had to hack up your example a little bit as there is some information missing, but the basic idea works fine for me. I think, as mentioned, something is wrong with the way you're creating the POST request.
import requests
import requests_mock

with requests_mock.mock() as mock:
    # Helper function to generate dynamic response
    def get_response(request, context):
        context.status_code = 201
        # I assumed this would contain the request body
        response = request.json()
        response['id'] = 100
        return response

    # Mock the response
    mock.post('http://example.com/test-url/resource', json=get_response)

    # resource = function_that_creates_resource()
    resp = requests.post('http://example.com/test-url/resource', json={'a': 1})
    assert resp.json()['id'] == 100
This example is not complete and so we cannot truly see what is happening.
In particular, it would be useful to see a sample function_that_creates_resource.
That said, I think your get_response code is valid.
I believe that you are not sending valid JSON data in your POST request in function_that_creates_resource.
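For comparison, a hypothetical function_that_creates_resource that would work with the mock above has to send the body as JSON, for example via the json= keyword (the URL and payload here are invented for illustration):
import requests

def function_that_creates_resource():
    # json= serialises the dict to a JSON body, so request.json()
    # inside the mock's callback can parse it.
    resp = requests.post('http://example.com/test-url/resource',
                         json={'name': 'my-resource'})
    return resp.json()
If the real function sends no body at all, or sends a form-encoded or plain-text body, request.json() in the callback raises exactly the JSONDecodeError you're seeing.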
I use Scrapy 1.5.1
My goal is to go through the entire chain of requests for each variable before moving to the next variable. For some reason Scrapy takes 2 variables, sends 2 requests, then takes another 2 variables, and so on.
CONCURRENT_REQUESTS = 1
Here is my code sample:
def parsed(self, response):
    # inspect_response(response, self)
    search = response.meta['search']
    for idx, i in enumerate(response.xpath("//table[@id='ctl00_ContentPlaceHolder1_GridView1']/tr")[1:]):
        __EVENTARGUMENT = 'Select${}'.format(idx)
        data = {
            '__EVENTARGUMENT': __EVENTARGUMENT,
        }
        yield scrapy.Request(response.url, method='POST', headers=self.headers, body=urlencode(data),
                             callback=self.res_before_get, meta={'search': search}, dont_filter=True)

def res_before_get(self, response):
    # inspect_response(response, self)
    url = 'http://www.moj-yemen.net/Search_detels.aspx'
    yield scrapy.Request(url, callback=self.results, dont_filter=True)
My desired behavior is:
One value from parsed is sent to res_before_get and then I do something with it.
Then another value from parsed is sent to res_before_get, and so on.
Post
Get
Post
Get
But currently Scrapy takes 2 values from parsed and adds them to the queue, then sends 2 requests from res_before_get. Thus I'm getting duplicate results.
Post
Post
Get
Get
What am I missing?
P.S.
This is an ASP.NET site. Its logic is as follows:
Make a POST request with the search payload.
Make a GET request to get the actual data.
Both requests share the same sessionID.
That's why it is important to preserve the order.
At the moment I'm getting POST1 and POST2. And since the sessionID is associated with POST2, both GET1 and GET2 return the same page.
Scrapy works asynchronously, so you cannot expect it to respect the order of your loops or anything.
If you need it to work sequentially, you'll have to accommodate the callbacks to work like that, for example:
def parse1(self, response):
    ...
    yield Request(..., callback=self.parse2, meta={...(necessary information)...})

def parse2(self, response):
    ...
    if (necessary information):
        yield Request(...,
                      callback=self.parse2,
                      meta={...(remaining necessary information)...},
                      )
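Applied to your spider, that could look roughly like the sketch below (make_post is a helper introduced here for illustration; the remaining row indices are carried through meta so the next POST is only issued after the previous POST/GET pair has finished):
from urllib.parse import urlencode

import scrapy

# methods of your spider class
def parsed(self, response):
    search = response.meta['search']
    rows = response.xpath("//table[@id='ctl00_ContentPlaceHolder1_GridView1']/tr")[1:]
    indices = list(range(len(rows)))
    if indices:
        # Start the chain with the first row only.
        yield self.make_post(response.url, search, indices)

def make_post(self, url, search, indices):
    data = {'__EVENTARGUMENT': 'Select${}'.format(indices[0])}
    return scrapy.Request(url, method='POST', headers=self.headers, body=urlencode(data),
                          callback=self.res_before_get,
                          meta={'search': search, 'url': url, 'indices': indices},
                          dont_filter=True)

def res_before_get(self, response):
    yield scrapy.Request('http://www.moj-yemen.net/Search_detels.aspx',
                         callback=self.results, meta=response.meta, dont_filter=True)

def results(self, response):
    # ... process the detail page here ...
    remaining = response.meta['indices'][1:]
    if remaining:
        # Only now move on to the next row's POST/GET pair.
        yield self.make_post(response.meta['url'], response.meta['search'], remaining)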
I want to resend the cookies initialized by the first call in the second call, so that the session does not change. This is not working.
Why? And how can I solve it? Sorry, I'm new to Python.
https_url = "www.google.com"
r = requests.get(https_url)
print(r.cookies.get_dict())
#cookie = {id: abc}
response = requests.get(https_url, cookies=response.cookies.get_dict())
print(response.cookies.get_dict())
#cookie = {id: def}
You aren't necessarily doing it wrong with the way you're passing the cookies from the last response to the next request, except that:
"www.google.com" is not a valid URL.
Even if you had used http://www.google.com as the URL, the cookies returned by Google for such a GET request aren't session cookies and won't persist across requests.
You used the variable r to receive the returning value from the first requests.get, and yet you used response.cookies when you make the second requests.get. A possible typo?
If all of the above are due to your trying to mock up your real code, you should really consider using requests.Session to avoid micro-managing session cookies.
Please read requests.Session's documentation for more details.
import requests

with requests.Session() as s:
    r = s.get(https_url)
    # cookies from the first s.get are automatically passed on to the second s.get
    r = s.get(https_url)
    ...
I am using an aiohttp GET request to download some content from another web API,
but I am receiving:
exception = TypeError('not a valid non-string sequence or mapping object',)
Following is the data which I am trying to send.
data = "symbols=LGND-US&exprs=CS_EVENT_TYPE_CD_R(%27%27,%27now%27,%271D%27)"
How do I resolve it?
I tried it in 2 ways:
r = yield from aiohttp.get(url, params=data) # and
r = yield from aiohttp.post(url, data=data)
At the same time I am able to fetch data using:
r = requests.get(url, params=data) # and
r = requests.post(url, data=data)
But I need an async implementation.
Also, please suggest a way to use the requests library instead of aiohttp to make async HTTP requests, because in many cases aiohttp POST and GET requests are not working while the same requests work with requests.get and requests.post.
The docs use bytes (i.e. the 'b' prefix) for the data argument.
r = await aiohttp.post('http://httpbin.org/post', data=b'data')
Also, the params argument should be a dict or a list of tuples.
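As a minimal sketch of that with a recent aiohttp (which uses ClientSession rather than the module-level get/post helpers), passing params as a dict; the URL is a placeholder, and the expression string is your example decoded from its %27 escapes:
import asyncio

import aiohttp

async def fetch(url):
    # params as a dict; aiohttp URL-encodes the values for you
    params = {
        'symbols': 'LGND-US',
        'exprs': "CS_EVENT_TYPE_CD_R('','now','1D')",
    }
    async with aiohttp.ClientSession() as session:
        async with session.get(url, params=params) as resp:
            return await resp.json()

# asyncio.run(fetch('http://example.com/api'))  # placeholder URL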