How to make multiple REST calls asynchronous in python3 - python-3.x

I have the following code to make multiple REST calls. Basically, I have a dictionary where the key is a string and the value is JSON data that I need to use as the payload to pass to a REST API POST method.
The dictionary currently contains 10 entries, so I need to make 10 REST calls.
I have implemented this with the requests package in Python 3, which is synchronous in nature. So after each REST call it waits for the response; for 10 REST calls it waits 10 times for the API to respond.
def createCategories(BACKEND_URL, token, category):
    url = os.path.join(BACKEND_URL, 'api/v1/category-creation')
    category_dict = read_payloads(category)
    headers = {
        "token": f'{token}',
        "Content-Type": "application/json",
        "accept": "application/json"
    }
    for name, category_payload in category_dict.items():
        json_payload = json.dumps(category_payload)
        response = requests.request("POST", url, headers=headers, data=json_payload)
        # Load as string and parse
        response_data = json.loads(response.text)
        print(response_data)
        category_id = response_data['id']
        message = 'The entity with id: ' + str(category_id) + ' is created successfully. '
        logging.info(message)
    return "categories created successfully."
I read that we need to use asyncio to make these asynchronous. What code changes do I need to make?

You can continue using the requests library. You just need to use the threading or concurrent.futures modules to make several requests simultaneously.
Another option is to use an async library such as aiohttp (a sketch of that approach follows the threaded example below).
import requests
from threading import current_thread
from concurrent.futures import ThreadPoolExecutor, Future
from time import sleep, monotonic

URL = "https://api.github.com/events"

def make_request(url: str) -> int:
    r = requests.get(url)
    sleep(2.0)  # wait n seconds
    return r.status_code

def done_callback(fut: Future):
    # check for cancellation first: calling exception() on a cancelled future raises CancelledError
    if fut.cancelled():
        print("Task was cancelled")
    elif fut.exception():
        res = fut.exception()
        print(f"{current_thread().name}. Error: {res}")
    else:
        print(f"{current_thread().name}. Result: {fut.result()}")

if __name__ == '__main__':
    urls = [URL for i in range(20)]  # 20 tasks
    start = monotonic()
    with ThreadPoolExecutor(5) as pool:
        for i in urls:
            future_obj = pool.submit(make_request, i)
            future_obj.add_done_callback(done_callback)
    print(f"Time passed: {monotonic() - start}")
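If you would rather take the asyncio/aiohttp route mentioned above, a minimal sketch of the original createCategories rewritten with aiohttp could look like the following. It is untested and assumes the same read_payloads helper and endpoint as in the question; the function name create_categories_async and the use of resp.json() are my own choices.
import asyncio
import json
import logging
import os
import aiohttp

async def create_categories_async(BACKEND_URL, token, category):
    url = os.path.join(BACKEND_URL, 'api/v1/category-creation')
    category_dict = read_payloads(category)  # same helper as in the question
    headers = {
        "token": f'{token}',
        "Content-Type": "application/json",
        "accept": "application/json"
    }
    async with aiohttp.ClientSession(headers=headers) as session:
        async def post_one(payload):
            # each POST runs concurrently instead of blocking the next one
            async with session.post(url, data=json.dumps(payload)) as resp:
                return await resp.json()
        results = await asyncio.gather(*(post_one(p) for p in category_dict.values()))
    for response_data in results:
        logging.info('The entity with id: %s is created successfully.', response_data['id'])
    return "categories created successfully."

# caller, e.g.: asyncio.run(create_categories_async(BACKEND_URL, token, category))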

Related

parallelizing a Python function?

I have a function that submits a search job to a REST API, waits for the API to respond, then downloads 2 sets of JSON data, converts both JSON responses into pandas DataFrames, and returns both DataFrames. Below is a very simplified version of the function (minus error handling, logging, data scrubbing, etc.):
def getdata(searchstring, url, uname, passwd):
    headers = {'content-type': 'application/json'}
    json_data = CreateJSONPayload(searchstring)
    rPOST = requests.post(url, auth=(uname, passwd), data=json_data, headers=headers)
    statusURL = str(json.loads(rPOST.text)[u'link'][u'href'])
    Processing = True
    while Processing == True:
        rGET = requests.get(statusURL, auth=(uname, passwd))
        if rGET.status_code == 200:
            url1 = url + "/dataset1"
            url2 = url + "/dataset2"
            rGET1 = requests.get(url1, auth=(uname, passwd))
            rGET2 = requests.get(url2, auth=(uname, passwd))
            dfData1 = pd.read_json(rGET1.text)
            dfData2 = pd.read_json(rGET2.text)
            Processing = False
        elif rGET.status_code == "Other return code handling":
            print("handle errors")  # Not relevant to question.
        else:
            sleep(15)
    return dfData1, dfData2
The function itself works as expected. However the API being called can take anywhere from a couple of minutes to an hour to return the data depending on the parameters I pass to it and I need to submit multiple searches to it, so I'd rather not submit each search one after the other.
What's the best way to parallelize calls to a function like this, so that I can submit multiple requests at the same time, wait until all calls of the function have returned data, and finally continue with data processing in the script?
I also need to be able to throttle the requests, as the API rate limits me to no more than 15 concurrent connections at a time.
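No answer is quoted here, but one common approach, sketched below under the assumption that getdata has the signature shown above and that url, uname and passwd are already defined, is a ThreadPoolExecutor capped at 15 workers, which both parallelizes the calls and enforces the connection limit:
from concurrent.futures import ThreadPoolExecutor, as_completed

searches = ["query1", "query2", "query3"]  # hypothetical search strings

results = {}
with ThreadPoolExecutor(max_workers=15) as pool:
    # at most 15 getdata() calls (and therefore API connections) run at once
    futures = {pool.submit(getdata, s, url, uname, passwd): s for s in searches}
    for fut in as_completed(futures):
        results[futures[fut]] = fut.result()  # (dfData1, dfData2) for that search

# execution reaches this point only after every getdata() call has returned,
# so the rest of the script can process all the collected DataFrames here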

Using requests_mock to dynamically set a response based on the request

I am trying to mock a simple POST request that creates a resource from the request body, and returns the resource that was created. For simplicity, let's assume the created resource is exactly as passed in, but given an ID when created. Here is my code:
def test_create_resource(requests_mock):
    # Helper function to generate dynamic response
    def get_response(request, context):
        context.status_code = 201
        # I assumed this would contain the request body
        response = request.json()
        response['id'] = 100
        return response

    # Mock the response
    requests_mock.post('test-url/resource', json=get_response)
    resource = function_that_creates_resource()
    assert resource['id'] == 100
I end up with runtime error JSONDecodeError('Expecting value: line 1 column 1 (char 0)'). I assume this is because request.json() does not contain what I am looking for. How can I access the request body?
I had to hack up your example a little bit, as there is some information missing, but the basic idea works fine for me. I think, as mentioned, something is wrong with the way you're creating the POST request.
import requests
import requests_mock
with requests_mock.mock() as mock:
    # Helper function to generate dynamic response
    def get_response(request, context):
        context.status_code = 201
        # I assumed this would contain the request body
        response = request.json()
        response['id'] = 100
        return response

    # Mock the response
    mock.post('http://example.com/test-url/resource', json=get_response)
    # resource = function_that_creates_resource()
    resp = requests.post('http://example.com/test-url/resource', json={'a': 1})
    assert resp.json()['id'] == 100
This example is not complete and so we cannot truly see what is happening.
In particular, it would be useful to see a sample function_that_creates_resource.
That said, I think your get_response code is valid.
I believe that you are not sending valid JSON data in the POST request inside function_that_creates_resource.
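For reference, a hypothetical function_that_creates_resource that would satisfy the mock only needs to send a JSON body to the registered URL; passing json= to requests.post guarantees that request.json() inside the callback can decode it. The resource name used here is made up.
import requests

def function_that_creates_resource():
    # posting with json= sends a valid JSON body, so the mock's
    # get_response callback can read it via request.json()
    resp = requests.post('http://example.com/test-url/resource',
                         json={'name': 'my-resource'})
    return resp.json()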

Scrapy does not respect LIFO

I use Scrapy 1.5.1.
My goal is to go through the entire chain of requests for each variable before moving on to the next variable. For some reason Scrapy takes 2 variables, sends 2 requests, then takes another 2 variables, and so on.
CONCURRENT_REQUESTS = 1
Here is my code sample:
def parsed(self, response):
    # inspect_response(response, self)
    search = response.meta['search']
    for idx, i in enumerate(response.xpath("//table[@id='ctl00_ContentPlaceHolder1_GridView1']/tr")[1:]):
        __EVENTARGUMENT = 'Select${}'.format(idx)
        data = {
            '__EVENTARGUMENT': __EVENTARGUMENT,
        }
        yield scrapy.Request(response.url, method='POST', headers=self.headers, body=urlencode(data),
                             callback=self.res_before_get, meta={'search': search}, dont_filter=True)

def res_before_get(self, response):
    # inspect_response(response, self)
    url = 'http://www.moj-yemen.net/Search_detels.aspx'
    yield scrapy.Request(url, callback=self.results, dont_filter=True)
My desired behavior is:
one value from parsed is sent to res_before_get and I do something with it,
then another value from parsed is sent to res_before_get, and so on:
Post
Get
Post
Get
But currently Scrapy takes 2 values from parsed and adds them to the queue, then sends 2 requests from res_before_get. Thus I'm getting duplicate results:
Post
Post
Get
Get
What am I missing?
P.S.
This is an ASP.NET site. Its logic is as follows:
make a POST request with the search payload;
make a GET request to get the actual data.
Both requests share the same session ID, which is why it is important to preserve the order.
At the moment I'm getting POST1 and POST2, and since the session ID is associated with POST2, both GET1 and GET2 return the same page.
Scrapy works asynchronously, so you cannot expect it to respect the order of your loops.
If you need it to work sequentially, you have to chain the callbacks so that each one schedules the next request, for example:
def parse1(self, response):
    ...
    yield Request(..., callback=self.parse2, meta={...(necessary information)...})

def parse2(self, response):
    ...
    if (necessary information):
        yield Request(...,
                      callback=self.parse2,
                      meta={...(remaining necessary information)...},
                      )
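Applied to the spider above, one possible way to serialize the POST/GET pairs is to yield only the first POST from parsed, carry the remaining __EVENTARGUMENT values in meta, and issue the next POST only after the corresponding GET has been handled. This is an untested sketch; the make_post helper and the body of results are my own additions, and it assumes the same scrapy and urlencode imports as the question.
def parsed(self, response):
    search = response.meta['search']
    rows = response.xpath("//table[@id='ctl00_ContentPlaceHolder1_GridView1']/tr")[1:]
    pending = ['Select${}'.format(idx) for idx, _ in enumerate(rows)]
    if pending:
        # start the chain with the first value only
        yield self.make_post(response.url, search, pending)

def make_post(self, url, search, pending):
    data = {'__EVENTARGUMENT': pending[0]}
    return scrapy.Request(url, method='POST', headers=self.headers, body=urlencode(data),
                          callback=self.res_before_get,
                          meta={'search': search, 'url': url, 'pending': pending[1:]},
                          dont_filter=True)

def res_before_get(self, response):
    yield scrapy.Request('http://www.moj-yemen.net/Search_detels.aspx',
                         callback=self.results, meta=response.meta, dont_filter=True)

def results(self, response):
    # ... process the detail page here ...
    pending = response.meta['pending']
    if pending:
        # only now schedule the next POST, so the POST/GET order is preserved
        yield self.make_post(response.meta['url'], response.meta['search'], pending)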

python3.6 start 1 million requests with aiohttp and asyncio

I'm trying to make 1 million requests with aiohttp and asyncio, 10k at a time. When I print the start time of each request, I found that the 1 million requests do NOT start at nearly the same time but are spread over several minutes. In my understanding, the 1 million requests should be sent without any wait (or within microseconds?). I hope someone can suggest how to change the code; my code is below. Thanks in advance!
import asyncio
import requests
import json
import pymysql
from aiohttp import ClientSession
from datetime import datetime
import uvloop

# login config
URL_LOGIN = "https://test.com/user/login"
APP_ID = "sample_app_id"
APP_SECRET = "sample_secret"

async def login_user(phone, password, session, i):
    start_time = datetime.now()
    h = {
        "Content-Type": "application/json"
    }
    data = {
        "phone": phone,
        "password": password,
        "appid": APP_ID,
        "appsecret": APP_SECRET
    }
    try:
        async with session.post(url=URL_LOGIN, data=json.dumps(data), headers=h) as response:
            r = await response.read()
            end_time = datetime.now()
            cost = (end_time - start_time).seconds
            msg = "number %d request, start_time: %s, cost_time: %d, response: %s\n" % (i, start_time, cost, r.decode())
            print("running %d" % i, datetime.now())
    except Exception as e:
        print("running %d" % i)
        msg = "number %d request raise error: " % i + str(e) + "\n"
    with open("log", "a+") as f:
        f.write(msg)

async def bound_login(sem, phone, password, session, i):
    async with sem:
        await login_user(phone, password, session, i)

async def run_login(num):
    tasks = []
    sem = asyncio.Semaphore(10000)
    async with ClientSession() as session:
        for i in range(num):
            task = asyncio.ensure_future(bound_login(sem, str(18300000000 + i), "123456", session, i))
            tasks.append(task)
        responses = asyncio.gather(*tasks)
        await responses

start = datetime.now()
number = 100000
loop = uvloop.new_event_loop()
asyncio.set_event_loop(loop)
future = asyncio.ensure_future(run_login(number))
loop.run_until_complete(future)  # run the event loop until all requests have finished
When I print the start time of each request, I found that the 1 million requests do NOT start at nearly the same time but are spread over several minutes.
Your code does issue a total of 1 million requests, but with the constraint that no more than 10 thousand of them runs in parallel at any given time. This is like having 10k request slots at your disposal - the first 10,000 requests will be started immediately, but the 10,001st will have to wait for a previous request to finish so it can get a free slot.
This is why 1 million requests cannot start instantaneously or near-instantaneously: most of them have to wait for some download to finish, and that takes time.
In my understanding, the 1 million requests should be sent without any wait
The current code explicitly makes the requests wait in order to prevent more than 10k of them running in parallel. If you really want to (try to) make a million parallel requests, remove the semaphore and create the ClientSession using a connector with limit set to None.
However, be aware that maintaining a million open connections will likely not work due to limits of the operating system and the hardware. (You should still be able to start the connections near-instantaneously, but I'd expect most of them to exit with an exception shortly afterwards.)
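A sketch of what that change to run_login might look like, reusing the asker's login_user (note that in current aiohttp a connector limit of 0 means "no limit", while older versions used None):
import asyncio
import aiohttp
from aiohttp import ClientSession

async def run_login(num):
    tasks = []
    # no Semaphore, and the connector's connection cap is disabled,
    # so every request is started as soon as it is scheduled
    connector = aiohttp.TCPConnector(limit=0)  # 0 (or None in older aiohttp) disables the limit
    async with ClientSession(connector=connector) as session:
        for i in range(num):
            tasks.append(asyncio.ensure_future(login_user(str(18300000000 + i), "123456", session, i)))
        await asyncio.gather(*tasks)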

python grequests to get time for each http response individually

I have written a Python script using grequests to send HTTP requests to a server. The problem is that I need to get the response time of each request. I have used hooks, but I still can't find a method to display the exact response time. I used time.time() but I can't keep track of each request.
Below is the code.
def do_something(response, *args, **kwargs):
    print('Response: ', response.text)
    roundtrip = time.time() - start
    print(roundtrip)

urls = ["http://192.168.40.122:35357/v2.0/tokens"] * 100

while True:
    payload = {some_payload}
    start = time.time()
    unsent_request = (grequests.post(u, hooks={'response': do_something}, json=payload) for u in urls)
    print(unsent_request)
    print(grequests.map(unsent_request, size=100))
grequests is just a wrapper around the requests library. Just use the .elapsed attribute of the underlying requests response objects, like this:
response_list = grequests.map(unsent_request, size=100)
for response in response_list:
    print(response.elapsed and response.elapsed.total_seconds() or "failed")
