I'm trying to handle asynchronous HTTP requests. I call the async_provider() function from another module and perform subsequent tasks with the resulting response.text().
It only works if all requests succeed; I can't handle exceptions for failed requests (whatever the reason for the exception). Thank you for your help.
Here is the relevant part of the code:
import asyncio
import aiohttp

# I call this function from another module
def async_provider():
    list_a, list_b = asyncio.run(main())
    return list_a, list_b

async def fetch(session, url):
    # session.post request cases
    if url == "http://...1":
        referer = "http://...referer"
        user_agent = (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) "
            "AppleWebKit/605.1.15 (KHTML, like Gecko) "
            "Version/12.1 Safari/605.1.15"
        )
        payload = {'key1': 'value1', 'key2': 'value2'}
        async with session.post(
            url, data=payload, headers={"Referer": referer, "User-Agent": user_agent}
        ) as response:
            if response.status != 200:
                response.raise_for_status()
            return await response.text()
    # session.get request cases
    else:
        async with session.get(url) as response:
            if response.status != 200:
                response.raise_for_status()
            return await response.text()

async def fetch_all(session, urls):
    results = await asyncio.gather(
        *[asyncio.create_task(fetch(session, url)) for url in urls]
    )
    return results

async def main():
    urls = ["http://...1", "http://...2", "http://...3"]
    async with aiohttp.ClientSession() as session:
        response_text_1, response_text_2, response_text_3 = await fetch_all(
            session, urls
        )
        # some task with response text
Any exception breaks all the requests.
Check the return_exceptions flag on asyncio.gather():
results = await asyncio.gather(
    *[asyncio.create_task(fetch(session, url)) for url in urls],
    return_exceptions=True
)
It will return a list with one entry per task. For a task that failed, the entry is the exception object itself, so you can check each item (or use the tasks' Task.result() / Task.exception() methods) to re-raise or inspect the failure instead of letting one error cancel everything.
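For example, a minimal sketch of checking each gathered item, reusing fetch, session and urls from the question (gather preserves input order, so the results line up with the URLs):
results = await asyncio.gather(
    *[fetch(session, url) for url in urls],
    return_exceptions=True
)
for url, result in zip(urls, results):
    if isinstance(result, Exception):
        # a failed request: log it, retry it, or substitute a default value
        print(f"{url} failed: {result!r}")
    else:
        print(f"{url} returned {len(result)} characters")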
I am trying to use request interception to speed up page loading with Pyppeteer. However, I am getting the warning:
RuntimeWarning: coroutine 'block_image' was never awaited
I can't figure out where I am missing an await. I've added awaits within the intercept function itself, following a template I found online. I am testing out the setRequestInterception function with Pyppeteer.
Thank you.
# utils.py
import time

import pyppeteer
from pyppeteer import launch

# User_Agent is defined elsewhere in the project
class MakeRequest():
    ua = User_Agent()

    async def _proxy_browser(self, url,
                             headless=False,
                             intercept_func=None,
                             proxy=True,
                             **kwargs):
        if proxy:
            args = [*proxy,
                    '--ignore-certificate-errors']
        else:
            args = ['--ignore-certificate-errors']
        for i in range(3):
            try:
                browser = await launch(headless=headless,
                                       args=args,
                                       defaultViewport=None)
                page = await browser.newPage()
                await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:83.0) Gecko/20100101 Firefox/83.0')
                if intercept_func is not None:
                    await page.setRequestInterception(True)
                    page.on('request', intercept_func)
                await page.goto(url, {'waitUntil': 'load', 'timeout': 0})
                content = await page.content()
                return content
            except (pyppeteer.errors.PageError,
                    pyppeteer.errors.TimeoutError,
                    pyppeteer.errors.BrowserError,
                    pyppeteer.errors.NetworkError) as e:
                print('error', e)
                time.sleep(2)
                continue
            finally:
                await browser.close()
        return
scraper.py:
import time

from utils import MakeRequest

REQUESTER = MakeRequest()

async def block_image(request):
    if request.url.endswith('.png') or request.url.endswith('.jpg'):
        print(request.url)
        await request.abort()
    else:
        await request.continue_()

def get_request(url):
    for i in range(3):
        response = REQUESTER.proxy_browser_request(url=url,
                                                   headless=False,
                                                   intercept_func=block_image)
        if response:
            return response
        else:
            print(f'Attempt {i + 1}: {url} links not found')
            print('retrying...')
            time.sleep(3)
Your function block_image is a coroutine, but the callback passed to page.on is expected to be a normal function. Try writing a synchronous lambda function that wraps the coroutine in a Task (thus scheduling it on the current event loop):
if intercept_func is not None:
    await page.setRequestInterception(True)
    page.on('request', lambda request: asyncio.create_task(intercept_func(request)))
There's an example of this kind of code in the Pyppeteer documentation. If you're using an older version of Python (<3.7), use asyncio.ensure_future instead of asyncio.create_task (as is done in the example in the docs).
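For older Python versions, a minimal sketch of the same wiring with ensure_future (everything else stays as in the snippet above):
if intercept_func is not None:
    await page.setRequestInterception(True)
    # on Python < 3.7 there is no asyncio.create_task, so schedule the coroutine with ensure_future
    page.on('request', lambda request: asyncio.ensure_future(intercept_func(request)))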
I am trying to understand aiohttp a little better. Can someone check why my code is not printing the response of the request? Instead it just prints the coroutine.
import asyncio
import aiohttp
import requests

async def get_event_1(session):
    url = "https://stackoverflow.com/"
    headers = {
        'content-Type': 'application/json'
    }
    response = await session.request('GET', url)
    return response.json()

async def get_event_2(session):
    url = "https://google.com"
    headers = {
        'content-Type': 'application/json'
    }
    response = await session.request('GET', url)
    return response.json()

async def main():
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(
            get_event_1(session),
            get_event_2(session)
        )

loop = asyncio.get_event_loop()
x = loop.run_until_complete(main())
loop.close()
print(x)
Output:
$ python async.py
[<coroutine object ClientResponse.json at 0x10567ae60>, <coroutine object ClientResponse.json at 0x10567aef0>]
sys:1: RuntimeWarning: coroutine 'ClientResponse.json' was never awaited
How do I print the responses instead?
The error message you received is informing you that a coroutine was never awaited.
You can see from the aiohttp documentation that response.json() is also a coroutine and therefore must be awaited. https://docs.aiohttp.org/en/stable/client_quickstart.html#json-response-content
return await response.json()
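Applied to get_event_1 from the question, a minimal sketch (get_event_2 needs the same change; note that these example URLs return HTML, so response.json() will only succeed against an endpoint that actually returns JSON):
async def get_event_1(session):
    url = "https://stackoverflow.com/"
    response = await session.request('GET', url)
    # response.json() is itself a coroutine, so it has to be awaited as well
    return await response.json()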
I would simply like to associate responses from aiohttp asynchronous HTTP requests with an identifier. I am using the following code to hit the API and extract the contactproperty object, which requires an external field (contactid) in order to call its API:
def get_contact_properties(self, office_name, api_key, ids, chunk_size=100, **params):
    properties_pages = []
    batch = 0
    while True:
        chunk_ids = [ids[i] for i in range(batch * chunk_size + 1, chunk_size * (1 + batch) + 1)]
        urls = ["{}/{}".format(self.__get_base_url(), "contacts/{}/properties?api_key={}".format(contactid, api_key))
                for contactid in chunk_ids]
        responses_raw = self.get_responses(urls, self.get_office_token(office_name), chunk_size)
        try:
            responses_json = [json.loads(response_raw) for response_raw in responses_raw]
        except Exception as e:
            print(e)
        valid_responses = self.__get_valid_contact_properties_responses(responses_json)
        properties_pages.append(valid_responses)
        if len(valid_responses) < chunk_size:  # this is how we know there are no more pages with data
            break
        else:
            batch = batch + 1
ids is a list of ids. The problem is that I do not know which response corresponds to which id, so that later I can link it to the contact entity using contactid. This is my fetch() function, so I was wondering how to edit it to return the contactid along with the output.
async def __fetch(self, url, params, session):
    async with session.get(url, params=params) as response:
        output = await response.read()
        return output

async def __bound_fetch(self, sem, url, params, session):
    # Getter function with semaphore.
    async with sem:
        output = await self.__fetch(url, params, session)
        return output
You can return the url (or whatever key identifies your request) together with the output.
Regarding using the data, I think you should read the response directly as JSON, especially since aiohttp can do this for you automatically.
async def __fetch(self, url, params, session):
    async with session.get(url, params=params) as response:
        try:
            data = await response.json()
        except ValueError as exc:
            print(exc)
            return None
        return data

async def __bound_fetch(self, sem, url, params, session):
    # Getter function with semaphore.
    async with sem:
        output = await self.__fetch(url, params, session)
        return {"url": url, "data": output}
You did not post the get_responses function but I'm guessing something like this should work:
responses = self.get_responses(urls, self.get_office_token(office_name), chunk_size)
Responses will be a list of {"url": url, "data": data} dicts (data can be None for invalid responses); however, with the code above one invalid request will not affect the others.
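For instance, a minimal sketch of linking each response back to its contactid, assuming the URL format built in get_contact_properties (where the id is the path segment after contacts/):
properties_by_id = {}
for item in responses:
    if item["data"] is None:
        continue  # skip invalid responses
    # recover the contactid from the request URL
    contactid = item["url"].split("contacts/")[1].split("/")[0]
    properties_by_id[contactid] = item["data"]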
I have 2 URLs and 60k+ requests. Basically, I need to post every request to both URLs and then compare their responses, but without waiting for a response before posting the next request.
I've tried to do it with aiohttp and asyncio:
import asyncio
import time
import aiohttp
import os
from aiofile import AIOFile

testURL = ""
prodURL = ""
directoryWithRequests = ''
directoryToWrite = ''
headers = {'content-type': 'application/soap+xml'}
i = 1

async def fetch(session, url, request):
    global i
    async with session.post(url=url, data=request.encode('utf-8'), headers=headers) as response:
        if response.status != 200:
            async with AIOFile(directoryToWrite + str(i) + '.xml', 'w') as afp:
                await afp.write(request)
            i += 1
        return await response.text()

async def fetch_all(session, urls, request):
    results = await asyncio.gather(*[asyncio.create_task(fetch(session, url, request)) for url in urls])
    return results

async def asynchronousRequests(requestBody):
    urls = [testURL, prodURL]
    global i
    with open(requestBody) as my_file:
        body = my_file.read()
    async with aiohttp.ClientSession() as session:
        htmls = await fetch_all(session, urls, body)
        # some conditions

async def asynchronous():
    try:
        start = time.time()
        futures = [asynchronousRequests(directoryWithRequests + i) for i in os.listdir(directoryWithRequests)]
        for future in asyncio.as_completed(futures):
            result = await future
        print("Process took: {:.2f} seconds".format(time.time() - start))
    except Exception as e:
        print(str(e))

if __name__ == '__main__':
    try:
        # AsyncronTest
        ioloop = asyncio.ProactorEventLoop()
        ioloop.run_until_complete(asynchronous())
        ioloop.close()
        if i == 1:
            print('Regress is OK')
        else:
            print('Number of requests to check = {}'.format(i))
    except Exception as e:
        print(e)
I believe the code above works, but it creates N futures, where N equals the number of request files. This amounts to a sort of DDoS, because the server can't respond to that number of requests at the same time.
Found a suitable solution. Basically, it's just 2 async tasks:
tasks = [
    postRequest(testURL, client, body),
    postRequest(prodURL, client, body)
]
await asyncio.wait(tasks)
It's not the same performance as the code in the question with a comfortable number of requests, but at least it doesn't DDoS the server as much.
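The postRequest coroutine isn't shown in the answer; a minimal sketch of what it might look like, reusing the posting logic from fetch in the question (the postRequest name and client session argument are assumptions taken from the answer's snippet):
async def postRequest(url, client, body):
    # post one request body to one URL and return the response text
    async with client.post(url=url, data=body.encode('utf-8'), headers=headers) as response:
        return await response.text()

async def asynchronousRequests(requestBody):
    with open(requestBody) as my_file:
        body = my_file.read()
    async with aiohttp.ClientSession() as client:
        # newer Python versions require Task objects (not bare coroutines) in asyncio.wait
        tasks = [
            asyncio.create_task(postRequest(testURL, client, body)),
            asyncio.create_task(postRequest(prodURL, client, body)),
        ]
        done, _ = await asyncio.wait(tasks)
        # compare the two responses here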
How can I load a Zip file with a GET request?
I use asyncio and aiohttp in my Python app.
This is my code:
import asyncio
import aiohttp

async def fetch_page(session, url):
    with aiohttp.Timeout(10):
        async with session.get(url) as response:
            assert response.status == 200
            return await response.read()

loop = asyncio.get_event_loop()
links = ['http://www2.census.gov/census_2010/04-Summary_File_1/Louisiana/la2010.sf1.zip']
for link in links:
    with aiohttp.ClientSession(loop=loop) as session:
        response = loop.run_until_complete(fetch_page(session, url=link))
        print(type(response))
And then I get an asyncio.TimeoutError.