Getting a file without saving it aiohttp discord.py - python-3.x

So i have a command where it sends whatever the user said after the command to a website api and sends the file the site generates. However i'm changing over to aiohttp as it doesn't block like the standered requests functions
This is how i do it with normal requests and it works fine:
elif (data[0].lower() == ">signgirl"):
await bot.send_typing(message.channel)
tmp = message.content.replace(">signgirl", "")
m = hashlib.md5()
m.update(tmp.encode('utf-8'))
print(tmp, m.hexdigest())
r = requests.post("http://localhost/sign.php", stream=True, data={'text': tmp})
if (r.status_code() == 200):
await bot.send_file(destination=message.channel, filename=str(m.hexdigest()+".png"), fp=r.raw)
However when i try with aiohttp i have no idea how to actually get the raw file data..
So i made this function to get it. but it doesn't let me return an image and i cannot check the http status code without it causing an error.
async def post_data2(url, payload):
async with aiohttp.ClientSession() as session2:
async with session2.post(url, data=payload) as response2:
output = {}
output['data'] = await Image.open(BytesIO(response2.read()))
output['status'] = 200 #await str(response2.status()) #Why is this object not callable?
return output
How else could i do this? Is this possible? aiohttp doesn't seem as easy to understand.

Mister Day "V" Own from the discord.py discord server sent a perfect example of getting and sending the data
async with aiohttp.ClientSession() as session:
# or use a session you already have
async with session.get("http://example.com") as resp:
buffer = io.BytesIO(await resp.read())
# buffer is a file-like
await client.send_file(channel, fp=buffer, filename="whatever")

Related

using httpx to send 100K get requests

I'm using the httpx library and asyncio to try and send about 100K of get requests.
I ran the code and received httpx.ConnectError so I opened wireshark and saw that I was getting a lot of messages saying TCP Retransmission TCP Port numbers reused
when I saw the data in wireshark and the error httpx.ConnectError I added limits = httpx.Limits(max_connections=10000) to limit the amount of active connections to 10,000 but I still get that error.
my code:
import asyncio
import json
import httpx
SOME_URL = "some url"
ANOTHER_URL = "another url"
MAX = 10000
async def search():
guids = [guid for guid in range(688001, 800000)] # 688001 - 838611
timeout = httpx.Timeout(None)
limits = httpx.Limits(max_connections=MAX)
async with httpx.AsyncClient(timeout=timeout, limits=limits) as client:
tasks = [client.get(f"{SOME_URL}{guid}", timeout=timeout) for guid in guids]
blob_list = await asyncio.gather(*tasks) # <---- error from here !!!!!
blob_list = [(res, guid) for res, guid in zip(blob_list, guids)]
guids = [guid for res, guid in blob_list]
blob_list = [json.loads(res.text)["blob_name"] for res, guid in blob_list]
async with httpx.AsyncClient(timeout=timeout, limits=limits) as client:
tasks = [client.get(f"{ANOTHER_URL}{blob}", timeout=timeout) for blob in blob_list]
game_results = await asyncio.gather(*tasks) # <---- error from here !!!!!
game_results = [(res, guid) for res, guid in zip(game_results, guids)]
game_results = [guid for res, guid in game_results]
print(game_results)
def main():
asyncio.run(search())
if __name__ == '__main__':
main()
this is a minimal version of my code there some steps in between the requests that I deleted, but I didn't touch the code that made the trouble, there are comments on the lines that I receive the errors (# <---- error from here !!!!!).
does anyone know how to solve this? or another way to send about 100K of get requests fast?
I managed to solve my problem with the following code:
(this is not the entire code, only the parts needed to send the requests, I have some stuff in between)
import asyncio
from aiohttp import ClientSession
SOME_URL = "some url"
ANOTHER_URL = "another url"
MAX_SIM_CONNS = 50
worker_responses = []
async def fetch(url, session):
async with session.get(url) as response:
return await response.read()
async def fetch_worker(url_queue: asyncio.Queue):
global worker_responses
async with ClientSession() as session:
while True:
url = await url_queue.get()
try:
if url is None:
return
response = await fetch(url, session)
worker_responses.append(response)
finally:
url_queue.task_done()
# calling task_done() is necessary for the url_queue.join() to work correctly
async def fetch_all(base_url: str, range_: range):
url_queue = asyncio.Queue(maxsize=10000)
worker_tasks = []
for i in range(MAX_SIM_CONNS):
wt = asyncio.create_task(fetch_worker(url_queue))
worker_tasks.append(wt)
for i in range_:
await url_queue.put(f"{base_url}{i}")
for i in range(MAX_SIM_CONNS):
# tell the workers that the work is done
await url_queue.put(None)
await url_queue.join()
await asyncio.gather(*worker_tasks)
if __name__ == '__main__':
asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
asyncio.run(fetch_all(SOME_URL, range(680_842, 840_423)))
print(worker_responses)
I used aiohttp instead of httpx and used asyncio.Queue to reduce RAM usage and it worked for me.

instead of returning an image praw returns r/memes/hot

I want my discord.py bot to send a meme from hot posts of r/memes via PRAW. After this issue, I tried searching in the web and in the documentations, but I didn't find any method to view the image. Here is my code:
import praw
import discord
from discord.ext import commands
from discord import client
reddit = praw.Reddit(client_id="d",
client_secret="d",
user_agent="automoderatoredj by /u/taskuratik")
#boot
print("il bot si sta avviando... ")
token = "token"
client = commands.Bot(command_prefix=("/"))
#bot online
#client.event
async def on_ready():
print("il bot e' ora online")
#client.command()
async def meme(submission):
if reddit:
channel = client.get_channel(722491234991472742)
submission = reddit.subreddit("memes").hot(limit=1)
await channel.send(submission.url)
client.run(token)
Your code says:
submission = reddit.subreddit("memes").hot(limit=1)
await channel.send(submission.url)
Here, you assign a listing of one post to submission. As listing is an iterable (somewhat like a list) that contains one submission, rather than the submission itself. Unlike a list, you can't use an index to access a specific item, but there are other ways to get it. One way to get the submission is
for submission in reddit.subreddit("memes").hot(limit=1):
await channel.send(submission.url)
This allows you to change the limit and send more posts if you want.
Or, you could use next() to get the next (and only) item from the post listing:
submission = next(reddit.subreddit("memes").hot(limit=1))
await channel.send(submission.url)
This will always send just one submission, even if you change the limit parameter.
PRAW is blocking, aiohttp is not and frankly discord.py comes with aiohttp. Reddit offers an endpoint to return json data that you can prase with the json.loads() method to get raw json.
This is something I wrote to fetch images from subreddits
from aiohttp import ClientSession
from random import choice as c
from json import loads
async def get(session: object, url: object) -> object:
async with session.get(url) as response:
return await response.text()
async def reddit(sub: str):
type = ['new', 'top', 'hot', 'rising']
url = f"https://www.reddit.com/r/{sub}/{c(type)}.json?sort={c(type)}&limit=10"
async with ClientSession() as session:
data = await get(session, url)
data = loads(data)
data = data['data']['children']
url = [d['data']['url'] for d in data]
return c(url)
All you need to do is await reddit(sub= 'memes') to get the required url.

aiohttp says undisclosed client session even after awaiting on close

Consider following code fragment.
async def f():
http_client_session = aiohttp.ClientSession()
headers = {"developerkey": "somekey"}
body = {
"password": "somepassword",
"username": "someemail#gmail.com",
}
url = "https://localhost/login"
response_body = None
async with http_client_session.post(url, json=body, headers=headers) as response:
assert response.status == 200
response_body = await response.json()
await http_client_session.close()
return response_body()
The function f is awaited in another function. aiohttp gives the warning 'Unclosed client session' but I do not understand this as I have already awaited for it to close the session.
I've seen a similar issue previously in another project. Turns out it may be just that you need to allow some time for the session to fully close. For my project I added time.sleep(5) after the close statement to let the connection end.
See this ticket on aiohttp: https://github.com/aio-libs/aiohttp/issues/1925

Handling ensure_future and its missing tasks

I have a streaming application that almost continuously takes the data given as input and sends an HTTP request using that value and does something with the returned value.
Obviously to speed things up I've used asyncio and aiohttp libraries in Python 3.7 to get the best performance, but it becomes hard to debug given how fast the data moves.
This is what my code looks like
'''
Gets the final requests
'''
async def apiRequest(info, url, session, reqType, post_data=''):
if reqType:
async with session.post(url, data = post_data) as response:
info['response'] = await response.text()
else:
async with session.get(url+post_data) as response:
info['response'] = await response.text()
logger.debug(info)
return info
'''
Loops through the batches and sends it for request
'''
async def main(data, listOfData):
tasks = []
async with ClientSession() as session:
for reqData in listOfData:
try:
task = asyncio.ensure_future(apiRequest(**reqData))
tasks.append(task)
except Exception as e:
print(e)
exc_type, exc_obj, exc_tb = sys.exc_info()
fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
print(exc_type, fname, exc_tb.tb_lineno)
responses = await asyncio.gather(*tasks)
return responses #list of APIResponses
'''
Streams data in and prepares batches to send for requests
'''
async def Kconsumer(data, loop, batchsize=100):
consumer = AIOKafkaConsumer(**KafkaConfigs)
await consumer.start()
dataPoints = []
async for msg in consumer:
try:
sys.stdout.flush()
consumedMsg = loads(msg.value.decode('utf-8'))
if consumedMsg['tid']:
dataPoints.append(loads(msg.value.decode('utf-8')))
if len(dataPoints)==batchsize or time.time() - startTime>5:
'''
#1: The task below goes and sends HTTP GET requests in bulk using aiohttp
'''
task = asyncio.ensure_future(getRequests(data, dataPoints))
res = await asyncio.gather(*[task])
if task.done():
outputs = []
'''
#2: Does some ETL on the returned values
'''
ids = await asyncio.gather(*[doSomething(**{'tid':x['tid'],
'cid':x['cid'], 'tn':x['tn'],
'id':x['id'], 'ix':x['ix'],
'ac':x['ac'], 'output':to_dict(xmltodict.parse(x['response'],encoding='utf-8')),
'loop':loop, 'option':1}) for x in res[0]])
simplySaveDataIntoDataBase(id) # This is where I see some missing data in the database
dataPoints = []
except Exception as e:
logger.error(e)
logger.error(traceback.format_exc())
exc_type, exc_obj, exc_tb = sys.exc_info()
fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
logger.error(str(exc_type) +' '+ str(fname) +' '+ str(exc_tb.tb_lineno))
if __name__ == '__main__':
loop = asyncio.get_event_loop()
asyncio.ensure_future(Kconsumer(data, loop, batchsize=100))
loop.run_forever()
Does the ensure_future need to be awaited ?
How does aiohttp handle requests that come a little later than the others? Shouldn't it hold the whole batch back instead of forgetting about it altoghter?
Does the ensure_future need to be awaited ?
Yes, and your code is doing that already. await asyncio.gather(*tasks) awaits the provided tasks and returns their results in the same order.
Note that await asyncio.gather(*[task]) doesn't make sense, because it is equivalent to await asyncio.gather(task), which is again equivalent to await task. In other words, when you need the result of getRequests(data, dataPoints), you can write res = await getRequests(data, dataPoints) without the ceremony of first calling ensure_future() and then calling gather().
In fact, you almost never need to call ensure_future yourself:
if you need to await multiple tasks, you can pass coroutine objects directly to gather, e.g. gather(coroutine1(), coroutine2()).
if you need to spawn a background task, you can call asyncio.create_task(coroutine(...))
How does aiohttp handle requests that come a little later than the others? Shouldn't it hold the whole batch back instead of forgetting about it altoghter?
If you use gather, all requests must finish before any of them return. (That is not aiohttp policy, it's how gather works.) If you need to implement a timeout, you can use asyncio.wait_for or similar.

How to gather task results in Trio?

I wrote a script that uses a nursery and the asks module to loop through and call an API based upon the loop variables. I get responses but don't know how to return the data like you would with asyncio.
I also have a question on limiting the APIs to 5 per second.
from datetime import datetime
import asks
import time
import trio
asks.init("trio")
s = asks.Session(connections=4)
async def main():
start_time = time.time()
api_key = 'API-KEY'
org_id = 'ORG-ID'
networkIds = ['id1','id2','idn']
url = 'https://api.meraki.com/api/v0/networks/{0}/airMarshal?timespan=3600'
headers = {'X-Cisco-Meraki-API-Key': api_key, 'Content-Type': 'application/json'}
async with trio.open_nursery() as nursery:
for i in networkIds:
nursery.start_soon(fetch, url.format(i), headers)
print("Total time:", time.time() - start_time)
async def fetch(url, headers):
print("Start: ", url)
response = await s.get(url, headers=headers)
print("Finished: ", url, len(response.content), response.status_code)
if __name__ == "__main__":
trio.run(main)
When I run nursery.start_soon(fetch...) , I am printing data within fetch, but how do I return the data? I didn't see anything similar to asyncio.gather(*tasks) function.
Also, I can limit the number of sessions to 1-4, which helps get down below the 5 API per second limit, but was wondering if there was a built in way to ensure that no more than 5 APIs get called in any given second?
Returning data: pass the networkID and a dict to the fetch tasks:
async def main():
…
results = {}
async with trio.open_nursery() as nursery:
for i in networkIds:
nursery.start_soon(fetch, url.format(i), headers, results, i)
## results are available here
async def fetch(url, headers, results, i):
print("Start: ", url)
response = await s.get(url, headers=headers)
print("Finished: ", url, len(response.content), response.status_code)
results[i] = response
Alternately, create a trio.Queue to which you put the results; your main task can then read the results from the queue.
API limit: create a trio.Queue(10) and start a task along these lines:
async def limiter(queue):
while True:
await trio.sleep(0.2)
await queue.put(None)
Pass that queue to fetch, as another argument, and call await limit_queue.get() before each API call.
Based on this answers, you can define the following function:
async def gather(*tasks):
async def collect(index, task, results):
task_func, *task_args = task
results[index] = await task_func(*task_args)
results = {}
async with trio.open_nursery() as nursery:
for index, task in enumerate(tasks):
nursery.start_soon(collect, index, task, results)
return [results[i] for i in range(len(tasks))]
You can then use trio in the exact same way as asyncio by simply patching trio (adding the gather function):
import trio
trio.gather = gather
Here is a practical example:
async def child(x):
print(f"Child sleeping {x}")
await trio.sleep(x)
return 2*x
async def parent():
tasks = [(child, t) for t in range(3)]
return await trio.gather(*tasks)
print("results:", trio.run(parent))
Technically, trio.Queue has been deprecated in trio 0.9. It has been replaced by trio.open_memory_channel.
Short example:
sender, receiver = trio.open_memory_channel(len(networkIds)
async with trio.open_nursery() as nursery:
for i in networkIds:
nursery.start_soon(fetch, sender, url.format(i), headers)
async for value in receiver:
# Do your job here
pass
And in your fetch function you should call async sender.send(value) somewhere.
When I run nursery.start_soon(fetch...) , I am printing data within fetch, but how do I return the data? I didn't see anything similar to asyncio.gather(*tasks) function.
You're asking two different questions, so I'll just answer this one. Matthias already answered your other question.
When you call start_soon(), you are asking Trio to run the task in the background, and then keep going. This is why Trio is able to run fetch() several times concurrently. But because Trio keeps going, there is no way to "return" the result the way a Python function normally would. where would it even return to?
You can use a queue to let fetch() tasks send results to another task for additional processing.
To create a queue:
response_queue = trio.Queue()
When you start your fetch tasks, pass the queue as an argument and send a sentintel to the queue when you're done:
async with trio.open_nursery() as nursery:
for i in networkIds:
nursery.start_soon(fetch, url.format(i), headers)
await response_queue.put(None)
After you download a URL, put the response into the queue:
async def fetch(url, headers, response_queue):
print("Start: ", url)
response = await s.get(url, headers=headers)
# Add responses to queue
await response_queue.put(response)
print("Finished: ", url, len(response.content), response.status_code)
With the changes above, your fetch tasks will put responses into the queue. Now you need to read responses from the queue so you can process them. You might add a new function to do this:
async def process(response_queue):
async for response in response_queue:
if response is None:
break
# Do whatever processing you want here.
You should start this process function as a background task before you start any fetch tasks so that it will process responses as soon as they are received.
Read more in the Synchronizing and Communicating Between Tasks section of the Trio documentation.

Resources