How do to pagination in Sanic? - pagination

I tried to get the request arguments and wanted to use pagination in sanic. It is sanic motor. Have used mongodb to find the details. i want to use limit and skip.
#app.route("/query", methods=["GET"])
async def query_params(request, count=10, page=1):
a = request.args
count = request.args['count'][0]
page = a.get('page')
students = await Student.find(as_raw=True).skip((page-1) * count).limit(count)
return json_response(students.objects)
It gives error AttributeError: 'coroutine' object has no attribute 'skip' while handling path /query
Same for limit i get this error

This is not really Sanic related. I think the problem is your await is being applied to the whole chain and not what you think. If you use some parentheses you can control it as below.
Note: I'm not familiar with the ORM so I'm not sure if skip and limit also need awaits or not.
students = await (Student.find(as_raw=True)).skip((page-1) * count).limit(count)

Related

RuntimeWarning: coroutine 'NewsExtraction.get_article_data_elements' was never awaited

I have always resisted using asyncio within my code, but using it might help with some performance issues that I'm having.
Here is my scenario:
An end user provides a list of news sites to scrape
Each element is passed to an Article Class
A valid article is passed to an Extraction Class
The Extraction Class passes data to a NewsExtraction Class
90% this of the time this flow is flawless, but on an occasion one of the 12 functions in the NewsExtraction Class fails to extract data, which exist in the HTML provide. It seems that my code is "stepping on itself," which cause the data element not to be parsed. When I rerun the code all the elements are parsed correctly.
The NewsExtraction Class has this function get_article_data_elements, which is called from the Extraction Class.
The function get_article_data_elements call these items:
published_date = self._extract_article_published_date()
modified_date = self._extract_article_modified_date()
title = self._extract_article_title()
description = self._extract_article_description()
keywords = self._extract_article_key_words()
tags = self._extract_article_tags()
authors = self._extract_article_author()
top_image = self._extract_top_image()
language = self._extract_article_language()
categories = self._extract_article_category()
text = self._extract_textual_content()
url = self._extract_article_url()
Each of these data elements are used to populate a Python Dictionary, which is eventually passed back to the End User.
I have been trying to add asyncio code to the NewsExtraction Class, but I kept getting this error message:
RuntimeWarning: coroutine 'NewsExtraction.get_article_data_elements' was never awaited
I have spent the last 3 days trying to figure this issue out. I have looked at dozens of questions on Stack Overflow on this error RuntimeWarning: coroutine never awaited. I have also looked at numerous articles on using asyncio, but I cannot figure out how to use asyncio with my NewsExtraction Class, which is called from the Extraction Class.
Can someone provide me some pointers to solve my issue?
class NewsExtraction(object):
"""
This class is used to extract common data elements from a news article
on xyz
"""
def __init__(self, url, soup):
self._url = url
self._raw_soup = soup
truncated...
async def _extract_article_published_date(self):
"""
This function is designed to extract the publish date for the article being parsed.
:return: date article was originally published
:rtype: string
"""
json_date_published = JSONExtraction(self._url, self._raw_soup).extract_article_published_date()
if json_date_published is not None:
if len(json_date_published) != 0:
return json_date_published
else:
return None
elif json_date_published is None:
if self._raw_soup.find(name='div', attrs={'class': regex.compile("--publishDate")}):
date_published = self._raw_soup.find(name='div', attrs={'class': regex.compile("--publishDate")})
if len(date_published) != 0:
return date_published.text
else:
logger.info('The HTML tag to extract the publish date for the following article was not found.')
logger.info(f'Article URL -- {self._url}')
return None
truncated...
async def get_article_data_elements(self):
"""
This function is designed to extract all the common data elements from a
news article on xyz.
:return: dictionary of data elements related to the article
:rtype: dict
"""
article_data_elements = {}
# I have tried this:
published_date = self._extract_article_published_date().__await__()
# and this
published_date = self.task(self._extract_article_published_date())
await published_date
truncated...
I have also tried to use:
if __name__ == "__main__":
asyncio.run(NewsExtraction.get_article_data_elements())
# asyncio.run(self.get_article_data_elements())
I'm really banging my head on the wall with using asyncio in my news extraction code.
If this question is off base, I will be happy to delete it and keep reading about how to use asyncio correctly.
Can someone provide me some pointers to solve my issue?
Thanks in advance for any guidance on using asyncio
Your are defining _extract_article_published_date and get_article_data_elements as coroutines, and this coroutines must be await-ed in your code to get the result of their execution in an asynchronous way.
You can do this creating an instance of type NewsExtraction and calling this methods with the keyword await in front, this await pass the execution to other task in the loop until his awaited task completes its execution. Note that there are no threads or process involved in this task execution, the execution is passed only if it is no using cpu-time (await-ing I/O operations or sleeping).
if __name__ == '__main__':
extractor = NewsExtraction(...)
# this creates the event loop and runs the coroutine
asyncio.run(extractor.get_article_data_elements())
Inside your _extract_article_published_date you must also await your coroutines that perform requests over the network, if you are using some library for the scraping make sure that uses async/await behind the scenes to get a real performance while using asyncio.
async def get_article_data_elements(self):
article_data_elements = {}
# note here that the instance is self
published_date = await self._extract_article_published_date()
truncated...
You must dive into the asyncio documentation to get a better understanding of these features of Python 3.7+.

Get Discord username and discriminator through a mention

I have tried reading the docs, but I don't understand what is going on here and how to fix it. I am trying to map a mention to its proper Name#NNNN form, but alas, it is proving to be a fruitless endeavor for me.
import discord
from discord.ext import commands
from collections import defaultdict
client = commands.Bot(command_prefix=">")
#client.event
async def on_ready():
print('Ready!')
jobz = {}
'''PART 1 v v v'''
#client.event
if message.content.startswith('>jobsched'):
author = message.author
jobz[author].append(...)
await channel.send(jobz[author])
'''PART 2 v v v'''
if message.content.startswith('>when '):
channel = message.channel
worker = list(filter(None, message.content[6:].split(' ')))[0]
uname = message.mentions[0].mention
await channel.send(jobz[uname])
PART 1:
I run this first, the send works as expected as seen below:
>jobsched 'a'
>jobsched 'b'
As seen in the last line, this spits out ['1a', '2b']
PART 2:
Here is where I have my issue.
>when #Name
I expected this to spit out ['1a', '2b'] because I expected it to look up or translate the mentioned name, find its respective name and discriminator. I thought this should happen since, in the above piece, that is how the name gets written into the dictionary is i.e. Name#1234: ['1a','2b']
Printing out .keys() shows that the key has the name and discriminator i.e. Name#1234 in the 'jobz' dictionary.
However, I can't seem to get the mention to give me the Name and Discriminator. I have tried doing mentions[0].mention from what I have seen here on stackoverflow, but it doesn't result in a Member class for me, just a string, presumably just '#Name'. If I leave it alone, as shown in my 'worker' variable, it passes an empty list. It should pull the list because when I override it to jobz['Name#1234'] it gives me the list I expect.
Can anyone please help?
just cast the member object to string to get the name and discriminator as it says in the discord.py docs. To mention someone, put the internal representation like this: f'<#{member.id}>'. To stop problems like this, use client.command() it's way easier to put in parameters, and easier to access info. So, here would be the code:
#client.command()
async def when(ctx, member: discord.Member):
await ctx.send(jobz[str(member)])
Also, if your worker variable is returning None, you're not passing a parameter at all
mentions is a list of Member objects, so when you do mentions[0] you are referencing a Member. Thus, mentions[0].mention is the formatted mention string for the first-mentioned (element 0) Member.
You probably want mentions[0].name and mentions[0].discriminator
See: https://discordpy.readthedocs.io/en/latest/api.html#discord.Message.mentions

Can not get client.command parameter to parse API response by key value in discord.py

I'm building a command onto an existing bot that will search an API and take a baseball player's name as a parameter to query a json response with. I've gotten everything to work correctly in test, only for the life of me I can't figure out how to restrict the results to only those that include the query parameter that is passed when the command is invoked within discord.
For example: a user will type !card Adam Dunn and only the value "Adam Dunn" for the key "name" will return. Currently, the entire first page of results is being sent no matter what is typed for the parameter, and with my embed logic running, each result gets a separate embed, which isn't ideal.
I've only included the pertinent lines of code and not included the massive embed of the results for readability's sake.
It's got to be something glaringly simple, but I think I've just been staring at it for too long to see it. Any help would be greatly appreciated, thank you!
Below is a console output when the command is run:
Here is the code I'm currently working with:
async def card(ctx, *, player_name: str):
async with ctx.channel.typing():
async with aiohttp.ClientSession() as cs:
async with cs.get("https://website.items.json") as r:
data = await r.json()
listings = data["items"]
for k in listings:
if player_name == k["name"]
print()```
I hope I understood you right. If the user did not give a player_name Then you will just keep searching for nothing, and you want to end if there is no player_name given. if that is the case then.
Set the default value of player_name: str=None to be None then check at the beginning of your code if it is there.
async def card(ctx, *, player_name: str=None):
if not player_name:
return await ctx.send('You must enter a player name')
# if there is a name do this
async with ctx.channel.typing():
async with aiohttp.ClientSession() as cs:
async with cs.get("https://theshownation.com/mlb20/apis/items.json") as r:
data = await r.json()
listings = data["items"]
for k in listings:
if player_name == k["name"]
print()```
Update:
I'm an idiot. Works as expected, but because the player_name I was searching for wasn't on the first page of results, it wasn't showing. When using a player_name that is on the first page of the API results, it works just fine.
This is a pagination issue, not a key value issue.

Using multiple parsers while scraping a page

I have searched some of the questions regarding this topic but i couldn't find a solution to my problem.
I'm currently trying to use multiple parsers on a site depending on the product I want to search. After trying some methods, I ended up with this:
With this start request:
def start_requests(self):
txtfile = open('productosABuscar.txt', 'r')
keywords = txtfile.readlines()
txtfile.close()
for keyword in keywords:
yield Request(self.search_url.format(keyword))
That gets into my normal parse_item.
What i want to do is, with this parse_item (by checking with the item category like laptop, tablet, etc):
def parse_item(self,response):
#I get the items category for the if/else
category = re.sub('Back to search results for |"','', response.xpath('normalize-space(//span[contains(#class, "a-list-item")]//a/text())').extract_first())
#Get the product link, for example (https://www.amazon.com/Lenovo-T430s-Performance-Professional-Refurbished/dp/B07L4FR92R/ref=sr_1_7?s=pc&ie=UTF8&qid=1545829464&sr=1-7&keywords=laptop)
urlProducto = response.request.url
#This can be done in a nicer way, just trying out if it works atm
if category == 'Laptop':
yield response.follow(urlProducto, callback = parse_laptop)
With:
def parse_laptop(self, response):
#Parse things
Any suggestions? The error i get when running this code is 'parse_laptop' is not defined. I have already tried putting the parse_laptop above the parse_item and i still get the same error.
You need to refer to a method and not a function, so just change it like this:
yield response.follow(urlProducto, callback = self.parse_laptop)
yield response.follow(urlProducto, callback = parse_laptop)
This is the request and here's you function def parse_laptop(self, response): you probably have noticed that you parse_laptop function requires self object.
so please modify you request to :
yield response.follow(urlProducto, callback = self.parse_laptop)
This should do the work.
Thanks.

Python 3.5 asyncio execute coroutine on event loop from synchronous code in different thread

I am hoping someone can help me here.
I have an object that has the ability to have attributes that return coroutine objects. This works beautifully, however I have a situation where I need to get the results of the coroutine object from synchronous code in a separate thread, while the event loop is currently running. The code I came up with is:
def get_sync(self, key: str, default: typing.Any=None) -> typing.Any:
"""
Get an attribute synchronously and safely.
Note:
This does nothing special if an attribute is synchronous. It only
really has a use for asynchronous attributes. It processes
asynchronous attributes synchronously, blocking everything until
the attribute is processed. This helps when running SQL code that
cannot run asynchronously in coroutines.
Args:
key (str): The Config object's attribute name, as a string.
default (Any): The value to use if the Config object does not have
the given attribute. Defaults to None.
Returns:
Any: The vale of the Config object's attribute, or the default
value if the Config object does not have the given attribute.
"""
ret = self.get(key, default)
if asyncio.iscoroutine(ret):
if loop.is_running():
loop2 = asyncio.new_event_loop()
try:
ret = loop2.run_until_complete(ret)
finally:
loop2.close()
else:
ret = loop.run_until_complete(ret)
return ret
What I am looking for is a safe way to synchronously get the results of a coroutine object in a multithreaded environment. self.get() can return a coroutine object, for attributes I have set to provide them. The issues I have found are: If the event loop is running or not. After searching for a few hours on stack overflow and a few other sites, my (broken) solution is above. If the loop is running, I make a new event loop and run my coroutine in the new event loop. This works, except that the code hangs forever on the ret = loop2.run_until_complete(ret) line.
Right now, I have the following scenarios with results:
results of self.get() is not a coroutine
Returns results. [Good]
results of self.get() is a coroutine & event loop is not running (basically in same thread as the event loop)
Returns results. [Good]
results of self.get() is a coroutine & event loop is running (basically in a different thread than the event loop)
Hangs forever waiting for results. [Bad]
Does anyone know how I can go about fixing the bad result so I can get the value I need? Thanks.
I hope I made some sense here.
I do have a good, and valid reason to be using threads; specifically I am using SQLAlchemy which is not async and I punt the SQLAlchemy code to a ThreadPoolExecutor to handle it safely. However, I need to be able to query these asynchronous attributes from within these threads for the SQLAlchemy code to get certain configuration values safely. And no, I won't switch away from SQLAlchemy to another system just in order to accomplish what I need, so please do not offer alternatives to it. The project is too far along to switch something so fundamental to it.
I tried using asyncio.run_coroutine_threadsafe() and loop.call_soon_threadsafe() and both failed. So far, this has gotten the farthest on making it work, I feel like I am just missing something obvious.
When I get a chance, I will write some code that provides an example of the problem.
Ok, I implemented an example case, and it worked the way I would expect. So it is likely my problem is elsewhere in the code. Leaving this open and will change the question to fit my real problem if I need.
Does anyone have any possible ideas as to why a concurrent.futures.Future from asyncio.run_coroutine_threadsafe() would hang forever rather than return a result?
My example code that does not duplicate my error, unfortunately, is below:
import asyncio
import typing
loop = asyncio.get_event_loop()
class ConfigSimpleAttr:
__slots__ = ('value', '_is_async')
def __init__(
self,
value: typing.Any,
is_async: bool=False
):
self.value = value
self._is_async = is_async
async def _get_async(self):
return self.value
def __get__(self, inst, cls):
if self._is_async and loop.is_running():
return self._get_async()
else:
return self.value
class BaseConfig:
__slots__ = ()
attr1 = ConfigSimpleAttr(10, True)
attr2 = ConfigSimpleAttr(20, True)
def get(self, key: str, default: typing.Any=None) -> typing.Any:
return getattr(self, key, default)
def get_sync(self, key: str, default: typing.Any=None) -> typing.Any:
ret = self.get(key, default)
if asyncio.iscoroutine(ret):
if loop.is_running():
fut = asyncio.run_coroutine_threadsafe(ret, loop)
print(fut, fut.running())
ret = fut.result()
else:
ret = loop.run_until_complete(ret)
return ret
config = BaseConfig()
def example_func():
return config.get_sync('attr1')
async def main():
a1 = await loop.run_in_executor(None, example_func)
a2 = await config.attr2
val = a1 + a2
print('{a1} + {a2} = {val}'.format(a1=a1, a2=a2, val=val))
return val
loop.run_until_complete(main())
This is the stripped down version of exactly what my code is doing, and the example works, even if my actual application doesn't. I am stuck as far as where to look for answers. Suggestions are welcome as to where to try to track down my "stuck forever" problem, even if my code above doesn't actually duplicate the problem.
It is very unlikely that you need to run several event loops at the same time, so this part looks quite wrong:
if loop.is_running():
loop2 = asyncio.new_event_loop()
try:
ret = loop2.run_until_complete(ret)
finally:
loop2.close()
else:
ret = loop.run_until_complete(ret)
Even testing whether the loop is running or not doesn't seem to be the right approach. It's probably better to give explicitly the (only) running loop to get_sync and schedule the coroutine using run_coroutine_threadsafe:
def get_sync(self, key, loop):
ret = self.get(key, default)
if not asyncio.iscoroutine(ret):
return ret
future = asyncio.run_coroutine_threadsafe(ret, loop)
return future.result()
EDIT: Hanging problems can be related to tasks being scheduled in the wrong loop (e.g. forgetting about the optional loop argument when calling a coroutine). This kind of problem should be easier to debug with the PR 303 (now merged): a RuntimeError is raised instead when the loop and the future don't match. So you might want to run your tests with the latest version of asyncio.
Ok, I got my code working, by taking a different approach to it. The problem was tied with using something that had file IO, which I was converting into a coroutine using loop.run_in_executor() on the file IO components. Then, I was trying to use this in a sync function being called from another thread, processed using another loop.run_in_executor() on that function. This is a very important routine in my code (called probably a million times or more during the execution of my short-running code), and I made a decision that my logic was just getting too complicated. So... I uncomplicated it. Now, if I want to use the file IO components asynchronously, I explicitly use my "get_async()" method, otherwise, I use my attribute through normal attribute access.
By removing the complexity of my logic, it made the code cleaner, easier to understand, and even more importantly, it actually works. While I am not 100% certain that I know the root cause of the issue (I believe it has something to do with a thread processing an attribute, which then in turn starts another thread that tries to read the attribute before it is processed, which caused something like a race condition and halting my code, but I could never duplicate the error outside of my application unfortunately to completely prove it out), I was able to get past it and continue with my development efforts.

Resources