404 when page number is too high - pagination

If this filtered REST query matches no values, it returns an HTTP 200 with empty results:
http://server/path/entities?field=value&page=1
This one returns an HTTP 404 instead:
http://server/path/entities?field=value&page=2
Obviously there is no second page of results. Can I configure Django REST Framework to return an empty HTTP 200 rather than an HTTP 404 in this scenario?
The GUI allows the user to page forward and then change the filter criteria, which can request the second URL, trigger an HTTP 404, and surface an error to the user.
I can ask the GUI team to treat a 404 as an empty result set, but I would rather the server simply return an empty HTTP 200.

You can create a custom pagination class that intercepts the NotFound exception raised in paginate_queryset() and returns an empty list instead.
Something like this:
from collections import OrderedDict

from rest_framework.exceptions import NotFound
from rest_framework.pagination import PageNumberPagination
from rest_framework.response import Response


class EmptyPagination(PageNumberPagination):
    def paginate_queryset(self, queryset, request, view=None):
        """Catch the NotFound raised for out-of-range pages."""
        try:
            return super(EmptyPagination, self).paginate_queryset(queryset, request, view=view)
        except NotFound:  # intercept NotFound exception
            return list()

    def get_paginated_response(self, data):
        """Handle the case where self has no page attribute for an empty list."""
        if hasattr(self, 'page') and self.page is not None:
            return super(EmptyPagination, self).get_paginated_response(data)
        else:
            return Response(OrderedDict([
                ('count', None),
                ('next', None),
                ('previous', None),
                ('results', data),
            ]))
Then in the settings file, tell DRF to use this custom pagination class:
REST_FRAMEWORK = {
    'DEFAULT_PAGINATION_CLASS': 'apps.commons.serializers.EmptyPagination',
}
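With this pagination class in place, a request that overshoots the last page (like the ?page=2 URL from the question) should come back as an HTTP 200 with a body along these lines:
{
    "count": null,
    "next": null,
    "previous": null,
    "results": []
}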

This is not (easily) possible, because the 404 is triggered by a NotFound exception being raised, which breaks out of the pagination logic. You could special-case the NotFound exception in a custom exception handler, but you would be guessing based on the detail string. That isn't the best idea, as the message can change if:
- the message is changed in the DRF core translations, or
- your application is using translated strings,
which means your application could suddenly go back to raising a 404 at some point in the future.
You're better off having your GUI team treat a 404 as an empty result, or having them reset the page number when the filtering changes.
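For concreteness, a minimal sketch of the exception-handler approach mentioned above, assuming DRF's default 'Invalid page.' detail string; as the answer notes, this is fragile because it breaks if that string changes or is translated. The handler module path in the settings comment is hypothetical.
from rest_framework.exceptions import NotFound
from rest_framework.response import Response
from rest_framework.views import exception_handler


def empty_page_exception_handler(exc, context):
    # Guessing based on the detail string, exactly as warned above:
    # 'Invalid page.' is DRF's default pagination message.
    if isinstance(exc, NotFound) and str(exc.detail) == 'Invalid page.':
        return Response({'count': None, 'next': None, 'previous': None, 'results': []})
    return exception_handler(exc, context)

# settings.py (hypothetical module path):
# REST_FRAMEWORK = {'EXCEPTION_HANDLER': 'apps.commons.handlers.empty_page_exception_handler'}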

Related

Avoid multiple try catch in python3

I have created a custom exception and placed it in the Django settings file.
I have created a create API operation. My API takes data as input and calls multiple third-party create APIs; if any one of them fails, it reverts the create operations of all the third-party APIs.
class InsertionOrderViewset(viewsets.ViewSet):  # my api
    '''Manages insertion order dsp operations'''
    def create(self, request, format=None):
        try:
            # create api 1
        except error as e:
            return e
        try:
            # create api 2
        except error as e:
            # undo api 1
            return e
        try:
            # create api 3
        except error as e:
            # undo api 1
            # undo api 2
            return e
Is there a way to avoid writing multiple try/except blocks in such rollback scenarios?
Looking at your code, I think this should give you the same output:
class InsertionOrderViewset(viewsets.ViewSet):  # my api
    """Manages insertion order dsp operations"""
    def create(self, request, format=None):
        # assumes each third-party call raises its own exception type
        try:
            # create api 1
            # create api 2
            # create api 3
        except error as e:       # api 1 failed: nothing to undo yet
            return e
        except error2 as e:      # api 2 failed
            # undo api 1
            return e
        except Exception as e:   # api 3 failed
            # undo api 1
            # undo api 2
            return e
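Another way to avoid stacking try/except blocks, shown here as a sketch rather than a drop-in replacement, is to register an undo callback after each successful step and let all registered undos run automatically if a later step raises. contextlib.ExitStack from the standard library does exactly that; the create/undo callables passed in are hypothetical stand-ins for the third-party calls.
from contextlib import ExitStack


def create_with_rollback(steps):
    """Run (create_fn, undo_fn) pairs in order; roll back on any failure."""
    with ExitStack() as stack:
        for create_fn, undo_fn in steps:
            create_fn()              # may raise: previously registered undos then run in reverse order
            stack.callback(undo_fn)  # only registered once its create succeeded
        stack.pop_all()              # every step succeeded: detach the undos so they never fire

# usage (hypothetical callables):
# create_with_rollback([(create_api_1, undo_api_1), (create_api_2, undo_api_2), (create_api_3, undo_api_3)])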

How can I keep sensitive data out of logs?

Currently I have an exception which is raised in method A. In method B this exception is re-raised with further information added, and in method C I want to log the exception with yet more information.
My first attempt was to apply a string replacement before logging the exception, but that does not affect the whole traceback, especially because methodA calls the Python library requests and the first exception takes place inside that library:
first exception in requests: urllib3.exceptions.MaxRetryError
second exception in requests: requests.exceptions.ProxyError
Both exceptions within the requests library already contain the sensitive data in the traceback.
def methodA():
    try:
        connect_to_http_with_request_lib()
    except requests.exceptions.ProxyError as err:
        raise MyExceptionA(f"this log contains sensitive info in err: {err}")

def methodB():
    try:
        methodA()
    except MyExceptionA as err:
        raise MyExceptionB(f"add some more info to: {err}")

def methodC():
    try:
        methodB()
        return True
    except MyExceptionB as err:
        err = re.sub(r"(?is)password=.+", "password=xxxx", str(err))
        logger.exception(f"methodB failed: exception {err}")
        return False
How can I parse the whole traceback before logging the exception in order to mask out the sensitive data?
I use loguru as the logging library.
The Django framework seems to address the same problem with its own methods.
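One possible approach, sketched here rather than taken from an accepted answer: format the active exception's full traceback yourself, scrub it with the same kind of regex used in methodC, and hand the cleaned text to loguru instead of letting logger.exception render the traceback. The password=... pattern comes from the question; log_scrubbed_exception is a hypothetical helper name.
import re
import sys
import traceback

from loguru import logger

SENSITIVE = re.compile(r"(?i)password=\S+")


def log_scrubbed_exception(message):
    # Render the full traceback of the exception currently being handled,
    # mask anything matching password=..., then log the cleaned text.
    tb_text = "".join(traceback.format_exception(*sys.exc_info()))
    logger.error("{}\n{}", message, SENSITIVE.sub("password=xxxx", tb_text))


def methodC():
    try:
        methodB()
        return True
    except MyExceptionB:
        log_scrubbed_exception("methodB failed")
        return False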

How to transmit custom data with an exception?

I am validating json request data for my service. There are many fields, each with their own validators. My plan is to turn the request into a frozen dataclass, if all validations are successful. A validation can fail for reasons that require additional data to explain the cause. If the request is invalid, I want to report this data back to the client so that the user knows why the request was unsuccessful.
Example: The request has a field with an array of fruit.
request = {
'fruit': [{'type':'apple', 'price':5}, {'type': 'banana', 'price': 7}]
}
To validate the fruit field I use a function validate_fruit(fruit: list) which checks the types and values. If there is a fruit with price > 5 I say the request is invalid. I send a response with an error return code and want to specify which fruit is too expensive.
Example: Here, return code 12 means "there are fruit that are too expensive". Error data should give details.
response = {
    'return_code': 12,
    'error_data': ['banana']
}
I would like to use exceptions to implement this. So validate_fruit can raise an Exception with a dict that specifies the return code and additional error data.
I am thinking about
def validate_fruit(fruit: list):
    failures = [elem['type'] for elem in fruit if elem['price'] > 5]
    if failures:
        raise ValueError(data={'error_data': failures, 'return_code': 12})

try:
    validate_fruit(fruit)
except Exception as error:
    if error.return_code == 12:
        ...
Has anyone had the same idea? How do you do this?
You can use something like:
try:
    raise ValueError({'error_data': ['banana'], 'return_code': 12})
except Exception as ex:
    if ex.args[0]["return_code"] == 12:
        print("error 12")

scrapy: restricting link extraction to the request domain

I have a scrapy project which uses a list of URLs from different domains as the seeds, but for any given page I only want to follow links in the same domain as that page's URL (so the usual LinkExtractor(allow_domains='example.com') approach wouldn't work). I'm surprised I couldn't find a solution on the web, as I'd expect this to be a common task. The best I could come up with was to put this in the spider file and refer to it in the Rules:
class CustomLinkExtractor(LinkExtractor):
    def get_domain(self, url):
        # https://stackoverflow.com/questions/9626535/get-protocol-host-name-from-url
        return '.'.join(tldextract.extract(url)[1:])

    def extract_links(self, response):
        domain = self.get_domain(response.url)
        # https://stackoverflow.com/questions/40701227/using-scrapy-linkextractor-to-locate-specific-domain-extensions
        return list(
            filter(
                lambda link: self.get_domain(link.url) == domain,
                super(CustomLinkExtractor, self).extract_links(response)
            )
        )
But that doesn't work (the spider goes off-domain).
Now I'm trying to use the process_request option in the Rule:
rules = (
    Rule(LinkExtractor(deny_domains='twitter.com'),
         callback='parse_response',
         process_request='check_r_r_domains',
         follow=True,
         ),
)
and
def check_r_r_domains(request, response):
    domain0 = '.'.join(tldextract.extract(request.url)[1:])
    domain1 = '.'.join(tldextract.extract(response.url)[1:])
    log('TEST:', domain0, domain1)
    if (domain0 == domain1) and (domain0 != 'twitter.com'):
        return request
    log(domain0, ' != ', domain1)
    return None
but I get an exception because self is being passed to the method (the spider has no url attribute); when I add self to the method signature, I get an exception saying the response positional argument is missing. If I change the callback to process_request=self.check_r_r_domains, I get an error because self isn't defined at the point where I set the rules.
If you are using Scrapy 1.7.0 or later, the Rule's process_request callable receives both the request and the response, so you can compare their domains and drop the request (return None) if they do not match.
Oops, it turns out that conda on the server I'm using had installed a 1.6 version of scrapy. I've forced it to install 1.8.0 from conda-forge and I think it's working now.
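For reference, a minimal sketch of the Scrapy 1.7+ setup described in the answer, with process_request given as the name of a spider method so it is called with self, the request, and the response; the spider name, start_urls, and parse_response body are placeholders:
import tldextract
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


def registered_domain(url):
    # e.g. 'https://sub.example.co.uk/page' -> 'example.co.uk'
    return '.'.join(tldextract.extract(url)[1:])


class SameDomainSpider(CrawlSpider):
    name = 'same_domain'
    start_urls = []  # seed URLs from different domains go here

    rules = (
        Rule(LinkExtractor(),
             callback='parse_response',
             process_request='filter_offsite',
             follow=True),
    )

    def filter_offsite(self, request, response):
        # Scrapy 1.7+ passes both the extracted request and the response it came from.
        if registered_domain(request.url) == registered_domain(response.url):
            return request
        return None  # drop requests that would leave the current page's domain

    def parse_response(self, response):
        pass  # placeholder for the actual parsing logic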

Scrapy / Python - executing several yields

In my parse method, I'd like to call 3 methods from the SpiderClass that I inherit from.
First I'd like to parse the XPaths, then clean the data, then assign the data to an item instance and hand it over to the pipeline.
I'll try it with just a little code and ask about the principles: cleanData and assignProductValues are never called - why?
def parse(self, response):
    for href in response.xpath("//a[@class='product--title']/@href"):
        url = href.extract()
        yield scrapy.Request(url, callback=super(MyclassSpider, self).scrapeProduct)
        yield scrapy.Request(url, callback=super(MyclassSpider, self).cleanData)
        yield scrapy.Request(url, callback=super(MyclassSpider, self).assignProductValues)
I understand that using yield creates a generator, but I don't understand why the 2nd and 3rd yields are not being called after the first yield, or how I can get them to be called.
--
Then I tried another way: I don't want to make 3 requests to the website - just one, and then work with the data.
def parse(self, response):
    for href in response.xpath("//a[@class='product--title']/@href"):
        url = href.extract()
        item = MyItem()
        response = scrapy.Request(url, meta={'item': item}, callback=super(MyclassSpider, self).scrapeProduct)
        super(MyclassSpider, self).cleanData(response)
        super(MyclassSpider, self).assignProductValues(response)
        yield response
What happens here is that scrapeProduct is being called, which might take a while (I've got a 5-second delay).
But then cleanData and assignProductValues are called right away, about 30 times (as often as the for loop runs).
How can I execute the three methods one by one with only 1 request to the website?
I guess that after you yield the first request, the other two are getting filtered out by the dupefilter, since they use the same URL. Check your log. If you don't want them to be filtered, pass dont_filter=True to the Request objects.
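As for running the three steps with a single request, here is a sketch of one way to structure it, assuming scrapeProduct, cleanData and assignProductValues can be refactored into plain helpers that take and return data (with assignProductValues returning the populated MyItem) instead of acting as separate request callbacks:
import scrapy


class MyclassSpider(scrapy.Spider):  # the real spider inherits from the shared SpiderClass
    name = 'products'

    def parse(self, response):
        for href in response.xpath("//a[@class='product--title']/@href"):
            # one request per product page; all three steps run on its single response
            yield scrapy.Request(href.extract(), callback=self.parse_product)

    def parse_product(self, response):
        raw = self.scrapeProduct(response)        # 1. extract the raw fields via XPath
        cleaned = self.cleanData(raw)             # 2. clean the extracted data
        yield self.assignProductValues(cleaned)   # 3. yield the populated item to the pipeline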

Resources