Flask App Simulating blocking API to abstract from web hook based callback - python-3.x

I have Angular 8 web app. It needs to send some data for analysis to python flask App. This flask App will send it to 3rd party service and get response through webhook.
My need is to provide a clean interface to the client so that there is no need to provide webhook from client.
Hence I am trying to initiate a request from Flask app, wait until I get data from webhook and then return.
Here is the code.
#In autoscaled micro-service this will not work. Right now, the scaling is manual and set to 1 instance
#Hence keeping this sliding window list in RAM is okay.
reqSlidingWindow =[]
#app.route('/handlerequest',methods = ['POST'])
def Process_client_request_and_respond(param1,param2,param3):
#payload code removed
#CORS header part removed
querystring={APIKey, 'https://mysvc.mycloud.com/postthirdpartyresponsehook'}
response = requests.post(thirdpartyurl, json=payload, headers=headers, params=querystring)
if(response.status_code == SUCCESS):
respdata = response.json()
requestId = respdata["request_id"]
requestobject = {}
requestobject['requestId'] = requestId
reqSlidingWindow.append(requestobject)
#Now wait for the response to arrive through webhook
#Keep checking the list if reqRecord["AllAnalysisDoneFlag"] is set to True.
#If set, read reqRecord["AnalysisResult"] = jsondata
jsondata = None
while jsondata is None:
time.sleep(2)
for reqRecord in reqSlidingWindow:
if(reqRecord["requestId"] == da_analysis_req_Id ):
print("Found matching req id in req list. Checking if analysis is done.")
print(reqRecord)
if(reqRecord["AllAnalysisDoneFlag"] == True):
jsondata = reqRecord["AnalysisResult"]
return jsonify({"AnalysisResult": "Success", "AnalysisData":jsondata})
#app.route('/webhookforresponse',methods = ['POST'])
def process_thirdparty_svc_response(reqIdinResp):
print(request.data)
print("response receieved at")
data = request.data.decode('utf-8')
jsondata = json.loads(data)
#
for reqRecord in reqSlidingWindow:
#print("In loop. current analysis type" + reqRecord["AnalysisType"] )
if(reqRecord["requestId"] == reqIdinResp ):
reqRecord["AllAnalysisDoneFlag"] = True
reqRecord["AnalysisResult"] = jsondata
return
I'm trying to maintain sliding window of requests in list. Upon the
Observations so far:
First, this does not work. The function of 'webhookforresponse' does not seem to run until my request function comes out. i.e. my while() loop appears to block everything though I have a time.sleep(). My assumption is that flask framework would ensure that callback is called since sleep in another 'route' gives it adequate time and internally flask would be multithreaded?
I tried running python threads for the sliding window data structure and also used RLocks. This does not alter behavior. I have not tried flask specific threading.
Questions:
What is the right architecture of the above need with flask? I need clean REST interface without callback for angular client. Everything else can change.
If the above code to be used, what changes should I make? Is threading required at all?
Though I can use database and then pick from there, it still requires polling the sliding window.
Should I use multithreading specific to flask? Is there any specific example with skeletal design, threading library choices?
Please suggest the right way to proceed to achieve abstract REST API for angular client which hides the back end callbacks and other complexities.

Related

Boto3 client in multiprocessing pool fails with "botocore.exceptions.NoCredentialsError: Unable to locate credentials"

I'm using boto3 to connect to s3, download objects and do some processing. I'm using a multiprocessing pool to do the above.
Following is a synopsis of the code I'm using:
session = None
def set_global_session():
global session
if not session:
session = boto3.Session(region_name='us-east-1')
def function_to_be_sent_to_mp_pool():
s3 = session.client('s3', region_name='us-east-1')
list_of_b_n_o = list_of_buckets_and_objects
for bucket, object in list_of_b_n_o:
content = s3.get_object(Bucket=bucket, Key=key)
data = json.loads(content['Body'].read().decode('utf-8'))
write_processed_data_to_a_location()
def main():
pool = mp.Pool(initializer=set_global_session, processes=40)
pool.starmap(function_to_be_sent_to_mp_pool, list_of_b_n_o_i)
Now, when processes=40, everything works good. When processes = 64, still good.
However, when I increases to processes=128, I get the following error:
botocore.exceptions.NoCredentialsError: Unable to locate credentials
Our machine has the required IAM roles for accessing S3. Moreover, the weird thing that happens is that for some processes, it works fine, whereas for some others, it throws the credentials error. Why is this happening, and how to resolve this?
Another weird thing that happens is that I'm able to trigger two jobs in 2 separate terminal tabs (each of which has a separate ssh login shell to the machine). Each job spawns 64 processes, and that works fine as well, which means there are 128 processes running simultaneously. But 80 processes in one login shell fails.
Follow up:
I tried creating separate sessions for separate processes in one approach. In the other, I directly created s3-client using boto3.client. However, both of them throw the same error with 80 processes.
I also created separate clients with the following extra config:
Config(retries=dict(max_attempts=40), max_pool_connections=800)
This allowed me to use 80 processes at once, but anything > 80 fails with the same error.
Post follow up:
Can someone confirm if they've been able to use boto3 in multiprocessing with 128 processes?
This is actually a race condition on fetching the credentials. I'm not sure how fetching credentials under the hood works, but the I saw this question in Stack Overflow and this ticket in github.
I was able to resolve this by keeping a random wait time for each of the processes. The following is the updated code which works for me:
client_config = Config(retries=dict(max_attempts=400), max_pool_connections=800)
time.sleep(random.randint(0, num_processes*10)/1000) # random sleep time in milliseconds
s3 = boto3.client('s3', region_name='us-east-1', config=client_config)
I tried keeping the range for sleep time lesser than num_processes*10, but that failed again with the same issue.
#DenisDmitriev, since you are getting the credentials and storing them explicitly, I think that resolves the race condition and hence the issue is resolved.
PS: values for max_attempts and max_pool_connections don't have a logic. I was plugging several values until the race condition was figured out.
I suspect that AWS recently reduced throttling limits for metadata requests because I suddenly started running into the same issue. The solution that appears to work is to query credentials once before creating the pool and have the processes in the pool use them explicitly instead of making them query credentials again.
I am using fsspec with s3fs, and here's what my code for this looks like:
def get_aws_credentials():
'''
Retrieve current AWS credentials.
'''
import asyncio, s3fs
fs = s3fs.S3FileSystem()
# Try getting credentials
num_attempts = 5
for attempt in range(num_attempts):
credentials = asyncio.run(fs.session.get_credentials())
if credentials is not None:
if attempt > 0:
log.info('received credentials on attempt %s', 1 + attempt)
return asyncio.run(credentials.get_frozen_credentials())
time.sleep(15 * (random.random() + 0.5))
raise RuntimeError('failed to request AWS credentials '
'after %d attempts' % num_attempts)
def process_parallel(fn_d, max_processes):
# [...]
c = get_aws_credentials()
# Cache credentials
import fsspec.config
prev_s3_cfg = fsspec.config.conf.get('s3', {})
try:
fsspec.config.conf['s3'] = dict(prev_s3_cfg,
key=c.access_key,
secret=c.secret_key)
num_processes = min(len(fn_d), max_processes)
from concurrent.futures import ProcessPoolExecutor
with ProcessPoolExecutor(max_workers=num_processes) as pool:
for data in pool.map(process_file, fn_d, chunksize=10):
yield data
finally:
fsspec.config.conf['s3'] = prev_s3_cfg
Raw boto3 code will look essentially the same, except instead of the whole fs.session and asyncio.run() song and dance, you'll work with boto3.Session itself and call its get_credentials() and get_frozen_credentials() methods directly.
I get the same problem with multi process situation. I guess there is a client init problem when you use multi process. So I suggest that you can use get function to get s3 client. It works for me.
g_s3_cli = None
def get_s3_client(refresh=False):
global g_s3_cli
if not g_s3_cli or refresh:
g_s3_cli = boto3.client('s3')
return g_s3_cli

Request using trio asks returns a different response than with requests and aiohttp

Right, hello, so I'm trying to implement opticard (loyality card services) with my webapp using trio and asks (https://asks.readthedocs.io/).
So I'd like to send a request to their inquiry api:
Here goes using requests:
import requests
r = requests.post("https://merchant.opticard.com/public/do_inquiry.asp", data={'company_id':'Dt', 'FirstCardNum1':'foo', 'g-recaptcha-response':'foo','submit1':'Submit'})
This will return "Invalid ReCaptcha" and this is normal, and what I want
Same thing using aiohttp:
import asyncio
import aiohttp
async def fetch(session, url):
async with session.post(url, data={'company_id':'Dt', 'FirstCardNum1':'foo', 'g-recaptcha-response':'foo','submit1':'Submit'} ) as response:
return await response.text()
async def main():
async with aiohttp.ClientSession() as session:
html = await fetch(session, 'https://merchant.opticard.com/public/do_inquiry.asp')
print(html)
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
Now this also returns "Invalid ReCaptcha", so that's all nice and good.
However now, using trio/asks:
import asks
import trio
async def example():
r = await asks.post('https://merchant.opticard.com/public/do_inquiry.asp', data={'company_id':'Dt', 'FirstCardNum1':'foo', 'g-recaptcha-response':'foo','submit1':'Submit'})
print(r.text)
trio.run(example)
This returns a completely different response with 'Your session has expired to protect your account. Please login again.', this error/message can be accessed normally when inputting an invalid url such as 'https://merchant.opticard.com/do_inquiry.asp' instead of 'https://merchant.opticard.com/public/do_inquiry.asp'.
I have no idea where this error is coming from, I tried setting headers, cookies, encoding, nothing seems to make it work. I tried replicating the issue, but the only way I managed to replicate the result with aiohttp and requests is by setting an incorrect url like 'https://merchant.opticard.com/do_inquiry.asp' instead of 'https://merchant.opticard.com/public/do_inquiry.asp'.
This must be an issue from asks, maybe due to encoding or formatting, but I've been using asks for over a year and never had an issue where a simple post request with data would return differently on asks compared to everywhere else. And I'm baffled as I can't understand why this is happening, it couldn't possibly be a formatting error on asks' part because if so how come this is the first time something like this has ever happened after using it for over a year?
This is a bug how asks handles redirection when a non-standard location is received.
The server returns a 302 redirection with Location: inquiry.asp?... while asks expects it to be a full URL. You may want to file a bug report to asks.
How did I find this? A good way to go is to use a proxy (e.g. mitmproxy) to inspect the traffic. However asks doesn't support proxies. So I turned to wireshark instead and use a program to extract TLS keys so wireshark can decrypt the traffic.

Return from before_request() in flask

I'm new to flask and currently converting an existing WSGI application to run through flask as long term it'll make life easier.
All requests are POST to specific routes however the current application inspects the post data prior to executing the route to see if the request needs to be run at all or not (i.e. if an identifier supplied in the post data already exists in our database or not).
If it does exist a 200 code and json is returned "early" and no other action is taken; if not the application continues to route as normal.
I think I can replicate the activity at the right point by calling before_request() but I'm not sure if returning a flask Response object from before_request() would terminate the request adequately at that point? Or if there's a better way of doing this?
NB: I must return this as a 200 - other examples I've seen result in a redirect or 4xx error handling (as a close parallel to this activity is authentication) so ultimately I'm doing this at the end of before_request():
if check_request_in_progress(post_data) is True:
response = jsonify({'request_status': 'already_running'})
response.status_code = 200
return response
else:
add_to_requests_in_progress(post_data)
Should this work (return and prevent further routing)?
If not how can I prevent further routing after calling before_request()?
Is there a better way?
Based on what they have said in the documents, it should do what you want it to do.
The function will be called without any arguments. If the function returns a non-None value, it’s handled as if it was the return value from the view and further request handling is stopped.
(source)
#app.route("/<name>")
def index(name):
return f"hello {name}"
#app.before_request
def thing():
if "john" in request.path:
return "before ran"
with the above code, if there is a "john" in the url_path, we will see the before ran in the output, not the actual intended view. you will see hello X for other string.
so yes, using before_request and returning something, anything other than None will stop flask from serving your actual view. you can redirect the user or send them a proper response.

Configuring my Python Flask app to call a function every 30 seconds

I have a flask app that returns a JSON response. However, I want it to call that function every 30 seconds without clicking the refresh button on the browser. Here is what I did
Using apscheduler
. This code in application.py
from apscheduler.schedulers.background import BachgroundScheduler
def create_app(config_filname):
con = redis.StrictRedis(host= "localhost", port=6379, charset ="utf-8", decode_responses=True, db=0)
application = Flask(__name__)
CORS(application)
sched = BackgroundScheduler()
#application.route('/users')
#cross_origin()
#sched.scheduled_job('interval', seconds = 20)
def get_users():
//Some code...
return jsonify(users)
sched.start()
return application
Then in my wsgi.py
from application import create_app
application = create_app('application.cfg')
with application.app_context():
if __name__ == "__main__":
application.run()
When I run this appliaction, I get the json output but it does not refresh instead after 20 seconds it throws
RuntimeError: Working outside of application context.
This typically means that you attempted to use functionality that needed
to interface with the current application object in some way. To solve
this, set up an application context with app.app_context(). See the
documentation for more information.
What am I doing wrong? I would appreciate any advise.
Apologies if this in a way subverting the question, but if you want the users to be sent every 30 seconds, this probably shouldn't be done in the backend. The backend should only ever send out data when a request is made. In order for the data to be sent at regular intervals the frontend needs to be configured to make requests at regular intervals
Personally I'd recommend doing this with a combination of i-frames and javascript, as described in this stack overflow question:
Auto Refresh IFrame HTML
Lastly, when it comes to your actual code, it seems like there is an error here:
if __name__ == "__main__":
application.run()
The "application.run()" line should be indented as it is inside the if statement

Is there a better design for 'as needed' web interface with flask

I have a python program which is doing millions of comparisons across records. Occasionally a comparison fails and I need to have a user (me) step and update a record. Today I do this by calling a function which:
creates a flask 'app'
creates and populates a wtform form to collect the necessary information
instantiates the flask app (e.g. app.run() and webbrowser.open() call to pull up the form)
I update the data in the form, when the form is submitted, the handler puts the updated data into a variable and then shuts down the flask app returning the
data to the caller
This seems kludgy. Is there a cleaner way of doing this recognizing that this is not a typical client-driven web application?
The minimal problem is how best to programmatically launch an 'as needed' web-based UI which presents a form, and then return back the submitted data to the caller.
My method works, but seems a poor design to meet the goal.
As I have not yet found a better way - in case someone else has a similar need, here is how I'm meeting the goal:
...
if somedata_needs_review:
review_status = user_review(somedata)
update_data(review_status)
...
def user_review(data_to_review):
""" Present web form to user and receive their input """
returnstruct = {}
app = Flask(__name__)
#app.route('/', methods=['GET'])
def show_review_form():
form = create_review_form(data_to_review)
return render_template('reviewtemplate.tpl', form=form)
# TODO - I currently split the handling into a different route because
# when using the same route previously Safari would warn of resubmitting data.
# This has the slightly unfortunate effect of creating multiple tabs.
#app.route('/process_compare', methods=['POST'])
def process_review_form():
# this form object will be populated with the submitted information
form = create_review_form(request.form, matchrecord=matchrecord)
# handle submitted updates however necessary
returnstruct['matched'] = form.process_changes.data
shutdown_flask_server()
return "Changes submitted, you can close this tab"
webbrowser.open('http://localhost:5000/', autoraise=True)
app.run(debug=True, use_reloader=False)
# The following will execute after the app is shutdown.
print('Finished manual review, returning {}'.format(returnstruct))
return(returnstruct)
def shutdown_flask_server():
func = request.environ.get('werkzeug.server.shutdown')
if func is None:
raise RuntimeError('Not running with the Werkzeug Server')
func()

Resources