Flask-SQLAlchemy with multithreaded Apache: sessions out of sync with database (multithreading)

Background: an Apache server using mod_wsgi serves a Flask app that uses Flask-SQLAlchemy to connect to MySQL. This is a full-stack application, so it is nearly impossible to create a minimal example, but I have tried.
My problem is that when I make a change that should modify the database, subsequent requests don't always reflect that change. For example, if I create an object and then try to edit that same object, the edit sometimes fails.
Most of the time, if I create an object and then go to the page listing all the objects, it does not show up in the list. Sometimes it shows up until I refresh, then disappears, and after another refresh it shows up again.
The same happens with edits. Example code:
from flask import Blueprint, abort
# Region (a model) and db (the SQLAlchemy() instance) come from the app's own modules.
bp = Blueprint('api_region', __name__, url_prefix='/app/region')
@bp.route('/rename/<int:region_id>/<string:name>', methods=['POST'])
def change_name(region_id, name):
    region = Region.query.get(region_id)
    try:
        region.name = name
    except AttributeError:
        # Region.query.get() returned None: no such region
        abort(404)
    db.session.add(region)
    db.session.commit()
    return "Success"
@bp.route('/name/<int:region_id>/', methods=['GET'])
def get_name(region_id):
    region = Region.query.get(region_id)
    try:
        name = region.name
    except AttributeError:
        abort(404)
    return name
After the object is created, send a POST:
curl -X POST https://example.com/app/region/rename/5/Europe
Then send several GETs:
curl -X GET https://example.com/app/region/name/5/
Sometimes the GET returns the correct value, but every now and then it returns whatever the value was before. Further example output: https://pastebin.com/s8mqRHSR. The frequency varies, but roughly one request in 25 fails, and it isn't always the "last" value either: when testing, it seems to get "stuck" at a certain value no matter how many times I change it.
I am using the "dynamically bound" example from the Flask-SQLAlchemy documentation:
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
db = SQLAlchemy()
def create_app():
    app = Flask(__name__)
    db.init_app(app)
    ... snip ...
    return app
This creates a scoped_session, accessible as db.session.
The Apache config is long and complicated, but it includes this line:
WSGIDaemonProcess pixel processes=5 threads=5 display-name='%{GROUP}'
I can post more information if required.

For reference, in case anyone finds this thread with the same issue: I fixed my problem.
My Flask app factory function had the line app.app_context().push() left over from the early days, when it was based on a Flask tutorial. Unfortunately it was snipped out of the example code; otherwise someone might have spotted it. During a restructuring of the project this line was left out, and the problem went away. I'm not sure why or how this line would cause the issue, or why it affected only some requests and not all of them.
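A hedged guess at the mechanism, not confirmed in the thread: Flask-SQLAlchemy calls db.session.remove() when an app context is torn down, and a request that finds an app context already pushed in its own thread reuses it instead of pushing (and later popping) its own. The one thread that ran app.app_context().push() would then keep the same session, with its identity map and its still-open MySQL transaction, across requests, which would fit "some but not all requests" returning a value that stays stuck. The stdlib-only toy below models that; ToyScopedRegistry, CachingSession, DATABASE and handle_request are hypothetical stand-ins, not SQLAlchemy or Flask APIs.

```python
import threading


class ToyScopedRegistry:
    """Hands out one object per thread, similar in spirit to scoped_session.

    Toy illustration only; this is not SQLAlchemy's implementation.
    """

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def __call__(self):
        # Reuse this thread's object if it exists, otherwise create one.
        if not hasattr(self._local, "value"):
            self._local.value = self._factory()
        return self._local.value

    def remove(self):
        # Forget this thread's object; the next call creates a fresh one.
        if hasattr(self._local, "value"):
            del self._local.value


# Hypothetical "database" table: region_id -> name.
DATABASE = {5: "Asia"}


class CachingSession:
    """Hypothetical session that remembers rows after the first read, standing in
    for an identity map plus a still-open REPEATABLE READ transaction."""

    def __init__(self):
        self._seen = {}

    def get_name(self, region_id):
        if region_id not in self._seen:
            self._seen[region_id] = DATABASE[region_id]
        return self._seen[region_id]


def handle_request(registry, region_id, teardown=True):
    """Serve one request. teardown=False models the thread whose leftover app
    context is never popped, so its session is never removed."""
    session = registry()
    try:
        return session.get_name(region_id)
    finally:
        if teardown:
            registry.remove()  # what app-context teardown normally triggers
```

In this model, calling handle_request(registry, 5, teardown=False) before and after changing DATABASE returns the old name both times in that thread, while a request served by any other thread gets a fresh session and sees the new name.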

Related

Flask List Routes on Init is really Flaky

I'm trying to get all the routes for Flask when it's initially loaded, but it seems to be super flaky. Half the time it gives me all the routes, and the other half it gives me only this:
['/static/<path:filename> HEAD,GET,OPTIONS /static/<path:filename>']
I'm using this code block in before_first_request and in the __init__ constructor of the Flask app. Any ideas how I can make this consistent? I want this to run automatically as soon as the app is fully loaded.
import urllib.parse
output = []
for rule in self.url_map.iter_rules():
    try:
        methods = ','.join(rule.methods)
        line = urllib.parse.unquote("{:50s} {:20s} {}".format(str(rule), methods, rule))
        output.append(line)
    except Exception as e:
        print("error with rule: " + str(rule))
dct = {"endpoints": output}
You can iterate over the app's view_functions dictionary, which maps endpoint names to view functions.
Something like this:
app = Flask(__name__)
...  # your code here
for endpoint, function_object in app.view_functions.items():
    print(f"the endpoint is {endpoint}")
This is something I read in the Flask documentation.
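The single line shown in the question is what app.url_map holds inside the Flask constructor: at that point only the built-in static rule exists, and routes added later with @app.route or register_blueprint() are not registered yet. A sketch of a more deterministic listing, assuming an app factory where it can be called after every blueprint is registered; RuleInfo and format_rules are hypothetical names, while rule, methods and endpoint are attributes werkzeug's Rule objects expose. Flask's built-in flask routes CLI command prints a similar table without custom code.

```python
import urllib.parse
from typing import Iterable, List, NamedTuple, Set


class RuleInfo(NamedTuple):
    """Hypothetical stand-in for the werkzeug Rule attributes used below."""
    rule: str
    methods: Set[str]
    endpoint: str


def format_rules(rules: Iterable[RuleInfo]) -> List[str]:
    """Format rules as aligned text lines, sorted so the output is stable."""
    lines = []
    for r in sorted(rules, key=lambda r: r.rule):
        # methods is a set, so its iteration order can change between runs;
        # sorting keeps each line identical from one start-up to the next.
        methods = ",".join(sorted(r.methods))
        lines.append(urllib.parse.unquote(f"{r.rule:50s} {methods:20s} {r.endpoint}"))
    return lines


# With Flask, call it at the end of the app factory, after every
# register_blueprint() call (assumes an `app` object exists):
# rules = [RuleInfo(r.rule, set(r.methods or ()), r.endpoint)
#          for r in app.url_map.iter_rules()]
# print("\n".join(format_rules(rules)))
```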

possible weird bug in pyramid web framework

I was following the Pyramid web framework tutorial steps at this link:
https://docs.pylonsproject.org/projects/pyramid/en/latest/quick_tutorial/cookiecutters.html
After setting it up and visiting http://localhost:6543/, everything works as expected, with the project name "Pyramid Scaffold" showing properly on the home route.
Then I added a second view function and a route for it, but then the home route started returning 404.
The second route works, but the first route and view stop working and return a 404 when loaded in the browser.
I cannot find what the issue is, even after adding several more functions and routes.
I am thinking this is some issue with the provided cookiecutter or with the Pyramid framework itself.
This never used to happen with Pyramid versions before 2.0. I also tried adding different views and routes: only one route seems to work, and all the others return a 404 exception.
No files were deleted or edited other than the ones listed here.
Can someone please help me with this?
Original contents of the files:
# File location 'views/default.py'
from pyramid.view import view_config
@view_config(route_name='home', renderer='pyramid_scaffold:templates/mytemplate.jinja2')
def my_view(request):
    return {'project': 'Pyramid Scaffold'}
and
# File location 'routes.py'
def includeme(config):
    config.add_static_view('static', 'static', cache_max_age=3600)
    config.add_route('home', '/')
After my changes:
# File location 'views/default.py'
from pyramid.view import view_config
@view_config(route_name='home', renderer='pyramid_scaffold:templates/mytemplate.jinja2')
def my_view(request):
    return {'project': 'Pyramid Scaffold'}
@view_config(route_name='second', renderer='pyramid_scaffold:templates/mytemplate.jinja2')
def my_view(request):
    return {'project': 'this works'}
and
# File location 'routes.py'
def includeme(config):
    config.add_static_view('static', 'static', cache_max_age=3600)
    config.add_route('home', '/')
    config.add_route('second', '/second')
Error log in the terminal:
2021-10-01 02:41:29,880 INFO [pyramid_debugtoolbar:287][waitress-0] Squashed pyramid.httpexceptions.HTTPNotFound at http://localhost:6543/
traceback url: http://localhost:6543/_debug_toolbar/313430323330373531313831343038/exception
Visiting the traceback URL gives no helpful information other than this:
env/lib/python3.8/site-packages/pyramid/router.py", line 169, in handle_request
raise HTTPNotFound(msg)
Bugs are almost always in the developer's code, and rarely in a mature package such as Pyramid.
In your case, you defined two functions with the same name, so the second definition overrides the first. Therefore the view for the first route, home, is lost, and that route returns 404.
To remedy the situation, give the second view function a unique name:
@view_config(route_name='second', renderer='pyramid_scaffold:templates/mytemplate.jinja2')
def my_second_view(request):
    return {'project': 'this works'}
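A stdlib-only toy of the mechanism described in the answer; view_config, scan, BrokenViews and FixedViews here are hypothetical stand-ins, not Pyramid or venusian code. The decorator only tags the function object, and registration happens later by walking the names that are still bound (Pyramid's config.scan() similarly walks the members of each scanned module), so a function whose name was reused is never found. Linters catch this pattern: pyflakes reports it as a redefinition of an unused name (flake8 code F811).

```python
from typing import Callable, Dict


def view_config(route_name: str) -> Callable[[Callable], Callable]:
    """Toy decorator: only tags the function; registration happens later."""
    def tag(func: Callable) -> Callable:
        func.toy_route_name = route_name
        return func
    return tag


def scan(namespace) -> Dict[str, str]:
    """Toy scan: walk the names still bound and map route -> function name."""
    return {
        obj.toy_route_name: name
        for name, obj in namespace.items()
        if callable(obj) and hasattr(obj, "toy_route_name")
    }


class BrokenViews:  # a class body rebinds names the same way a module does
    @view_config(route_name="home")
    def my_view(request):
        return {"project": "Pyramid Scaffold"}

    @view_config(route_name="second")
    def my_view(request):  # rebinds my_view; the "home" function is unreachable
        return {"project": "this works"}


class FixedViews:
    @view_config(route_name="home")
    def my_view(request):
        return {"project": "Pyramid Scaffold"}

    @view_config(route_name="second")
    def my_second_view(request):
        return {"project": "this works"}
```

scan(vars(BrokenViews)) finds only the "second" route, which is exactly the 404 on "/" from the question; renaming the second function makes both routes reachable again.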

Boto3 client in multiprocessing pool fails with "botocore.exceptions.NoCredentialsError: Unable to locate credentials"

I'm using boto3 to connect to S3, download objects, and do some processing, with a multiprocessing pool doing the work in parallel.
The following is a synopsis of the code I'm using:
import json
import multiprocessing as mp
import boto3
session = None
def set_global_session():
    # Pool initializer: one boto3 Session per worker process
    global session
    if not session:
        session = boto3.Session(region_name='us-east-1')
def function_to_be_sent_to_mp_pool():
    s3 = session.client('s3', region_name='us-east-1')
    list_of_b_n_o = list_of_buckets_and_objects
    for bucket, key in list_of_b_n_o:
        content = s3.get_object(Bucket=bucket, Key=key)
        data = json.loads(content['Body'].read().decode('utf-8'))
        write_processed_data_to_a_location()
def main():
    pool = mp.Pool(initializer=set_global_session, processes=40)
    pool.starmap(function_to_be_sent_to_mp_pool, list_of_b_n_o_i)
Now, with processes=40, everything works fine. With processes=64, it is still fine.
However, when I increase it to processes=128, I get the following error:
botocore.exceptions.NoCredentialsError: Unable to locate credentials
Our machine has the required IAM roles for accessing S3. Moreover, the weird thing is that for some processes it works fine, whereas for others it throws the credentials error. Why is this happening, and how can I resolve it?
Another weird thing is that I'm able to start two jobs in two separate terminal tabs (each with a separate SSH login shell to the machine). Each job spawns 64 processes, and that works fine as well, which means 128 processes are running simultaneously. But 80 processes in one login shell fail.
Follow up:
In one approach, I tried creating separate sessions for separate processes. In another, I created the S3 client directly with boto3.client. However, both of them throw the same error with 80 processes.
I also created separate clients with the following extra config:
Config(retries=dict(max_attempts=40), max_pool_connections=800)
This allowed me to use 80 processes at once, but anything > 80 fails with the same error.
Post follow up:
Can someone confirm if they've been able to use boto3 in multiprocessing with 128 processes?
This is actually a race condition when fetching the credentials. I'm not sure how fetching credentials works under the hood, but I saw this question on Stack Overflow and this ticket on GitHub.
I was able to resolve this by adding a random wait time for each of the processes. The following is the updated code, which works for me:
import random, time
import boto3
from botocore.config import Config
client_config = Config(retries=dict(max_attempts=400), max_pool_connections=800)
time.sleep(random.randint(0, num_processes*10)/1000)  # random delay of 0 to num_processes*10 milliseconds
s3 = boto3.client('s3', region_name='us-east-1', config=client_config)
I tried a sleep range smaller than num_processes*10, but that failed again with the same issue.
@DenisDmitriev, since you are fetching the credentials once and storing them explicitly, I think that avoids the race condition, which is why the issue is resolved for you.
PS: there is no particular logic behind the values for max_attempts and max_pool_connections. I was plugging in several values until I figured out the race condition.
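The random sleep spreads the credential lookups out over time. A sketch of a more general form of the same idea, swapped in here rather than taken from the thread: retry only when the lookup actually fails, with exponential backoff and random "full jitter". call_with_jittered_retries and its parameters are hypothetical names, and sleep is injectable so the helper can be tested without waiting. The commented boto3 usage assumes boto3/botocore are installed; Session.get_credentials() returns None rather than raising when no credentials are found, so the wrapper turns that into an error first.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_jittered_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on one of the retry_on exceptions, wait a random time between 0
    and an exponentially growing cap ("full jitter"), then try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the real error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


# Possible use with boto3 (assumption, not from the thread):
# from botocore.exceptions import NoCredentialsError
# def fetch_frozen_credentials():
#     creds = boto3.Session().get_credentials()
#     if creds is None:
#         raise NoCredentialsError()
#     return creds.get_frozen_credentials()
# frozen = call_with_jittered_retries(fetch_frozen_credentials, (NoCredentialsError,))
```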
I suspect that AWS recently reduced throttling limits for metadata requests because I suddenly started running into the same issue. The solution that appears to work is to query credentials once before creating the pool and have the processes in the pool use them explicitly instead of making them query credentials again.
I am using fsspec with s3fs, and here's what my code for this looks like:
import logging, random, time
log = logging.getLogger(__name__)
def get_aws_credentials():
    '''
    Retrieve current AWS credentials.
    '''
    import asyncio, s3fs
    fs = s3fs.S3FileSystem()
    # Try getting credentials
    num_attempts = 5
    for attempt in range(num_attempts):
        credentials = asyncio.run(fs.session.get_credentials())
        if credentials is not None:
            if attempt > 0:
                log.info('received credentials on attempt %s', 1 + attempt)
            return asyncio.run(credentials.get_frozen_credentials())
        time.sleep(15 * (random.random() + 0.5))
    raise RuntimeError('failed to request AWS credentials '
                       'after %d attempts' % num_attempts)
def process_parallel(fn_d, max_processes):
    # [...]
    c = get_aws_credentials()
    # Cache credentials
    import fsspec.config
    prev_s3_cfg = fsspec.config.conf.get('s3', {})
    try:
        fsspec.config.conf['s3'] = dict(prev_s3_cfg,
                                        key=c.access_key,
                                        secret=c.secret_key,
                                        token=c.token)  # needed for temporary (IAM role) credentials
        num_processes = min(len(fn_d), max_processes)
        from concurrent.futures import ProcessPoolExecutor
        with ProcessPoolExecutor(max_workers=num_processes) as pool:
            for data in pool.map(process_file, fn_d, chunksize=10):
                yield data
    finally:
        fsspec.config.conf['s3'] = prev_s3_cfg
Raw boto3 code looks essentially the same, except that instead of the whole fs.session and asyncio.run() song and dance, you work with boto3.Session itself: call its get_credentials() method, then call get_frozen_credentials() on the credentials object it returns.
I had the same problem in a multiprocessing setup. I guess there is a client initialization problem when you use multiple processes, so I suggest using a getter function to obtain the S3 client. It works for me.
import boto3
g_s3_cli = None
def get_s3_client(refresh=False):
    # Create the client lazily on first use and reuse it afterwards
    global g_s3_cli
    if not g_s3_cli or refresh:
        g_s3_cli = boto3.client('s3')
    return g_s3_cli
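A slightly more defensive sketch of the same getter, assuming workers may be forked from a parent that already created a client: it remembers which process built the client and rebuilds it in a forked child instead of reusing the parent's copy. PerProcessClient is a hypothetical name, and the factory is injectable so the boto3 call only runs on first use.

```python
import os
from typing import Any, Callable, Optional


class PerProcessClient:
    """Lazily build a client, and rebuild it when used from a different process."""

    def __init__(self, factory: Callable[[], Any]):
        self._factory = factory
        self._pid: Optional[int] = None
        self._client: Any = None

    def get(self, refresh: bool = False) -> Any:
        pid = os.getpid()
        if refresh or self._client is None or self._pid != pid:
            # First use, explicit refresh, or running in a forked child:
            # never reuse a client that another process created.
            self._client = self._factory()
            self._pid = pid
        return self._client


# Usage (assumes boto3 is installed):
# s3 = PerProcessClient(lambda: boto3.client("s3"))
# ... inside a worker: s3.get().get_object(Bucket=bucket, Key=key)
```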

Configuring my Python Flask app to call a function every 30 seconds

I have a Flask app that returns a JSON response. However, I want it to call that function every 30 seconds without clicking the refresh button in the browser. Here is what I did, using APScheduler. This code is in application.py:
from apscheduler.schedulers.background import BackgroundScheduler
def create_app(config_filename):
    con = redis.StrictRedis(host="localhost", port=6379, charset="utf-8", decode_responses=True, db=0)
    application = Flask(__name__)
    CORS(application)
    sched = BackgroundScheduler()
    @application.route('/users')
    @cross_origin()
    @sched.scheduled_job('interval', seconds=20)
    def get_users():
        # Some code...
        return jsonify(users)
    sched.start()
    return application
Then in my wsgi.py:
from application import create_app
application = create_app('application.cfg')
with application.app_context():
if __name__ == "__main__":
application.run()
When I run this application, I get the JSON output, but it does not refresh; instead, after 20 seconds it throws:
RuntimeError: Working outside of application context.
This typically means that you attempted to use functionality that needed
to interface with the current application object in some way. To solve
this, set up an application context with app.app_context(). See the
documentation for more information.
What am I doing wrong? I would appreciate any advice.
Apologies if this in a way subverts the question, but if you want the users to be sent every 30 seconds, this probably shouldn't be done in the backend. The backend should only ever send out data when a request is made. For the data to be sent at regular intervals, the frontend needs to be configured to make requests at regular intervals.
Personally, I'd recommend doing this with a combination of iframes and JavaScript, as described in this Stack Overflow question:
Auto Refresh IFrame HTML
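The "client asks at regular intervals" idea, sketched in Python for a non-browser client such as a monitoring script; in a browser the same loop would be a JavaScript timer. With this approach the /users route stays an ordinary request handler and the scheduler isn't needed. fetch_json and poll are hypothetical helpers, the URL is a placeholder, and sleep is injectable so the loop can be tested without waiting.

```python
import json
import time
import urllib.request
from typing import Any, Callable, List


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def poll(fetch: Callable[[], Any], interval_s: float, max_polls: int,
         sleep: Callable[[float], None] = time.sleep) -> List[Any]:
    """Call fetch() max_polls times, waiting interval_s seconds between calls."""
    results = []
    for i in range(max_polls):
        results.append(fetch())
        if i < max_polls - 1:  # no need to wait after the last request
            sleep(interval_s)
    return results


# Usage (placeholder URL; the Flask app must be running):
# latest = poll(lambda: fetch_json("http://localhost:5000/users"), interval_s=30, max_polls=10)
```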
Lastly, regarding your actual code, it seems there is an error here:
if __name__ == "__main__":
application.run()
The application.run() line should be indented, since it is inside the if statement.

Is there a better design for 'as needed' web interface with flask

I have a Python program that does millions of comparisons across records. Occasionally a comparison fails, and I need a user (me) to step in and update a record. Today I do this by calling a function which:
creates a Flask 'app'
creates and populates a WTForms form to collect the necessary information
runs the Flask app (app.run(), plus a webbrowser.open() call to pull up the form)
waits while I update the data in the form; when the form is submitted, the handler puts the updated data into a variable and then shuts down the Flask app, returning the data to the caller
This seems kludgy. Is there a cleaner way of doing this, given that this is not a typical client-driven web application?
The minimal problem is how best to programmatically launch an 'as needed' web-based UI that presents a form, and then return the submitted data to the caller.
My method works, but seems a poor design to meet the goal.
As I have not yet found a better way, here is how I'm meeting the goal, in case someone else has a similar need:
import webbrowser
from flask import Flask, render_template, request
...
if somedata_needs_review:
    review_status = user_review(somedata)
    update_data(review_status)
...
def user_review(data_to_review):
    """ Present web form to user and receive their input """
    returnstruct = {}
    app = Flask(__name__)
    @app.route('/', methods=['GET'])
    def show_review_form():
        form = create_review_form(data_to_review)
        return render_template('reviewtemplate.tpl', form=form)
    # TODO - I currently split the handling into a different route because
    # when using the same route previously Safari would warn of resubmitting data.
    # This has the slightly unfortunate effect of creating multiple tabs.
    @app.route('/process_compare', methods=['POST'])
    def process_review_form():
        # this form object will be populated with the submitted information
        form = create_review_form(request.form, matchrecord=matchrecord)
        # handle submitted updates however necessary
        returnstruct['matched'] = form.process_changes.data
        shutdown_flask_server()
        return "Changes submitted, you can close this tab"
    webbrowser.open('http://localhost:5000/', autoraise=True)
    app.run(debug=True, use_reloader=False)
    # The following will execute after the app is shut down.
    print('Finished manual review, returning {}'.format(returnstruct))
    return returnstruct
def shutdown_flask_server():
    # Note: Werkzeug removed the 'werkzeug.server.shutdown' environ hook in 2.1,
    # so this only works with older Werkzeug versions.
    func = request.environ.get('werkzeug.server.shutdown')
    if func is None:
        raise RuntimeError('Not running with the Werkzeug Server')
    func()
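A stdlib-only variation on the same idea, swapping Flask and WTForms for http.server so that nothing has to shut the server down from inside a request handler: the handler puts the submitted fields on a queue.Queue, the caller blocks on that queue, and the caller then stops the server itself. FORM_HTML, _make_handler and collect_review are hypothetical names, and the form is a placeholder for the real review template. The same queue hand-off should also work with a Flask app served through werkzeug.serving.make_server(), whose server object has serve_forever() and shutdown().

```python
import queue
import threading
import urllib.parse
import webbrowser
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Optional

# Hypothetical form; a real version would render the record under review.
FORM_HTML = (b'<form method="post">Matched? <input name="matched">'
             b'<button>Submit</button></form>')


def _make_handler(results: "queue.Queue[Dict[str, str]]"):
    class ReviewHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self._reply(FORM_HTML)

        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length).decode("utf-8")
            # Keep the first value of each submitted field.
            fields = {k: v[0] for k, v in urllib.parse.parse_qs(body).items()}
            results.put(fields)  # hand the data to the waiting caller
            self._reply(b"Changes submitted, you can close this tab")

        def _reply(self, payload: bytes) -> None:
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, format, *args):
            pass  # keep the comparison program's console quiet

    return ReviewHandler


def collect_review(launch: Callable[[str], object] = webbrowser.open,
                   timeout: Optional[float] = None) -> Dict[str, str]:
    """Serve the form on a free local port, block until one submission arrives,
    then stop the server and return the submitted fields."""
    results: "queue.Queue[Dict[str, str]]" = queue.Queue()
    # Port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", 0), _make_handler(results))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        launch(f"http://127.0.0.1:{server.server_port}/")
        return results.get(timeout=timeout)
    finally:
        server.shutdown()  # stops serve_forever() from the caller's thread
        server.server_close()
```

In the comparison loop, review_status = collect_review() would replace user_review(somedata); passing a different launch function is also how the flow can be exercised without a browser.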
