Boto3 client in multiprocessing pool fails with "botocore.exceptions.NoCredentialsError: Unable to locate credentials" - python-3.x

I'm using boto3 to connect to s3, download objects and do some processing. I'm using a multiprocessing pool to do the above.
Following is a synopsis of the code I'm using:
session = None
def set_global_session():
global session
if not session:
session = boto3.Session(region_name='us-east-1')
def function_to_be_sent_to_mp_pool():
s3 = session.client('s3', region_name='us-east-1')
list_of_b_n_o = list_of_buckets_and_objects
for bucket, object in list_of_b_n_o:
content = s3.get_object(Bucket=bucket, Key=key)
data = json.loads(content['Body'].read().decode('utf-8'))
write_processed_data_to_a_location()
def main():
pool = mp.Pool(initializer=set_global_session, processes=40)
pool.starmap(function_to_be_sent_to_mp_pool, list_of_b_n_o_i)
Now, when processes=40, everything works good. When processes = 64, still good.
However, when I increases to processes=128, I get the following error:
botocore.exceptions.NoCredentialsError: Unable to locate credentials
Our machine has the required IAM roles for accessing S3. Moreover, the weird thing that happens is that for some processes, it works fine, whereas for some others, it throws the credentials error. Why is this happening, and how to resolve this?
Another weird thing that happens is that I'm able to trigger two jobs in 2 separate terminal tabs (each of which has a separate ssh login shell to the machine). Each job spawns 64 processes, and that works fine as well, which means there are 128 processes running simultaneously. But 80 processes in one login shell fails.
Follow up:
I tried creating separate sessions for separate processes in one approach. In the other, I directly created s3-client using boto3.client. However, both of them throw the same error with 80 processes.
I also created separate clients with the following extra config:
Config(retries=dict(max_attempts=40), max_pool_connections=800)
This allowed me to use 80 processes at once, but anything > 80 fails with the same error.
Post follow up:
Can someone confirm if they've been able to use boto3 in multiprocessing with 128 processes?

This is actually a race condition on fetching the credentials. I'm not sure how fetching credentials under the hood works, but the I saw this question in Stack Overflow and this ticket in github.
I was able to resolve this by keeping a random wait time for each of the processes. The following is the updated code which works for me:
client_config = Config(retries=dict(max_attempts=400), max_pool_connections=800)
time.sleep(random.randint(0, num_processes*10)/1000) # random sleep time in milliseconds
s3 = boto3.client('s3', region_name='us-east-1', config=client_config)
I tried keeping the range for sleep time lesser than num_processes*10, but that failed again with the same issue.
#DenisDmitriev, since you are getting the credentials and storing them explicitly, I think that resolves the race condition and hence the issue is resolved.
PS: values for max_attempts and max_pool_connections don't have a logic. I was plugging several values until the race condition was figured out.

I suspect that AWS recently reduced throttling limits for metadata requests because I suddenly started running into the same issue. The solution that appears to work is to query credentials once before creating the pool and have the processes in the pool use them explicitly instead of making them query credentials again.
I am using fsspec with s3fs, and here's what my code for this looks like:
def get_aws_credentials():
'''
Retrieve current AWS credentials.
'''
import asyncio, s3fs
fs = s3fs.S3FileSystem()
# Try getting credentials
num_attempts = 5
for attempt in range(num_attempts):
credentials = asyncio.run(fs.session.get_credentials())
if credentials is not None:
if attempt > 0:
log.info('received credentials on attempt %s', 1 + attempt)
return asyncio.run(credentials.get_frozen_credentials())
time.sleep(15 * (random.random() + 0.5))
raise RuntimeError('failed to request AWS credentials '
'after %d attempts' % num_attempts)
def process_parallel(fn_d, max_processes):
# [...]
c = get_aws_credentials()
# Cache credentials
import fsspec.config
prev_s3_cfg = fsspec.config.conf.get('s3', {})
try:
fsspec.config.conf['s3'] = dict(prev_s3_cfg,
key=c.access_key,
secret=c.secret_key)
num_processes = min(len(fn_d), max_processes)
from concurrent.futures import ProcessPoolExecutor
with ProcessPoolExecutor(max_workers=num_processes) as pool:
for data in pool.map(process_file, fn_d, chunksize=10):
yield data
finally:
fsspec.config.conf['s3'] = prev_s3_cfg
Raw boto3 code will look essentially the same, except instead of the whole fs.session and asyncio.run() song and dance, you'll work with boto3.Session itself and call its get_credentials() and get_frozen_credentials() methods directly.

I get the same problem with multi process situation. I guess there is a client init problem when you use multi process. So I suggest that you can use get function to get s3 client. It works for me.
g_s3_cli = None
def get_s3_client(refresh=False):
global g_s3_cli
if not g_s3_cli or refresh:
g_s3_cli = boto3.client('s3')
return g_s3_cli

Related

How to shut down CherryPy in no incoming connections for specified time?

I am using CherryPy to speak to an authentication server. The script runs fine if all the inputted information is fine. But if they make an mistake typing their ID the internal HTTP error screen fires ok, but the server keeps running and nothing else in the script will run until the CherryPy engine is closed so I have to manually kill the script. Is there some code I can put in the index along the lines of
if timer >10 and connections == 0:
close cherrypy (< I have a method for this already)
Im mostly a data mangler, so not used to web servers. Googling shows lost of hits for closing CherryPy when there are too many connections but not when there have been no connections for a specified (short) time. I realise the point of a web server is usually to hang around waiting for connections, so this may be an odd case. All the same, any help welcome.
Interesting use case, you can use the CherryPy plugins infrastrcuture to do something like that, take a look at this ActivityMonitor plugin implementation, it shutdowns the server if is not handling anything and haven't seen any request in a specified amount of time (in this case 10 seconds).
Maybe you have to adjust the logic on how to shut it down or do anything else in the _verify method.
If you want to read a bit more about the publish/subscribe architecture take a look at the CherryPy Docs.
import time
import threading
import cherrypy
from cherrypy.process.plugins import Monitor
class ActivityMonitor(Monitor):
def __init__(self, bus, wait_time, monitor_time=None):
"""
bus: cherrypy.engine
wait_time: Seconds since last request that we consider to be active.
monitor_time: Seconds that we'll wait before verifying the activity.
If is not defined, wait half the `wait_time`.
"""
if monitor_time is None:
# if monitor time is not defined, then verify half
# the wait time since the last request
monitor_time = wait_time / 2
super().__init__(
bus, self._verify, monitor_time, self.__class__.__name__
)
# use a lock to make sure the thread that triggers the before_request
# and after_request does not collide with the monitor method (_verify)
self._active_request_lock = threading.Lock()
self._active_requests = 0
self._wait_time = wait_time
self._last_request_ts = time.time()
def _verify(self):
# verify that we don't have any active requests and
# shutdown the server in case we haven't seen any activity
# since self._last_request_ts + self._wait_time
with self._active_request_lock:
if (not self._active_requests and
self._last_request_ts + self._wait_time < time.time()):
self.bus.exit() # shutdown the engine
def before_request(self):
with self._active_request_lock:
self._active_requests += 1
def after_request(self):
with self._active_request_lock:
self._active_requests -= 1
# update the last time a request was served
self._last_request_ts = time.time()
class Root:
#cherrypy.expose
def index(self):
return "Hello user: current time {:.0f}".format(time.time())
def main():
# here is how to use the plugin:
ActivityMonitor(cherrypy.engine, wait_time=10, monitor_time=5).subscribe()
cherrypy.quickstart(Root())
if __name__ == '__main__':
main()

How to lock virtualbox to get a screenshot through SOAP API

I'm trying to use the SOAP interface of Virtualbox 6.1 from Python to get a screenshot of a machine. I can start the machine but get locking errors whenever I try to retrieve the screen layout.
This is the code:
import zeep
# helper to show the session lock status
def show_lock_state(session_id):
session_state = service.ISession_getState(session_id)
print('current session state:', session_state)
# connect
client = zeep.Client('http://127.0.0.1:18083?wsdl')
service = client.create_service("{http://www.virtualbox.org/}vboxBinding", 'http://127.0.0.1:18083?wsdl')
manager_id = service.IWebsessionManager_logon('fakeuser', 'fakepassword')
session_id = service.IWebsessionManager_getSessionObject(manager_id)
# get the machine id and start it
machine_id = service.IVirtualBox_findMachine(manager_id, 'Debian')
progress_id = service.IMachine_launchVMProcess(machine_id, session_id, 'gui')
service.IProgress_waitForCompletion(progress_id, -1)
print('Machine has been started!')
show_lock_state(session_id)
# unlock and then lock to be sure, doesn't have any effect apparently
service.ISession_unlockMachine(session_id)
service.IMachine_lockMachine(machine_id, session_id, 'Shared')
show_lock_state(session_id)
console_id = service.ISession_getConsole(session_id)
display_id = service.IConsole_getDisplay(console_id)
print(service.IDisplay_getGuestScreenLayout(display_id))
The machine is started properly but the last line gives the error VirtualBox error: rc=0x80004001 which from what I read around means locked session.
I tried to release and acquire the lock again, but even though it succeeds the error remains. I went through the documentation but cannot find other types of locks that I'm supposed to use, except the Write lock which is not usable here since the machine is running. I could not find any example in any language.
I found an Android app called VBoxManager with this SOAP screenshot capability.
Running it through a MITM proxy I reconstructed the calls it performs and wrote them as the Zeep equivalent. In case anyone is interested in the future, the last lines of the above script are now:
console_id = service.ISession_getConsole(session_id)
display_id = service.IConsole_getDisplay(console_id)
resolution = service.IDisplay_getScreenResolution(display_id, 0)
print(f'display data: {resolution}')
image_data = service.IDisplay_takeScreenShotToArray(
display_id,
0,
resolution['width'],
resolution['height'],
'PNG')
with open('screenshot.png', 'wb') as f:
f.write(base64.b64decode(image_data))

SqlAlchemy sessions get stuck when not using them

I'm having a hard time implementing a "MySqlClient" class for my application. My application consists of several modules which have to make use of my database & some of the modules are running on other threads.
My intention is to make an instance for every module that needs to communicate with my MySql database. For example: every client connecting to a websocket server creates his own instance, a telegram bot client has its own instance, ..
I've been searching for days now, I've read the docs, searched the forums .. but somehow I'm missing something or I'm not implementing it the right way.
This is my class:
class MySqlClient():
engine = None
Session = None
def __init__(self):
# create engine
if MySqlClient.engine == None:
MySqlClient.engine = sqlalchemy.create_engine("mysql+mysqlconnector://{0}:{1}#{2}/{3}".format(
state.config["mysql"]["user"],
state.config["mysql"]["password"],
state.config["mysql"]["host"],
state.config["mysql"]["database"]
))
MySqlClient.Session = scoped_session(sessionmaker(bind=MySqlClient.engine))
Base.metadata.create_all(MySqlClient.engine)
self.session = MySqlClient.Session()
def get_budget(self, budget_id):
try:
q = self.session.query(
(Budget.amount).label("budgetAmount"),
func.sum(BudgetRecord.amount).label("total")
).all().filter(Budget.id == budget_id).join(BudgetRecord).filter(extract("month", BudgetRecord.ts) == datetime.datetime.now().month)
self.session.close()
return { "budgetAmount": q[0].budgetAmount, "total": 0.0 if q[0].total == None else q[0].total }
except Exception as ex:
logging.error(ex)
return None
When I start my application everything runs fine, I can execute the method "get_budget" returning the data. However, if after this I wait for 5 minutes, the method won't run again (if I don't wait, it still works). After about 15 minutes after I made the call, the query finally fails saying the MySql connection has dropped:
(mysql.connector.errors.OperationalError) MySQL Connection not available.
I also tried getting a new session before executing new queries. That didn't help either.
I've done things like this before but it's the first time I'm using an ORM & I'd like to keep the benefits of using ORM.
Any help would be greatly appreciated,
Regards

Configuring my Python Flask app to call a function every 30 seconds

I have a flask app that returns a JSON response. However, I want it to call that function every 30 seconds without clicking the refresh button on the browser. Here is what I did
Using apscheduler
. This code in application.py
from apscheduler.schedulers.background import BachgroundScheduler
def create_app(config_filname):
con = redis.StrictRedis(host= "localhost", port=6379, charset ="utf-8", decode_responses=True, db=0)
application = Flask(__name__)
CORS(application)
sched = BackgroundScheduler()
#application.route('/users')
#cross_origin()
#sched.scheduled_job('interval', seconds = 20)
def get_users():
//Some code...
return jsonify(users)
sched.start()
return application
Then in my wsgi.py
from application import create_app
application = create_app('application.cfg')
with application.app_context():
if __name__ == "__main__":
application.run()
When I run this appliaction, I get the json output but it does not refresh instead after 20 seconds it throws
RuntimeError: Working outside of application context.
This typically means that you attempted to use functionality that needed
to interface with the current application object in some way. To solve
this, set up an application context with app.app_context(). See the
documentation for more information.
What am I doing wrong? I would appreciate any advise.
Apologies if this in a way subverting the question, but if you want the users to be sent every 30 seconds, this probably shouldn't be done in the backend. The backend should only ever send out data when a request is made. In order for the data to be sent at regular intervals the frontend needs to be configured to make requests at regular intervals
Personally I'd recommend doing this with a combination of i-frames and javascript, as described in this stack overflow question:
Auto Refresh IFrame HTML
Lastly, when it comes to your actual code, it seems like there is an error here:
if __name__ == "__main__":
application.run()
The "application.run()" line should be indented as it is inside the if statement

Flask_Sqlalchemy with multithreaded Apache. Sessions out of sync with database

Background: Apache server using mod_wsgi to serve a Flask app using Flask_Sqlalchemy connecting to MySQL. This is a full stack application so it is nearly impossible to create a minimal example but I have tried.
My problem is that when I make some change that should modify the database subsequent requests don't always seem to reflect that change. For example if I create an object, then try to edit that same object, the edit will sometimes fail.
Most of the time if I create an object then go to the page listing all the objects, it will not show up on the list. Sometimes it will show up until I refresh, when it will disappear, and with another refresh it shows up again.
The same happens with edits. Example code:
bp = Blueprint('api_region', __name__, url_prefix='/app/region')
#bp.route('/rename/<int:region_id>/<string:name>', methods=['POST'])
def change_name(region_id, name):
region = Region.query.get(region_id)
try:
region.name = name
except AttributeError:
abort(404)
db.session.add(region)
db.session.commit()
return "Success"
#bp.route('/name/<int:region_id>/', methods=['GET'])
def get_name(region_id):
region = Region.query.get(region_id)
try:
name = region.name
except AttributeError:
abort(404)
return name
After object is created send a POST
curl -X POST https://example.com/app/region/rename/5/Europe
Then several GETs
curl -X GET https://example.com/app/region/name/5/
Sometimes, the GET will return the correct info, but every now and then it will return whatever it was before. Further example output https://pastebin.com/s8mqRHSR it happens at varying frequency but about one in 25 will fail, and it isn't always the "last" value either, when testing it seems to get 'stuck' at a certain value no matter how many times I change it up.
I am using the "dynamically bound" example of Flask_Sqlalchemy
db = SQLAlchemy()
def create_app():
app = Flask(__name__)
db.init_app(app)
... snip ...
return app
Which creates a scoped_session accessible in db.session.
Apache config is long and complicated but includes the line
WSGIDaemonProcess pixel processes=5 threads=5 display-name='%{GROUP}'
I can post more information if required.
For reference if anyone finds this thread with the same issue, I fixed my problem.
My Flask App factory function had the line app.app_context().push() leftover from the early days when it was based off a Flask tutorial. Unfortunately snipped out of the example code otherwise it might have been spotted by someone. During a restructuring of the project this line was left out and the problem fixed itself. Not sure why or how this line would cause this issue, and only for some but not all requests.

Resources