concurrent queries using sqlAlchemy 'scoped_session' raising exception 'This transaction is inactive' - python-3.x

Sorry if this was asked before, but I search all over and couldn't find a solution to my problem.
I'm trying to achieve a very simple task, which is sending queries to postgres DB from multiple threads at the same time.
this is my setup:
engine = create_engine(
f'postgresql://postgres:{password}#{host}:5432/dbname',
pool_pre_ping=True).connect().execution_options(
schema_translate_map={None: "my_db"})
Session = scoped_session(sessionmaker(bind=engine))
#contextmanager
def db_session():
session = Session()
try:
yield session
session.commit()
except:
session.rollback()
raise
finally:
session.close()
This is what I'm trying to run from each thread(tried using native python Thread and also using APScheduler:
def query_build():
with db_session() as session:
session.query(Build).filter(Build.number == -1).all()
If I'm running the above method from 2 threads or more, I'm constantly receiving the below exception from each thread, while sending commit():
sqlalchemy.exc.InvalidRequestError: This transaction is inactive
As I read in many places, scoped_session is meant to be thread-safe, but from my experience it simply doesn't work.

apparently, the location of connect() is what caused this behavior.
removing the connect call solved the problem:
engine = create_engine(
f'postgresql://postgres:{password}#{host}:5432/dbname',
pool_pre_ping=True).execution_options(
schema_translate_map={None: "my_db"})
Session = scoped_session(sessionmaker(bind=engine))

Related

Boto3 client in multiprocessing pool fails with "botocore.exceptions.NoCredentialsError: Unable to locate credentials"

I'm using boto3 to connect to s3, download objects and do some processing. I'm using a multiprocessing pool to do the above.
Following is a synopsis of the code I'm using:
session = None
def set_global_session():
global session
if not session:
session = boto3.Session(region_name='us-east-1')
def function_to_be_sent_to_mp_pool():
s3 = session.client('s3', region_name='us-east-1')
list_of_b_n_o = list_of_buckets_and_objects
for bucket, object in list_of_b_n_o:
content = s3.get_object(Bucket=bucket, Key=key)
data = json.loads(content['Body'].read().decode('utf-8'))
write_processed_data_to_a_location()
def main():
pool = mp.Pool(initializer=set_global_session, processes=40)
pool.starmap(function_to_be_sent_to_mp_pool, list_of_b_n_o_i)
Now, when processes=40, everything works good. When processes = 64, still good.
However, when I increases to processes=128, I get the following error:
botocore.exceptions.NoCredentialsError: Unable to locate credentials
Our machine has the required IAM roles for accessing S3. Moreover, the weird thing that happens is that for some processes, it works fine, whereas for some others, it throws the credentials error. Why is this happening, and how to resolve this?
Another weird thing that happens is that I'm able to trigger two jobs in 2 separate terminal tabs (each of which has a separate ssh login shell to the machine). Each job spawns 64 processes, and that works fine as well, which means there are 128 processes running simultaneously. But 80 processes in one login shell fails.
Follow up:
I tried creating separate sessions for separate processes in one approach. In the other, I directly created s3-client using boto3.client. However, both of them throw the same error with 80 processes.
I also created separate clients with the following extra config:
Config(retries=dict(max_attempts=40), max_pool_connections=800)
This allowed me to use 80 processes at once, but anything > 80 fails with the same error.
Post follow up:
Can someone confirm if they've been able to use boto3 in multiprocessing with 128 processes?
This is actually a race condition on fetching the credentials. I'm not sure how fetching credentials under the hood works, but the I saw this question in Stack Overflow and this ticket in github.
I was able to resolve this by keeping a random wait time for each of the processes. The following is the updated code which works for me:
client_config = Config(retries=dict(max_attempts=400), max_pool_connections=800)
time.sleep(random.randint(0, num_processes*10)/1000) # random sleep time in milliseconds
s3 = boto3.client('s3', region_name='us-east-1', config=client_config)
I tried keeping the range for sleep time lesser than num_processes*10, but that failed again with the same issue.
#DenisDmitriev, since you are getting the credentials and storing them explicitly, I think that resolves the race condition and hence the issue is resolved.
PS: values for max_attempts and max_pool_connections don't have a logic. I was plugging several values until the race condition was figured out.
I suspect that AWS recently reduced throttling limits for metadata requests because I suddenly started running into the same issue. The solution that appears to work is to query credentials once before creating the pool and have the processes in the pool use them explicitly instead of making them query credentials again.
I am using fsspec with s3fs, and here's what my code for this looks like:
def get_aws_credentials():
'''
Retrieve current AWS credentials.
'''
import asyncio, s3fs
fs = s3fs.S3FileSystem()
# Try getting credentials
num_attempts = 5
for attempt in range(num_attempts):
credentials = asyncio.run(fs.session.get_credentials())
if credentials is not None:
if attempt > 0:
log.info('received credentials on attempt %s', 1 + attempt)
return asyncio.run(credentials.get_frozen_credentials())
time.sleep(15 * (random.random() + 0.5))
raise RuntimeError('failed to request AWS credentials '
'after %d attempts' % num_attempts)
def process_parallel(fn_d, max_processes):
# [...]
c = get_aws_credentials()
# Cache credentials
import fsspec.config
prev_s3_cfg = fsspec.config.conf.get('s3', {})
try:
fsspec.config.conf['s3'] = dict(prev_s3_cfg,
key=c.access_key,
secret=c.secret_key)
num_processes = min(len(fn_d), max_processes)
from concurrent.futures import ProcessPoolExecutor
with ProcessPoolExecutor(max_workers=num_processes) as pool:
for data in pool.map(process_file, fn_d, chunksize=10):
yield data
finally:
fsspec.config.conf['s3'] = prev_s3_cfg
Raw boto3 code will look essentially the same, except instead of the whole fs.session and asyncio.run() song and dance, you'll work with boto3.Session itself and call its get_credentials() and get_frozen_credentials() methods directly.
I get the same problem with multi process situation. I guess there is a client init problem when you use multi process. So I suggest that you can use get function to get s3 client. It works for me.
g_s3_cli = None
def get_s3_client(refresh=False):
global g_s3_cli
if not g_s3_cli or refresh:
g_s3_cli = boto3.client('s3')
return g_s3_cli

Multithreading with Selenium using Python and Telpot

I'm coding my first telegram bot, but now I have to serve multiple user at the same time.
This code it's just a little part, but it should help me to use multithread with selenium
class MessageCounter(telepot.helper.ChatHandler):
def __init__(self, *args, **kwargs):
super(MessageCounter, self).__init__(*args, **kwargs)
def on_chat_message(self, msg):
content_type, chat_type, chat_id = telepot.glance(msg)
chat_id = str(chat_id)
browser = browserSelenium.start_browser(chat_id)
userIsLogged = igLogin.checkAlreadyLoggedIn(browser, chat_id)
print(userIsLogged)
TOKEN = "***"
bot = telepot.DelegatorBot(TOKEN, [
pave_event_space()(
per_chat_id(), create_open, MessageCounter, timeout=10),
])
MessageLoop(bot).run_as_thread()
while 1:
time.sleep(10)
when the bot recive any message it starts a selenium session calling this function:
def start_browser(chat_id):
global browser
try:
browser.get('https://www.google.com')
#igLogin.checkAlreadyLoggedIn(browser)
#links = telegram.getLinks(24)
#instagramLikes(browser, links)
except Exception as e:
print("type error: " + str(e))
print('No such session! starting webDivers!')
sleep(3)
# CLIENT CONNECTION !!
chrome_options = Options()
chrome_options.add_argument('user-data-dir=/home/ale/botTelegram/users/'+ chat_id +'/cookies')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--lang=en')
print("Starting WebDrivers")
browser = webdriver.Chrome(options=chrome_options)
start_browser(chat_id)
return browser
and then this one check if the user is logged:
def checkAlreadyLoggedIn(browser, chat_id):
browser.get('https://www.instagram.com/instagram/')
try:
WebDriverWait(browser, 5).until(EC.element_to_be_clickable(
(By.XPATH, instagramClicks.buttonGoToProfile))).click()
print('User already Logged')
return True
except:
print('User not Logged')
userLogged = login(browser, chat_id)
return userLogged
and if the user is not logged it try to log the user in whit username and password
so, basically, when I write at the bot with one account everithing works fine, but if I write to the bot from two different account it opens two browser, but it controll just one.
What I mean it's that for example, one window remain over the google page, and then the other one recive two times the comand, so, even when it has to write the username, it writes the username two times
How can I interract with multiple sessions?
WebDriver is not thread-safe. Having said that, if you can serialise access to the underlying driver instance, you can share a reference in more than one thread. This is not advisable. But you can always instantiate one WebDriver instance for each thread.
Ideally the issue of thread-safety isn't in your code but in the actual browser bindings. They all assume there will only be one command at a time (e.g. like a real user). But on the other hand you can always instantiate one WebDriver instance for each thread which will launch multiple browsing tabs/windows. Till this point it seems your program is perfect.
Now, different threads can be run on same Webdriver, but then the results of the tests would not be what you expect. The reason behind is, when you use multi-threading to run different tests on different tabs/windows a little bit of thread safety coding is required or else the actions you will perform like click() or send_keys() will go to the opened tab/window that is currently having the focus regardless of the thread you expect to be running. Which essentially means all the test will run simultaneously on the same tab/window that has focus but not on the intended tab/window.
Reference
You can find a relevant detailed discussion in:
Chrome crashes after several hours while multiprocessing using Selenium through Python

track (un)succesfull object initialisation

I have a python script that controls different test-instruments (signal generator, amplifier, spectrum analyzer...) to automate a test.
These devices communicate over ethernet or serial with the pc running this python script.
I wrote a class for each device that I use. The script starts with initializing an instance of those classes. Something like this:
multimeter = Multimeter(192.168.1.5,5025)
amplifier = Amplifier(192.168.1.9,5025)
stirrer = Stirrer('COM4',9600)
.....
This can co wrong in many ways (battery is low, device not turned on, cable not connected ... )
It's possible to catch the errors with try/catch - try-except:
try:
multimeter = Multimeter(192.168.1.5,5025)
amplifier = Amplifier(192.168.1.9,5025)
stirrer = Stirrer('COM4',9600)
.....
except:
multimeter.close()
amplifier.close()
stirrer.close()
But now the problem is inside the except code block... We are not sure if the initialization of the objects succeeded and if they exist. They may not exist and so we can't call the close() method.
Because creating the instances is just normal sequential code, I know that when creating an instance of one of my classes fails, all the instances of the other classes previous to that line of code succeed. So you can try to create an instance of every class and check if that fails or not, and if it fails closing the connections of all previous objects.
try:
multimeter = Multimeter(192.168.1.5,5025)
except:
#problem with the multimeter
print('error')
try:
amplifier = Amplifier(192.168.1.9,5025)
except:
#problem with the amplifier, but we can close the multimeter
multimeter.close()
try:
stirrer = Stirrer('COM4',9600)
except:
#problem with the stirrer, but we can close the multimeter and the
amplifier
multimeter.close()
amplifier.close()
....
But I think this is ugly code? In particular when the number of objects (here test instruments grows, this becomes unmanageable. And it's sensitive for errors when you want to add or remove an object... Is there a better way to be sure that all connections are closed? Sockets should be closed on failure so we can assign the ip-address and port to a socket the next time the script is executed. Same with the serial interfaces, if it's not closed, it will raise an error to because you can't connect to a serial interface that already is open...
Use a container to store already created instruments, and split your code in short, independent, manageable parts:
def create_instruments(defs):
instruments = {}
for key, cls, params in instruments_defs:
try:
instruments[key] = cls(*params)
except Exception as e:
print("failed to instanciate '{}': {}".format(key, e))
close_instruments(instruments)
raise
return instruments
def close_instruments(intruments):
for key, instrument in intruments.items():
try:
instrument.close()
except Exception as e:
# just mention it - we can't do much more anyway
print("got error {} when closing {}".format(e, key))
instruments_defs = [
#(key, classname, (param1, ...)
("multimeter", Multimeter, ("192.168.1.5", 5025)),
("amplifier", Amplifier, ("192.168.1.9" ,5025)),
("stirrer", Stirrer, ('COM4',9600)),
]
instruments = create_instruments(instruments_defs)
You may also want to have a look at context managers (making sure resources are properly released is the main reason of context managers) but it might not necessarily be the best choice here (depends on how you use those objects, how your code is structured etc).
In fact, the solution that I'm suggesting in my question is the easiest way to solve this issue. In the try block, the script tries to initialize the instances one by one.
If you close the objects in the same order that they're created in the try block, then closing the connection will succeed for every test instrument, except for the instruments that where not initialized because of the error that happened in the try block.
(see comments in code snippet)
try:
multimeter = Multimeter(192.168.1.5,5025) #succes
amplifier = Amplifier(192.168.1.9,5025) #succes
stirrer = Stirrer('COM4',9600) # error COM4 is not available --> jump to except
generator = Generator() #not initialized because of error in stirrer init
otherTestInstrument = OtherTestInsrument() #not initialized because of error in stirrer init
.....
except:
multimeter.close() #initialized in try, so close() works
amplifier.close() #initialized in try, so close() works
stirrer.close() #probably initialized in try, so close() works probably
generator.close() #not initialized, will raise error, but doesn't matter.
otherTestInstrument.close() #also not initialized. No need to close it too.

SqlAlchemy sessions get stuck when not using them

I'm having a hard time implementing a "MySqlClient" class for my application. My application consists of several modules which have to make use of my database & some of the modules are running on other threads.
My intention is to make an instance for every module that needs to communicate with my MySql database. For example: every client connecting to a websocket server creates his own instance, a telegram bot client has its own instance, ..
I've been searching for days now, I've read the docs, searched the forums .. but somehow I'm missing something or I'm not implementing it the right way.
This is my class:
class MySqlClient():
engine = None
Session = None
def __init__(self):
# create engine
if MySqlClient.engine == None:
MySqlClient.engine = sqlalchemy.create_engine("mysql+mysqlconnector://{0}:{1}#{2}/{3}".format(
state.config["mysql"]["user"],
state.config["mysql"]["password"],
state.config["mysql"]["host"],
state.config["mysql"]["database"]
))
MySqlClient.Session = scoped_session(sessionmaker(bind=MySqlClient.engine))
Base.metadata.create_all(MySqlClient.engine)
self.session = MySqlClient.Session()
def get_budget(self, budget_id):
try:
q = self.session.query(
(Budget.amount).label("budgetAmount"),
func.sum(BudgetRecord.amount).label("total")
).all().filter(Budget.id == budget_id).join(BudgetRecord).filter(extract("month", BudgetRecord.ts) == datetime.datetime.now().month)
self.session.close()
return { "budgetAmount": q[0].budgetAmount, "total": 0.0 if q[0].total == None else q[0].total }
except Exception as ex:
logging.error(ex)
return None
When I start my application everything runs fine, I can execute the method "get_budget" returning the data. However, if after this I wait for 5 minutes, the method won't run again (if I don't wait, it still works). After about 15 minutes after I made the call, the query finally fails saying the MySql connection has dropped:
(mysql.connector.errors.OperationalError) MySQL Connection not available.
I also tried getting a new session before executing new queries. That didn't help either.
I've done things like this before but it's the first time I'm using an ORM & I'd like to keep the benefits of using ORM.
Any help would be greatly appreciated,
Regards

Setting and Getting Variables within an imported Class

All, I'm implementing websockets using flask/uWSGI. This is relegated to a module that's instantiated in the main application. Redacted code for the server and module:
main.py
from WSModule import WSModule
app = Flask(__name__)
wsmodule = WSModule()
websock = WebSocket(app)
#websock.route('/websocket')
def echo(ws):
wsmodule.register(ws)
print("websock clients", wsmodule.clients)
while True: # This while loop is related to the uWSGI websocket implementation
msg = ws.receive()
if msg is not None:
ws.send(msg)
else: return
#app.before_request
def before_request():
print ("app clients:",wsmodule.clients)
and WSModule.py:
class WSModule(object):
def __init__(self):
self.clients = list()
def register(self, client):
self.clients.append(client)
Problem: When a user connects using websockets (into the '/websocket' route), the wsmodule.register appends their connection socket, this works fine- printout 'websocket clients' shows the appended connection.
The issue is that I can't access those sockets from the main application. This is seen by the 'app clients' printout which never updates (list stays empty). Something is clearly updating, but how to access those changes?
It sounds like your program is being run with either threads or processes, and a wsmodule exists for each thread/process that is running.
So one wsmodule is being updated with the client info, while a different one is being asked for clients... but the one being asked is still empty.
If you are using threads, check out thread local storage.

Resources