Joblib threading with sqlalchemy not running parallel - multithreading
I am attempting to use joblib to run multiple inserts into my AWS Aurora Serverless database in parallel. However, my joblib setup is evidently not running the inserts concurrently.
import time
import pandas as pd
from joblib import Parallel, delayed
from sqlalchemy import MetaData, Table

def insert_delay(chunk):
    print("woke up")
    t1 = time.time()
    engine, connection = form_target_connection()  # forms a SQLAlchemy connection to the database
    table = Table('table', MetaData(), autoload_with=engine)
    t2 = time.time()
    engine.execute(table.insert().values(chunk))
    print("time to form engine: {0}".format(t2 - t1))
    print("time to insert: {0}".format(time.time() - t2))
    print("total time: {0}".format(time.time() - t1))
table = {
    'id': [x for x in range(0, 200000)],
    'field_1': ["A"] * 200000,
    'field_2': ["A"] * 200000,
    'field_3': [None] * 200000,
    'field_4': [None] * 200000,
    'plan_id': [None] * 200000,
}
rows = pd.DataFrame(table)
rows = rows.to_dict("records")
# chunked list of records to insert
split_list = [rows[i:i + 10000] for i in range(0, len(rows), 10000)]

t1 = time.time()
Parallel(n_jobs=len(split_list), prefer="threads")(
    delayed(insert_delay)(chunk=chunk) for chunk in split_list
)
t2 = time.time()
print("total time taken {0} seconds".format(t2 - t1))
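The chunking step can be sanity-checked on its own; a minimal sketch of the same slicing logic, using a made-up small input instead of the 200,000-row record list:

```python
def chunked(seq, size):
    # same slicing pattern used to build split_list above
    return [seq[i:i + size] for i in range(0, len(seq), size)]

parts = chunked(list(range(25)), 10)
print(len(parts))      # 3 chunks
print(len(parts[-1]))  # last chunk holds the 5 leftover items
```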
It appears as though all of the threads wait for an unknown reason until they all decide it is time to run, and then they are dispatched sequentially. The database receives the initial query early on, at the start of the process, and then waits to commit the insert.
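To rule out the thread backend itself, the dispatch can be reproduced with stdlib threading and pure sleeps: if the sleeps overlap, thread scheduling is fine and the serialization must come from the database path. A sketch, with made-up sleep lengths:

```python
import threading
import time

def fake_insert(delay):
    # pure I/O-style wait: time.sleep releases the GIL, so threads should overlap
    time.sleep(delay)

t0 = time.time()
threads = [threading.Thread(target=fake_insert, args=(0.2,)) for _ in range(10)]
for th in threads:
    th.start()
for th in threads:
    th.join()
elapsed = time.time() - t0
print(round(elapsed, 1))  # close to 0.2 if threads overlap, not 10 * 0.2
```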
Output from the database's pg_stat_activity view:
27881 | 2022-12-26 09:47:44.946317+00 | usename | COMMIT
27931 | 2022-12-26 09:47:12.779723+00 | usename | ROLLBACK
27935 | 2022-12-26 09:47:12.861166+00 | usename | ROLLBACK
27951 | 2022-12-26 09:47:44.633009+00 | usename | COMMIT
27953 | 2022-12-26 09:47:45.757837+00 | usename | COMMIT
27967 | 2022-12-26 09:47:13.604694+00 | usename | ROLLBACK
27968 | 2022-12-26 09:47:44.337865+00 | usename | COMMIT
28083 | 2022-12-26 09:47:14.296074+00 | usename | ROLLBACK
28085 | 2022-12-26 09:47:14.820437+00 | usename | SELECT 1
28094 | 2022-12-26 09:47:15.092353+00 | usename | ROLLBACK
28156 | 2022-12-26 09:47:15.375427+00 | usename | SELECT 1
28189 | 2022-12-26 09:47:15.958231+00 | usename | ROLLBACK
28191 | 2022-12-26 09:47:16.23949+00 | usename | SELECT 1
28258 | 2022-12-26 09:47:16.574562+00 | usename | ROLLBACK
28259 | 2022-12-26 09:47:16.643866+00 | usename | ROLLBACK
28260 | 2022-12-26 09:47:45.44661+00 | usename | COMMIT
The resulting stdout in the console:
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
time to form engine: 2.7899487018585205
time to insert: 47.19905924797058
total time: 49.996002197265625
time to form engine: 1.9699492454528809
time to insert: 47.956050157547
total time: 49.95300889015198
time to form engine: 4.682957887649536
time to insert: 48.341052532196045
total time: 53.034003019332886
time to form engine: 3.8029818534851074
time to insert: 51.04203963279724
total time: 54.90102791786194
time to form engine: 2.8709421157836914
time to insert: 52.00807189941406
total time: 54.89199662208557
time to form engine: 4.640952110290527
time to insert: 51.26903581619263
total time: 55.92101573944092
time to form engine: 6.852921009063721
time to insert: 49.272080421447754
total time: 56.13800382614136
time to form engine: 3.828951120376587
time to insert: 53.08606839179993
total time: 56.93200707435608
time to form engine: 3.1449291706085205
time to insert: 53.941075563430786
total time: 57.09799885749817
time to form engine: 3.44292950630188
time to insert: 54.4070725440979
total time: 57.856006145477295
time to form engine: 6.421938180923462
time to insert: 51.53507709503174
total time: 57.98200750350952
time to form engine: 2.355935573577881
time to insert: 56.50207161903381
total time: 58.868988037109375
time to form engine: 4.254934549331665
time to insert: 54.78908324241638
total time: 59.05601978302002
time to form engine: 5.564959287643433
time to insert: 53.25906705856323
total time: 58.838005781173706
time to form engine: 5.875969171524048
time to insert: 53.78607487678528
total time: 59.75204014778137
time to form engine: 5.7959606647491455
time to insert: 54.870073556900024
total time: 60.672032833099365
time to form engine: 4.455930233001709
time to insert: 56.447890758514404
total time: 60.90981578826904
time to form engine: 4.040930271148682
time to insert: 57.660929441452026
total time: 61.713823080062866
time to form engine: 27.4959716796875
time to insert: 35.441853284835815
total time: 62.942832469940186
time to form engine: 59.9250111579895
time to insert: 4.208839178085327
total time: 64.13585066795349
total time taken 64.68781971931458 seconds