Joblib threading with sqlalchemy not running parallel - multithreading
I am attempting to use joblib to run multiple inserts into my AWS Aurora Serverless database in parallel. However, my joblib setup is evidently not running the inserts concurrently.
import time
import pandas as pd
from joblib import Parallel, delayed
from sqlalchemy import MetaData, Table

def insert_delay(chunk):
    print("woke up")
    t1 = time.time()
    engine, connection = form_target_connection()  # forms a SQLAlchemy connection to the database
    table = Table('table', MetaData(), autoload_with=engine)
    t2 = time.time()
    engine.execute(table.insert().values(chunk))
    print("time to form engine: {0}".format(t2 - t1))
    print("time to insert: {0}".format(time.time() - t2))
    print("total time: {0}".format(time.time() - t1))
table = {
    'id': [x for x in range(0, 200000)],
    'field_1': ["A"] * 200000,
    'field_2': ["A"] * 200000,
    'field_3': [None] * 200000,
    'field_4': [None] * 200000,
    'plan_id': [None] * 200000,
}
rows = pd.DataFrame(table)
rows = rows.to_dict("records")
# chunked list of records to insert
split_list = [rows[i:i + 10000] for i in range(0, len(rows), 10000)]

t1 = time.time()
Parallel(n_jobs=len(split_list), prefer="threads")(
    delayed(insert_delay)(chunk=chunk) for chunk in split_list
)
t2 = time.time()
print("total time taken {0} seconds".format(t2 - t1))
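The chunking step can be sanity-checked on its own; a minimal sketch of the same slicing logic, using a made-up small input instead of the 200,000-row record list:

```python
def chunked(seq, size):
    # same slicing pattern used to build split_list above
    return [seq[i:i + size] for i in range(0, len(seq), size)]

parts = chunked(list(range(25)), 10)
print(len(parts))      # 3 chunks
print(len(parts[-1]))  # last chunk holds the 5 leftover items
```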
It appears as though all of the threads wait for an unknown reason until they all decide it is time to run, and then they are dispatched sequentially. The database receives the initial query early on, at the start of the process, and then waits to commit the insert.
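To rule out the thread backend itself, the dispatch can be reproduced with stdlib threading and pure sleeps: if the sleeps overlap, thread scheduling is fine and the serialization must come from the database path. A sketch, with made-up sleep lengths:

```python
import threading
import time

def fake_insert(delay):
    # pure I/O-style wait: time.sleep releases the GIL, so threads should overlap
    time.sleep(delay)

t0 = time.time()
threads = [threading.Thread(target=fake_insert, args=(0.2,)) for _ in range(10)]
for th in threads:
    th.start()
for th in threads:
    th.join()
elapsed = time.time() - t0
print(round(elapsed, 1))  # close to 0.2 if threads overlap, not 10 * 0.2
```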
Output from the database's pg_stat_activity view:
27881 | 2022-12-26 09:47:44.946317+00 | usename | COMMIT
27931 | 2022-12-26 09:47:12.779723+00 | usename | ROLLBACK
27935 | 2022-12-26 09:47:12.861166+00 | usename | ROLLBACK
27951 | 2022-12-26 09:47:44.633009+00 | usename | COMMIT
27953 | 2022-12-26 09:47:45.757837+00 | usename | COMMIT
27967 | 2022-12-26 09:47:13.604694+00 | usename | ROLLBACK
27968 | 2022-12-26 09:47:44.337865+00 | usename | COMMIT
28083 | 2022-12-26 09:47:14.296074+00 | usename | ROLLBACK
28085 | 2022-12-26 09:47:14.820437+00 | usename | SELECT 1
28094 | 2022-12-26 09:47:15.092353+00 | usename | ROLLBACK
28156 | 2022-12-26 09:47:15.375427+00 | usename | SELECT 1
28189 | 2022-12-26 09:47:15.958231+00 | usename | ROLLBACK
28191 | 2022-12-26 09:47:16.23949+00 | usename | SELECT 1
28258 | 2022-12-26 09:47:16.574562+00 | usename | ROLLBACK
28259 | 2022-12-26 09:47:16.643866+00 | usename | ROLLBACK
28260 | 2022-12-26 09:47:45.44661+00 | usename | COMMIT
The resulting stdout in the console:
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
woke up
time to form engine: 2.7899487018585205
time to insert: 47.19905924797058
total time: 49.996002197265625
time to form engine: 1.9699492454528809
time to insert: 47.956050157547
total time: 49.95300889015198
time to form engine: 4.682957887649536
time to insert: 48.341052532196045
total time: 53.034003019332886
time to form engine: 3.8029818534851074
time to insert: 51.04203963279724
total time: 54.90102791786194
time to form engine: 2.8709421157836914
time to insert: 52.00807189941406
total time: 54.89199662208557
time to form engine: 4.640952110290527
time to insert: 51.26903581619263
total time: 55.92101573944092
time to form engine: 6.852921009063721
time to insert: 49.272080421447754
total time: 56.13800382614136
time to form engine: 3.828951120376587
time to insert: 53.08606839179993
total time: 56.93200707435608
time to form engine: 3.1449291706085205
time to insert: 53.941075563430786
total time: 57.09799885749817
time to form engine: 3.44292950630188
time to insert: 54.4070725440979
total time: 57.856006145477295
time to form engine: 6.421938180923462
time to insert: 51.53507709503174
total time: 57.98200750350952
time to form engine: 2.355935573577881
time to insert: 56.50207161903381
total time: 58.868988037109375
time to form engine: 4.254934549331665
time to insert: 54.78908324241638
total time: 59.05601978302002
time to form engine: 5.564959287643433
time to insert: 53.25906705856323
total time: 58.838005781173706
time to form engine: 5.875969171524048
time to insert: 53.78607487678528
total time: 59.75204014778137
time to form engine: 5.7959606647491455
time to insert: 54.870073556900024
total time: 60.672032833099365
time to form engine: 4.455930233001709
time to insert: 56.447890758514404
total time: 60.90981578826904
time to form engine: 4.040930271148682
time to insert: 57.660929441452026
total time: 61.713823080062866
time to form engine: 27.4959716796875
time to insert: 35.441853284835815
total time: 62.942832469940186
time to form engine: 59.9250111579895
time to insert: 4.208839178085327
total time: 64.13585066795349
total time taken 64.68781971931458 seconds