I have tested the same Python code on two different computers. On the first one the code takes 9 s, and on the second one (a more powerful machine, with double the RAM of the first, 16 vs. 8) it takes 185 s. Analyzing it with cProfile, the most critical call in both cases is WaitForSingleObject. Analyzing one specific function, I can see that the critical part is the OCR with tesseract. Why is the performance so different on these two machines?
The main lines from cProfile for this specific function are:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.002 0.002 115.398 115.398 bpl-Redonda4.py:261(pega_stack_nome_jogadores)
18 0.000 0.000 0.001 0.000 pytesseract.py:106(prepare)
18 0.000 0.000 0.118 0.007 pytesseract.py:116(save_image)
18 0.000 0.000 0.000 0.000 pytesseract.py:140(subprocess_args)
18 0.000 0.000 115.186 6.399 pytesseract.py:162(run_tesseract)
18 0.001 0.000 115.373 6.410 pytesseract.py:199(run_and_get_output)
12 0.000 0.000 76.954 6.413 pytesseract.py:295(image_to_string)
12 0.000 0.000 76.954 6.413 pytesseract.py:308()
6 0.000 0.000 38.419 6.403 pytesseract.py:328(image_to_boxes)
6 0.000 0.000 38.419 6.403 pytesseract.py:345()
18 0.000 0.000 0.060 0.003 pytesseract.py:97(cleanup)
18 0.000 0.000 115.096 6.394 subprocess.py:979(wait)
18 115.096 6.394 115.096 6.394 {built-in method _winapi.WaitForSingleObject}
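Since nearly all of the cumulative time sits under WaitForSingleObject, the Python side is simply blocked waiting for the external tesseract process to finish; the difference between the machines is in how fast tesseract itself starts and runs, not in the Python code. One way to confirm this is to time the external command directly, outside Python's profiler. A minimal sketch (the helper name and the cheap stand-in command are mine; with tesseract installed you would time its command line instead):

```python
import subprocess
import sys
import time

def time_command(cmd, runs=3):
    # wall-clock the spawn + execution of an external command; on Windows,
    # cProfile attributes this whole span to _winapi.WaitForSingleObject
    best = float("inf")
    for _ in range(runs):
        t0 = time.perf_counter()
        subprocess.run(cmd, capture_output=True)
        best = min(best, time.perf_counter() - t0)
    return best

# Cheap stand-in command; with tesseract installed you would time something
# like ["tesseract", "page.png", "stdout"] (hypothetical file name) instead.
fastest = time_command([sys.executable, "--version"])
```

If the per-invocation time differs between the two machines the same way the cProfile totals do, the cause lies in the tesseract installation or the OS process-spawn cost, not in pytesseract.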
Hi there, I'm struggling to make my I/O-bound app fast enough for potential users.
I'm fetching some number of URLs, say 10 for example, using multithreading with one thread per URL, but that takes too long. I've run cProfile on my code and I see that the bottleneck is in
"{method 'acquire' of '_thread.lock' objects}"
In the cProfile result I noticed that the 'acquire' method is called 9 times per thread.
Can anybody please shed some light on how I can reduce the number of calls per thread?
Here is a sample code:
import requests
from concurrent.futures import ThreadPoolExecutor

url_to_get = ["https://api.myip.com"] * 10  # ten copies of the same URL

proxy = None  # proxy settings are omitted in the original; None disables them

def fetch(url):
    with requests.get(url, proxies=proxy) as response:
        print(response.text)

def main():
    with ThreadPoolExecutor(max_workers=10) as executor:
        executor.map(fetch, url_to_get)
if __name__ == '__main__':
    import cProfile, pstats
    profiler = cProfile.Profile()
    profiler.enable()
    main()
    profiler.disable()
    stats = pstats.Stats(profiler).sort_stats('tottime')
    stats.print_stats(10)
cProfile results:
ncalls tottime percall cumtime percall filename:lineno(function)
90 3.581 0.040 3.581 0.040 {method 'acquire' of '_thread.lock' objects}
10 0.001 0.000 0.001 0.000 C:\Users\MINOUSAT\AppData\Local\Programs\Python\Python38-32\lib\threading.py:1177(_make_invoke_excepthook)
10 0.001 0.000 0.001 0.000 {built-in method _thread.start_new_thread}
10 0.000 0.000 0.028 0.003 C:\Users\MINOUSAT\AppData\Local\Programs\Python\Python38-32\lib\concurrent\futures\thread.py:193(_adjust_thread_count)
20 0.000 0.000 0.025 0.001 C:\Users\MINOUSAT\AppData\Local\Programs\Python\Python38-32\lib\threading.py:270(wait)
21 0.000 0.000 0.000 0.000 C:\Users\MINOUSAT\AppData\Local\Programs\Python\Python38-32\lib\threading.py:222(__init__)
10 0.000 0.000 0.028 0.003 C:\Users\MINOUSAT\AppData\Local\Programs\Python\Python38-32\lib\concurrent\futures\thread.py:158(submit)
32 0.000 0.000 0.000 0.000 {built-in method _thread.allocate_lock}
10 0.000 0.000 0.001 0.000 C:\Users\MINOUSAT\AppData\Local\Programs\Python\Python38-32\lib\threading.py:761(__init__)
10 0.000 0.000 0.025 0.002 C:\Users\MINOUSAT\AppData\Local\Programs\Python\Python38-32\lib\threading.py:540(wait)
Thank you so much
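A likely explanation rather than a fix: cProfile only measures the main thread, and the main thread spends almost the entire run blocked on internal locks while the worker threads wait on network I/O, so the 'acquire' time is idle waiting, not contention you can remove. A small sketch (with a time.sleep stand-in for requests.get, so no network is needed) showing that the threaded version's wall time is roughly one request, not ten:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(url):
    # stand-in for requests.get(url): block for 0.2 s the way network I/O would
    time.sleep(0.2)
    return url

urls = ["https://api.myip.com"] * 10

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(fake_fetch, urls))
elapsed = time.perf_counter() - t0
# Ten 0.2 s tasks finish in roughly 0.2 s of wall time, not 2 s. The time a
# profiler books under lock.acquire is the main thread parked here waiting
# for the workers, so reducing the acquire count would not speed anything up.
```

If the run is still slow, the place to look is the per-request latency (DNS, proxy, server response), not the lock calls.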
Running cProfile on anything (for example a mergeSort), I'm getting all 0.000's in the runtimes, and the key lines/variables/methods aren't listed in the output; it only seems to report methods and internals. Please advise.
Below are my results for a mergeSort. I've tried running
python -m cProfile [mergeSort(lst)]
with and without the brackets; I saw both in documentation. The only version I can get to work is:
import cProfile
cProfile.run(mergeSort(lst))
or the enable()/disable() method shown.
The formatting doesn't turn out well, so I attached an image.
Results:
'''
[17, 20, 26, 31, 44, 54, 55, 77, 93]
127 function calls (111 primitive calls) in 0.000 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
17/1 0.000 0.000 0.000 0.000 :1(mergeSort)
1 0.000 0.000 0.000 0.000 :36()
1 0.000 0.000 0.000 0.000 :37()
2 0.000 0.000 0.000 0.000 codeop.py:132(__call__)
2 0.000 0.000 0.000 0.000 hooks.py:142(__call__)
2 0.000 0.000 0.000 0.000 hooks.py:207(pre_run_code_hook)
2 0.000 0.000 0.000 0.000 interactiveshell.py:1104(user_global_ns)
2 0.000 0.000 0.000 0.000 interactiveshell.py:2933(run_code)
2 0.000 0.000 0.000 0.000 ipstruct.py:125(__getattr__)
2 0.000 0.000 0.000 0.000 {built-in method builtins.compile}
2 0.000 0.000 0.000 0.000 {built-in method builtins.exec}
91 0.000 0.000 0.000 0.000 {built-in method builtins.len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
'''
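The zeros are almost certainly just display resolution: sorting nine elements takes microseconds, which rounds to 0.000 at cProfile's millisecond precision. Profiling a much larger input makes the numbers visible; note also that cProfile.run expects a code string such as 'mergeSort(lst)', not the result of calling the function. A sketch (with a minimal merge sort of my own, since the question's implementation isn't shown):

```python
import cProfile
import pstats
import random

def mergeSort(lst):
    # hypothetical minimal merge sort, standing in for the question's version
    if len(lst) <= 1:
        return lst
    mid = len(lst) // 2
    left, right = mergeSort(lst[:mid]), mergeSort(lst[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

lst = [random.randrange(10**6) for _ in range(100_000)]
prof = cProfile.Profile()
prof.enable()
out = mergeSort(lst)
prof.disable()
total = pstats.Stats(prof).total_tt  # now well above the 0.000 display floor
```

With 100,000 elements the mergeSort line shows real tottime/cumtime values, and user-defined functions appear alongside the built-ins.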
I’m trying to speed up some code that should run fast on both Linux and Windows. However, the same code takes 131 seconds on Fedora 25, while on Windows 7 it takes only 90 seconds (both computers have 8 GB of RAM, with an i7 and an i5 processor, respectively). I’m using Python 3.5 on Fedora and 3.6 on Windows.
The code is the following:
from math import ceil
from multiprocessing import Process, Queue, cpu_count
import numpy as np

nprocs = cpu_count()
chunksize = ceil(nrFrames / nprocs)
queue = Queue()
jobs = []
for i in range(nprocs):
    start = chunksize * i
    if i == nprocs - 1:
        end = nrFrames
    else:
        end = chunksize * (i + 1)
    trjCoordsProcess = DAH_Coords[start:end]
    p = Process(target=is_hbond, args=(queue, trjCoordsProcess, distCutOff,
                                       angleCutOff, AList, DList, HList))
    jobs.append(p)
    p.start()  # launch the worker (the start call is missing from the snippet as pasted)

HbondFreqMatrix = queue.get()
for k in range(nprocs - 1):
    HbondFreqMatrix = np.add(HbondFreqMatrix, queue.get())

for proc in jobs:
    proc.join()

def is_hbond(queue, processCoords, distCutOff, angleCutOff,
             possibleAPosList, donorsList, HCovBoundPosList):
    for frame in range(len(processCoords)):
        ...  # do stuff
    queue.put(HbondProcessFreqMatrix)
The start method of each process is actually considerably faster in Linux than in Windows. However, each iteration inside the is_hbond function takes 2.5 times longer in Linux (0.5 vs 0.2s).
The profiler gives the following information:
Windows
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.167 0.167 84.139 84.139 calculateHbonds
4 0.000 0.000 52.039 13.010 \Python36\lib\multiprocessing\queues.py:91(get)
4 0.000 0.000 51.928 12.982 \Python36\lib\multiprocessing\connection.py:208(recv_bytes)
4 0.018 0.004 51.928 12.982 \Python36\lib\multiprocessing\connection.py:294(_recv_bytes)
4 51.713 12.928 51.713 12.928 {built-in method _winapi.WaitForMultipleObjects}
4 0.000 0.000 30.811 7.703 \Python36\lib\multiprocessing\process.py:95(start)
4 0.000 0.000 30.811 7.703 \Python36\lib\multiprocessing\context.py:221(_Popen)
4 0.000 0.000 30.811 7.703 \Python36\lib\multiprocessing\context.py:319(_Popen)
4 0.000 0.000 30.809 7.702 popen_spawn_win32.py:32(__init__)
8 1.958 0.245 30.804 3.851 \Python36\lib\multiprocessing\reduction.py:58(dump)
8 28.846 3.606 28.846 3.606 {method 'dump' of '_pickle.Pickler' objects}
Linux
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.203 0.203 123.169 123.169 calculateHbonds
4 0.000 0.000 121.450 30.362 /python3.5/multiprocessing/queues.py:91(get)
4 0.000 0.000 121.300 30.325 /python3.5/multiprocessing/connection.py:208(recv_bytes)
4 0.019 0.005 121.300 30.325 /python3.5/multiprocessing/connection.py:406(_recv_bytes)
8 0.000 0.000 121.281 15.160 /python3.5/multiprocessing/connection.py:374(_recv)
8 121.088 15.136 121.088 15.136 {built-in method posix.read}
1 0.000 0.000 0.082 0.082 /python3.5/multiprocessing/context.py:98(Queue)
17/4 0.000 0.000 0.082 0.021 <frozen importlib._bootstrap>:939(_find_and_load_unlocked)
16/4 0.000 0.000 0.082 0.020 <frozen importlib._bootstrap>:659(_load_unlocked)
4 0.000 0.000 0.052 0.013 /python3.5/multiprocessing/process.py:95(start)
4 0.000 0.000 0.052 0.013 /python3.5/multiprocessing/context.py:210(_Popen)
4 0.000 0.000 0.052 0.013 /python3.5/multiprocessing/context.py:264(_Popen)
4 0.000 0.000 0.051 0.013 /python3.5/multiprocessing/popen_fork.py:16(__init__)
4 0.000 0.000 0.051 0.013 /python3.5/multiprocessing/popen_fork.py:64(_launch)
4 0.050 0.013 0.050 0.013 {built-in method posix.fork}
Is there a reason why this might be the case? I know the multiprocessing module works differently in Linux and Windows due to the lack of os.fork in Windows, but I thought Linux should be faster.
Any ideas on how to make it faster in Linux?
Thank you!
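For what it's worth, both profiles show the parent spending nearly all of its time in queue.get (posix.read on Linux, WaitForMultipleObjects on Windows), i.e. waiting for the children to finish and then unpickling their results, so the real cost is in the workers themselves and in serialization. One way to simplify the manual chunking and queueing is multiprocessing.Pool, sketched here with a placeholder computation (is_hbond_chunk and the toy data are stand-ins I made up, not the question's real function):

```python
import numpy as np
from multiprocessing import Pool, cpu_count

def is_hbond_chunk(coords_chunk):
    # placeholder for the real per-frame analysis: each chunk returns one small
    # frequency matrix, so only a tiny result has to be pickled back
    freq = np.zeros((4, 4))
    for frame in coords_chunk:
        freq += frame > 0.5  # toy "bond detected" criterion
    return freq

# toy stand-in for DAH_Coords: 100 frames of 4x4 coordinates
DAH_Coords = np.random.rand(100, 4, 4)
chunks = np.array_split(DAH_Coords, cpu_count())
# note: on Windows (spawn start method) this needs an `if __name__ == '__main__':` guard
with Pool(cpu_count()) as pool:
    partials = pool.map(is_hbond_chunk, chunks)  # Pool handles transfer and queueing
HbondFreqMatrix = np.sum(partials, axis=0)
```

Keeping the per-worker return value small (a summed matrix rather than per-frame data) reduces the pickling that dominates both profiles.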
Hello, I have a numpy optimization problem.
Below I have written a piece of code that's quite common for my type of calculations. The calculation always takes some time that I think should be shorter. I think the problem is the loop. I have looked at the linalg part of numpy but I can't find a solution there. I also searched for a way to vectorize the data, but since I don't have much experience with that, I can't find any solution.
I hope somebody can help me.
import numpy as np
from scipy import signal
from scipy.fftpack import fft

fs = 44100  # sampling frequency
T = 5  # max time
t = np.arange(0, T*fs)/fs  # time array
x = np.sin(2 * np.pi * 100 * t) + 0.7 * np.sin(2 * np.pi * 880 * t) + 0.2 * np.sin(2 * np.pi * 2400 * t)

# Define window length and window:
wl = 4  # window length
overlap = 0.5
W = signal.get_window('hanning', wl)  # window
Wx = np.zeros(len(x))
ul = wl

# loop added for window
if (len(x) / wl) % wl == 0:
    while ul <= len(Wx):
        Wx[ul-wl:ul] += x[ul-wl:ul] * W
        ul += int(wl * overlap)  # cast to int so the slice indices stay integers
else:
    dsample = int((len(x) / wl) % wl)  # delta in samples between mod (x / window length)
    x = np.append(x, np.zeros(wl - dsample))
    while ul <= len(Wx):
        Wx[ul-wl:ul] += x[ul-wl:ul] * W
        ul += int(wl * overlap)

NFFT = int(2 ** np.ceil(np.log2(len(x))))
NFFW = int(2 ** np.ceil(np.log2(len(Wx))))

# Frequency spectra
X = fft(x, NFFT)
WX = fft(Wx, NFFW)
Profiler:
%run -p example.py
110367 function calls (110366 primitive calls) in 19.998 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 19.561 19.561 19.994 19.994 example.py:6(<module>)
110258 0.233 0.000 0.233 0.000 {built-in method len}
2 0.181 0.091 0.189 0.095 basic.py:169(fft)
2 0.008 0.004 0.008 0.004 basic.py:131(_fix_shape)
2 0.008 0.004 0.008 0.004 {built-in method concatenate}
1 0.003 0.003 0.003 0.003 {built-in method compile}
2 0.002 0.001 0.002 0.001 {built-in method arange}
2 0.001 0.000 0.001 0.000 {built-in method open}
4 0.000 0.000 0.000 0.000 {built-in method zeros}
1 0.000 0.000 19.998 19.998 interactiveshell.py:2496(safe_execfile)
2/1 0.000 0.000 19.998 19.998 {built-in method exec}
1 0.000 0.000 0.000 0.000 windows.py:615(hann)
1 0.000 0.000 19.997 19.997 py3compat.py:108(execfile)
1 0.000 0.000 0.000 0.000 {method 'read' of '_io.BufferedReader' objects}
2 0.000 0.000 0.008 0.004 function_base.py:3503(append)
1 0.000 0.000 0.000 0.000 posixpath.py:318(normpath)
1 0.000 0.000 0.000 0.000 windows.py:1380(get_window)
1 0.000 0.000 0.000 0.000 posixpath.py:145(dirname)
4 0.000 0.000 0.000 0.000 {built-in method array}
2 0.000 0.000 0.000 0.000 {built-in method round}
1 0.000 0.000 0.000 0.000 {built-in method getcwd}
2 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:2264(_handle_fromlist)
2 0.000 0.000 0.000 0.000 basic.py:116(_asfarray)
4 0.000 0.000 0.000 0.000 basic.py:24(istype)
2 0.000 0.000 0.000 0.000 fromnumeric.py:1281(ravel)
8 0.000 0.000 0.000 0.000 {built-in method isinstance}
1 0.000 0.000 0.000 0.000 posixpath.py:70(join)
2 0.000 0.000 0.000 0.000 numeric.py:462(asanyarray)
1 0.000 0.000 0.000 0.000 posixpath.py:355(abspath)
8 0.000 0.000 0.000 0.000 {built-in method hasattr}
1 0.000 0.000 19.998 19.998 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 syspathcontext.py:64(__exit__)
1 0.000 0.000 0.000 0.000 posixpath.py:221(expanduser)
1 0.000 0.000 0.000 0.000 _bootlocale.py:23(getpreferredencoding)
1 0.000 0.000 0.000 0.000 syspathcontext.py:57(__enter__)
1 0.000 0.000 0.000 0.000 syspathcontext.py:54(__init__)
4 0.000 0.000 0.000 0.000 {built-in method issubclass}
3 0.000 0.000 0.000 0.000 posixpath.py:38(_get_sep)
2 0.000 0.000 0.000 0.000 {method 'ravel' of 'numpy.ndarray' objects}
2 0.000 0.000 0.000 0.000 numeric.py:392(asarray)
7 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {built-in method nl_langinfo}
5 0.000 0.000 0.000 0.000 {method 'startswith' of 'str' objects}
1 0.000 0.000 0.000 0.000 codecs.py:306(__init__)
1 0.000 0.000 0.000 0.000 posixpath.py:60(isabs)
1 0.000 0.000 0.000 0.000 {method 'split' of 'str' objects}
1 0.000 0.000 0.000 0.000 codecs.py:257(__init__)
2 0.000 0.000 0.000 0.000 {method 'setdefault' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'rfind' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'remove' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'rstrip' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'endswith' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'insert' of 'list' objects}
1 0.000 0.000 0.000 0.000 {built-in method getdefaultencoding}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 py3compat.py:13(no_code)
Precalculating static values shortens my loop from ~4 s to 0.7 s execution time:

nEntries = len(Wx)
step = int(wl * overlap)
while ul <= nEntries:
    Wx[ul-wl:ul] += x[ul-wl:ul] * W
    ul += step
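Beyond precalculating, the loop itself can be vectorized for this 50%-overlap case: window starts that are wl apart don't overlap, so the windows can be grouped by their start offset and each group applied in one reshaped operation. A sketch under the assumption that wl is a multiple of the step and len(x) is a multiple of wl (the helper name overlap_add is mine):

```python
import numpy as np

def overlap_add(x, W, overlap=0.5):
    wl = len(W)
    step = int(wl * overlap)
    Wx = np.zeros(len(x))
    # only wl // step iterations (2 for 50% overlap) instead of len(x) // step
    for off in range(0, wl, step):
        # windows starting at off, off + wl, off + 2*wl, ... are disjoint,
        # so one reshape applies W to all of them at once
        n = (len(x) - off) // wl
        seg = x[off:off + n * wl].reshape(n, wl)
        Wx[off:off + n * wl] += (seg * W).ravel()
    return Wx

# With a rectangular window every interior sample is covered by two windows:
Wx = overlap_add(np.ones(8), np.ones(4))  # → [1, 1, 2, 2, 2, 2, 1, 1]
```

This replaces the per-window Python loop with two large array operations, which is where numpy's speed comes from.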
I have a grammar that is an extension of the Python grammar, and small programs take about 2 seconds to parse on a MacBook Pro. I have taken the SLL trick and applied it:
# Set up the lexer
inputStream = InputStream(s)
lexer = CustomLexer(inputStream)
stream = CommonTokenStream(lexer)

# Set up the error handling stuff
error_handler = CustomErrorStrategy()
error_listener = CustomErrorListener()
buffered_errors = BufferedErrorListener()
error_listener.addDelegatee(buffered_errors)

# Set up the fast parser
parser = PythonQLParser(stream)
parser._interp.predictionMode = PredictionMode.SLL
parser.removeErrorListeners()
parser.errHandler = BailErrorStrategy()
try:
    tree = parser.file_input()
    return (tree, parser)
But it didn't do the trick; the time didn't change significantly. Any hints on what to do?
I'm using Python 3 with antlr4-python3-runtime-4.5.3.
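One thing worth double-checking: in the Python runtime the parser's error-strategy field is `_errHandler` (with a leading underscore), so assigning to `parser.errHandler` may simply create a new, unused attribute and leave the default strategy in place, which would explain why SLL plus bail shows no effect. The usual pattern is also two-stage: try SLL with a bail strategy, and only fall back to full LL when the fast pass fails. A sketch against antlr4-python3-runtime, reusing the question's CustomLexer and PythonQLParser (the attribute names reflect my reading of the runtime; verify against your version):

```
from antlr4 import CommonTokenStream, InputStream
from antlr4.atn.PredictionMode import PredictionMode
from antlr4.error.ErrorStrategy import BailErrorStrategy, DefaultErrorStrategy
from antlr4.error.Errors import ParseCancellationException

def parse(s):
    stream = CommonTokenStream(CustomLexer(InputStream(s)))
    parser = PythonQLParser(stream)
    # Stage 1: fast SLL prediction, bail out on the first error
    parser._interp.predictionMode = PredictionMode.SLL
    parser._errHandler = BailErrorStrategy()  # note: _errHandler, not errHandler
    parser.removeErrorListeners()
    try:
        return parser.file_input()
    except ParseCancellationException:
        # Stage 2: rare fallback to full LL with normal error recovery
        stream.seek(0)
        parser.reset()
        parser._interp.predictionMode = PredictionMode.LL
        parser._errHandler = DefaultErrorStrategy()
        return parser.file_input()
```

With the bail strategy actually installed, valid programs should complete entirely in the cheaper SLL stage.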
The grammar file is here: Grammar File
And the project github page is here: Github
I have also run a profiler; here are the significant entries from the parser:
ncalls tottime percall cumtime percall filename:lineno(function)
21 0.000 0.000 0.094 0.004 PythonQLParser.py:7483(argument)
8 0.000 0.000 0.195 0.024 PythonQLParser.py:7379(arglist)
9 0.000 0.000 0.196 0.022 PythonQLParser.py:6836(trailer)
5/3 0.000 0.000 0.132 0.044 PythonQLParser.py:6765(testlist_comp)
1 0.000 0.000 0.012 0.012 PythonQLParser.py:6154(window_end_cond)
1 0.000 0.000 0.057 0.057 PythonQLParser.py:6058(sliding_window)
1 0.000 0.000 0.057 0.057 PythonQLParser.py:5941(window_clause)
1 0.000 0.000 0.004 0.004 PythonQLParser.py:5807(for_clause_entry)
1 0.000 0.000 0.020 0.020 PythonQLParser.py:5752(for_clause)
2/1 0.000 0.000 0.068 0.068 PythonQLParser.py:5553(query_expression)
48/10 0.000 0.000 0.133 0.013 PythonQLParser.py:5370(atom)
48/7 0.000 0.000 0.315 0.045 PythonQLParser.py:5283(power)
48/7 0.000 0.000 0.315 0.045 PythonQLParser.py:5212(factor)
48/7 0.000 0.000 0.331 0.047 PythonQLParser.py:5132(term)
47/7 0.000 0.000 0.346 0.049 PythonQLParser.py:5071(arith_expr)
47/7 0.000 0.000 0.361 0.052 PythonQLParser.py:5010(shift_expr)
47/7 0.000 0.000 0.376 0.054 PythonQLParser.py:4962(and_expr)
47/7 0.000 0.000 0.390 0.056 PythonQLParser.py:4914(xor_expr)
47/7 0.000 0.000 0.405 0.058 PythonQLParser.py:4866(expr)
44/7 0.000 0.000 0.405 0.058 PythonQLParser.py:4823(star_expr)
43/7 0.000 0.000 0.422 0.060 PythonQLParser.py:4615(not_test)
43/7 0.000 0.000 0.438 0.063 PythonQLParser.py:4563(and_test)
43/7 0.000 0.000 0.453 0.065 PythonQLParser.py:4509(or_test)
43/7 0.000 0.000 0.467 0.067 PythonQLParser.py:4293(old_test)
43/7 0.000 0.000 0.467 0.067 PythonQLParser.py:4179(try_catch_expr)
43/7 0.000 0.000 0.482 0.069 PythonQLParser.py:3978(test)
1 0.000 0.000 0.048 0.048 PythonQLParser.py:2793(import_from)
1 0.000 0.000 0.048 0.048 PythonQLParser.py:2702(import_stmt)
7 0.000 0.000 1.728 0.247 PythonQLParser.py:2251(testlist_star_expr)
4 0.000 0.000 1.770 0.443 PythonQLParser.py:2161(expr_stmt)
5 0.000 0.000 1.822 0.364 PythonQLParser.py:2063(small_stmt)
5 0.000 0.000 1.855 0.371 PythonQLParser.py:1980(simple_stmt)
5 0.000 0.000 1.859 0.372 PythonQLParser.py:1930(stmt)
1 0.000 0.000 1.898 1.898 PythonQLParser.py:1085(file_input)
176 0.002 0.000 0.993 0.006 Lexer.py:127(nextToken)
420 0.000 0.000 0.535 0.001 ParserATNSimulator.py:1120(closure)
705 0.003 0.000 1.642 0.002 ParserATNSimulator.py:315(adaptivePredict)
The PythonQL program that I was parsing is this one:
# This example illustrates the window query in PythonQL
from collections import namedtuple

trade = namedtuple('Trade', ['day', 'ammount', 'stock_id'])

trades = [ trade(1, 15.34, 'APPL'),
           trade(2, 13.45, 'APPL'),
           trade(3, 8.34,  'APPL'),
           trade(4, 9.87,  'APPL'),
           trade(5, 10.99, 'APPL'),
           trade(6, 76.16, 'APPL') ]

# Maximum 3-day sum
res = (select win
       for sliding window win in ( select t.ammount for t in trades )
       start at s when True
       only end at e when (e-s == 2))

print(res)