I have tested the same Python code on two different computers. On the first one the code takes 9 s, and on the second one (a more powerful machine, with double the RAM of the first, 16 vs. 8) it takes 185 s. Analyzing it with cProfile, the most critical call in both cases is WaitForSingleObject. Analyzing one specific function, I can see that the critical part is the OCR with tesseract. Why is the performance so different on these two machines?
The main lines from cProfile for this specific function are:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.002 0.002 115.398 115.398 bpl-Redonda4.py:261(pega_stack_nome_jogadores)
18 0.000 0.000 0.001 0.000 pytesseract.py:106(prepare)
18 0.000 0.000 0.118 0.007 pytesseract.py:116(save_image)
18 0.000 0.000 0.000 0.000 pytesseract.py:140(subprocess_args)
18 0.000 0.000 115.186 6.399 pytesseract.py:162(run_tesseract)
18 0.001 0.000 115.373 6.410 pytesseract.py:199(run_and_get_output)
12 0.000 0.000 76.954 6.413 pytesseract.py:295(image_to_string)
12 0.000 0.000 76.954 6.413 pytesseract.py:308()
6 0.000 0.000 38.419 6.403 pytesseract.py:328(image_to_boxes)
6 0.000 0.000 38.419 6.403 pytesseract.py:345()
18 0.000 0.000 0.060 0.003 pytesseract.py:97(cleanup)
18 0.000 0.000 115.096 6.394 subprocess.py:979(wait)
18 115.096 6.394 115.096 6.394 {built-in method _winapi.WaitForSingleObject}
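Since nearly all of the cumulative time sits under WaitForSingleObject, the Python side is simply blocked waiting for the external tesseract process to finish; the difference between the machines is in how fast tesseract itself starts and runs, not in the Python code. One way to confirm this is to time the external command directly, outside Python's profiler. A minimal sketch (the helper name and the cheap stand-in command are mine; with tesseract installed you would time its command line instead):

```python
import subprocess
import sys
import time

def time_command(cmd, runs=3):
    # wall-clock the spawn + execution of an external command; on Windows,
    # cProfile attributes this whole span to _winapi.WaitForSingleObject
    best = float("inf")
    for _ in range(runs):
        t0 = time.perf_counter()
        subprocess.run(cmd, capture_output=True)
        best = min(best, time.perf_counter() - t0)
    return best

# Cheap stand-in command; with tesseract installed you would time something
# like ["tesseract", "page.png", "stdout"] (hypothetical file name) instead.
fastest = time_command([sys.executable, "--version"])
```

If the per-invocation time differs between the two machines the same way the cProfile totals do, the cause lies in the tesseract installation or the OS process-spawn cost, not in pytesseract.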
Hi there, I'm struggling to make my I/O-bound app fast enough for potential users.
I'm fetching some number of URLs, say 10 for example, using multithreading with one thread per URL, but that takes too long. I've run cProfile on my code and I see that the bottleneck is in
"{method 'acquire' of '_thread.lock' objects}"
In the cProfile result I noticed that the 'acquire' method is called 9 times per thread.
Can anybody please shed some light on how I can reduce the number of calls per thread?
Here is a sample code:
import requests
from concurrent.futures import ThreadPoolExecutor

url_to_get = ["https://api.myip.com"] * 10  # ten copies of the same URL

proxy = None  # proxy settings are omitted in the original; None disables them

def fetch(url):
    with requests.get(url, proxies=proxy) as response:
        print(response.text)

def main():
    with ThreadPoolExecutor(max_workers=10) as executor:
        executor.map(fetch, url_to_get)
if __name__ == '__main__':
    import cProfile, pstats
    profiler = cProfile.Profile()
    profiler.enable()
    main()
    profiler.disable()
    stats = pstats.Stats(profiler).sort_stats('tottime')
    stats.print_stats(10)
cProfile results:
ncalls tottime percall cumtime percall filename:lineno(function)
90 3.581 0.040 3.581 0.040 {method 'acquire' of '_thread.lock' objects}
10 0.001 0.000 0.001 0.000 C:\Users\MINOUSAT\AppData\Local\Programs\Python\Python38-32\lib\threading.py:1177(_make_invoke_excepthook)
10 0.001 0.000 0.001 0.000 {built-in method _thread.start_new_thread}
10 0.000 0.000 0.028 0.003 C:\Users\MINOUSAT\AppData\Local\Programs\Python\Python38-32\lib\concurrent\futures\thread.py:193(_adjust_thread_count)
20 0.000 0.000 0.025 0.001 C:\Users\MINOUSAT\AppData\Local\Programs\Python\Python38-32\lib\threading.py:270(wait)
21 0.000 0.000 0.000 0.000 C:\Users\MINOUSAT\AppData\Local\Programs\Python\Python38-32\lib\threading.py:222(__init__)
10 0.000 0.000 0.028 0.003 C:\Users\MINOUSAT\AppData\Local\Programs\Python\Python38-32\lib\concurrent\futures\thread.py:158(submit)
32 0.000 0.000 0.000 0.000 {built-in method _thread.allocate_lock}
10 0.000 0.000 0.001 0.000 C:\Users\MINOUSAT\AppData\Local\Programs\Python\Python38-32\lib\threading.py:761(__init__)
10 0.000 0.000 0.025 0.002 C:\Users\MINOUSAT\AppData\Local\Programs\Python\Python38-32\lib\threading.py:540(wait)
Thank you so much
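A likely explanation rather than a fix: cProfile only measures the main thread, and the main thread spends almost the entire run blocked on internal locks while the worker threads wait on network I/O, so the 'acquire' time is idle waiting, not contention you can remove. A small sketch (with a time.sleep stand-in for requests.get, so no network is needed) showing that the threaded version's wall time is roughly one request, not ten:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(url):
    # stand-in for requests.get(url): block for 0.2 s the way network I/O would
    time.sleep(0.2)
    return url

urls = ["https://api.myip.com"] * 10

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(fake_fetch, urls))
elapsed = time.perf_counter() - t0
# Ten 0.2 s tasks finish in roughly 0.2 s of wall time, not 2 s. The time a
# profiler books under lock.acquire is the main thread parked here waiting
# for the workers, so reducing the acquire count would not speed anything up.
```

If the run is still slow, the place to look is the per-request latency (DNS, proxy, server response), not the lock calls.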
Running cProfile on anything (for example a mergeSort), I'm getting all 0.000's in the runtimes, and the key lines/variables/methods aren't listed in the output; it only seems to report methods and internals. Please advise.
Below are my results for a mergeSort. I've tried running
python -m cProfile [mergeSort(lst)]
with and without the brackets; I saw both in documentation. The only version I can get to work is:
import cProfile
cProfile.run(mergeSort(lst))
or the enable()/disable() method shown.
The formatting doesn't turn out well, so I attached an image.
Results:
'''
[17, 20, 26, 31, 44, 54, 55, 77, 93]
127 function calls (111 primitive calls) in 0.000 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
17/1 0.000 0.000 0.000 0.000 :1(mergeSort)
1 0.000 0.000 0.000 0.000 :36()
1 0.000 0.000 0.000 0.000 :37()
2 0.000 0.000 0.000 0.000 codeop.py:132(__call__)
2 0.000 0.000 0.000 0.000 hooks.py:142(__call__)
2 0.000 0.000 0.000 0.000 hooks.py:207(pre_run_code_hook)
2 0.000 0.000 0.000 0.000 interactiveshell.py:1104(user_global_ns)
2 0.000 0.000 0.000 0.000 interactiveshell.py:2933(run_code)
2 0.000 0.000 0.000 0.000 ipstruct.py:125(__getattr__)
2 0.000 0.000 0.000 0.000 {built-in method builtins.compile}
2 0.000 0.000 0.000 0.000 {built-in method builtins.exec}
91 0.000 0.000 0.000 0.000 {built-in method builtins.len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
'''
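The zeros are almost certainly just display resolution: sorting nine elements takes microseconds, which rounds to 0.000 at cProfile's millisecond precision. Profiling a much larger input makes the numbers visible; note also that cProfile.run expects a code string such as 'mergeSort(lst)', not the result of calling the function. A sketch (with a minimal merge sort of my own, since the question's implementation isn't shown):

```python
import cProfile
import pstats
import random

def mergeSort(lst):
    # hypothetical minimal merge sort, standing in for the question's version
    if len(lst) <= 1:
        return lst
    mid = len(lst) // 2
    left, right = mergeSort(lst[:mid]), mergeSort(lst[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

lst = [random.randrange(10**6) for _ in range(100_000)]
prof = cProfile.Profile()
prof.enable()
out = mergeSort(lst)
prof.disable()
total = pstats.Stats(prof).total_tt  # now well above the 0.000 display floor
```

With 100,000 elements the mergeSort line shows real tottime/cumtime values, and user-defined functions appear alongside the built-ins.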
I’m trying to speed up some code that should run fast on both Linux and Windows. However, the same code takes 131 seconds on Fedora 25, while on Windows 7 it takes only 90 seconds (both computers have 8 GB of RAM, with an i7 and an i5 processor, respectively). I’m using Python 3.5 on Fedora and 3.6 on Windows.
The code is the following:
from math import ceil
from multiprocessing import Process, Queue, cpu_count
import numpy as np

nprocs = cpu_count()
chunksize = ceil(nrFrames / nprocs)
queue = Queue()
jobs = []
for i in range(nprocs):
    start = chunksize * i
    if i == nprocs - 1:
        end = nrFrames
    else:
        end = chunksize * (i + 1)
    trjCoordsProcess = DAH_Coords[start:end]
    p = Process(target=is_hbond, args=(queue, trjCoordsProcess, distCutOff,
                                       angleCutOff, AList, DList, HList))
    jobs.append(p)
    p.start()  # launch the worker (the start call is missing from the snippet as pasted)

HbondFreqMatrix = queue.get()
for k in range(nprocs - 1):
    HbondFreqMatrix = np.add(HbondFreqMatrix, queue.get())

for proc in jobs:
    proc.join()

def is_hbond(queue, processCoords, distCutOff, angleCutOff,
             possibleAPosList, donorsList, HCovBoundPosList):
    for frame in range(len(processCoords)):
        ...  # do stuff
    queue.put(HbondProcessFreqMatrix)
The start method of each process is actually considerably faster in Linux than in Windows. However, each iteration inside the is_hbond function takes 2.5 times longer in Linux (0.5 vs 0.2s).
The profiler gives the following information:
Windows
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.167 0.167 84.139 84.139 calculateHbonds
4 0.000 0.000 52.039 13.010 \Python36\lib\multiprocessing\queues.py:91(get)
4 0.000 0.000 51.928 12.982 \Python36\lib\multiprocessing\connection.py:208(recv_bytes)
4 0.018 0.004 51.928 12.982 \Python36\lib\multiprocessing\connection.py:294(_recv_bytes)
4 51.713 12.928 51.713 12.928 {built-in method _winapi.WaitForMultipleObjects}
4 0.000 0.000 30.811 7.703 \Python36\lib\multiprocessing\process.py:95(start)
4 0.000 0.000 30.811 7.703 \Python36\lib\multiprocessing\context.py:221(_Popen)
4 0.000 0.000 30.811 7.703 \Python36\lib\multiprocessing\context.py:319(_Popen)
4 0.000 0.000 30.809 7.702 popen_spawn_win32.py:32(__init__)
8 1.958 0.245 30.804 3.851 \Python36\lib\multiprocessing\reduction.py:58(dump)
8 28.846 3.606 28.846 3.606 {method 'dump' of '_pickle.Pickler' objects}
Linux
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.203 0.203 123.169 123.169 calculateHbonds
4 0.000 0.000 121.450 30.362 /python3.5/multiprocessing/queues.py:91(get)
4 0.000 0.000 121.300 30.325 /python3.5/multiprocessing/connection.py:208(recv_bytes)
4 0.019 0.005 121.300 30.325 /python3.5/multiprocessing/connection.py:406(_recv_bytes)
8 0.000 0.000 121.281 15.160 /python3.5/multiprocessing/connection.py:374(_recv)
8 121.088 15.136 121.088 15.136 {built-in method posix.read}
1 0.000 0.000 0.082 0.082 /python3.5/multiprocessing/context.py:98(Queue)
17/4 0.000 0.000 0.082 0.021 <frozen importlib._bootstrap>:939(_find_and_load_unlocked)
16/4 0.000 0.000 0.082 0.020 <frozen importlib._bootstrap>:659(_load_unlocked)
4 0.000 0.000 0.052 0.013 /python3.5/multiprocessing/process.py:95(start)
4 0.000 0.000 0.052 0.013 /python3.5/multiprocessing/context.py:210(_Popen)
4 0.000 0.000 0.052 0.013 /python3.5/multiprocessing/context.py:264(_Popen)
4 0.000 0.000 0.051 0.013 /python3.5/multiprocessing/popen_fork.py:16(__init__)
4 0.000 0.000 0.051 0.013 /python3.5/multiprocessing/popen_fork.py:64(_launch)
4 0.050 0.013 0.050 0.013 {built-in method posix.fork}
Is there a reason why this might be the case? I know the multiprocessing module works differently in Linux and Windows due to the lack of os.fork in Windows, but I thought Linux should be faster.
Any ideas on how to make it faster in Linux?
Thank you!
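For what it's worth, both profiles show the parent spending nearly all of its time in queue.get (posix.read on Linux, WaitForMultipleObjects on Windows), i.e. waiting for the children to finish and then unpickling their results, so the real cost is in the workers themselves and in serialization. One way to simplify the manual chunking and queueing is multiprocessing.Pool, sketched here with a placeholder computation (is_hbond_chunk and the toy data are stand-ins I made up, not the question's real function):

```python
import numpy as np
from multiprocessing import Pool, cpu_count

def is_hbond_chunk(coords_chunk):
    # placeholder for the real per-frame analysis: each chunk returns one small
    # frequency matrix, so only a tiny result has to be pickled back
    freq = np.zeros((4, 4))
    for frame in coords_chunk:
        freq += frame > 0.5  # toy "bond detected" criterion
    return freq

# toy stand-in for DAH_Coords: 100 frames of 4x4 coordinates
DAH_Coords = np.random.rand(100, 4, 4)
chunks = np.array_split(DAH_Coords, cpu_count())
# note: on Windows (spawn start method) this needs an `if __name__ == '__main__':` guard
with Pool(cpu_count()) as pool:
    partials = pool.map(is_hbond_chunk, chunks)  # Pool handles transfer and queueing
HbondFreqMatrix = np.sum(partials, axis=0)
```

Keeping the per-worker return value small (a summed matrix rather than per-frame data) reduces the pickling that dominates both profiles.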
Hello, I have a numpy optimization problem.
Below I have written a piece of code that's quite common for my type of calculations. The calculation always takes some time that I think should be shorter. I think the problem is the loop. I have looked at the linalg part of numpy but I can't find a solution there. I also searched for a way to vectorize the data, but since I don't have much experience with that, I can't find any solution.
I hope somebody can help me.
import numpy as np
from scipy import signal
from scipy.fftpack import fft

fs = 44100  # sampling frequency
T = 5  # max time
t = np.arange(0, T*fs)/fs  # time array
x = np.sin(2 * np.pi * 100 * t) + 0.7 * np.sin(2 * np.pi * 880 * t) + 0.2 * np.sin(2 * np.pi * 2400 * t)

# Define window length and window:
wl = 4  # window length
overlap = 0.5
W = signal.get_window('hanning', wl)  # window
Wx = np.zeros(len(x))
ul = wl

# loop added for window
if (len(x) / wl) % wl == 0:
    while ul <= len(Wx):
        Wx[ul-wl:ul] += x[ul-wl:ul] * W
        ul += int(wl * overlap)  # cast to int so the slice indices stay integers
else:
    dsample = int((len(x) / wl) % wl)  # delta in samples between mod (x / window length)
    x = np.append(x, np.zeros(wl - dsample))
    while ul <= len(Wx):
        Wx[ul-wl:ul] += x[ul-wl:ul] * W
        ul += int(wl * overlap)

NFFT = int(2 ** np.ceil(np.log2(len(x))))
NFFW = int(2 ** np.ceil(np.log2(len(Wx))))

# Frequency spectra
X = fft(x, NFFT)
WX = fft(Wx, NFFW)
Profiler:
%run -p example.py
110367 function calls (110366 primitive calls) in 19.998 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 19.561 19.561 19.994 19.994 example.py:6(<module>)
110258 0.233 0.000 0.233 0.000 {built-in method len}
2 0.181 0.091 0.189 0.095 basic.py:169(fft)
2 0.008 0.004 0.008 0.004 basic.py:131(_fix_shape)
2 0.008 0.004 0.008 0.004 {built-in method concatenate}
1 0.003 0.003 0.003 0.003 {built-in method compile}
2 0.002 0.001 0.002 0.001 {built-in method arange}
2 0.001 0.000 0.001 0.000 {built-in method open}
4 0.000 0.000 0.000 0.000 {built-in method zeros}
1 0.000 0.000 19.998 19.998 interactiveshell.py:2496(safe_execfile)
2/1 0.000 0.000 19.998 19.998 {built-in method exec}
1 0.000 0.000 0.000 0.000 windows.py:615(hann)
1 0.000 0.000 19.997 19.997 py3compat.py:108(execfile)
1 0.000 0.000 0.000 0.000 {method 'read' of '_io.BufferedReader' objects}
2 0.000 0.000 0.008 0.004 function_base.py:3503(append)
1 0.000 0.000 0.000 0.000 posixpath.py:318(normpath)
1 0.000 0.000 0.000 0.000 windows.py:1380(get_window)
1 0.000 0.000 0.000 0.000 posixpath.py:145(dirname)
4 0.000 0.000 0.000 0.000 {built-in method array}
2 0.000 0.000 0.000 0.000 {built-in method round}
1 0.000 0.000 0.000 0.000 {built-in method getcwd}
2 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:2264(_handle_fromlist)
2 0.000 0.000 0.000 0.000 basic.py:116(_asfarray)
4 0.000 0.000 0.000 0.000 basic.py:24(istype)
2 0.000 0.000 0.000 0.000 fromnumeric.py:1281(ravel)
8 0.000 0.000 0.000 0.000 {built-in method isinstance}
1 0.000 0.000 0.000 0.000 posixpath.py:70(join)
2 0.000 0.000 0.000 0.000 numeric.py:462(asanyarray)
1 0.000 0.000 0.000 0.000 posixpath.py:355(abspath)
8 0.000 0.000 0.000 0.000 {built-in method hasattr}
1 0.000 0.000 19.998 19.998 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 syspathcontext.py:64(__exit__)
1 0.000 0.000 0.000 0.000 posixpath.py:221(expanduser)
1 0.000 0.000 0.000 0.000 _bootlocale.py:23(getpreferredencoding)
1 0.000 0.000 0.000 0.000 syspathcontext.py:57(__enter__)
1 0.000 0.000 0.000 0.000 syspathcontext.py:54(__init__)
4 0.000 0.000 0.000 0.000 {built-in method issubclass}
3 0.000 0.000 0.000 0.000 posixpath.py:38(_get_sep)
2 0.000 0.000 0.000 0.000 {method 'ravel' of 'numpy.ndarray' objects}
2 0.000 0.000 0.000 0.000 numeric.py:392(asarray)
7 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {built-in method nl_langinfo}
5 0.000 0.000 0.000 0.000 {method 'startswith' of 'str' objects}
1 0.000 0.000 0.000 0.000 codecs.py:306(__init__)
1 0.000 0.000 0.000 0.000 posixpath.py:60(isabs)
1 0.000 0.000 0.000 0.000 {method 'split' of 'str' objects}
1 0.000 0.000 0.000 0.000 codecs.py:257(__init__)
2 0.000 0.000 0.000 0.000 {method 'setdefault' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'rfind' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'remove' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'rstrip' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'endswith' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'insert' of 'list' objects}
1 0.000 0.000 0.000 0.000 {built-in method getdefaultencoding}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 py3compat.py:13(no_code)
Precalculating static values shortens my loop from ~4 s to 0.7 s execution time:

nEntries = len(Wx)
step = int(wl * overlap)
while ul <= nEntries:
    Wx[ul-wl:ul] += x[ul-wl:ul] * W
    ul += step
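Beyond precalculating, the loop itself can be vectorized for this 50%-overlap case: window starts that are wl apart don't overlap, so the windows can be grouped by their start offset and each group applied in one reshaped operation. A sketch under the assumption that wl is a multiple of the step and len(x) is a multiple of wl (the helper name overlap_add is mine):

```python
import numpy as np

def overlap_add(x, W, overlap=0.5):
    wl = len(W)
    step = int(wl * overlap)
    Wx = np.zeros(len(x))
    # only wl // step iterations (2 for 50% overlap) instead of len(x) // step
    for off in range(0, wl, step):
        # windows starting at off, off + wl, off + 2*wl, ... are disjoint,
        # so one reshape applies W to all of them at once
        n = (len(x) - off) // wl
        seg = x[off:off + n * wl].reshape(n, wl)
        Wx[off:off + n * wl] += (seg * W).ravel()
    return Wx

# With a rectangular window every interior sample is covered by two windows:
Wx = overlap_add(np.ones(8), np.ones(4))  # → [1, 1, 2, 2, 2, 2, 1, 1]
```

This replaces the per-window Python loop with two large array operations, which is where numpy's speed comes from.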
I have a grammar that is an extension of the Python grammar, and small programs take about 2 seconds to parse on a MacBook Pro. I have taken the SLL trick and applied it:
# Set up the lexer
inputStream = InputStream(s)
lexer = CustomLexer(inputStream)
stream = CommonTokenStream(lexer)

# Set up the error handling stuff
error_handler = CustomErrorStrategy()
error_listener = CustomErrorListener()
buffered_errors = BufferedErrorListener()
error_listener.addDelegatee(buffered_errors)

# Set up the fast parser
parser = PythonQLParser(stream)
parser._interp.predictionMode = PredictionMode.SLL
parser.removeErrorListeners()
parser.errHandler = BailErrorStrategy()
try:
    tree = parser.file_input()
    return (tree, parser)
But it didn't do the trick; the time didn't change significantly. Any hints on what to do?
I'm using Python 3 with antlr4-python3-runtime-4.5.3.
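One thing worth double-checking: in the Python runtime the parser's error-strategy field is `_errHandler` (with a leading underscore), so assigning to `parser.errHandler` may simply create a new, unused attribute and leave the default strategy in place, which would explain why SLL plus bail shows no effect. The usual pattern is also two-stage: try SLL with a bail strategy, and only fall back to full LL when the fast pass fails. A sketch against antlr4-python3-runtime, reusing the question's CustomLexer and PythonQLParser (the attribute names reflect my reading of the runtime; verify against your version):

```
from antlr4 import CommonTokenStream, InputStream
from antlr4.atn.PredictionMode import PredictionMode
from antlr4.error.ErrorStrategy import BailErrorStrategy, DefaultErrorStrategy
from antlr4.error.Errors import ParseCancellationException

def parse(s):
    stream = CommonTokenStream(CustomLexer(InputStream(s)))
    parser = PythonQLParser(stream)
    # Stage 1: fast SLL prediction, bail out on the first error
    parser._interp.predictionMode = PredictionMode.SLL
    parser._errHandler = BailErrorStrategy()  # note: _errHandler, not errHandler
    parser.removeErrorListeners()
    try:
        return parser.file_input()
    except ParseCancellationException:
        # Stage 2: rare fallback to full LL with normal error recovery
        stream.seek(0)
        parser.reset()
        parser._interp.predictionMode = PredictionMode.LL
        parser._errHandler = DefaultErrorStrategy()
        return parser.file_input()
```

With the bail strategy actually installed, valid programs should complete entirely in the cheaper SLL stage.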
The grammar file is here: Grammar File
And the project github page is here: Github
I have also run a profiler; here are the significant entries from the parser:
ncalls tottime percall cumtime percall filename:lineno(function)
21 0.000 0.000 0.094 0.004 PythonQLParser.py:7483(argument)
8 0.000 0.000 0.195 0.024 PythonQLParser.py:7379(arglist)
9 0.000 0.000 0.196 0.022 PythonQLParser.py:6836(trailer)
5/3 0.000 0.000 0.132 0.044 PythonQLParser.py:6765(testlist_comp)
1 0.000 0.000 0.012 0.012 PythonQLParser.py:6154(window_end_cond)
1 0.000 0.000 0.057 0.057 PythonQLParser.py:6058(sliding_window)
1 0.000 0.000 0.057 0.057 PythonQLParser.py:5941(window_clause)
1 0.000 0.000 0.004 0.004 PythonQLParser.py:5807(for_clause_entry)
1 0.000 0.000 0.020 0.020 PythonQLParser.py:5752(for_clause)
2/1 0.000 0.000 0.068 0.068 PythonQLParser.py:5553(query_expression)
48/10 0.000 0.000 0.133 0.013 PythonQLParser.py:5370(atom)
48/7 0.000 0.000 0.315 0.045 PythonQLParser.py:5283(power)
48/7 0.000 0.000 0.315 0.045 PythonQLParser.py:5212(factor)
48/7 0.000 0.000 0.331 0.047 PythonQLParser.py:5132(term)
47/7 0.000 0.000 0.346 0.049 PythonQLParser.py:5071(arith_expr)
47/7 0.000 0.000 0.361 0.052 PythonQLParser.py:5010(shift_expr)
47/7 0.000 0.000 0.376 0.054 PythonQLParser.py:4962(and_expr)
47/7 0.000 0.000 0.390 0.056 PythonQLParser.py:4914(xor_expr)
47/7 0.000 0.000 0.405 0.058 PythonQLParser.py:4866(expr)
44/7 0.000 0.000 0.405 0.058 PythonQLParser.py:4823(star_expr)
43/7 0.000 0.000 0.422 0.060 PythonQLParser.py:4615(not_test)
43/7 0.000 0.000 0.438 0.063 PythonQLParser.py:4563(and_test)
43/7 0.000 0.000 0.453 0.065 PythonQLParser.py:4509(or_test)
43/7 0.000 0.000 0.467 0.067 PythonQLParser.py:4293(old_test)
43/7 0.000 0.000 0.467 0.067 PythonQLParser.py:4179(try_catch_expr)
43/7 0.000 0.000 0.482 0.069 PythonQLParser.py:3978(test)
1 0.000 0.000 0.048 0.048 PythonQLParser.py:2793(import_from)
1 0.000 0.000 0.048 0.048 PythonQLParser.py:2702(import_stmt)
7 0.000 0.000 1.728 0.247 PythonQLParser.py:2251(testlist_star_expr)
4 0.000 0.000 1.770 0.443 PythonQLParser.py:2161(expr_stmt)
5 0.000 0.000 1.822 0.364 PythonQLParser.py:2063(small_stmt)
5 0.000 0.000 1.855 0.371 PythonQLParser.py:1980(simple_stmt)
5 0.000 0.000 1.859 0.372 PythonQLParser.py:1930(stmt)
1 0.000 0.000 1.898 1.898 PythonQLParser.py:1085(file_input)
176 0.002 0.000 0.993 0.006 Lexer.py:127(nextToken)
420 0.000 0.000 0.535 0.001 ParserATNSimulator.py:1120(closure)
705 0.003 0.000 1.642 0.002 ParserATNSimulator.py:315(adaptivePredict)
The PythonQL program that I was parsing is this one:
# This example illustrates the window query in PythonQL
from collections import namedtuple

trade = namedtuple('Trade', ['day', 'ammount', 'stock_id'])

trades = [ trade(1, 15.34, 'APPL'),
           trade(2, 13.45, 'APPL'),
           trade(3, 8.34,  'APPL'),
           trade(4, 9.87,  'APPL'),
           trade(5, 10.99, 'APPL'),
           trade(6, 76.16, 'APPL') ]

# Maximum 3-day sum
res = (select win
       for sliding window win in ( select t.ammount for t in trades )
       start at s when True
       only end at e when (e-s == 2))

print(res)