What can be slowing down my program when I use multithreading? - python-3.x

I'm writing a program that downloads data from a website (eve-central.com). It returns XML when I send a GET request with some parameters. The problem is that I need to make about 7080 such requests, because I can't specify the typeid parameter more than once.
def get_data_eve_central(typeids, system, hours, minq=1, thread_count=1):
    import xmltodict, urllib3
    pool = urllib3.HTTPConnectionPool('api.eve-central.com')
    for typeid in typeids:
        r = pool.request('GET', '/api/quicklook', fields={'typeid': typeid, 'usesystem': system, 'sethours': hours, 'setminQ': minq})
        answer = xmltodict.parse(r.data)
It was really slow when I just connected to the website and made all the requests one after another, so I decided to use multiple threads (I read that processes involving a lot of waiting (I/O, HTTP requests) can be sped up a lot with multithreading). I rewrote it using multiple threads, but somehow it isn't any faster (in fact it's a bit slower). Here's the code rewritten to use multithreading:
def get_data_eve_central(all_typeids, system, hours, minq=1, thread_count=1):
    if thread_count > len(all_typeids): raise NameError('TooManyThreads')

    def requester(typeids):
        pool = urllib3.HTTPConnectionPool('api.eve-central.com')
        for typeid in typeids:
            r = pool.request('GET', '/api/quicklook', fields={'typeid': typeid, 'usesystem': system, 'sethours': hours, 'setminQ': minq})
            answer = xmltodict.parse(r.data)['evec_api']['quicklook']
            answers.append(answer)

    def chunkify(items, quantity):
        chunk_len = len(items) // quantity
        rest_count = len(items) % quantity
        chunks = []
        for i in range(quantity):
            chunk = items[:chunk_len]
            items = items[chunk_len:]
            if rest_count and items:
                chunk.append(items.pop(0))
                rest_count -= 1
            chunks.append(chunk)
        return chunks

    t = time.clock()
    threads = []
    answers = []
    for typeids in chunkify(all_typeids, thread_count):
        threads.append(threading.Thread(target=requester, args=[typeids]))
        threads[-1].start()
        threads[-1].join()
    print(time.clock()-t)
    return answers
What I do is divide all the typeids into as many chunks as the number of threads I want to use, and create a thread for each chunk to process it. The question is: what can be slowing it down? (I apologise for my bad English.)

Python has a Global Interpreter Lock (GIL), and that may be your problem: CPython threads cannot execute Python code in a genuinely parallel way. You could consider switching to another language, or stay with Python and use process-based parallelism to solve your task. Here is a nice presentation on the topic: Inside the Python GIL.
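For example, here is a minimal sketch of the process-based approach, reusing the urllib3/xmltodict request code from the question (fetch_chunk and worker_count are illustrative names, not part of any library):

from concurrent.futures import ProcessPoolExecutor

import urllib3
import xmltodict

def fetch_chunk(typeids, system, hours, minq=1):
    # Each worker process opens its own connection pool; connections
    # cannot be shared across processes.
    pool = urllib3.HTTPConnectionPool('api.eve-central.com')
    answers = []
    for typeid in typeids:
        r = pool.request('GET', '/api/quicklook',
                         fields={'typeid': typeid, 'usesystem': system,
                                 'sethours': hours, 'setminQ': minq})
        answers.append(xmltodict.parse(r.data)['evec_api']['quicklook'])
    return answers

def get_data_eve_central(all_typeids, system, hours, minq=1, worker_count=4):
    # Split the ids into one chunk per worker (the chunkify from the
    # question would work here too) and collect the results.
    chunks = [all_typeids[i::worker_count] for i in range(worker_count)]
    results = []
    with ProcessPoolExecutor(worker_count) as executor:
        futures = [executor.submit(fetch_chunk, chunk, system, hours, minq)
                   for chunk in chunks]
        for future in futures:
            results.extend(future.result())
    return results

Call it from under an if __name__ == '__main__': guard so the worker processes can be spawned cleanly on all platforms.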

Related

Need to do CPU bound processing using 2+ processes in Python by reading from a gzipped file

I have a gzipped file (10 GB compressed, 100 GB uncompressed) containing a number of reports separated by demarcation lines, and I have to parse it.
Parsing and processing the data takes a long time, so this is a CPU-bound problem (not an I/O-bound one). I am therefore planning to split the work across multiple processes using the multiprocessing module. The problem is that I am unable to send/share data with the child processes efficiently. I am using subprocess.Popen to stream the uncompressed data in the parent process:
process = subprocess.Popen('gunzip --keep --stdout big-file.gz',
                           shell=True,
                           stdout=subprocess.PIPE)
I was thinking of using a Lock() so that child-process-1 reads/parses one report and releases the lock, then child-process-2 reads/parses the next report, and so on, alternating between them. But when I pass process.stdout as an argument to the child processes, I get a pickling error.
I have tried creating a multiprocessing.Queue() and a multiprocessing.Pipe() to send data to the child processes, but this is far too slow (in fact much slower than doing it serially in a single thread).
Any thoughts/examples about sending data to child processes efficiently would help.
Could you try something simpler instead? Have each worker process run its own instance of gunzip, with no interprocess communication at all. Worker 1 can process the first report and just skip over the second; the opposite for worker 2. Each worker skips every other report, which generalizes in the obvious way to N workers (see the sketch below).
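Here is a rough, untested sketch of that idea; DELIM, crunch(), NWORKERS, and the file name are placeholders for whatever your real separator, parsing code, and data look like:

import multiprocessing as mp
import subprocess

NWORKERS = 4
DELIM = "xxxxx\n"  # whatever line separates your reports

def crunch(report_lines):
    return len(report_lines)  # stand-in for the real CPU-bound parsing

def worker(worker_index):
    # Each worker runs its own gunzip and only crunches every
    # NWORKERS-th report, skipping the rest.
    p = subprocess.Popen("gunzip --keep --stdout big-file.gz",
                         shell=True, text=True,
                         stdout=subprocess.PIPE)
    total = 0
    report = []
    report_number = 0
    for line in p.stdout:
        if line == DELIM:
            if report_number % NWORKERS == worker_index:
                total += crunch(report)
            report = []
            report_number += 1
        else:
            report.append(line)
    if report and report_number % NWORKERS == worker_index:
        total += crunch(report)  # last report, if the file doesn't end with DELIM
    p.stdout.close()
    p.wait()
    return total

if __name__ == "__main__":
    with mp.Pool(NWORKERS) as pool:
        print(sum(pool.map(worker, range(NWORKERS))))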
Or not ...
I think you'll need to be more specific about what you tried, and perhaps give more info about your problem (like: how many records are there? how big are they?).
Here's a program ("genints.py") that prints a bunch of random ints, one per line, broken into groups via "xxxxx\n" separator lines:
from random import randrange, seed

seed(42)
for i in range(1000):
    for j in range(randrange(1, 1000)):
        print(randrange(100))
    print("xxxxx")
Because it forces the seed, it generates the same stuff every time. Now a program to process those groups, both in parallel and serially, via the most obvious way I first thought of. crunch() takes time quadratic in the number of ints in a group, so it's quite CPU-bound. The output from one run, using (as shown) 3 worker processes for the parallel part:
parallel result: 10,901,000,334 0:00:35.559782
serial result: 10,901,000,334 0:01:38.719993
So the parallelized run took about one-third of the time. In what relevant way(s) does that differ from your problem? Certainly, a full run of "genints.py" produces less than 2 million bytes of output, so that's a major difference - but it's impossible to guess from here whether it's a relevant difference. Perhaps, e.g., your problem is only very mildly CPU-bound? It's obvious from the output here that the overhead of passing chunks of stdout to worker processes is all but insignificant in this program.
In short, you probably need to give people - as I just did for you - a complete program they can run that reproduces your problem.
import multiprocessing as mp

NWORKERS = 3
DELIM = "xxxxx\n"

def runjob():
    import subprocess
    # 'py' is just a shell script on my box that
    # invokes the desired version of Python -
    # which happened to be 3.8.5 for this run.
    p = subprocess.Popen("py genints.py",
                         shell=True,
                         text=True,
                         stdout=subprocess.PIPE)
    return p.stdout

# Return list of lines up to (but not including) next DELIM,
# or EOF. If the file is already exhausted, return None.
def getrecord(f):
    result = []
    foundone = False
    for line in f:
        foundone = True
        if line == DELIM:
            break
        result.append(line)
    return result if foundone else None

def crunch(rec):
    total = 0
    for a in rec:
        for b in rec:
            total += abs(int(a) - int(b))
    return total

if __name__ == "__main__":
    import datetime
    now = datetime.datetime.now

    s = now()
    total = 0
    f = runjob()
    with mp.Pool(NWORKERS) as pool:
        for i in pool.imap_unordered(crunch,
                                     iter((lambda: getrecord(f)), None)):
            total += i
    f.close()
    print(f"parallel result: {total:,}", now() - s)

    s = now()
    # try the same thing serially
    total = 0
    f = runjob()
    while True:
        rec = getrecord(f)
        if rec is None:
            break
        total += crunch(rec)
    f.close()
    print(f"serial result: {total:,}", now() - s)

How to change the number of multiprocessing pool workers on the go

I want to change the number of workers in the pool that are currently used.
My current idea is
while True:
    current_connection_number = get_connection_number()
    forced_break = False
    with mp.Pool(current_connection_number) as p:
        for data in p.imap_unordered(fun, some_infinite_generator):
            yield data
            if current_connection_number != get_connection_number():
                forced_break = True
                break
    if not forced_break:
        break
The problem is that it just terminates the workers, so the last items that were taken from some_infinite_generator but weren't processed yet are lost. Is there some standard way of doing this?
Edit: I've tried printing inside some_infinite_generator, and it turns out p.imap_unordered requests 1565 items with just 2 pool workers, even before anything is processed. How do I limit the number of items requested from the generator? If I use the code above and change the number of connections after just 2 items, I will lose 1563 items.
The problem is that the Pool consumes the generator internally, in a separate thread. You have no way to control that logic.
What you can do is feed the Pool.imap_unordered method a portion of the generator at a time, and get that consumed before scaling according to the available connections.
import itertools

CHUNKSIZE = 100

def grouper(n, iterable):
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk

while True:
    current_connection_number = get_connection_number()
    with mp.Pool(current_connection_number) as p:
        while current_connection_number == get_connection_number():
            for data in p.imap_unordered(fun, grouper(CHUNKSIZE, some_infinite_generator)):
                yield data
It's a bit less optimal, as the scaling happens once per chunk instead of once per iteration, but with a bit of fine-tuning of the CHUNKSIZE value you can easily get it right.
The grouper recipe.
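For example, with the grouper above (and itertools imported), each chunk comes out as a tuple of up to n items, and that tuple is what gets handed to fun:

print(list(grouper(3, range(8))))
# [(0, 1, 2), (3, 4, 5), (6, 7)]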

Python Threading Issue, Is this Right?

I am attempting to make a few thousand DNS queries. I have written my script to use python-adns and have attempted to add threading and queues to ensure the script runs optimally and efficiently.
However, I can only achieve mediocre results. The responses are choppy/intermittent; they start and stop, and often pause for 10 to 20 seconds.
tlock = threading.Lock()  # printing to screen

def async_dns(i):
    s = adns.init()
    for i in names:
        tlock.acquire()
        q.put(s.synchronous(i, adns.rr.NS)[0])
        response = q.get()
        q.task_done()
        if response == 0:
            dot_net.append("Y")
            print(i + ", is Y")
        elif response == 300:
            dot_net.append("N")
            print(i + ", is N")
        tlock.release()

q = queue.Queue()
threads = []
for i in range(100):
    t = threading.Thread(target=async_dns, args=(i,))
    threads.append(t)
    t.start()
print(threads)
I have spent countless hours on this and would appreciate some input from experienced Pythonistas. Is this a networking issue? Can this bottleneck and the intermittent responses be solved by switching servers?
Thanks.
Without answers to the questions I asked in the comments above, I'm not sure how well I can diagnose the issue you're seeing, but here are some thoughts:
It looks like each thread is processing all names instead of just a portion of them.
Your Queue seems to be doing nothing at all.
Your lock seems to guarantee that you actually only do one query at a time (defeating the purpose of having multiple threads).
Rather than trying to fix up this code, might I suggest using multiprocessing.pool.ThreadPool instead? Below is a full working example. (You could use adns instead of socket if you want... I just couldn't easily get it installed and so stuck with the built-in socket.)
In my testing, I also sometimes see pauses; my assumption is that I'm getting throttled somewhere.
import itertools
from multiprocessing.pool import ThreadPool
import socket
import string

def is_available(host):
    print('Testing {}'.format(host))
    try:
        socket.gethostbyname(host)
        return False
    except socket.gaierror:
        return True

# Test the first 1000 three-letter .com hosts
hosts = [''.join(tla) + '.com' for tla in itertools.permutations(string.ascii_lowercase, 3)][:1000]

with ThreadPool(100) as p:
    results = p.map(is_available, hosts)

for host, available in zip(hosts, results):
    print('{} is {}'.format(host, 'available' if available else 'not available'))
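If you do get adns working, the same pattern should carry over. Here is an untested sketch that sticks to the calls from your own snippet (the 0/300 status values come from your code; the host list is just an example):

from multiprocessing.pool import ThreadPool
import adns

def check_ns(name):
    resolver = adns.init()  # one resolver per call, as in your original code
    status = resolver.synchronous(name, adns.rr.NS)[0]
    if status == 0:
        return name, "Y"
    elif status == 300:
        return name, "N"
    return name, None

names = ["example.com", "example.net"]  # your real list of names

with ThreadPool(100) as p:
    for name, flag in p.imap_unordered(check_ns, names):
        if flag is not None:
            print(name + ", is " + flag)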

How do I get improved pymongo performance using threading?

I'm trying to see performance improvements with pymongo, but I'm not observing any.
My sample DB has 400,000 records. Essentially, I'm seeing threaded and single-threaded performance be equal, with the only performance gain coming from multi-process execution.
Does pymongo not release the GIL during queries?
Single thread: real 0m0.618s
Multiprocess:  real 0m0.144s
Multithread:   real 0m0.656s
Regular code:
choices = ['foo','bar','baz']

def regular_read(db, sample_choice):
    rows = db.test_samples.find({'choice':sample_choice})
    return 42  # done to remove calculations from the picture

def main():
    client = MongoClient('localhost', 27017)
    db = client['test-async']
    for sample_choice in choices:
        regular_read(db, sample_choice)

if __name__ == '__main__':
    main()
$ time python3 mongotest_read.py
real 0m0.618s
user 0m0.085s
sys 0m0.018s
Now when I use multiprocessing I can see some improvement.
from random import randint, choice
import functools
from pymongo import MongoClient
from concurrent import futures

choices = ['foo','bar','baz']
MAX_WORKERS = 4

def regular_read(sample_choice):
    client = MongoClient('localhost', 27017, connect=False)
    db = client['test-async']
    rows = db.test_samples.find({'choice':sample_choice})
    #return sum(r['data'] for r in rows)
    return 42

def main():
    f = functools.partial(regular_read)
    with futures.ProcessPoolExecutor(MAX_WORKERS) as executor:
        res = executor.map(f, choices)
    print(list(res))
    return len(list(res))

if __name__ == '__main__':
    main()
$ time python3 mongotest_proc_read.py
[42, 42, 42]
real 0m0.144s
user 0m0.106s
sys 0m0.041s
But when I switch from ProcessPoolExecutor to ThreadPoolExecutor, the speed drops back to single-threaded levels.
...
def main():
    client = MongoClient('localhost', 27017, connect=False)
    f = functools.partial(regular_read, client)
    with futures.ThreadPoolExecutor(MAX_WORKERS) as executor:
        res = executor.map(f, choices)
    print(list(res))
    return len(list(res))
$ time python3 mongotest_thread_read.py
[42, 42, 42]
real 0m0.656s
user 0m0.111s
sys 0m0.024s
...
PyMongo uses the standard Python socket module, which does drop the GIL while sending and receiving data over the network. However, it's not MongoDB or the network that's your bottleneck: it's Python.
CPU-intensive Python processes do not scale by adding threads; indeed they slow down slightly due to context-switching and other inefficiencies. To use more than one CPU in Python, start subprocesses.
I know it doesn't seem intuitive that a "find" should be CPU intensive, but the Python interpreter is slow enough to contradict our intuition. If the query is fast and there's no latency to MongoDB on localhost, MongoDB can easily outperform the Python client. The experiment you just ran, substituting subprocesses for threads, confirms that Python performance is the bottleneck.
To ensure maximum throughput, make sure you have the C extensions enabled: pymongo.has_c() == True. With that in place, PyMongo runs about as fast as a Python client library can; to get more throughput, go to multiprocessing.
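A quick way to check, for example:

import pymongo
print(pymongo.has_c())  # True means the C extensions are installed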
If your expected real-world scenario involves more time-consuming queries, or a remote MongoDB with some network latency, multithreading may give you some performance increase.
You can also use MongoDB indexes to optimize your queries:
https://docs.mongodb.com/manual/tutorial/optimize-query-performance-with-indexes-and-projections/
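For this particular workload, which filters on the choice field, a single-field index is the natural starting point. A one-off sketch, run once against your collection:

from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client['test-async']
db.test_samples.create_index('choice')  # index the field used in find({'choice': ...})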

how to use wxpython threading to prevent blocking main loop

I'm working on a school project to develop a customized media player on the Python platform. The problem is that when I use time.sleep(duration), it blocks the main loop of my GUI, preventing it from updating. I've consulted my supervisor and was told to use multi-threading, but I have no idea how to use threading. Would anyone advise me on how to implement threading in my scenario below?
Code:
def load_playlist(self, event):
    playlist = ["D:\Videos\test1.mp4", "D:\Videos\test2.avi"]
    for path in playlist:
        #calculate each media file duration
        ffmpeg_command = ['C:\\MPlayer-rtm-svn-31170\\ffmpeg.exe', '-i', path]
        pipe = subprocess.Popen(ffmpeg_command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        results = pipe.communicate()

        #Regular expression to get the duration
        length_regexp = 'Duration: (\d{2}):(\d{2}):(\d{2})\.\d+,'
        re_length = re.compile(length_regexp)

        #find the matches using the regexp to compare with the buffer/string
        matches = re_length.search(str(results))
        #print matches
        hour = matches.group(1)
        minute = matches.group(2)
        second = matches.group(3)

        #Converting to seconds
        hour_to_second = int(hour) * 60 * 60
        minute_to_second = int(minute) * 60
        second_to_second = int(second)
        num_second = hour_to_second + minute_to_second + second_to_second
        print num_second

        #Play the media file
        trackPath = '"%s"' % path.replace("\\", "/")
        self.mplayer.Loadfile(trackPath)

        #Sleep for the duration of the video before jumping to the next one
        time.sleep(num_second) #THIS IS THE PROBLEM#
You'll probably want to take a look at the wxPython wiki which has several examples of using threads, Queues and other fun things:
http://wiki.wxpython.org/LongRunningTasks
http://wiki.wxpython.org/Non-Blocking%20Gui
I also wrote a tutorial on the subject here: http://www.blog.pythonlibrary.org/2010/05/22/wxpython-and-threads/
The main thing to keep in mind is that when you use threads, you cannot call your wx methods directly (i.e. myWidget.SetValue, etc). Instead, you need to use one of the wxPython threadsafe methods: wx.CallAfter, wx.CallLater or wx.PostEvent
You would start a new thread like any other multithreading example:
from threading import Thread
# in caller code, start a new thread
Thread(target=load_playlist).start()
However, you have to make sure that any calls into wx go through proper inter-thread communication. You cannot just call wx code from this new thread; it will segfault. So, use wx.CallAfter:
# in load_playlist, you have to synchronize your wx calls
wx.CallAfter(self.mplayer.Loadfile, trackPath)
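Putting the two pieces together, the handler might end up looking roughly like this. It is just a sketch: play_playlist and get_duration are made-up helper names, and it assumes from threading import Thread, import time, import wx, plus the ffmpeg duration code from your question moved into get_duration:

def load_playlist(self, event):
    # Kick off the long-running work in a background thread so the
    # wx main loop keeps running and the GUI stays responsive.
    Thread(target=self.play_playlist).start()

def play_playlist(self):
    playlist = ["D:\\Videos\\test1.mp4", "D:\\Videos\\test2.avi"]
    for path in playlist:
        num_second = self.get_duration(path)  # your ffmpeg/regex code, moved into a helper
        trackPath = '"%s"' % path.replace("\\", "/")
        # Any call into wx must be marshalled back to the main thread.
        wx.CallAfter(self.mplayer.Loadfile, trackPath)
        # Sleeping here only blocks the worker thread, not the GUI.
        time.sleep(num_second)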
