How to use wxPython threading to prevent blocking the main loop - multithreading

I'm working on a school project to develop a customized media player in Python. The problem is that when I use time.sleep(duration), it blocks the main loop of my GUI and prevents it from updating. I consulted my supervisor and was told to use multi-threading, but I have no idea how to use threading. Would anyone advise me on how to implement threading in my scenario below?
Code:
def load_playlist(self, event):
    # requires: import subprocess, re, time (plus the wx/mplayer setup elsewhere in the class)
    playlist = ["D:\\Videos\\test1.mp4", "D:\\Videos\\test2.avi"]
    for path in playlist:
        # calculate each media file's duration with ffmpeg
        ffmpeg_command = ['C:\\MPlayer-rtm-svn-31170\\ffmpeg.exe', '-i', path]
        pipe = subprocess.Popen(ffmpeg_command, shell=True,
                                stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        results = pipe.communicate()
        # regular expression to extract the duration from the ffmpeg output
        length_regexp = r'Duration: (\d{2}):(\d{2}):(\d{2})\.\d+,'
        re_length = re.compile(length_regexp)
        # find the match in the captured output
        matches = re_length.search(str(results))
        #print matches
        hour = matches.group(1)
        minute = matches.group(2)
        second = matches.group(3)
        # convert to seconds
        hour_to_second = int(hour) * 60 * 60
        minute_to_second = int(minute) * 60
        second_to_second = int(second)
        num_second = hour_to_second + minute_to_second + second_to_second
        print num_second
        # play the media file
        trackPath = '"%s"' % path.replace("\\", "/")
        self.mplayer.Loadfile(trackPath)
        # sleep for the duration of the video before jumping to the next one
        time.sleep(num_second)  # THIS IS THE PROBLEM

You'll probably want to take a look at the wxPython wiki which has several examples of using threads, Queues and other fun things:
http://wiki.wxpython.org/LongRunningTasks
http://wiki.wxpython.org/Non-Blocking%20Gui
I also wrote a tutorial on the subject here: http://www.blog.pythonlibrary.org/2010/05/22/wxpython-and-threads/
The main thing to keep in mind is that when you use threads, you cannot call your wx methods directly (e.g. myWidget.SetValue, etc.). Instead, you need to use one of the wxPython thread-safe methods: wx.CallAfter, wx.CallLater or wx.PostEvent.
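For example, from inside a worker thread you would schedule a GUI update instead of making the call yourself (myWidget and the value here are just stand-ins for whatever widget you need to touch):
wx.CallAfter(myWidget.SetValue, "done")  # runs on the main thread during the next event loop iteration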

You would start a new thread like any other multithreading example:
from threading import Thread
# in caller code, start a new thread
Thread(target=load_playlist).start()
However, you have to take care of inter-thread communication for anything that touches wx. You cannot just call wx code from this new thread; it will segfault. So, use wx.CallAfter:
# in load_playlist, you have to synchronize your wx calls
wx.CallAfter(self.mplayer.Loadfile, trackPath)
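Putting the two pieces together, a rough sketch of the playlist loop moved into a worker thread might look like this (both functions are methods on the same class as in the question; self.get_duration is a hypothetical helper holding the ffmpeg/regex code from above):
from threading import Thread
import time
import wx

def load_playlist(self, event):
    # Hand the long-running work to a background thread so the
    # wx main loop keeps handling events and repaints.
    Thread(target=self.play_playlist).start()

def play_playlist(self):
    playlist = ["D:\\Videos\\test1.mp4", "D:\\Videos\\test2.avi"]
    for path in playlist:
        num_second = self.get_duration(path)  # hypothetical helper: the ffmpeg/regex code from the question
        trackPath = '"%s"' % path.replace("\\", "/")
        # Anything that touches wx must be scheduled on the main thread:
        wx.CallAfter(self.mplayer.Loadfile, trackPath)
        # Sleeping here only blocks this worker thread, not the GUI.
        time.sleep(num_second)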

Related

Nim counting over multiple threads

I'm currently trying to count how many times I have requested a website. In Python I would just use a global variable, but I have no idea how I would write this in Nim.
import httpclient

proc threadMain(a: int) {.thread.} =
  var client = newHttpClient()
  while true:
    try:
      var r = client.getContent("URL")
      echo "sent"
      #Count here
    except:
      echo "error"

var thread: array[0..10, Thread[int]]
for i in 0..10:
  thread[i].createThread(threadMain, i)
thread.joinThreads()
This is explained with an almost identical example in the "Nim in Action" book, page 174.
First of all, if you used a global in Python, you had to use a lock or risk a race condition. Things are no different in Nim: first create a global, then guard it with a lock.
import locks
var counterLock: Lock
initLock(counterLock)
var counter {.guard: counterLock.} = 0
Now use a withLock where you need to update the counter:
withLock counterLock:
  counter.inc
The chapter of the book on parallelism/concurrency is very good. You should check it out, as it also explains concurrency (your code is an example of a task where concurrency works better than threading) and how to use Channels to pass data between threads, for example.
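For comparison, since the question mentions that in Python you would "just use a global variable": the same global-plus-lock pattern in Python would be roughly the following (the names are just illustrative).
import threading

counter = 0
counter_lock = threading.Lock()

def count_request():
    # the lock prevents two threads from racing on the read-modify-write
    global counter
    with counter_lock:
        counter += 1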

Need to do CPU bound processing using 2+ processes in Python by reading from a gzipped file

I have a gzipped file (10 GB compressed, 100 GB uncompressed) which contains reports separated by demarcation lines, and I have to parse it.
Parsing and processing the data takes a long time, so this is a CPU-bound problem (not an I/O-bound one). I am therefore planning to split the work across multiple processes using the multiprocessing module. The problem is that I am unable to send/share data with the child processes efficiently. I am using subprocess.Popen to stream the uncompressed data into the parent process.
process = subprocess.Popen('gunzip --keep --stdout big-file.gz',
                           shell=True,
                           stdout=subprocess.PIPE)
I am thinking of using a Lock() so that child-process-1 reads/parses one report and releases the lock, then child-process-2 reads/parses the next report, then child-process-1 takes the one after that, and so on. However, when I pass process.stdout as an argument to the child processes, I get a pickling error.
I have tried creating a multiprocessing.Queue() and a multiprocessing.Pipe() to send data to the child processes, but this is way too slow (in fact it is way slower than doing it in a single thread, i.e. serially).
Any thoughts/examples about sending data to child processes efficiently will help.
Could you try something simple instead? Have each worker process run its own instance of gunzip, with no interprocess communication at all. Worker 1 can process the first report and just skip over the second. The opposite for worker 2. Each worker skips every other report. Then an obvious generalization to N workers.
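A rough sketch of that idea, assuming the reports are separated by a known delimiter line (DELIM, process_report and the worker count below are placeholders for your real separator, parsing code and tuning):
import multiprocessing as mp
import subprocess

DELIM = "xxxxx\n"   # placeholder for your real report separator
NWORKERS = 4

def process_report(lines):
    # placeholder for the real parsing/processing; here we just count lines
    return len(lines)

def worker(worker_id, path="big-file.gz"):
    # Each worker runs its own gunzip and reads the whole stream,
    # but only processes every NWORKERS-th report.
    p = subprocess.Popen(["gunzip", "--keep", "--stdout", path],
                         text=True, stdout=subprocess.PIPE)
    report, index = [], 0
    for line in p.stdout:
        if line == DELIM:
            if index % NWORKERS == worker_id:
                process_report(report)
            report, index = [], index + 1
        else:
            report.append(line)
    if report and index % NWORKERS == worker_id:
        process_report(report)   # trailing report with no final delimiter
    p.stdout.close()

if __name__ == "__main__":
    procs = [mp.Process(target=worker, args=(i,)) for i in range(NWORKERS)]
    for pr in procs:
        pr.start()
    for pr in procs:
        pr.join()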
Or not ...
I think you'll need to be more specific about what you tried, and perhaps give more info about your problem (like: how many records are there? how big are they?).
Here's a program ("genints.py") that prints a bunch of random ints, one per line, broken into groups via "xxxxx\n" separator lines:
from random import randrange, seed

seed(42)
for i in range(1000):
    for j in range(randrange(1, 1000)):
        print(randrange(100))
    print("xxxxx")
Because it forces the seed, it generates the same stuff every time. Now a program to process those groups, both in parallel and serially, via the most obvious way I first thought of. crunch() takes time quadratic in the number of ints in a group, so it's quite CPU-bound. The output from one run, using (as shown) 3 worker processes for the parallel part:
parallel result: 10,901,000,334 0:00:35.559782
serial result: 10,901,000,334 0:01:38.719993
So the parallelized run took about one-third the time. In what relevant way(s) does that differ from your problem? Certainly, a full run of "genints.py" produces less than 2 million bytes of output, so that's a major difference - but it's impossible to guess from here whether it's a relevant difference. Perhaps, e.g., your problem is only very mildly CPU-bound? It's obvious from the output here that the overheads of passing chunks of stdout to worker processes are all but insignificant in this program.
In short, you probably need to give people - as I just did for you - a complete program they can run that reproduces your problem.
import multiprocessing as mp

NWORKERS = 3
DELIM = "xxxxx\n"

def runjob():
    import subprocess
    # 'py' is just a shell script on my box that
    # invokes the desired version of Python -
    # which happened to be 3.8.5 for this run.
    p = subprocess.Popen("py genints.py",
                         shell=True,
                         text=True,
                         stdout=subprocess.PIPE)
    return p.stdout

# Return list of lines up to (but not including) next DELIM,
# or EOF. If the file is already exhausted, return None.
def getrecord(f):
    result = []
    foundone = False
    for line in f:
        foundone = True
        if line == DELIM:
            break
        result.append(line)
    return result if foundone else None

def crunch(rec):
    total = 0
    for a in rec:
        for b in rec:
            total += abs(int(a) - int(b))
    return total

if __name__ == "__main__":
    import datetime
    now = datetime.datetime.now

    s = now()
    total = 0
    f = runjob()
    with mp.Pool(NWORKERS) as pool:
        for i in pool.imap_unordered(crunch,
                                     iter((lambda: getrecord(f)), None)):
            total += i
    f.close()
    print(f"parallel result: {total:,}", now() - s)

    s = now()
    # try the same thing serially
    total = 0
    f = runjob()
    while True:
        rec = getrecord(f)
        if rec is None:
            break
        total += crunch(rec)
    f.close()
    print(f"serial result: {total:,}", now() - s)

Python Threading Issue, Is this Right?

I am attempting to make a few thousand DNS queries. I have written my script to use python-adns. I have attempted to add threading and queues to ensure the script runs optimally and efficiently.
However, I can only achieve mediocre results. The responses are choppy/intermittent. They start and stop, and often pause for 10 to 20 seconds.
tlock = threading.Lock()  # printing to screen

def async_dns(i):
    s = adns.init()
    for i in names:
        tlock.acquire()
        q.put(s.synchronous(i, adns.rr.NS)[0])
        response = q.get()
        q.task_done()
        if response == 0:
            dot_net.append("Y")
            print(i + ", is Y")
        elif response == 300:
            dot_net.append("N")
            print(i + ", is N")
        tlock.release()

q = queue.Queue()
threads = []
for i in range(100):
    t = threading.Thread(target=async_dns, args=(i,))
    threads.append(t)
    t.start()
print(threads)
I have spent countless hours on this. I would appreciate some input from experienced Pythonistas. Is this a networking issue? Can this bottleneck/intermittent response be solved by switching servers?
Thanks.
Without answers to the questions I asked in the comments above, I'm not sure how well I can diagnose the issue you're seeing, but here are some thoughts:
It looks like each thread is processing all names instead of just a portion of them.
Your Queue seems to be doing nothing at all.
Your lock seems to guarantee that you actually only do one query at a time (defeating the purpose of having multiple threads).
Rather than trying to fix up this code, might I suggest using multiprocessing.pool.ThreadPool instead? Below is a full working example. (You could use adns instead of socket if you want... I just couldn't easily get it installed and so stuck with the built-in socket.)
In my testing, I also sometimes see pauses; my assumption is that I'm getting throttled somewhere.
import itertools
from multiprocessing.pool import ThreadPool
import socket
import string

def is_available(host):
    print('Testing {}'.format(host))
    try:
        socket.gethostbyname(host)
        return False
    except socket.gaierror:
        return True

# Test the first 1000 three-letter .com hosts
hosts = [''.join(tla) + '.com' for tla in itertools.permutations(string.ascii_lowercase, 3)][:1000]

with ThreadPool(100) as p:
    results = p.map(is_available, hosts)

for host, available in zip(hosts, results):
    print('{} is {}'.format(host, 'available' if available else 'not available'))

How to get a working Progress Bar in Python using multi-process or multi-threaded clients?

That's the basic idea.
I couldn't understand them while reading about all this in Python's documentation; there's simply too much of it to take in.
Could someone explain to me how to get a working progress bar with a multi-threaded (or multi-process) client?
Or is there any other way to "update" the progress bar without locking up the program's GUI?
I also read something about I/O errors when these kinds of clients try to access a file at the same time, and X-server errors when an application doesn't call the multi-threading libraries correctly. How do I avoid them?
This time I didn't write any code; I don't want to end up with a zombie process or something like that (and then have to force the PC to shut down, I would hate to corrupt precious data...). I need to understand what I'm doing first!
Something like the below might get you started - I've tried to annotate as much as possible to explain the process.
from Tkinter import *
from Queue import Queue
import ttk, threading
import time

queue = Queue()
root = Tk()

# Function to do 'stuff' and place object in queue for later #
def foo():
    # sleep to demonstrate thread doing work #
    time.sleep(5)
    obj = [x for x in range(0, 10)]
    queue.put(obj)

# Create thread object, targeting function to do 'stuff' #
thread1 = threading.Thread(target=foo, args=())

# Function to check state of thread1 and to update progressbar #
def progress(thread, queue):
    # starts thread #
    thread.start()
    # defines indeterminate progress bar (used while thread is alive) #
    pb1 = ttk.Progressbar(root, orient='horizontal', mode='indeterminate')
    # defines determinate progress bar (used when thread is dead) #
    pb2 = ttk.Progressbar(root, orient='horizontal', mode='determinate')
    pb2['value'] = 100
    # places and starts progress bar #
    pb1.pack()
    pb1.start()
    # checks whether thread is alive #
    while thread.is_alive():
        root.update()
        pass
    # once thread is no longer active, remove pb1 and place the '100%' progress bar #
    pb1.destroy()
    pb2.pack()
    # retrieves object from queue #
    work = queue.get()
    return work

work = progress(thread1, queue)
root.mainloop()
Hope this helps. Let me know what you think!

What can be slowing down my program when I use multithreading?

I'm writing a program that downloads data from a website (eve-central.com). It returns XML when I send a GET request with some parameters. The problem is that I need to make about 7080 such requests because I can't specify the typeid parameter more than once.
def get_data_eve_central(typeids, system, hours, minq=1, thread_count=1):
    import xmltodict, urllib3
    pool = urllib3.HTTPConnectionPool('api.eve-central.com')
    for typeid in typeids:
        r = pool.request('GET', '/api/quicklook', fields={'typeid': typeid, 'usesystem': system, 'sethours': hours, 'setminQ': minq})
        answer = xmltodict.parse(r.data)
It was really slow when I just connected to the website and made all the requests one after another, so I decided to use multiple threads at a time (I read that if a process involves a lot of waiting (I/O, HTTP requests), it can be sped up a lot with multithreading). I rewrote it using multiple threads, but somehow it isn't any faster (a bit slower, in fact). Here's the code rewritten using multithreading:
def get_data_eve_central(all_typeids, system, hours, minq=1, thread_count=1):
    if thread_count > len(all_typeids): raise NameError('TooManyThreads')

    def requester(typeids):
        pool = urllib3.HTTPConnectionPool('api.eve-central.com')
        for typeid in typeids:
            r = pool.request('GET', '/api/quicklook', fields={'typeid': typeid, 'usesystem': system, 'sethours': hours, 'setminQ': minq})
            answer = xmltodict.parse(r.data)['evec_api']['quicklook']
            answers.append(answer)

    def chunkify(items, quantity):
        chunk_len = len(items) // quantity
        rest_count = len(items) % quantity
        chunks = []
        for i in range(quantity):
            chunk = items[:chunk_len]
            items = items[chunk_len:]
            if rest_count and items:
                chunk.append(items.pop(0))
                rest_count -= 1
            chunks.append(chunk)
        return chunks

    t = time.clock()
    threads = []
    answers = []
    for typeids in chunkify(all_typeids, thread_count):
        threads.append(threading.Thread(target=requester, args=[typeids]))
        threads[-1].start()
        threads[-1].join()
    print(time.clock()-t)
    return answers
What I do is divide all the typeids into as many chunks as the number of threads I want to use, and create a thread for each chunk to process it. The question is: what can be slowing it down? (I apologise for my bad English.)
Python has the Global Interpreter Lock, which can be your problem: Python threads cannot actually run your code in a genuinely parallel way. You may think about switching to another language, or staying with Python but using process-based parallelism to solve your task. Here is a nice presentation: Inside the Python GIL
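If you stay with Python and go the process-based route, a rough sketch using multiprocessing.Pool could look like the following (the request itself is copied from the question; the pool size and the one-connection-pool-per-request simplification are just for illustration):
import multiprocessing

import urllib3
import xmltodict

def fetch_one(args):
    typeid, system, hours, minq = args
    pool = urllib3.HTTPConnectionPool('api.eve-central.com')
    r = pool.request('GET', '/api/quicklook',
                     fields={'typeid': typeid, 'usesystem': system,
                             'sethours': hours, 'setminQ': minq})
    return xmltodict.parse(r.data)['evec_api']['quicklook']

def get_data_eve_central(all_typeids, system, hours, minq=1, process_count=4):
    work = [(typeid, system, hours, minq) for typeid in all_typeids]
    with multiprocessing.Pool(process_count) as p:
        return p.map(fetch_one, work)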
