I am trying to wrap my head around threading in practical applications. I am using the threading module in Python 3.4 and I am not sure if the logic is right for the program's functionality.
Here is the gist of my code:
def myMain():
    """ my main function actually uses sockets to
    send data over a network
    """
    # there will be an infinite loop, call it loop-1, here
    while True:
        # perform encoding scheme
        # send data out... with all other exception handling
        # here is infinite loop 2, which waits for messages
        # from other devices
        while True:
            # Check for incoming messages
            callCheckFunction()  # ----> Should I call this on a thread?
The above-mentioned callCheckFunction() does some comparison on the received data, and if the data values don't match I want to run the myMain() function again.
Here is the gist of the callCheckFunction():
def callCheckFunction():
    if data == 'same':
        # some work done here and then get out of the
        # function and return back to listening on the socket
        ...
    else:  # data == 'not same'
        myMain()  # ---------> Should I thread this one too??
This might be complicated, but I am not sure if threading is what I want. I did a nasty hack by calling myMain() in the fashion shown above, which works great! But I assume there is some limit to calling a function from within itself, and I want my code to be a bit professional, not hacky.
I have my mind set on threading since I am listening on the socket indefinitely: when some new data comes in, the whole of myMain() is called back, creating a kind of hectic recursion that I want to control.
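For reference, one shape that would avoid the recursion entirely, keeping the names from the gist above, is a single outer loop that simply starts its next iteration when the check reports a mismatch; this is only a rough sketch, not working code:

def callCheckFunction():
    # return True if the received data matches, False otherwise
    return data == 'same'

def myMain():
    while True:
        # perform encoding scheme, send data out ...
        while True:
            # check for incoming messages
            if not callCheckFunction():
                # data didn't match: break out so the outer loop runs the
                # encode-and-send step again, instead of calling myMain()
                # recursively
                break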
EDIT
So I have managed to make the code a bit more modular, i.e. I have split the two infinite loops into two different functions.
myMain is now divided into task1() and task2(), and the gist is as follows:
def task1():
    while True:
        # encoding and sending data
        # in the end I call task2() since it is the primary
        # state which decides things
        task2()  # ---------> still need to decide if I should thread or not

def task2():
    while True:
        # check for incoming messages
        checker = threading.Thread(target=callCheckFunction, daemon=True)
        checker.start()
        checker.join()
Now, since callCheckFunction() needs func1(), I decided to thread func1() inside that function. Note that func1() is actually kind of the main() of the code:
def callCheckFunction():
    ...
    else:  # data == 'not same'
        thready = threading.Thread(target=func1, daemon=True)
        thready.start()
        thready.join()
Results
With little understanding I do manage to get the code working, but I am not sure if this is really hacky or a professional way of doing things. I can of course share the code via GitHub, and also a finite state machine for the system. I am also not sure whether this code is thread-safe. Help/suggestions are really needed.
Related
I have 3 classes that represent nearly isolated processes that can be run concurrently (meant to be persistent, like 3 main() loops).
class DataProcess:
    ...
    def runOnce(self):
        ...

class ComputeProcess:
    ...
    def runOnce(self):
        ...

class OtherProcess:
    ...
    def runOnce(self):
        ...
Here's the pattern I'm trying to achieve:
start various streams
start each process
allow each process to publish to any stream
allow each process to listen to any stream (at various points in its loop) and behave accordingly (allow for interruption of its current task or not, etc.)
For example, one 'process' listens for external data. Another process does computation on some of that data. The computation process might be busy for a while, so by the time it comes back and checks the stream, many values may have piled up. I don't want to just use a queue, because I don't want to be forced to process each one in order; I'd rather be able to implement logic like, "if there are one or more things waiting, just run your process one more time, otherwise go do this interruptible task while you wait for something to show up."
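Roughly, that "drain the backlog or do something interruptible" behaviour can be sketched with a plain queue.Queue checked non-blockingly; new_data_queue, run_once and interruptible_task below are placeholder names, not anything from a particular library:

import queue

new_data_queue = queue.Queue()

def compute_loop():
    while True:
        if not new_data_queue.empty():
            # one or more items piled up while we were busy:
            # drain the backlog, then run the computation once
            # (single consumer assumed, so empty() is a safe-enough check)
            while not new_data_queue.empty():
                new_data_queue.get_nowait()
            run_once()
        else:
            # nothing waiting: do a small, interruptible chunk of other work
            interruptible_task()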
That's like a lot, right? So I was thinking of using an actor model until I discovered RxPy. I saw that a stream is like a subject
from reactivex.subject import BehaviorSubject

# BehaviorSubject takes an initial value
newData = BehaviorSubject(None)
newModel = BehaviorSubject(None)
then I thought I'd start 3 threads, one for each of my high-level processes:
threads = {
    'data': threading.Thread(target=data),
    'compute': threading.Thread(target=compute),
    'other': threading.Thread(target=other),
}
for thread in threads.values():
    thread.start()
and I thought the functions of those threads should listen to the streams:
def data():
    while True:
        DataProcess().runOnce()  # publishes to stream inside process

def compute():
    def run(value):
        ComputeProcess().runOnce()
    # subscribe takes the callback itself, not the result of calling it
    newData.subscribe(run)
    newModel.subscribe(run)

def other():
    ''' not done '''
    ComputeProcess().runOnce()
Ok, so that's what I have so far. Is this pattern going to give me what I'm looking for?
Should I use threading in conjunction with RxPY, or just use RxPY's scheduler machinery to achieve concurrency? If so, how?
I hope this question isn't too vague. I suppose I'm looking for the simplest framework where I can have a small number of computational-memory units (like objects, because they have internal state) that communicate with each other and work in parallel (or concurrently). At the highest level I want to be able to treat these computational-memory units (which I've called processes above) as individuals who mostly work on their own stuff but occasionally broadcast or send a message to a specific other individual, requesting or providing information.
Am I perhaps actually looking for an actor model framework? or is this RxPy setup versatile enough to achieve that without extreme complexity?
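To make the actor idea concrete: one bare-bones version is each unit owning an inbox queue plus a thread that drains it. Everything below (Actor, ComputeActor, PrintActor) is an illustrative sketch built on the standard library only, not RxPY or any particular actor framework:

import queue
import threading
import time

class Actor:
    """A long-lived worker with its own inbox; others communicate by sending it messages."""
    def __init__(self):
        self.inbox = queue.Queue()

    def send(self, message):
        self.inbox.put(message)

    def run(self):
        while True:
            message = self.inbox.get()   # blocks until something arrives
            self.handle(message)

    def handle(self, message):
        raise NotImplementedError

class PrintActor(Actor):
    def handle(self, message):
        print("got", message)

class ComputeActor(Actor):
    def __init__(self, downstream):
        super().__init__()
        self.downstream = downstream     # actor to forward results to

    def handle(self, message):
        result = message * 2             # stand-in for the real computation
        self.downstream.send(result)

printer = PrintActor()
computer = ComputeActor(printer)
for actor in (computer, printer):
    threading.Thread(target=actor.run, daemon=True).start()

computer.send(21)                        # PrintActor eventually prints "got 42"
time.sleep(1)                            # give the daemon threads time to run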
Thanks so much!
Working with Python 3.6, what I'm looking to accomplish is to create a function that continuously scrapes dynamic/changing data from a webpage while the rest of the script executes, and that lets the script reference the data returned by the continuous function.
I know this is likely a threading task; however, I'm not super knowledgeable in it yet. Pseudo-code of what I have in mind looks something like this:
def continuous_scraper():
    # Pull data from webpage
    scraped_table = pd.read_html(url)
    return scraped_table

# start the continuous scraper function here, to run either indefinitely,
# or preferably stop after a predefined amount of time
scraped_table = thread(continuous_scraper)

# the rest of the script is run here, making use of the updating "scraped_table"
while True:
    print(scraped_table["Col_1"].iloc[0])
Here is a fairly simple example using some stock market page that seems to update every couple of seconds.
import threading, time
import pandas as pd

# A lock is used to ensure only one thread reads or writes the variable at any one time
scraped_table_lock = threading.Lock()

# Initially set to None so we know when its value has changed
scraped_table = None

# This bad-boy will be called only once in a separate thread
def continuous_scraper():
    # Tell Python this is a global variable, so it rebinds scraped_table
    # instead of creating a local variable that is also named scraped_table
    global scraped_table
    url = r"https://tradingeconomics.com/australia/stock-market"
    while True:
        # Pull data from webpage
        result = pd.read_html(url, match="Dow Jones")[0]
        # Acquire the lock to ensure thread-safety, then assign the new result
        # This is done after read_html returns so it doesn't hold the lock for so long
        with scraped_table_lock:
            scraped_table = result
        # You don't wanna flog the server, so wait 2 seconds after each
        # response before sending another request
        time.sleep(2)

# Make the thread daemonic, so the thread doesn't continue to run once the
# main script and any other non-daemonic threads have ended
scraper_thread = threading.Thread(target=continuous_scraper, daemon=True)

# start the continuous scraper function here, to run either indefinitely, or
# preferably stop after a predefined amount of time
scraper_thread.start()

# the rest of the script is run here, making use of the updating "scraped_table"
for _ in range(100):
    print("Time:", time.time())
    # Acquire the lock to ensure thread-safety
    with scraped_table_lock:
        # Check if it has been changed from the default value of None
        if scraped_table is not None:
            print(" ", scraped_table)
        else:
            print("scraped_table is None")
    # You probably don't wanna flog your stdout, either, dawg!
    time.sleep(0.5)
Be sure to read about multithreaded programming and thread safety. It's easy to make mistakes. If there is a bug, it often only manifests in rare and seemingly random occasions, making it difficult to debug.
I recommend looking into the multiprocessing library and the Pool class.
The docs have multiple examples of how to use it.
The question itself is too general for a simple answer.
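For what it's worth, a background worker via Pool.apply_async looks roughly like this; fetch_table and the URL are hypothetical stand-ins for the actual scraping step:

from multiprocessing import Pool

import pandas as pd

def fetch_table(url):
    # hypothetical worker: fetch and parse one page into a DataFrame
    return pd.read_html(url)[0]

if __name__ == "__main__":
    url = "https://example.com/prices"  # placeholder URL
    with Pool(processes=1) as pool:
        # apply_async returns immediately; the work runs in a child process
        pending = pool.apply_async(fetch_table, (url,))

        # ... the rest of the script can run here ...

        # collect the result when it is actually needed (blocks until ready)
        table = pending.get(timeout=60)
        print(table.head())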
I have two threads concurrently running, speechRecognition and speakBack. Both of these threads are run in while loops (while True: #do something).
Speech recognition is constantly waiting for microphone input. Then, once it is received, it saves the text version of the verbal input to a file, which is loaded by my second thread, speakBack, and spoken through the speakers.
My issue is that when the phrase is spoken through the speakers, it is picked up by the microphone and then translated and once again saved to this file to be processed, resulting in an endless loop.
How can I make the speechRecognition thread suspend itself, wait for the speakBack thread to stop outputting sound through the speakers, and then continue listening for the next verbal input?
I'm using the speechRecognition library and the pyttsx3 library for speech recognition and verbal output, respectively.
The way to do this is to have shared state between the threads (either with global variables that the threads can store into and read from to indicate their progress, or with a mutable reference that is passed into each thread). The solution I’ll give below involves a global variable that stores a mutable reference, but you could just as easily pass the queue into both threads instead of storing it globally.
Using queues is a very standard way to pass messages between threads in python, because queues are already written in a thread-safe way that makes it so you don’t have to think about synchronization and locking. Furthermore, the blocking call to queue.get is implemented in a way that doesn’t involve repeatedly and wastefully checking a condition variable in a while loop.
Here’s how some code might look:
import queue

START_SPEAK_BACK = 0
START_SPEECH_RECOGNITION = 1

messageQueue = queue.Queue()

# thread 1
def speechRecognition():
    while True:
        # wait for input like you were doing before
        # write to file as before
        # put message on the queue for other thread to get
        messageQueue.put(START_SPEAK_BACK)
        # Calling `get` with no arguments makes the call be
        # "blocking" in the sense that it won't return until
        # there is an element on the queue to get.
        messageFromOtherThread = messageQueue.get()
        # logically, messageFromOtherThread can only ever be
        # START_SPEECH_RECOGNITION, but you could still
        # check that this is true and raise an exception if not.

# thread 2
def speakBack():
    while True:
        messageFromOtherThread = messageQueue.get()
        # likewise, this message will only be START_SPEAK_BACK
        # but you could still check.
        # Here, fill in the code that speaks through the speakers.
        # When that's done:
        messageQueue.put(START_SPEECH_RECOGNITION)
Some comments:
This solution uses a single queue. It could just as easily have used two queues, one for speakBack —> speechRecognition communication and the other for speechRecognition —> speakBack communication. This might make more sense if the two threads were generating messages concurrently.
This solution doesn’t actually involve inspecting the contents of the messages. However, if you need to pass additional information between threads, you could very easily pass objects or data as messages (instead of just constant values)
Finally, it's not clear to me why you don't just run all the code in the same thread. It seems like there's a very clear (serial) series of steps you want your program to follow: get audio input, write it to file, speak it back, start over. It might make more sense to write everything as a normal, serial, threadless Python program.
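If the serial route is acceptable, the whole program collapses into one loop; a rough sketch, assuming the usual speech_recognition and pyttsx3 APIs:

import speech_recognition as sr
import pyttsx3

recognizer = sr.Recognizer()
engine = pyttsx3.init()

while True:
    # 1. listen; the microphone is not being read while we speak below
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    try:
        text = recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        continue

    # 2. optionally write `text` to the file, as in the original program

    # 3. speak it back; runAndWait() blocks until speech has finished,
    #    so we only return to listening once the speakers are quiet
    engine.say(text)
    engine.runAndWait()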
I want to do a simple job. I have a list of n elements and want to split it into two smaller lists, then use threading to perform a simple calculation and append the results to a new list. I've written some test code and it seems to work fine when I have a small number of elements (say 3,000). But when the list is larger (30,000), over 12-20k tasks are being dropped and the appends just don't go through.
I've read a lot about what constitutes thread safety and about queueing. I believe the problem has something to do with that, but even after experimenting with Lock() I still seem unable to get a thread-safe Thread.
Can someone point me in the right direction? Cheers.
# Seperate thread workload
a_genes = genes[0:count_seperator]
b_genes = genes[count_seperator:genes_count]

class GeneThread(Thread):
    def __init__(self, genelist):
        Thread.__init__(self)
        self.genelist = genelist

    def run(self):
        for gene in self.genelist:
            total_reputation = 0
            for local_snp in gene:
                user_rsid = rsids[0]
                if user_rsid is None:
                    continue
                rep = "B"
                # If multiplier is 0, don't waste time calculating
                if not rep or rep == "G" or rep == "U":
                    continue
                importance = 1
                weighted_reputation = importance * mul[rep]
                zygosity = "homozygous_minor"
                if rep == "B":
                    weighted_reputation *= z_mul[zygosity]
                # Now we apply the spread amplifier, we raise the score to the power of the spread number
                rep_square = pow(spread, weighted_reputation)
                total_reputation += rep_square
            try:
                with lock:
                    UserGeneReputation.append(total_reputation)
            except:
                pass

start_time = time.time()
# Create new threads
gene_thread1 = GeneThread(genelist=a_genes)
gene_thread2 = GeneThread(genelist=b_genes)
gene_thread1.daemon, gene_thread2.daemon = True, True
# Start new Threads
gene_thread1.start()
gene_thread2.start()

print(len(UserGeneReputation))
print("--- %s seconds ---" % (time.time() - start_time))
You have, broadly speaking, two choices with threads. You can have them be autonomous, do their work, and then terminate themselves quietly. Or you can have them be managed by some other thread that monitors their lifetime and knows when they're done. You have a design that absolutely requires the second option (how else will you know when you have all the results you need?), yet you've chosen the first (set them for self-termination and not monitored).
Don't make the threads daemon threads. Instead, wait for both threads to finish after you start them. That's not the most sophisticated or elegant solution, but it's the one everyone learns first.
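Concretely, for the code in the question that means something like the following sketch (same thread objects, just not daemonised, and joined before the results are read):

# Create the threads as before, but don't mark them as daemon threads
gene_thread1 = GeneThread(genelist=a_genes)
gene_thread2 = GeneThread(genelist=b_genes)

gene_thread1.start()
gene_thread2.start()

# join() blocks until each thread's run() method has returned,
# so UserGeneReputation is complete before we read it
gene_thread1.join()
gene_thread2.join()

print(len(UserGeneReputation))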
The problem with this approach is that it forces your code to be dependent on how work is assigned to threads. This can cause performance problems as you wind up having to create and destroy a thread every time you want to know when work is done, and the only way you can know that work is done is by waiting for it. Ideally, you would treat threads as an abstraction that gets work done somehow and code that has to wait for work to be finished would wait for the work itself to be finished (through some synchronization associated with the work itself) rather than wait for the thread to be finished. That way, you can be flexible about what thread does what work and don't have to keep creating and destroying threads every time you need to assign work.
But everyone learns the create/join method. And sometimes it really is the best choice. Even when you use other methods, you likely still have an outer create/join to create the threads in the first place and, typically, ensure they cleanly finish to shut down your program in an orderly way.
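As an illustration of waiting on the work rather than on a particular thread, the standard library's concurrent.futures gives you that abstraction; process_genes and compute_reputation below are hypothetical stand-ins for the per-gene loop in the question:

from concurrent.futures import ThreadPoolExecutor, as_completed

def process_genes(genelist):
    # stand-in for GeneThread.run(); returns its results instead of
    # appending to a shared list, so no lock is needed
    return [compute_reputation(gene) for gene in genelist]

with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() returns a Future that represents the work item itself
    futures = [executor.submit(process_genes, chunk) for chunk in (a_genes, b_genes)]
    results = []
    for future in as_completed(futures):
        # wait for the *work* to finish, not for a specific thread
        results.extend(future.result())

print(len(results))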
I currently have code in the form of a generator which calls an IO-bound task. The generator actually calls sub-generators as well, so a more general solution would be appreciated.
Something like the following:
def processed_values(list_of_io_tasks):
    for task in list_of_io_tasks:
        value = slow_io_call(task)
        yield postprocess(value)  # in real version, would iterate over
                                  # processed_values2(value) here
I have complete control over slow_io_call, and I don't care in which order I get the items from processed_values. Is there something like coroutines I can use to get the yielded results in the fastest order by turning slow_io_call into an asynchronous function and using whichever call returns fastest? I expect list_of_io_tasks to be at least thousands of entries long. I've never done any parallel work other than with explicit threading, and in particular I've never used the various forms of lightweight threading which are available.
I need to use the standard CPython implementation, and I'm running on Linux.
Sounds like you are in search of multiprocessing.Pool(), specifically the Pool.imap_unordered() method.
Here is a port of your function to use imap_unordered() to parallelize calls to slow_io_call().
def processed_values(list_of_io_tasks):
    pool = multiprocessing.Pool(4)  # num workers
    results = pool.imap_unordered(slow_io_call, list_of_io_tasks)
    while True:
        yield results.next(9999999)  # large time-out
Note that you could also iterate over results directly (i.e. for item in results: yield item) without a while True loop; however, calling results.next() with a time-out value works around this multiprocessing keyboard-interrupt bug and lets you kill the main process and all subprocesses with Ctrl-C. Also note that the StopIteration exception is not caught in this function, but one will be raised when results.next() has no more items to return. This is legal from generator functions such as this one, which are expected either to raise StopIteration when there are no more values to yield or to just stop yielding, in which case a StopIteration exception is raised on their behalf.
To use threads in place of processes, replace
import multiprocessing
with
import multiprocessing.dummy as multiprocessing