Python3 threading Timer / callback .start() again?

Let's say I have built a small custom class (a quickly built representative prototype is below) to help me hold on to and operate on some data.
The ultimate use case is to hold on to data and periodically delete the older data, keeping only the newer data. To start, I thought I would simply delete all of the data every 15 seconds. That way, whenever I operate on the data I have, I know for sure it's all less than 15 seconds old. So it's all new data.
Later I could make the class more sophisticated: delete only the VERY old data instead of all of it, change the 15-second interval to something dynamic/smarter, etc.
But before I even got near that point, I stumbled over the way threading.Timer works. I discovered empirically that the code below only keeps printing 'Fresh!' if I include that second/subsequent threading.Timer call INSIDE the freshen_up method, which is the callback. I would have thought I'd only need to start the timer once and it would call the callback every 15 seconds. Am I crazy, or do I need to set the timer 'every time' (as I do in the code below) for this simple implementation to work? As far as I can tell the code works; I am just surprised that I needed to start a new timer on every call, and I want to make sure I'm not doing something inefficient or unnecessary out of a misunderstanding of how Timer works... thanks
import threading

class Fresh_Data_Container():
    def __init__(self):
        self.fresh_data_dict = {}
        threading.Timer(15.0, self.freshen_up).start()

    def freshen_up(self):
        self.fresh_data_dict = {}  # starting over for now, perhaps more nuanced later
        print('Fresh!')
        threading.Timer(15.0, self.freshen_up).start()  # does this really have to be here or am i looking at this wrong?

    def add_fresh_data(self, some_key, fresh_data):
        if some_key in self.fresh_data_dict:
            self.fresh_data_dict[some_key].append(fresh_data)
        else:
            self.fresh_data_dict[some_key] = [fresh_data]

    def operate_on_data(self):
        pass  # do stuff

threading.Timer isn't periodic -- it calls the given function only once, after the 15 seconds have elapsed, not every 15 seconds.
A better way to do this would be to create a regular Thread instead of a Timer:
import threading
import time

class Fresh_Data_Container():
    def __init__(self):
        self.fresh_data_dict = {}
        self.running = True
        threading.Thread(target=self.freshen_up, daemon=True).start()

    def stop(self):
        self.running = False

    def freshen_up(self):
        while self.running:
            time.sleep(15)
            self.fresh_data_dict.clear()
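If you also want stop() to take effect immediately instead of after up to 15 seconds of sleep, a threading.Event can replace both the flag and the sleep. A minimal sketch of that variation (same class, same assumed 15-second cycle):

import threading

class Fresh_Data_Container():
    def __init__(self):
        self.fresh_data_dict = {}
        self._stop_event = threading.Event()
        threading.Thread(target=self.freshen_up, daemon=True).start()

    def stop(self):
        self._stop_event.set()  # wakes the worker thread right away

    def freshen_up(self):
        # wait() returns False on timeout (keep looping) and True once set() is called
        while not self._stop_event.wait(15):
            self.fresh_data_dict = {}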

How to run a threaded function that returns a variable?

Working with Python 3.6, what I'm looking to accomplish is a function that continuously scrapes dynamic/changing data from a webpage while the rest of the script executes, with the rest of the script able to reference the data the continuous function returns.
I know this is likely a threading task; however, I'm not very knowledgeable in threading yet. I imagine the pseudo-code would look something like this:
def continuous_scraper():
    # Pull data from webpage
    scraped_table = pd.read_html(url)
    return scraped_table

# start the continuous scraper function here, to run either indefinitely, or
# preferably stop after a predefined amount of time
scraped_table = thread(continuous_scraper)

# the rest of the script is run here, making use of the updating "scraped_table"
while True:
    print(scraped_table["Col_1"].iloc[0])
Here is a fairly simple example using some stock market page that seems to update every couple of seconds.
import threading, time
import pandas as pd

# A lock is used to ensure only one thread reads or writes the variable at any one time
scraped_table_lock = threading.Lock()

# Initially set to None so we know when its value has changed
scraped_table = None

# This bad-boy will be called only once in a separate thread
def continuous_scraper():
    # Tell Python this is a global variable, so it rebinds scraped_table
    # instead of creating a local variable that is also named scraped_table
    global scraped_table
    url = r"https://tradingeconomics.com/australia/stock-market"
    while True:
        # Pull data from webpage
        result = pd.read_html(url, match="Dow Jones")[0]
        # Acquire the lock to ensure thread-safety, then assign the new result
        # This is done after read_html returns so the lock isn't held for long
        with scraped_table_lock:
            scraped_table = result
        # You don't wanna flog the server, so wait 2 seconds after each
        # response before sending another request
        time.sleep(2)

# Make the thread daemonic, so the thread doesn't continue to run once the
# main script and any other non-daemonic threads have ended
scraper_thread = threading.Thread(target=continuous_scraper, daemon=True)

# start the continuous scraper function here, to run either indefinitely, or
# preferably stop after a predefined amount of time
scraper_thread.start()

# the rest of the script is run here, making use of the updating "scraped_table"
for _ in range(100):
    print("Time:", time.time())
    # Acquire the lock to ensure thread-safety
    with scraped_table_lock:
        # Check if it has been changed from the default value of None
        if scraped_table is not None:
            print(" ", scraped_table)
        else:
            print("scraped_table is None")
    # You probably don't wanna flog your stdout, either, dawg!
    time.sleep(0.5)
Be sure to read about multithreaded programming and thread safety. It's easy to make mistakes, and when there is a bug it often manifests only on rare and seemingly random occasions, making it difficult to debug.
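If you would rather avoid the shared global entirely, a queue.Queue is thread-safe on its own and needs no explicit lock. A rough sketch of that pattern (the producer payload here is just a placeholder for the scraped table):

import queue, threading, time

results = queue.Queue()  # thread-safe FIFO, no manual locking required

def producer():
    while True:
        results.put(time.time())  # stand-in for pd.read_html(...) output
        time.sleep(2)

threading.Thread(target=producer, daemon=True).start()

latest = None
for _ in range(10):
    # Drain everything the producer pushed since the last look,
    # keeping only the most recent value
    try:
        while True:
            latest = results.get_nowait()
    except queue.Empty:
        pass
    print("latest:", latest)
    time.sleep(0.5)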
I recommend looking into the multiprocessing library and its Pool class.
The docs have multiple examples of how to use it.
The question itself is too general for a simple answer.
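For what it's worth, a minimal Pool sketch along those lines (the worker function and inputs are placeholders, not from the question):

from multiprocessing import Pool

def work(item):
    return item * 2  # placeholder for the real per-item processing

if __name__ == "__main__":
    # Pool.map distributes the items across 4 worker processes
    with Pool(processes=4) as pool:
        print(pool.map(work, range(10)))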

Performance difference between multithread using queue and futures.ThreadPoolExecutor using list in python3?

I was trying various approaches with Python multi-threading to see which one fits my requirements. To give an overview, I have a bunch of items that I need to send to an API. Based on the response, some of the items will go to a database and all of the items will be logged; e.g., if the API returns success for an item, that item will only be logged, but when it returns failure, the item will be sent to the database for a future retry, along with logging.
Now, based on the API response, I can separate the success items from the failures and make a batch query with all the failure items, which will improve my database performance. To do that, I am accumulating all requests in one place and trying to perform multithreaded API calls (since this is an IO-bound task, I'm not even thinking about multiprocessing), but at the same time I need to keep track of which response belongs to which request.
Coming to the actual question, I tried two different approaches which I thought would give nearly identical performance, but there turned out to be a huge difference.
To simulate the API call, I created an API on my localhost with a 500ms sleep (for average processing time). Please note that I want to start logging and inserting into the database only after all API calls are complete.
Approach - 1 (with threading.Thread and queue.Queue)
import requests
import datetime
import threading
import queue

def target(data_q):
    while not data_q.empty():
        data_q.get()
        response = requests.get("https://postman-echo.com/get?foo1=bar1&foo2=bar2")
        print(response.status_code)
        data_q.task_done()

if __name__ == "__main__":
    data_q = queue.Queue()
    for i in range(0, 20):
        data_q.put(i)
    start = datetime.datetime.now()
    num_thread = 5
    for _ in range(num_thread):
        worker = threading.Thread(target=target(data_q))
        worker.start()
    data_q.join()
    print('Time taken multi-threading: ' + str(datetime.datetime.now() - start))
I tried with 5, 10, 20 and 30 items, and the corresponding results are below:
Time taken multi-threading: 0:00:06.625710
Time taken multi-threading: 0:00:13.326969
Time taken multi-threading: 0:00:26.435534
Time taken multi-threading: 0:00:40.737406
What shocked me here is that I tried the same thing without multi-threading and got almost the same performance.
Then, after some googling around, I was introduced to the futures module.
Approach - 2 (using concurrent.futures)
import requests
import datetime
import traceback
from concurrent import futures

def fetch_url(im_url):
    try:
        response = requests.get(im_url)
        return response.status_code
    except Exception as e:
        traceback.print_exc()

if __name__ == "__main__":
    data = []
    for i in range(0, 20):
        data.append(i)
    start = datetime.datetime.now()
    urls = ["https://postman-echo.com/get?foo1=bar1&foo2=bar2" + str(item) for item in data]
    with futures.ThreadPoolExecutor(max_workers=5) as executor:
        responses = executor.map(fetch_url, urls)
        for ret in responses:
            print(ret)
    print('Time taken future concurrent: ' + str(datetime.datetime.now() - start))
Again with 5, 10, 20 and 30 items; the corresponding results are below:
Time taken future concurrent: 0:00:01.276891
Time taken future concurrent: 0:00:02.635949
Time taken future concurrent: 0:00:05.073299
Time taken future concurrent: 0:00:07.296873
Now I've heard about asyncio, but I've not used it yet. I've also read that it gives even better performance than futures.ThreadPoolExecutor().
Final question: if both approaches use threads (or so I think), then why is there a huge performance gap? Am I doing something terribly wrong? I looked around but was not able to find a satisfying answer. Any thoughts on this would be highly appreciated. Thanks for going through the question.
[Edit 1] The whole thing is running on Python 3.8.
[Edit 2] Updated code examples and execution times. Now they should run on anyone's system.
The documentation of ThreadPoolExecutor explains in detail how many threads are started when the max_workers parameter is not given. The behaviour depends on the exact Python version, but the default is most probably more than the 5 threads used in the first version with a queue. To compare the two approaches fairly, pass the same number of workers with futures.ThreadPoolExecutor(max_workers=5), as the updated Approach - 2 does.
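For reference, on Python 3.8 (which the question uses, per Edit 1) the documented default is derived from the CPU count; a quick, illustrative check (the _max_workers attribute is a private CPython detail, read here only to confirm the value):

import os
from concurrent import futures

# Documented Python 3.8+ default: min(32, os.cpu_count() + 4)
print(min(32, os.cpu_count() + 4))

# Confirm by reading the (private, CPython-specific) attribute of a default pool
print(futures.ThreadPoolExecutor()._max_workers)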
For the updated Approach - 1, I suggest modifying the for loop a bit:
for _ in range(num_thread):
    target_to_run = target(data_q)
    print('target to run: {}'.format(target_to_run))
    worker = threading.Thread(target=target_to_run)
    worker.start()
The output will be like this:
200
...
200
200
target to run: None
target to run: None
target to run: None
target to run: None
target to run: None
Time taken multi-threading: 0:00:10.846368
The problem is that the Thread constructor expects a callable object (or None) as its target. You are not giving it a callable: target(data_q) is evaluated immediately, so all of the queue processing happens in the main thread during the first constructor call, and the five threads are then started with target=None, so they do nothing.
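For completeness, here is what a corrected loop might look like: the function object is passed as target and the queue as args, so each worker thread performs the call itself (the empty()-based exit logic is unchanged from the question):

for _ in range(num_thread):
    # each thread calls target(data_q) itself instead of the main thread doing it
    worker = threading.Thread(target=target, args=(data_q,))
    worker.start()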

How to run multiple functions in Python

I need to use the turtle module to create a shape that moves and can be paused. I figured out how to do that, but now I need to make it run for 10 seconds and then end. It also needs to print the seconds as the countdown runs, and the timer needs to pause when the app is paused.
This is what I have so far.
import turtle
import time

wn = turtle.Screen()
wn.bgcolor('blue')

player = turtle.Turtle()
player.color('yellow')
player.shape('triangle')
player.penup()

speed = 1

def PlayerPause():
    global speed
    speed = 0

def PlayerPlay():
    global speed
    speed = 1
    while True:
        player.forward(speed)
        player.left(speed)

def Timer():
    seconds = 11
    for i in range(1, 11):
        print(str(seconds - i) + ' Seconds remain')
        time.sleep(1)
    print('GAME OVER')

turtle.listen()
turtle.onkey(PlayerPause, 'p')
turtle.onkey(PlayerPlay, 'r')
turtle.onkey(Timer, 'e')
wn.mainloop()
I suggest you practice "incremental development." That means that instead of writing a whole bunch of code and then trying to make it work, you make just one relatively small thing work, then gradually add more functionality, one small step at a time.
I'll sketch how you could do this for this question.
Get something to happen once a second. Just make some kind of visible change once a second. It could be making some object alternately appear and disappear. It can be anything that is easy, visible, and doesn't break what you already have. The Screen class has an ontimer method you can use for this (see the sketch after this list).
Only after the first step is working, start counting seconds and displaying the counter. Don't worry about pausing or terminating. Just make a counter that shows increasing numbers as long as the script runs.
Change it to shut everything down when the counter gets to ten.
Change it to count down from ten to zero instead of up from zero to ten.
Change the timer function to check whether the action is paused. If so, it should not change the seconds counter.
If you get stuck on any step, you can post a more specific question.
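To illustrate step 1 (and a bit of step 5), here is a minimal sketch of a once-per-second countdown driven by ontimer; the paused flag is hypothetical and stands in for your speed-based pause logic:

import turtle

wn = turtle.Screen()
paused = False      # stand-in for your pause state
seconds_left = 10

def tick():
    global seconds_left
    if not paused and seconds_left > 0:
        print(str(seconds_left) + ' Seconds remain')
        seconds_left -= 1
    if seconds_left > 0:
        wn.ontimer(tick, 1000)  # re-arm: ontimer fires only once per call
    else:
        print('GAME OVER')

tick()
wn.mainloop()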

Scheduling actions

I'm making a BlackJack application using Kivy. I basically need a sort of delay, or even a time.sleep, but of course it must not freeze the program. I saw Kivy has Clock.whatever methods to schedule certain actions. What I'd like to do is schedule multiple actions, so that when the first action has finished, the second will run, and so on. What's the best way to achieve this? Or is there something in the Clock module to perform multiple delays one after another?
This could be an example of what i need to do:
from kivy.clock import Clock
from kivy.uix.boxlayout import BoxLayout
from functools import partial

class Foo(BoxLayout):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        for index_time, card in enumerate(cards, 1):
            # Schedule this action to be run 1 sec after the previous one, and so on
            Clock.schedule_once(partial(self.function, card), index_time)

    def function(self, card, *args):
        self.add_widget(card)
First, I'm surprised that your question didn't get down-voted, since this is not supposed to be a place for opinion questions, so you shouldn't ask for the "best" way.
The Clock module doesn't have a specific method to do what you want. Obviously, you could build a list of Clock.schedule_once() calls, as your example code does. Another way is to have each function schedule its successor, but that assumes the functions will always be called in that order (a sketch of that follows).
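For instance, a minimal sketch of that self-scheduling idea, written against the Foo class from the question and reusing its Clock and partial imports (deal_next is a hypothetical method name):

def deal_next(self, cards, index=0, *args):
    # add one card, then schedule the next call one second later
    self.add_widget(cards[index])
    if index + 1 < len(cards):
        Clock.schedule_once(partial(self.deal_next, cards, index + 1), 1)

Kicking it off once with Clock.schedule_once(partial(self.deal_next, cards), 1) then deals the whole list, one card per second.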
Anyway, there are many ways to do what you want. I have used a construct like the following:
import threading
from threading import Thread
from time import sleep
from functools import partial
from kivy.clock import Clock

class MyScheduler(Thread):
    def __init__(self, funcsList=None, argsList=None, delaysList=None):
        super(MyScheduler, self).__init__()
        self.funcs = funcsList
        self.delays = delaysList
        self.args = argsList

    def run(self):
        theLock = threading.Lock()
        for i in range(len(self.funcs)):
            sleep(self.delays[i])
            Clock.schedule_once(partial(self.funcs[i], *self.args[i], theLock))
            theLock.acquire()
It is a separate thread, so you don't have to worry about freezing your GUI. You pass it a list of functions to be executed, a list of arguments for those functions, and a list of delays (for a sleep before each function is executed). Note that Clock.schedule_once() schedules the execution on the main thread, though not all functions need to be executed on the main thread. Each function must allow for an argument that is a Lock object, and it must release that Lock when it completes. Something like:
def function2(self, card1, card2, theLock=None, *args):
    print('in function2, card1 = ' + str(card1) + ', card2 = ' + str(card2))
    if theLock is not None:
        theLock.release()
The MyScheduler class __init__() method could use more checking to make sure it won't throw an exception when it is run.
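Usage might look something like the following (the card variables are placeholders; every scheduled function must release theLock the way function2 does):

# deal two pairs of cards, one second apart; card1..card4 are hypothetical
scheduler = MyScheduler(funcsList=[self.function2, self.function2],
                        argsList=[(card1, card2), (card3, card4)],
                        delaysList=[1, 1])
scheduler.start()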

using threading to call the main function again?

I am really trying to wrap my head around the concept of threading in practical applications. I am using the threading module in Python 3.4 and I am not sure if my logic is right for the program's functionality.
Here is the gist of my code:
def myMain():
    """ my main function actually uses sockets to
    send data over a network
    """
    # there will be an infinite loop, call it loop-1, here
    while True:
        # perform encoding scheme
        # send data out... with all other exception handling

        # here is infinite loop 2 which waits for messages
        # from other devices
        while True:
            # Check for incoming messages
            callCheckFunction()  # ----> Should I call this on a thread?
The above-mentioned callCheckFunction() does some comparison on the received data, and if the data values don't match I want to run the myMain() function again.
Here is the gist of the callCheckFunction():
def callCheckFunction():
    if data == 'same':
        # some work done here and then get out of the
        # function and return back to listening on the socket
        pass
    elif data == 'not same':
        myMain()  # ---------> Should I thread this one too??
This might be complicated, but I am not sure if threading is the thing I want. I did a nasty hack by calling the myMain() function in the above-mentioned fashion, which works great! But I assume there must be some limit to calling a function from within itself, and I want my code to be a bit professional, not hacky!
I have my mind set on threading since I am listening to the socket in an infinite fashion; when some new data comes in, the whole of myMain() is called again, creating a kind of hectic recursion which I want to control.
EDIT
So I have managed to make the code a bit more modular, i.e. I have split the two infinite loops into two different functions.
myMain is now divided into
task1()
task2()
and the gist is as follows:
def task1():
    while True:
        # encoding and sending data
        # in the end I call task2() since it is the primary
        # state which decides things
        task2()  # ---------> still need to decide if I should thread or not

def task2():
    while True:
        # check for incoming messages
        checker = threading.Thread(target=callCheckFunction, daemon=True)
        checker.start()
        checker.join()
Now, since callCheckFunction() needs func1(), I decided to thread func1() inside that function. Note that func1 is actually kind of the main() of the code:
def callCheckFunction():
    # ... (if branch as before)
    elif data == 'not same':
        thready = threading.Thread(target=func1, daemon=True)
        thready.start()
        thready.join()
Results
With my limited understanding, I did manage to get the code working. But I am not sure whether this is really hacky or a professional way of doing things! I can of course share the code via GitHub, along with a finite-state-machine diagram of the system. I am also not sure whether this code is thread-safe! Help/suggestions are really needed.
