Synchronizing two sender threads and a receiver thread - multithreading

I have the following problem: I have two senders that send one after the other, and only after both of them have sent their messages do I want a receiver to start receiving, and then wait for the next two sends, e.g.
S1
S2
Rec
S1
S2
Rec
...
import threading

semOne = threading.Semaphore(value=1)
semTwo = threading.Semaphore(value=0)
semRecNotify = threading.Semaphore(value=0)

def senderOne():
    while True:
        semOne.acquire()
        print("Sender One")
        semTwo.release()

def senderTwo():
    while True:
        semTwo.acquire()
        print("Sender Two")
        semOne.release()
        semRecNotify.release()

def receiver():
    while True:
        semRecNotify.acquire()
        print("Receiver")
Using semOne and semTwo I managed to synchronize the two senders, and using semRecNotify I tried (unsuccessfully) to make sure that we enter the receiver only after two consecutive sends, but I cannot get the wanted behaviour.
Please note that this is not a language-specific problem; I am just trying to understand for myself how to properly use semaphores for synchronization.
Is there a more general/mathematical approach to tackling this kind of problem?
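One place to start (a hedged sketch, not the only valid scheme): keep the S1 -> S2 handoff as it is, but let the receiver, rather than senderTwo, release semOne. That way a new round of sends can only begin after the receiver has consumed the previous pair, giving the strict S1, S2, Rec cycle:

import threading

semOne = threading.Semaphore(value=1)        # lets senderOne start a round
semTwo = threading.Semaphore(value=0)        # senderOne -> senderTwo handoff
semRecNotify = threading.Semaphore(value=0)  # senderTwo -> receiver handoff

def senderOne():
    while True:
        semOne.acquire()
        print("Sender One")
        semTwo.release()

def senderTwo():
    while True:
        semTwo.acquire()
        print("Sender Two")
        semRecNotify.release()   # wake the receiver, but do NOT release semOne here

def receiver():
    while True:
        semRecNotify.acquire()
        print("Receiver")
        semOne.release()         # only now may the next round of sends begin

for f in (senderOne, senderTwo, receiver):
    threading.Thread(target=f, daemon=True).start()

More generally, this is a ring of semaphores: exactly one token circulates through the stages, and the token's path through the ring fixes the global order of the threads.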

Multiple HTTP requests to the same page without consuming much CPU

Currently, I'm trying to improve code that sends multiple HTTP requests to a webpage until it can capture some text (which the code locates through a known pattern) or until 180 seconds run out (the time we wait for the page to give us an answer).
This is the part of the code (a little edited for privacy purposes):
import re
import time
import requests

# 'pattern' and the initial 'matches' are defined earlier in the script
if matches is None:
    txt = "No answer til now"
    print(txt)
    Solution = False
    start = time.time()
    interval = 0
    while interval < 180:
        response = requests.get("page address")
        subject = response.text
        matches = re.search(pattern, subject, re.IGNORECASE)
        if matches is not None:
            Solution = matches.group(1)
            elapsed = "{:.2f}".format(time.time() - start)  # renamed so it doesn't shadow the time module
            txt = "Found an answer " + Solution + ", time needed: " + elapsed
            print(txt)
            break
        interval = time.time() - start
else:
    Solution = matches.group(1)
It runs OK, but I was told that doing "infinite requests in a loop" could cause high CPU usage on the server. Do you guys know of something I can use to avoid that?
PS: I heard that in PHP people use curl_multi_select() for things like this. I don't know if I'm correct though.
Usually an HTTP REST API will specify in the documentation how many requests you can make in a given time period against which endpoint resources.
For a website, if you are not hitting a request limit and getting flagged/banned for too many requests, then you should be okay to continuously loop like this, but you may want to introduce a time.sleep call into your while loop.
An alternative to the 180 second timeout:
Since HTTP requests are I/O operations and can take a variable amount of time, you may want to change your exit case for the loop to a certain amount of requests (like 25 or something) and then incorporate the aforementioned sleep call.
That could look like:
# ...
if matches is None:
solution = None
num_requests = 25
start = time.time()
while num_requests:
response = requests.get("page address")
if response.ok: # It's good to attempt to handle potential HTTP/Connectivity errors
subject = response.text
matches = re.search(pattern, subject, re.IGNORECASE)
if matches:
solution = matches.group(1)
elapsed = "{:.2f}".format(time.time()-start)
txt = "Found an anwswer " + solution + "time needed : " + elapsed
print(txt)
break
else:
# Maybe raise an error here?
pass
time.sleep(2)
num_requests -= 1
else:
solution = matches.group(1)
Notes:
Regarding PHP's curl_multi_select (NOT a PHP expert here...): it seems that this function is designed to let you watch multiple connections to different URLs in an asynchronous manner. Async doesn't really apply to your use case here, because you are only scraping one webpage (URL) and are just waiting for some data to appear there.
If the response.text you are searching through is HTML and you aren't already parsing it somewhere else in your code, I would recommend Beautiful Soup or Scrapy (rather than regex) for searching for string patterns in webpage markup.
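For instance, a hypothetical sketch of the Beautiful Soup route (the tag name and CSS class here are assumptions; adjust them to the page's real markup):

from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text, "html.parser")
# 'div' and the 'answer' class are placeholders for wherever the text appears
element = soup.find("div", class_="answer")
if element is not None:
    solution = element.get_text(strip=True)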

SimPy: Is it possible to create an Exclusive OR (OneOf) alternative to AnyOf which allows only one process to be triggered?

I would like to be able to trigger exactly one process in a list and interrupt the other processes in the list. In the example below I want to trigger either A or B, but never both. The example below works fine, except when waitingtimeA == waitingtimeB.
import simpy
from simpy import AnyOf

def waitingprocess(env, waitingtime):
    try:
        yield env.timeout(waitingtime)
    except simpy.Interrupt:
        pass

def example(env):
    waitingtimeA = 1.0
    waitingtimeB = 1.0
    A = env.process(waitingprocess(env, waitingtimeA))
    B = env.process(waitingprocess(env, waitingtimeB))
    events = [A, B]
    yield AnyOf(env, events)
    for e in events:
        if not e.processed:
            e.interrupt()
    print('is A Processed:', A.processed)
    print('is B Processed:', B.processed)

env = simpy.Environment()
Example = env.process(example(env))
env.run()
Which should lead to either A or B being processed, but which prints:
is A Processed: True
is B Processed: True
I understand that both processes A and B are triggered and processed before the Example process continues, but is there a way around this that makes the triggering of processes A and B mutually exclusive?
The only thing I can think of is to make a OneOf() alternative to AnyOf() which, as soon as exactly one event is triggered, interrupts all the other events. For this to work, the OneOf() condition event would have to be an URGENT event (an event with higher priority than normal events). However, I am kind of out of my depth here. Does anyone have an idea where to start?
Thanks a lot!
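One place to start, as a hedged sketch rather than a true OneOf event: let each process interrupt its peers the moment its own timeout fires, instead of having the coordinating process do it one event later. simpy schedules interrupts with URGENT priority, so the losing process is interrupted before it can resume, even when both timeouts fire at the same simulation time. The winners list and peers argument below are illustrative names:

import simpy

winners = []  # records which process completed its wait

def waitingprocess(env, name, waitingtime, peers):
    try:
        yield env.timeout(waitingtime)
        winners.append(name)
        for p in peers:
            # Interrupt every other process that is still waiting.
            if p is not env.active_process and p.is_alive:
                p.interrupt()
    except simpy.Interrupt:
        pass  # lost the race; exit quietly

def example(env):
    peers = []
    A = env.process(waitingprocess(env, 'A', 1.0, peers))
    B = env.process(waitingprocess(env, 'B', 1.0, peers))
    peers.extend([A, B])
    yield A & B  # both terminate: one normally, one interrupted
    print('winners:', winners)  # exactly one of ['A'] or ['B']

env = simpy.Environment()
env.process(example(env))
env.run()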

Nim counting over multiple threads

I'm currently trying to count how many times I have requested a website. In Python I would just use a global variable, but I have no idea how I would write this in Nim.
import httpclient

proc threadMain(a: int) {.thread.} =
  var client = newHttpClient()
  while true:
    try:
      var r = client.getContent("URL")
      echo "sent"
      # Count here
    except:
      echo "error"

var thread: array[0..10, Thread[int]]
for i in 0..10:
  thread[i].createThread(threadMain, i)
thread.joinThreads()
This is explained, almost word for word, in the "Nim in Action" book, page 174.
First of all, if you used a global in Python, you would have had to use a lock or risk a race condition. Things are no different in Nim: first create a global, then guard it with a lock.
import locks
var counterLock: Lock
initLock(counterLock)
var counter {.guard: counterLock.} = 0
Now use withLock wherever you need to update the counter:
withLock counterLock:
  counter.inc
The book's chapter on parallelism/concurrency is very good. You should check it out, as it also explains async concurrency (your code is an example of a workload where concurrency beats threading) and how to use Channels to pass data between threads, for example.

How to stop threads in an infinite loop by passing in a True or False variable?

I have been searching for days for a solution to this issue, if anyone can point me in the right direction, I will be so grateful.
I have been writing a program that allows a user to create multiple trading bots at the same time. Each bot is created as an individual thread that makes an API call to Binance for the latest market data, evaluates this data with conditional statements, and places trades accordingly. The code I am using has been trimmed to the essential parts for simplicity. I have a Bot class...
class Bot(threading.Thread):
    def __init__(self, symbol, time_interval, exposure, strategy, bot_name):
        threading.Thread.__init__(self)
        self.symbol = symbol
        self.time_interval = time_interval
        self.exposure = exposure
        self.strategy = strategy
        self.bot_name = bot_name
        self.stop_bot = False

    def scanner(self):
        while not self.stop_bot:
            # api_key / secret_key are set elsewhere; omitted from the trimmed code
            client = Client(self.api_key, self.secret_key, {"verify": True, "timeout": 20})
            price_data = client.get_klines(symbol=self.symbol, interval=self.time_interval)
            self.df = pd.DataFrame(price_data)
            self.modified_df = pd.DataFrame(price_data)
            time.sleep(10)

    def kill(self):
        self.stop_bot = True
And the Bot class is called from the bot manager terminal class...
class Bot_Manager:
    def __init__(self):
        self.bot_list = []

    def terminal(self):
        while True:
            user_input = input('(1) Create new Bot (2) Stop Bot')
            if user_input == '1':
                symbol = 'OMGUSDT'
                time_interval = '1D'
                exposure = 1000
                strategy = 'SAR'
                bot_name = input('name your bot: ')
                bot = Bot(symbol=symbol, time_interval=time_interval, exposure=exposure, strategy=strategy, bot_name=bot_name)
                scanner_thread = threading.Thread(target=bot.scanner, daemon=True)
                scanner_thread.name = bot_name
                scanner_thread.start()
                self.bot_list.append(scanner_thread)
            elif user_input == '2':
                for thread in threading.enumerate():
                    print(thread.name)
                print(self.bot_list)
                user_input = input('Bot to stop: ')
                i = int(user_input)
                print(self.bot_list[i])
Now I am able to create multiple threads / bots by repeatedly selecting option 1. However, the issue I am facing is stopping these threads when a user selects option 2. If, for example, I create 3 bots and name them Bot A, Bot B and Bot C, then when I enumerate the threads in a for loop I get the following:
MainThread
Bot A
Bot B
Bot C
and when I store each thread in a list and print the list, I see the following:
[<Thread(Bot A, started 8932)>, <Thread(Bot B, started 12268)>, <Thread(Bot C, started 13436)>]
I would like the user to be able to select the thread / bot they want to stop from the list. So in this example, if the user types 1, it should return the thread <Thread(Bot B, started 12268)> and stop it by setting the variable stop_bot = True. However, I haven't had much luck with this.
When I call the function bot.kill(), it only stops the last thread that was created, so in this example Bot C. When it runs again, it doesn't remove any other thread. Is there any way to set stop_bot = True on an already created object / thread? Or is there another method that I have overlooked? Any help would be greatly appreciated.
I managed to find a solution to this by adding multiple bots to a DataFrame and passing it into the bot class. One thread was created to iterate through the DataFrame, and by removing rows from the DataFrame the corresponding bots would be removed.
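For reference, a minimal sketch of the more direct route the question hints at: store the Bot objects themselves rather than the scanner threads, so the chosen bot's kill() can be called (names reused from the code above):

import threading

class Bot_Manager:
    def __init__(self):
        self.bot_list = []

    def start_bot(self, **kwargs):
        bot = Bot(**kwargs)  # the Bot class defined above
        t = threading.Thread(target=bot.scanner, daemon=True, name=kwargs['bot_name'])
        t.start()
        self.bot_list.append(bot)  # keep the Bot, not the Thread

    def stop_bot(self, index):
        self.bot_list[index].kill()  # sets stop_bot = True; the scanner loop exits
        del self.bot_list[index]

Note that scanner() only checks stop_bot once per iteration, so a bot may take up to one sleep interval to actually stop.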

Dask: Submit continuously, work on all submitted data

Having 500 continuously growing DataFrames, I would like to submit operations on the data (independent for each DataFrame) to Dask. My main question is: can Dask hold the continuously submitted data, so I can submit a function on all the submitted data, not just the newly submitted?
But let's explain it with an example:
Creating a dask_server.py:
from dask.distributed import Client, LocalCluster

HOST = '127.0.0.1'
SCHEDULER_PORT = 8711
DASHBOARD_PORT = ':8710'

def run_cluster():
    cluster = LocalCluster(dashboard_address=DASHBOARD_PORT, scheduler_port=SCHEDULER_PORT, n_workers=8)
    print("DASK Cluster Dashboard = http://%s%s/status" % (HOST, DASHBOARD_PORT))
    client = Client(cluster)
    print(client)
    print("Press Enter to quit ...")
    input()

if __name__ == '__main__':
    run_cluster()
Now I can connect from my my_stream.py and start to submit and gather data:
import threading
import time
import pandas as pd
from dask.distributed import Client

DASK_CLIENT_IP = '127.0.0.1'
DASK_CLIENT_PORT = 8711  # the scheduler port from dask_server.py
dask_con_string = 'tcp://%s:%s' % (DASK_CLIENT_IP, DASK_CLIENT_PORT)
dask_client = Client(dask_con_string)

def my_dask_function(lines):
    return lines['a'].mean() + lines['b'].mean()

def async_stream_redis_to_d(max_chunk_size=1000):
    while 1:
        # This is a redis queue, but can be any queueing/file-stream/syslog or whatever
        lines = queue_IN.get(block=True, max_chunk_size=max_chunk_size)
        futures = []
        df = pd.DataFrame(data=lines, columns=['a', 'b', 'c'])
        futures.append(dask_client.submit(my_dask_function, df))
        result = dask_client.gather(futures)
        print(result)
        time.sleep(0.1)

if __name__ == '__main__':
    max_chunk_size = 1000
    thread_stream_data_from_redis = threading.Thread(target=async_stream_redis_to_d, args=[max_chunk_size])
    # thread_stream_data_from_redis.setDaemon(True)
    thread_stream_data_from_redis.start()
    # Lets go
This works as expected and it is really quick!!!
But next, I would like to actually append the lines before the computation takes place, and I wonder whether this is possible. So in the example here, I would like to calculate the mean over all lines that have been submitted, not only the most recently submitted ones.
Questions / Approaches:
Is this cumulative calculation possible?
Bad alternative 1: I cache all lines locally and submit all of the data to the cluster every time a new row arrives. This makes the overhead grow quadratically. Tried it; it works, but it is slow!
Golden option: Python program 1 pushes the data. Then it would be possible to connect another client (from another Python program) to that cumulated data and move the analysis logic away from the inserting logic. I think published datasets are the way to go, but are they applicable for these high-speed appends?
Maybe related: Distributed Variables, Actors, Workers
Assigning a list of futures to a published dataset seems ideal to me. This is relatively cheap (everything is metadata) and you'll always be up to date as of a few milliseconds ago.
client.datasets["x"] = list_of_futures

def worker_function(...):
    futures = get_client().datasets["x"]
    data = get_client().gather(futures)
    # ... work with data
As you mention, there are other systems like PubSub or Actors. From what you say, though, I suspect that futures + published datasets are the simpler and more pragmatic option.
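For completeness, a hedged sketch of that pattern under the question's setup (the dataset name "x" and the scatter-per-chunk approach are illustrative; the scheduler address comes from dask_server.py above):

import pandas as pd
from dask.distributed import Client

client = Client("tcp://127.0.0.1:8711")  # scheduler address from dask_server.py
futures = []                             # one future per chunk pushed so far

def push_chunk(lines):
    # Scatter only the new chunk; older chunks stay on the cluster untouched.
    df = pd.DataFrame(data=lines, columns=['a', 'b', 'c'])
    futures.append(client.scatter(df))
    if "x" in client.list_datasets():
        client.unpublish_dataset("x")    # republish so readers see the new chunk
    client.datasets["x"] = list(futures)

# A second client (e.g. in another program) can then work on *all* the data:
reader = Client("tcp://127.0.0.1:8711")
chunks = reader.get_dataset("x")         # cheap: only metadata until gathered
dfs = reader.gather(chunks)
print(pd.concat(dfs)['a'].mean())        # mean over every line pushed so far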
