How to use acquire and release on a Lock in Python multithreading? - python-3.x

I was trying to implement a program that can simultaneously change the elements of an array and print it, using multithreading in Python.
from threading import Thread
import threading
import random

def print_ele(array):
    count = True
    while count:
        print(array)

def change_ele():
    array = [1, 2, 3, 4]
    t1 = Thread(target=print_ele, args=(array,))
    t1.start()
    lock = threading.Lock()
    random.seed(10)
    count = True
    while count:
        lock.acquire()
        for i in range(5):
            array[i] = random.random()
        lock.release()

change_ele()
I expected different random numbers to be printed on each iteration, but instead the array seems to get updated only once.
I know that we can do the same thing without multithreading, but I was wondering if we could do it using multithreading.
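For reference, one likely culprit: range(5) indexes past the end of the four-element array, so the updating thread dies with an IndexError after its first pass while the printing thread keeps printing the once-updated list. A minimal working sketch (loops bounded here so it terminates, and the printer takes the same lock as the writer):

```python
import random
import threading
from threading import Thread

def print_ele(array, lock):
    for _ in range(3):                   # bounded so this sketch terminates
        with lock:                       # reader takes the same lock as the writer
            print(array)

def change_ele():
    array = [1, 2, 3, 4]
    lock = threading.Lock()
    t1 = Thread(target=print_ele, args=(array, lock))
    t1.start()
    random.seed(10)
    for _ in range(3):
        with lock:
            for i in range(len(array)):  # range(5) would walk past the end
                array[i] = random.random()
    t1.join()
    return array

change_ele()
```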

Related

Multiprocessing code does not work when trying to initialize dataframe columns

I am trying to use the multiprocessing module to initialize each column of a dataframe on a separate CPU core in Python 3.6, but my code doesn't work. Does anybody know the issue with this code? I appreciate your help.
My laptop runs Windows 10 with an 8th-gen Core i7 CPU:
import time
import pandas as pd
import numpy as np
import multiprocessing

df = pd.DataFrame(index=range(10), columns=["A", "B", "C", "D"])

def multiprocessing_func(col):
    for i in range(0, df.shape[0]):
        df.iloc[i, col] = np.random(4)
    print("column " + str(col) + " is completed")

if __name__ == '__main__':
    starttime = time.time()
    processes = []
    for i in range(0, df.shape[1]):
        p = multiprocessing.Process(target=multiprocessing_func, args=(i,))
        processes.append(p)
        p.start()
    for process in processes:
        process.join()
    print('That took {} seconds'.format(time.time() - starttime))
When you start a Process, it is basically a copy of the parent process. (I'm skipping over some details here, but they shouldn't matter for the explanation).
Unlike threads, processes don't share data. (Processes can use shared memory, but this is not automatic. To the best of my knowledge, the mechanisms in multiprocessing for sharing data cannot handle a dataframe.)
So what happens is that each of the worker processes is modifying its own copy of the dataframe, not the dataframe in the parent process.
For this to work, you'd have to send the new data back to the parent process. You could do that by e.g. return-ing it from the worker function, and then putting the returned data into the original dataframe.
It only makes sense to use multiprocessing like this if the work of generating the data takes significantly longer than launching a new worker process, sending the data back to the parent process, and putting it into the dataframe. Since you are basically filling the columns with random data, I don't think that is the case here.
So I don't see why you would use multiprocessing here.
Edit: Based on your comment that it takes days to calculate each column, I would propose the following.
Use Process like you have been doing, but have each worker process save the numbers it produces in a file whose name includes the value of i. Have the workers return a status code so you can determine whether they succeeded or failed. In case of failure, also return some kind of index of the amount of data successfully completed, so you don't have to recalculate it.
The file format should be simple and preferably readable, e.g. one number per line.
Wait for all processes to finish, read the files and fill the dataframe.

Python multiprocessing: with and without pooling

I'm trying to understand Python's multiprocessing, and have devised the following code to test it:
import multiprocessing

def F(n):
    if n == 0: return 0
    elif n == 1: return 1
    else: return F(n-1) + F(n-2)

def G(n):
    print(f'Fibbonacci of {n}: {F(n)}')

processes = []
for i in range(25, 35):
    processes.append(multiprocessing.Process(target=G, args=(i,)))
for pro in processes:
    pro.start()
When I run it, it tells me that the computing time was roughly 6.65 s.
I then wrote the following code, which I thought would be functionally equivalent to the first:
from multiprocessing.dummy import Pool as ThreadPool

def F(n):
    if n == 0: return 0
    elif n == 1: return 1
    else: return F(n-1) + F(n-2)

def G(n):
    print(f'Fibbonacci of {n}: {F(n)}')

in_data = [i for i in range(25, 35)]
pool = ThreadPool(10)
results = pool.map(G, in_data)
pool.close()
pool.join()
and its running time was almost 12 s.
Why does the second take almost twice as long as the first? Aren't they supposed to be equivalent?
(NB: I'm running Python 3.6, and I also tested similar code on 3.5.2 with the same results.)
The reason the second takes twice as long as the first is likely due to the CPython Global Interpreter Lock.
From http://python-notes.curiousefficiency.org/en/latest/python3/multicore_python.html:
[...] the GIL effectively restricts bytecode execution to a single core, thus rendering pure Python threads an ineffective tool for distributing CPU bound work across multiple cores.
As you know, multiprocessing.dummy is a wrapper around the threading module, so you're creating threads, not processes. With a CPU-bound task like this one, the Global Interpreter Lock makes that not much different from simply executing your Fibonacci calculations sequentially in a single thread (except that you've added some thread-management/context-switching overhead).
With the "true multiprocessing" version, you only have a single thread in each process, each of which is using its own GIL. Hence, you can actually make use of multiple processors to improve the speed.
For this particular processing task, there is no significant advantage to using multiple threads over multiple processes. If you only have a single processor, there is no advantage to using either multiple processes or multiple threads over a single thread/process (in fact, both merely add context-switching overhead to your task).
(FWIW: A join in the true-multiprocessing version is apparently done automatically by the Python runtime, so adding an explicit join didn't make any difference in my tests using time(1). And if you did want to add join, add a second loop for the join calls; adding join to the existing start loop would simply serialize your processes.)

Python thread never starts if run() contains yield from

Python 3.4: I'm trying to make a server using the websockets module (I was previously using regular sockets but wanted to make a JavaScript client) and ran into an issue, because it expects async code, at least if the examples are to be trusted, which I hadn't used before. Threading simply does not work: if I run the following code, bar is never printed, whereas if I comment out the line with yield from, it works as expected. So yield is probably doing something I don't quite understand, but why is it never even executed? Should I install Python 3.5?
import threading

class SampleThread(threading.Thread):
    def __init__(self):
        super(SampleThread, self).__init__()
        print("foo")

    def run(self):
        print("bar")
        yield from var2

thread = SampleThread()
thread.start()
This is not the correct way to handle multithreading: because of the yield from, run is a generator function, so calling it merely creates a generator object and the body never executes. run should be neither a generator nor a coroutine. It should also be noted that the asyncio event loop is only defined for the main thread: any call to asyncio.get_event_loop() in a new thread (without first setting a loop with asyncio.set_event_loop()) will throw an exception.
Before looking at running the event loop in a new thread, you should first analyze whether you really need the loop running in its own thread. The loop has a built-in executor interface, loop.run_in_executor(): it takes a pool from concurrent.futures (either a ThreadPoolExecutor or a ProcessPoolExecutor) and provides a non-blocking way of running threads and processes directly from the loop object, so the results can be await-ed (with Python 3.5 syntax).
That being said, if you want to run your event loop from another thread, you can do it like this:
import asyncio
import threading

class LoopThread(threading.Thread):
    def __init__(self):
        super().__init__()
        self.loop = asyncio.new_event_loop()

    def run(self):
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def stop(self):
        self.loop.call_soon_threadsafe(self.loop.stop)
From here, you still need to devise a thread-safe way of creating tasks, etc. Some of the code in this thread is usable, although I did not have a lot of success with it: python asyncio, how to create and cancel tasks from another thread
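One thread-safe way to submit work to a loop in another thread is asyncio.run_coroutine_threadsafe (available from Python 3.5.1); a minimal sketch, with a trivial coroutine standing in for real work:

```python
import asyncio
import threading

# Drive a fresh loop in a background thread.
loop = asyncio.new_event_loop()
t = threading.Thread(target=loop.run_forever, daemon=True)
t.start()

async def add(a, b):
    return a + b

# run_coroutine_threadsafe returns a concurrent.futures.Future;
# .result() blocks the calling thread until the coroutine finishes.
future = asyncio.run_coroutine_threadsafe(add(2, 3), loop)
result = future.result(timeout=5)

# Stop the loop thread-safely and clean up.
loop.call_soon_threadsafe(loop.stop)
t.join()
loop.close()
print(result)
```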

Locking a method in Python?

Here is my problem: I'm using the APScheduler library to add scheduled jobs to my application. I have multiple jobs executing the same code at the same time, but with different parameters. The problem occurs when these jobs access the same method at the same time, which causes my program to work incorrectly.
I want to know if there is a way to lock a method in Python 3.4 so that only one thread may access it at a time. If so, could you please post a simple example? Thanks.
You can use a basic Python locking mechanism:
from threading import Lock

lock = Lock()
...

def foo():
    lock.acquire()
    try:
        # only one thread can execute the code here
        pass
    finally:
        lock.release()  # release the lock
Or with a context manager:
def foo():
    with lock:
        # only one thread can execute the code here
        pass
For more details see Python 3 Lock Objects and Thread Synchronization Mechanisms in Python.
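Since the question asks about locking a method specifically, the lock can also be packaged as a decorator. This helper is an illustration, not part of the linked docs; the counter demo shows the serialization:

```python
import threading
from functools import wraps

def synchronized(func):
    # Give the decorated function its own lock, so only one
    # thread can be inside it at a time.
    lock = threading.Lock()
    @wraps(func)
    def wrapper(*args, **kwargs):
        with lock:
            return func(*args, **kwargs)
    return wrapper

counter = 0

@synchronized
def increment():
    global counter
    current = counter        # without the lock, this read/write pair can race
    counter = current + 1

threads = [threading.Thread(target=lambda: [increment() for _ in range(1000)])
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 8000: the lock serialized every increment
```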

ReCreating threads in python

I'm using the following template to recreate threads that need to run forever.
I want to know whether this template is scalable in terms of memory. Are threads destroyed properly?
import threading
import time

class aLazyThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        time.sleep(10)
        print("I do not want to work :(")

class aWorkerThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        time.sleep(1)
        print("I want to work!!!!!!!")

threadA = aLazyThread()
threadA.start()
threadB = aWorkerThread()
threadB.start()

while True:
    if not threadA.is_alive():
        threadA = aLazyThread()
        threadA.start()
    if not threadB.is_alive():
        threadB = aWorkerThread()
        threadB.start()
What bothers me is the following picture, taken in Eclipse, which shows debug info: it seems that the threads are stacking up.
I see nothing wrong with the image. There's the main thread and the 2 threads that you created (according to the code, 3 threads are supposed to be running at any time)
Like any other Python object, threads are garbage collected when they're no longer used; e.g. in your main while loop, when you instantiate the class (say, aLazyThread), the old threadA value is destroyed (maybe not exactly at that point, but shortly after).
The main while loop could also use a sleep (e.g. time.sleep(1)); otherwise it will consume the processor, uselessly checking whether the other threads are running.
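The polling loop with a sleep can be sketched as follows (bounded here so the sketch terminates, with illustrative durations in place of the original 10 s and 1 s sleeps):

```python
import threading
import time

def lazy():
    time.sleep(0.05)     # stand-in for the thread's real work

restarts = 0
worker = threading.Thread(target=lazy)
worker.start()
for _ in range(5):       # the original uses while True
    if not worker.is_alive():
        worker = threading.Thread(target=lazy)
        worker.start()
        restarts += 1
    time.sleep(0.1)      # yield the CPU instead of busy-waiting
worker.join()
print(restarts)
```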