How to give different names to ThreadPoolExecutor threads in Python - python-3.x

I have the below code for creating threads and running them.
from concurrent.futures import ThreadPoolExecutor
import threading

def task(n):
    result = 0
    i = 0
    for i in range(n):
        result = result + i
    print("I: {}".format(result))
    print(f'Thread : {threading.current_thread()} executed with variable {n}')

def main():
    executor = ThreadPoolExecutor(max_workers=3)
    task1 = executor.submit(task, (10))
    task2 = executor.submit(task, (100))

if __name__ == '__main__':
    main()
When I run the code on my Windows 10 machine, this is the output that gets generated:
I: 45
Thread : <Thread(ThreadPoolExecutor-0_0, started daemon 11956)> executed with variable 10
I: 4950
Thread : <Thread(ThreadPoolExecutor-0_0, started daemon 11956)> executed with variable 100
Process finished with exit code 0
As we can see, both threads have the same name. How do I differentiate between them by giving them different names? Is this somehow a feature of the concurrent.futures module?
Many thanks for any answers.

from the docs:
New in version 3.6: The thread_name_prefix argument was added to allow users to control the threading.Thread names for worker threads created by the pool for easier debugging.
using the thread_name_prefix argument:
concurrent.futures.ThreadPoolExecutor(max_workers=None, thread_name_prefix='')
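For example, a minimal sketch (the prefix string 'CustomPool' is an arbitrary choice) in which the pool names its worker threads CustomPool_0, CustomPool_1, and so on:
from concurrent.futures import ThreadPoolExecutor
import threading

def task(n):
    # Report which worker thread ran this call.
    print(f'{threading.current_thread().name} executed with variable {n}')

# thread_name_prefix (available since Python 3.6) controls the worker thread names.
executor = ThreadPoolExecutor(max_workers=3, thread_name_prefix='CustomPool')
executor.submit(task, 10)
executor.submit(task, 100)
executor.shutdown(wait=True)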

You say: "both the threads have the same name".
That's not correct! The name is the same because the same thread is used for both tasks: in fact task() exits immediately.
In order to have both threads involved, you have to add some sleep to your task() function.
Just to recap:
(1) without sleep:
from concurrent.futures import ThreadPoolExecutor
import threading
import time

def task(n):
    result = 0
    i = 0
    for i in range(n):
        result = result + i
    print(f'{threading.current_thread().name} with variable {n}: {result}')

executor = ThreadPoolExecutor(max_workers=3)
executor.submit(task, (10))
executor.submit(task, (100))
In this case the output will be:
ThreadPoolExecutor-0_0 with variable 10: 45
ThreadPoolExecutor-0_0 with variable 100: 4950
(2) with a sleep inside task(), to make the function take longer:
from concurrent.futures import ThreadPoolExecutor
import threading
import time

def task(n):
    result = 0
    i = 0
    for i in range(n):
        result = result + i
    time.sleep(.5)
    print(f'{threading.current_thread().name} with variable {n}: {result}')

executor = ThreadPoolExecutor(max_workers=3)
executor.submit(task, (10))
executor.submit(task, (100))
In this case the output will be:
ThreadPoolExecutor-0_0 with variable 10: 45
ThreadPoolExecutor-0_1 with variable 100: 4950
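Combining the two answers, a sketch (assuming the same task body; the prefix 'worker' is arbitrary) that both keeps each call busy long enough for two workers to be used and gives the threads a recognizable prefix:
from concurrent.futures import ThreadPoolExecutor
import threading
import time

def task(n):
    result = sum(range(n))
    time.sleep(.5)  # keep this worker busy so the second submit lands on another thread
    print(f'{threading.current_thread().name} with variable {n}: {result}')

executor = ThreadPoolExecutor(max_workers=3, thread_name_prefix='worker')
executor.submit(task, 10)
executor.submit(task, 100)
executor.shutdown(wait=True)
# Expected names: worker_0 and worker_1, since both tasks are in flight at the same time.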

Related

python3 - Main thread kills child thread after some timeout?

I'm not sure it is doable with threads in Python. Basically, I have a function which invokes the GDAL library to open an image file. But this can get stuck, so after 10 seconds, if the file cannot be opened, it should raise an exception.
import threading
import osgeo.gdal as gdal

def test(filepath):
    # After 10 seconds, if the filepath cannot be opened, this function must end and throw an exception.
    # If the filepath can be opened before 10 seconds, then it returns the dataset.
    dataset = gdal.Open(filepath)
    return dataset

filepath = "http://.../test.tif"
t = threading.Thread(target=test, args=[filepath])
t.start()
# is there something called t.timeout(10)
# and if this thread cannot be finished in 10 seconds, it raises a RuntimeException?
t.join()
I ended up using multiprocessing and Queue from multiprocessing to achieve what I wanted:
import multiprocessing
import time
from multiprocessing import Queue

q = Queue()
TIME_OUT = 5

def worker(x, queue):
    time.sleep(15)
    a = (1, 5, 6, 7)
    queue.put(a)

queue = Queue()
process = multiprocessing.Process(target=worker, args=(5, queue,))
process.start()
# the main process waits for the child process for up to TIME_OUT seconds
process.join(TIME_OUT)
if process.is_alive():
    print("Process hangs!")
    process.terminate()
print("Process finished")
print(queue.qsize())
if queue.qsize() > 0:
    a, b, _, d = queue.get()
    print(a, b, d)

Python multiprocessing a child function

I have been trying to learn multiprocessing.
I have a simple function which generates a list of numbers, and I am trying to use multiprocessing to add up the numbers that are divisible by 10.
My objective is to run the child function in parallel across the available CPUs.
import multiprocessing
import time

def add_multiple_of_10_v0(number):
    number_list = []
    for i in range(1, number):
        x = i**3 + i**2 + i + 1
        number_list.append(x)
    print(number_list)
    pool = multiprocessing.Pool(6)
    result = 0
    for value in pool.map(check_multiple_10_v0, number_list):
        if value > 0:
            result = result + value
        else:
            pass
    pool.close()
    pool.join()
    return result

def check_multiple_10_v0(in_number):
    if in_number % 10 == 0:
        time.sleep(5)
        return in_number
    else:
        return -1
I am getting the below error -
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
    if __name__ == '__main__':
        freeze_support()
        ...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
I am new to Python and multiprocessing and would appreciate guidance.
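The error message itself points at the fix: with a start method other than fork (the default spawn on Windows), the Pool must only be created under an if __name__ == '__main__': guard, so that child processes importing the module do not create pools of their own. A minimal sketch of how the posted code could be rearranged (the argument 20 is just an illustrative value, not from the question):
import multiprocessing
import time

def check_multiple_10_v0(in_number):
    if in_number % 10 == 0:
        time.sleep(5)
        return in_number
    return -1

def add_multiple_of_10_v0(number):
    number_list = [i**3 + i**2 + i + 1 for i in range(1, number)]
    with multiprocessing.Pool(6) as pool:
        values = pool.map(check_multiple_10_v0, number_list)
    # Sum only the values the worker reported as multiples of 10.
    return sum(v for v in values if v > 0)

if __name__ == '__main__':  # guard required so spawned children can import this module safely
    print(add_multiple_of_10_v0(20))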

How would I use Python's thread class to sync operations across threads

I was just reading through the threading API in Python 3 and I don't see a method for syncing threads. In my case I'm building a performance test in a server and client environment, so I might want to use the thread class to call this test with N users:
def tDoTest():
    doSync()  # Wait for all thread creation
    doLogin()
    doSync()  # Wait for all users to login
    startTest()
    create100Rows()
    endTest()
    doSync()  # Wait for completion
    doLogout()
I was hoping there was a built in way to handle this that I missed.
You could use a blocking queue.
import queue
q = queue.Queue()
If you want one thread to wait for all members of a group of three other threads to do some task, the waiting thread could get() three tokens from the queue:
for i in range(3):
    q.get()
Each of the three awaited threads could signify that it has completed its task by putting an informationless token into the queue:
q.put(())
The waiting thread will not exit its for loop until it has collected all three tokens.
You could wrap it up in a class:
import queue
import threading
import time

class CountdownLatch:
    def __init__(self, N):
        self.N = N
        self.q = queue.Queue()

    def check_in(self):
        self.q.put(())

    def await_all_checkins(self):
        for i in range(self.N):
            self.q.get()

def demo_CountDownLatch():
    cdl = CountdownLatch(3)

    def task(cdl, delay):
        time.sleep(delay)
        print(delay)
        cdl.check_in()

    threading.Thread(target=task, args=(cdl, 2.0)).start()
    threading.Thread(target=task, args=(cdl, 1.5)).start()
    threading.Thread(target=task, args=(cdl, 1.0)).start()
    cdl.await_all_checkins()
    print("nighty night.")

if __name__ == "__main__":
    demo_CountDownLatch()
OK, after a bunch of searching, this is the answer. I never expected this functionality to be named Barrier(). There is a built-in class for it in the threading module.
b = Barrier(2, timeout=5)

def doSync():
    ct = threading.currentThread()
    b.wait()
    print('Sync done: ' + ct.getName())
The output now looks as expected:
0.75
1.0
Sync done: 1
Sync done: 2
0.75
1.0
Sync done: 1
Sync done: 2
0.75
1.0
Sync done: 1
Sync done: 2
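For completeness, a self-contained sketch (not the poster's exact harness; the thread names and delays here are arbitrary) showing how threading.Barrier can provide the doSync() behavior:
import threading
import time

NUM_USERS = 2
b = threading.Barrier(NUM_USERS, timeout=5)

def doSync():
    ct = threading.current_thread()
    b.wait()  # block until NUM_USERS threads have reached this point
    print('Sync done: ' + ct.name)

def tDoTest(delay):
    time.sleep(delay)  # stand-in for doLogin(), startTest(), etc.
    print(delay)
    doSync()

if __name__ == '__main__':
    threading.Thread(target=tDoTest, args=(0.75,), name='1').start()
    threading.Thread(target=tDoTest, args=(1.0,), name='2').start()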

Asyncio: big list of Tasks, each sequentially combining run_in_executor and a standard coroutine

I need to handle a list of 2500 IP addresses from a csv file, so I need to create_task from a coroutine 2500 times. Inside every coroutine, first I need to fast-check access to IP:PORT via the Python module "socket"; this is a synchronous function and I want it to run in loop.run_in_executor(). Second, if IP:PORT is open, I need to connect to that socket via asyncssh.connect() to run some bash commands, and this is a standard asyncio coroutine. Then I need to collect the results of these bash commands into another csv file.
Additionally, there is an issue on Linux: the system cannot open more than 1024 connections at the same time. I think it may be solved by splitting the work into lists of 1000 with asyncio.sleep(1) in between, or something like that.
I expected my tasks to be executed 1000 at a time per second, but only about 20 run per second. Why?
A small working code snippet with comments is here:
#!/usr/bin/env python3
import asyncio
import csv
import time
from pathlib import Path
import asyncssh
import socket
from concurrent.futures import ThreadPoolExecutor as Executor

PARALLEL_SESSIONS_COUNT = 1000
LEASES_ALL = Path("ip_list.csv")
PORT = 22
TIMEOUT = 1
USER = "testuser1"
PASSWORD = "123"

def is_open(ip, port, timeout):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((ip, int(port)))
        s.shutdown(socket.SHUT_RDWR)
        return {"result": True, "error": "NoErr"}
    except Exception as ex:
        return {"result": False, "error": str(ex)}
    finally:
        s.close()

def get_leases_list():
    # Minimal csv content:
    # header must contain "IPAddress"
    # every other line is a concrete ip-address.
    result = []
    with open(LEASES_ALL, newline="") as csvfile_1:
        reader_1 = csv.DictReader(csvfile_1)
        result = list(reader_1)
    return result

def split_list(some_list, sublist_count):
    result = []
    while len(some_list) > sublist_count:
        result.append(some_list[:sublist_count])
        some_list = some_list[sublist_count:]
    result.append(some_list)
    return result

async def do_single_host(one_lease_dict):  # Function for each Task
    # Firstly
    IP = one_lease_dict["IPAddress"]
    loop = asyncio.get_event_loop()
    socket_check = await loop.run_in_executor(None, is_open, IP, PORT, TIMEOUT)
    print(socket_check, IP)
    # Secondly
    if socket_check["result"] == True:
        async with asyncssh.connect(host=IP, port=PORT, username=USER, password=PASSWORD, known_hosts=None) as conn:
            result = await conn.run("uname -r", check=True)
            print(result.stdout, end="")  # Just print without writing to a file at this point.

def aio_root():
    leases_list = get_leases_list()
    list_of_lists = split_list(leases_list, PARALLEL_SESSIONS_COUNT)
    r = []
    loop = asyncio.get_event_loop()
    for i in list_of_lists:
        for j in i:
            task = loop.create_task(do_single_host(j))
            r.append(task)
    group = asyncio.wait(r)
    loop.run_until_complete(group)  # At this line only ~20 tasks execute per second. Can't understand why :(
    loop.close()

def main():
    aio_root()

if __name__ == '__main__':
    main()
loop.run_in_executor signature:
awaitable loop.run_in_executor(executor, func, *args)
The default ThreadPoolExecutor is used if executor is None.
ThreadPoolExecutor documentation:
Changed in version 3.5: If max_workers is None or not given, it will default to the number of processors on the machine, multiplied by 5, assuming that ThreadPoolExecutor is often used to overlap I/O instead of CPU work and the number of workers should be higher than the number of workers for ProcessPoolExecutor.
Changed in version 3.8: Default value of max_workers is changed to min(32, os.cpu_count() + 4). This default value preserves at least 5 workers for I/O bound tasks. It utilizes at most 32 CPU cores for CPU bound tasks which release the GIL. And it avoids using very large resources implicitly on many-core machines.
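So with executor=None the throughput is capped by the default pool size; on a 4-core machine under the pre-3.8 rule that is 20 workers, which would line up with the observed ~20 tasks per second. A minimal sketch, not from the original answer, of handing run_in_executor() a larger explicit pool (the figure of 100 workers and the dummy blocking function are only illustrative):
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_check(n):
    time.sleep(1)  # stand-in for the blocking is_open() socket check
    return n

async def main():
    loop = asyncio.get_running_loop()
    executor = ThreadPoolExecutor(max_workers=100, thread_name_prefix='checker')
    # Either pass the executor explicitly on each call...
    tasks = [loop.run_in_executor(executor, blocking_check, i) for i in range(100)]
    # ...or install it as the default used when the first argument is None:
    # loop.set_default_executor(executor)
    results = await asyncio.gather(*tasks)
    print(len(results), 'checks finished in roughly one second')

asyncio.run(main())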

Multi-core parallel computing over a for loop in python-3.x

I have a simple for loop which prints the numbers from 1 to 9999 with a 5 second sleep in between. The code is below:
import time

def run():
    length = 10000
    for i in range(1, length):
        print(i)
        time.sleep(5)

run()
I want to apply multiprocessing to run the for loop concurrently across multiple cores, so I amended the code above to use 5 cores:
import multiprocessing as mp
import time

def run():
    length = 10000
    for i in range(1, length):
        print(i)
        time.sleep(5)

if __name__ == '__main__':
    p = mp.Pool(5)
    p.map(run())
    p.close()
There is no issue in running the job, but it does not seem to be running in parallel on 5 cores. How can I get the code to work as expected?
First, you are running the same 1..9999 loop 5 times, and second, you are executing the run function instead of passing it to the .map() method.
You must prepare your queue before passing it to the Pool instance so that all 5 workers process the same queue:
import multiprocessing as mp
import time

def run(i):
    print(i)
    time.sleep(5)

if __name__ == '__main__':
    length = 10000
    queue = range(1, length)
    p = mp.Pool(5)
    p.map(run, queue)
    p.close()
Note that it will process the numbers out of order as explained in the documentation. For example, worker #1 will process 1..500, worker #2 will process 501..1000 etc:
This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer.
If you want to process the numbers more similarly to the single threaded version, set chunksize to 1:
p.map(run, queue, 1)
