How to get different answers from different threads? - multithreading

To get to know the threading concept better, I tried to use threads in a simple program. I want to call a function 3 times, each of which does a random selection.
import random
import threading
import queue

def func(arg):
    lst = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
    num = random.choice(lst)
    arg.append(num)
    return arg

def search(arg):
    a = func(arg)
    a = func(a)
    threads_list = []
    que = queue.Queue()
    for i in range(3):
        t = threading.Thread(target=lambda q, arg1: q.put(func(arg1)), args=(que, a))
        t.start()
        threads_list.append(t)
    for t in threads_list:
        t.join()
    while not que.empty():
        result = que.get()
        print(result)

if __name__ == '__main__':
    lst = []
    search(lst)
As you can see, in the third part I used threads, and I expected to get a different list from each one, but all the threads return the same answer.
Can anyone help me to get different lists from different threads?
I think I misunderstood the concept of multiprocessing and multithreading.

Possibly, the pseudo-random number generator that random.choice uses has three instances, one for each thread, and in the absence of a unique seed each will produce the same pseudo-random sequence. Since no seed is provided, it may be using the system time, which, depending on the precision, may be the same for all three threads.
You might try seeding the PRNG with something that differs from thread to thread, inside the thread that invokes the PRNG. This should cause the three threads to use different seeds and give you different pseudo-random sequences.
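A minimal sketch of that idea (the per-thread random.Random seeded from os.urandom is my own choice of seed source, not something from the question; the sketch also gives each thread its own copy of the input list, since the question's threads all append to one shared list):

import os
import queue
import random
import threading

def func(arg, rng):
    # use the thread's own Random instance instead of the shared module-level one
    num = rng.choice(range(21))
    arg.append(num)
    return arg

def worker(q, arg1):
    rng = random.Random(os.urandom(8))  # a seed that differs per thread
    q.put(func(list(arg1), rng))        # copy the list so each thread has its own

que = queue.Queue()
threads = [threading.Thread(target=worker, args=(que, [])) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
while not que.empty():
    print(que.get())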

Related

Sieve of Eratosthenes with nested for loops

This is a very simple question about Python. I was trying to get a list of prime numbers, so I tried:
primes = [2]
for i in primes:
    for j in range(50):
        if j % i != 0:
            primes.append(j)
print(primes)
But it just goes into an infinite loop.
You are looping over a list while appending to it, which makes the process infinite, since the iteration just follows the ever-growing index. You can put an end to it by defining a limit value.
There are lots of methods to achieve that, but I would suggest simply breaking once the last number is bigger than the limit:
primes = [2]
limit = 40
for i in primes:
    if primes[-1] > limit:
        break
    for j in range(50):
        if j % i != 0:
            primes.append(j)
print(primes)
Before that, you should know what you are doing; my advice is to first do your calculations on paper and then code them.
Your code produces results that are not actually prime, so first come up with an algorithm or search for one online.
Using the Sieve of Eratosthenes pseudocode you should be able to come up with an implementation on your own; here is an example:
from math import isqrt
from numpy import where

limit = 40
primes = [True] * limit
primes[0] = False
primes[1] = False
for i in range(2, isqrt(limit - 1) + 1):
    if primes[i]:
        for j in range(i * i, limit, i):
            primes[j] = False
print(list(where(primes)[0]))
# >>> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]

Can I use side_effect in my mocking to provide an indefinite number of values?

So I can use an iterable with side_effect in python mock to produce changing values returned by my calls to the mock:
some_mock.side_effect = [1, 2, 3]
return_value provides the same value every time
some_mock.return_value = 8
Is there a way I can use one or both of these methods so that a mock produces some scheduled values to begin with, and then an endless stream of one particular value once the first set is exhausted? i.e.:
[1, 2, 3, 8, 8, 8, 8, 8, 8, etc. etc etc.]
There is no specific built-in feature that does that, but you can achieve it by adding a side effect that does this.
In the cases I can think of, it would be sufficient to just add some sufficiently large number of values instead of an infinite number, and use the side_effect version that takes a list:
side_effect = [1, 2, 3] + [8] * 100
my_mock.side_effect = side_effect
If you really need an infinite number of responses, you can use the other version of side_effect, which takes a function object instead of a list. You need a generator function that creates your infinite sequence, and the side-effect function iterates over it, remembering the current position. Here is an example implementation (using a class to avoid global variables):
from itertools import repeat

class SideEffect:
    def __init__(self):
        self.it = self.generator()  # holds the current iterator

    @staticmethod
    def generator():
        yield from range(1, 4)  # yields 1, 2, 3
        yield from repeat(8)    # yields 8 infinitely

    def side_effect(self, *args, **kwargs):
        return next(self.it)

...
my_mock.side_effect = SideEffect().side_effect
This should have the desired effect.
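A quick way to check the behaviour (a sketch using unittest.mock.Mock and the SideEffect class defined above; my_mock is just a fresh mock created for the demonstration):

from unittest.mock import Mock

my_mock = Mock()
my_mock.side_effect = SideEffect().side_effect

print([my_mock() for _ in range(6)])
# [1, 2, 3, 8, 8, 8]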

generating a list of arrays using multiprocessing in python

I am having difficulty implementing parallelisation for generating a list of arrays. In this case, each array is generated independently and then appended to a list. Somehow multiprocessing.apply_async() outputs an empty list when I feed it complicated arguments.
More specifically, just to give the context, I am attempting to implement a machine learning algorithm using parallelisation. The idea is the following: I have a 'system', and an 'agent' which performs actions on the system. To teach the agent (in this case a neural net) how to behave optimally (with respect to a certain reward scheme that I have omitted here), the agent needs to generate trajectories of the system by applying actions to it. From the reward obtained upon performing the actions, the agent then learns what to do and what not to do. Note, importantly, that the possible actions in the code are referred to as integers with:
possible_actions = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
So here I am attempting to generate many such trajectories using multiprocessing (sorry the code is not runnable here as it requires many other files, but I'm hoping somebody can spot the issue):
from quantum_simulator_EC import system
from reinforce_keras_EC import Agent
import multiprocessing as mp

s = system(1200, N=3)
s.set_initial_state([0, 0, 1])
agent = Agent(alpha=0.0003, gamma=0.95, n_actions=len(s.actions))

def get_result(result):
    global action_batch
    action_batch.append(result)

def generate_trajectory(s, agent):
    sequence_of_actions = []
    for k in range(5):
        net_input = s.generate_net_input_FULL(6)
        action = agent.choose_action(net_input)
        sequence_of_actions.append(action)
    return sequence_of_actions

action_batch = []
pool = mp.Pool(2)
for i in range(0, batch_size):
    pool.apply_async(generate_trajectory, args=(s, agent), callback=get_result)
pool.close()
pool.join()
print(action_batch)
The problem is that the code returns an empty list []. Can somebody explain to me what the issue is? Are there restrictions on the kind of arguments that I can pass to apply_async? In this example I am passing my system 's' and my 'agent', both complicated objects. I mention this because when I test the code with simple arguments like integers or matrices, instead of the agent and system, it works fine. If there is no obvious reason why it's not working, some tips on how to debug the code would also be helpful.
Note that there is no problem if I do not use multiprocessing, replacing the last part with:
action_batch = []
for i in range(0, batch_size):
    get_result(generate_trajectory(s, agent))
print(action_batch)
And in this case, the output here is as expected, a list of sequences of 5 actions:
[[4, 2, 1, 1, 7], [8, 2, 2, 12, 1], [8, 1, 9, 11, 9], [7, 10, 6, 1, 0]]
The final results can be appended directly to a list in the main process; there is no need to create a callback function. Then you can close and join the pool, and finally retrieve all the results using get.
See the following two examples, using apply_async and starmap_async (see this post for the difference).
Solution apply
import multiprocessing as mp
import time

def func(s, agent):
    print(f"Working on task {agent}")
    time.sleep(0.1)  # some task
    return (s, s, s)

if __name__ == '__main__':
    agent = "My awesome agent"
    with mp.Pool(2) as pool:
        results = []
        for s in range(5):
            results.append(pool.apply_async(func, args=(s, agent)))
        pool.close()
        pool.join()
        print([result.get() for result in results])
Solution starmap
import multiprocessing as mp
import time

def func(s, agent):
    print(f"Working on task {agent}")
    time.sleep(0.1)  # some task
    return (s, s, s)

if __name__ == '__main__':
    agent = "My awesome agent"
    with mp.Pool(2) as pool:
        result = pool.starmap_async(func, [(s, agent) for s in range(5)])
        pool.close()
        pool.join()
        print(result.get())
Output
Working on task My awesome agent
Working on task My awesome agent
Working on task My awesome agent
Working on task My awesome agent
Working on task My awesome agent
[(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4)]
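As a debugging aside for the original callback-based version (a self-contained sketch, not taken from the code above): if apply_async cannot pickle one of the arguments, the callback never fires and the result list silently stays empty; the pickling error is delivered to error_callback and re-raised by result.get(), which is usually the quickest way to see what went wrong.

import multiprocessing as mp
import threading

def work(x):
    return x

if __name__ == '__main__':
    with mp.Pool(2) as pool:
        # a lock is a stand-in for any argument that cannot be pickled
        res = pool.apply_async(work, args=(threading.Lock(),),
                               callback=print,
                               error_callback=lambda e: print("error_callback:", e))
        try:
            res.get(timeout=5)   # re-raises the pickling error instead of failing silently
        except Exception as e:
            print("get() re-raised:", e)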

How to use Logging with multiprocessing in Python3

I am trying to use Python's built-in logging with multiprocessing.
Goal -- have errors logged to a file called "error.log".
Issue -- the errors are printed to the console instead of the log file. See the code below:
import concurrent.futures
from itertools import repeat
import logging

def data_logging():
    error_logger = logging.getLogger("error.log")
    error_logger.setLevel(logging.ERROR)
    formatter = logging.Formatter('%(asctime)-12s %(levelname)-8s %(message)s')
    file_handler = logging.FileHandler('error.log')
    file_handler.setLevel(logging.ERROR)
    file_handler.setFormatter(formatter)
    error_logger.addHandler(file_handler)
    return error_logger

def check_number(error_logger, key):
    if key == 1:
        print("yes")
    else:
        error_logger.error(f"{key} is not = 1")

def main():
    key_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 4, 5, 4, 3, 4, 5, 4, 3, 4, 5, 4, 3, 4, 3]
    error_logger = data_logging()
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
        executor.map(check_number, repeat(error_logger), key_list)

if __name__ == '__main__':
    main()
The function check_number checks whether each number in key_list is 1 or not.
If key == 1, it prints "yes" to the console; if not, I would like the program to add "{key} is not = 1" to the log file.
Instead, with the code above, it prints it to the console. Please help if you can. This is a mini example of my program, so don't change the logic.
To be able to pass a logger instance to child processes at all, you must be using Python 3.7+. So here is a little about how things work.
The basic
Only serializable objects can be passed to a child process; in other words, picklable ones. This includes all primitive types, such as int, float, and str. Why? Because Python knows how to reconstruct (or unpickle) them back into objects in the child process.
Any other complex class instance is unpicklable because of the lack of information about the class needed to reconstruct the instance from serialized bytes.
So if we provide the class information, our instance can be unpickled, right?
To a certain degree, yes. By calling ClassName(*parameters) it can certainly reconstruct the instance from scratch. But what if you have modified your instance before it gets pickled, for example by adding attributes that are not set in the __init__ method, such as error_logger.addHandler(file_handler)? The pickle module is not smart enough to know about every other thing you added to your instance afterwards.
The why
Then how can Python 3.7+ pickle a Logger instance? It doesn't do much: it just saves the logger's name, which is a plain str. Next, to unpickle, it just calls getLogger(name) to reconstruct the instance. So now you can see the first complication: the logger that the child process reconstructs is a default logger, without any handler attached to it and with the default level of WARNING.
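A quick way to see this mechanism for yourself (a small sketch; it relies on Logger.__reduce__, which is what pickle uses under the hood on 3.7+):

import logging

error_logger = logging.getLogger("error.log")
error_logger.addHandler(logging.FileHandler("error.log"))

# pickle only records "call getLogger('error.log')"; the handler and level
# configured above are not part of what gets sent to the child process
print(error_logger.__reduce__())
# e.g. (<function getLogger at 0x...>, ('error.log',))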
The how
Long story short: use logger-tt. It supports multiprocessing out of the box.
import concurrent.futures
from logger_tt import setup_logging, logger

setup_logging(use_multiprocessing=True)

def check_number(key):
    if key == 1:
        print("yes")
    else:
        logger.error(f"{key} is not = 1")

def main():
    key_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 4, 5, 4, 3, 4, 5, 4, 3, 4, 5, 4, 3, 4, 3]
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
        executor.map(check_number, key_list)

if __name__ == '__main__':
    main()
If you want to start from the beginning, there are some more problems that you need to solve:
interprocess communication: multiprocessing.Queue or socket
logging using QueueHandler and QueueListener
offload the logging to a different thread or child process
The above things are needed to avoid duplicated log entries, a partially missing log, or no log at all. A minimal sketch of the QueueHandler/QueueListener part is shown below.
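If you do want to build it yourself, here is a minimal sketch of that QueueHandler/QueueListener approach (standard library only; the worker function and the sample keys are made up for the example): each child process pushes records onto a queue, and a listener in the main process writes them to error.log.

import logging
import logging.handlers
import multiprocessing as mp

def worker(queue, key):
    # In the child process: attach a QueueHandler that ships records to the parent.
    logger = logging.getLogger("error.log")
    logger.setLevel(logging.ERROR)
    logger.addHandler(logging.handlers.QueueHandler(queue))
    if key != 1:
        logger.error(f"{key} is not = 1")

if __name__ == '__main__':
    queue = mp.Queue()
    file_handler = logging.FileHandler('error.log')
    file_handler.setFormatter(
        logging.Formatter('%(asctime)-12s %(levelname)-8s %(message)s'))
    # In the parent: a QueueListener thread drains the queue into the file handler.
    listener = logging.handlers.QueueListener(queue, file_handler)
    listener.start()
    processes = [mp.Process(target=worker, args=(queue, key)) for key in [1, 2, 3, 1, 4]]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    listener.stop()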

Optimize brute force with permutations

EXPLAINING WHAT THE SCRIPT DOES
I made a Python script where the goal is to balance marbles on a circular board. Marble 1 weighs 1 unit, marble 2 weighs 2 units, and so on. The goal is to find the order in which the board is as balanced as possible.
PROBLEM
I also made a method that tries all the possibilities with permutations. I get a memory error if I try with more than 10 marbles (3628800 possibilities).
Is there any way to optimize the code, either with multithreading/multiprocessing or maybe with a better approach than permutations?
CODE
# balance_game.py
# A program used to test your skills in balancing marbles on a board
from itertools import permutations
from math import cos, radians, pow, sin, sqrt
from time import time

# Checks if marbles will balance on a circular board
# Marble 1 weighs 1 unit, 2 weighs 2 units, and so on
def test_your_might(NUMBER_OF_MARBLES, marbles):
    angle = 360 / NUMBER_OF_MARBLES
    angles = [angle * n for n in range(1, NUMBER_OF_MARBLES + 1)]
    X = []
    Y = []
    Fx = []
    Fy = []
    i = 0
    for n in range(0, NUMBER_OF_MARBLES):
        angle = radians(angles[i])
        X.append(cos(angle))
        Y.append(sin(angle))
        i += 1
    for n in range(0, NUMBER_OF_MARBLES):
        Fx.append(X[n] * marbles[n])
    for n in range(0, NUMBER_OF_MARBLES):
        Fy.append(Y[n] * marbles[n])
    return sqrt(pow(sum(Fx), 2) + pow(sum(Fy), 2))

def brute_force_solution(NUMBER_OF_MARBLES):
    possibilities = permutations([x for x in range(1, NUMBER_OF_MARBLES + 1)])
    solutions = {}
    for possibility in possibilities:
        possibility = list(possibility)
        solution = test_your_might(NUMBER_OF_MARBLES, possibility)
        solutions[str(possibility)] = solution
    return solutions

# print(test_your_might(5, [5, 1, 4, 3, 2]))
t0 = time()
solutions = brute_force_solution(10)
t1 = time()
best_order = min(solutions, key=solutions.get)
lowest_score = solutions[best_order]
print(f"Lowest score: {lowest_score}\nOrder: {best_order}")
print(f"It took {t1-t0} seconds to find the best possibility")
print(f"There were {len(solutions)} possibilities")
FYI
The method is brute_force_solution
Since the bottleneck is CPU usage, multithreading won't do a lot to help here, but multiprocessing should. Not an expert but have been experimenting with parallelism recently so will have a play around and update this answer if I get anywhere. (EDIT: I have tried a number of attempts at using multiprocessing but I've only succeeded in increasing the run time!)
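For what it's worth, one way to keep the per-task overhead low is to hand each worker a whole block of permutations rather than one at a time, for example by fixing the first marble per task. A rough, untested sketch (it assumes test_your_might from the question is defined at module level; whether it actually beats the single-process loop will depend on your machine):

from itertools import permutations
from multiprocessing import Pool

NUMBER_OF_MARBLES = 10  # module-level so worker processes can see it

def best_for_first(first):
    # Evaluate every permutation that starts with a fixed first marble and
    # return the best (score, order) pair for that slice of the search space.
    rest = [m for m in range(1, NUMBER_OF_MARBLES + 1) if m != first]
    best = None
    for tail in permutations(rest):
        order = [first, *tail]
        score = test_your_might(NUMBER_OF_MARBLES, order)
        if best is None or score < best[0]:
            best = (score, order)
    return best

if __name__ == '__main__':
    with Pool() as pool:
        per_first = pool.map(best_for_first, range(1, NUMBER_OF_MARBLES + 1))
    score, order = min(per_first)
    print(f"Lowest score: {score}\nOrder: {order}")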
It might be that you need to store all solutions, but if not, one optimisation that is small in terms of time but huge in terms of memory would be to not store all the possible results and just keep the best one, so you're not needlessly building another very long collection. Ideally you could calculate the number of solutions directly, since it only depends on NUMBER_OF_MARBLES, but I have kept the count inside the function to stay consistent with your version.
def brute_force_solution(NUMBER_OF_MARBLES):
    possibilities = permutations([x for x in range(1, NUMBER_OF_MARBLES + 1)])
    # count the number of solutions
    number_of_solutions = 0
    best_solution_so_far = None
    for possibility in possibilities:
        number_of_solutions += 1
        possibility = list(possibility)
        solution = test_your_might(NUMBER_OF_MARBLES, possibility)
        # If this solution is the best so far, record the score and configuration of marbles.
        if (best_solution_so_far is None) or (solution < best_solution_so_far[1]):
            best_solution_so_far = (str(possibility), solution)
    # Return the best solution and total number of solutions tested.
    return (best_solution_so_far, number_of_solutions)

t0 = time()
one_solution = brute_force_solution(11)
t1 = time()

best_order = one_solution[0][0]
best_score = one_solution[0][1]
number_of_solutions = one_solution[1]
It took a while but it ran with 11 marbles:
>>>Lowest score: 0.00021084993450850984
>>>Order: [10, 7, 3, 4, 11, 1, 8, 9, 5, 2, 6]
>>>It took 445.57227993011475 seconds to find the best possibility
>>>There were 39916800 possibilities
and it was marginally quicker when run for 10 (note that you aren't including the sorting of your results in your timing, which is not needed with this new method and adds almost another second to the time it takes to get the best solution):
Old
Lowest score: 1.608181078507726e-17
Order: [1, 7, 3, 10, 4, 6, 2, 8, 5, 9]
It took 43.81806421279907 seconds to find the best possibility
There were 3628800 possibilities
New
Lowest score: 1.608181078507726e-17
Order: [1, 7, 3, 10, 4, 6, 2, 8, 5, 9]
It took 37.06034016609192 seconds to find the best possibility
There were 3628800 possibilities
