Multiprocessing prime number check exhibits weird behavior

I have the following parallelized code that checks if a number is a prime number.
import math
from multiprocessing import Pool, Manager
import time
from itertools import product
SERIAL_CHECK_CUTOFF = 21
CHECK_EVERY = 1000
FLAG_CLEAR = b'0'
FLAG_SET = b'1'
print("CHECK_EVERY", CHECK_EVERY)
def create_range(from_i, to_i, nbr_processes):
    if from_i == to_i:
        return from_i
    else:
        nbr_ranges = []
        count = from_i
        while(count < to_i + 1):
            nbr_ranges.append(count)
            count+=1
        k, m = divmod(len(nbr_ranges), nbr_processes)
        subranges = list((nbr_ranges[i*k+min(i, m):(i+1)*k+min(i+1, m)] for i in range(nbr_processes)))
        subranges = [arr[::len(arr) - 1] if len(arr) > 1 else arr for arr in subranges]
        return subranges
def check_prime_in_range(n_from_i_to_i, value):
    (n, _range) = n_from_i_to_i
    if len(_range) > 1:
        (from_i, to_i) = _range
    else:
        return True
    if n % 2 == 0:
        return False
    # check at every 1000 iterations.
    # At every check, see if value.value has been set to FLAG_SET
    # If so, exit the search.
    # If in the search a process finds the factor, set the flag and exit the process.
    # NOTE: check_every flag is suboptimal
    check_every = CHECK_EVERY
    for i in range(from_i, math.floor(to_i), 2):
        check_every = -1
        if not check_every:
            if value.value == FLAG_SET:
                return False
            check_every = CHECK_EVERY
        if n % i == 0:
            value.value = FLAG_SET
            return False
    return True
def check_prime(n, nbr_processes, value):
    # serial check to quickly check for small factors. if none are found, then a
    # parallel search is started
    from_i = 3
    to_i = SERIAL_CHECK_CUTOFF
    value.value = FLAG_CLEAR
    if not check_prime_in_range((n, (from_i, to_i)), value):
        print("Found small non-prime factor")
        return False
    # continue to check for larger factors in parallel
    from_i = to_i
    to_i = int(math.sqrt(n)) + 1
    ranges_to_check = create_range(from_i, to_i, nbr_processes)
    ranges_to_check = zip(len(ranges_to_check) * [n], ranges_to_check)
    with Pool() as pool:
        args = ((arg, value) for arg in product(list(ranges_to_check)))
        # print(list(args)) # comment out this line and the code breaks
        results = pool.map(check_prime_in_range, args)
    if False in results:
        return False
    return True
if __name__ == "__main__":
    start = time.time()
    nbr_processes = 4
    manager = Manager()
    value = manager.Value(b'c', FLAG_CLEAR) # 1-byte character
    n = 98_823_199_699_242_79
    isprime = check_prime(n, nbr_processes, value)
    end = time.time()
    if isprime:
        print(f"{n} is a prime number")
    else:
        print(f"{n} is not a prime number")
    print(f"Duration: {end - start}s")
check_prime() finds a range of factors and tries to determine if there is a non-prime factor. Each range is sent to a process to find a non-prime factor. A multiprocessing.Manager object is used as a flag, so that if a process found a non-prime factor, it sets a flag. The flag is checked periodically. If the flag is set, all processes are terminated.
Because Pool.map only accepts functions with one argument, I used itertools.product to create an argument generator that contains the range and the manager object.
If I run the code as it is, I get the following error.
Traceback (most recent call last):
  File "/home/briansia/projects/python/multiprocess/prime_manager.py", line 118, in <module>
    isprime = check_prime(n, nbr_processes, value)
  File "/home/briansia/projects/python/multiprocess/prime_manager.py", line 103, in check_prime
    results = pool.map(check_prime_in_range, list(args))
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
TypeError: check_prime_in_range() missing 1 required positional argument: 'value'
If I print the args generator above the map function, then the code runs correctly. How exactly did the print statement modify the generator such that it works with the map function?

To illustrate the problem, change the with Pool() block in your program to this:
with Pool() as pool:
    args = ((arg, value) for arg in product(list(ranges_to_check)))
    x = list(args)
    # print(x)
    results = pool.map(check_prime_in_range, x)
This will always crash whether you comment out the print statement or not.
The print statement in your code is not the issue: it is list(args), which causes the generator expression to run. Generator expressions run once and once only. After you've done list(args), the generator still exists and it's still named args, but it is now exhausted. When you run your program with the line containing list(args), you have already used up the generator; therefore you effectively pass an empty iterator to Pool.map. Your program doesn't actually work. It doesn't produce a traceback because it doesn't really do anything.
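The one-shot nature of generator expressions can be seen in isolation with a minimal sketch. The names make_args, first_pass and second_pass are made up for this illustration and are not part of the original program:

```python
def make_args():
    # A generator expression, similar in shape to the one in the question.
    return ((n, "flag") for n in range(3))

args = make_args()
first_pass = list(args)    # consumes every item the generator can produce
second_pass = list(args)   # the same generator object is now exhausted

print(first_pass)
print(second_pass)
```

Building the list once and passing that list on, rather than the generator it was built from, avoids handing an exhausted iterator to Pool.map.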
The problem with your code is this function:
def check_prime_in_range(n_from_i_to_i, value):
It takes two arguments. The first time you call it, you indeed pass two arguments:
if not check_prime_in_range((n, (from_i, to_i)), value):
But when you call it indirectly through Pool.map, it is called with only one argument. As you stated yourself, Pool.map only passes a single argument to its function. It's true that your generator has made two objects into a tuple, but that tuple is only one object and therefore it's the only argument that gets passed to check_prime_in_range. There is no second argument, as the traceback indicates.
I don't know how you want to fix the problem, but it might be a good idea to define check_prime_in_range as a function of one argument, and call it consistently. You can unpack the single argument inside the function, for example:
def check_prime_in_range(x):
    n_from_i_to_i, value = x
    (n, _range) = n_from_i_to_i
    # etc.
Your first call would now be:
if not check_prime_in_range(((n, (from_i, to_i)), value)):
That's rather clumsy so I would consider defining a little class to hold all the variables in a single object. But that's a style issue.
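As a rough sketch of the two calling conventions, the example below contrasts a one-argument worker used with Pool.map against a multi-argument worker used with Pool.starmap, which unpacks each tuple into positional arguments. The functions has_factor_in_range, has_factor and run_checks are hypothetical names invented for this illustration, and the shared flag from the question is left out to keep the sketch short:

```python
from multiprocessing import Pool

def has_factor_in_range(task):
    """Take ONE argument (a tuple) and unpack it inside, as Pool.map requires."""
    n, (start, stop) = task
    return any(n % i == 0 for i in range(start, stop))

def has_factor(n, start, stop):
    """Same check with separate parameters, suitable for Pool.starmap."""
    return any(n % i == 0 for i in range(start, stop))

def run_checks(n, ranges):
    with Pool(2) as pool:
        # map: each element of the list is passed as a single argument.
        via_map = pool.map(has_factor_in_range, [(n, r) for r in ranges])
        # starmap: each tuple is unpacked into positional arguments.
        via_starmap = pool.starmap(has_factor, [(n, a, b) for a, b in ranges])
    return via_map, via_starmap

if __name__ == "__main__":
    print(run_checks(91, [(2, 6), (6, 10)]))
```

Either convention works as long as the worker's signature matches how the pool calls it; mixing the two is what produces the "missing 1 required positional argument" error.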

Related

is there a way to track the number of times a function is called in n seconds?

Perhaps an odd question, but here it goes: I am trying to write a function which
takes no arguments
returns true if this function has been called 3 times or fewer in the last 1 second
returns false otherwise
def myfunction():
    myfunction.counter += 1
myfunction.counter = 0
The code above keeps track of how many times the function is called, but how can I modify it so that it satisfies the requirements above?
I know that I can use the time module in Python, but how do I use it to solve this problem?

First keep track of when the function was called with a decorator:
import time
def counted(f):
    def wrapped(*args, **kwargs):
        wrapped.calls.append(int(round(time.time() * 1000))) # append the ms it was called
        return f(*args, **kwargs)
    wrapped.calls = []
    return wrapped
This decorator can be used like so:
@counted
def foo():
    print(2)
    time.sleep(.3)
Then have a function to group the timestamps within a certain range:
def group_by(lst, seconds):
    """
    Groups a list of ms times into the {seconds}
    range it was called. Most recent grouping will
    be in the first element of the list.
    """
    ms = 1000 * seconds
    result = []
    if lst:
        start = lst[-1]
        count = 1
        for ele in reversed(lst[:-1]):
            if ele > start - ms:
                count += 1
            else:
                result.append(count)
                count = 1
                start = ele
        result.append(count)
    return result
Finally test it:
for _ in range(5):
    foo()
data = foo.calls
number_of_calls_last_second = group_by(data, 1)
print(f"foo called {number_of_calls_last_second[0]} times within the last second")
print(number_of_calls_last_second[0] <= 3) # Here is your True False output
Output:
2
2
2
2
2
foo called 4 times within the last second
False

I would use a decorator like this:
import time
def call_counter(calls_number, max_time):
    def _decorator(function):
        def helper():
            now = time.time()
            function()
            # Keep only the calls made within the last max_time seconds.
            helper.calls = [t for t in helper.calls if now - t < max_time]
            helper.calls.append(now)
            return len(helper.calls) <= calls_number
        helper.calls = []
        return helper
    return _decorator
@call_counter(3, 1)  # at most 3 calls within 1 second (time.time() is in seconds)
def my_function():
    pass
for _ in range(3):
    print(my_function()) # Prints True three times
print(my_function()) # Prints False: the function has been called four times in less than one second.
time.sleep(1)
print(my_function()) # Prints True
I used parameters in the decorator so that you can reuse it with different values. If you have any questions, ask me in the comments.
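For comparison, here is a minimal sliding-window sketch that keeps the timestamps in a collections.deque and reads the time from an injectable clock, so the logic can be exercised without sleeping. The class name CallWindow, its parameters and the module-level window object are assumptions made for this illustration, not part of either answer:

```python
import time
from collections import deque

class CallWindow:
    """Track call times and report whether the call rate is within a limit."""

    def __init__(self, max_calls, period, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period        # window length in seconds
        self.clock = clock          # injectable so the logic can be tested
        self.calls = deque()

    def record(self):
        """Record one call; return True if at most max_calls fall in the window."""
        now = self.clock()
        self.calls.append(now)
        # Drop timestamps that have fallen out of the window.
        while now - self.calls[0] >= self.period:
            self.calls.popleft()
        return len(self.calls) <= self.max_calls

window = CallWindow(max_calls=3, period=1.0)

def my_function():
    # Takes no arguments and reports on its own recent call rate.
    return window.record()
```

time.monotonic is used here instead of time.time so that adjustments to the system clock do not distort the window.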

iteration over a sequence with an implicit type in Python 3.6

I am trying to iterate over a sequence of numbers. I have this:
from itertools import islice, count
handle = int(input("Please enter a number:")
handler = str(handle)
parameter = []
for i in handler:
    parameter.append(i)
print(parameter) #This was for debugging
revised = parameter(count(1[2])) #I'm not sure I'm using the correct syntax here, the purpose is to make revised == parameter[0] and parameter[2]
Ultimately, what I am trying to achieve is to take a sequence of numbers or two, and compare them. For instance, if i[0] == i[1] + i [2] I want to return True, or for that matter if i[0] == i[1] - i[2]. I want the program to iterate over the entire sequence, checking for these types of associations, for instance, 23156 would == true because 2*3 = 6, 2+3 = 5, 5+1 = 6, 2+3+1=6; etc. It's strictly for my own purposes, just trying to make a toy.
When I utilize
revised = parameter(count(1[2])
I get an error that says builtins.TypeError, type int is not subscriptable, even though I explicitly converted the integer input into a string.

Although the question is unclear, what you describe appears to be akin to a running total, but with restrictions and with several operations: addition, subtraction and multiplication.
Restrictions
The first two numbers are seeds
The following numbers must accumulate by some operation
The accumulations must progress contiguously
Code
import operator as op
import itertools as it
def accumulate(vals):
    """Return a set of results from prior, observed operations."""
    adds = set(it.accumulate(vals)) # i[0] == i[1] + i[2]
    muls = set(it.accumulate(vals, op.mul)) # i[0] == i[1] * i[2]
    subs = {-x for x in it.accumulate(vals, func=op.sub)} # i[0] == i[1] - i[2]
    #print(adds, muls, subs)
    return adds | muls | subs
def rolling_acc(vals):
    """Return accumulations by sweeping all contiguous, windowed values."""
    seen = set()
    for i, _ in enumerate(vals):
        window = vals[i:]
        if len(window) >= 3:
            seen |= accumulate(window)
    return seen
def is_operable(vals):
    """Return `True` if rolling operations on contiguous elements will be seen."""
    s = str(vals)
    nums = [int(x) for x in s]
    ahead = nums[2:]
    accums = rolling_acc(nums)
    #print(ahead, accums)
    return len(set(ahead) & accums) == len(ahead)
Tests
assert is_operable(23156) == True
assert is_operable(21365) == False # {2,3} non-contiguous
assert is_operable(2136) == True
assert is_operable(11125) == True
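To make the building block concrete, this small sketch shows what itertools.accumulate yields for the digits of 23156 under each of the three operators. The variable names are chosen for this illustration only:

```python
import itertools as it
import operator as op

digits = [2, 3, 1, 5, 6]

# Each call folds the operator over the list and keeps every intermediate result.
running_sums = list(it.accumulate(digits))
running_products = list(it.accumulate(digits, op.mul))
running_differences = list(it.accumulate(digits, op.sub))

print(running_sums)
print(running_products)
print(running_differences)
```

The code above collects such intermediate results for every trailing window of the digits and then checks whether each later digit appears among them.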

Python 3.6 Bitonic Sort with Multiprocessing library and multiple processes

I am trying to implement bitonic sort with the Python multiprocessing library and a shared resource array that will be sorted at the end of the program.
The problem I am running into is that when I run the program, I get a prompt that asks "Your program is still running! Are you sure you want to cancel it?" and then, when I click cancel N - 1 times (where N is the number of processes I am trying to spawn), it just hangs.
When this is run from the command line, it just outputs the unsorted array. Of course, I expect it to be sorted at the program's finish.
I've been using this resource to try and get a firm grasp on how I can mitigate my errors but I haven't had any luck, and now I am here.
ANY help would be appreciated, as I really don't have anywhere else to turn to.
I wrote this using Python 3.6 and here is the program in its entirety:
from multiprocessing import Process, Array
import sys
from random import randint
# remember to move this to separate file
def createInputFile(n):
    input_file = open("input.txt","w+")
    input_file.write(str(n)+ "\n")
    for i in range(n):
        input_file.write(str(randint(0, 1000000)) + "\n")
def main():
    # createInputFile(1024) # uncomment this to create 'input.txt'
    fp = open("input.txt","r") # remember to read from sys.argv
    length = int(fp.readline()) # guaranteed to be power of 2 by instructor
    arr = Array('i', range(length))
    nums = fp.read().split()
    for i in range(len(nums)):
        arr[i]= int(nums[i]) # overwrite shared resource values
    num_processes = 8 # remember to read from sys.argv
    process_dict = dict()
    change_in_bounds = len(arr)//num_processes
    low_b = 0 # lower bound
    upp_b = change_in_bounds # upper bound
    for i in range(num_processes):
        print("Process num: " + str(i)) # are all processes being generated?
        process_dict[i] = Process(target=bitonic_sort, args=(True, arr[low_b:upp_b]) )
        process_dict[i].start()
        low_b += change_in_bounds
        upp_b += change_in_bounds
    for i in range(num_processes):
        process_arr[i].join()
    print(arr[:]) # Print our sorted array (hopefully)
def bitonic_sort(up, x):
    if len(x) <= 1:
        return x
    else:
        first = bitonic_sort(True, x[:len(x) // 2])
        second = bitonic_sort(False, x[len(x) // 2:])
        return bitonic_merge(up, first + second)
def bitonic_merge(up, x):
    # assume input x is bitonic, and sorted list is returned
    if len(x) == 1:
        return x
    else:
        bitonic_compare(up, x)
        first = bitonic_merge(up, x[:len(x) // 2])
        second = bitonic_merge(up, x[len(x) // 2:])
        return first + second
def bitonic_compare(up, x):
    dist = len(x) // 2
    for i in range(dist):
        if (x[i] > x[i + dist]) == up:
            x[i], x[i + dist] = x[i + dist], x[i] #swap
main()

I won't go into all the syntax errors in your code, since I am sure your IDE tells you about those. The problem is that you are missing the if __name__ == '__main__' guard. I changed your def main() to def sort() and wrote this:
if __name__ == '__main__':
    sort()
And it worked (after fixing all the syntax errors).
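The guard matters because, with the spawn start method (the default on Windows), each child process imports the main module again, so an unguarded call at module level would run in every child. The sketch below illustrates the guard together with one way for workers to write results back into the shared Array; note that slicing an Array copies the values into a plain list, so a worker has to assign back into the shared object for the parent to see any change. The helper names sort_chunk and sort_in_chunks are hypothetical, and the built-in sorted stands in for the bitonic routine, so each chunk is sorted independently rather than the whole array:

```python
from multiprocessing import Array, Process

def sort_chunk(shared, low, high):
    """Sort shared[low:high] and write the result back into the shared array."""
    chunk = sorted(shared[low:high])      # slicing copies the values into a list
    for offset, value in enumerate(chunk):
        shared[low + offset] = value      # element writes go to shared memory

def sort_in_chunks(values, num_processes):
    shared = Array('i', values)
    step = len(shared) // num_processes
    workers = [
        Process(target=sort_chunk, args=(shared, i * step, (i + 1) * step))
        for i in range(num_processes)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return shared[:]

if __name__ == "__main__":
    # Only the parent process reaches this point; children just import the module.
    print(sort_in_chunks([5, 1, 4, 2, 8, 7, 6, 3], 2))
```

A full bitonic sort would still need merge steps across the chunks after the workers finish.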

how to multiply all numbers in a stack

I am trying to multiply all the numbers in a stack. I originally thought of popping all the elements into a list and then multiplying them, but I wasn't sure how to do that or whether it was the right approach.
This is my current code, but I'm getting:
TypeError: 'method' object cannot be interpreted as an integer
def multi_stack(s):
    stack = Stack()
    mult = 1
    size = my_stack.size
    for number in range(size):
        tmp = my_stack.pop(size)
        mult = mult * tmp
        L.append(tmp)
    for number in range(size):
        my_stack.push(L.pop())
    print(must)
I made a test case as well:
my_stack = Stack()
my_stack.push(12)
my_stack.push(2)
my_stack.push(4)
my_stack.push(40)
print(multi_stack(my_stack))
print(my_stack.size())
This should print out:
3840
0
The Stack class I'm using
class Stack():
    def __init__(self):
        self.items = []
    def is_empty(self):
        return self.items == []
    def push(self,items):
        return self.items.append(items)
    def pop(self):
        if self.is_empty() == True:
            raise IndexError("The stack is empty!")
        else:
            return self.items.pop()
    def peek(self):
        if self.is_empty() == True:
            raise IndexError("The stack is empty!")
        else:
            return self.items[len(self.items) - 1]
    def size(self):
        return len(self.items)

Python lists support append() and pop() methods that allow you to replicate LIFO stack behavior. Use append() to add to the end and pop() to remove the last element.
However, the underlying data structure is still a list, and there are many ways to multiply the elements of a list together. For example, assuming a non-empty list:
import functools
mylist = [i for i in range(1, 10)]
product = functools.reduce(lambda x, y: x * y, mylist)
or
mylist = [i for i in range(1, 10)]
product = mylist[0]
for j in mylist[1:]:
    product *= j
EDIT: Here is an example using your Stack class:
import functools
stack = Stack()
stack.push(1)
stack.push(3)
stack.push(9)
def multi_stack(s):
    """
    s: a Stack() object
    """
    return functools.reduce(lambda x, y: x * y, s.items)
def multi_stack_readable(s):
    """
    s: a Stack() object
    """
    if s.size() > 1:
        product = s.items[0]
        for i in s.items[1:]:
            product *= i
        return product
    elif s.size() == 1:
        # return the single element itself, not the list that holds it
        return s.items[0]
    else:
        raise IndexError("the stack is empty!")
print(multi_stack(stack))
print(multi_stack_readable(stack))
Using lambda functions is sometimes considered less readable, so I included a more readable version using a for loop. Both produce the same result.
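On Python 3.8 or later, math.prod offers a third option that needs neither a lambda nor an explicit loop. The sketch below uses a plain list with the values from the question's test case; with the Stack class the same call could be applied to s.items:

```python
import math

values = [12, 2, 4, 40]

# math.prod multiplies all elements together; for an empty iterable it
# returns the start value, which defaults to 1.
total = math.prod(values)
print(total)
```

Whether an empty stack should give 1 or raise an error, as multi_stack_readable does, is a design choice that math.prod alone does not settle.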

Your code doesn't work because size = my_stack.size gives you a method object and not the integer you expected; you forgot to add the parentheses at the end to actually call the method. So when you tried for number in range(size):, you got an exception because you were passing a method object instead of an integer to range(). There are also a number of other mistakes: you didn't use the parameter passed to the function at all and instead modified the global variable my_stack (unless that was your intent); you perform operations on an undefined variable L; you created stack at the top of your function and did nothing with it; and so on. In general, it is too convoluted for such a simple goal. There are more efficient ways to do this, but here is your code corrected:
def multi_stack(s):
    mult = 1
    size = s.size()
    for i in range(size):
        tmp = s.pop()
        mult = mult * tmp
    return mult
This returns your expected product. Because s refers to the same Stack object that the caller passed in, popping inside the function also empties the caller's stack, so my_stack.size() is 0 afterwards, as in your expected output.
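If the stack should keep its contents, as the push-back loop in the question suggests, one possible sketch pops every item, remembers it, and pushes the items back afterwards. The function name multi_stack_keep is hypothetical, and the Stack class here is a trimmed stand-in for the one in the question so that the example runs on its own:

```python
class Stack:
    """Minimal stand-in for the Stack class from the question."""
    def __init__(self):
        self.items = []
    def push(self, item):
        self.items.append(item)
    def pop(self):
        if not self.items:
            raise IndexError("The stack is empty!")
        return self.items.pop()
    def size(self):
        return len(self.items)

def multi_stack_keep(s):
    """Return the product of all items and leave the stack as it was."""
    popped = []
    product = 1
    while s.size() > 0:
        value = s.pop()
        product *= value
        popped.append(value)
    # The last item popped was the bottom of the stack, so push it back first.
    while popped:
        s.push(popped.pop())
    return product

my_stack = Stack()
for number in (12, 2, 4, 40):
    my_stack.push(number)
print(multi_stack_keep(my_stack))
print(my_stack.size())
```

This version uses only push, pop and size, so it does not depend on the stack being backed by a list.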

Tree traversals python

I have to define three functions: preorder(t), postorder(t), and inorder(t).
Each function will take a binary tree as input and return a list. The list should be ordered in the same way the tree elements would be visited in the respective traversal (post-order, pre-order, or in-order).
I have written code for each of them, but when I call another function (flat_list()), I get an IndexError thrown by
if not x or len(x) < 1 or n > len(x) or x[n] == None:
IndexError: list index out of range
The code for my traversal methods is as follows:
def postorder(t):
    pass
    if t != None:
        postorder(t.get_left())
        postorder(t.get_right())
        print(t.get_data())
def pre_order(t):
    if t != None:
        print(t.get_data())
        pre_order(t.get_left())
        pre_order(t.get_right())
def in_order(t):
    pass
    if t != None:
        in_order(t.get_left())
        print(t.get_data())
        in_order(t.get_right())
def flat_list2(x,n):
    if not x or len(x) < 1 or n > len(x) or x[n] == None:
        return None
    bt = BinaryTree( x[n] )
    bt.set_left( flat_list2(x, 2*n))
    bt.set_right(flat_list2(x, 2*n + 1))
    return bt
This is how I call flat_list2:
flat_node_list = [None, 55, 24, 72, 8, 51, None, 78, None, None, 25]
bst = create_tree_from_flat_list2(flat_node_list,1)
pre_order_nodes = pre_order(bst)
in_order_nodes = in_order(bst)
post_order_nodes = post_order(bst)
print( pre_order_nodes)
print( in_order_nodes)
print( post_order_nodes)

You should actually write three functions that return iterators and let the caller decide whether a list is needed. This is most easily done with generator functions. In Python 3.3+, yield from can be used instead of a for loop.
def in_order(t):
    if t != None:
        yield from in_order(t.get_left())
        yield t.get_data()
        yield from in_order(t.get_right())
Move the yield statement for the other two versions.
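Spelled out, the other two generators might look like the sketch below. The BinaryTree class here is a hypothetical minimal stand-in that offers only the accessors used in the question, since the original class is not shown:

```python
class BinaryTree:
    """Hypothetical minimal node class with the accessors used in the question."""
    def __init__(self, data, left=None, right=None):
        self._data, self._left, self._right = data, left, right
    def get_data(self):
        return self._data
    def get_left(self):
        return self._left
    def get_right(self):
        return self._right

def pre_order(t):
    if t is not None:
        yield t.get_data()                      # visit the node first
        yield from pre_order(t.get_left())
        yield from pre_order(t.get_right())

def post_order(t):
    if t is not None:
        yield from post_order(t.get_left())
        yield from post_order(t.get_right())
        yield t.get_data()                      # visit the node last

tree = BinaryTree(2, BinaryTree(1), BinaryTree(3))
print(list(pre_order(tree)))
print(list(post_order(tree)))
```

Wrapping a call in list(...) gives the caller the list the assignment asks for, while a plain for loop can consume the values lazily.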

First things first, I noticed that your indentation was inconsistent in the code block that you provided (fixed in revision). It is critical that you ensure that your indentation is consistent in Python or stuff will go south really quickly. Also, in the code below, I am assuming that you wanted your t.get_data() to still fall under if t != None in your postorder(t), so I have indented as such below. And lastly, I noticed that your method names did not match the spec you listed in the question, so I have updated the method names below to be compliant with your spec (no _ in the naming).
For getting the list, all you need to do is have your traversal methods return a list, and then extend your list at each level of the traversal with the other traversed values. This is done in lieu of printing the data.
def postorder(t):
    lst = []
    if t != None:
        lst.extend(postorder(t.get_left()))
        lst.extend(postorder(t.get_right()))
        lst.append(t.get_data())
    return lst
def preorder(t):
    lst = []
    if t != None:
        lst.append(t.get_data())
        lst.extend(preorder(t.get_left()))
        lst.extend(preorder(t.get_right()))
    return lst
def inorder(t):
    lst = []
    if t != None:
        lst.extend(inorder(t.get_left()))
        lst.append(t.get_data())
        lst.extend(inorder(t.get_right()))
    return lst
This will traverse to the full depths both left and right on each node and, depending on if it's preorder, postorder, or inorder, will append all the traversed elements in the order that they were traversed. Once this has occurred, it will return the properly ordered list to the next level up to get appended to its list. This will recurse until you get back to the root level.
Now, the IndexError coming from your flat_list2 is caused by accessing x[n] when n is equal to len(x): the guard n > len(x) lets that value through. Remember that lists in Python are indexed from 0, meaning that the last valid index is len(x) - 1, not len(x).
Your flat list keeps a None placeholder at index 0 and the root at index 1, which is what puts the children of node n at 2*n and 2*n + 1, so the indexing itself should stay as it is. To fix the error, change the bounds check from n > len(x) to n >= len(x). Like so:
def flat_list2(x,n):
    if not x or len(x) < 1 or n < 1 or n >= len(x) or x[n] == None:
        return None
    bt = BinaryTree( x[n] )
    bt.set_left( flat_list2(x, 2*n))
    bt.set_right(flat_list2(x, 2*n + 1))
    return bt
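To check the bounds handling end to end, the following sketch builds a tree from the flat list in the question and lists it in order. It assumes the 1-based layout from the question (a None placeholder at index 0, root at index 1) and guards with n >= len(x); the BinaryTree class is a hypothetical minimal stand-in, since the original class is not shown:

```python
class BinaryTree:
    """Hypothetical minimal node class with the methods used in the question."""
    def __init__(self, data):
        self.data, self.left, self.right = data, None, None
    def get_data(self):
        return self.data
    def get_left(self):
        return self.left
    def get_right(self):
        return self.right
    def set_left(self, node):
        self.left = node
    def set_right(self, node):
        self.right = node

def flat_list2(x, n):
    # n >= len(x) stops before x[n] can go out of range.
    if not x or n < 1 or n >= len(x) or x[n] is None:
        return None
    bt = BinaryTree(x[n])
    bt.set_left(flat_list2(x, 2 * n))
    bt.set_right(flat_list2(x, 2 * n + 1))
    return bt

def inorder(t):
    if t is None:
        return []
    return inorder(t.get_left()) + [t.get_data()] + inorder(t.get_right())

flat_node_list = [None, 55, 24, 72, 8, 51, None, 78, None, None, 25]
bst = flat_list2(flat_node_list, 1)
print(inorder(bst))
```

For a binary search tree, an in-order traversal that comes out sorted is a quick sign that the nodes were attached in the right places.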
