multiprocessing gives wrong number of outputs, repeats outside scope - python-3.x

When I use multiprocessing.Pool.starmap, code outside the function I pass to the pool is also executed repeatedly.
I am confused as to why.
Minimal reproducible example:
from multiprocessing import Pool
def countAdd( first, second, third ):
    added = str( first + second + third )
    with open( "test.txt", 'a' ) as f:
        f.write( "this is string: " + added + '\n')
testlist = [ 1, 2, 3, 4, 5, 6, ]
testlist2 = [ 7, 8, 9, 10, 11, 12 ]
testlist3 = [ 13, 14, 15, 16, 17, 18 ]
arglist = [ (testlist[ test ], testlist2[ test ], testlist3[ test ]) for test in range( 0, len( testlist ) ) ]
print( arglist )
print( len( testlist ) )
if __name__ == '__main__':
    with Pool( 3 ) as pool:
        pool.starmap( countAdd, arglist )
Output:
[(1, 7, 13), (2, 8, 14), (3, 9, 15), (4, 10, 16), (5, 11, 17), (6, 12, 18)]
6
[(1, 7, 13), (2, 8, 14), (3, 9, 15), (4, 10, 16), (5, 11, 17), (6, 12, 18)]
6
[(1, 7, 13), (2, 8, 14), (3, 9, 15), (4, 10, 16), (5, 11, 17), (6, 12, 18)]
6
[(1, 7, 13), (2, 8, 14), (3, 9, 15), (4, 10, 16), (5, 11, 17), (6, 12, 18)]
6
file:
this is string: 21
this is string: 24
this is string: 27
this is string: 33
this is string: 36
So I'm really confused. Why are the print statements also executed repeatedly, and why exactly 4 times?
Why does the file contain only 5 lines of function output instead of 6, even though arglist contains 6 tuples?
I searched around and found a similar problem, but it is not the same issue:
Python Multiprocessing data output wrong
Thanks for taking the time to respond.
EDIT:
Based on the comments by MisterMiyagi I changed my code to the following, but there are still problems:
from multiprocessing import Pool
def countAdd( first, second, third ):
    added = str( first + second + third )
    with open( "test.txt", 'a' ) as f:
        f.write( "this is string: " + added + '\n')
if __name__ == '__main__':
    testlist = [ 1, 2, 3, 4, 5, 6, ]
    testlist2 = [ 7, 8, 9, 10, 11, 12 ]
    testlist3 = [ 13, 14, 15, 16, 17, 18 ]
    arglist = [ (testlist[ test ], testlist2[ test ], testlist3[ test ]) for test in range( 0, len( testlist ) ) ]
    print( arglist )
    print( len( arglist ) )
    with Pool( 3 ) as pool:
        pool.starmap( countAdd, arglist )
    print( "done")
I ran the code repeatedly. Sometimes I get all six lines of output, but other times I don't. After running the code 10 times, the output file contained 54 lines of text rather than 60. What is going on here?

Related

Modifying overlapping time period to include 1 day difference

I am trying to modify the overlapping time period problem so that if there is 1 day difference between dates, it should still be counted as an overlap. As long as the difference in dates is less than 2 days it should be seen as an overlap.
This is the dataframe containing the dates
df_dates = pd.DataFrame({"id": [102, 102, 102, 102, 103, 103, 104, 104, 104, 102, 104, 104, 103, 106, 106, 106],
"start dates": [pd.Timestamp(2002, 1, 1), pd.Timestamp(2002, 3, 3), pd.Timestamp(2002,10,20), pd.Timestamp(2003, 4, 4), pd.Timestamp(2003, 8, 9), pd.Timestamp(2005, 2, 8), pd.Timestamp(1993, 1, 1), pd.Timestamp(2005, 2, 3), pd.Timestamp(2005, 2, 16), pd.Timestamp(2002, 11, 16), pd.Timestamp(2005, 2, 23), pd.Timestamp(2005, 10, 11), pd.Timestamp(2015, 2, 9), pd.Timestamp(2011, 11, 24), pd.Timestamp(2011, 11, 24), pd.Timestamp(2011, 12, 21)],
"end dates": [pd.Timestamp(2002, 1, 3), pd.Timestamp(2002, 12, 3),pd.Timestamp(2002,11,20), pd.Timestamp(2003, 4, 4), pd.Timestamp(2004, 11, 1), pd.Timestamp(2015, 2, 8), pd.Timestamp(2005, 2, 3), pd.Timestamp(2005, 2, 15) , pd.Timestamp(2005, 2, 21), pd.Timestamp(2003, 2, 16), pd.Timestamp(2005, 10, 8), pd.Timestamp(2005, 10, 21), pd.Timestamp(2015, 2, 17), pd.Timestamp(2011, 12, 31), pd.Timestamp(2011, 11, 25), pd.Timestamp(2011, 12, 22)]
})
This was helpful with answering the overlap question but I am not sure how to modify it (red circle) to include 1 day difference
This was my attempt at answering the question, which kind of did (red circle), but then the overlap calculation is not always right (yellow circle)
def Dates_Restructure(df, pers_id, start_dates, end_dates):
    df.sort_values([pers_id, start_dates], inplace=True)
    df['overlap'] = (df.groupby(pers_id)
                     .apply(lambda x: (x[end_dates].shift() - x[start_dates]) < timedelta(days=-1))
                     .reset_index(level=0, drop=True))
    df['cumsum'] = df.groupby(pers_id)['overlap'].cumsum()
    return df.groupby([pers_id, 'cumsum']).aggregate({start_dates: min, end_dates: max}).reset_index()
I will appreciate your help with this. Thanks
This was the answer I came up with and it worked. I combined the 2 solutions in my question to get this solution.
def Dates_Restructure(df_dates, pers_id, start_dates, end_dates):
    df2 = df_dates.copy()
    startdf2 = pd.DataFrame({pers_id: df2[pers_id], 'time': df2[start_dates], 'start_end': 1})
    enddf2 = pd.DataFrame({pers_id: df2[pers_id], 'time': df2[end_dates], 'start_end': -1})
    mergedf2 = pd.concat([startdf2, enddf2]).sort_values([pers_id, 'time'])
    mergedf2['cumsum'] = mergedf2.groupby(pers_id)['start_end'].cumsum()
    mergedf2['new_start'] = mergedf2['cumsum'].eq(1) & mergedf2['start_end'].eq(1)
    mergedf2['group'] = mergedf2.groupby(pers_id)['new_start'].cumsum()
    df2['group_id'] = mergedf2['group'].loc[mergedf2['start_end'].eq(1)]
    df3 = df2.groupby([pers_id, 'group_id']).aggregate({start_dates: min, end_dates: max}).reset_index()
    df3.sort_values([pers_id, start_dates], inplace=True)
    df3['overlap'] = (df3.groupby(pers_id).apply(lambda x: (x[end_dates].shift() - x[start_dates]) < timedelta(days=-1))
                      .reset_index(level=0, drop=True))
    df3['GROUP_ID'] = df3.groupby(pers_id)['overlap'].cumsum()
    return df3.groupby([pers_id, 'GROUP_ID']).aggregate({start_dates: min, end_dates: max}).reset_index()
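The core rule used above (start a new group only when the gap to the previous end date is more than one day) can also be sketched without pandas. This is only a plain-Python illustration of the logic, not the pandas solution itself; merge_periods and its tolerance parameter are hypothetical names, and the sample periods are the rows for id 104 from the question's dataframe.

```python
from datetime import date, timedelta


def merge_periods(periods, tolerance=timedelta(days=1)):
    """Merge (start, end) periods whose gap is at most `tolerance`.

    Hypothetical helper for illustration; a gap of exactly one day
    still counts as an overlap, as the question requires.
    """
    merged = []
    for start, end in sorted(periods):
        if merged and start - merged[-1][1] <= tolerance:
            # Close enough to the previous period: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # Gap is larger than the tolerance: start a new period.
            merged.append((start, end))
    return merged


# Periods for id 104, copied from the question's dataframe.
periods_104 = [
    (date(1993, 1, 1), date(2005, 2, 3)),
    (date(2005, 2, 3), date(2005, 2, 15)),
    (date(2005, 2, 16), date(2005, 2, 21)),
    (date(2005, 2, 23), date(2005, 10, 8)),
    (date(2005, 10, 11), date(2005, 10, 21)),
]

result_104 = merge_periods(periods_104)
for start, end in result_104:
    print(start, end)
```

The first three periods collapse into one (gaps of 0 and 1 day), while the 2-day and 3-day gaps start new periods.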

Stacking 4-D np arrays to get 5-D np arrays

For Python 3.9 and numpy 1.21.5, I have four 4-D numpy arrays:
x = np.random.normal(loc=0.0, scale=1.0, size=(5, 5, 7, 10))
y = np.random.normal(loc=0.0, scale=1.0, size=(5, 5, 7, 10))
z = np.random.normal(loc=0.0, scale=1.0, size=(5, 5, 7, 10))
w = np.random.normal(loc=0.0, scale=1.0, size=(5, 5, 7, 10))
x.shape, y.shape, z.shape, w.shape
# ((5, 5, 7, 10), (5, 5, 7, 10), (5, 5, 7, 10), (5, 5, 7, 10))
I want to stack them to get the desired shape: (4, 5, 5, 7, 10).
The code that I have tried so far includes:
np.vstack((x, y, z, w)).shape
# (20, 5, 7, 10)
np.concatenate((x, y, z, w), axis=0).shape
# (20, 5, 7, 10)
np.concatenate((x, y, z, w)).shape
# (20, 5, 7, 10)
They seem to be doing (4 * 5, 5, 7, 10) instead of the desired shape/dimension: (4, 5, 5, 7, 10)
Help?
The following code can get the expected shape:
np.array([x, y, z, w]) # shape --> (4, 5, 5, 7, 10)
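np.stack is another way to get the same shape: it joins arrays along a new axis, whereas np.vstack and np.concatenate join along an existing one. A minimal sketch, assuming four arrays with the same shape as in the question (the seeded generator is only there to make the example reproducible):

```python
import numpy as np

# Four arrays shaped like the ones in the question.
rng = np.random.default_rng(0)
x, y, z, w = (rng.normal(loc=0.0, scale=1.0, size=(5, 5, 7, 10)) for _ in range(4))

# Stack along a new leading axis instead of an existing one.
stacked = np.stack((x, y, z, w), axis=0)
print(stacked.shape)

# Same result as building an array from a list of the four arrays.
same = np.array_equal(stacked, np.array([x, y, z, w]))
print(same)
```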

Python 3: IndexError: list index out of range while doing Knapsack Problem

I am currently self-learning python for a career change. While doing some exercises about 'list', I encountered IndexError: list index out of range.
So, I am trying to build a function that determines which products should be placed on my store's shelves, subject to these constraints:
The shelf has a max capacity of 200.
Small-sized items should be placed first.
If two or more items have the same size, the item with the highest price should be placed first.
As an input for the function, I have a list of tuples "dairy_items", denoted as [(id, size, price)].
This is my code:
capacity=200
dairy_items=[('p1', 10, 3), ('p2', 13, 5),
('p3', 15, 2), ('p4', 26, 2),
('p5', 18, 6), ('p6', 25, 3),
('p7', 20, 4), ('p8', 10, 5),
('p9', 15, 4), ('p10', 12, 7),
('p11', 19, 3), ('p12', 27, 6),
('p13', 16, 4), ('p14', 23, 5),
('p15', 14, 2), ('p16', 23, 5),
('p17', 12, 7), ('p18', 11, 3),
('p19', 16, 5), ('p20', 11, 4)]
def shelving(dairy_items):
    #first: sort the list of tuples based on size: low-to-big
    items = sorted(dairy_items, key=lambda x: x[1], reverse=False)
    #second: iterate the sorted list of tuples.
    #agorithm: retrieve the first 2 elements of the sorted list
    #then compare those two elements by applying rules/conditions as stated
    #the 'winning' element is placed to 'result' and this element is removed from 'items'. Also 'temp' list is resetted
    #do again untill shelves cannot be added anymore (capacity full and do not exceeds limit)
    result = []
    total_price = []
    temp_capacity = []
    temp = items[:2]
    while sum(temp_capacity) < capacity:
        #add conditions: (low first) and (if size the same, highest price first)
        if (temp[0][1] == temp[1][1]) and (temp[0][2] > temp[1][2]):
            temp_capacity.append(temp[0][1])
            result.append(temp.pop(0))
            items.pop(0)
            temp.clear()
            temp = items[:2]
            total_price.append(temp[0][2])
        elif ((temp[0][1] == temp[1][1])) and (temp[0][2] < temp[1][2]):
            temp_capacity.append(temp[1][1])
            result.append(temp.pop())
            items.pop()
            temp.clear()
            temp = items[:2]
            total_price.append(temp[1][2])
        else:
            temp_capacity.append(temp[0][1])
            result.append(temp.pop(0))
            items.pop(0)
            temp.clear()
            temp = items[:2]
            total_price.append(temp[0][2])
    result = result.append(temp_capacity)
    #return a tuple with three elements: ([list of product ID to be placed in order], total occupied capacity of shelves, total prices)
    return result
c:\Users\abc\downloads\listexercise.py in <module>
----> 1 print(shelving(dairy_items))
c:\Users\abc\downloads\listexercise.py in shelving(dairy_items)
28 while sum(temp_capacity) < capacity:
29
---> 30 if (temp[0][1] == temp[1][1]) and (temp[0][2] > temp[1][2]):
31 temp_capacity.append(temp[0][1])
32 result.append(temp2.pop(0))
IndexError: list index out of range
EDIT:
This is the expected result:
#Result should be True
print(shelving(dairy_items) == (['p8', 'p1', 'p20', 'p18', 'p10', 'p17', 'p2', 'p15', 'p9', 'p3', 'p19', 'p13', 'p5', 'p11'], 192, 60))
The IndexError occurred because you tried to access the second element of temp after popping an element from it; after the pop, temp holds only one element, which can only be accessed with index 0.
I also noticed a few more bugs that would keep your program from giving the correct output, and fixed them.
The following code works efficiently:
from time import time
start = time()
capacity = 200
dairy_items = [('p1', 10, 3), ('p2', 13, 5),
('p3', 15, 2), ('p4', 26, 2),
('p5', 18, 6), ('p6', 25, 3),
('p7', 20, 4), ('p8', 10, 5),
('p9', 15, 4), ('p10', 12, 7),
('p11', 19, 3), ('p12', 27, 6),
('p13', 16, 4), ('p14', 23, 5),
('p15', 14, 2), ('p16', 23, 5),
('p17', 12, 7), ('p18', 11, 3),
('p19', 16, 5), ('p20', 11, 4)]
def shelving(dairy_items):
    items = sorted(dairy_items, key=lambda x: x[1])
    result = ([],)
    total_price, temp_capacity = 0, 0
    while (temp_capacity+items[0][1]) < capacity:
        temp = items[:2]
        if temp[0][1] == temp[1][1]:
            if temp[0][2] > temp[1][2]:
                temp_capacity += temp[0][1]
                result[0].append(temp[0][0])
                total_price += temp[0][2]
                items.pop(0)
            elif temp[0][2] < temp[1][2]:
                temp_capacity += temp[1][1]
                result[0].append(temp[1][0])
                total_price += temp[1][2]
                items.pop(items.index(temp[1]))
            else:
                temp_capacity += temp[0][1]
                result[0].append(temp[0][0])
                total_price += temp[0][2]
                items.pop(0)
        else:
            temp_capacity += temp[0][1]
            result[0].append(temp[0][0])
            total_price += temp[0][2]
            items.pop(0)
    result += (temp_capacity, total_price)
    return result
a = shelving(dairy_items)
end = time()
print(a)
print(f"\nTime Taken : {end-start} secs")
Output:-
(['p8', 'p1', 'p20', 'p18', 'p10', 'p17', 'p2', 'p15', 'p9', 'p3', 'p19', 'p13', 'p5', 'p11'], 192, 60)
Time Taken : 3.123283386230469e-05 secs
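The same ordering rules (smallest size first, highest price first among equal sizes) can also be expressed as a single sort key, which avoids comparing pairs by hand. The following is only a sketch of that alternative, not the code from the answer above; shelving_sorted is a hypothetical name, and it assumes an item still fits when the total reaches exactly the capacity, which the original code (strict <) does not allow.

```python
dairy_items = [('p1', 10, 3), ('p2', 13, 5), ('p3', 15, 2), ('p4', 26, 2),
               ('p5', 18, 6), ('p6', 25, 3), ('p7', 20, 4), ('p8', 10, 5),
               ('p9', 15, 4), ('p10', 12, 7), ('p11', 19, 3), ('p12', 27, 6),
               ('p13', 16, 4), ('p14', 23, 5), ('p15', 14, 2), ('p16', 23, 5),
               ('p17', 12, 7), ('p18', 11, 3), ('p19', 16, 5), ('p20', 11, 4)]


def shelving_sorted(items, capacity=200):
    # Sort by size ascending, then by price descending (negated price).
    # sorted() is stable, so full ties keep their original order.
    ordered = sorted(items, key=lambda item: (item[1], -item[2]))
    ids, used, total_price = [], 0, 0
    for item_id, size, price in ordered:
        # Assumption: stop at the first item that would exceed the capacity.
        if used + size > capacity:
            break
        ids.append(item_id)
        used += size
        total_price += price
    return ids, used, total_price


shelf = shelving_sorted(dairy_items)
print(shelf)
```

Because the loop iterates over the sorted list instead of indexing pairs, it cannot run past the end of the list.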
Not sure what the question is, but the following information may be relevant:
IndexError occurs when a sequence subscript is out of range. What does this mean? Consider the following code:
l = [1, 2, 3]
a = l[0]
This code does two things:
1. Defines a list of 3 integers called l
2. Assigns the first element of l to a variable called a
Now, if I were to do the following:
l = [1, 2, 3]
a = l[3]
I would raise an IndexError, as I'm accessing the fourth element of a three-element list. Somewhere in your code, you're likely indexing past the end of a list. This is a good chance to learn about debugging with pdb: add a call to breakpoint() in your code and inspect the variables. Good luck!
OK, first, you should debug your code. If you print temp before adding temp[1][2] to total_price, you will see that the last index access is what causes the error. Here is an example:
capacity=200
dairy_items=[('p1', 10, 3), ('p2', 13, 5),
('p3', 15, 2), ('p4', 26, 2),
('p5', 18, 6), ('p6', 25, 3),
('p7', 20, 4), ('p8', 10, 5),
('p9', 15, 4), ('p10', 12, 7),
('p11', 19, 3), ('p12', 27, 6),
('p13', 16, 4), ('p14', 23, 5),
('p15', 14, 2), ('p16', 23, 5),
('p17', 12, 7), ('p18', 11, 3),
('p19', 16, 5), ('p20', 11, 4)]
def shelving(dairy_items):
    #first: sort the list of tuples based on size: low-to-big
    items = sorted(dairy_items, key=lambda x: x[1], reverse=False)
    #second: iterate the sorted list of tuples.
    #agorithm: retrieve the first 2 elements of the sorted list
    #then compare those two elements by applying rules/conditions as stated
    #the 'winning' element is placed to 'result' and this element is removed from 'items'. Also 'temp' list is resetted
    #do again untill shelves cannot be added anymore (capacity full and do not exceeds limit)
    result = []
    total_price = []
    temp_capacity = []
    temp = items[:2]
    while sum(temp_capacity) < capacity:
        #add conditions: (low first) and (if size the same, highest price first)
        if (temp[0][1] == temp[1][1]) and (temp[0][2] > temp[1][2]):
            temp_capacity.append(temp[0][1])
            result.append(temp.pop(0))
            items.pop(0)
            temp.clear()
            temp = items[:2]
            total_price.append(temp[0][2])
        elif ((temp[0][1] == temp[1][1])) and (temp[0][2] < temp[1][2]):
            temp_capacity.append(temp[1][1])
            result.append(temp.pop())
            items.pop()
            temp.clear()
            temp = items[:2]
            print(temp) # -----------NEW LINE ADDED TO DEBUG YOUR CODE
            total_price.append(temp[1][2])
        else:
            temp_capacity.append(temp[0][1])
            result.append(temp.pop(0))
            items.pop(0)
            temp.clear()
            temp = items[:2]
            total_price.append(temp[0][2])
    result = result.append(temp_capacity)
    #return a tuple with three elements: ([list of product ID to be placed in order], total occupied capacity of shelves, total prices)
    return result
shelving(dairy_items)
The result I am getting is:
[('p1', 10, 3), ('p8', 10, 5)]
[('p1', 10, 3), ('p8', 10, 5)]
[('p1', 10, 3), ('p8', 10, 5)]
[('p1', 10, 3), ('p8', 10, 5)]
[('p1', 10, 3), ('p8', 10, 5)]
[('p1', 10, 3), ('p8', 10, 5)]
[('p1', 10, 3), ('p8', 10, 5)]
[('p1', 10, 3), ('p8', 10, 5)]
[('p1', 10, 3), ('p8', 10, 5)]
[('p1', 10, 3), ('p8', 10, 5)]
[('p1', 10, 3), ('p8', 10, 5)]
[('p1', 10, 3), ('p8', 10, 5)]
[('p1', 10, 3), ('p8', 10, 5)]
[('p1', 10, 3), ('p8', 10, 5)]
[('p1', 10, 3), ('p8', 10, 5)]
[('p1', 10, 3), ('p8', 10, 5)]
[('p1', 10, 3), ('p8', 10, 5)]
[('p1', 10, 3), ('p8', 10, 5)]
[('p1', 10, 3)]
Traceback (most recent call last):
File "<string>", line 55, in <module>
File "<string>", line 44, in shelving
IndexError: list index out of range
As you can clearly see, the last printed list, [('p1', 10, 3)], contains only one tuple, so temp[1] raises the IndexError.
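The behaviour behind that last printed list can be reproduced in isolation: slicing with [:2] never fails, it simply returns fewer elements when the list is short, so the error only appears at the later index access. A minimal sketch with a one-element list (the variable names mirror the code above but the snippet is independent of it):

```python
items = [('p1', 10, 3)]

# Slicing past the end is allowed and returns what is available.
temp = items[:2]
print(len(temp))

# Indexing past the end is what raises the exception.
try:
    temp[1]
    raised = False
except IndexError as exc:
    raised = True
    print("IndexError:", exc)

# One possible guard: only compare two items when two are present.
has_pair = len(temp) >= 2
print(has_pair)
```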

get multiple tuples from list of tuples using min function

I have a list that looks like this
mylist = [('Part1', 5, 5), ('Part2', 7, 7), ('Part3', 11, 9),
('Part4', 45, 45), ('part5', 5, 5)]
I am looking for all the tuples that have a number closest to my input.
Right now I am using this code:
result = min([x for x in mylist if x[1] >= 4 and x[2] >= 4])
The result I am getting is
('part5', 5, 5)
But I am looking for a result more like
[('Part1', 5, 5), ('part5', 5, 5)]
If more tuples match (there are 2 in this example, but there could be more), I would like to get all of them back.
The whole code:
mylist = [('Part1', 5, 5), ('Part2', 7, 7), ('Part3', 11, 9), ('Part4', 45, 45), ('part5', 5, 5)]
result = min([x for x in mylist if x[1] >= 4 and x[2] >= 4])
print(result)
threshold = 4
mylist = [('Part1', 5, 5), ('Part2', 7, 7), ('Part3', 11, 9), ('Part4', 45, 45), ('part5', 5, 5)]
filtered = [x for x in mylist if x[1] >= threshold and x[2] >= threshold]
keyfunc = lambda x: x[1]
my_min = keyfunc(min(filtered, key=keyfunc))
result = [v for v in filtered if keyfunc(v)==my_min]
# [('Part1', 5, 5), ('part5', 5, 5)]

print list content without duplicate [duplicate]

I have a list containing some contents like this:
numbers = [5,7,9,3,8]
I want to print consecutive elements starting with the first element and see output like this:
5,7
5,9
5,3
5,8
7,9
7,3
7,8
9,3
9,8
3,8
So, in the end, it should print every pair of elements without repeating any pair.
I have tried this:
for e in numbers:
    print(numbers[:])
but it gave me
[5, 7, 9, 3, 8]
[5, 7, 9, 3, 8]
[5, 7, 9, 3, 8]
[5, 7, 9, 3, 8]
[5, 7, 9, 3, 8]
How can I solve this?
Thank you
You can use combinations from itertools.
>>> from itertools import combinations
>>> numbers = [5, 7, 9, 3, 8]
>>> list(combinations(numbers, 2))
[(5, 7), (5, 9), (5, 3), (5, 8), (7, 9), (7, 3), (7, 8), (9, 3), (9, 8), (3, 8)]
You can use itertools.combinations
from itertools import combinations
numbers = [5,7,9,3,8]
combos = combinations(numbers, 2)
for combo in combos:
    print(combo)
(5, 7)
(5, 9)
(5, 3)
(5, 8)
(7, 9)
(7, 3)
(7, 8)
(9, 3)
(9, 8)
(3, 8)
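For comparison, the same pairs can be produced with two nested index loops, which shows what combinations(numbers, 2) is doing: each element is paired only with the elements that come after it. This is an illustrative sketch, and the comma-separated print format is an assumption based on the output shown in the question.

```python
from itertools import combinations

numbers = [5, 7, 9, 3, 8]

# Pair each element with every later element, so no pair is repeated.
pairs = []
for i in range(len(numbers)):
    for j in range(i + 1, len(numbers)):
        pairs.append((numbers[i], numbers[j]))

# Print in the "5,7" style used in the question.
for first, second in pairs:
    print(f"{first},{second}")

# The nested loops yield the same pairs, in the same order, as itertools.
matches_itertools = pairs == list(combinations(numbers, 2))
```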

Resources