Trying to write an optimization code using PuLP.
From the given dataset I want to pick 5 items which in sum maximize the value, with the constraints that 2 items have the color blue, 2 items have the color yellow, and 1 item is arbitrary.
But with the attached code I am getting only 3 items; please refer to the output section.
Please suggest the changes that need to be made to the existing code.
import pandas as pd
import pulp
import re
import sys
sys.setrecursionlimit(10000)
data = [['A', 'blue', 'circle', 0.454],
['B', 'yellow', 'square', 0.570],
['C', 'red', 'triangle', 0.789],
['D', 'red', 'circle', 0.718],
['E', 'red', 'square', 0.828],
['F', 'orange', 'square', 0.709],
['G', 'blue', 'circle', 0.696],
['H', 'orange', 'square', 0.285],
['I', 'orange', 'square', 0.698],
['J', 'orange', 'triangle', 0.861],
['K', 'blue', 'triangle', 0.658],
['L', 'yellow', 'circle', 0.819],
['M', 'blue', 'square', 0.352],
['N', 'orange', 'circle', 0.883],
['O', 'yellow', 'triangle', 0.755]]
df = pd.DataFrame(data, columns = ['item', 'color', 'shape', 'value'])
BlueMatch = lambda x: 1 if x=='blue' else 0
YellowMatch = lambda x: 1 if x=='yellow' else 0
RedMatch = lambda x: 1 if x=='red' else 0
OrangeMatch = lambda x: 1 if x=='orange' else 0
df['color'] = df['color'].astype(str)
df['isBlue'] = df.color.apply(BlueMatch)
df['isYellow'] = df.color.apply(YellowMatch)
df['isRed'] = df.color.apply(RedMatch)
df['isOrange'] = df.color.apply(OrangeMatch)
prob = pulp.LpProblem("complex_napsack", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", indexs=df.index, lowBound=0, cat='Integer')
prob += pulp.lpSum([x[i]*df.value[i] for i in df.index ])
prob += pulp.lpSum([x[i]*df.isBlue[i] for i in df.index])==2
prob += pulp.lpSum([x[i]*df.isYellow[i] for i in df.index])==2
prob += pulp.lpSum([x[i] for i in df.index ])==10
prob.solve()
for v in prob.variables():
    if v.varValue != 0.0:
        mystring = re.search('([0-9]*$)', v.name)
        print(v.name, "=", v.varValue)
        ind = int(mystring.group(1))
        print(df.item[ind])
output:
x_11 = 2.0
L
x_13 = 6.0
N
x_6 = 2.0
G
You just need to declare your variables as Binary instead of Integer, like so:
x = pulp.LpVariable.dicts("x", indexs=df.index, cat=pulp.LpBinary)
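For completeness, here is a minimal sketch of the whole corrected model. With Integer variables the solver may take the same item several times (note x_13 = 6.0 above), which is why only 3 distinct items appear; Binary variables cap each item at one pick. I have also changed the cardinality constraint from 10 to 5 to match the stated goal of 5 items; that change is my assumption, not part of the original answer.

import pandas as pd
import pulp

data = [['A', 'blue', 'circle', 0.454], ['B', 'yellow', 'square', 0.570],
        ['C', 'red', 'triangle', 0.789], ['D', 'red', 'circle', 0.718],
        ['E', 'red', 'square', 0.828], ['F', 'orange', 'square', 0.709],
        ['G', 'blue', 'circle', 0.696], ['H', 'orange', 'square', 0.285],
        ['I', 'orange', 'square', 0.698], ['J', 'orange', 'triangle', 0.861],
        ['K', 'blue', 'triangle', 0.658], ['L', 'yellow', 'circle', 0.819],
        ['M', 'blue', 'square', 0.352], ['N', 'orange', 'circle', 0.883],
        ['O', 'yellow', 'triangle', 0.755]]
df = pd.DataFrame(data, columns=['item', 'color', 'shape', 'value'])

prob = pulp.LpProblem("complex_knapsack", pulp.LpMaximize)
# Binary: each item is either picked (1) or not (0), so no item can be picked twice
x = pulp.LpVariable.dicts("x", df.index, cat=pulp.LpBinary)
prob += pulp.lpSum(x[i] * df.value[i] for i in df.index)
prob += pulp.lpSum(x[i] for i in df.index if df.color[i] == 'blue') == 2
prob += pulp.lpSum(x[i] for i in df.index if df.color[i] == 'yellow') == 2
prob += pulp.lpSum(x[i] for i in df.index) == 5  # assumption: 5 items, not 10
prob.solve()
print([df.item[i] for i in df.index if x[i].varValue == 1])

Since the blue and yellow counts are fixed at exactly 2 each, the fifth slot goes to the best remaining red or orange item.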
I have the following data frame and list values
import pandas as pd
import numpy as np
df_merge = pd.DataFrame({'column1': ['a', 'c', 'e'],
'column2': ['b', 'd', 'f'],
'column3': [0.5, 0.6, .04],
'column4': [0.7, 0.8, 0.9]
})
bb = ['b','h']
dd = ['d', 'I']
ff = ['f', 'l']
I am trying to use np.where and np.select instead of an if function:
condition = [(df_merge['column1'] == 'a') & (df_merge['column2'] == df_merge['column2'].isin(bb)),
             (df_merge['column1'] == 'c') & (df_merge['column2'] == df_merge['column2'].isin(dd)),
             (df_merge['column1'] == 'e') & (df_merge['column2'] == df_merge['column2'].isin(ff))]
choices1 = [((np.where(df_merge['column3'] >= 1, 'should not have, ','correct')) & (np.where(df_merge['column4'] >= 0.45, 'should not have, ','correct')))]
df_merge['Reason'] = np.select(condition, choices1, default='correct')
However, when I try to run the choices1 line, I get the following error:
TypeError: ufunc 'bitwise_and' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
I am not sure whether np.where can be used inside choices as above.
np.where should be applied for both columns. Expected output as below:
df_merge = pd.DataFrame({'column1': ['a', 'c', 'e'],
'column2': ['b', 'd', 'f'],
'column3': [0.5, 0.6, .04],
'column4': [0.7, 0.8, 0.9],
'Reason': ['correct, should not have', 'correct, should not have', 'correct, should not have'],
})
Any help / guidance / alternative is much appreciated.
First, the condition list has to have the same length as choices1, so the last condition is commented out (removed) to get length 2.
Then, isin already returns a condition (a boolean mask), so comparing it against the column makes no sense.
The last problem was that a list of length 2 was needed, so & is replaced by , and the parentheses inside the choices1 list are removed to avoid tuples:
condition = [(df_merge['column1'] == 'a') & df_merge['column2'].isin(bb),
(df_merge['column1'] == 'c') & df_merge['column2'].isin(dd)
# (df_merge['column1'] == 'e') & df_merge['column2'].isin(ff),
]
choices1 = [np.where(df_merge['column3'] >= 1, 'should not have','correct'),
np.where(df_merge['column4'] >= 0.45, 'should not have','correct')]
df_merge['Reason'] = np.select(condition, choices1, default='correct')
print (df_merge)
column1 column2 column3 column4 Reason
0 a b 0.50 0.7 correct
1 c d 0.60 0.8 should not have
2 e f 0.04 0.9 correct
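As a standalone illustration of the np.select contract (my own minimal example, not from the answer above): condlist and choicelist must have the same length, and for each position the value comes from the choice array of the first condition that is True:

import numpy as np

cond = [np.array([True, False, True]),
        np.array([False, True, False])]
choices = [np.array(['a1', 'a2', 'a3']),
           np.array(['b1', 'b2', 'b3'])]
# per position, np.select returns the choice of the FIRST condition that is True
print(np.select(cond, choices, default='none'))  # -> ['a1' 'b2' 'a3']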
I have a dataframe as given below
data = {
'Code': ['P', 'J', 'M', 'Y', 'P', 'Z', 'P', 'P', 'J', 'P', 'J', 'M', 'P', 'Z', 'Y', 'M', 'Z', 'J', 'J'],
'Value': [10, 10, 20, 30, 10, 40, 50, 10, 10, 20, 10, 50, 60, 40, 30, 20, 40, 20, 10]
}
example = pd.DataFrame(data)
Using Python 3, I want to create another dataframe from the dataframe example such that, for each Value, the Code most frequently associated with that Value is obtained.
The new dataframe should look like solution below
output = {'Code': ['J', 'M', 'Y', 'Z', 'P', 'M'],'Value': [10, 20, 30, 40, 50, 50]}
solution = pd.DataFrame(output)
As can be seen, J is associated with Value 10 more often than any other Code, so J is selected, and so on.
You could define a function that returns the most frequently occurring items and apply it to the grouped elements. Finally, explode the resulting lists to rows.
>>> from collections import Counter
>>> def most_occurring(grp):
... res = Counter(grp)
... highest = max(res.values())
... return [k for k, v in res.items() if v == highest]
...
>>> example.groupby('Value')['Code'].apply(lambda x: most_occurring(x)).explode().reset_index()
Value Code
0 10 J
1 20 M
2 30 Y
3 40 Z
4 50 P
5 50 M
6 60 P
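As a possible alternative (a sketch of my own, assuming the example frame from the question), pandas' Series.mode also keeps all tied top codes, although it sorts them, so the order of ties can differ from the Counter version:

out = (example.groupby('Value')['Code']
              .apply(lambda s: s.mode())   # mode() returns all tied top values
              .reset_index(level=0)        # bring 'Value' back as a column
              .reset_index(drop=True))
print(out)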
If I understood correctly, you need something like this:
grouped = example.groupby(['Code', 'Value']).indices
arr_tmp = []
# collect (Code, Value, number of row indices) for every (Code, Value) pair
for code, value in grouped:
    arr_tmp.append([code, value, len(grouped[(code, value)])])
output = pd.DataFrame(data=arr_tmp, columns=['Code', 'Value', 'index_count'])
output = output.sort_values(by=['index_count'], ascending=False)
output.reset_index(inplace=True)
output
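If the goal is one row per Value, a possible follow-up step (my assumption about the intent) is to keep only the first, i.e. highest-count, row per Value after the sort:

# rows are sorted by index_count descending, so the first row per Value wins;
# note that ties are dropped here, unlike in the Counter-based answer
top_per_value = output.drop_duplicates(subset='Value', keep='first')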
How do I pick random choices from the list given below?
colours = ['red', 'blue', 'green', 'yellow', 'black', 'purple', 'Brown', 'Orange', 'violet', 'gray']
Now pick 1 item from the above 10-item list and print it.
Then pick 2 items from the remaining 9 items and print them.
Finally, pick 3 items from the remaining 7 items and print them.
So the final result will look like this:
Brown
green and violet
red black and gray
A simple way would just be to delete the chosen values from the list. It is slightly simpler if you use sets:
In []:
import random

colours = {'red', 'blue', 'green', 'yellow', 'black', 'purple',
           'Brown', 'Orange', 'violet', 'gray'}
for n in [1, 2, 3]:
    # sorted() because random.sample() needs a sequence; sampling directly
    # from a set was deprecated in Python 3.9 and removed in 3.11
    cs = random.sample(sorted(colours), k=n)
    colours -= set(cs)
    print(cs)
Out[]:
['Brown']
['Orange', 'red']
['purple', 'gray', 'blue']
import numpy as np

colors = ['red', 'blue', 'green', 'yellow', 'black', 'purple',
          'Brown', 'Orange', 'violet', 'gray']
for n in range(1, 4):
    # replace=False so the same colour cannot be drawn twice within one call
    select = np.random.choice(colors, n, replace=False)
    print(select)
    colors = list(set(colors).difference(set(select)))
output:
['Brown']
['red' 'violet']
['yellow' 'Orange' 'black']
The method I use consists in shuffling the input list and then taking the number of elements you need.
import random
colours = ['red', 'blue', 'green', 'yellow', 'black', 'purple', 'Brown', 'Orange', 'violet', 'gray']
random.shuffle(colours)
for i in range(1, 4):
    n, colours = colours[0:i], colours[i:]
    print(n)
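Since the list is shuffled once up front, the consecutive slices are disjoint, so the three draws together are equivalent to sampling without replacement.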
I am trying to do the example in Use Python & Pandas to Create a D3 Force Directed Network Diagram
But in the below line I am getting an error 'KeyError: ('count', 'occurred at index 0')'
temp_links_list = list(grouped_src_dst.apply(lambda row: {"source": row['source'], "target": row['target'], "value": row['count']}, axis=1))
I am new to Python. What is the issue here?
Edited code
import pandas as pd
import json
import re
pcap_data = pd.read_csv('C:\packet_metadata.csv', index_col='No.')
dataframe = pcap_data
src_dst = dataframe[["Source","Destination"]]
src_dst.rename(columns={"Source":"source","Destination":"target"}, inplace=True)
grouped_src_dst = src_dst.groupby(["source","target"]).size().reset_index()
grouped_src_dst.rename(columns={'count':'value'}).to_dict(orient='records')
unique_ips = pd.Index(grouped_src_dst['source']
                      .append(grouped_src_dst['target'])
                      .reset_index(drop=True).unique())
But
print(grouped_src_dst.columns.tolist())
['source', 'target', 0]
Final code
import pandas as pd
import json
import re
pcap_data = pd.read_csv('C:\packet_metadata.csv', index_col='No.')
dataframe = pcap_data
src_dst = dataframe[["Source","Destination"]]
src_dst.sample(10)
grouped_src_dst = src_dst.groupby(["Source","Destination"]).size().reset_index()
d={0:'value',"Source":"source","Destination":"target"}
L = grouped_src_dst.rename(columns=d)
unique_ips = pd.Index(L['source']
                      .append(L['target'])
                      .reset_index(drop=True).unique())
group_dict = {}
counter = 0
for ip in unique_ips:
    breakout_ip = re.match(r"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$", ip)
    if breakout_ip:
        net_id = '.'.join(breakout_ip.group(1, 2, 3))
        if net_id not in group_dict:
            counter += 1
            group_dict[net_id] = counter
    else:
        pass
temp_links_list = list(L.apply(lambda row: {"source": row['source'], "target": row['target'], "value": row['value']}, axis=1))
I think there is a problem with the column name count: it is missing, or it contains some whitespace like ' count'.
#check columns names
print (grouped_src_dst.columns.tolist())
['count', 'source', 'target']
Sample:
grouped_src_dst = pd.DataFrame({'source':['a','s','f'],
'target':['b','n','m'],
'count':[0,8,4]})
print (grouped_src_dst)
count source target
0 0 a b
1 8 s n
2 4 f m
f = lambda row: {"source": row['source'], "target": row['target'], "value": row['count']}
temp_links_list = list(grouped_src_dst.apply(f, axis=1))
print (temp_links_list)
[{'value': 0, 'source': 'a', 'target': 'b'},
{'value': 8, 'source': 's', 'target': 'n'},
{'value': 4, 'source': 'f', 'target': 'm'}]
A simpler solution is to rename the column count and use DataFrame.to_dict:
print (grouped_src_dst.rename(columns={'count':'value'}).to_dict(orient='records'))
[{'value': 0, 'source': 'a', 'target': 'b'},
{'value': 8, 'source': 's', 'target': 'n'},
{'value': 4, 'source': 'f', 'target': 'm'}]
EDIT1:
pcap_data = pd.read_csv('C:\packet_metadata.csv', index_col='No.')
grouped_src_dst = pcap_data.groupby(["Source","Destination"]).size().reset_index()
d = {0:'value', "Source":"source","Destination":"target"}
L = grouped_src_dst.rename(columns=d).to_dict(orient='records')
Sample:
pcap_data = pd.DataFrame({'Source':list('aabbccdd'),
'Destination':list('eertffff')})
print (pcap_data)
Destination Source
0 e a
1 e a
2 r b
3 t b
4 f c
5 f c
6 f d
7 f d
grouped_src_dst = pcap_data.groupby(["Source","Destination"]).size().reset_index()
print (grouped_src_dst)
Source Destination 0
0 a e 2
1 b r 1
2 b t 1
3 c f 2
4 d f 2
d = {0:'value', "Source":"source","Destination":"target"}
L = grouped_src_dst.rename(columns=d).to_dict(orient='records')
print (L)
[{'value': 2, 'source': 'a', 'target': 'e'},
{'value': 1, 'source': 'b', 'target': 'r'},
{'value': 1, 'source': 'b', 'target': 't'},
{'value': 2, 'source': 'c', 'target': 'f'},
{'value': 2, 'source': 'd', 'target': 'f'}]
unique_ips = pd.Index(grouped_src_dst['Source']
                      .append(grouped_src_dst['Destination'])
                      .reset_index(drop=True).unique())
print (unique_ips)
Index(['a', 'b', 'c', 'd', 'e', 'r', 't', 'f'], dtype='object')
import numpy as np
unique_ips = np.unique(grouped_src_dst[['Source','Destination']].values.ravel()).tolist()
print (unique_ips)
['a', 'b', 'c', 'd', 'e', 'f', 'r', 't']
I have a NumPy array that contains 0 and 1 values:
k = np.array([0, 1, 1, 0 ,1])
I want to transform it into an array that contains 'blue' where the value is 0 and 'red' where the value is 1. I would prefer the fastest way possible.
You can use np.take to index into an array/list of 2 elements with those k values as indices, like so -
np.take(['blue','red'],k)
Sample run -
In [19]: k = np.array([0, 1, 1, 0 ,1])
In [20]: np.take(['blue','red'],k)
Out[20]:
array(['blue', 'red', 'red', 'blue', 'red'],
dtype='|S4')
With the explicit indexing method -
In [23]: arr = np.array(['blue','red'])
In [24]: arr[k]
Out[24]:
array(['blue', 'red', 'red', 'blue', 'red'],
dtype='|S4')
Or initialize with one string and then assign the other one -
In [41]: out = np.full(k.size, 'blue')
In [42]: out[k==1] = 'red'
In [43]: out
Out[43]:
array(['blue', 'red', 'red', 'blue', 'red'],
dtype='|S4')
Runtime test
Approaches -
def app1(k):
    return np.take(['blue','red'], k)

def app2(k):
    arr = np.array(['blue','red'])
    return arr[k]

def app3(k):
    out = np.full(k.size, 'blue')
    out[k==1] = 'red'
    return out
Timings -
In [46]: k = np.random.randint(0,2,(100000))
In [47]: %timeit app1(k)
...: %timeit app2(k)
...: %timeit app3(k)
...:
1000 loops, best of 3: 413 µs per loop
10000 loops, best of 3: 103 µs per loop
1000 loops, best of 3: 908 µs per loop
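So plain fancy indexing (app2, arr[k]) is the fastest of the three in this test.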