Optimization based on rank and absolute deviation - python-3.x

I have a problem statement where I am trying to optimize the position of each shipment so that two specific objectives are met: deviation time and inherent ranking.
Given some shipments, the objective is to sequence them so that the deviation between the actual start and the planned start is minimized, while also ensuring that the ranking of the shipments is optimized.
Example: 3 shipments need to be executed at [10 am, 12 noon, 1 PM]. Assuming each shipment takes 1 hour to execute, what should be the sequence of execution?
Shipments [S1 S2 S3]
Planned ranking [1 2 3]
Planned time [10,12,13]
The solver should give me a solution where both the ranking cost and the deviation between the planned and actual times are minimized.
Example Solution
Optimized ranking can be [2 1 3] execute S2 first, then S1 and finally S3
Optimized time can be anywhere between 1 - 24 depending on other constraints.
hence S2 can begin as early as 1 and can take values up to 22,
S1 should begin after S2 and can take values from S2's end time up to 23,
and S3 should begin after S2 and S1 and can take values from the end of S1 up to 24.
I was able to write down the ranking part, where the objective function minimizes the rank (derived from a pre-determined matrix) with a binary choice of shipment and position:
import random
import pandas as pd
from pulp import *

shipments = {'D1': 1, 'D2': 2, 'D3': 3}
positions = {'P1': 1, 'P2': 2, 'P3': 3}
endtime = {'D1': 10, 'D2': 15, 'D3': 20}
rank = {('D1', 'P1'): 2,
        ('D1', 'P2'): 4,
        ('D1', 'P3'): 7,
        ('D2', 'P1'): 3,
        ('D2', 'P3'): 2,
        ('D2', 'P2'): 1,
        ('D3', 'P3'): 1,
        ('D3', 'P2'): 2,
        ('D3', 'P1'): 3}
shipments_labels = sorted(shipments.keys())
positions_labels = sorted(positions.keys())
BIG_M = 9999.9

# Binary variable: shipment i assigned to position j
x = LpVariable.dicts("x", (shipments_labels, positions_labels), 0, 1, LpInteger)
# Start time for shipment i in position j
y = LpVariable.dicts("y", (shipments_labels, positions_labels), 1, 24, LpInteger)
# Absolute deviation per shipment ('Continuous' is the valid PuLP category)
y_diff = LpVariable.dicts("y_diff", shipments_labels, cat='Continuous')

print(x)
print(y)
print(y_diff)
print(rank[('D1', 'P1')])

prob = LpProblem("ranking", LpMinimize)
# Objective function
prob += (lpSum([(rank[(i, j)]**2 - 1) * x[i][j]
                for i in shipments_labels for j in positions_labels]),
         'Total_Cost')
# Every shipment should be assigned exactly one position
for i in shipments_labels:
    prob += (lpSum([x[i][j] for j in positions_labels]) == 1)
# Every position holds exactly one shipment
for i in positions_labels:
    prob += (lpSum([x[j][i] for j in shipments_labels]) == 1)

prob.solve()
for v in prob.variables():
    print(v)
    print(v.varValue)
However, I am confused about how to encode the deviation logic into the objective. I created the variables y[i][j] to hold the integer value of the allocated time and y_diff to hold the absolute deviation. Since the deviation can be both positive and negative, it may need to be penalized differently.
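One common way to bring the deviation into a PuLP objective is to linearize the absolute value with two constraints per shipment. The sketch below is deliberately simplified and is not the original model: it assumes a single continuous start-time variable t[i] per shipment (rather than the per-position y[i][j] above), planned times taken from the example, and an arbitrary weight w_dev.

from pulp import LpProblem, LpMinimize, LpVariable, lpSum

# Hypothetical planned start times, taken from the example in the question
planned = {'D1': 10, 'D2': 12, 'D3': 13}
w_dev = 1.0  # hypothetical weight on the deviation term

prob = LpProblem("deviation_sketch", LpMinimize)
# actual start time of each shipment (simplified: one variable per shipment)
t = LpVariable.dicts("t", list(planned), 1, 24, cat="Continuous")
# dev[i] ends up equal to |t[i] - planned[i]| at the optimum
dev = LpVariable.dicts("dev", list(planned), 0, cat="Continuous")

for i in planned:
    # two linear constraints force dev[i] >= |t[i] - planned[i]|;
    # minimizing dev[i] in the objective makes the bound tight
    prob += dev[i] >= t[i] - planned[i]
    prob += dev[i] >= planned[i] - t[i]

# the ranking cost from the model above can be added to this same expression
prob += w_dev * lpSum(dev[i] for i in planned)
prob.solve()

If early and late starts should be penalized differently, replace dev[i] with two non-negative variables dev_early[i] and dev_late[i], add the constraint t[i] - planned[i] == dev_late[i] - dev_early[i], and give each its own weight in the objective.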

Related

Implement Equal-Width Intervals feature engineering in Dask

In equal-width discretization, the variable values are assigned to intervals of the same width. The number of intervals is user-defined and the width is determined by the minimum/maximum values and the number of intervals.
For example, given the values 10, 20, 100, 130 the minimum is 10 and the maximum is 130. If the user defines the number of intervals as six, given the formula:
Interval Width = (Max(x) - Min(x)) / N
The width is (130 - 10) / 6 = 20
And the six equal-width intervals are defined by the edges [10, 30, 50, 70, 90, 110, 130]
Finally, the interval assignments are defined for each element in the dataset:
Value in the dataset    New feature engineered value
10                      0
20                      0
57                      2
101                     4
130                     5
I have the following code that uses a pandas dataframe with a sklearn function to divide the dataframe into equal-width intervals:
from sklearn.preprocessing import KBinsDiscretizer
discretizer = KBinsDiscretizer(n_bins=10, encode='ordinal', strategy='uniform')
df['output_col'] = discretizer.fit_transform(df[['input_col']])
This works fine, but I need to implement an equivalent Dask function that will trigger the process in parallel across multiple partitions, and I cannot find KBinsDiscretizer in dask_ml.preprocessing. Any suggestions? I cannot use map_partitions directly, because it would apply the function to each partition independently, and I need the intervals computed over the entire dataframe.
You're facing a common tradeoff with distributed workflows. Do you want to spend the time/resource/compute required to determine the exact min/max, which is a pre-requisite for the binning scheme you describe, or is an approximate answer alright? If the latter, how do you design an algorithm which adequately captures the data's min/max while remaining efficient?
We can start with the exact solution, since it's easier to implement. The key is simply to find the min and max first, then digitize the data. Note that this requires computing all values in the column twice. If persisting the data is an option (e.g. you are working with a distributed cluster or can fit the column to be binned in memory), it would help avoid unnecessary repetition:
import dask
import dask.dataframe
import numpy as np

def discretize_exact(
    s: dask.dataframe.Series, K: int
) -> dask.dataframe.Series:
    """
    Discretize values in a dask.dataframe Series into K equal-width bins

    Parameters
    ----------
    s : dask.dataframe.Series
        Series with values to be binned
    K : int
        Number of equal-width bins to generate

    Returns
    -------
    binned : dask.dataframe.Series
        dask.dataframe.Series with scheduled np.digitize operation
        called using map_partitions. The values in ``binned`` will
        be in [0, K - 1], giving the index of each value's bin within
        the interval [vmin, vmax].
    """
    # schedule the min/max computation
    vmin, vmax = s.min(), s.max()
    # compute vmin and vmax together so we only compute once
    vmin, vmax = dask.compute(vmin, vmax)
    # np.linspace gives K - 1 interior edges, so np.digitize produces K
    # equal-width bins with the outer ends open: the first bin is
    # (-inf, vmin + step) and the last is [vmax - step, inf)
    bins = np.linspace(vmin, vmax, (K + 1))[1:-1]
    return s.map_partitions(
        np.digitize,
        bins=bins,
        meta=('binned', 'uint16'),
    )
This does (I think) what you're looking for, but does involve computing the min and max first prior to scheduling the binning operation. Using an example frame:
import dask.dataframe, pandas as pd, numpy as np
N = 10000
df = dask.dataframe.from_pandas(
    pd.DataFrame({'a': np.random.random(size=N)}),
    chunksize=1000,
)
We can use the above function to discretize our data:
In [68]: df['binned_a'] = discretize_exact(df['a'], K=10)
In [69]: df
Out[69]:
Dask DataFrame Structure:
                      a binned_a
npartitions=10
0               float64   uint16
1000                ...      ...
...                 ...      ...
9000                ...      ...
9999                ...      ...
Dask Name: assign, 40 tasks

In [70]: df.compute()
Out[70]:
             a  binned_a
0     0.548415         5
1     0.872668         8
2     0.466869         4
3     0.133986         1
4     0.833126         8
...        ...       ...
9995  0.223438         2
9996  0.575271         5
9997  0.922593         9
9998  0.030127         0
9999  0.204283         2

[10000 rows x 2 columns]
Alternatively, you could try to approximate the bin edges. You could do this in a number of ways, including sampling the dataframe to identify the min/max of one or more partitions, or the user could provide an overly wide estimate of the range. Note that, depending on your workflow, computing the first partition may still involve computing a large part of the overall graph, or even the entire graph if e.g. the dataframe was reshuffled in a recent step.
def find_minmax_of_first_partition(
    s: dask.dataframe.Series
) -> tuple[float, float]:
    """
    Find the min and max of the first partition of a dask.dataframe.Series
    """
    partition_0_stats = (
        s.partitions[0].compute().agg(['min', 'max'])
    )
    return (
        partition_0_stats['min'].item(),
        partition_0_stats['max'].item(),
    )
You could widen this range if desired, using your intuition about the spread of the values:
vmin_p0, vmax_p0 = find_minmax_of_first_partition(df['a'])
range_p0 = (vmax_p0 - vmin_p0)
mean_p0 = (vmin_p0 + vmax_p0) / 2
# guess that the overall data is within 10x the range of partition 1
min_est, max_est = mean_p0 - 5 * range_p0, mean_p0 + 5 * range_p0
# now, bin all values using this estimated min, max. Note that
# any data falling outside your estimated min/max value will be
# coded as values 0 or K + 1.
# (here s is the series to bin, e.g. df['a'], and K the bin count from above)
bins = np.linspace(min_est, max_est, (K + 1))
binned = s.map_partitions(
    np.digitize,
    bins=bins,
    meta=('binned', 'uint16'),
)
These bins will be equally spaced but will not necessarily start/end at the min/max, and therefore may either not catch all the data or may have empty bins at the edges. You may need to take a look at how your bin specification performs and iterate based on your data.
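One quick way to check how a bin specification performs is to look at how the data falls across the bins. A small sketch, assuming the binned_a column created in the exact example above (the same call works for the approximate binned series):

# count how many rows fall in each bin; with the approximate approach,
# values of 0 or K + 1 indicate data outside the estimated range
bin_counts = df['binned_a'].value_counts().compute()
print(bin_counts.sort_index())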

New to python and need help getting started with this function

I am very new to python and I have done research but all I could find on this problem was on outdated versions of python. I hear this community is able to help me with this problem.
I am attempting to make a function called makeChange with amount as a parameter.
The function is supposed to take user input as a decimal and convert what the user entered into bills and coins (for example $.05, $.10, $.25, $.50, $1 and so on).
Is it possible that I can get a base to build off of? (Not the entire function maybe a few errors just so I can learn.)
(Thanks for taking the time to read what I have to say!)
Problem
So, in the makeChange problem we try to find the minimum number of coins that add up to a given amount of money.
Code
def makeChange(amount):
    # You can add more or remove the coins from this list
    coins = [.05, .1, .25, .5, 1]
    # LEARN: Find out how the sort function works on a list, and what the reverse parameter means. Why do we need to sort it?
    # Sort the coins list (in case it's not sorted)
    coins.sort(reverse=True)
    change = []
    for coin in coins:
        # LEARN: Find out why we type cast the result of totalCoin to integer
        # LEARN: Find out what we get by using integer division and modulo on the amount variable
        # Notes: // means integer division, / means float division, % means modulo or remainder
        totalCoin = int(float(amount) // coin)
        amount = amount % coin
        # Round the amount to 2 decimal places (0.25, 1.00) since there is no $0.001, right
        amount = round(amount, 2)
        # Append the coins we use for change
        for i in range(totalCoin):
            change.append(coin)
        if amount == 0:
            return change
    # If we ever reach here, that means the given coins in the coins list
    # are not able to return the change
    return 'Not changeable'

print('22.76: ', makeChange(22.76))  # 22.76: Not changeable
print('13.8: ', makeChange(13.8))    # 13.8: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.5, 0.25, 0.05]
print('2.4', makeChange(2.4))        # 2.4 [1, 1, 0.25, 0.1, 0.05]
Conclusion
The code above works properly; I did not add any errors to it, BUT I added lots of comments for you to find out yourself how and why this code works the way it does. Good luck learning Python!
Here is a start for you:
def makeChange(amount):
    pass
You can compute how many bills and coins are needed. You can work with the float type; the int function converts a float to an int and truncates the decimal part. For example:
def makeChange(amount):
    amt = float(amount)
    n100 = int(amt / 100.)
    amt = amt - n100 * 100.
    n50 = int(amt / 50.)
    amt = amt - n50 * 50.
    # and so on... you can add more
    return n100, n50  # return how many bills
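If you want to avoid the floating-point rounding issues of the snippets above, here is a sketch of the same greedy idea that works in whole cents (the denominations list is my own assumption; adjust it to the bills and coins you need):

def makeChange(amount):
    denominations = [100.0, 50.0, 20.0, 10.0, 5.0, 1.0, 0.5, 0.25, 0.10, 0.05]
    remaining = round(float(amount) * 100)  # work in whole cents to avoid float errors
    breakdown = {}
    for d in denominations:
        cents = round(d * 100)
        count, remaining = divmod(remaining, cents)  # how many of this denomination fit
        if count:
            breakdown[d] = count
    if remaining:
        return 'Not changeable'  # leftover amount smaller than the smallest coin
    return breakdown

print(makeChange(13.8))  # {10.0: 1, 1.0: 3, 0.5: 1, 0.25: 1, 0.05: 1}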

Deciding if all intervals are overlapping

I'm doing a problem where n people are standing on a line and each person knows their own position and speed. I'm asked to find the minimal time for all people to gather at a single spot.
Basically what I'm doing is binary searching on the time, and for each candidate time computing the interval each person can reach within that time. If all intervals overlap, there is a spot that everyone can go to.
I have a solution to this question, but it exceeds the time limit because of my bad way of finding whether the intervals overlap. My current solution runs too slow and I'm hoping to get a better one.
my code:
people = int(input())
# first item in people[i] is the position of each person, the second item is the speed of each person
peoplel = [list(map(int, input().split())) for _ in range(people)]

def good(time):
    # first item, second item = the range of distance a person can go to
    return checkoverlap([[i[0] - time * i[1], i[0] + time * i[1]] for i in peoplel])

def checkoverlap(l):
    for i in range(len(l) - 1):
        seg1 = l[i]
        for i1 in range(i + 1, len(l)):
            seg2 = l[i1]
            if seg2[0] <= seg1[0] <= seg2[1] or seg1[0] <= seg2[0] <= seg1[1]:
                continue
            elif seg2[0] <= seg1[1] <= seg2[1] or seg1[0] <= seg2[1] <= seg1[1]:
                continue
            return False
    return True
(this is my first time asking a question so please inform me about anything that is wrong)
One does simply go linear
A while after I finished the answer I found a simplification that removes the need for sorting and thus allows us to further reduce the complexity of finding if all the intervals are overlapping to O(N).
If we look at the steps that are being done after the initial sort we can see that we are basically checking
if max(lower_bounds) < min(upper_bounds):
    return True
else:
    return False
And since both min and max are linear and need no sorting, we can simplify the algorithm to:
Creating an array of lower bounds - one pass.
Creating an array of upper bounds - one pass.
Doing the comparison I mentioned above - two passes over the new arrays.
All of this could be done together in one pass to further optimize (and to prevent some unnecessary memory allocation), however this is clearer for the explanation's purpose, as sketched below.
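The check itself (my own sketch, following the reasoning above; it uses <= rather than <, so intervals that merely touch at an endpoint still count as overlapping, matching the closed ranges built in good()):

def all_overlap(intervals):
    lows = [lo for lo, hi in intervals]
    highs = [hi for lo, hi in intervals]
    # a point common to every interval exists iff the largest lower bound
    # does not exceed the smallest upper bound
    return max(lows) <= min(highs)

print(all_overlap([(1, 4), (2, 6), (3, 5)]))  # True  (3 and 4 are reachable by everyone)
print(all_overlap([(1, 2), (3, 4)]))          # False (no common point)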
Since the reasoning about the correctness and timing was done in the previous iteration, I will skip it this time and keep the section below since it nicely shows the thought process behind the optimization.
One sort to rule them all
Disclaimer: This section was obsoleted time-wise by the one above. However since it in fact allowed me to figure out the linear solution, I'm keeping it here.
As the title says, sorting is a rather straightforward way of going about this. It will require a little different data structure - instead of holding every interval as (min, max) I opted for holding every interval as (min, index), (max, index).
This allows me to sort these by the min and max values. What follows is a single linear pass over the sorted array. We also create a helper array of False values. These represent the fact that at the beginning all the intervals are closed.
Now comes the pass over the array:
Since the array is sorted, we first encounter the min of each interval. In that case, we increase the openInterval counter and set a True value for the interval itself. The interval is now open - until we close the interval, the person can arrive at the party - we are within his (or her) range.
We go along the array. As long as we are opening the intervals, everything is ok and if we manage to open all the intervals, we have our party destination where all the social distancing collapses. If this happens, we return True.
If we close any of the intervals, we have found our party breaker who can't make it anymore. (Or we can discuss that the party breakers are those who didn't bother to arrive yet when someone has to go already). We return False.
The resulting complexity is O(N log N), caused by the initial sort, since the pass itself is linear in nature. This is quite a bit better than the original O(N^2) caused by the "check all intervals pairwise" approach.
The code:
import numpy as np
import cProfile, pstats, io

# random data for a speed test. Not that useful for checking the correctness though.
testSize = 10000
x = np.random.randint(0, 10000, testSize)
y = np.random.randint(1, 100, testSize)
peopleTest = [x for x in zip(x, y)]

# Just a basic example to help with the reasoning about the correctness
peoplel = [(1, 2), (3, 1), (8, 1)]
# first item in people[i] is the position of each person, the second item is the speed of each person

def checkIntervals(people, time):
    a = [(x[0] - x[1] * time, idx) for idx, x in enumerate(people)]
    b = [(x[0] + x[1] * time, idx) for idx, x in enumerate(people)]
    checks = [False for x in range(len(people))]
    openCount = 0
    intervals = [x for x in sorted(a + b, key=lambda x: x[0])]
    for i in intervals:
        if not checks[i[1]]:
            checks[i[1]] = True
            openCount += 1
            if openCount == len(people):
                return True
        else:
            return False
    print(intervals)

def good(time, people):
    # first item, second item = the range of distance a person can go to
    return checkoverlap([[i[0] - time * i[1], i[0] + time * i[1]] for i in people])

def checkoverlap(l):
    for i in range(len(l) - 1):
        seg1 = l[i]
        for i1 in range(i + 1, len(l)):
            seg2 = l[i1]
            if seg2[0] <= seg1[0] <= seg2[1] or seg1[0] <= seg2[0] <= seg1[1]:
                continue
            elif seg2[0] <= seg1[1] <= seg2[1] or seg1[0] <= seg2[1] <= seg1[1]:
                continue
            return False
    return True

pr = cProfile.Profile()
pr.enable()
print(checkIntervals(peopleTest, 10000))
print(good(10000, peopleTest))
pr.disable()
s = io.StringIO()
sortby = "cumulative"
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print(s.getvalue())
The profiling stats for the pass over test array with 10K random values:
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.001    0.001    8.933    8.933  (good)
     1    8.925    8.925    8.926    8.926  (checkoverlap)
     1    0.003    0.003    0.023    0.023  (checkIntervals)
     1    0.008    0.008    0.010    0.010  {built-in method builtins.sorted}

How to set LpVariable and Objective Function in pulp for LPP as per the formula?

I want to calculate the maximised value for a particular user based on their Interest, Popularity, or both Interest and Popularity, using the following Linear Programming Problem (LPP) equation,
with the pulp package in Python 3.7.
I have 4 lists
INTEREST = [5,10,15,20,25]
POPULARITY = [4,8,12,16,20]
USER = [1,2,3,4,5]
cost = [2,4,6,8,10]
and 2 variable values as
e=0.5 ; e may take (0 or 1 or 0.5)
budget=20
and i = 0 to n, where n is the length of the lists, meaning the summation is performed over all list values.
Here, e==0 means Interest contributes 0; e==1 means Popularity contributes 0; and e==0.5 means both Interest and Popularity are considered for the max value.
Also, xi takes 0 or 1: if xi==1 the user is considered, and if xi==0 the user is not considered.
My pulp code is below:
from pulp import *
INTEREST = [5,10,15,20,25]
POPULARITY = [4,8,12,16,20]
USER = [1,2,3,4,5]
cost = [2,4,6,8,10]
e=0.5
budget=10
#PROBLEM VARIABLE
prob = LpProblem("MaxValue", LpMaximize)
# DECISION VARIABLES
int_vars = LpVariable.dicts("Interest", INTEREST, 0, 4, LpContinuous)
pop_vars = LpVariable.dicts("Popularity", POPULARITY, 0, 4, LpContinuous)
user_vars = LpVariable.dicts("User", USER, 0, 4, LpBinary)
#OBJECTIVE FUNCTION
prob += lpSum(USER(i)*((INTEREST[i]*e for i in INTEREST) +
              (POPULARITY[i]*(1-e) for i in POPULARITY)))
# CONSTRAINTS
prob += USER(i)*cost(i) <= budget
#SOLVE
prob.solve()
print("Status : ", LpStatus[prob.status])
# PRINT OPTIMAL SOLUTION
print("The Max Value = ", value(prob.objective))
Now I am getting 2 errors as
1) line 714, in addInPlace
       for e in other:
2) line 23, in <module>
       prob += lpSum(INTEREST[i]*e for i in INTEREST) +
               lpSum(POPULARITY[i]*(1-e) for i in POPULARITY)
   IndexError: list index out of range
What did I do wrong in my code? Please guide me to resolve this problem. Thanks in advance.
I think I finally understand what you are trying to achieve. I think the problem with your description is to do with terminology. In a linear program we reserve the term variable for those variables which we want to be selected or chosen as part of the optimisation.
If I understand your needs correctly your python variables e and budget would be considered parameters or constants of the linear program.
I believe this does what you want:
from pulp import *
import numpy as np
INTEREST = [5,10,15,20,25]
POPULARITY = [4,8,12,16,20]
COST = [2,4,6,8,10]
N = len(COST)
set_user = range(N)
e=0.5
budget=10
#PROBLEM VARIABLE
prob = LpProblem("MaxValue", LpMaximize)
# DECISION VARIABLE
x = LpVariable.dicts("user_selected", set_user, 0, 1, LpBinary)
# OBJECTIVE FUNCTION
prob += lpSum([x[i]*(INTEREST[i]*e + POPULARITY[i]*(1-e)) for i in set_user])
# CONSTRAINTS
prob += lpSum([x[i]*COST[i] for i in set_user]) <= budget
#SOLVE
prob.solve()
print("Status : ",LpStatus[prob.status])
# PRINT OPTIMAL SOLUTION
print("The Max Value = ",value(prob.objective))
# Show which users selected
x_soln = np.array([x[i].varValue for i in set_user])
print("user_vars: ")
print(x_soln)
Which should return the following, i.e. with these particular parameters only the last user is selected for inclusion. But this decision will change; for example, if you increase the budget to 100, all users will be selected.
Status : Optimal
The Max Value = 22.5
user_vars:
[0. 0. 0. 0. 1.]
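To see the budget sensitivity mentioned above, here is a sketch that wraps the same model in a helper (the function name solve_user_selection is mine) and re-solves it for a different budget:

from pulp import LpProblem, LpMaximize, LpVariable, LpBinary, lpSum, value

def solve_user_selection(budget, e=0.5):
    INTEREST = [5, 10, 15, 20, 25]
    POPULARITY = [4, 8, 12, 16, 20]
    COST = [2, 4, 6, 8, 10]
    users = range(len(COST))
    prob = LpProblem("MaxValue", LpMaximize)
    x = LpVariable.dicts("user_selected", users, 0, 1, LpBinary)
    # weighted interest/popularity value of the selected users
    prob += lpSum(x[i] * (INTEREST[i] * e + POPULARITY[i] * (1 - e)) for i in users)
    # total cost of the selected users must stay within the budget
    prob += lpSum(x[i] * COST[i] for i in users) <= budget
    prob.solve()
    return value(prob.objective), [int(x[i].varValue) for i in users]

print(solve_user_selection(budget=10))   # objective 22.5; one optimal choice is only the last user
print(solve_user_selection(budget=100))  # total cost is only 30, so every user fits: objective 67.5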

Analyzing the time complexity of Coin changing

We're doing the classic problem of determining the number of ways we can make change amounting to Z, given a set of coins.
For example, Amount=5 and Coins={1, 2, 3}. One way we can make 5 is {2, 3}.
The naive recursive solution has factorial time complexity:
f(n) = n * f(n-1) = n!
My professor argued that it actually has a time complexity of O(2^n), because for each coin we only choose to use it or not. That intuitively makes sense. However, how come my recurrence doesn't work out to be O(2^n)?
EDIT:
My recurrence is as follows:
                f(5, {1, 2, 3})
               /               \          .....
      f(4, {2, 3})        f(3, {1, 3})    .....
Notice how the branching factor decreases by 1 at every step.
Formally,
T(n) = n * T(n-1) = n!
The recurrence doesn't work out to what you expect it to work out to because it doesn't reflect the number of operations made by the algorithm.
If the algorithm decides for each coin whether to output it or not, then you can model its time complexity with the recurrence T(n) = 2*T(n-1) + O(1) with T(1)=O(1); the intuition is that for each coin you have two options---output the coin or not; this obviously solves to T(n)=O(2^n).
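If it helps to see the 2^n growth concretely, here is a small sketch (mine, not from the answer) that counts the calls made by a take/not-take recursion in which each coin is used at most once; the counts follow the recurrence T(n) = 2*T(n-1) + 1:

def count_calls(n):
    calls = 0
    def rec(i):
        nonlocal calls
        calls += 1
        if i == 0:  # no coins left to decide on
            return
        rec(i - 1)  # branch 1: skip coin i
        rec(i - 1)  # branch 2: take coin i
    rec(n)
    return calls

for n in range(1, 6):
    print(n, count_calls(n))  # 3, 7, 15, 31, 63 -> about 2^(n+1), i.e. O(2^n)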
I too was trying to analyze the time complexity for the brute force which performs depth first search:
def countCombinations(coins, n, amount, k=0):
    if amount == 0:
        return 1
    res = 0
    for i in range(k, n):
        if coins[i] <= amount:
            remaining_amount = amount - coins[i]  # considering this coin, try for the remaining sum
            # in the next round include this coin too
            res += countCombinations(coins, n, remaining_amount, i)
    return res
but we can see that a coin used in one round is used again in the next round, so at each stage we have up to n choices, which is equivalent to permutation with repetition: n^r arrangements of n available items into r positions.
ex: [1, 1, 1, 1]; sum = 4
This will generate a recursion tree where along each path we keep branching at every sub-path until we reach sum = 0, so the time complexity is O(n^sum), i.e. at each stage on the path towards the sum we have n different sub-paths.
Note however that there is another algorithm which uses the take/not-take approach, with at most 2 branches at a node of the recursion tree. Hence the time complexity of that algorithm is O(2^(n*m)), with m the target sum.
ex: say coins = [1, 1] and sum = 2; there are 11 nodes/points to visit in the recursion tree for 6 paths (leaves), so the complexity is at most 2^(2*2) => 2^4 => 16 (hence visiting 11 nodes against a bound of 16 possibilities is correct, but a little loose as an upper bound).
def get_count(coins, n, sum):
    if (n == 0):  # no coins left to try a combination that matches the sum
        return 0
    if (sum == 0):  # no more sum left to match, meaning we have exactly matched our target
        return 1  # (return success)
    # don't include the last coin in the sum calc, so leave it and try the rest
    excluded = get_count(coins, n - 1, sum)
    included = 0
    if (coins[n - 1] <= sum):
        # include the last coin in the sum calc, so reduce the sum by its value
        # we assume here that the supply of each coin is unlimited (we can choose the same coin again and again)
        included = get_count(coins, n, sum - coins[n - 1])
    return included + excluded
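As a quick check of get_count against the example from the question (Amount=5, Coins={1, 2, 3}), the 5 combinations are {1,1,1,1,1}, {1,1,1,2}, {1,2,2}, {1,1,3} and {2,3}:

coins = [1, 2, 3]
print(get_count(coins, len(coins), 5))  # 5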
