There are 22 drivers. Each driver has to work a minimum of 7.6 hrs and can work a maximum of 10 hrs. Each driver's cost and productivity are different.
If a driver works overtime (more than 7.6 hrs), for the first 2 hrs we need to pay 1.5 times the rate. For the remaining 0.4 hrs, we need to pay 2 times the rate.
195 hrs of work have to be completed by the 22 drivers. We need to schedule them in such a way that the cost is minimized.
Driver,Cost,Productivity
A,70,0.8
B,22,0.8
C,24,0.8
D,26,0.8
E,28,0.8
F,30,0.8
G,32,0.8
H,34,0.8
I,36,0.8
J,38,0.8
K,40,0.8
L,42,0.9
M,44,0.9
N,46,0.9
O,48,0.9
P,50,0.9
Q,52,0.9
R,54,0.9
S,56,0.9
T,58,0.9
U,60,0.9
V,62,0.5
Decision Variables:
X1, X2, ..., X22 represent the total number of hours allocated to each driver
Objective Function:
Min Z = 20*X1 + 22*X2 + ... + 62*X22
Constraints:
X1 >= 7.6, X2 >= 7.6, ..., X22 >= 7.6
X1 <= 10, X2 <= 10, ..., X22 <= 10
X1 + X2 + ... + X22 = 195
I have tried the following Python program so far.
import pulp
import pandas as pd
def main():
    model = pulp.LpProblem("Cost minimising scheduling problem", pulp.LpMinimize)
    totalHours = 195
    minHourEachDriver = 7.6
    maxHourEachDriver = 10
    # importing data from CSV (pd.read_csv replaces DataFrame.from_csv, which was removed from pandas)
    drivers = pd.read_csv('csv/drivers.csv', index_col=['Driver', 'Cost', 'Productivity'])
    # Decision Variables
    drv = pulp.LpVariable.dicts("driverName", indices=((i) for i, j, k in drivers.index), lowBound=0,
                                cat='Continuous')
    # Objective
    model += pulp.lpSum([j * (1 / k) * drv[i] for i, j, k in drivers.index]), "Cost"
    # Constraints
    # total no of hours work to be done
    model += pulp.lpSum([drv[i] for i, j, k in drivers.index]) == totalHours
    for i, j, k in drivers.index:
        # minimum hours driver has to work
        model += drv[i] >= minHourEachDriver
        # Maximum hour driver can work
        model += drv[i] <= maxHourEachDriver
    model.solve()
    # model status
    print(pulp.LpStatus[model.status])
    # Total Cost
    print(pulp.value(model.objective))
    # No of hrs allocated to each driver
    for i, j, k in drivers.index:
        var_value = drv[i].varValue
        # print(var_value)
        print("The number of hours for driver {0} is {1}".format(i, var_value))
if __name__ == '__main__':
    main()
But I am not able to figure out how to add the following constraint:
if a driver works overtime (more than 7.6 hrs), for the first 2 hrs we
need to pay 1.5 times the rate. For the remaining 0.4 hrs, we need to pay 2 times the rate.
If it is mandatory for each driver to work 7.6 h, there is no need to put that in the constraints. It is just a fixed time (cost) that can be subtracted from the total hours (costs), because it always happens:
195 - (NumDrivers * 7.6) is the remaining time that needs to be flexibly distributed between drivers as overtime to reach the 195 hours (when total hours > NumDrivers * 7.6).
I would represent each driver with two variables (one for time worked at the 1.5x rate and a second for time worked at the double rate) and write the following LP:
Xij = hours allocated to driver i in working mode j (say j=1 for the 1.5x rate and j=2 for the 2x rate)
Based on the provided input file:
Min Z = 70*1.5*X11 + 70*2*X12 + 22*1.5*X21 + 22*2*X22 + ... + 62*1.5*X221 + 62*2*X222
Constraints:
X11+X12+X21+X22+...+X221+X222 = 27.8 (195 - (22*7.6))
X11+X12 <= 2.4
X21+X22 <= 2.4
...
X221+X222 <= 2.4
X11<=2
X21<=2
...
X221<=2
X11, X12, ..., X221, X222 >= 0
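For completeness, there should also be a set of constraints stating that each driver can start working in mode j=2 (2x rate) only after completing 2 hours at 1.5x, but in this case the objective function enforces this automatically: for any driver, an hour at 1.5x is cheaper than an hour at 2x, so the solver always fills the 1.5x hours first.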
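Because this LP has only one coupling constraint (total overtime = 27.8 h) plus per-variable upper bounds, it behaves like a continuous knapsack problem: filling the cheapest overtime hours first reaches the same optimum an LP solver would. The sketch below swaps the solver for that greedy rule (plain Python, no PuLP). Like the LP above, it uses only the Cost column and ignores productivity. It also assumes the mandatory 7.6 h are paid at 1x the driver's cost. The names `COSTS` and `schedule` are illustrative.

```python
# Hourly cost per driver, taken from the CSV in the question:
# A costs 70, B..U cost 22, 24, ..., 60, V costs 62.
COSTS = {"A": 70, **{chr(ord("B") + i): 22 + 2 * i for i in range(20)}, "V": 62}


def schedule(costs, total_hours=195.0, base=7.6, tier1=2.0, tier2=0.4):
    """Greedy solution of the overtime LP: buy the cheapest overtime hours first.

    Assumes the mandatory `base` hours are paid at 1x the driver's cost,
    the next `tier1` hours at 1.5x and the final `tier2` hours at 2x.
    """
    remaining = total_hours - base * len(costs)  # 195 - 22 * 7.6 = 27.8 h
    # Every driver offers two overtime "segments": (rate, driver, max hours).
    segments = []
    for name, cost in costs.items():
        segments.append((1.5 * cost, name, tier1))
        segments.append((2.0 * cost, name, tier2))
    # Cheapest rate first; a driver's 1.5x segment always sorts before
    # their own 2x segment, so "1.5x before 2x" holds automatically.
    segments.sort()

    overtime = {name: 0.0 for name in costs}
    total_cost = base * sum(costs.values())  # mandatory hours for everyone
    for rate, name, cap in segments:
        if remaining <= 1e-12:
            break
        hours = min(cap, remaining)
        overtime[name] += hours
        total_cost += rate * hours
        remaining -= hours
    return overtime, total_cost


overtime, total_cost = schedule(COSTS)
print(round(total_cost, 2))  # 8649.0
print({k: round(v, 2) for k, v in overtime.items() if v > 0})
```

With these numbers, drivers B to H get the full 2.4 h of overtime, I to M get 2 h at 1.5x, and N gets the last 1 h. If productivity should be taken into account, the rates in `segments` would need to be adjusted accordingly. That is not part of the LP above.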
Related
I have the following snippet of code, which is a subroutine of the K-means clustering algorithm; specifically, it assigns each point to the closest centroid.
import numpy as np
n = 20000
D = 30
K = 250
points = np.random.rand(n, D)
centroids = np.random.rand(K, D)
membership = np.zeros(shape=n, dtype=int)
for i in range(n):
    distances = np.apply_along_axis(lambda x: np.linalg.norm(x, ord=2), 1, centroids - points[i])
    membership[i] = np.argmin(distances)
The running time here should be O(NKD), where D is the dimension of the data points, so naturally I expect the running time to change proportionally when D increases or decreases. To my surprise, I see very little change in running time when changing D, for example when testing on my local machine:
D = 1
python3 benchmark.py 12.10s user 0.39s system 118% cpu 10.564 total
D = 30
python3 benchmark.py 12.17s user 0.36s system 117% cpu 10.703 total
D = 300
python3 benchmark.py 13.30s user 0.31s system 115% cpu 11.784 total
D = 1000
python3 benchmark.py 16.51s user 1.76s system 110% cpu 16.524 total
Is there something that I'm missing here?
Edit: per @Warren's suggestion, I modified the code to use np.linalg.norm with the axis parameter directly; the performance is as follows:
D = 1
python3 benchmark.py 1.45s user 0.37s system 634% cpu 0.287 total
D = 30
python3 benchmark.py 1.67s user 0.29s system 592% cpu 0.331 total
D = 300
python3 benchmark.py 3.03s user 0.32s system 234% cpu 1.428 total
D = 1000
python3 benchmark.py 6.32s user 2.73s system 126% cpu 7.177 total
So the performance was much better.
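The edit doesn't show the modified code. Presumably it replaces the `apply_along_axis` call with a single `np.linalg.norm` call using `axis=1`, so the K Python-level calls per point become one. Here is a minimal sketch under that assumption (the function name `assign_axis` is hypothetical):

```python
import numpy as np


def assign_axis(points, centroids):
    """Assign each point to its nearest centroid, one norm call per point."""
    membership = np.empty(points.shape[0], dtype=np.int64)
    for i in range(points.shape[0]):
        # centroids - points[i] broadcasts to shape (K, D); axis=1 takes the
        # Euclidean norm of every row in one call instead of K separate calls.
        distances = np.linalg.norm(centroids - points[i], ord=2, axis=1)
        membership[i] = np.argmin(distances)
    return membership
```

Usage is the same as in the loop above: `membership = assign_axis(points, centroids)`.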
This is due to the overhead of Numpy functions.
Indeed, np.apply_along_axis is called 20_000 times, and each call internally runs a loop that calls the target Python function 250 times (i.e. it is not vectorized), and with it np.linalg.norm. In the end, np.linalg.norm is called 20_000 * 250 = 5_000_000 times. The thing is, each call to a Numpy function typically takes about 1 µs. On my machine, np.linalg.norm takes 4-5 µs on an array of size 1. This time is due to many internal checks (types and values), allocations, function calls, conversions, etc.
There are two simple ways to reduce this overhead: vectorization and using a JIT compiler like Numba. The latter is often more efficient, as it avoids creating expensive big temporary arrays.
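The vectorization route can be sketched with plain NumPy broadcasting. This is an illustrative alternative, not this answer's own code: the name `assign_vectorized` and the default `chunk` size of 256 are assumptions. Points are processed in chunks because a single (n, K, D) temporary would need n*K*D*8 bytes, about 1.2 GB for n=20000, K=250, D=30. That is exactly the "big temporary array" cost mentioned above.

```python
import numpy as np


def assign_vectorized(points, centroids, chunk=256):
    """Nearest-centroid assignment with no Python loop over individual points
    (only over chunks of points, to bound the temporary array's size)."""
    n = points.shape[0]
    membership = np.empty(n, dtype=np.int64)
    for start in range(0, n, chunk):
        block = points[start:start + chunk]                # (b, D)
        diff = block[:, None, :] - centroids[None, :, :]    # (b, K, D) temporary
        sq_dist = np.einsum("bkd,bkd->bk", diff, diff)      # squared distances, (b, K)
        # argmin of squared distances == argmin of distances, so skip the sqrt
        membership[start:start + chunk] = np.argmin(sq_dist, axis=1)
    return membership
```

It is called as `membership = assign_vectorized(points, centroids)`. A larger `chunk` means fewer NumPy calls but a bigger temporary.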
Here is a much faster implementation:
import numpy as np
import numba as nb
@nb.njit('(float64[:,::1], float64[:,::1], int_[::1])')
def compute(points, centroids, membership):
    n, K, D = points.shape[0], centroids.shape[0], points.shape[1]
    assert centroids.shape[1] == D and membership.shape[0] == n
    distances = np.empty(K, np.float64)
    for i in range(n):
        for j in range(K):
            distances[j] = np.linalg.norm(centroids[j] - points[i], ord=2)
        membership[i] = np.argmin(distances)
n = 20000
D = 30
K = 250
points = np.random.rand(n, D)
centroids = np.random.rand(K, D)
membership = np.zeros(shape=n, dtype=int)
compute(points, centroids, membership)
In fact, while this code is much faster, it still has a similar issue: the cost of allocating the temporary arrays centroids[j] - points[i] is significant compared to the actual time required to compute the norm. Each allocation takes only a few hundred nanoseconds, but the number of loop iterations is huge. One solution is simply to compute the norm manually:
import numpy as np
import numba as nb
from math import sqrt
@nb.njit('(float64[:,::1], float64[:,::1], int_[::1])', fastmath=True)
def compute_fast(points, centroids, membership):
    n, K, D = points.shape[0], centroids.shape[0], points.shape[1]
    assert centroids.shape[1] == D and membership.shape[0] == n
    distances = np.empty(K, np.float64)
    for i in range(n):
        for j in range(K):
            s = 0.0
            for k in range(D):
                tmp = centroids[j,k] - points[i,k]
                s += tmp * tmp
            distances[j] = sqrt(s)
        membership[i] = np.argmin(distances)
Here are results on my i5-9600KF processor:
D=1:
initial code: 26.56 seconds
compute: 1.44 seconds
compute_fast: 0.02 seconds (x1328)
D=30:
initial code: 27.09 seconds
compute: 1.65 seconds
compute_fast: 0.13 seconds (x208)
D=1000:
initial code: 39.34 seconds
compute: 3.74 seconds
compute_fast: 4.57 seconds (x8.6)
The last implementation is much faster for small values of D, since the Numpy overhead is the main bottleneck in this case and this implementation can almost completely remove it (thanks to the JIT compilation).
It is probably O(NKD).
But you are running three nested loops here: one explicitly, one semi-explicitly, and the last one implicitly, inside NumPy functions.
The outer one is your explicit for loop, over N.
The middle one is the np.apply_along_axis one, which applies to the K rows of centroids - points[i] (by the way, there is another one here, with some broadcasting, but we don't need to count all of them for big-O purposes).
And the inner one is the loop over the D columns that occurs inside norm.
The inner one is obviously the most important one to optimize, and that's good, because it is the only one that is vectorized here.
But that means that for small enough values of D, what we really see is mostly constant overhead (times N×K, since it is inside a double for loop). Your inefficient outer loops drive most of the cost, which then looks like O(NK).
Note that np.apply_along_axis is just a for loop by another name. It is not quite as bad, but almost: it still calls Python code repeatedly. It is not vectorization.
Still, I bet that with D big enough, you'll see that it is O(NKD).
Edit:
Here is what I get when I increase D (with a smaller n, so that it remains computable in a realistic time): [plot of running time versus D; image not included]
You can see that it looks really linear (affine, to be accurate, since it doesn't pass through 0, which is why it doesn't look very linear to you). This is explained by my previous comment: when D is small, the cost inside the for/apply_along_axis double loop is mostly constant overhead of those loops. The part proportional to D only begins to show when that overhead becomes negligible.
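The affine model can be made explicit with the question's first benchmark. The wall-clock totals are 10.564 s at D = 1 and 16.524 s at D = 1000, with NK = 20000 × 250 = 5 × 10^6 norm calls. A straight line through those two points gives a rough estimate of the per-call constant cost c_0 and the per-dimension cost c_1. These are back-of-the-envelope figures derived from two measurements, not new measurements:

$$
T(D) \approx NK\,(c_0 + c_1 D),\qquad
c_1 \approx \frac{16.524 - 10.564}{(1000 - 1)\times 5\times 10^{6}} \approx 1.2\ \text{ns},\qquad
c_0 \approx \frac{10.564}{5\times 10^{6}} - c_1 \approx 2.1\ \mu\text{s}
$$

So the D-proportional term only catches up with the constant overhead around D ≈ c_0 / c_1 ≈ 1.8 × 10^3. This is consistent with the running time barely moving between D = 1 and D = 300.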
I've been working on code to calculate the distances between 33 3D points and find the shortest route between them. The initial code took in all 33 points, paired them consecutively, calculated the distances between the pairs using math.sqrt, and summed them all up to get a final distance.
My problem is that with the sheer number of permutations of a list of 33 points (33 factorial!), the code is going to need to be at its absolute best to find the answer within a human lifetime (assuming I can use as many CPUs as I can get my hands on to increase the computational power).
I've designed a simple web server that hands out an integer, converts it to a list, and has the code perform a set number of lexicographical permutations from that point and send back the shortest distance found in that block. This part is fine, but I have concerns about the code that does the distance calculations.
I've put together a test version of my code so I could change things and see whether they made the execution faster or slower. This code starts at the beginning of the permutation list (0 to 32, in order) and performs 50 million lexicographical iterations on it, checking the distance of the points at every iteration. The code is below.
import json
import datetime
import math
def next_lexicographic_permutation(x):
    i = len(x) - 2
    while i >= 0:
        if x[i] < x[i+1]:
            break
        else:
            i -= 1
    if i < 0:
        return False
    j = len(x) - 1
    while j > i:
        if x[j] > x[i]:
            break
        else:
            j -= 1
    x[i], x[j] = x[j], x[i]
    reverse(x, i + 1)
    return x
def reverse(arr, i):
    if i > len(arr) - 1:
        return
    j = len(arr) - 1
    while i < j:
        arr[i], arr[j] = arr[j], arr[i]
        i += 1
        j -= 1
# ip for initial permutation
ip = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32]
lookup = '{"0":{"name":"van Maanen\'s Star","x":-6.3125,"y":-11.6875,"z":-4.125},\
"1":{"name":"Wolf 124","x":-7.25,"y":-27.1562,"z":-19.0938},\
"2":{"name":"Midgcut","x":-14.625,"y":10.3438,"z":13.1562},\
"3":{"name":"PSPF-LF 2","x":-4.40625,"y":-17.1562,"z":-15.3438},\
"4":{"name":"Wolf 629","x":-4.0625,"y":7.6875,"z":20.0938},\
"5":{"name":"LHS 3531","x":1.4375,"y":-11.1875,"z":16.7812},\
"6":{"name":"Stein 2051","x":-9.46875,"y":2.4375,"z":-15.375},\
"7":{"name":"Wolf 25","x":-11.0625,"y":-20.4688,"z":-7.125},\
"8":{"name":"Wolf 1481","x":5.1875,"y":13.375,"z":13.5625},\
"9":{"name":"Wolf 562","x":1.46875,"y":12.8438,"z":15.5625},\
"10":{"name":"LP 532-81","x":-1.5625,"y":-27.375,"z":-32.3125},\
"11":{"name":"LP 525-39","x":-19.7188,"y":-31.125,"z":-9.09375},\
"12":{"name":"LP 804-27","x":3.3125,"y":17.8438,"z":43.2812},\
"13":{"name":"Ross 671","x":-17.5312,"y":-13.8438,"z":0.625},\
"14":{"name":"LHS 340","x":20.4688,"y":8.25,"z":12.5},\
"15":{"name":"Haghole","x":-5.875,"y":0.90625,"z":23.8438},\
"16":{"name":"Trepin","x":26.375,"y":10.5625,"z":9.78125},\
"17":{"name":"Kokary","x":3.5,"y":-10.3125,"z":-11.4375},\
"18":{"name":"Akkadia","x":-1.75,"y":-33.9062,"z":-32.9688},\
"19":{"name":"Hill Pa Hsi","x":29.4688,"y":-1.6875,"z":25.375},\
"20":{"name":"Luyten 145-141","x":13.4375,"y":-0.8125,"z":6.65625},\
"21":{"name":"WISE 0855-0714","x":6.53125,"y":-2.15625,"z":2.03125},\
"22":{"name":"Alpha Centauri","x":3.03125,"y":-0.09375,"z":3.15625},\
"23":{"name":"LHS 450","x":-12.4062,"y":7.8125,"z":-1.875},\
"24":{"name":"LP 245-10","x":-18.9688,"y":-13.875,"z":-24.2812},\
"25":{"name":"Epsilon Indi","x":3.125,"y":-8.875,"z":7.125},\
"26":{"name":"Barnard\'s Star","x":-3.03125,"y":1.375,"z":4.9375},\
"27":{"name":"Epsilon Eridani","x":1.9375,"y":-7.75,"z":-6.84375},\
"28":{"name":"Narenses","x":-1.15625,"y":-11.0312,"z":21.875},\
"29":{"name":"Wolf 359","x":3.875,"y":6.46875,"z":-1.90625},\
"30":{"name":"LAWD 26","x":20.9062,"y":-7.5,"z":3.75},\
"31":{"name":"Avik","x":13.9688,"y":-4.59375,"z":-6.0},\
"32":{"name":"George Pantazis","x":-12.0938,"y":-16.0,"z":-14.2188}}'
lookup = json.loads(lookup)
lowest_total = 9999
# create 2D array for the distances and called it b to keep code looking clean.
b = [[0 for i in range(33)] for j in range(33)]
for x in range(33):
    for y in range(33):
        if x == y:
            continue
        else:
            b[x][y] = math.sqrt(((lookup[str(x)]["x"] - lookup[str(y)]['x']) ** 2) + ((lookup[str(x)]['y'] - lookup[str(y)]['y']) ** 2) + ((lookup[str(x)]['z'] - lookup[str(y)]['z']) ** 2))
# begin timer
start_date = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ")
start = datetime.datetime.now()
print("[{}] Start".format(start_date))
# main iteration loop
for x in range(50_000_000):
    distance = b[ip[0]][ip[1]] + b[ip[1]][ip[2]] + b[ip[2]][ip[3]] +\
        b[ip[3]][ip[4]] + b[ip[4]][ip[5]] + b[ip[5]][ip[6]] +\
        b[ip[6]][ip[7]] + b[ip[7]][ip[8]] + b[ip[8]][ip[9]] +\
        b[ip[9]][ip[10]] + b[ip[10]][ip[11]] + b[ip[11]][ip[12]] +\
        b[ip[12]][ip[13]] + b[ip[13]][ip[14]] + b[ip[14]][ip[15]] +\
        b[ip[15]][ip[16]] + b[ip[16]][ip[17]] + b[ip[17]][ip[18]] +\
        b[ip[18]][ip[19]] + b[ip[19]][ip[20]] + b[ip[20]][ip[21]] +\
        b[ip[21]][ip[22]] + b[ip[22]][ip[23]] + b[ip[23]][ip[24]] +\
        b[ip[24]][ip[25]] + b[ip[25]][ip[26]] + b[ip[26]][ip[27]] +\
        b[ip[27]][ip[28]] + b[ip[28]][ip[29]] + b[ip[29]][ip[30]] +\
        b[ip[30]][ip[31]] + b[ip[31]][ip[32]]
    if distance < lowest_total:
        lowest_total = distance
    ip = next_lexicographic_permutation(ip)
# end timer
finish_date = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ")
finish = datetime.datetime.now()
print("[{}] Finish".format(finish_date))
diff = finish - start
print("Time taken => {}".format(diff))
print("Lowest distance => {}".format(lowest_total))
This is the result of a lot of work to make things faster. I was initially using string look-ups to find the distance, with a dict having keys like "1-2", but I very quickly found out that it was very slow. I then moved on to hashed versions of the "1-2" key and the speed increased, but the fastest way I have found so far is using a 2D array and looking up the values from there.
I have also found that manually writing out the distance calculation saved time over having a for x in range(32): loop adding the distances up into a running total.
Another great speed-up was using pypy3 instead of python3 to execute it.
This usually takes 11 seconds to complete using pypy3.
Running the distance calculation 50 million times on its own takes 5.2 seconds.
Running the next_lexicographic_permutation function 50 million times on its own takes 6 seconds.
I can't think of any way to make this faster, and I believe there may be optimizations to be made in the next_lexicographic_permutation function. From what I've read, the main bottleneck seems to be the swapping of positions in the array:
x[i], x[j] = x[j], x[i]
Edit: clarified that "lifetime" means a human lifetime.
The brute-force approach of calculating all the distances is going to be slower than a partitioning approach. Here is a similar question for the 3D case.
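Using the question's own timing (50 million permutations in about 11 s with pypy3), the brute-force budget can be estimated directly. This is back-of-the-envelope arithmetic, not a benchmark, and it ignores the factor of 2 saved by skipping reversed routes:

```python
import math

perms = math.factorial(33)        # orderings of 33 points
rate = 50_000_000 / 11.0          # permutations checked per second (question's pypy3 timing)
seconds = perms / rate
years = seconds / (365.25 * 24 * 3600)
print(f"{perms:.3e} permutations -> {years:.2e} years on one core")
# 8.683e+36 permutations -> 6.05e+22 years on one core
```

Even halving that for reversed routes and spreading it over a million cores leaves on the order of 10^16 years. Micro-optimizing the permutation step cannot close that gap, which is why a different (non-exhaustive) approach is needed.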
In equal-width discretization, the variable values are assigned to intervals of the same width. The number of intervals is user-defined and the width is determined by the minimum/maximum values and the number of intervals.
For example, given the values 10, 20, 100, 130 the minimum is 10 and the maximum is 130. If the user defines the number of intervals as six, given the formula:
Interval Width = (Max(x) - Min(x)) / N
The width is (130 - 10) / 6 = 20
And the edges of the six zero-based intervals are: [10, 30, 50, 70, 90, 110, 130]
Finally, the interval assignments are defined for each element in the dataset:
Value in the dataset | New feature-engineered value
10 | 0
20 | 0
57 | 2
101 | 4
130 | 5
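The assignment rule above fits in a few lines. Here is a minimal pure-Python sketch (the function name is illustrative). It assumes zero-based bins, with the maximum value clipped into the last bin as in the table:

```python
def equal_width_bins(values, n_bins):
    """Zero-based equal-width bin index for each value.

    width = (max - min) / n_bins; the maximum itself would land in bin
    n_bins, so it is clipped into the last bin (n_bins - 1).
    """
    vmin, vmax = min(values), max(values)
    width = (vmax - vmin) / n_bins
    if width == 0:  # all values identical: put everything in bin 0
        return [0] * len(values)
    return [min(int((v - vmin) // width), n_bins - 1) for v in values]


print(equal_width_bins([10, 20, 57, 101, 130], 6))  # [0, 0, 2, 4, 5]
```

The clipping of the maximum into the last bin is what makes 130 map to 5 rather than 6.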
I have the following code that uses a pandas dataframe with an sklearn function to divide the dataframe into equal-width intervals:
from sklearn.preprocessing import KBinsDiscretizer
discretizer = KBinsDiscretizer(n_bins=10, encode='ordinal', strategy='uniform')
df['output_col'] = discretizer.fit_transform(df[['input_col']])
This works fine, but I need to implement an equivalent dask function that runs the process in parallel across multiple partitions, and I cannot find KBinsDiscretizer in dask_ml.preprocessing. Any suggestions? I cannot use map_partitions because it would apply the function to each partition independently, and I need the intervals to be computed over the entire dataframe.
You're facing a common tradeoff with distributed workflows. Do you want to spend the time/resource/compute required to determine the exact min/max, which is a pre-requisite for the binning scheme you describe, or is an approximate answer alright? If the latter, how do you design an algorithm which adequately captures the data's min/max while remaining efficient?
We can start with the exact solution, since it's easier to implement. The key is simply to find the min and max first, then digitize the data. Note that this requires computing all values in the column twice. If persisting the data is an option (e.g. you are working with a distributed cluster or can fit the column to be binned in memory), it would help avoid unnecessary repetition:
import dask
import dask.dataframe
import numpy as np
def discretize_exact(
    s: dask.dataframe.Series, K: int
) -> dask.dataframe.Series:
    """
    Discretize values in dask.dataframe Series into K equal-width bins
    Parameters
    ----------
    s : dask.dataframe.Series
        Series with values to be binned
    K : int
        Number of equal-width bins to generate
    Returns
    -------
    binned : dask.dataframe.Series
        dask.dataframe.Series with scheduled np.digitize operation
        called using map_partitions. The values in ``binned`` will
        be in [0, K - 1] giving the index of the K bins in the interval
        [vmin, vmax].
    """
    # schedule the min/max computation
    vmin, vmax = s.min(), s.max()
    # compute vmin and vmax together so we only compute once
    vmin, vmax = dask.compute(vmin, vmax)
    # the K - 1 interior edges create K equal-width bins, with
    # the outer ends open, such that the first bin will be
    # (-inf, vmin + step) and the last will be [vmax - step, inf)
    bins = np.linspace(vmin, vmax, (K + 1))[1:-1]
    return s.map_partitions(
        np.digitize,
        bins=bins,
        meta=('binned', 'uint16'),
    )
This does (I think) what you're looking for, but does involve computing the min and max first prior to scheduling the binning operation. Using an example frame:
import dask.dataframe, pandas as pd, numpy as np
N = 10000
df = dask.dataframe.from_pandas(
    pd.DataFrame({'a': np.random.random(size=N)}),
    chunksize=1000,
)
We can use the above function to discretize our data:
In [68]: df['binned_a'] = discretize_exact(df['a'], K=10)
In [69]: df
Out[69]:
Dask DataFrame Structure:
a binned_a
npartitions=10
0 float64 uint16
1000 ... ...
... ... ...
9000 ... ...
9999 ... ...
Dask Name: assign, 40 tasks
In [70]: df.compute()
Out[70]:
a binned_a
0 0.548415 5
1 0.872668 8
2 0.466869 4
3 0.133986 1
4 0.833126 8
... ... ...
9995 0.223438 2
9996 0.575271 5
9997 0.922593 9
9998 0.030127 0
9999 0.204283 2
[10000 rows x 2 columns]
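To see why the edges must come from the whole column (the concern about plain map_partitions in the question), here is a dask-free sketch in which NumPy chunks stand in for partitions. The helper name and the toy data are illustrative. The binning rule is the same one `discretize_exact` uses: K - 1 interior edges passed to `np.digitize`.

```python
import numpy as np


def digitize_with_edges(chunk, vmin, vmax, K):
    """Same rule as discretize_exact: K - 1 interior edges, then np.digitize."""
    bins = np.linspace(vmin, vmax, K + 1)[1:-1]
    return np.digitize(chunk, bins)


data = np.arange(100.0)
chunks = np.array_split(data, 2)  # stand-ins for two dask partitions
K = 10

# Global edges: min/max computed once over all the data, then applied per chunk.
vmin, vmax = data.min(), data.max()
global_codes = np.concatenate([digitize_with_edges(c, vmin, vmax, K) for c in chunks])

# Per-partition edges (what applying a binner inside map_partitions would do):
# each chunk uses its own min/max, so the same value can get a different code.
local_codes = np.concatenate([digitize_with_edges(c, c.min(), c.max(), K) for c in chunks])

print((global_codes == digitize_with_edges(data, vmin, vmax, K)).all())  # True
print(global_codes[10], local_codes[10])  # 1 2
```

Per-chunk work with global edges reproduces the whole-array result exactly. Per-chunk edges do not, which is why the min/max must be computed first.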
Alternatively, you could try to approximate the bin edges. You could do this in a number of ways, including sampling the dataframe to identify the min/max of one or more partitions, or you (the user) could provide an overly wide estimate of the range. Note that, depending on your workflow, computing the first partition may still involve computing a large part of the overall graph, or even the entire graph if, e.g., the dataframe was reshuffled in a recent step.
def find_minmax_of_first_partition(
    s: dask.dataframe.Series
) -> tuple[float, float]:
    """
    Find the min and max of the first partition of a dask.dataframe.Series
    """
    partition_0_stats = (
        s.partitions[0].compute().agg(['min', 'max'])
    )
    return (
        partition_0_stats['min'].item(),
        partition_0_stats['max'].item(),
    )
You could widen this range if desired, using your intuition about the spread of the values:
K = 10
s = df['a']
vmin_p0, vmax_p0 = find_minmax_of_first_partition(s)
range_p0 = (vmax_p0 - vmin_p0)
mean_p0 = (vmin_p0 + vmax_p0) / 2
# guess that the overall data is within 10x the range of partition 0
min_est, max_est = mean_p0 - 5*range_p0, mean_p0 + 5*range_p0
# now, bin all values using this estimated min, max. Note that
# any data falling outside your estimated min/max value will be
# coded as values 0 or K + 1.
bins = np.linspace(min_est, max_est, (K + 1))
binned = s.map_partitions(
    np.digitize,
    bins=bins,
    meta=('binned', 'uint16'),
)
These bins will be equally spaced, but will not necessarily start/end at the true min/max, and therefore may either not catch all the data or may have empty bins at the edges. You may need to look at how your bin specification performs and iterate based on your data.
I'm working on a problem where n people are standing on a line and each person knows their own position and speed. I'm asked to find the minimal time for all people to gather at some common spot.
Basically, what I'm doing is finding the minimal time using binary search: for a candidate time, each i-th person's reachable range is an interval. If all intervals overlap, there is a spot that everyone can reach.
I have a solution to this question, but it exceeds the time limit because my method of checking the intervals is inefficient. My current solution runs too slowly and I'm hoping to get a better one.
My code:
people = int(input())
peoplel = [list(map(int, input().split())) for _ in range(people)] # first item in people[i] is the position of each person, the second item is the speed of each person
def good(time):
    return checkoverlap([[i[0] - time *i[1], i[0] + time * i[1]] for i in peoplel])
# first item,second item = the range of distance a person can go to
def checkoverlap(l):
    for i in range(len(l) - 1):
        seg1 = l[i]
        for i1 in range(i + 1, len(l)):
            seg2 = l[i1]
            if seg2[0] <= seg1[0] <= seg2[1] or seg1[0] <= seg2[0] <= seg1[1]:
                continue
            elif seg2[0] <= seg1[1] <= seg2[1] or seg1[0] <= seg2[1] <= seg1[1]:
                continue
            return False
    return True
(This is my first time asking a question, so please let me know about anything that is wrong.)
One does simply go linear
A while after I finished the answer, I found a simplification that removes the need for sorting and thus reduces the complexity of checking whether all the intervals overlap to O(N).
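If we look at the steps that are done after the initial sort, we can see that we are basically checking
if max(lower_bounds) <= min(upper_bounds):
    return True
else:
    return False
(Using <= rather than < matters: closed intervals that only touch at one point still share a meeting spot.)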
And since both min and max are linear without the need for sorting, we can simplify the algorithm by:
Creating an array of lower bounds - one pass.
Creating an array of upper bounds - one pass.
Doing the comparison I mentioned above - two passes over the new arrays.
All this could be done together in a single pass to optimize further (and to avoid some unnecessary memory allocation); however, this version is clearer for the purpose of explanation.
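The linear check, combined with the binary search described in the question, can be sketched as follows. Several things here are assumptions: the names, the fixed 100 iterations, and the starting upper bound. The sketch also assumes every speed is positive and that a floating-point answer is acceptable (the problem's required precision isn't stated).

```python
def all_overlap(people, t):
    """True if the reachable intervals [p - v*t, p + v*t] share a point.

    One pass each: closed intervals have a common point exactly when the
    largest lower bound does not exceed the smallest upper bound.
    """
    lower = max(p - v * t for p, v in people)
    upper = min(p + v * t for p, v in people)
    return lower <= upper


def min_meeting_time(people, iterations=100):
    """Binary search on the time; assumes every speed is positive."""
    positions = [p for p, _ in people]
    lo = 0.0
    # In this much time even the slowest person can cross the whole span,
    # so all intervals certainly overlap at `hi`.
    hi = (max(positions) - min(positions)) / min(v for _, v in people)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if all_overlap(people, mid):
            hi = mid
        else:
            lo = mid
    return hi


print(min_meeting_time([(1, 2), (3, 1), (8, 1)]))  # approximately 2.5 (they meet at 5.5)
```

Each check is O(N), so the whole search is O(N × iterations) instead of O(N^2) per check.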
Since the reasoning about the correctness and timing was done in the previous iteration, I will skip it this time and keep the section below since it nicely shows the thought process behind the optimization.
One sort to rule them all
Disclaimer: This section was obsoleted time-wise by the one above. However since it in fact allowed me to figure out the linear solution, I'm keeping it here.
As the title says, sorting is a rather straightforward way of going about this. It requires a slightly different data structure: instead of holding every interval as (min, max), I opted for holding every interval as two entries, (min, index) and (max, index).
This allows me to sort these by the min and max values. What follows is a single linear pass over the sorted array. We also create a helper array of False values. These represent the fact that at the beginning all the intervals are closed.
Now comes the pass over the array:
Since the array is sorted, we first encounter the min of each interval. In that case, we increment the openCount counter and set the interval's entry in the helper array to True. The interval is now open: until we close it, the person can arrive at the party, since we are within his (or her) range.
We go along the array. As long as we are opening the intervals, everything is ok and if we manage to open all the intervals, we have our party destination where all the social distancing collapses. If this happens, we return True.
If we close any of the intervals, we have found our party breaker who can't make it anymore. (Or we can discuss that the party breakers are those who didn't bother to arrive yet when someone has to go already). We return False.
The resulting complexity is O(Nlog(N)) caused by the initial sort since the pass itself is linear in nature. This is quite a bit better than the original O(n^2) caused by the "check all intervals pairwise" approach.
The code:
import numpy as np
import cProfile, pstats, io
#random data for a speed test. Not that useful for checking the correctness though.
testSize = 10000
x = np.random.randint(0, 10000, testSize)
y = np.random.randint(1, 100, testSize)
peopleTest = [x for x in zip(x, y)]
#Just a basic example to help with the reasoning about the correctness
peoplel = [(1, 2), (3, 1), (8, 1)]
# first item in people[i] is the position of each person, the second item is the speed of each person
def checkIntervals(people, time):
    a = [(x[0] - x[1] * time, idx) for idx, x in enumerate(people)]
    b = [(x[0] + x[1] * time, idx) for idx, x in enumerate(people)]
    checks = [False for x in range(len(people))]
    openCount = 0
    intervals = [x for x in sorted(a + b, key=lambda x: x[0])]
    for i in intervals:
        if not checks[i[1]]:
            # first endpoint of interval i seen: it is the lower bound, so open it
            checks[i[1]] = True
            openCount += 1
            if openCount == len(people):
                return True
        else:
            # second endpoint seen: interval i closes before all were open
            return False
def good(time, people):
    return checkoverlap([[i[0] - time * i[1], i[0] + time * i[1]] for i in people])
# first item,second item = the range of distance a person can go to
def checkoverlap(l):
    for i in range(len(l) - 1):
        seg1 = l[i]
        for i1 in range(i + 1, len(l)):
            seg2 = l[i1]
            if seg2[0] <= seg1[0] <= seg2[1] or seg1[0] <= seg2[0] <= seg1[1]:
                continue
            elif seg2[0] <= seg1[1] <= seg2[1] or seg1[0] <= seg2[1] <= seg1[1]:
                continue
            return False
    return True
pr = cProfile.Profile()
pr.enable()
print(checkIntervals(peopleTest, 10000))
print(good(10000, peopleTest))
pr.disable()
s = io.StringIO()
sortby = "cumulative"
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print(s.getvalue())
The profiling stats for the pass over test array with 10K random values:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 8.933 8.933 (good)
1 8.925 8.925 8.926 8.926 (checkoverlap)
1 0.003 0.003 0.023 0.023 (checkIntervals)
1 0.008 0.008 0.010 0.010 {built-in method builtins.sorted}
I am trying to apply a function to a large range of numbers, and the version where I use a pool from multiprocessing takes much longer to finish than what I estimate for a "single process" version.
Is this a problem with my code? Or Python? Or Linux?
The function that I am using is is_solution, defined below:
as_ten_digit_string = lambda x: f"0000000000{x}"[-10:]
def sum_of_digits(nstr):
    return sum([int(_) for _ in list(nstr)])
def is_solution(x):
    return sum_of_digits(as_ten_digit_string(x)) == 10
When I run is_solution on a million numbers - it takes about 2 seconds
In [13]: %timeit [is_solution(x) for x in range(1_000_000)]
1.9 s ± 18.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Based on this, for ~10 billion numbers it should take about 20,000 seconds, or around 6 hours. But the multiprocessing version doesn't finish even after 9 hours.
I am using the multiprocessing module like this -
from multiprocessing import Pool
with Pool(processes=24) as p:
    for solution in p.imap_unordered(is_solution, range(1_000_000_000, 9_999_999_999)):
        if solution:
            print(solution)
The python version I am using is 3.8 on linux.
I don't know if this is relevant, but when I run the top command in Linux after my main program has run for ~200 minutes, I see that each of my worker processes has a CPU time of about 20 minutes.
Multiprocessing is not free. If you have X CPU cores, then spawning more than X processes will eventually degrade performance. If your processes do I/O, then spawning even 10*X processes may be OK, because they don't strain the CPU. However, if your processes do calculations and memory manipulation, then any process beyond X may only degrade performance. In the comments you've said that you have 4 cores, so you should set Pool(processes=4). You may experiment with different values as well. Multiprocessing is hard; it may be that 5 or even 8 processes will still increase performance. But it is extremely likely that 24 processes on 4 CPU cores only hurt performance.
The other thing you can do is send data to the subprocesses in batches. At the moment you send data one item at a time, and since your calculation is fast (for a single data point), the inter-process communication may dominate the total execution time. This is a price that you do not pay in the single-process scenario but always pay when multiprocessing. To minimize its effect, use the chunksize parameter of imap_unordered.
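Both suggestions (a pool sized to the 4 cores and a large `chunksize`) can be combined as in the sketch below. The names `check` and `find_solutions` are hypothetical, and `chunksize=100_000` is a starting guess, not a tuned value. The sketch also has each worker return the number itself rather than a bool, since printing the result of `is_solution` would only print `True`.

```python
from multiprocessing import Pool


def check(x):
    """Return x itself if its decimal digits sum to 10, else None.

    Leading zeros never change a digit sum, so no zero-padding is needed.
    """
    return x if sum(map(int, str(x))) == 10 else None


def find_solutions(start, stop, processes=4, chunksize=100_000):
    # chunksize batches many numbers per inter-process message, so pickling
    # and queue traffic are paid once per batch instead of once per number.
    with Pool(processes=processes) as pool:
        results = pool.imap_unordered(check, range(start, stop), chunksize=chunksize)
        return [x for x in results if x is not None]


if __name__ == "__main__":
    print(len(find_solutions(1_000_000, 2_000_000)))  # 2002
```

Results from `imap_unordered` arrive in completion order, so sort them if order matters.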
Finally, try to reimplement your algorithm to avoid brute force, as suggested by @Alex.
def solution(n, sum):
    """Generates numbers of n digits with the given total sum"""
    if n == 1 and sum < 10:
        yield str(sum)
        return
    if n < 1 or (sum > 9 and n < 2):
        return
    if sum == 0:
        yield "0" * n
        return
    for digit in range(min(sum + 1, 10)):
        for s in solution(n - 1, sum - digit):
            yield str(digit) + s
# Print all 4-digit numbers with total sum 10
for s in solution(4, 10):
    print(s)
# Print all 4-digit numbers with total sum 10, not starting with zero
for digit in range(1, 10):
    for s in solution(3, 10 - digit):
        print(str(digit) + s)
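To see how much work the generator avoids, the answers can be counted with a small memoized recursion. This is a counting technique in the same spirit as the generator above, not part of the original answer, and the name `count` is illustrative. It shows that only 48,619 ten-digit numbers have digit sum 10, out of the roughly 9 billion candidates in the question's range:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count(n, s):
    """Number of n-digit strings (leading zeros allowed) whose digits sum to s."""
    if s < 0:
        return 0
    if n == 0:
        return 1 if s == 0 else 0
    return sum(count(n - 1, s - d) for d in range(min(s, 9) + 1))


# 10-digit numbers (first digit 1..9) whose digits sum to 10
total = sum(count(9, 10 - d) for d in range(1, 10))
print(total)  # 48619
```

Generating those 48,619 strings directly with `solution` takes a fraction of a second, compared with hours of brute force.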