numpy apply_along_axis vectorisation - python-3.x

I am trying to implement a function that takes each row in a NumPy 2D array and returns a scalar result of a certain calculation. My current code looks like the following:
img = np.array([
[0, 5, 70, 0, 0, 0 ],
[10, 50, 4, 4, 2, 0 ],
[50, 10, 1, 42, 40, 1 ],
[10, 0, 0, 6, 85, 64],
[0, 0, 0, 1, 2, 90]]
)
def get_y(stride):
    stride_vals = stride[stride > 0]
    pix_thresh = stride_vals.max() - 1.5*stride_vals.std()
    return np.argwhere(stride>pix_thresh).mean()
np.apply_along_axis(get_y, 0, img)
>> array([ 2. , 1. , 0. , 2. , 2.5, 3.5])
It works as expected; however, performance isn't great because in the real dataset there are ~2k rows and ~20-50 columns per frame, with frames arriving 60 times a second.
Is there a way to speed up the process, perhaps by not using the np.apply_along_axis function?

Here's one vectorized approach: set the zeros to NaN, which lets us use np.nanmax and np.nanstd to compute the max and std values while skipping the zeros, like so -
imgn = np.where(img==0, np.nan, img)
mx = np.nanmax(imgn,0) # np.max(img,0) if all are positive numbers
st = np.nanstd(imgn,0)
mask = img > mx - 1.5*st
out = np.arange(mask.shape[0]).dot(mask)/mask.sum(0)
Runtime test -
In [94]: img = np.random.randint(-100,100,(2000,50))
In [95]: %timeit np.apply_along_axis(get_y, 0, img)
100 loops, best of 3: 4.36 ms per loop
In [96]: %%timeit
...: imgn = np.where(img==0, np.nan, img)
...: mx = np.nanmax(imgn,0)
...: st = np.nanstd(imgn,0)
...: mask = img > mx - 1.5*st
...: out = np.arange(mask.shape[0]).dot(mask)/mask.sum(0)
1000 loops, best of 3: 1.33 ms per loop
Thus, we are seeing a 3x+ speedup.
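As a sanity check, the loop-based and vectorized versions can be compared on the sample array from the question. This is a minimal sketch; the wrapper name get_y_vectorized is made up for illustration. It assumes non-negative input in which every column has at least one nonzero value, since the question filters with `stride > 0` while the vectorized code masks `img == 0`, and the two differ for negative values.

```python
import numpy as np

img = np.array([
    [0, 5, 70, 0, 0, 0],
    [10, 50, 4, 4, 2, 0],
    [50, 10, 1, 42, 40, 1],
    [10, 0, 0, 6, 85, 64],
    [0, 0, 0, 1, 2, 90],
])

def get_y(stride):
    # Loop-based version from the question (applied to one column at a time)
    stride_vals = stride[stride > 0]
    pix_thresh = stride_vals.max() - 1.5 * stride_vals.std()
    return np.argwhere(stride > pix_thresh).mean()

def get_y_vectorized(img):
    # Hypothetical wrapper around the vectorized steps shown above
    imgn = np.where(img == 0, np.nan, img)   # zeros become NaN and are ignored
    mx = np.nanmax(imgn, 0)
    st = np.nanstd(imgn, 0)
    mask = img > mx - 1.5 * st
    # Mean row index of the True entries in each column
    return np.arange(mask.shape[0]).dot(mask) / mask.sum(0)

looped = np.apply_along_axis(get_y, 0, img)
vectorized = get_y_vectorized(img)
print(np.allclose(looped, vectorized))
```

Columns that contain no nonzero value are not handled here; both versions would need an explicit guard for that case.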

Related

Neighbors sum of numpy array with mask

I have two large arrays, one containing values, and one being a mask basically. The code below shows the function I want to implement.
from scipy.signal import convolve2d
import numpy as np
sample = np.array([[6, 4, 5, 5, 5],
[7, 1, 0, 8, 3],
[2, 5, 4, 8, 4],
[2, 0, 2, 6, 0],
[5, 7, 2, 3, 2]])
mask = np.array([[1, 0, 1, 1, 0],
[0, 0, 1, 0, 1],
[0, 1, 0, 0, 0],
[0, 0, 0, 1, 0],
[1, 1, 0, 0, 1]])
neighbors_sum = convolve2d(sample, np.ones((3,3), dtype=int), mode='same', boundary='wrap')
# neighbors_sum = np.array([[40, 37, 35, 33, 44],
# [37, 34, 40, 42, 48],
# [24, 23, 34, 35, 40],
# [27, 29, 37, 31, 32],
# [31, 33, 34, 30, 34]])
result = np.where(mask, neighbors_sum, 0)
print(result)
This code works, and gets me what I expect:
np.array([[40, 0, 35, 33, 0],
[ 0, 0, 40, 0, 48],
[ 0, 23, 0, 0, 0],
[ 0, 0, 0, 31, 0],
[31, 33, 0, 0, 34]])
So far, so good. However, I run into serious trouble when I increase the size of the arrays. In my case, instead of a 5x5 input and a 3x3 summing mask, I need a 50,000x20,000 input and a 100x100 summing mask. At that size, the convolve2d function struggles and the calculation takes extremely long.
Given that I only care about the masked result, and thus only care about the summation from convolve2d at those points, can anyone think of a smart approach to take here? Going to a for loop and selecting only the points of interest would lose the speed advantage of the vectorization so I'm not convinced this would be worth it.
Any suggestion welcome!
convolve2d is very inefficient in this case. Since the summing kernel is np.ones, it is separable, so you can split it into two trivial filters: one np.ones((100, 1)) filter and one np.ones((1, 100)) filter. Moreover, a rolling sum can be used to speed up the computation even more.
Here is a simple solution without a rolling sum:
# Simple faster implementation
tmp = convolve2d(sample, np.ones((1,100), dtype=int), mode='same', boundary='wrap')
neighbors_sum = convolve2d(tmp, np.ones((100,1), dtype=int), mode='same', boundary='wrap')
result = np.where(mask, neighbors_sum, 0)
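To see why splitting the box filter into a horizontal and a vertical pass is valid, the 5x5 example from the question can be checked in plain NumPy. This is only a sketch: it swaps convolve2d for np.roll to stay dependency-free, the helper box_sum_1d is a made-up name, and it only handles odd window sizes.

```python
import numpy as np

sample = np.array([[6, 4, 5, 5, 5],
                   [7, 1, 0, 8, 3],
                   [2, 5, 4, 8, 4],
                   [2, 0, 2, 6, 0],
                   [5, 7, 2, 3, 2]])
mask = np.array([[1, 0, 1, 1, 0],
                 [0, 0, 1, 0, 1],
                 [0, 1, 0, 0, 0],
                 [0, 0, 0, 1, 0],
                 [1, 1, 0, 0, 1]])

def box_sum_1d(a, size, axis):
    # Hypothetical helper: wrapped sum of `size` neighbours along one axis (odd size only)
    half = size // 2
    return sum(np.roll(a, shift, axis=axis) for shift in range(-half, half + 1))

# Direct 2D box sum: add up all 9 shifted copies of the array
direct = sum(np.roll(sample, (dy, dx), axis=(0, 1))
             for dy in (-1, 0, 1) for dx in (-1, 0, 1))

# Separable version: a horizontal pass followed by a vertical pass
separable = box_sum_1d(box_sum_1d(sample, 3, axis=1), 3, axis=0)

result = np.where(mask, separable, 0)
print(np.array_equal(direct, separable))
print(result)
```

For a k x k window the direct form touches k*k values per output element, while the two passes touch only 2*k, which is where the speedup of the separated filters comes from.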
You can compute the rolling sum efficiently using Numba. The strategy is to split the computation into 3 parts: the horizontal rolling sum, the vertical rolling sum and the final masking. Each step can be fully parallelized using multiple threads (although parallelizing the vertical rolling sum is harder with Numba). Each part needs to work line by line so as to be cache friendly.
# Complex very-fast implementation
import numba as nb
# Numerical results may diverge if the input contains big
# values with many small ones.
# Does not support inputs containing NaN values or +/- Inf ones.
@nb.njit('float64[:,::1](float64[:,::1], int_)', parallel=True, fastmath=True)
def horizontalRollingSum(sample, filterSize):
    n, m = sample.shape
    fs = filterSize
    # Make the wrapping part of the rolling sum much simpler
    assert fs >= 1
    assert n >= fs and m >= fs
    # Horizontal rolling sum.
    tmp = np.empty((n, m), dtype=np.float64)
    for i in nb.prange(n):
        s = 0.0
        lShift = fs//2
        rShift = (fs-1)//2
        # Initial window for column 0 (wraps around the row ends)
        for j in range(m-lShift, m):
            s += sample[i, j]
        for j in range(0, rShift+1):
            s += sample[i, j]
        tmp[i, 0] = s
        # Slide the window: add the entering value, drop the leaving one
        for j in range(1, m):
            jLeft, jRight = (j-1-lShift)%m, (j+rShift)%m
            s += sample[i, jRight] - sample[i, jLeft]
            tmp[i, j] = s
    return tmp
@nb.njit('float64[:,::1](float64[:,::1], int_)', fastmath=True)
def verticalRollingSum(sample, filterSize):
    n, m = sample.shape
    fs = filterSize
    # Make the wrapping part of the rolling sum much simpler
    assert fs >= 1
    assert n >= fs and m >= fs
    # Vertical rolling sum.
    tmp = np.empty((n, m), dtype=np.float64)
    tShift = fs//2
    bShift = (fs-1)//2
    for j in range(m):
        tmp[0, j] = 0.0
    # Initial window for row 0 (wraps around the column ends)
    for i in range(n-tShift, n):
        for j in range(m):
            tmp[0, j] += sample[i, j]
    for i in range(0, bShift+1):
        for j in range(m):
            tmp[0, j] += sample[i, j]
    # Each row is the previous row plus the entering line minus the leaving line
    for i in range(1, n):
        iTop = (i-1-tShift)%n
        iBot = (i+bShift)%n
        for j in range(m):
            tmp[i, j] = tmp[i-1, j] + (sample[iBot, j] - sample[iTop, j])
    return tmp
@nb.njit('float64[:,::1](float64[:,::1], int_[:,::1], int_)', parallel=True, fastmath=True)
def compute(sample, mask, filterSize):
    n, m = sample.shape
    tmp = horizontalRollingSum(sample, filterSize)
    neighbors_sum = verticalRollingSum(tmp, filterSize)
    res = np.empty((n, m), dtype=np.float64)
    for i in nb.prange(n):
        for j in range(m):
            res[i, j] = neighbors_sum[i, j] * mask[i, j]
    return res
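The sliding update used in the horizontal pass can be checked on a single row without Numba. The sketch below mirrors that update in plain NumPy and compares it against a brute-force window sum; the function names are made up for illustration and the code is meant for verification on small inputs, not for speed.

```python
import numpy as np

def rolling_sum_wrapped(row, fs):
    # Same update rule as the horizontal pass: keep a running sum and,
    # at each step, add the entering value and subtract the leaving one.
    m = row.shape[0]
    assert 1 <= fs <= m
    l_shift = fs // 2
    r_shift = (fs - 1) // 2
    out = np.empty(m, dtype=np.float64)
    s = row[m - l_shift:].sum() + row[:r_shift + 1].sum()
    out[0] = s
    for j in range(1, m):
        s += row[(j + r_shift) % m] - row[(j - 1 - l_shift) % m]
        out[j] = s
    return out

def rolling_sum_brute(row, fs):
    # Reference: recompute every window from scratch (O(m * fs))
    m = row.shape[0]
    l_shift = fs // 2
    r_shift = (fs - 1) // 2
    return np.array([sum(row[(j + d) % m] for d in range(-l_shift, r_shift + 1))
                     for j in range(m)], dtype=np.float64)

row = np.arange(10, dtype=np.float64)
for fs in (1, 3, 4, 10):
    print(fs, np.allclose(rolling_sum_wrapped(row, fs), rolling_sum_brute(row, fs)))
```

The running-sum form does a constant amount of work per element regardless of the window size, which is why it beats even the separated convolutions for a 100-wide window.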
Benchmark & Notes
Here is the testing code:
n, m = 5000, 2000
sample = np.random.rand(n, m)
mask = (np.random.rand(n, m) < 0.05).astype(int)
Here are the results on my 6-core machine:
Initial solution: 174366 ms (x1)
With separate filters: 5710 ms (x31)
Final Numba solution: 40 ms (x4359)
Optimal theoretical time: 10 ms (optimistic)
Thus, the Numba implementation is 4359 times faster than the initial one.
That being said, be careful of possible numerical issues that this last implementation can have regarding the input array (see the comments in the code). It should be fine as long as np.std(sample) is relatively small and np.all(np.isfinite(sample)) is true.
Note that the code can be further optimized: the vertical rolling sum can be parallelized; modulus operations can be avoided in the horizontal rolling sum; the vertical rolling sum and the masking steps can be merged together (i.e. by computing res on-the-fly and not storing tmp); tiling can be used to compute all the steps simultaneously in a more cache-friendly way. However, these optimizations make the code more complex and some of them are very hard to perform (especially the last one with Numba).
Note that using a boolean mask (instead of an integer-based one) should make the algorithm faster since it takes less memory and processors can fetch values faster.

Pytorch, retrieving values from a tensor using several indices. Most computationally efficient solution

If I have an example 2D tensor
a = [[4, 2, 1, 6],[1, 2, 3, 8], [92, 4, 23, 54]]
tensor_a = torch.tensor(a)
I can get 2 of the 1D tensors along the first dimension using
tensor_a[[0, 1]]
tensor([[4, 2, 1, 6],
[1, 2, 3, 8]])
But how about using several indices?
So I have something like this
list_indices = [[0, 0], [0,2], [1, 2]]
I could do something like
combos = []
for indi in list_indices:
    combos.append(tensor_a[indi])
But I'm wondering, since there's a for loop, whether there's a more computationally efficient way to do this, perhaps also using PyTorch.
It is more computationally efficient to use the predefined PyTorch function torch.index_select to select tensor elements using a list of indices:
import torch
a = [[4, 2, 1, 6],[1, 2, 3, 8], [92, 4, 23, 54]]
tensor_a = torch.tensor(a)
list_indices = [[0, 0], [0,2], [1, 2]]
#convert list_indices to Tensor
indices = torch.tensor(list_indices)
#get elements from tensor_a using indices.
tensor_a=torch.index_select(tensor_a, 0, indices.view(-1))
print(tensor_a)
If you want the result to be a list rather than a tensor, you can convert tensor_a to a list:
tensor_a_list = tensor_a.tolist()
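Note that index_select with the flattened indices returns one long (6, 4) tensor, whereas the loop in the question produces three (2, 4) groups. A minimal sketch of how the grouping could be restored, and checked against the loop, is shown below; the variable names flat, grouped, looped and direct are made up for illustration.

```python
import torch

tensor_a = torch.tensor([[4, 2, 1, 6], [1, 2, 3, 8], [92, 4, 23, 54]])
list_indices = [[0, 0], [0, 2], [1, 2]]
indices = torch.tensor(list_indices)                         # shape (3, 2)

# Select all requested rows in one call, then restore the (pair, item, column) layout
flat = torch.index_select(tensor_a, 0, indices.view(-1))    # shape (6, 4)
grouped = flat.view(*indices.shape, tensor_a.shape[1])      # shape (3, 2, 4)

# Reference: the loop from the question, stacked into one tensor
looped = torch.stack([tensor_a[pair] for pair in list_indices])

# Indexing directly with the index tensor gives the same layout
direct = tensor_a[indices]

print(torch.equal(grouped, looped), torch.equal(direct, looped))
```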
To test the computational efficiency I created 1000000 indices and compared the execution times. Using the loop takes more time than using my suggested PyTorch approach:
import time
import torch
start_time = time.time()
a = [[4, 2, 1, 6],[1, 2, 3, 8], [92, 4, 23, 54]]
tensor_a = torch.tensor(a)
indices = torch.randint(0, 2, (1000000,)).tolist()
combos = []
for indi in indices:
    combos.append(tensor_a[indi])
print("--- %s seconds ---" % (time.time() - start_time))
--- 3.3966853618621826 seconds ---
start_time = time.time()
indices = torch.tensor(indices)
tensor_a=torch.index_select(tensor_a, 0, indices)
print("--- %s seconds ---" % (time.time() - start_time))
--- 0.10641193389892578 seconds ---

How to initialize an 3D array variable in Gekko?

I'm trying to solve a step of a three-dimensional master timetabling model, which involves periods (5), courses (19) and locations (8).
My problem is initializing these variables with a 3D array in Gekko. Without this initialization the algorithm doesn't converge, even after running for more than 15 minutes and 1000 iterations.
When I try to initialize them, this error appears:
"
raise Exception(response)
Exception: #error: Equation Definition
Equation without an equality (=) or inequality (>,<)
true
STOPPING...
"
How can I fix this problem? Follows a version of my code:
import numpy as np
from gekko import GEKKO
# Input data
# Schedule of periods and courses
sched = np.array([ [0, 1, 0, 0, 1], [0, 0, 1, 1, 0], [0, 0, 1, 1, 0], \
[0, 0, 0, 0, 1], [1, 0, 0, 0, 1], [0, 0, 0, 1, 1], [0, 1, 1, 0, 0], \
[1, 0, 0, 1, 0], [0, 1, 0, 0, 1], [1, 1, 0, 0, 0], [0, 1, 1, 0, 0], \
[0, 1, 1, 0, 0], [1, 0, 0, 1, 0], [1, 0, 0, 1, 0], [0, 0, 1, 0, 1], \
[1, 0, 1, 0, 0], [0, 1, 0, 1, 0], [0, 0, 1, 1, 0], [0, 1, 0, 0, 1] ], dtype=np.int64)
# Initial allocation of all periods, courses and locations
alloc=np.array([0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,\
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,\
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,\
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,\
0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,\
0,0,0,0,0,0,1,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,\
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,\
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,\
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,\
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,\
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,\
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,\
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,\
0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,\
0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,\
0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,\
0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,\
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], dtype=np.int64)
# Number of students enrolled in each course
enrol = np.array([ 60, 60, 60, 40, 40, 110, 120, 50, 60, 55, 50, \
55, 40, 64, 72, 50, 50, 55, 55], dtype=np.float64)
# Capacity of each location (classroom)
capac = np.array([ 60, 60, 120, 60, 80, 60, 60, 65], dtype=np.float64)
# Total costs of using each location
costs = np.array([ 9017.12, 9017.12, 12050.24, 9017.12, 9413.68, 9017.12, \
9017.12, 9188.96 ])
# Estimated cost of each location by period and student
ecost = np.repeat(np.array([[costs[i]*pow(enrol[j]*5,-1) for j in range(19)] for i in range(8)]), 5)
# The model construction
m = GEKKO()
# Constant arrays
x = m.Array(m.Const,(19,5))
y = m.Array(m.Const,(8,19,5))
N = m.Array(m.Const,(19))
C = m.Array(m.Const,(8))
Ec = m.Array(m.Const,(8,19,5))
Ecy = m.Array(m.Const,(8,19,5))
Alt = m.Array(m.Const,(8,19,5))
for k in range(5):
    for j in range(19):
        N[j] = enrol[j]
        x[j,k] = sched[j,k]
        for i in range(8):
            C[i] = capac[i]
            Ec[i,j,k] = ecost[k+j*5+i*19*5]
            y[i,j,k] = alloc[k+j*5+i*19*5]
            Ecy[i,j,k] = Ec[i,j,k]*y[i,j,k]
            if sched[j,k]==1:
                Alt[i,j,np.where(sched[j,:]==1)[0][0]]=-sched[j,k]*(1-sum(sched[j,:]))
                if sum(sched[j,:])==2:
                    Alt[i,j,np.where(sched[j,:]==1)[0][1]]=sched[j,k]*(1-sum(sched[j,:]))
            else:
                Alt[i,j,k]=0
# Initialize the variable z with the initial value y:
# These commented approaches produce the error.
z = m.Array(m.Var,(8,19,5),lb=0,ub=1,integer=True)
#for i in range(8):
#    for j in range(19):
#        for k in range(5):
#            z[i,j,k] = y[i,j,k]
# nor
#z = m.Array(m.Var,(8,19,5),value=y,lb=0,ub=1,integer=True)
# Intermediate equations
Ecz = m.Array(m.Var,(8,19,5),lb=0)
Altz = m.Array(m.Var,(8,19))
for i in range(8):
    for j in range(19):
        Altz[i,j]=m.Intermediate(m.sum(Alt[i,j,:]*z[i,j,:]))
        for k in range(5):
            Ecz[i,j,k]=m.Intermediate(Ec[i,j,k]*z[i,j,k])
# Constraints
m.Equation(m.sum(m.sum(m.sum(Ecz)))<=m.sum(m.sum(m.sum(Ecy))))
for j in range(19):
    for k in range(5):
        m.Equation(m.sum(z[:,j,k])==x[j,k])
for i in range(8):
    for k in range(5):
        m.Equation(m.sum(z[i,:,k])==m.sum(y[i,:,k]))
for i in range(8):
    for j in range(19):
        m.Equation(m.sum((C[i]/N[j]-x[j,:])*z[i,j,:])>=0)
# Objective: to minimize the quantity of courses allocated in different locations
# Example: with the solution y, I have 12 courses in different locations in the periods
# print(sum([sum(Alt[i,j,:]*y[i,j,:])**2 for j in range(19) for i in range(8)])/2)
for i in range(8):
    for j in range(19):
        m.Obj(Altz[i,j]**2/2)
# Options and final results
m.options.SOLVER=1
m.options.IMODE=2
m.solve()
print(z)
print(m.options.OBJFCNVAL)
Note: My original problem has 20 periods, 171 courses, and 18 locations.
Use z[i,j,k].value = y[i,j,k] to give an initial guess for z. Using z[i,j,k] = y[i,j,k] redefines z entries as floating point numbers instead of gekko variable types.
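The difference between the two assignments can be illustrated without Gekko. In the sketch below, FakeVar is a hypothetical stand-in for a solver variable (it is not part of Gekko), used only to show that setting an attribute keeps the object in the array while element assignment replaces it with a plain number.

```python
import numpy as np

class FakeVar:
    """Hypothetical stand-in for a solver variable (not a Gekko class)."""
    def __init__(self, value=0.0):
        self.value = value

y = np.array([[0.0, 1.0], [1.0, 0.0]])

# Object array of variables, similar in spirit to what m.Array returns
z = np.empty(y.shape, dtype=object)
for idx in np.ndindex(*z.shape):
    z[idx] = FakeVar()

# Setting the attribute only changes the initial guess; the objects survive
for idx in np.ndindex(*y.shape):
    z[idx].value = y[idx]
kept = all(isinstance(v, FakeVar) for v in z.flat)

# Assigning into the array throws the variable objects away
z_bad = z.copy()
for idx in np.ndindex(*y.shape):
    z_bad[idx] = y[idx]
replaced = all(isinstance(v, float) for v in z_bad.flat)

print(kept, replaced)
```

Once the entries are plain numbers, an expression built from them no longer contains any variable, which is consistent with the "Equation without an equality" error reported in the question.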
One other issue is that the variables Ecz and Altz are first defined as variables with m.Var and then overridden as Intermediates. Instead, try allocating them as empty object arrays and assigning the intermediates to them:
Ecz = np.empty((8,19,5),dtype=object)
Altz = np.empty((8,19),dtype=object)
Use flatten() to simplify the summation of all elements of the 3 dimensional array.
m.Equation(m.sum(Ecz.flatten())<=sum(Ecy.flatten()))
The constant arrays can be defined as numpy arrays to avoid additional symbolic processing by Gekko. This speeds up the model compile time but has no effect on the final solution.
x = np.empty((19,5))
y = np.empty((8,19,5))
N = np.empty((19))
C = np.empty((8))
Ec = np.empty((8,19,5))
Ecy = np.empty((8,19,5))
Alt = np.empty((8,19,5))
The IMODE should be 3 for optimization. IMODE=2 is for parameter regression. IMODE=2 should also work for this problem but 3 is the correct option because you aren't trying to fit to data.
m.options.IMODE=3
Try using IPOPT to obtain an initial non-integer solution and then use APOPT to find an integer solution.
m.solver_options = ['minlp_gap_tol 1.0e-2',\
'minlp_maximum_iterations 10000',\
'minlp_max_iter_with_int_sol 500',\
'minlp_branch_method 1']
Mixed Integer Nonlinear Programming (MINLP) problems can be challenging to solve so you may need to use some of the solver options to speed up the solution. Try minlp_branch_method 1 to help the solver find an initial integer solution to do better pruning. The gap tolerance can also help to speed up the solution if a sub-optimal solution is okay. Below is the complete script. Consider using remote=False to run locally instead of using the public servers, especially for large optimization problems.
import numpy as np
from gekko import GEKKO
# Input data
# Schedule of periods and courses
sched = np.array([ [0, 1, 0, 0, 1], [0, 0, 1, 1, 0], [0, 0, 1, 1, 0], \
[0, 0, 0, 0, 1], [1, 0, 0, 0, 1], [0, 0, 0, 1, 1], [0, 1, 1, 0, 0], \
[1, 0, 0, 1, 0], [0, 1, 0, 0, 1], [1, 1, 0, 0, 0], [0, 1, 1, 0, 0], \
[0, 1, 1, 0, 0], [1, 0, 0, 1, 0], [1, 0, 0, 1, 0], [0, 0, 1, 0, 1], \
[1, 0, 1, 0, 0], [0, 1, 0, 1, 0], [0, 0, 1, 1, 0], [0, 1, 0, 0, 1] ], dtype=np.int64)
# Initial allocation of all periods, courses and locations
alloc=np.array([0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,\
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,\
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,\
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,\
0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,\
0,0,0,0,0,0,1,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,\
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,\
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,\
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,\
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,\
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,\
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,\
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,\
0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,\
0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,\
0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,\
0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,\
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], dtype=np.int64)
# Number of students enrolled in each course
enrol = np.array([ 60, 60, 60, 40, 40, 110, 120, 50, 60, 55, 50, \
55, 40, 64, 72, 50, 50, 55, 55], dtype=np.float64)
# Capacity of each location (classroom)
capac = np.array([ 60, 60, 120, 60, 80, 60, 60, 65], dtype=np.float64)
# Total costs of using each location
costs = np.array([ 9017.12, 9017.12, 12050.24, 9017.12, 9413.68, 9017.12, \
9017.12, 9188.96 ])
# Estimated cost of each location by period and student
ecost = np.repeat(np.array([[costs[i]*pow(enrol[j]*5,-1) for j in range(19)] for i in range(8)]), 5)
# The model construction
m = GEKKO(remote=True)
# Constant arrays
x = np.empty((19,5))
y = np.empty((8,19,5))
N = np.empty((19))
C = np.empty((8))
Ec = np.empty((8,19,5))
Ecy = np.empty((8,19,5))
Alt = np.empty((8,19,5))
for k in range(5):
    for j in range(19):
        N[j] = enrol[j]
        x[j,k] = sched[j,k]
        for i in range(8):
            C[i] = capac[i]
            Ec[i,j,k] = ecost[k+j*5+i*19*5]
            y[i,j,k] = alloc[k+j*5+i*19*5]
            Ecy[i,j,k] = Ec[i,j,k]*y[i,j,k]
            if sched[j,k]==1:
                Alt[i,j,np.where(sched[j,:]==1)[0][0]]=-sched[j,k]*(1-sum(sched[j,:]))
                if sum(sched[j,:])==2:
                    Alt[i,j,np.where(sched[j,:]==1)[0][1]]=sched[j,k]*(1-sum(sched[j,:]))
            else:
                Alt[i,j,k]=0
# Initialize the variable z with the initial value y
# by setting the .value attribute of each Gekko variable:
z = m.Array(m.Var,(8,19,5),lb=0,ub=1,integer=True)
for i in range(8):
    for j in range(19):
        for k in range(5):
            z[i,j,k].value = y[i,j,k]
# Intermediate equations
Ecz = np.empty((8,19,5),dtype=object)
Altz = np.empty((8,19),dtype=object)
for i in range(8):
    for j in range(19):
        Altz[i,j]=m.Intermediate(m.sum(Alt[i,j,:]*z[i,j,:]))
        for k in range(5):
            Ecz[i,j,k]=m.Intermediate(Ec[i,j,k]*z[i,j,k])
# Constraints
m.Equation(m.sum(Ecz.flatten())<=sum(Ecy.flatten()))
for j in range(19):
    for k in range(5):
        m.Equation(m.sum(z[:,j,k])==x[j,k])
for i in range(8):
    for k in range(5):
        m.Equation(m.sum(z[i,:,k])==m.sum(y[i,:,k]))
for i in range(8):
    for j in range(19):
        m.Equation(m.sum((C[i]/N[j]-x[j,:])*z[i,j,:])>=0)
# Objective: to minimize the quantity of courses allocated in different locations
# Example: with the solution y, I have 12 courses in different locations in the periods
# print(sum([sum(Alt[i,j,:]*y[i,j,:])**2 for j in range(19) for i in range(8)])/2)
for i in range(8):
    for j in range(19):
        m.Obj(Altz[i,j]**2/2)
# Options and final results
m.options.IMODE=3
# Initialize with IPOPT
m.options.SOLVER=3
m.solve()
# Integer solution with APOPT
m.options.SOLVER=1
m.solver_options = ['minlp_gap_tol 1.0e-2',\
'minlp_maximum_iterations 10000',\
'minlp_max_iter_with_int_sol 500',\
'minlp_branch_method 1']
m.solve()
print(z)
print(m.options.OBJFCNVAL)

Is there any Softmax implementation with sections along the dim (blocky Softmax) in PyTorch?

For example, given logits, dim, and boundary,
boundary = torch.tensor([[0, 3, 4, 8, 0],
                         [1, 3, 5, 7, 9]])
# representing sections look like:
# [[00012222_]
# [_00112233]
# in shape: (2, 9)
# (sections cannot be sliced)
logits = torch.rand(2, 9, 100)
result = blocky_softmax(logits, dim = 1, boundary = boundary)
# result[:, :, 0] may look like:
# [[0.33, 0.33, 0.33, 1.00, 0.25, 0.25, 0.25, 0.25, 0.0 ]
# [0.0, 0.50, 0.50, 0.50, 0.50, 0.50, 0.50, 0.50, 0.50]]
# the other 99 slices look similar, with each block summing to 1.
We want the Softmax to be applied along dim = 1, with the sections also defined along this dim.
My current implementation with PyTorch uses a for loop. It is slow and costs too much memory.
It looks like this:
import torch
def blocky_softmax(logits, splits, map_inf_to = None):
    _, batch_len, _ = logits.shape
    exp_logits = logits.exp() # [2, 9, 100]
    batch_seq_idx = torch.arange(batch_len, device = logits.device)[None, :]
    base = torch.zeros_like(logits)
    _, n_blocks = splits.shape
    for nid in range(1, n_blocks):
        start = splits[:, nid - 1, None]
        end = splits[:, nid, None]
        # Positions belonging to the current block: start <= idx < end
        area = batch_seq_idx >= start
        area &= batch_seq_idx < end
        area.unsqueeze_(dim = 2)
        # Sum exp(logits) over the block, then spread that sum back over the block
        blocky_z = (area * exp_logits).sum(dim = 1, keepdim = True)
        blocky_z = area * blocky_z
        base = base + blocky_z
    if map_inf_to is not None:
        good_base = base > 0
        ones = torch.ones_like(base)
        base = torch.where(good_base, base, ones)
        exp_logits = torch.where(good_base, exp_logits, ones * map_inf_to)
    return exp_logits / base
This implementation is slowed down, and its memory use inflated, by a factor of n_blocks. However, the sections could be processed in parallel.
If there is no off-the-shelf function, should I write a CUDA/C++ extension? I hope you can help with my issue.
As a further generalization, I would like to allow discontinuities in the boundary/sections.
sections = torch.tensor([[ 0, 0, 0, -1, 2, 3, 2, 3, 0, 3],
                         [-1, 0, 0, 1, 2, 1, 2, 1, -1, 1]])
# [[000_232303]
# [_0012121_1]]
Thank you for reading:)
I realize that scatter_add and gather perfectly solve the problem.
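The post does not include the final code, so the following is only a sketch of how scatter_add and gather could be combined for this, under some assumptions: sections are given as a per-position block id of shape (B, L) as in the generalized example, -1 marks positions that belong to no block (these get output 0), and the function signature is made up for illustration.

```python
import torch

def blocky_softmax(logits, sections):
    """Softmax along dim 1, computed separately inside each section.

    logits:   float tensor of shape (B, L, D)
    sections: long tensor of shape (B, L); -1 marks positions in no section
    """
    valid = (sections >= 0).unsqueeze(-1)                        # (B, L, 1)
    n_blocks = int(sections.max().item()) + 1
    # Route unused positions to an extra dummy block so all indices are >= 0
    idx = torch.where(sections >= 0, sections,
                      torch.full_like(sections, n_blocks))
    idx = idx.unsqueeze(-1).expand_as(logits).contiguous()       # (B, L, D)

    # Subtracting a per-(batch, feature) constant leaves every block's softmax unchanged
    shifted = logits - logits.max(dim=1, keepdim=True).values
    exp_logits = shifted.exp() * valid                           # zero out unused positions

    # scatter_add: accumulate exp(logits) into one slot per block
    block_sums = logits.new_zeros(logits.shape[0], n_blocks + 1, logits.shape[2])
    block_sums.scatter_add_(1, idx, exp_logits)

    # gather: read each position's block sum back out
    denom = block_sums.gather(1, idx)
    denom = torch.where(valid, denom, torch.ones_like(denom))    # avoid 0/0 on unused positions
    return exp_logits / denom

sections = torch.tensor([[ 0, 0, 0, -1, 2, 3, 2, 3, 0, 3],
                         [-1, 0, 0, 1, 2, 1, 2, 1, -1, 1]])
logits = torch.rand(2, 10, 100)
result = blocky_softmax(logits, sections)
print(result.shape)
```

This runs in a single pass regardless of the number of blocks. One caveat of the sketch: if all logits in a block are far below the maximum of their column, the exponentials can underflow to zero, so inputs with a very wide range may need a per-block maximum instead.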

How to create 2-D array from 3-D Numpy array?

I have a 3 dimensional Numpy array corresponding to an RGB image. I need to create a 2 dimensional Numpy array from it such that if any pixel in the R, G, or B channel is 1, then the corresponding pixel in the 2-D array is 255.
I know how to use something like a list comprehension on a Numpy array, but the result is the same shape as the original array. I need the new shape to be 2-D.
Ok, assuming you want the output pixel to be 0 where it shouldn't be 255 and your input is MxNx3.
RGB = RGB == 1 # you can skip this if your original (RGB) contains only 0's and 1's anyway
out = np.where(np.logical_or.reduce(RGB, axis=-1), 255, 0)
One approach could be with using any() along the third dim and then multiplying by 255, so that the booleans are automatically upscaled to int type, like so -
(img==1).any(axis=2)*255
Sample run -
In [19]: img
Out[19]:
array([[[1, 8, 1],
[2, 4, 7]],
[[4, 0, 6],
[4, 3, 1]]])
In [20]: (img==1).any(axis=2)*255
Out[20]:
array([[255, 0],
[ 0, 255]])
Runtime test -
In [45]: img = np.random.randint(0,5,(1024,1024,3))
# @Paul Panzer's soln
In [46]: %timeit np.where(np.logical_or.reduce(img==1, axis=-1), 255, 0)
10 loops, best of 3: 22.3 ms per loop
# @nanoix9's soln
In [47]: %timeit np.apply_along_axis(lambda a: 255 if 1 in a else 0, 0, img)
10 loops, best of 3: 40.1 ms per loop
# Posted soln here
In [48]: %timeit (img==1).any(axis=2)*255
10 loops, best of 3: 19.1 ms per loop
Additionally, we could convert to np.uint8 and then multiply it with 255 for some further performance boost -
In [49]: %timeit (img==1).any(axis=2).astype(np.uint8)*255
100 loops, best of 3: 18.5 ms per loop
And more, if we work with individual slices along the third dim -
In [68]: %timeit ((img[...,0]==1) | (img[...,1]==1) | (img[...,2]==1))*255
100 loops, best of 3: 7.3 ms per loop
In [69]: %timeit ((img[...,0]==1) | (img[...,1]==1) | (img[...,2]==1)).astype(np.uint8)*255
100 loops, best of 3: 5.96 ms per loop
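The variants timed above can be checked for agreement on the small array from the sample run. This is a minimal sketch; the variable names via_any, via_reduce and via_slices are made up for illustration.

```python
import numpy as np

img = np.array([[[1, 8, 1],
                 [2, 4, 7]],
                [[4, 0, 6],
                 [4, 3, 1]]])

# any() along the channel axis, scaled to 255
via_any = (img == 1).any(axis=2) * 255

# logical_or reduction along the channel axis
via_reduce = np.where(np.logical_or.reduce(img == 1, axis=-1), 255, 0)

# per-channel slices combined with |, stored as uint8
via_slices = ((img[..., 0] == 1) | (img[..., 1] == 1) | (img[..., 2] == 1)).astype(np.uint8) * 255

print(via_any)
print(np.array_equal(via_any, via_reduce), np.array_equal(via_any, via_slices))
```

The uint8 variant is also the natural choice if the result is going to be used as an 8-bit image.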
Use apply_along_axis, e.g. (here the channels are along axis 0):
In [28]: import numpy as np
In [29]: np.random.seed(10)
In [30]: img = np.random.randint(2, size=12).reshape(3, 2, 2)
In [31]: img
Out[31]:
array([[[1, 1],
[0, 1]],
[[0, 1],
[1, 0]],
[[1, 1],
[0, 1]]])
In [32]: np.apply_along_axis(lambda a: 255 if 1 in a else 0, 0, img)
Out[32]:
array([[255, 255],
[255, 255]])
See the NumPy documentation for details.
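The axis passed to apply_along_axis has to be the channel axis. The example above stores the channels along axis 0; for an image laid out as MxNx3, as assumed in the other answers, the channel axis is the last one. A small sketch of the difference follows, where has_one is a made-up helper name.

```python
import numpy as np

def has_one(a):
    # Hypothetical helper: 255 if the 1-D slice contains a 1, else 0
    return 255 if 1 in a else 0

# Channel-last image of shape (2, 2, 3)
img = np.array([[[1, 8, 1],
                 [2, 4, 7]],
                [[4, 0, 6],
                 [4, 3, 1]]])

out_last = np.apply_along_axis(has_one, -1, img)    # reduces over the 3 channels
out_first = np.apply_along_axis(has_one, 0, img)    # reduces over image rows instead

print(out_last.shape, out_first.shape)
print(out_last)
```

With axis=-1 the result has the image's 2-D shape; with axis=0 on the same array the reduction runs over rows and the channel axis is left in the output.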
