Is it possible to implement the following problem in mystic?
The problem minimizes a sum of squares between two (10000 by 40) matrices: Σ(X-A)^2 where X is a concatenation of four matrices (10000 by 10) and each segment is weighted (W) individually. There is also a constraint where the sum of the weights must equal 1 i.e. (W1 + W2 + W3 + W4 = 1). I'm currently using the SLSQP method in scipy optimize to get the optimal weight values but I want to determine if mystic can improve its performance. I also need to retrieve the final weight values.
from scipy.optimize import minimize
import numpy

def objective(W, X1, X2, X3, X4, A):
    W1, W2, W3, W4 = W
    # Concatenate the weighted segments side by side so X is (10000, 40), like A
    X = numpy.hstack((W1*X1, W2*X2, W3*X3, W4*X4))
    return numpy.sum((X-A)**2)

def constraint1(W):
    # Equality constraint: the weights must sum to 1
    return W[0] + W[1] + W[2] + W[3] - 1

x0 = [[0.25, 0.25, 0.25, 0.25]]
cons = {'type': 'eq', 'fun': constraint1}

# Random data only used for purposes of example
segment_1 = numpy.random.rand(10000, 10)
segment_2 = numpy.random.rand(10000, 10)
segment_3 = numpy.random.rand(10000, 10)
segment_4 = numpy.random.rand(10000, 10)
A = numpy.random.rand(10000, 40)

sol = minimize(objective, x0[0], args=(segment_1, segment_2, segment_3, segment_4, A), method='SLSQP', constraints=cons)
print(sol.x)
I'm very new to mystic so take my advice/questions with a grain of salt. I think this should be easily doable. You can almost use the same syntax, just replace the minimize function from scipy with a mystic minimizer.
Mystic offers a few minimal-interface minimizers that are called in almost the same way as the minimize function from scipy.
Some examples are diffev2, sparsity, buckshot, lattice, fmin and fmin_powell. Each of these functions applies a different minimization algorithm, so you can just insert whichever one you'd like to use. For more information you should check https://mystic.readthedocs.io/en/latest/mystic.html#minimal-interface.
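Whichever minimizer is plugged in, most of the time goes into evaluating the objective on the full matrices. Here is a minimal sketch of one way to make each evaluation cheap, assuming X is the side-by-side concatenation of the weighted segments as described in the question. Since each weight scales one block, the sum of squares expands to Σ_i (W_i^2 ‖X_i‖^2 - 2 W_i ⟨X_i, A_i⟩) + ‖A‖^2, and the block sums can be computed once. The names q, c, a2, objective_full and objective_fast are my own, and the data is smaller than in the question to keep the example quick.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
segments = [rng.random((1000, 10)) for _ in range(4)]
A = rng.random((1000, 40))
A_blocks = np.hsplit(A, 4)  # the 10-column block of A matching each segment

# One-off reductions over the big matrices
q = np.array([np.sum(X * X) for X in segments])                    # ||X_i||^2
c = np.array([np.sum(X * B) for X, B in zip(segments, A_blocks)])  # <X_i, A_i>
a2 = np.sum(A * A)                                                 # ||A||^2

def objective_full(W):
    # Direct evaluation on the full matrices, for comparison
    X = np.hstack([w * X for w, X in zip(W, segments)])
    return np.sum((X - A) ** 2)

def objective_fast(W):
    # Same value, computed from the precomputed scalars only
    W = np.asarray(W)
    return np.dot(q, W ** 2) - 2.0 * np.dot(c, W) + a2

cons = {'type': 'eq', 'fun': lambda W: np.sum(W) - 1.0}
sol = minimize(objective_fast, [0.25] * 4, method='SLSQP', constraints=cons)
print(sol.x, sol.x.sum())
```

The same objective_fast could be handed to any other minimizer, since it takes the weight vector as its only argument.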
Cheers
I would like to project a tensor into a space with an additional dimension.
I tried
torch.nn.Linear(
    in_features=num_inputs,
    out_features=(num_inputs, num_additional),
)
But this results in an error.
A workaround would be to use
torch.nn.Linear(
    in_features=num_inputs,
    out_features=num_inputs*num_additional,
)
and then change the view of the output:
output.view(batch_size, num_inputs, num_additional)
But I imagine this workaround will get tricky to read, especially when a projection into more than one additional dimension is desired.
Is there a more direct way to code this operation?
Perhaps the source code for Linear ( https://pytorch.org/docs/stable/_modules/torch/nn/modules/linear.html#Linear ) could be changed to accept more dimensions for the weight and bias initialization; F.linear seems like it would need to be replaced with a different function.
IMO the workaround you provided is already clear enough. However, if you want to express this as a single operation, you can always write your own module by subclassing torch.nn.Linear:
import numpy as np
import torch

class MultiDimLinear(torch.nn.Linear):
    def __init__(self, in_features, out_shape, **kwargs):
        self.out_shape = out_shape
        # Flatten the requested output shape into a plain Python int
        out_features = int(np.prod(out_shape))
        super().__init__(in_features, out_features, **kwargs)

    def forward(self, x):
        out = super().forward(x)
        # Restore the extra output dimensions after the flat projection
        return out.reshape((len(x), *self.out_shape))

if __name__ == '__main__':
    tmp = torch.empty((32, 10))
    linear = MultiDimLinear(in_features=10, out_shape=(10, 10))
    out = linear(tmp)
    print(out.shape)  # (32, 10, 10)
Another way would be to use torch.einsum
https://pytorch.org/docs/stable/generated/torch.einsum.html
torch.einsum lets you choose which dimensions are summed over in a tensor-to-tensor multiplication, so several separate multiplications can be written as a single call. (I do not know whether this necessarily results in GPU efficiency, i.e. whether the operations still occur in the same kernel. In fact, it may be slower: https://github.com/pytorch/pytorch/issues/32591 )
The way this works is to initialize the weight and bias tensors directly (look at the source code for the torch linear layer for that code).
Say that the input (X) has dimensions (a, b), where a is the batch size.
Say that you want to pass this input through a series of classifiers, represented in a single weight tensor (W) with dimensions (c, d, e), where c is the number of classifiers, d is the number of input features (so it must equal b), and e is the number of classes for each classifier.
import torch
x = torch.arange(2*4).view(2, 4)
w = torch.arange(5*4*2).view(5, 4, 2)
torch.einsum('ab, cbe -> ace', x, w)
In the last line, a and b are the dimensions of the input as mentioned above. The tricky part may be that c, b, and e are the dimensions of the classifiers' weight tensor: I didn't use d, I used b instead. That is because the multiplication (and the summation) happens along that dimension for both the input tensor and the weight tensor. So that's why the left side of the einsum equation is ab, cbe. The right side of the einsum equation simply lists the dimensions to exclude from summation.
The final dimensions we want are (a, c, e). a is the batch size, c is the number of classifiers, and e is the number of classes for each classifier. We do not want to sum over those dimensions, so to preserve their separation, the right side of the equation is ace.
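To make the index bookkeeping concrete, here is a small sketch that compares the einsum call with applying each classifier separately and stacking the results. The random tensors and the sizes (2, 4) and (5, 4, 3) are just assumptions for illustration.

```python
import torch

torch.manual_seed(0)
x = torch.randn(2, 4)      # (a, b): batch of 2, 4 input features
w = torch.randn(5, 4, 3)   # (c, b, e): 5 classifiers, 4 inputs, 3 classes each

# b appears on the left but not on the right, so it is summed over
out = torch.einsum('ab,cbe->ace', x, w)

# Reference: one ordinary matrix product per classifier, stacked along dim 1
ref = torch.stack([x @ w[c] for c in range(w.shape[0])], dim=1)

print(out.shape)
print(torch.allclose(out, ref, atol=1e-6))
```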
For those unfamiliar with einsum, this will be harder to read than the workaround I created (though I highly recommend learning it, because it gets very easy and intuitive very fast even though it's a bit tricky at first https://www.youtube.com/watch?v=pkVwUVEHmfI )
However, for parallelizing certain operations (especially on GPU), it seems that einsum is the only way to do it. For example, say that in my previous example I didn't want to use a classification head yet; I just wanted to project to multiple dimensions.
import torch
x = torch.arange(2*4).view(2, 4)
w = torch.arange(5*4*4).view(5, 4, 4)
y = torch.einsum('ab, cbe -> ace', x, w)
And say I do a few other operations to y, perhaps some non linear operations, activations, etc.
z = f(y)
z will still have the dimensions 2, 5, 4. Batch size two, 5 hidden states per batch, and the dimension of those hidden states are 4.
And then I want to apply a classifier to each separate tensor.
w2 = torch.arange(4*2).view(4, 2)
final = torch.einsum('fgh, hj -> fgj', z, w2)
Quick refresher: in the output, 2 is the batch size, 5 is the number of classifiers, and 2 is the number of outputs for each classifier.
The output dimensions, f, g, j (2, 5, 2) will not be summed across, and thus will be preserved in the output.
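As a quick check of this second step, the sketch below compares the einsum with a plain batched matrix product, which broadcasts the same (h, j) weight over the first two dimensions. The random z stands in for the output of the unspecified f(y) above and is only an assumption for illustration.

```python
import torch

torch.manual_seed(0)
z = torch.randn(2, 5, 4)   # (f, g, h): batch 2, 5 hidden states, each of size 4
w2 = torch.randn(4, 2)     # (h, j): shared classifier with 2 outputs

# h is summed over; f, g and j are kept
final = torch.einsum('fgh,hj->fgj', z, w2)

# Reference: matmul broadcasts w2 across the leading dimensions
ref = z @ w2

print(final.shape)
print(torch.allclose(final, ref, atol=1e-6))
```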
As cited in the github link, this may be slower than just using regular linear layers. There may be efficiencies in a very large number of parallel operations.
I have mostly worked with observational data where the treatment assignment was not randomized. In the past, I have used PSM and IPTW to balance the groups and then calculate the ATE.
My problem is:
Now I am working on a problem where the treatment assignment is randomized, meaning there won't be a confounding effect. But the treatment and control groups have different sizes. There's a bucket imbalance.
Should I just analyze the data as it is and run statistical significance and statistical power tests?
Or should I first balance the sizes of the treatment and control groups, using, let's say, covariate matching, and then run significance tests?
In general, you don't need equal group sizes to estimate treatment effects.
Unequal groups will not bias the estimate; they only affect its variance, namely by reducing the precision (recall that statistical power is limited by the smallest group, so unequal groups are less sample-efficient, but not categorically wrong).
You can further convince yourself with a simple simulation (code below). It shows that over repeated draws the estimate is not biased (both distributions of estimates overlay), but equal groups give better precision (smaller standard error).
import statsmodels.api as sm
import numpy as np
import pandas as pd
import seaborn as sns

n_trials = 100
# (n_control, n_treated) for each design; both designs have 200 units in total
balanced = {
    True: (100, 100),
    False: (190, 10),
}
effect = 2.0
res = []
for i in range(n_trials):
    np.random.seed(i)
    # One noise draw per trial, shared by both designs
    noise = np.random.normal(size=sum(balanced[True]))
    for is_balanced, ratio in balanced.items():
        t = np.array([0]*ratio[0] + [1]*ratio[1])
        y = effect * t + noise
        m = sm.OLS(y, t).fit()
        res.append((is_balanced, m.params[0], m.bse[0]))
res = pd.DataFrame(res, columns=["is_balanced", "beta", "se"])
g = sns.jointplot(
    x="se", y="beta",
    hue="is_balanced",
    data=res
)
# Annotate the true effect:
g.ax_joint.axhline(y=effect, color='grey', linestyle='--')
g.ax_joint.text(y=effect, x=res["se"].max(), s="True effect")
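The loss of precision can also be checked on paper. Here is a minimal sketch using the textbook standard error of a difference in means, sigma * sqrt(1/n_control + 1/n_treated), which assumes both groups share the same noise standard deviation sigma. The helper name se_diff_in_means is my own, and the group sizes are the ones from the simulation above.

```python
import math

def se_diff_in_means(n_control, n_treated, sigma=1.0):
    # Standard error of (mean_treated - mean_control) with common noise sd sigma
    return sigma * math.sqrt(1.0 / n_control + 1.0 / n_treated)

se_balanced = se_diff_in_means(100, 100)
se_unbalanced = se_diff_in_means(190, 10)

print(round(se_balanced, 4))    # 0.1414
print(round(se_unbalanced, 4))  # 0.3244
```

With the same 200 units in total, the 190/10 split gives a standard error more than twice as large, but nothing in the formula shifts the estimate itself.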
I am trying to multiply two complex matrices in PyTorch and it seems the torch.matmul function has not yet been added to the PyTorch library for complex numbers.
Do you have any recommendation or is there another method to multiply complex matrices in PyTorch?
Currently torch.matmul is not supported for complex tensors such as ComplexFloatTensor but you could do something as compact as the following code:
def matmul_complex(t1,t2):
    return torch.view_as_complex(torch.stack((t1.real @ t2.real - t1.imag @ t2.imag, t1.real @ t2.imag + t1.imag @ t2.real),dim=2))
When possible avoid using for loops as these will result in much slower implementations.
Vectorization is achieved by using built-in methods as demonstrated in the code I have attached.
For example, your code takes roughly 6.1s on CPU while the vectorized version takes only 101ms (~60 times faster) for 2 random complex matrices with dimensions 1000 X 1000.
Update:
Since PyTorch 1.7.0 (as @EduardoReis mentioned) you can do matrix multiplication between complex matrices similarly to real-valued matrices as follows:
t1 @ t2
(for t1, t2 complex matrices).
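As a sanity check, the real/imaginary decomposition can be compared against the native operator. This is a small sketch assuming PyTorch 1.7 or later, with random 3x4 and 4x5 matrices chosen only for illustration. It relies on the identity (A + iB)(C + iD) = (AC - BD) + i(AD + BC).

```python
import torch

torch.manual_seed(0)
t1 = torch.randn(3, 4, dtype=torch.cfloat)
t2 = torch.randn(4, 5, dtype=torch.cfloat)

# Four real matrix products, combined into real and imaginary parts
real = t1.real @ t2.real - t1.imag @ t2.imag
imag = t1.real @ t2.imag + t1.imag @ t2.real
manual = torch.complex(real, imag)

# Native complex matrix multiplication
native = t1 @ t2

# Compare as real tensors of shape (3, 5, 2)
same = torch.allclose(torch.view_as_real(manual), torch.view_as_real(native), atol=1e-5)
print(same)
```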
I implemented this function for torch.matmul for complex numbers using torch.mv and it's working fine for the time being:
def matmul_complex(t1, t2):
    m = t1.size(0)
    n = t2.size(1)
    # Compute the product one column at a time with torch.mv
    for i in range(n):
        if i == 0:
            t_total = torch.mv(t1, t2[:, i])
        else:
            t_total = torch.cat((t_total, torch.mv(t1, t2[:, i])), 0)
    # t_total holds the columns back to back, so reshape to (n, m) and transpose
    t_final = torch.reshape(t_total, (n, m)).T
    return t_final
I am new to PyTorch, so please correct me if I am wrong.
I'm seeing here that IMODE=3 is equivalent to the steady-state simulation (which I guess is IMODE=2) except that additional degrees of freedom are allowed.
How do I decide whether to use IMODE=3 instead of IMODE=2?
I'm doing optimization with IMODE=2, where I define the variables that the solver calculates to meet the constraints with m.Var and the others with m.Param. What changes do I need to make to the variables to use IMODE=3?
Niladri,
IMODE 2 is for steady state problems with multiple data points.
Here is an example:
from gekko import GEKKO
import numpy as np
xm = np.array([0,1,2,3,4,5])
ym = np.array([0.1,0.2,0.3,0.5,1.0,0.9])
m = GEKKO()
m.x = m.Param(value=np.linspace(-1,6))
m.y = m.Var()
m.options.IMODE=2
m.cspline(m.x,m.y,xm,ym)
m.solve(disp=False)
This is a Cubic Spline approximation with multiple data points. When you switch to IMODE 3, it is very similar but it only considers one instance of your model. All of the value properties should only have 1 value such as when you optimize the Cubic spline to find the maximum value.
p = GEKKO()
p.x = p.Var(value=1,lb=0,ub=5)
p.y = p.Var()
p.cspline(p.x,p.y,xm,ym)
p.Obj(-p.y)
p.solve(disp=False)
Here is additional information on IMODE:
https://apmonitor.com/wiki/index.php/Main/OptionApmImode
https://apmonitor.com/wiki/index.php/Main/Modes
https://gekko.readthedocs.io/en/latest/imode.html
Best regards,
John Hedengren
I am using scipy's curve_fit to fit a function and wanted to know if there is a way to tell it that the only possible entries are integers, not real numbers. Any ideas as to another way of doing this?
In its general form, an integer programming problem is NP-hard. There are some efficient heuristic or approximate algorithms for this problem, but none guarantees an exact optimal solution.
In scipy you may implement a grid search over the integer coefficients and use, say, curve_fit over the real parameters for the given integer coefficients. For the grid search, scipy has the brute function.
For example if y = a * x + b * x^2 + some-noise where a has to be integer this may work:
Generate some test data with a = 5 and b = -1.5:
import numpy as np

coef, n = [5, -1.5], 50
xs = np.linspace(0, 10, n)[:,np.newaxis]
xs = np.hstack([xs, xs**2])  # columns: x and x^2
noise = 2 * np.random.randn(n)
ys = np.dot(xs, coef) + noise
A function which given the integer coefficients fits the real coefficient using curve_fit method:
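def optfloat(intcoef, xs, ys):
    from scipy.optimize import curve_fit
    # brute passes a length-1 array; unwrap it to a scalar
    intcoef = np.ravel(intcoef)[0]
    def poly(xs, floatcoef):
        return np.dot(xs, [intcoef, floatcoef])
    popt, pcov = curve_fit(poly, xs, ys)
    # Norm of the residuals for this choice of integer coefficient
    errsqr = np.linalg.norm(poly(xs, popt) - ys)
    return dict(errsqr=errsqr, floatcoef=popt)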
A function which given the integer coefficients, uses the above function to optimize the float coefficient and returns the error:
def errfun(intcoef, *args):
xs, ys = args
return optfloat(intcoef, xs, ys)['errsqr']
Minimize errfun using scipy.optimize.brute to find optimal integer coefficient and call optfloat with the optimized integer coefficient to find the optimal real coefficient:
from scipy.optimize import brute
grid = [slice(1, 10, 1)] # grid search over 1, 2, ..., 9
# it is important to specify finish=None in below
intcoef = brute(errfun, grid, args=(xs, ys,), finish=None)
floatcoef = optfloat(intcoef, xs, ys)['floatcoef'][0]
Using this method I obtain [5.0, -1.50577] for the optimal coefficients, which is exact for the integer coefficient, and close enough for the real coefficient.
In general, the answer is No: scipy.optimize.curve_fit() and leastsq() that it is based on, and (AFAIK) all the other solvers in scipy.optimize work strictly on floating point numbers.
You could try increasing the value of epsfcn (which has a default value of numpy.finfo('double').eps ~ 2.e-16), which would be used as the initial step to all variables in the problem. The basic issue is that the fitting algorithm will adjust a floating point number, and if you do
int_var = int(float_var)
and the algorithm changes float_var from 1.0 to 1.00000001, it will see no difference in the result and decide that that value does not actually alter the fit metric.
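The flat spot is easy to reproduce. Below is a small sketch with a hypothetical one-parameter model y = int_var * x (my own toy example, not from the question): a tiny step in the float parameter leaves the fit metric exactly unchanged, while a step of a whole unit changes it.

```python
import numpy as np

def residual_sum(float_var, xs, ys):
    # The truncation hides any change smaller than the gap to the next integer
    int_var = int(float_var)
    return float(np.sum((int_var * xs - ys) ** 2))

xs = np.linspace(0.0, 1.0, 11)
ys = 3 * xs  # data generated with the integer coefficient 3

base = residual_sum(1.0, xs, ys)
nudged = residual_sum(1.0 + 1e-8, xs, ys)   # the kind of step a fitter takes
stepped = residual_sum(2.0, xs, ys)         # a full integer step

print(base == nudged)   # True
print(base == stepped)  # False
```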
Another approach would be to have a floating point parameter 'tmp_float_var' that is freely adjusted by the fitting algorithm but then in your objective function use
int_var = int(tmp_float_var / numpy.finfo('double').eps)
as the value for your integer variable. That might need a little tweaking, and might be a little unstable, but ought to work.