I'm currently working on an optimization problem and I'm stuck at some point.
Here is a simple case:
I have a set of p parameters in c combinations and try to optimize them on i inputs. This leads to a numpy array of shape (p, c, i). For each of them, I calculate an error leading to a second array of shape (c, i). Now I have to get rid of some combinations before running the next iteration.
For the case i = 1, I use the following code which works fine (arrays are just (p, c) and (c, )):
ensemble = np.vstack((ensemble, error))
# Remove some functions that do not fit
ensemble = ensemble[:, ensemble[p, :] < thres]
# Delete error column
function_ensemble = np.delete(function_ensemble, p, axis=0)
Now I try to generalize this for i > 1:
error[error >= 0.015] = 0
error = error[np.newaxis, :, :]
# Maybe keep them seperate
ensemble = np.concatenate((ensemble, error))
And thats where I'm stuck. What I'm currently thinking of is sorting my ensemble by the error and removing all entries where the error is 0 for all i. So that the ensemble gets smaller. However, as far as I know, np.sort does not work here. Maybe I could use structured arrays but I'm not sure if this would destroy code in other places.
Does anyone have an idea for my issue?
Edit
For case i=1 a running example with p=3 and c=100 could just be:
error = np.random.rand(100)
ensemble = np.ones([3, 100])
ensemble = np.vstack((ensemble, error))
ensemble = ensemble[:, ensemble[3, :] < 0.5]
ensemble = np.delete(ensemble, 3, axis=0)
The result in this case is a subset of my ensemble where the error is smaller than 0.5.
For case i=2 (multiple ensembles) an example with p=3 and c=100 could be:
error = np.random.rand(100, 2)
ensemble = np.ones([3, 100, 2])
error[error >= 0.015] = 0
error = error[np.newaxis, :, :]
ensemble = np.concatenate((ensemble, error))
Again, I want a subset of my ensembles containing all sets of parameters with an error < 0.5. This will require some padding since each ensemble i will have a different size afterwards.
However, the ultimate goal is to iterate and adjust the parameters until only one set remains ending up with an array of size (3,1,2). (Examles above do not show iteration).
Best, Julz
Related
I would like to project a tensor into a space with an additional dimension.
I tried
torch.nn.Linear(
in_features=num_inputs,
out_features=(num_inputs, num_additional),
)
But this results in an error
A workaround would be to
torch.nn.Linear(
in_features=num_inputs,
out_features=num_inputs*num_additional,
)
and then change the view the output
output.view(batch_size, num_inputs, num_additional)
But I imagine this workaround will get tricky to read, especially when a projection into more than one additional dimension is desired.
Is there a more direct way to code this operation?
Perhaps the source code for linear can be changed
https://pytorch.org/docs/stable/_modules/torch/nn/modules/linear.html#Linear
To accept more dimensions for the weight and bias initialization, and F.linear seems like it would need to be replaced with a different function.
IMO the workaround you provided is already clear enough. However, if you want to express this as a single operation, you can always write your own module by subclassing torch.nn.Linear:
import numpy as np
import torch
class MultiDimLinear(torch.nn.Linear):
def __init__(self, in_features, out_shape, **kwargs):
self.out_shape = out_shape
out_features = np.prod(out_shape)
super().__init__(in_features, out_features, **kwargs)
def forward(self, x):
out = super().forward(x)
return out.reshape((len(x), *self.out_shape))
if __name__ == '__main__':
tmp = torch.empty((32, 10))
linear = MultiDimLinear(in_features=10, out_shape=(10, 10))
out = linear(tmp)
print(out.shape) # (32, 10, 10)
Another way would be to use torch.einsum
https://pytorch.org/docs/stable/generated/torch.einsum.html
torch.einsum can prevent summation across dimensions in tensor to tensor multiplication operations. This can allow separate multiplication operations to happen in parallel. [ I do not know if this would necessarily result in GPU efficiency; if the operations are still occurring in the same kernel. In fact, it may be slower https://github.com/pytorch/pytorch/issues/32591 ]
How this would work is to directly initialize the weight and bias tensors (look at source code for the torch linear layer for that code)
Say that the input (X) has dimensions (a, b), where a is the batch size.
Say that you want to pass this input through a series of classifiers, represented in a single weight tensor (W) with dimensions (c, d, e), where c is the number of classifiers, and e is the number of classes for the classifier
import torch
x = torch.arange(2*4).view(2, 4)
w = torch.arange(5*4*6).view(5, 4, 2)
torch.einsum('ab, cbe -> ace', x, w)
in the last line, a and b are the dimensions of the input as mentioned above. What might be the tricky part is c, b, and e are the dimensions of the classifiers weight tensor; I didn't use d, I used b instead. That is because the vector multiplication is happening along that dimension for the inputs tensor and the weight tensor. So that's why the left side of the einsum equation is ab, cbe. The right side of the einsum equation is simply what dimensions to exclude from summation.
The final dimensions we want is (a, c, e). a is the batch size, c is the number of classifiers, and e is the number of classes for each classifier. We do not want to add those values, so to preserve their separation, the left side of the equation is ace.
For those unfamiliar with einsum, this will be harder to read than the word around I created (though I highly recommend learning it, because it gets very easy and intuitive very fast even though it's a bit tricky at first https://www.youtube.com/watch?v=pkVwUVEHmfI )
However, for paralyzing certain operations (especially on GPU), it seems that einsum is the only way to do it. For example so that in my previous example, I didn't want to use a classification head yet, I just wanted to project to multiple dimensions.
import torch
x = torch.arange(2*4).view(2, 4)
w = torch.arange(5*4*6).view(5, 4, 4)
y = torch.einsum('ab, cbe -> ace', x, w)
And say I do a few other operations to y, perhaps some non linear operations, activations, etc.
z = f(y)
z will still have the dimensions 2, 5, 4. Batch size two, 5 hidden states per batch, and the dimension of those hidden states are 4.
And then I want to apply a classifier to each separate tensor.
w2 = torch.arange(4*2).view(4, 2)
final = torch.einsum('fgh, hj -> fgj', z, w2)
Quick refresh, 2 is the batch size, 5 is the number of classifier, and 2 is the number of outputs for each classifier.
The output dimensions, f, g, j (2, 5, 2) will not be summed across, and thus will be preserved in the output.
As cited in the github link, this may be slower than just using regular linear layers. There may be efficiencies in a very large number of parallel operations.
Here Get intersecting rows across two 2D numpy arrays they got intersecting rows by using the function np.intersect1d. So i changed the function to use np.setdiff1d to get the set difference but it doesn't work properly. The following is the code.
def set_diff2d(A, B):
nrows, ncols = A.shape
dtype={'names':['f{}'.format(i) for i in range(ncols)],
'formats':ncols * [A.dtype]}
C = np.setdiff1d(A.view(dtype), B.view(dtype))
return C.view(A.dtype).reshape(-1, ncols)
The following data is used for checking the issue:
min_dis=400
Xt = np.arange(50, 3950, min_dis)
Yt = np.arange(50, 3950, min_dis)
Xt, Yt = np.meshgrid(Xt, Yt)
Xt[::2] += min_dis/2
# This is the super set
turbs_possible_locs = np.vstack([Xt.flatten(), Yt.flatten()]).T
# This is the subset
subset = turbs_possible_locs[np.random.choice(turbs_possible_locs.shape[0],50, replace=False)]
diffs = set_diff2d(turbs_possible_locs, subset)
diffs is supposed to have a shape of 50x2, but it is not.
Ok, so to fix your issue try the following tweak:
def set_diff2d(A, B):
nrows, ncols = A.shape
dtype={'names':['f{}'.format(i) for i in range(ncols)], 'formats':ncols * [A.dtype]}
C = np.setdiff1d(A.copy().view(dtype), B.copy().view(dtype))
return C
The problem was - A after .view(...) was applied was broken in half - so it had 2 tuple columns, instead of 1, like B. I.e. as a consequence of applying dtype you essentially collapsed 2 columns into tuple - which is why you could do the intersection in 1d in the first place.
Quoting after documentation:
"
a.view(some_dtype) or a.view(dtype=some_dtype) constructs a view of the array’s memory with a different data-type. This can cause a reinterpretation of the bytes of memory.
"
Src https://numpy.org/doc/stable/reference/generated/numpy.ndarray.view.html
I think the "reinterpretation" is exactly what happened - hence for the sake of simplicity I would just .copy() the array.
NB however I wouldn't square it - it's always A which gets 'broken' - whether it's an assignment, or inline B is always fine...
I am working on a new routine inside some codes based on OOP, and encountered a problem while modifying the array of the data (short example of the code is below).
Basically, this routine is about taking the array R, transposing it and then sorting it, and then filter out the data below the pre-determined value of thres. Then, I re-transpose back this array into its original dimension, and then plot each of its rows with the first element of T.
import numpy as np
import matplotlib.pyplot as plt
R = np.random.rand(3,8)
R = R.transpose() # transpose the random matrix
R = R[R[:,0].argsort()] # sort this matrix
print(R)
T = ([i for i in np.arange(1,9,1.0)],"temps (min)")
thres = float(input("Define the threshold of coherence: "))
if thres >= 0.0 and thres <= 1.0 :
R = R[R[:, 0] >= thres] # how to filter unwanted values? changing to NaN / zeros ?
else :
print("The coherence value is absurd or you're not giving a number!")
print("The final results are ")
print(R)
print(R.transpose())
R.transpose() # re-transpose this matrix
ax = plt.subplot2grid( (4,1),(0,0) )
ax.plot(T[0],R[0])
ax.set_ylabel('Coherence')
ax = plt.subplot2grid( (4,1),(1,0) )
ax.plot(T[0],R[1],'.')
ax.set_ylabel('Back-azimuth')
ax = plt.subplot2grid( (4,1),(2,0) )
ax.plot(T[0],R[2],'.')
ax.set_ylabel('Velocity\nkm/s')
ax.set_xlabel('Time (min)')
However, I encounter an error
ValueError: x and y must have same first dimension, but have shapes (8,) and (3,)
I comment the part of where I think the problem might reside (how to filter unwanted values?), but then the question remains.
How can I plot this two arrays (R and T) while still being able to filter out unwanted values below thres? Can I transform these unwanted values to zero or NaN and then successfully plot them? If yes, how can I do that?
Your help would be much appreciated.
With the help of a techie friend, the problem is simply resolved by keeping this part
R = R[R[:, 0] >= thres]
because removing unwanted elements is more preferable than changing them to NaN or zero. And then the problem with plotting is fixed by adding a slight modification in this part
ax.plot(T[0][:len(R[0])],R[0])
and also for the subsequent plotting part. This slices T into the same dimension as R.
I want to use python3 to build a zeroinflatedpoisson model. I found in library statsmodel the function statsmodels.discrete.count_model.ZeroInflatePoisson.
I just wonder how to use it. It seems I should do:
ZIFP(Y_train,X_train).fit().
But when I wanted to do prediction using X_test.
It told me the length of X_test doesn't fit X_train.
Or is there another package to fit this model?
Here is the code I used:
X1 = [random.randint(0,1) for i in range(200)]
X2 = [random.randint(1,2) for i in range(200)]
y = np.random.poisson(lam = 2,size = 100).tolist()
for i in range(100):y.append(0)
df['x1'] = x1
df['x2'] = x2
df['y'] = y
df_x = df.iloc[:,:-1]
x_train,x_test,y_train,y_test = train_test_split(df_x,df['y'],test_size = 0.3)
clf = ZeroInflatedPoisson(endog = y_train,exog = x_train).fit()
clf.predict(x_test)
ValueError:operands could not be broadcat together with shapes (140,)(60,)
also tried:
clf.predict(x_test,exog = np.ones(len(x_test)))
ValueError: shapes(60,) and (1,) not aligned: 60 (dim 0) != 1 (dim 0)
This looks like a bug to me.
As far as I can see:
If there are no explanatory variables, exog_infl, specified for the inflation model, then a array of ones is used to model a constant inflation probability.
However, if exog_infl in predict is None, then it uses the model.exog_infl which is an array of ones with the length equal to the training sample.
As work around specifying a 1-D array of ones of correct length in predict should work.
Try:
clf.predict(test_x, exog_infl=np.ones(len(test_x))
I guess the same problem will occur if exposure was used in the model, but is not explicitly specified in predict.
I ran into the same problem, landing me on this thread. As noted by Josef, it seems like you need to provide exog_infl with a 1-D array of ones of correct length to work.
However, the code Josef provided misses the 1-D array-part, so the full line required to generate the required array is actually
clf.predict(test_x, exog_infl=np.ones((len(test_x),1))
I want to make use of Theano's logistic regression classifier, but I would like to make an apples-to-apples comparison with previous studies I've done to see how deep learning stacks up. I recognize this is probably a fairly simple task if I was more proficient in Theano, but this is what I have so far. From the tutorials on the website, I have the following code:
def errors(self, y):
# check if y has same dimension of y_pred
if y.ndim != self.y_pred.ndim:
raise TypeError(
'y should have the same shape as self.y_pred',
('y', y.type, 'y_pred', self.y_pred.type)
)
# check if y is of the correct datatype
if y.dtype.startswith('int'):
# the T.neq operator returns a vector of 0s and 1s, where 1
# represents a mistake in prediction
return T.mean(T.neq(self.y_pred, y))
I'm pretty sure this is where I need to add the functionality, but I'm not certain how to go about it. What I need is either access to y_pred and y for each and every run (to update my confusion matrix in python) or to have the C++ code handle the confusion matrix and return it at some point along the way. I don't think I can do the former, and I'm unsure how to do the latter. I've done some messing around with an update function along the lines of:
def confuMat(self, y):
x=T.vector('x')
classes = T.scalar('n_classes')
onehot = T.eq(x.dimshuffle(0,'x'),T.arange(classes).dimshuffle('x',0))
oneHot = theano.function([x,classes],onehot)
yMat = T.matrix('y')
yPredMat = T.matrix('y_pred')
confMat = T.dot(yMat.T,yPredMat)
confusionMatrix = theano.function(inputs=[yMat,yPredMat],outputs=confMat)
def confusion_matrix(x,y,n_class):
return confusionMatrix(oneHot(x,n_class),oneHot(y,n_class))
t = np.asarray(confusion_matrix(y,self.y_pred,self.n_out))
print (t)
But I'm not completely clear on how to get this to interface with the function in question and give me a numpy array I can work with.
I'm quite new to Theano, so hopefully this is an easy fix for one of you. I'd like to use this classifer as my output layer in a number of configurations, so I could use the confusion matrix with other architectures.
I suggest using a brute force sort of a way. You need an output for a prediction first. Create a function for it.
prediction = theano.function(
inputs = [index],
outputs = MLPlayers.predicts,
givens={
x: test_set_x[index * batch_size: (index + 1) * batch_size]})
In your test loop, gather the predictions...
labels = labels + test_set_y.eval().tolist()
for mini_batch in xrange(n_test_batches):
wrong = wrong + int(test_model(mini_batch))
predictions = predictions + prediction(mini_batch).tolist()
Now create confusion matrix this way:
correct = 0
confusion = numpy.zeros((outs,outs), dtype = int)
for index in xrange(len(predictions)):
if labels[index] is predictions[index]:
correct = correct + 1
confusion[int(predictions[index]),int(labels[index])] = confusion[int(predictions[index]),int(labels[index])] + 1
You can find this kind of an implementation in this repository.