I'm using the simplest model available to build this test case in Node.js:
const model = tf.sequential();
model.add(tf.layers.dense({units: 1, inputShape: [1]}));
And I'm training it on the formula X+X=Y for testing purposes:
let xsData = [];
let ysData = [];
for (let i = 1; i < 17; i++) { // Please note the 16 iterations here!
    xsData.push(i);
    ysData.push(i + i);
}
const xs = tf.tensor2d(xsData, [xsData.length, 1]);
const ys = tf.tensor2d(ysData, [ysData.length, 1]);
await model.fit(xs, ys, {epochs: 500});
After that completes, I'm testing the model using the number ten:
model.predict(tf.tensor2d([10], [1, 1])).dataSync();
That gives me a value of around 20, which is correct (10+10=20).
Now for the problem: whenever I increase the iteration count to 17 or more, the model collapses. Testing the same number (10) now outputs values like -1.9837386463351284e+25, and even larger datasets result in Infinity and NaN.
Does anyone have a clue what is going on here? It would be great if someone could point me in the right direction. Thank you in advance.
Using SGD for regression can be tricky, since the outputs don't have an upper bound; that can lead to NaN values in the loss, in other words exploding gradients.
Changing the optimizer to Adam or RMSprop works most of the time.
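For example, a minimal sketch of compiling the model with Adam before fitting (the question's snippet doesn't show its compile step, and the 0.01 learning rate here is just an illustrative value):
// compile with Adam instead of SGD; 0.01 is an arbitrary example learning rate
model.compile({optimizer: tf.train.adam(0.01), loss: 'meanSquaredError'});
await model.fit(xs, ys, {epochs: 500});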
I am building an Actor-Critic neural network model in PyTorch in order to train an agent to play the game of Quoridor (hopefully). For this reason, I have a neural network with two heads: one for the actor output, which does a softmax over all the possible moves, and one for the critic output, which is just one neuron (for regressing the value of the input state).
Now, in Quoridor, most of the time not all moves will be legal, so I am wondering if I can exclude output neurons on the actor's head that correspond to illegal moves for the input state, e.g. by passing a list of indices of all the neurons that correspond to legal moves. Thus, I want to not sum those outputs in the denominator of the softmax.
Is there functionality like this in PyTorch (because I cannot find any)? Should I attempt to implement such a softmax myself (I'm kinda scared to; PyTorch probably knows best, and I've been advised to use LogSoftmax as well)?
Furthermore, do you think this approach of dealing with illegal moves is good? Or should I just let it guess illegal moves and penalize it (negative reward) for them, in the hope that eventually it will not pick illegal moves?
Or should I let the softmax run over all the outputs and then just set the illegal ones to zero? The rest won't sum to 1, but maybe I can solve that by plain normalization (i.e. dividing by the L2 norm)?
An easy solution would be to mask out illegal moves with a large negative value; this will practically force very low (log-)softmax values (example below).
# 3 dummy actions for a batch size of 2
>>> actions = torch.rand(2, 3)
>>> actions
tensor([[0.9357, 0.2386, 0.3264],
        [0.0179, 0.8989, 0.9156]])
# dummy mask assigning 0 to valid actions and 1 to invalid ones
>>> mask = torch.randint(low=0, high=2, size=(2, 3))
>>> mask
tensor([[1, 0, 0],
        [0, 0, 0]])
# set actions marked as invalid to very large negative value
>>> actions = actions.masked_fill_(mask.eq(1), value=-1e10)
>>> actions
tensor([[-1.0000e+10, 2.3862e-01, 3.2636e-01],
        [ 1.7921e-02, 8.9890e-01, 9.1564e-01]])
# softmax assigns no probability mass to illegal actions
>>> actions.softmax(dim=-1)
tensor([[0.0000, 0.4781, 0.5219],
        [0.1704, 0.4113, 0.4183]])
I'm not qualified to say whether this is a good idea, but I had the same one and ended up implementing it.
The code uses Rust's bindings for PyTorch (tch-rs), so it should be directly translatable to Python-based PyTorch.
/// As log_softmax(dim=1) on a 2d tensor, but takes a {0, 1} `filter` of the same shape as `xs`
/// and has the softmax only look at values where filter[idx] = 1.
///
/// The output is 0 where the filter is 0.
pub fn filtered_log_softmax(xs: &Tensor, filter: &Tensor) -> Tensor {
    // We are calculating `log_softmax(xs)` except that we only want to consider
    // the values of xs where the corresponding `filter` bit is set to 1.
    //
    // log_softmax on one element of the batch = for_each_i log(e^xs[i] / sum_j e^xs[j])
    //
    // To filter that we need to remove (zero out) elements that are being filtered, both after the log
    // is taken and before summing into the denominator. We can do this with two multiplications:
    //
    // filtered_log_softmax = for_each_i filter[i] * log(e^xs[i] / sum_j filter[j] * e^xs[j])
    //
    // This is mathematically correct, but it turns out there's a numeric stability trick we need;
    // without it we're seeing NaNs. Sourcing the trick from: https://stackoverflow.com/a/52132033
    //
    // We can do the same transformation here, and come out with the following expression:
    //
    // let xs_max = max_i xs[i]
    // for_each_i filter[i] * (xs[i] - xs_max - log(sum_j filter[j] * e^(xs[j] - xs_max)))
    //
    // Keep in mind that the actual implementation below is further vectorized over an initial batch dimension.
    let (xs_max, _) = xs.max_dim(1, true);
    let xs_offset = xs - xs_max;
    // TODO: Replace with Tensor::linalg_vecdot(&filter, &xs_offset.exp(), 1).log()
    // when we update tch-rs (linalg_vecdot is new in pytorch 1.13).
    let constant_sub = (filter * &xs_offset.exp()).sum_to_size(&[xs.size()[0], 1]).log();
    filter * (&xs_offset - constant_sub)
}
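For reference, a rough Python/PyTorch sketch of the same function (my own translation, not tested against the Rust version; the parameter name filter mirrors the Rust argument and shadows the Python builtin):
import torch

def filtered_log_softmax(xs: torch.Tensor, filter: torch.Tensor) -> torch.Tensor:
    # Subtract the per-row max for numerical stability.
    xs_max, _ = xs.max(dim=1, keepdim=True)
    xs_offset = xs - xs_max
    # Sum e^(xs - max) only where filter == 1, then take the log.
    constant_sub = (filter * xs_offset.exp()).sum(dim=1, keepdim=True).log()
    # Zero out filtered positions in the output as well.
    return filter * (xs_offset - constant_sub)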
I'm currently working on a use case using RandomForestRegressor. To get training and test data separately based on one column, let's say Home, the dataframe was split into a dictionary. I'm almost done with the modelling, but stuck on getting the feature importances for each of the keys in the dictionary (number of keys = 21). Please have a look at the code below:
hp = pd.get_dummies(hp)
hp = {i: g for i, g in hp.set_index(["Home"]).groupby(level=[0])}
feature = {}; feature_train = {}; feature_test = {}
target = {}; target_train = {}; target_test = {}; target_pred = {}
importances = {}
for k, v in hp.items():
    target[k] = np.array(v["HP"])
    feature[k] = v.drop(["HP", "Corr"], axis=1)
feature_list = list(feature[1].columns)
for k in feature:
    feature[k] = np.array(feature[k])
for k in feature:
    feature_train[k], feature_test[k], target_train[k], target_test[k] = train_test_split(
        feature[k], target[k], test_size=0.25, random_state=42)
What I've tried, after help from Random Forest Feature Importance Chart using Python:
for name, importance in zip(feature_list, list(rf.feature_importances_)):
    print(name, "=", importance)
but this prints the importances for only one of the dictionary keys (and I don't know which one). What I want is to get them printed for all the keys, stored in the dictionary "importances". Thanks in advance!
If I understand you correctly, you want feature importances for both the train and the test data.
That's not how it works: first the RandomForest is built from your training data, and after that operation it can calculate the importance of each feature based on how many times it was used to split the space (and how 'good' the splits were, e.g. how low the Gini impurity was, across many trees of course).
So you obtain feature importances for the training data; for the test data, the learned tree ensemble is only used to predict values.
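To get them for every key, one approach is to fit a separate model per key and store its importances in the "importances" dictionary from the question. A minimal sketch along those lines (the RandomForestRegressor hyperparameters are placeholder values; feature_train, target_train, feature_list and importances are the question's variables):
from sklearn.ensemble import RandomForestRegressor

rf = {}
for k in feature_train:
    rf[k] = RandomForestRegressor(n_estimators=100, random_state=42)
    rf[k].fit(feature_train[k], target_train[k])
    # one {feature name: importance} mapping per key
    importances[k] = dict(zip(feature_list, rf[k].feature_importances_))

for k in importances:
    print("Home =", k)
    for name, value in importances[k].items():
        print(" ", name, "=", value)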
I want to use Python 3 to build a zero-inflated Poisson model. I found the function statsmodels.discrete.count_model.ZeroInflatedPoisson in the statsmodels library.
I just wonder how to use it. It seems I should do:
ZeroInflatedPoisson(y_train, x_train).fit()
But when I wanted to do prediction using x_test, it told me the length of x_test doesn't match x_train.
Or is there another package to fit this model?
Here is the code I used:
import random
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from statsmodels.discrete.count_model import ZeroInflatedPoisson

x1 = [random.randint(0, 1) for i in range(200)]
x2 = [random.randint(1, 2) for i in range(200)]
y = np.random.poisson(lam=2, size=100).tolist()
for i in range(100):
    y.append(0)

df = pd.DataFrame()
df['x1'] = x1
df['x2'] = x2
df['y'] = y
df_x = df.iloc[:, :-1]
x_train, x_test, y_train, y_test = train_test_split(df_x, df['y'], test_size=0.3)
clf = ZeroInflatedPoisson(endog=y_train, exog=x_train).fit()
clf.predict(x_test)
ValueError: operands could not be broadcast together with shapes (140,) (60,)
I also tried:
clf.predict(x_test, exog=np.ones(len(x_test)))
ValueError: shapes (60,) and (1,) not aligned: 60 (dim 0) != 1 (dim 0)
This looks like a bug to me.
As far as I can see:
If there are no explanatory variables, exog_infl, specified for the inflation model, then an array of ones is used to model a constant inflation probability.
However, if exog_infl in predict is None, then it uses model.exog_infl, which is an array of ones with length equal to the training sample.
As a workaround, specifying a 1-D array of ones of the correct length in predict should work.
Try:
clf.predict(x_test, exog_infl=np.ones(len(x_test)))
I guess the same problem will occur if exposure was used in the model but is not explicitly specified in predict.
I ran into the same problem, which landed me on this thread. As noted by Josef, it seems you need to provide exog_infl with an array of ones of the correct length for it to work.
However, the code Josef provided is missing a dimension: exog_infl needs to be a 2-D column of ones, so the full line required to generate the array is actually
clf.predict(x_test, exog_infl=np.ones((len(x_test), 1)))
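Putting the pieces together with the question's variables, a minimal sketch of the fit-and-predict round trip (with test_size = 0.3 on 200 rows, x_train has 140 rows and x_test has 60):
clf = ZeroInflatedPoisson(endog=y_train, exog=x_train).fit()
# exog_infl must match the number of prediction rows, not the training rows
y_pred = clf.predict(x_test, exog_infl=np.ones((len(x_test), 1)))
print(y_pred.shape)  # expected: (60,)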
I'm trying to create a decoder for a seq2seq model I have.
So far what I have is:
dec_op_reshaped = tf.reshape(decoder_outputs, [-1, state_size])
logits = tf.matmul(dec_op_reshaped, V) + bo
feed_dict = {
    self.xs_ : query,
    self.dec_inputs_length_ : [query.shape[-1]*2], # this bothers me!
    self.keep_prob_ : 1.
}
translated_arr = self._sess.run(tf.argmax(tf.nn.softmax(logits), axis=1), feed_dict=feed_dict)
I do not know what I've been doing wrong, but every time a query is passed, it returns an array of zeros instead of the expected idx2w indices.
EDIT:
I am so sorry. Sometimes it's right in front of you and you miss it.
The state_size was 0, hence the problem.
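For anyone hitting something similar, a quick sanity check before building the graph would have caught it (a sketch; state_size is the variable from the snippet above):
assert state_size > 0, "state_size must be positive, got %d" % state_size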
I want to make use of Theano's logistic regression classifier, but I would like to make an apples-to-apples comparison with previous studies I've done to see how deep learning stacks up. I recognize this is probably a fairly simple task if I were more proficient in Theano, but this is what I have so far. From the tutorials on the website, I have the following code:
def errors(self, y):
    # check if y has same dimension of y_pred
    if y.ndim != self.y_pred.ndim:
        raise TypeError(
            'y should have the same shape as self.y_pred',
            ('y', y.type, 'y_pred', self.y_pred.type)
        )
    # check if y is of the correct datatype
    if y.dtype.startswith('int'):
        # the T.neq operator returns a vector of 0s and 1s, where 1
        # represents a mistake in prediction
        return T.mean(T.neq(self.y_pred, y))
I'm pretty sure this is where I need to add the functionality, but I'm not certain how to go about it. What I need is either access to y_pred and y for each and every run (to update my confusion matrix in python) or to have the C++ code handle the confusion matrix and return it at some point along the way. I don't think I can do the former, and I'm unsure how to do the latter. I've done some messing around with an update function along the lines of:
def confuMat(self, y):
    x = T.vector('x')
    classes = T.scalar('n_classes')
    onehot = T.eq(x.dimshuffle(0, 'x'), T.arange(classes).dimshuffle('x', 0))
    oneHot = theano.function([x, classes], onehot)
    yMat = T.matrix('y')
    yPredMat = T.matrix('y_pred')
    confMat = T.dot(yMat.T, yPredMat)
    confusionMatrix = theano.function(inputs=[yMat, yPredMat], outputs=confMat)

    def confusion_matrix(x, y, n_class):
        return confusionMatrix(oneHot(x, n_class), oneHot(y, n_class))

    t = np.asarray(confusion_matrix(y, self.y_pred, self.n_out))
    print(t)
But I'm not completely clear on how to get this to interface with the function in question and give me a numpy array I can work with.
I'm quite new to Theano, so hopefully this is an easy fix for one of you. I'd like to use this classifier as my output layer in a number of configurations, so I could use the confusion matrix with other architectures.
I suggest a brute-force sort of approach. You need an output for the predictions first; create a function for it.
prediction = theano.function(
    inputs=[index],
    outputs=MLPlayers.predicts,
    givens={
        x: test_set_x[index * batch_size: (index + 1) * batch_size]})
In your test loop, gather the predictions...
labels = labels + test_set_y.eval().tolist()
for mini_batch in xrange(n_test_batches):
    wrong = wrong + int(test_model(mini_batch))
    predictions = predictions + prediction(mini_batch).tolist()
Now create the confusion matrix this way:
correct = 0
confusion = numpy.zeros((outs, outs), dtype=int)
for index in xrange(len(predictions)):
    if labels[index] == predictions[index]:
        correct = correct + 1
    confusion[int(predictions[index]), int(labels[index])] += 1
You can find this kind of implementation in this repository.
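As a side note, the counting loop can also be vectorized with NumPy; a sketch assuming labels and predictions are lists of integer class indices and outs is the number of classes:
import numpy

labels_arr = numpy.asarray(labels, dtype=int)
preds_arr = numpy.asarray(predictions, dtype=int)
correct = int((labels_arr == preds_arr).sum())
confusion = numpy.zeros((outs, outs), dtype=int)
# numpy.add.at performs an unbuffered add, so repeated (prediction, label) pairs are all counted
numpy.add.at(confusion, (preds_arr, labels_arr), 1)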