I'm new enough to TensorFlow and Keras that I might be missing something obvious, but this is driving me nuts. I inherited an app that trains a custom convolutional LSTM, and I just spent the last two months or so straightening out some really atrocious data wrangling, only to discover I can't get the model to train properly.
Here's the model definition (with hard values substituted for the variables in my actual code):
inputs = layers.Input(shape = (2, 23, 23, 10))
outputs = layers.ConvLSTM2D(filters = 32,
                            kernel_size = (5, 5),
                            padding = "same",
                            return_sequences = True,
                            stateful = False,
                            activation = "relu")(inputs)
outputs = layers.BatchNormalization()(outputs)
outputs = layers.ConvLSTM2D(filters = 32,
                            kernel_size = (3, 3),
                            padding = "same",
                            return_sequences = True,
                            stateful = False,
                            activation = "relu")(outputs)
outputs = layers.ConvLSTM2D(filters = 32,
                            kernel_size = (3, 3),
                            padding = "same",
                            return_sequences = True,
                            stateful = False,
                            activation = "relu")(outputs)
outputs = layers.Conv3D(filters = 1,
                        kernel_size = (3, 3, 3),
                        padding = "same",
                        activation = "sigmoid")(outputs)
If something looks squirrely there, let me know--I didn't create this model (though I did change the input grid from 100 x 100 to 23 x 23, if that makes a difference). The idea is to predict the intensity of a particular weather phenomenon (that's a single number for each grid point at each time).
Here's the model summary produced after the model is defined:
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 2, 23, 23, 10)]   0
conv_lstm2d (ConvLSTM2D)     (None, 2, 23, 23, 32)     134528
batch_normalization          (None, 2, 23, 23, 32)     128
(BatchNormalization)
conv_lstm2d_1 (ConvLSTM2D)   (None, 2, 23, 23, 32)     73856
conv_lstm2d_2 (ConvLSTM2D)   (None, 2, 23, 23, 32)     73856
conv3d (Conv3D)              (None, 2, 23, 23, 1)      865
=================================================================
Total params: 283,233
Trainable params: 283,169
Non-trainable params: 64
_________________________________________________________________
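As a sanity check, those parameter counts can be reproduced by hand: each ConvLSTM2D has four gates, each gate convolves over the input channels plus its own filters, and there is one bias per filter (a quick sketch of the arithmetic):

def convlstm2d_params(filters, kernel, in_channels):
    # 4 gates x (kernel area x (input channels + recurrent channels) + bias)
    kh, kw = kernel
    return 4 * filters * (kh * kw * (in_channels + filters) + 1)

print(convlstm2d_params(32, (5, 5), 10))  # 134528 -> conv_lstm2d
print(convlstm2d_params(32, (3, 3), 32))  # 73856  -> conv_lstm2d_1 and conv_lstm2d_2
print(4 * 32)                             # 128    -> batch_normalization (64 of these are the non-trainable moving statistics)
print(3 * 3 * 3 * 32 * 1 + 1)             # 865    -> conv3d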
The model is fit using data from a custom Sequence subclass, which produces X input of shape ([batch size], 2, 23, 23, 10) and Y input of shape ([batch size], 2, 23, 23, 1). The batch size is usually 8, but because of the way the data is lazily loaded, the last batch in a particular block of files may be smaller, which is why I don't specify a batch size in the model definition. For the record, the original coders used a constant batch size, though, as with mine, it wasn't specified in the model definition.
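For reference, a minimal sketch of the kind of Sequence I mean (class and attribute names are just illustrative, and the lazy file loading is elided; only the shapes and the smaller final batch matter):

import numpy as np
from tensorflow import keras

class WeatherSequence(keras.utils.Sequence):
    def __init__(self, x, y, batch_size=8):
        # x: (N, 2, 23, 23, 10), y: (N, 2, 23, 23, 1)
        self.x, self.y, self.batch_size = x, y, batch_size

    def __len__(self):
        # the last batch may be smaller than batch_size
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.x[sl], self.y[sl]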
When I try to fit the model, I get a crash pretty quickly, with this traceback:
Traceback (most recent call last):
File "C:/code/Python/edapts/ConvLSTM2D.py", line 175, in <module>
history = model.fit(training_data,
File "C:\Python\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "C:\Python\lib\site-packages\tensorflow\python\eager\execute.py", line 54, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:
Detected at node 'gradient_tape/model/conv_lstm2d/transpose_1/transpose' defined at (most recent call last):
File "C:/code/Python/edapts/ConvLSTM2D.py", line 175, in <module>
history = model.fit(training_data,
File "C:\Python\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "C:\Python\lib\site-packages\keras\engine\training.py", line 1409, in fit
tmp_logs = self.train_function(iterator)
File "C:\Python\lib\site-packages\keras\engine\training.py", line 1051, in train_function
return step_function(self, iterator)
File "C:\Python\lib\site-packages\keras\engine\training.py", line 1040, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "C:\Python\lib\site-packages\keras\engine\training.py", line 1030, in run_step
outputs = model.train_step(data)
File "C:\Python\lib\site-packages\keras\engine\training.py", line 893, in train_step
self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
File "C:\Python\lib\site-packages\keras\optimizers\optimizer_v2\optimizer_v2.py", line 537, in minimize
grads_and_vars = self._compute_gradients(
File "C:\Python\lib\site-packages\keras\optimizers\optimizer_v2\optimizer_v2.py", line 590, in _compute_gradients
grads_and_vars = self._get_gradients(tape, loss, var_list, grad_loss)
File "C:\Python\lib\site-packages\keras\optimizers\optimizer_v2\optimizer_v2.py", line 471, in _get_gradients
grads = tape.gradient(loss, var_list, grad_loss)
Node: 'gradient_tape/model/conv_lstm2d/transpose_1/transpose'
transpose expects a vector of size 4. But input(1) is a vector of size 5
[[{{node gradient_tape/model/conv_lstm2d/transpose_1/transpose}}]] [Op:__inference_train_function_9880]
I'm so lost. The only thing I can think of is that the original coders made the X and Y features the same, whereas I'm only trying to predict 1 of the 10 (there's not actually a reason to predict the others--not sure why they were trying to). If that's the problem, how do I redefine the model to take account of the different output shape?
EDIT: Well, that's interesting. I just compared the model definition from the original team with the model definition used by the co-worker I inherited the code from. Said co-worker seems to have inserted an extra layer in the outputs, duplicating the ConvLSTM2D with 32 filters and kernel size (3,3). When I remove one of those duplicates, everything runs just fine....
Great, so it's "fixed". But is there someone who can explain why it wasn't working in the first place? My level of understanding the issue at this point is to cross myself and throw salt over my shoulder.
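For concreteness, the definition that trains cleanly is the one above minus the duplicated (3,3) ConvLSTM2D:

inputs = layers.Input(shape = (2, 23, 23, 10))
outputs = layers.ConvLSTM2D(filters = 32,
                            kernel_size = (5, 5),
                            padding = "same",
                            return_sequences = True,
                            stateful = False,
                            activation = "relu")(inputs)
outputs = layers.BatchNormalization()(outputs)
outputs = layers.ConvLSTM2D(filters = 32,
                            kernel_size = (3, 3),
                            padding = "same",
                            return_sequences = True,
                            stateful = False,
                            activation = "relu")(outputs)
outputs = layers.Conv3D(filters = 1,
                        kernel_size = (3, 3, 3),
                        padding = "same",
                        activation = "sigmoid")(outputs)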
EDIT #2: Does the problem result from having such a small grid (23 x 23)? So either a bigger grid or smaller kernels would solve the problem without deleting a layer? That seems intuitively likely, but I'd like to know the math, so I can calculate how the outputs from each layer match up with the definition of the next layer.
I am following the PyTorch tutorial on speech command recognition and trying to implement my own recognition of 22 sentences in German. In the tutorial they use padding for the audio tensors, but for the labels they use only torch.stack. Because of that, I get an error as soon as I start training the network:
RuntimeError: stack expects each tensor to be equal size, but got [456] at entry 0 and [470] at entry 1.
I do understand what this says, but since I am new to PyTorch I unfortunately can't implement a padding function for the sentences from scratch. I would therefore be happy if you could give me some hints and tips for this.
Here is the code for collate_fn and pad_sequence functions:
def pad_sequence(batch):
    # Make all tensors in a batch the same length by padding with zeros
    batch = [item.t() for item in batch]
    batch = torch.nn.utils.rnn.pad_sequence(batch, batch_first=True, padding_value=0.)
    return batch.permute(0, 2, 1)


def collate_fn(batch):
    # A data tuple has the form:
    # waveform, label
    tensors, targets = [], []

    # Gather in lists, and encode labels as indices
    for waveform, label in batch:
        tensors += [waveform]
        targets += [label]

    # Group the list of tensors into a batched tensor
    tensors = pad_sequence(tensors)
    targets = torch.stack(targets)

    return tensors, targets
Once I started working directly with pad_sequence, I understood how simply it works: in my case I only needed to pass the batch of label tensors, and PyTorch automatically extends each one to the length of the longest tensor in the batch.
My code looks now like this:
def pad_AudioSequence(batch):
    # Make all tensors in a batch the same length by padding with zeros
    batch = [item.t() for item in batch]
    batch = torch.nn.utils.rnn.pad_sequence(batch, batch_first=True, padding_value=0.)
    return batch.permute(0, 2, 1)


def pad_TextSequence(batch):
    return torch.nn.utils.rnn.pad_sequence(batch, batch_first=True, padding_value=0)


def collate_fn(batch):
    # A data tuple has the form:
    # waveform, label
    tensors, targets = [], []

    # Gather in lists, and encode labels as indices
    for waveform, label in batch:
        tensors += [waveform]
        targets += [label]

    # Group the list of tensors into a batched tensor
    tensors = pad_AudioSequence(tensors)
    targets = pad_TextSequence(targets)

    return tensors, targets
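To hook this up, collate_fn just gets passed to the DataLoader (a sketch; train_dataset stands for whatever Dataset object you use that yields (waveform, encoded_label) pairs):

import torch

train_loader = torch.utils.data.DataLoader(
    train_dataset,          # placeholder for your own Dataset
    batch_size=8,
    shuffle=True,
    collate_fn=collate_fn,  # the function defined above
)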
For those who still don't understand how that works, here is a little example:
encDecClass2 = dummyEncoderDecoder()
sent1 = audioWorkerClass.sentences[4] # wie viel Prozent hat der Akku noch?
sent2 = audioWorkerClass.sentences[5] # Wie spät ist es?
sent3 = audioWorkerClass.sentences[6] # Mach einen Timer für 5 Sekunden.
# encode sentences into tensor of numbers, representing words, using my own enc-dec class
sent1 = encDecClass2.encode(sent1) # tensor([11, 94, 21, 94, 22, 94, 23, 94, 24, 94, 25, 94, 26, 94, 15, 94])
sent2 = encDecClass2.encode(sent2) # tensor([27, 94, 28, 94, 12, 94, 29, 94, 15, 94])
sent3 = encDecClass2.encode(sent3) # tensor([30, 94, 31, 94, 32, 94, 33, 94, 34, 94, 35, 94, 19, 94])
print(sent1.shape) # torch.Size([16])
print(sent2.shape) # torch.Size([10])
print(sent3.shape) # torch.Size([14])
batch = []
# add sentences to the batch as separate arrays
batch +=[sent1]
batch +=[sent2]
batch +=[sent3]
output = pad_sequence(batch, batch_first=True, padding_value=0)
print(f"{output}\n{output.shape}")
#############################################################################
# output:
# tensor([[11, 94, 21, 94, 22, 94, 23, 94, 24, 94, 25, 94, 26, 94, 15, 94],
# [27, 94, 28, 94, 12, 94, 29, 94, 15, 94, 0, 0, 0, 0, 0, 0],
# [30, 94, 31, 94, 32, 94, 33, 94, 34, 94, 35, 94, 19, 94, 0, 0]])
# torch.Size([3, 16])
#############################################################################
As you can see, all arrays were equalized to the maximum length of the three and padded with zeros. The shape of the output is 3x16, because we had three sentences and the longest sequence in the batch had length 16.
My dataset looks like the following: on the left are my inputs, and on the right the outputs.
The inputs are tokenized and converted to a list of indices, for instance, the molecule input:
'CC1(C)Oc2ccc(cc2C#HN3CCCC3=O)C#N'
is converted to:
[28, 28, 53, 69, 28, 70, 40, 2, 54, 2, 2, 2, 69, 2, 2, 54, 67, 28, 73, 33, 68, 69, 67, 28, 73, 73, 33, 68, 53, 40, 70, 39, 55, 28, 28, 28, 28, 55, 62, 40, 70, 28, 63, 39, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
I use the following list of chars as my map from strings to indices
cs = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',
      'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
      '0','1','2','3','4','5','6','7','8','9',
      '=','#',':','+','-','[',']','(',')','/','\\',
      '#','.','%']
Thus, for every char in the input string there is an index, and if the length of the input string is less than the max length over all inputs (which is 100), I pad with zeros (as in the example shown above).
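In code, this amounts to something like the following sketch (the helper name is illustrative; it reproduces the indices shown in the example above):

MAX_LEN = 100  # maximum input length mentioned above

def encode_smiles(s, chars=cs, max_len=MAX_LEN):
    # position of each character in cs, right-padded with zeros up to max_len
    indices = [chars.index(c) for c in s]
    return indices + [0] * (max_len - len(indices))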
My model looks like this:
import torch
import torch.nn as nn


class LSTM_regr(torch.nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.linear = nn.Linear(hidden_dim, 1)
        self.dropout = nn.Dropout(0.2)

    def forward(self, x, l):
        x = self.embeddings(x)
        x = self.dropout(x)
        lstm_out, (ht, ct) = self.lstm(x)
        return self.linear(ht[-1])
vocab_size = 76
model = LSTM_regr(vocab_size, 20, 256)
My problem is that after training, every input I give to the model at test time produces the same output (e.g. 3.3318). Why is that?
My training loop:
def train_model_regr(model, epochs=10, lr=0.001):
    parameters = filter(lambda p: p.requires_grad, model.parameters())
    optimizer = torch.optim.Adam(parameters, lr=lr)
    for i in range(epochs):
        model.train()
        sum_loss = 0.0
        total = 0
        for x, y, l in train_dl:
            x = x.long()
            y = y.float()
            y_pred = model(x, l)
            optimizer.zero_grad()
            loss = F.mse_loss(y_pred, y.unsqueeze(-1))
            loss.backward()
            optimizer.step()
            sum_loss += loss.item() * y.shape[0]
            total += y.shape[0]
EDIT:
I figured it out: I reduced the learning rate from 0.01 to 0.0005 and reduced the batch size from 100 to 10, and it worked fine.
I think this makes sense: with such a large batch size, the model was learning to always output the mean, since the constant prediction that minimizes the mean-squared-error loss is the mean of the targets.
Your LSTM_regr returns the last hidden state regardless of the true sequence length. That is, if your true sequence is of length 3 and x is padded to length 100, the output is the hidden state produced after the LSTM has also processed 97 padding elements.
You should compute the loss on the prediction that corresponds to the true length of each sequence.
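One way to do that (a sketch, assuming l in your forward holds the true, unpadded lengths) is to pack the batch with torch.nn.utils.rnn.pack_padded_sequence, so that ht[-1] already corresponds to each sequence's last real step:

from torch.nn.utils.rnn import pack_padded_sequence

# drop-in replacement for LSTM_regr.forward
def forward(self, x, l):
    x = self.embeddings(x)
    x = self.dropout(x)
    # stop the LSTM at each sequence's true length instead of running over the padding;
    # lengths must live on the CPU
    packed = pack_padded_sequence(x, l.cpu(), batch_first=True, enforce_sorted=False)
    packed_out, (ht, ct) = self.lstm(packed)
    return self.linear(ht[-1])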
I am trying to find, with RandomizedSearchCV, the best parameters for a random forest that is supposed to predict a continuous variable.
I've been looking at the following approach, in particular changing the scoring function, eventually settling on the regression metric median_absolute_error. However, I think that KFold cross-validation is not appropriate for my data, and I do not understand how I could, for example, use an iterable cv (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html), since, as far as I understand, I cannot run fit and predict on my model before the RandomizedSearchCV.
def my_custom_score(y_true, y_pred, dates_, features, labels):
    return median_absolute_error(y_true, y_pred)

...

for i in range(0, 3):  # predict 3 10-point intervals
    prediction_colour = ['g', 'r', 'c', 'm', 'y', 'k', 'w'][i % 7]
    date_for_test = randint(11, 200)  # end of the trend
    dates_for_test = range(date_for_test - 10, date_for_test)  # one predicted interval should have 10 date points
    for idx, date_for_test_ in enumerate(sorted(dates_for_test, reverse=True)):
        train_features = features[sorted(dates_for_test, reverse=True)[0] - 2:]
        train_labels = labels[sorted(dates_for_test, reverse=True)[0] - 2:]
        test_features = np.atleast_2d(features[date_for_test_])
        test_labels = labels[date_for_test_] if date_for_test != 0 else 1.0

        rf = RandomForestRegressor(bootstrap=False, criterion='mse', max_features=5,
                                   min_weight_fraction_leaf=0, n_jobs=1, oob_score=False,
                                   random_state=None, verbose=0, warm_start=False)
        parameters = {"max_leaf_nodes": [2, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50],
                      "min_samples_leaf": [1, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000],
                      "min_samples_split": [2, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000],
                      'n_estimators': [10, 100, 250, 500, 750, 1000, 1250, 1500, 1750, 2000, 2250, 2500, 2750, 3000, 3250, 3500, 3750, 4000, 4250, 4500, 4750, 5000, 5250, 5500, 5750, 6000, 6250, 6500, 6750, 7000, 7250, 7500, 7750, 8000, 8250, 8500, 8750, 9000, 9250, 9500, 9750, 10000],
                      'max_depth': [1, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000]}
        grid_search = RandomizedSearchCV(cv=5, estimator=rf, param_distributions=parameters, n_iter=10,
                                         scoring=make_scorer(median_absolute_error))  # , scoring=make_scorer(lambda x, y: my_custom_score(x, y, sorted(dates_for_test, reverse=True), features, labels), greater_is_better=False)))
        grid_search.fit(train_features, train_labels)

        rf = grid_search.best_estimator_
        best_parameters = rf.get_params()
        print("best parameters")
        for param_name in sorted(parameters.keys()):
            print("\t%s: %r" % (param_name, best_parameters[param_name]))

        predictions = rf.predict(test_features)
Also, with the current approach I get the same continuous value predicted on out-of-sample temporal data a few dates into the future (the different colours on the graph).
The documentation is quite detailed on this matter, but I find it too detailed and I just get lost there. Could someone maybe point me in the right direction?
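For reference, the cv argument linked above accepts not just an integer but also a splitter object or any iterable of (train_indices, test_indices) pairs, all computed purely from row indices, so nothing needs to be fit beforehand. A sketch using TimeSeriesSplit (reusing rf, parameters, and the training arrays from the code above):

from sklearn.model_selection import TimeSeriesSplit, RandomizedSearchCV
from sklearn.metrics import make_scorer, median_absolute_error

# TimeSeriesSplit yields time-ordered (train_idx, test_idx) pairs; a hand-built
# list of such index tuples would work the same way when passed as cv.
tscv = TimeSeriesSplit(n_splits=5)
grid_search = RandomizedSearchCV(estimator=rf,
                                 param_distributions=parameters,
                                 n_iter=10,
                                 cv=tscv,
                                 # lower median absolute error is better, hence greater_is_better=False
                                 scoring=make_scorer(median_absolute_error, greater_is_better=False))
grid_search.fit(train_features, train_labels)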
I am doing a grid search on a RandomForestClassifier, and my code had been working until I changed the features; suddenly it generates the following error (at the classifier.fit line).
I did not change any code, but I reduced the feature dimensions from 16 to 8. I am totally confused as to what I should look into. What does this error mean?
Error:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/zqz/Programs/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 344, in __call__
return self.func(*args, **kwargs)
File "/home/zqz/Programs/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/home/zqz/Programs/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 131, in <listcomp>
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/home/zqz/Programs/anaconda3/lib/python3.5/site-packages/sklearn/ensemble/forest.py", line 120, in _parallel_build_trees
tree.fit(X, y, sample_weight=curr_sample_weight, check_input=False)
File "/home/zqz/Programs/anaconda3/lib/python3.5/site-packages/sklearn/tree/tree.py", line 739, in fit
X_idx_sorted=X_idx_sorted)
File "/home/zqz/Programs/anaconda3/lib/python3.5/site-packages/sklearn/tree/tree.py", line 246, in fit
raise ValueError("max_features must be in (0, n_features]")
ValueError: max_features must be in (0, n_features]
Code:
classifier = RandomForestClassifier(n_estimators=20, n_jobs=-1)

rfc_tuning_params = {"max_depth": [3, 5, None],
                     "max_features": [1, 3, 5, 7, 10],
                     "min_samples_split": [2, 5, 10],
                     "min_samples_leaf": [1, 3, 10],
                     "bootstrap": [True, False],
                     "criterion": ["gini", "entropy"]}

classifier = GridSearchCV(classifier, param_grid=rfc_tuning_params, cv=nfold,
                          n_jobs=cpus)

model_file = os.path.join(os.path.dirname(__file__), "random-forest_classifier-%s.m" % task)
classifier.fit(X_train, y_train)  # line that causes the error
nfold_predictions = cross_val_predict(classifier.best_estimator_, X_train, y_train, cv=nfold)
In your rfc_tuning_params, you have "max_features": [1, 3, 5, 7, 10]. That includes 10, which is bigger than the number of features (8). Hence you get the error
ValueError: max_features must be in (0, n_features]
So you need to remove the 10 from "max_features".
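A minimal sketch of the corrected grid (only max_features changes; its values must stay within the 8 features you now have):

rfc_tuning_params = {"max_depth": [3, 5, None],
                     "max_features": [1, 3, 5, 7],  # 10 dropped: max_features must be <= n_features (8)
                     "min_samples_split": [2, 5, 10],
                     "min_samples_leaf": [1, 3, 10],
                     "bootstrap": [True, False],
                     "criterion": ["gini", "entropy"]}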