scikit-learn RandomForestClassifier, stopped working, advice on how to debug - scikit-learn

I am doing a grid search on a RandomForestClassifier and my code has been working until I changed the features and suddenly the code generates the following error (at line classifier.fit)
I did not change any code, but reduced the feature dimensions from 16 to 8. I am totally confused as to what I should look into. What does this error mean?
Error:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/zqz/Programs/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 344, in __call__
return self.func(*args, **kwargs)
File "/home/zqz/Programs/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/home/zqz/Programs/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 131, in <listcomp>
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/home/zqz/Programs/anaconda3/lib/python3.5/site-packages/sklearn/ensemble/forest.py", line 120, in _parallel_build_trees
tree.fit(X, y, sample_weight=curr_sample_weight, check_input=False)
File "/home/zqz/Programs/anaconda3/lib/python3.5/site-packages/sklearn/tree/tree.py", line 739, in fit
X_idx_sorted=X_idx_sorted)
File "/home/zqz/Programs/anaconda3/lib/python3.5/site-packages/sklearn/tree/tree.py", line 246, in fit
raise ValueError("max_features must be in (0, n_features]")
ValueError: max_features must be in (0, n_features]
Code:
classifier = RandomForestClassifier(n_estimators=20, n_jobs=-1)
rfc_tuning_params = {"max_depth": [3, 5, None],
"max_features": [1, 3, 5, 7, 10],
"min_samples_split": [2, 5, 10],
"min_samples_leaf": [1, 3, 10],
"bootstrap": [True, False],
"criterion": ["gini", "entropy"]}
classifier = GridSearchCV(classifier, param_grid=rfc_tuning_params, cv=nfold,
n_jobs=cpus)
model_file = os.path.join(os.path.dirname(__file__), "random-forest_classifier-%s.m" % task)
classifier.fit(X_train, y_train) #line that causes the error
nfold_predictions=cross_val_predict(classifier.best_estimator_, X_train, y_train, cv=nfold)

In your rfc_tuning_params, you have "max_features": [1, 3, 5, 7, 10]. That includes 10, which is bigger than the number of features (8). Hence you get the error
ValueError: max_features must be in (0, n_features]
So you need to remove the 10 from "max_features".

Related

Pytorch transformation for just certain batch

Hi is there any method for apply trasnformation for certain batch?
It means, I want apply trasnformation for just last batch in every epochs.
What I tried is here
import torch
class test(torch.utils.data.Dataset):
def __init__(self):
self.source = [i for i in range(10)]
def __len__(self):
return len(self.source)
def __getitem__(self, idx):
print(idx)
return self.source[idx]
ds = test()
dl = torch.utils.data.DataLoader(dataset = ds, batch_size = 3,
shuffle = False, num_workers = 5)
for i in dl:
print(i)
because I thought that if I could get idx number, it would be possible to apply for certain batchs.
However If using num_workers outputs are
0
1
2
3
964
57
8
tensor([0, 1, 2])
tensor([3, 4, 5])
tensor([6, 7, 8])
tensor([9])
which are not I thought
without num_worker
0
1
2
tensor([0, 1, 2])
3
4
5
tensor([3, 4, 5])
6
7
8
tensor([6, 7, 8])
9
tensor([9])
So the question is
Why idx works so with num_workers?
How can I apply trasnform for certain batchs (or certain idx)?
When you have num_workers > 1, you have multiple subprocesses doing data loading in parallel. So what is likely happening is that there is a race condition for the print step, and the order you see in the output depends on which subprocess goes first each time.
For most transforms, you can apply them on a specific batch simply by calling the transform after the batch has been loaded. To do this just for the last batch, you could do something like:
for batch_idx, batch_data in dl:
# check if batch is the last batch
if ((batch_idx+1) * batch_size) >= len(ds):
batch_data = transform(batch_data)
I found that
class test_dataset(torch.utils.data.Dataset):
def __init__(self):
self.a = [i for i in range(100)]
def __len__(self):
return len(self.a)
def __getitem__(self, idx):
a = torch.tensor(self.a[idx])
#print(idx)
return idx
a = torch.utils.data.DataLoader(
test_dataset(), batch_size = 10, shuffle = False,
num_workers = 10, pin_memory = True)
for i in a:
print(i)
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
tensor([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
tensor([20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
tensor([30, 31, 32, 33, 34, 35, 36, 37, 38, 39])
tensor([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])
tensor([50, 51, 52, 53, 54, 55, 56, 57, 58, 59])
tensor([60, 61, 62, 63, 64, 65, 66, 67, 68, 69])
tensor([70, 71, 72, 73, 74, 75, 76, 77, 78, 79])
tensor([80, 81, 82, 83, 84, 85, 86, 87, 88, 89])
tensor([90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

Embedding: argument indices must be a Tensor, not a list

I am trying to train a RNN, but I am having trouble with my embedding.
I am getting the following error message:
TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not list
The code in the forward method starts like that:
def forward(self, word_indices: [int]):
print("sentences")
print(len(word_indices))
print(word_indices)
word_ind_tensor = torch.tensor(word_indices, device="cpu")
print(word_ind_tensor)
print(word_ind_tensor.size())
embeds_word = self.embedding_word(word_indices)
The output of all of that is:
sentences
29
[261, 15, 5149, 44, 287, 688, 1125, 4147, 9874, 582, 15, 9875, 3, 2, 6732, 34, 2, 6733, 9, 2, 485, 7, 6734, 3, 741, 2, 2179, 1571, 1]
tensor([ 261, 15, 5149, 44, 287, 688, 1125, 4147, 9874, 582, 15, 9875,
3, 2, 6732, 34, 2, 6733, 9, 2, 485, 7, 6734, 3,
741, 2, 2179, 1571, 1])
torch.Size([29])
Traceback (most recent call last):
File "/home/lukas/Documents/HU/Materialen/21SoSe-Studienprojekt/flair-Studienprojekt/TestModel.py", line 68, in <module>
embeddings_storage_mode = "CPU") #auf cuda ändern
File "/home/lukas/Documents/HU/Materialen/21SoSe-Studienprojekt/flair-Studienprojekt/flair/trainers/trainer.py", line 423, in train
loss = self.model.forward_loss(batch_step)
File "/home/lukas/Documents/HU/Materialen/21SoSe-Studienprojekt/flair-Studienprojekt/flair/models/sandbox/srl_tagger.py", line 122, in forward_loss
features = self.forward(word_indices = sent_word_ind, frame_indices = sent_frame_ind)
File "/home/lukas/Documents/HU/Materialen/21SoSe-Studienprojekt/flair-Studienprojekt/flair/models/sandbox/srl_tagger.py", line 147, in forward
embeds_word = self.embedding_word(word_indices)
File "/home/lukas/miniconda3/envs/studienprojekt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/lukas/miniconda3/envs/studienprojekt/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 114, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "/home/lukas/miniconda3/envs/studienprojekt/lib/python3.7/site-packages/torch/nn/functional.py", line 1724, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not list
I originally initialised the embedding the following way:
self.embedding_word = torch.nn.Embedding(self.word_dict_size, embedding_size)
word_dict_size and embedding_size are both integers.
Is there something obviously I did wrong or is that a deeper mistake?
You're passing in a list to self.embedding_word: word_indices, not the tensor you just created for that purpose word_ind_tensor.

How can I feed my ndarray data to the TensorShape input of the basic Keras model?

I want to feed data to the basic keras model. The input has the following shape and type. I do not know if I made the wrong settings in the model layer, but I get the following error.
My environmnet is Windows 10 (64-bits), Python 3.6.7 (Anaconda), TensorFlow 1.12.0, Keras 2.2.4, PyCharm 2018.3.3.
Input:
x input shape: (23714, 160), y input shape: (23714, 7)
In: x
Out:
array([[ 7, 19, 6, ..., 0, 0, 0],
[11, 1, 16, ..., 0, 0, 0],
[ 6, 13, 10, ..., 0, 0, 0],
...,
[ 6, 13, 7, ..., 0, 0, 0],
[11, 13, 9, ..., 10, 0, 0],
[ 9, 13, 9, ..., 0, 0, 0]])
In: y
Out:
array([[0, 1, 0, ..., 0, 0, 0],
[0, 1, 0, ..., 0, 0, 0],
[0, 1, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 1],
[0, 0, 0, ..., 0, 0, 1],
[0, 0, 0, ..., 0, 0, 1]], dtype=int8)
Model:
model = Sequential()
model.add(Dense(64, input_dim=(160, ), activation='relu'))
model.add(Dense(7, activation="softmax"))
model.summary()
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
performance_test = model.evaluate(x_test, y_test, batch_size=100)
print(model.summary())
print('Test Loss and Accuracy ->', performance_test)
Error:
Traceback (most recent call last):
File "C:\Users\terry\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\eager\execute.py", line 141, in make_shape
shape = tensor_shape.as_shape(v)
File "C:\Users\terry\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\tensor_shape.py", line 947, in as_shape
return TensorShape(shape)
File "C:\Users\terry\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\tensor_shape.py", line 542, in __init__
self._dims = [as_dimension(d) for d in dims_iter]
File "C:\Users\terry\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\tensor_shape.py", line 542, in <listcomp>
self._dims = [as_dimension(d) for d in dims_iter]
File "C:\Users\terry\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\tensor_shape.py", line 482, in as_dimension
return Dimension(value)
File "C:\Users\terry\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\tensor_shape.py", line 37, in __init__
self._value = int(value)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'tuple'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\terry\Anaconda3\envs\tensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 3267, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-2-721312fb06f9>", line 1, in <module>
runfile('C:/Users/terry/Desktop/Project/Test/Test.py', wdir='C:/Users/terry/Desktop/Project/Test')
File "C:\Program Files\JetBrains\PyCharm 2018.3.3\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "C:\Program Files\JetBrains\PyCharm 2018.3.3\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/terry/Desktop/Project/Test/Test.py", line 187, in <module>
model.add(Dense(64, input_dim=(160, ), activation='relu'))
File "C:\Users\terry\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\sequential.py", line 161, in add
name=layer.name + '_input')
File "C:\Users\terry\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\input_layer.py", line 178, in Input
input_tensor=tensor)
File "C:\Users\terry\Anaconda3\envs\tensorflow\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Users\terry\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\input_layer.py", line 87, in __init__
name=self.name)
File "C:\Users\terry\Anaconda3\envs\tensorflow\lib\site-packages\keras\backend\tensorflow_backend.py", line 517, in placeholder
x = tf.placeholder(dtype, shape=shape, name=name)
File "C:\Users\terry\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1747, in placeholder
return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
File "C:\Users\terry\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 6250, in placeholder
shape = _execute.make_shape(shape, "shape")
File "C:\Users\terry\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\eager\execute.py", line 143, in make_shape
raise TypeError("Error converting %s to a TensorShape: %s." % (arg_name, e))
TypeError: Error converting shape to a TensorShape: int() argument must be a string, a bytes-like object or a number, not 'tuple'.
I've read about keras input_shape, input_dim, tensorflow tensorshape, python tuple, list, numpy ndarray, but I can not find a solution. I would really appreciate your help.
I think the parameter to Dense should be input_shape and not input_dim. Refer the doc

keras Concatenate mulitple layers cause AttributeError: 'NoneType' object has no attribute '_inbound_nodes'

I am trying to add some fixed kernels in my CNN, please see my codes below.
This is how I create my kernels:
# Kernels
def create_kernel(x):
t = pipe(
x,
lambda x: tf.constant(x, dtype=tf.float32),
lambda x: tf.reshape(x, [3, 3, 1, 1]))
return t
k_edge1 = create_kernel([1, 0, -1, 0, 0, 0, -1, 0, 1])
k_edge2 = create_kernel([0, 1, 0, 1, -4, 1, 0, 1, 0])
k_edge3 = create_kernel([-1, -1, -1, -1, 8, -1, -1, -1, -1])
and my convolution network is like:
# Convolution network
# Input layer
l_input = Input(shape=(28**2, ))
# Reshape layer
l_reshape = Reshape(target_shape=(28, 28, 1))(l_input)
# Convolution layers
l_conv1 = Conv2D(filters=20, kernel_size=(3, 3), padding='valid')(l_reshape)
l_edge1 = tf.nn.conv2d(l_reshape, k_edge1, strides=[1, 1, 1, 1], padding='VALID')
l_edge2 = tf.nn.conv2d(l_reshape, k_edge2, strides=[1, 1, 1, 1], padding='VALID')
l_edge3 = tf.nn.conv2d(l_reshape, k_edge3, strides=[1, 1, 1, 1], padding='VALID')
l_conv1a = Concatenate(axis=3)([l_conv1, l_edge1, l_edge2, l_edge3]) # <- The error should be caused by this line.
l_conv2 = Conv2D(filters=20, kernel_size=(3, 3), padding='valid')(l_conv1a)
l_pool1 = MaxPooling2D(pool_size=(2, 2), border_mode='valid')(l_conv2)
# Flatten layer
l_flat = Flatten()(l_pool1)
# Fully connected layers
l_fc1 = Dense(50, kernel_initializer='he_normal')(l_flat)
l_act1 = PReLU()(l_fc1)
l_fc3 = Dense(10, kernel_initializer='he_normal')(l_act1)
l_output = Activation('softmax')(l_fc1)
# Model
cnn_model = Model(l_input, l_output)
However, I got the following error:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "C:\Users\Perry Cheng\AppData\Local\conda\conda\envs\ml_py_3_6\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Users\Perry Cheng\AppData\Local\conda\conda\envs\ml_py_3_6\lib\site-packages\keras\engine\network.py", line 93, in __init__
self._init_graph_network(*args, **kwargs)
File "C:\Users\Perry Cheng\AppData\Local\conda\conda\envs\ml_py_3_6\lib\site-packages\keras\engine\network.py", line 237, in _init_graph_network
self.inputs, self.outputs)
File "C:\Users\Perry Cheng\AppData\Local\conda\conda\envs\ml_py_3_6\lib\site-packages\keras\engine\network.py", line 1353, in _map_graph_network
tensor_index=tensor_index)
File "C:\Users\Perry Cheng\AppData\Local\conda\conda\envs\ml_py_3_6\lib\site-packages\keras\engine\network.py", line 1340, in build_map
node_index, tensor_index)
File "C:\Users\Perry Cheng\AppData\Local\conda\conda\envs\ml_py_3_6\lib\site-packages\keras\engine\network.py", line 1340, in build_map
node_index, tensor_index)
File "C:\Users\Perry Cheng\AppData\Local\conda\conda\envs\ml_py_3_6\lib\site-packages\keras\engine\network.py", line 1340, in build_map
node_index, tensor_index)
[Previous line repeated 2 more times]
File "C:\Users\Perry Cheng\AppData\Local\conda\conda\envs\ml_py_3_6\lib\site-packages\keras\engine\network.py", line 1312, in build_map
node = layer._inbound_nodes[node_index]
AttributeError: 'NoneType' object has no attribute '_inbound_nodes'
After some testing, I think the error comes from:
l_conv1a = Concatenate(axis=3)([l_conv1, l_edge1, l_edge2, l_edge3])
Is there any way to solve it?
Keras layers accepts Keras Tensors and not Tensors as their input. So if you would like to use tf.nn.conv2d instead of Conv2D layers in Keras, you need to wrap them inside a Lambda layer:
l_edge1 = Lambda(lambda x: tf.nn.conv2d(x, k_edge1, strides=[1, 1, 1, 1], padding='VALID'))(l_reshape)
l_edge2 = Lambda(lambda x: tf.nn.conv2d(x, k_edge2, strides=[1, 1, 1, 1], padding='VALID'))(l_reshape)
l_edge3 = Lambda(lambda x: tf.nn.conv2d(x, k_edge3, strides=[1, 1, 1, 1], padding='VALID'))(l_reshape)
You cannot use TF functions directly on Keras tensors as you are doing here:
l_edge1 = tf.nn.conv2d(l_reshape, k_edge1, strides=[1, 1, 1, 1], padding='VALID')
l_edge2 = tf.nn.conv2d(l_reshape, k_edge2, strides=[1, 1, 1, 1], padding='VALID')
l_edge3 = tf.nn.conv2d(l_reshape, k_edge3, strides=[1, 1, 1, 1], padding='VALID')
What you should do is to just use the Conv2D layer and then set the weights manually using layer.set_weights(array). To keep weights non-trainable, just set layer.trainable = False, like:
conv = Conv2D(filters=1, kernel_size(3, 3), padding='valid')
conv.set_weights(your_weight_array)
conv.trainable = False
l_edge1 = conv(l_reshape)
And similarly for the other two Conv2D layers.

Convert python sequence with multiple datatypes to tensor

I'm using TensorFlow r1.7 and python3.6.5. I am also very new to TensorFlow, so I'd like easy to read explanations if possible.
I'm trying to convert my input data into a dataset of tensors with this function tf.data.Dataset.from_tensor_slices(). I pass my tuple with mixed datatypes into this function. However, when running my code I get this error: ValueError: Can't convert Python sequence with mixed types to Tensor.
I want to know why I am receiving this error, and how I can convert my data to a dataset of tensors even with mixed datatypes.
Here's a printout of the top 5 entries in my tuple.
(13501, 2, None, 51, '2232', 'S35', '734.72', 'CLA', '240', 1035, 2060, 1252, 1182, 10, '967.28', '338.50', None, 14, 102, 3830)
(15124, 2, None, 57, '2641', 'S35', '234.80', 'DDA', '240', 743, 1597, 4706, 156, 0, None, None, None, 3, 27, 981)
(40035, 2, None, None, '21', 'K00', '60.06', 'CHK', '520', 76, 1863, 12, None, 1, '85.06', '25.00', None, 1, 5, 245)
(42331, 3, None, 62, '121', 'S50', '1859.01', 'ACT', '420', 952, 1583, 410, 255, 0, None, None, None, 6, 117, 1795)
(201721, 3, None, 42, '2472', 'S35', '1413.84', 'CLA', '350', 868, 1746, 963, 264, 0, None, None, None, 18, 65, 4510)
As you can see, I have a mix of integers, floats, and strings in my input data.
Here is a traceback of the error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/miikey101/Documents/Khalen_Case_Loader/tensorflow/k_means/k_means.py", line 10, in prepare_dataset
dataset = tf.data.Dataset.from_tensor_slices(dm_data)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 222, in from_tensor_slices
return TensorSliceDataset(tensors)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1017, in __init__
for i, t in enumerate(nest.flatten(tensors))
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1017, in <listcomp>
for i, t in enumerate(nest.flatten(tensors))
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 950, in convert_to_tensor
as_ref=False)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1040, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 235, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 185, in constant
t = convert_to_eager_tensor(value, ctx, dtype)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 131, in convert_to_eager_tensor
return ops.EagerTensor(value, context=handle, device=device, dtype=dtype)
ValueError: Can't convert Python sequence with mixed types to Tensor.
In tensorflow you can't have a tensor with more than one data type.
Quoting the documentation:
It is not possible to have a tf.Tensor with more than one data type. It is possible, however, to serialize arbitrary data structures as strings and store those in tf.Tensors.
Hence a workaround could be to create a tensor with data type tf.String and, on the occurrence, cast the field to the desired data type
You want a tensor for each of your features (columns). Only if it's a multi-dimensional feature (like an image, a video, list of strings, vector) would you have more dimensions in the tensor and even then they would all have the same datatype.
tf.data.Dataset.from_tensor_slices() will accept your input as a dictionary of lists (key is the name of the feature, value is a list of the values in that feature), or as a list of lists. I can't remember if it eats Pandas dataframes but if it doesn't you can easily convert it to a dictionary df.to_dict().
However, you can't input None values. You will have to find some value for those before converting into a tensor. Classic approaches to that is median value, zero value, most common value, "missing"/"unknown" value for strings or categories, or imputation.

Resources