I was reading about fine-tuning a model using GridSearchCV and I came across the parameter grid shown below:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid = [
    {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]
forest_reg = RandomForestRegressor(random_state=42)
# train across 5 folds, that's a total of (12+6)*5=90 rounds of training
grid_search = GridSearchCV(forest_reg, param_grid, cv=5,
                           scoring='neg_mean_squared_error')
grid_search.fit(housing_prepared, housing_labels)
Here I am not getting the concept of n_estimators and max_features. Is it that n_estimators means the number of records from the data, and max_features means the number of attributes to be selected from the data?
After Going further I got this result :
>>> grid_search.best_params_
{'max_features': 8, 'n_estimators': 30}
So the thing is, I am not getting what this result is actually trying to say.
After reading the documentation for RandomForestRegressor you can see that n_estimators is the number of trees to be used in the forest. Since random forest is an ensemble method that builds multiple decision trees, this parameter controls how many trees are used in the process.
max_features, on the other hand, determines the maximum number of features to consider when looking for a split. For more information on max_features, read this answer.
n_estimators: This is the number of trees you want to build before taking the maximum vote or averaging the predictions (the forest fits that many trees and then aggregates their outputs to give you the final answer). A higher number of trees gives you better performance but makes your code slower.
max_features: The number of features to consider when looking for the best split.
>>> grid_search.best_params_ gives {'max_features': 8, 'n_estimators': 30}
This means these are the best hyperparameter values found among the candidates you searched over: n_estimators from {3, 10, 30} and max_features from {2, 4, 6, 8}.
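For example, here is a minimal hedged sketch of how you could use this result (it reuses grid_search, housing_prepared and housing_labels from the question; with the default refit=True, best_estimator_ is already retrained on the full training set):
import numpy as np

print(grid_search.best_params_)           # e.g. {'max_features': 8, 'n_estimators': 30}
print(np.sqrt(-grid_search.best_score_))  # RMSE of the best combination (scoring was negative MSE)

best_model = grid_search.best_estimator_  # the refit RandomForestRegressor
predictions = best_model.predict(housing_prepared)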
I am learning Scikit-Learn and I am trying to perform a grid search on a multi label classification problem. This is what I wrote:
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestClassifier

param_grid = [
    {'randomforestclassifier__n_estimators': [3, 10, 30], 'randomforestclassifier__max_features': [2, 4, 5, 8]},
    {'randomforestclassifier__bootstrap': [False], 'randomforestclassifier__n_estimators': [3, 10], 'randomforestclassifier__max_features': [2, 3, 4]}
]

rf_classifier = OneVsRestClassifier(
    make_pipeline(RandomForestClassifier(random_state=42))
)
grid_search = GridSearchCV(rf_classifier, param_grid=param_grid, cv=5, scoring='f1_micro')
grid_search.fit(X_train_prepared, y_train)
However when I run it I get the following error:
ValueError: Invalid parameter randomforestclassifier for estimator
OneVsRestClassifier(estimator=Pipeline(steps=[('randomforestclassifier',
RandomForestClassifier(random_state=42))])). Check the list of available parameters
with `estimator.get_params().keys()`.
I also tried running the grid_search.estimator.get_params().keys() command, but I just get a list of parameters containing the ones that I have written, so I am not sure what I should do.
Would you be able to suggest what the issue is and how I can run the grid search properly?
You would have been able to define param_grid the way you did if rf_classifier were a Pipeline object. Quoting the Pipeline docs:
The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a '__'.
In your case, instead, rf_classifier is a OneVsRestClassifier instance. Therefore, before setting the parameters of the RandomForestClassifier, you need to reach the pipeline instance through the OneVsRestClassifier's estimator parameter, as follows:
param_grid = [
    {'estimator__randomforestclassifier__n_estimators': [3, 10, 30],
     'estimator__randomforestclassifier__max_features': [2, 4, 5, 8]},
    {'estimator__randomforestclassifier__bootstrap': [False],
     'estimator__randomforestclassifier__n_estimators': [3, 10],
     'estimator__randomforestclassifier__max_features': [2, 3, 4]}
]
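If you want to double-check the exact names that GridSearchCV will accept, you can filter the nested parameter names; a small sketch (rf_classifier is the OneVsRestClassifier from the question):
# The nested names carry the 'estimator__' prefix added by OneVsRestClassifier.
for name in rf_classifier.get_params().keys():
    if 'randomforestclassifier__' in name:
        print(name)  # e.g. estimator__randomforestclassifier__n_estimators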
How do I do inference in batches in PyTorch? And how do I do inference in parallel to speed up that part of the code?
I've started with the standard way of doing inference:
with torch.no_grad():
    for inputs, labels in dataloader['predict']:
        inputs = inputs.to(device)
        output = model(inputs)
        output = output.to(device)
I've researched, and the only mention of doing inference in parallel (on the same machine) seems to be with the Dask library: https://examples.dask.org/machine-learning/torch-prediction.html
I am currently attempting to understand that library and create a working example. In the meantime, do you know of a better way?
In PyTorch, input tensors always have the batch dimension as the first dimension. Thus, doing inference by batch is the default behavior; you just need to increase the batch dimension to be larger than 1.
For example, if your single input is [1, 1], its input tensor is [[1, 1], ] with shape (1, 2). If you have two inputs [1, 1] and [2, 2], generate the input tensor as [[1, 1], [2, 2], ] with shape (2, 2). This is usually done in the batch generator, such as your DataLoader.
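As a minimal sketch of batched inference with torch.utils.data.DataLoader (the names dataset, model and device are assumed to exist as in your code; batch_size=64 is arbitrary):
import torch
from torch.utils.data import DataLoader

loader = DataLoader(dataset, batch_size=64, shuffle=False)  # batch_size > 1 batches the forward pass

model.eval()
results = []
with torch.no_grad():
    for inputs, labels in loader:
        inputs = inputs.to(device)    # move the whole batch to the device at once
        output = model(inputs)        # one forward pass for the entire batch
        results.append(output.cpu())  # collect the predictions on the CPU
results = torch.cat(results)          # shape: (num_samples, ...)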
I am quite new to the deep learning field, especially Keras. I have a simple classification problem and I don't know how to solve it. What I don't understand is the general process of classification: how to convert the input data into tensors, how to handle the labels, etc.
Let's say we have three classes, 1, 2, 3.
There is a sequence of classes that needs to be classified as one of those classes. The dataset is, for example:
Sequence 1, 1, 1, 2 is labeled 2
Sequence 2, 1, 3, 3 is labeled 1
Sequence 3, 1, 2, 1 is labeled 3
and so on.
This means the input dataset will be
[[1, 1, 1, 2],
[2, 1, 3, 3],
[3, 1, 2, 1]]
and the label will be
[[2],
[1],
[3]]
Now one thing that I do understand is to one-hot encode the class. Because we have three classes, every 1 will be converted into [1, 0, 0], 2 will be [0, 1, 0] and 3 will be [0, 0, 1]. Converting the example above will give a dataset of 3 x 4 x 3, and a label of 3 x 1 x 3.
Another thing that I understand is that the last layer should be a softmax layer. This way, if test data (e.g. [1, 2, 3, 4]) comes in, it will be passed through the softmax and the probabilities of this sequence belonging to class 1, 2, or 3 will be calculated.
Am I right? If so, can you give me an explanation/example of the process of classifying these sequences?
Thank you in advance.
Here are a few clarifications that you seem to be asking about.
If your input data has the shape (4), then your input tensor will have the shape (batch_size, 4).
Softmax is the correct activation for your prediction (last) layer given your desired output, because you have a classification problem with multiple classes. This will yield an output of shape (batch_size, 3). These will be the probabilities of each potential classification, summing to one across all classes. For example, if the classification is class 0, then a single prediction might look something like [0.9714, 0.01127, 0.01733].
Batch size isn't hard-coded to the network, hence it is represented in model.summary() as None. E.g. the network's last-layer output shape can be written (None, 3).
Unless you have an applicable alternative, a softmax prediction layer requires a categorical_crossentropy loss function.
The architecture of a network remains up to you, but you'll at least need a way in and a way out. In Keras (as you've tagged), there are a few ways to do this. Here are some examples:
Example with Keras Sequential
from keras.models import Sequential
from keras.layers import InputLayer, Dense

model = Sequential()
model.add(InputLayer(input_shape=(4,)))   # sequence of length four
model.add(Dense(3, activation='softmax')) # three possible classes
Example with Keras Functional
from keras.models import Model
from keras.layers import Input, Dense

input_tensor = Input(shape=(4,))
x = Dense(3, activation='softmax')(input_tensor)
model = Model(input_tensor, x)
Example including the input shape in the first layer instead of a separate input layer (works with either the Sequential or Functional API):
model = Sequential()
model.add(Dense(666, activation='relu', input_shape=(4,)))  # arbitrary hidden layer width
model.add(Dense(3, activation='softmax'))                   # three possible classes
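To tie the pieces together, here is a hedged end-to-end sketch for the toy data in your question (the hidden layer size of 16 and the epoch count are arbitrary choices; the integer sequences are fed in directly and only the labels are one-hot encoded):
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

X = np.array([[1, 1, 1, 2],
              [2, 1, 3, 3],
              [3, 1, 2, 1]], dtype='float32')               # three sequences of length four
y = to_categorical(np.array([2, 1, 3]) - 1, num_classes=3)  # one-hot labels, classes shifted to 0-2

model = Sequential()
model.add(Dense(16, activation='relu', input_shape=(4,)))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(X, y, epochs=200, verbose=0)

print(model.predict(X))  # each row: probabilities for classes 1, 2, 3, summing to one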
Hope that helps!
I am new to data science and I need some help understanding PCA. I know that each column constitutes one axis, but when PCA is done and the components are reduced to some value k, how do I know which columns got selected?
In PCA you compute the eigenvectors and eigenvalues of the covariance matrix to identify the principal components.
Principal components are new variables that are constructed as linear combinations or mixtures of the initial variables. These combinations are done in such a way that the new variables (i.e., principal components) are uncorrelated and most of the information within the initial variables is squeezed or compressed into the first components. So, the idea is 10-dimensional data gives you 10 principal components, but PCA tries to put maximum possible information in the first component, then maximum remaining information in the second and so on.
Geometrically speaking, principal components represent the directions of the data that explain a maximal amount of variance, that is to say, the lines that capture the most information in the data. As there are as many principal components as there are variables in the data, principal components are constructed in such a manner that the first principal component accounts for the largest possible variance in the data set.
In my experience, if the cumulative sum of the explained variance (eigenvalues) reaches 80% or 90%, the transformed vectors will be enough to represent the original vectors.
To explain clearly, let's use @Nicholas M's code.
import numpy as np
from sklearn.decomposition import PCA
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=1)
pca.fit(X)
You must increase n_components until you reach the desired explained variance (e.g. 90%).
Input:
pca.explained_variance_ratio_
Output:
array([0.99244289])
In this example, just one component is enough.
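If your data has more dimensions, a hedged way to pick n_components is to look at the cumulative explained variance (X as in the snippet above; the 90% threshold is just an example):
import numpy as np
from sklearn.decomposition import PCA

pca = PCA().fit(X)                                      # keep all components first
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_components = int(np.argmax(cumulative >= 0.90)) + 1   # first index reaching the threshold
print(cumulative, n_components)
# scikit-learn can also do this directly: PCA(n_components=0.90)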
I hope this is all clear.
Resources:
https://towardsdatascience.com/pca-using-python-scikit-learn-e653f8989e60
https://towardsdatascience.com/a-step-by-step-explanation-of-principal-component-analysis-b836fb9c97e2
You have to look at the eigenvectors of the PCA. Each eigenvalue is the "strength" of each "new axis", and the corresponding eigenvector provides the linear combination of your original features.
With scikit-learn, you should look at the attribute components_
import numpy as np
from sklearn.decomposition import PCA
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2)
pca.fit(X)
print(pca.components_) # << eigenvector matrix
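Note that PCA does not select a subset of the original columns; every component is a mixture of all of them. To see which original columns dominate a component, you can rank the absolute loadings; a small sketch using the pca fitted above:
import numpy as np

loadings = np.abs(pca.components_[0])  # contribution of each original column to the first component
ranking = np.argsort(loadings)[::-1]   # column indices, most influential first
print(ranking)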
I'm working on a smaller project to better understand RNNs, in particular LSTM and GRU. I'm not at all an expert, so please bear that in mind.
The data for the problem I'm facing is given in the form of:
>>> import numpy as np
>>> import pandas as pd
>>> pd.DataFrame([[1, 2, 3],[1, 2, 1], [1, 3, 2],[2, 3, 1],[3, 1, 1],[3, 3, 2],[4, 3, 3]], columns=['person', 'interaction', 'group'])
person interaction group
0 1 2 3
1 1 2 1
2 1 3 2
3 2 3 1
4 3 1 1
5 3 3 2
6 4 3 3
This is just for explanation. We have different people interacting with different groups in different ways. I've already encoded the various features. The last interaction of a person is always a 3, which means selecting a certain group. In the short example above, person 1 chooses group 2, person 2 chooses group 1, and so on.
My whole dataset is much bigger, but I would like to understand the conceptual part first before throwing models at it. The task I would like to learn is: given a sequence of interactions, which group is chosen by the person? A bit more concretely, I would like the output to be a list of all groups (there are 3 groups: 1, 2, 3) sorted from the most likely choice to the second and third most likely. The loss function is therefore the mean reciprocal rank.
I know that in Keras GRUs/LSTMs can handle variable-length input. So my three questions are:
The input is of the format:
(samples, timesteps, features)
writing high level code:
import keras.layers as L
import keras.models as M
model_input = L.Input(shape=(?, None, 2))
timesteps=None should allow for the varying sequence lengths, and 2 is for the two features interaction and group. But what about the samples? How do I define the batches?
For the output I'm a bit puzzled about how this should look in this example. I think for the last interaction of each person I would like to have a list of length 3. Assuming I've set up the output as
model_output = L.LSTM(3, return_sequences=False)
I then want to compile it. Is there a way of using the mean reciprocal rank?
model.compile('adam', '?')
I know the questions are fairly high level, but I would like to understand the big picture first and start to play around. Any help would therefore be appreciated.
The concept you've drawn in your question is a pretty good start already. I'll add a few things to make it work, as well as a code example below:
You can specify LSTM(n_hidden, input_shape=(None, 2)) directly, instead of inserting an extra Input layer; the batch dimension is to be omitted for the definition.
Since your model is going to perform some kind of classification (based on time series data), the final layer is what we'd expect from "normal" classification as well: a Dense(num_classes, activation='softmax'). Chaining the LSTM and the Dense layer together will first pass the time series input through the LSTM layer and then feed its output (determined by the number of hidden units) into the Dense layer. activation='softmax' allows us to compute a class score for each class (we're going to use one-hot encoding in a data preprocessing step, see the code example below). The class scores are not sorted, but you can always sort them via np.argsort, or take the most likely class via np.argmax.
Categorical crossentropy loss is suited for comparing the classification score, so we'll use that one: model.compile(loss='categorical_crossentropy', optimizer='adam').
Since the number of interactions, i.e. the length of the model input, varies from sample to sample, we'll use a batch size of 1 and feed in one sample at a time.
The following is a sample implementation with respect to the above considerations. Note that I modified your sample data a bit in order to provide more "reasoning" behind the group choices. Also, each person needs to perform at least one interaction before choosing a group (i.e. the input sequence cannot be empty); if this is not the case for your data, introducing an additional no-op interaction (e.g. 0) can help.
import pandas as pd
import tensorflow as tf
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.LSTM(10, input_shape=(None, 2))) # LSTM for arbitrary length series.
model.add(tf.keras.layers.Dense(3, activation='softmax')) # Softmax for class probabilities.
model.compile(loss='categorical_crossentropy', optimizer='adam')
# Example interactions:
# * 1: Likes the group,
# * 2: Dislikes the group,
# * 3: Chooses the group.
df = pd.DataFrame([
    [1, 1, 3],
    [1, 1, 3],
    [1, 2, 2],
    [1, 3, 3],
    [2, 2, 1],
    [2, 2, 3],
    [2, 1, 2],
    [2, 3, 2],
    [3, 1, 1],
    [3, 1, 1],
    [3, 1, 1],
    [3, 2, 3],
    [3, 2, 2],
    [3, 3, 1]],
    columns=['person', 'interaction', 'group']
)
data = [person[1][['interaction', 'group']].values for person in df.groupby('person')]
x_train = [x[:-1] for x in data]
y_train = tf.keras.utils.to_categorical([x[-1, 1]-1 for x in data]) # Expects class labels from 0 to n (-> subtract 1).
print(x_train)
print(y_train)
class TrainGenerator(tf.keras.utils.Sequence):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __len__(self):
        return len(self.x)

    def __getitem__(self, index):
        # Need to expand arrays to have batch size 1.
        return self.x[index][None, :, :], self.y[index][None, :]
model.fit_generator(TrainGenerator(x_train, y_train), epochs=1000)
pred = [model.predict(x[None, :, :]).ravel() for x in x_train]
for p, y in zip(pred, y_train):
    print(p, y)
And the corresponding sample output:
[...]
Epoch 1000/1000
3/3 [==============================] - 0s 40ms/step - loss: 0.0037
[0.00213619 0.00241093 0.9954529 ] [0. 0. 1.]
[0.00123938 0.99718493 0.00157572] [0. 1. 0.]
[9.9632275e-01 7.5039308e-04 2.9268670e-03] [1. 0. 0.]
Using custom generator expressions: According to the documentation, we can use any generator to yield the data. The generator is expected to yield batches of the data and to loop over the whole data set indefinitely. When using tf.keras.utils.Sequence we do not need to specify the parameter steps_per_epoch, as this defaults to len(train_generator). Hence, when using a custom generator, we have to provide this parameter as well:
import itertools as it
model.fit_generator(((x_train[i % len(x_train)][None, :, :],
                      y_train[i % len(y_train)][None, :]) for i in it.count()),
                    epochs=1000,
                    steps_per_epoch=len(x_train))
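Finally, since the question asked for a ranked list of groups (for mean reciprocal rank), here is a hedged sketch that sorts the softmax scores of the trained model from most to least likely group (using the x_train list built above; the +1 maps the class indices back to the group labels 1-3):
import numpy as np

for x in x_train:
    probs = model.predict(x[None, :, :]).ravel()  # class probabilities for one person
    ranked_groups = np.argsort(probs)[::-1] + 1   # group labels, most likely first
    print(ranked_groups, probs)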