How to add output dimensions to `nn.Linear` while freezing the original dimensions? - pytorch

I am implementing an incremental learning task with PyTorch. In a simple scenario, say the number of base classes is 5 and the number of incremental classes is 2. That is, I want the model to incrementally learn 2 new classes each time.
Suppose the model is composed of a feature extractor (resnet18) and a classifier, a 1-layer MLP: mlp = nn.Linear(126,5). To classify the novel classes, I must add 2 extra output neurons responsible for the 2 incremental classes. That is to say, I want a new classifier mlp_inc = nn.Linear(126,7). Importantly, I want to freeze the trained weights (126 * 5) for the base classes and update only the parameters (126 * 2) for the incremental classes.
A straightforward way is to concatenate the outputs of the base classifier and the incremental classifier:
self.mlp_base = nn.Linear(126, 5)
self.mlp_inc = nn.Linear(126, 2)
'''
def forward(self, x):
    x_base = self.mlp_base(x)
    x_inc = self.mlp_inc(x)
    output = torch.cat((x_base, x_inc), 1)
    return output
'''
But this approach adds a new module self.mlp_inc to the original model. Denote by mlp_base1 and mlp_inc1 the trained classifiers after incremental task 1.
When adapting to a newer incremental task (another 2 novel classes, task 2), I cannot directly merge mlp_base1 and mlp_inc1 into a single mlp_base by loading both of their state_dicts into it. This means I would have to add mlp_inc2, mlp_inc3, ... for the other tasks, which is not easily maintainable.
So a simple approach like the code below would be a better choice.
# for task 1
self.mlp = nn.Linear(126,5)
# for task 2
self.mlp = nn.Linear(126,5+2)
# self.mlp.load_state_dict(), load the partial parameters for 5 base classes
# self.mlp.requires_grad(False), freeze the partial parameters for 5 base classes
But this seems not achievable.
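One way to get close to this with a single layer is to build the wider layer, copy the trained rows into it, and zero the gradient of those rows with a tensor hook. The following is a minimal sketch under stated assumptions, not a definitive implementation: the helper name `expand_linear` is hypothetical, the loss is a dummy, and the optimizer is assumed to apply no weight decay (weight decay would still change the copied rows even when their gradient is zero).

```python
import torch
import torch.nn as nn


def expand_linear(old: nn.Linear, num_new: int) -> nn.Linear:
    """Hypothetical helper: widen `old` by `num_new` outputs, keeping the old rows fixed."""
    n_old = old.out_features
    new = nn.Linear(old.in_features, n_old + num_new)

    # Copy the trained base-class weights into the first n_old rows.
    with torch.no_grad():
        new.weight[:n_old] = old.weight
        new.bias[:n_old] = old.bias

    # Zero the gradient of the copied rows so the optimizer leaves them alone.
    def zero_old_rows(grad: torch.Tensor) -> torch.Tensor:
        grad = grad.clone()
        grad[:n_old] = 0
        return grad

    new.weight.register_hook(zero_old_rows)
    new.bias.register_hook(zero_old_rows)
    return new


mlp = nn.Linear(126, 5)          # task 1: 5 base classes
mlp = expand_linear(mlp, 2)      # task 2: 5 + 2 classes
optimizer = torch.optim.SGD(mlp.parameters(), lr=0.1)  # assumption: no weight decay

frozen_before = mlp.weight[:5].detach().clone()
loss = mlp(torch.randn(4, 126)).sum()   # dummy loss on random features
loss.backward()
optimizer.step()
print(torch.equal(mlp.weight[:5], frozen_before))  # True
```

Calling `expand_linear` again for each new task keeps a single `self.mlp` module. Note that hooks are not stored in the `state_dict`, so they have to be registered again after a checkpoint is reloaded.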

Related

Keras fitting setting in TensorFlow Extended (TFX)

I try to construct a TFX pipeline with a trainer component with a Keras model defined like this:
def run_fn(fn_args: components.FnArgs):
    transform_output = TFTransformOutput(fn_args.transform_output)
    train_dataset = input_fn(fn_args.train_files,
                             fn_args.data_accessor,
                             transform_output,
                             num_batch)
    eval_dataset = input_fn(fn_args.eval_files,
                            fn_args.data_accessor,
                            transform_output,
                            num_batch)
    history = model.fit(train_dataset,
                        epochs=num_epochs,
                        steps_per_epoch=fn_args.train_steps,
                        validation_data=eval_dataset,
                        validation_steps=fn_args.eval_steps)
This works. However, if I change fitting to the following, this doesn't work:
history = model.fit(train_dataset,
                    epochs=num_epochs,
                    batch_size=num_batch,
                    validation_split=0.1)
Now, I have two questions:
Why does fitting work only with steps_per_epoch? I couldn't find any explicit statement supporting this, but it is the only way that works. I conclude that it must be something TFX specific (does TFX handle input data only in a generator-like way?).
Let's say my train_dataset contains 100 instances and steps_per_epoch=1000 (with epochs=1). Does that mean that my 100 input instances are each fed 10x in order to reach the defined 1000 steps? Isn't that counter-productive from a training perspective?
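For context, the Keras documentation for `Model.fit` states that `validation_split` is not supported when the input is a dataset, and that `batch_size` should not be passed for dataset inputs because the dataset already yields batches. The arithmetic behind the second question can be checked with a small sketch. It assumes the dataset repeats indefinitely and that one step consumes one batch; both function names are made up for illustration.

```python
import math


def passes_over_data(num_examples: int, batch_size: int,
                     steps_per_epoch: int, epochs: int = 1) -> float:
    """Average number of times each example is seen, assuming a repeating dataset."""
    examples_consumed = steps_per_epoch * batch_size * epochs
    return examples_consumed / num_examples


def steps_for_one_pass(num_examples: int, batch_size: int) -> int:
    """Number of steps needed to see every example once."""
    return math.ceil(num_examples / batch_size)


print(passes_over_data(100, 1, 1000))    # 10.0  (the "10x" case holds for batch size 1)
print(passes_over_data(100, 32, 1000))   # 320.0
print(steps_for_one_pass(100, 32))       # 4
```

So the "10x" figure only holds for a batch size of 1; with larger batches each instance is revisited proportionally more often.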

Training multiple pytorch models on GPUs

I'm trying to implement something with pytorch.
I have 2 GPUs and I want to train 2 models as below:
model0 = Mymodel().to('cuda:0')
model1 = Mymodel().to('cuda:1')
opt0 = torch.optim.Adam(model0.parameters(), lr=0.01)
opt1 = torch.optim.Adam(model1.parameters(), lr=0.01)
# 1.Forward data into model0 on GPU0
out = model0(x.to('cuda:0'))
# 2.Calculate the loss on model0, update model0's parameters
model0.loss.backward()
opt0.step()
opt0.zero_grad()
# 3.Use model0's output as input of model1 on GPU1
out = model1(out.to('cuda:1'))
# 4.Calculate the loss on model1, update model1's parameters
model1.loss.backward()
opt1.step()
opt1.zero_grad()
I want to train them simultaneously to speed up the whole procedure, but I think the current code waits for step 2 (or 4) to finish before doing step 3 (or 1). How can I implement my idea? Or which technique do I need (e.g. model parallelism, threads, multiprocessing...)?
I've considered some articles like this, but I think there is something wrong with the result, and I think it doesn't actually train the models simultaneously.
You have a strong dependency between the 2 models, the 2nd one always needs the output from the previous one, so that part of the code will always be sequential.
I think you might need some sort of multiprocessing (take a look at torch.multiprocessing) or some kind of queue, where you can store the output from the first model.
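The queue idea can be sketched with plain Python threads and stand-in functions. This is only an illustration of the hand-off pattern: `run_pipeline`, `stage0` and `stage1` are hypothetical names, and in real code `stage0` would be assumed to run model0's forward, backward and optimizer step on `cuda:0` and return something like `out.detach().to('cuda:1')`, while `stage1` would train model1 on that tensor.

```python
import queue
import threading


def run_pipeline(batches, stage0, stage1, maxsize=2):
    """Run stage0 and stage1 in separate threads connected by a bounded queue."""
    handoff = queue.Queue(maxsize=maxsize)
    results = []
    done = object()  # sentinel marking the end of the stream

    def producer():
        for batch in batches:
            handoff.put(stage0(batch))  # blocks if the consumer falls behind
        handoff.put(done)

    def consumer():
        while True:
            item = handoff.get()
            if item is done:
                break
            results.append(stage1(item))

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Stand-ins for the two training steps.
outputs = run_pipeline(range(5), stage0=lambda x: x * 2, stage1=lambda y: y + 1)
print(outputs)  # [1, 3, 5, 7, 9]
```

With this layout, stage 0 can start on batch k+1 while stage 1 is still working on batch k; each individual batch is still processed sequentially, as the dependency requires.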

Cross-validation in Sklearn using a Custom CV

I am dealing with a binary classification problem.
I have 2 lists of indexes listTrain and listTest, which are partitions of the training set (the actual test set will be used only later). I would like to use the samples associated with listTrain to estimate the parameters and the samples associated with listTest to evaluate the error in a cross validation process (hold out set approach).
However, I am not able to find the correct way to pass this to sklearn's GridSearchCV.
The documentation says that I should create "An iterable yielding (train, test) splits as arrays of indices". However, I do not know how to create this.
grid_search = GridSearchCV(estimator = model, param_grid = param_grid,cv = custom_cv, n_jobs = -1, verbose = 0,scoring=errorType)
So, my question is how to create custom_cv based on these indexes to be used in this method?
X is the feature matrix and y is the vector of labels.
Example: Suppose that I have only one hyperparameter alpha that belongs to the set {1,2,3}. I would like to set alpha=1, estimate the parameters of the model (for instance, the coefficients of a regression) using the samples associated with listTrain, and evaluate the error using the samples associated with listTest. Then I repeat the process for alpha=2 and finally for alpha=3. Then I choose the alpha that minimizes the error.
EDIT: Actual answer to question. Try passing the cv argument a generator of the indices:
def index_gen(listTrain, listTest):
    yield listTrain, listTest
grid_search = GridSearchCV(estimator=model, param_grid=param_grid,
                           cv=index_gen(listTrain, listTest), n_jobs=-1,
                           verbose=0, scoring=errorType)
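A plain list holding a single (train, test) pair of index arrays is another iterable that fits the documented description. Below is a small self-contained sketch on synthetic data; the data, the `Ridge` estimator and the scoring choice are assumptions made for illustration and stand in for the asker's `model` and `errorType`.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Toy regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=40)

listTrain = np.arange(0, 30)   # indices used to fit the model
listTest = np.arange(30, 40)   # indices used to score each alpha

custom_cv = [(listTrain, listTest)]  # exactly one hold-out split

grid_search = GridSearchCV(
    estimator=Ridge(),
    param_grid={"alpha": [1.0, 2.0, 3.0]},
    cv=custom_cv,
    scoring="neg_mean_squared_error",
)
grid_search.fit(X, y)
print(grid_search.n_splits_)     # 1
print(grid_search.best_params_)
```

Each alpha is fitted once on `listTrain` and scored once on `listTest`, which is the hold-out procedure described in the question.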
EDIT: Before Edits:
As mentioned in the comment by desertnaut, what you are trying to do is bad ML practice, and you will end up with a biased estimate of the generalisation performance of the final model. Using the test set in the manner you're proposing will effectively leak test set information into the training stage, and give you an overestimate of the model's capability to classify unseen data. What I suggest in your case:
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=5,
                           n_jobs=-1, verbose=0, scoring=errorType)
grid_search.fit(X[listTrain], y[listTrain])
Now, your training set will be split into 5 folds (you can choose the number here); the model is trained on 4 of those folds with a specific set of hyperparameters and tested on the fold that was left out. This is repeated 5 times, until all of your training examples have been part of a left-out set. This whole procedure is done for each hyperparameter setting you are testing (5x3 fits in this case).
grid_search.best_params_ will give you a dictionary of the parameters that performed best across all 5 folds. These are the parameters that you use to train your final classifier, again using only the training set:
clf = LogisticRegression(**grid_search.best_params_).fit(X[listTrain],
                                                         y[listTrain])
Now, finally, your classifier is tested on the test set and an unbiased estimate of the generalisation performance is given:
predictions = clf.predict(X[listTest])

Merge specific input coordinates in Keras

I have a large input vector (1000 features) to a Sequential model. The model is mainly a dense network.
I know that features 1-50 are coordinate-wise highly correlated to features 51-100 (1 with 51, 2 with 52 etc.) and I want to take advantage of that.
Is there a way to add a layer to my existing model that reflects this? (joining inputs 1 and 51 to one neuron, 2 and 52 to another, etc.)
Or maybe the only option is to change the input structure to 50 tensors (of 1x2) and one large vector of 900 features? (I would like to avoid that since it means re-writing my feature preparation code)
I think the first dense layer would find out this relationship, of course if you define and train the model properly. However, if you would like to process the first 100 feature separately, one alternative is to use Keras functional API and define two Input layers, one for the first 100 features and another for the rest of 900 features:
from keras.layers import Input, Dense, concatenate
from keras.models import Model
input_100 = Input(shape=(100,))
input_900 = Input(shape=(900,))
Now you can process each one separately. For example, you can define two separate Dense layers connected to each one and then merge their outputs:
dense_100 = Dense(50, activation='relu')(input_100)
dense_900 = Dense(200, activation='relu')(input_900)
concat = concatenate([dense_100, dense_900])
# define the rest of your model ...
model = Model(inputs=[input_100, input_900], outputs=[the_outputs_of_model])
Of course, you need to feed the input layers separately. For that you can easily slice the training data:
model.fit([X_train[:,:100], X_train[:,100:]], y_train, ...)
Update: If you specifically want features 1 and 51, 2 and 52, etc. to each have a separate neuron (I can't comment on how effective that is without experimenting on the data), you can use a LocallyConnected1D layer with a kernel size and number of filters of 1 (i.e. it behaves like applying a separate Dense layer to each pair of related features):
input_50_2 = Input(shape=(50,2))
local_out = LocallyConnected1D(1, 1, activation='relu')(input_50_2)
local_reshaped = Reshape((50,))(local_out) # need this for merging since local_out has shape of (None, 50, 1)
# or use the following:
# local_reshaped = Flatten()(local_out)
concat = concatenate([local_reshaped, dense_900])
# define the rest of your model...
# pair feature i with feature i+50 for every sample: (n, 100) -> (n, 2, 50) -> (n, 50, 2)
X_train_50_2 = X_train[:, :100].reshape(-1, 2, 50).transpose(0, 2, 1)
model.fit([X_train_50_2, X_train[:,100:]], y_train, ...)
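The input preparation for the (50, 2) branch is easy to get wrong when there is more than one sample, so here is a small NumPy check of one way to pair feature i with feature i+50 per sample. The array contents are synthetic and chosen only so that the pairing is visible in the printed values.

```python
import numpy as np

n_samples = 4
# Synthetic data: entry [s, f] equals 1000 * s + f, so the feature index is readable.
X_train = np.arange(n_samples * 1000, dtype=float).reshape(n_samples, 1000)

# (n, 100) -> (n, 2, 50): axis 1 selects the block (features 1-50 or 51-100)
# -> (n, 50, 2): swap axes so each of the 50 positions holds one correlated pair
X_pairs = X_train[:, :100].reshape(-1, 2, 50).transpose(0, 2, 1)
X_rest = X_train[:, 100:]

print(X_pairs.shape)   # (4, 50, 2)
print(X_pairs[0, 0])   # [ 0. 50.]
print(X_pairs[1, 3])   # [1003. 1053.]
print(X_rest.shape)    # (4, 900)
```

Because this is only a reshape and an axis swap on the existing matrix, the feature preparation code that produces the flat 1000-column input does not need to change.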

How to apply random forest properly?

I am new to machine learning and Python. I am trying to apply a random forest to predict a binary target. In my data I have 24 predictors (1000 observations), of which one is categorical (gender) and all the others are numerical. Among the numerical ones, there are two types of values: volumes of money in euros (very skewed and on a large scale) and counts (number of transactions from an ATM). I have transformed the large-scale features and done the imputation. Finally, I checked correlation and collinearity and removed some features based on that (as a result I had 24 features). Now when I fit the RF, it is always perfect on the training set, while the scores from cross-validation are not so good. And when I apply it to the test set, it gives very low recall values. How should I remedy this?
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

def classification_model(model, data, predictors, outcome):
    # Fit the model:
    model.fit(data[predictors], data[outcome])
    # Make predictions on training set:
    predictions = model.predict(data[predictors])
    # Print accuracy
    accuracy = metrics.accuracy_score(data[outcome], predictions)
    print("Accuracy : {0:.3%}".format(accuracy))
    # Perform k-fold cross-validation with 5 folds
    kf = KFold(n_splits=5)
    error = []
    for train, test in kf.split(data):
        # Filter training data
        train_predictors = data[predictors].iloc[train, :]
        # The target we're using to train the algorithm.
        train_target = data[outcome].iloc[train]
        # Training the algorithm using the predictors and target.
        model.fit(train_predictors, train_target)
        # Record the score from each cross-validation run
        error.append(model.score(data[predictors].iloc[test, :], data[outcome].iloc[test]))
    print("Cross-Validation Score : {0:.3%}".format(np.mean(error)))
    # Fit the model again so that it can be referred to outside the function:
    model.fit(data[predictors], data[outcome])
outcome_var = 'Sold'
model = RandomForestClassifier(n_estimators=20)
predictor_var = train.drop('Sold', axis=1).columns.values
classification_model(model,train,predictor_var,outcome_var)
#Create a series with feature importances:
featimp = pd.Series(model.feature_importances_, index=predictor_var).sort_values(ascending=False)
print(featimp)
outcome_var = 'Sold'
model = RandomForestClassifier(n_estimators=20, max_depth=20, oob_score = True)
predictor_var = ['fet1','fet2','fet3','fet4']
classification_model(model,train,predictor_var,outcome_var)
It is very easy to overfit with Random Forest. To resolve this, you need to do the parameter search a little more rigorously to find the best parameters to use. [Here](http://scikit-learn.org/stable/auto_examples/model_selection/randomized_search.html) is the link on how to do this (from the scikit-learn docs).
The model is overfitting, and you need to search for the parameters that work best for it. The link provides implementations of grid search and randomized search for hyperparameter estimation.
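As a rough illustration of such a search, here is a minimal sketch using `RandomizedSearchCV` on synthetic data. The dataset, the parameter ranges and the choice of recall as the scoring metric are all assumptions made for the example, not recommendations tuned to the asker's data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic, imbalanced stand-in for the real data (24 features).
X, y = make_classification(n_samples=300, n_features=24,
                           weights=[0.8, 0.2], random_state=0)

# Parameters that limit tree complexity are the usual levers against overfitting.
param_distributions = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 10, 20],
    "max_features": ["sqrt", 0.5],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,            # number of sampled parameter settings
    cv=5,
    scoring="recall",    # optimise the metric the question cares about
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
print(search.best_score_)
```

The cross-validated `best_score_` is a more honest yardstick than the training accuracy, which a random forest can push to 100% regardless of how well it generalises.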
It will also be fun to go through this MIT Artificial Intelligence lecture to get a deeper theoretical orientation: https://www.youtube.com/watch?v=UHBmv7qCey4&t=318s.
Hope this helps!
