GridSearchCV does not give the best hyperparameters - python-3.x

I am new to this and trying to understand GridSearchCV.
I found that the best parameter from GridSearchCV does not always give the highest R-squared value.
For example, in the GridSearchCV below, I got alpha = 1 as the best parameter.
Input
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

alpha = [0.1, 1, 10, 100]
# GridSearchCV
estimator = Ridge()
pipe = Pipeline([('scale',StandardScaler()), ('model', estimator)])
parameters = [{'model__alpha':alpha}]
grid = GridSearchCV(pipe, parameters, scoring='r2', cv=5)
grid.fit(x_train, y_train)
print(f'best alpha: {alpha[grid.best_index_]}')
print(f'score: {grid.best_score_}')
Out
best alpha: 1
score: 0.7928298359066712
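The per-alpha cross-validation scores that drive best_index_ can be inspected directly; a minimal sketch, assuming grid has already been fitted as above:
import pandas as pd

# Mean validation R^2 for each candidate alpha, averaged over the 5 CV folds.
cv_summary = pd.DataFrame({
    'alpha': alpha,
    'mean_cv_r2': grid.cv_results_['mean_test_score'],
    'std_cv_r2': grid.cv_results_['std_test_score'],
})
print(cv_summary)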
When I fit Ridge regression with the same set of alphas, R2_train is highest at alpha=0.1, while R2_test is highest at alpha=10.
Input
# Ridge
import pandas as pd
from sklearn.metrics import r2_score

results = pd.DataFrame(columns=['alpha', 'R2_train', 'R2_test'])
for n in alpha:
    estimator = Ridge(alpha=n)
    pipe = Pipeline([('scale', StandardScaler()), ('model', estimator)])
    pipe.fit(x_train, y_train)
    # train R2
    R2_train = pipe.score(x_train, y_train)
    # test R2
    yhat_test = pipe.predict(x_test)
    R2_test = r2_score(y_test, yhat_test)
    results.loc[len(results)] = [n, R2_train, R2_test]
results
Out
   alpha  R2_train   R2_test
0    0.1  0.803881  0.824372
1    1.0  0.803823  0.824664
2   10.0  0.800996  0.825727
3  100.0  0.774868  0.822778
So I am confused: which alpha should I choose, and why?
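Note that grid.best_score_ is the mean validation R² over the 5 folds inside x_train, so it is not expected to match either the training R² or the single-split test R² in the table above. A common pattern (a sketch, assuming the objects defined earlier) is to let cross-validation choose alpha and then evaluate that one refitted model once on the held-out test set:
# GridSearchCV refits the best pipeline on all of x_train by default (refit=True),
# so the CV-selected model can be evaluated once on the untouched test set.
print('chosen alpha:', grid.best_params_['model__alpha'])
print('test R2 of CV-selected model:', grid.best_estimator_.score(x_test, y_test))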

Related

Make a prediction with Keras models trained using the Genetic Algorithm with PyGAD

I successfully ran the code (see the original link for the code) to train Keras models using the genetic algorithm with PyGAD:
import tensorflow.keras
import pygad.kerasga
import numpy
import pygad

def fitness_func(solution, sol_idx):
    global data_inputs, data_outputs, keras_ga, model
    model_weights_matrix = pygad.kerasga.model_weights_as_matrix(model=model,
                                                                 weights_vector=solution)
    model.set_weights(weights=model_weights_matrix)
    predictions = model.predict(data_inputs)
    mae = tensorflow.keras.losses.MeanAbsoluteError()
    abs_error = mae(data_outputs, predictions).numpy() + 0.00000001
    solution_fitness = 1.0 / abs_error
    return solution_fitness

def callback_generation(ga_instance):
    print("Generation = {generation}".format(generation=ga_instance.generations_completed))
    print("Fitness = {fitness}".format(fitness=ga_instance.best_solution()[1]))

input_layer = tensorflow.keras.layers.Input(3)
dense_layer1 = tensorflow.keras.layers.Dense(5, activation="relu")(input_layer)
output_layer = tensorflow.keras.layers.Dense(1, activation="linear")(dense_layer1)
model = tensorflow.keras.Model(inputs=input_layer, outputs=output_layer)

weights_vector = pygad.kerasga.model_weights_as_vector(model=model)
keras_ga = pygad.kerasga.KerasGA(model=model, num_solutions=10)

# Data inputs
data_inputs = numpy.array([[0.02, 0.1, 0.15],
                           [0.7, 0.6, 0.8],
                           [1.5, 1.2, 1.7],
                           [3.2, 2.9, 3.1]])

# Data outputs
data_outputs = numpy.array([[0.1],
                            [0.6],
                            [1.3],
                            [2.5]])

num_generations = 10
num_parents_mating = 5
initial_population = keras_ga.population_weights

ga_instance = pygad.GA(num_generations=num_generations,
                       num_parents_mating=num_parents_mating,
                       initial_population=initial_population,
                       fitness_func=fitness_func,
                       on_generation=callback_generation)
ga_instance.run()

# After the generations complete, a plot is shown that summarizes how the fitness value evolves over generations.
ga_instance.plot_result(title="PyGAD & Keras - Iteration vs. Fitness", linewidth=4)

# Return the details of the best solution.
solution, solution_fitness, solution_idx = ga_instance.best_solution()
print("Fitness value of the best solution = {solution_fitness}".format(solution_fitness=solution_fitness))
print("Index of the best solution : {solution_idx}".format(solution_idx=solution_idx))

# Fetch the parameters of the best solution.
best_solution_weights = pygad.kerasga.model_weights_as_matrix(model=model,
                                                              weights_vector=solution)
model.set_weights(best_solution_weights)
predictions = model.predict(data_inputs)
print("Predictions : \n", predictions)

mae = tensorflow.keras.losses.MeanAbsoluteError()
abs_error = mae(data_outputs, predictions).numpy()
print("Absolute Error : ", abs_error)
Out:
Fitness value of the best solution = 5.007608966738384
Index of the best solution : 0
1/1 [==============================] - 0s 18ms/step
Predictions :
[[0.4351511 ]
[0.78366435]
[1.3436508 ]
[2.736318 ]]
Absolute Error : 0.1996961
As I understand it, the code above trains a model that should let me forecast a new 3-dimensional input such as [0.9, 0.7, 0.85].
How could I modify the code to adapt it to the input and output data below, or call the model to make a prediction for the new data_inputs = numpy.array([[0.9, 0.7, 0.85]])?
# Data inputs
data_inputs = numpy.array([[0.02, 0.1, 0.15],
                           [0.7, 0.6, 0.8],
                           [1.5, 1.2, 1.7],
                           [3.2, 2.9, 3.1],
                           [0.9, 0.7, 0.85]])  # new entry which needs a forecast

# Data outputs
data_outputs = numpy.array([[0.1],
                            [0.6],
                            [1.3],
                            [2.5]])  # output data for training
Thanks a lot in advance for your help.
My trial code:
import numpy
from tensorflow import keras

# Load model and weights
with open("./ga_model.json", "r") as json_file:
    model_json = json_file.read()
model = keras.models.model_from_json(model_json)
model.load_weights("./ga_model.h5")

# Data inputs
new_data_inputs = numpy.array([[0.9, 0.7, 0.85]])  # new entry which needs a forecast

predictions = model.predict(new_data_inputs)
print("Predictions : \n", predictions)
Out:
Predictions :
[[0.8672837]]
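One way to forecast the new row is to reuse the model that the GA training script above just produced, without saving and reloading it (the JSON/H5 files in the trial code are never written by the original script). A minimal sketch, assuming the training code above has run in the same session:
import numpy
import pygad.kerasga

# The new, unseen input that needs a forecast; it should NOT be appended to
# data_inputs, because there is no matching entry in data_outputs for training.
new_data_inputs = numpy.array([[0.9, 0.7, 0.85]])

# Load the best evolved weights back into the Keras model...
solution, solution_fitness, solution_idx = ga_instance.best_solution()
best_solution_weights = pygad.kerasga.model_weights_as_matrix(model=model,
                                                              weights_vector=solution)
model.set_weights(best_solution_weights)

# ...and predict on the new input.
print("Forecast for [0.9, 0.7, 0.85]:", model.predict(new_data_inputs))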

Behavior of alphas_ in LassoCV

I am trying to reproduce the behavior of LassoCV outside of its CV process, and I am struggling to understand what happens. I fix the random seed in the cross-validation, so the behavior should be deterministic, as should the alpha values (I figured out that LassoCV re-orders them in descending order). But I must be missing something, because I only get the same result if I use one alpha at a time, or for the largest alpha if it coincides between two runs. In code:
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

clf = LassoCV(alphas=np.logspace(-2, 2, 5),
              cv=KFold(n_splits=10, shuffle=True, random_state=20),
              max_iter=1000000, tol=0.005)
clf.fit(new_xb, ypb)
print('Alphas', clf.alphas_)
for i, alpha in enumerate(clf.alphas_):
    print('Score for alpha', alpha, np.mean(clf.mse_path_[i, :]))  # for each alpha (row), 10 CV estimates of MSE
returns
Alphas [1.e+02 1.e+01 1.e+00 1.e-01 1.e-02]
Score for alpha 100.0 158200.48456097275
Score for alpha 10.0 158216.20827618148
Score for alpha 1.0 158231.52763707296
Score for alpha 0.1 158194.40120074182
Score for alpha 0.01 157886.51644333656
but if I run the same code with a different alpha range, i.e. I change it to
clf = LassoCV(alphas=np.logspace(-1, 1, 3), cv=KFold(n_splits=10, shuffle=True, random_state=20), max_iter=1000000, tol=0.005)
I get
Alphas [10. 1. 0.1]
Score for alpha 10.0 165760.88919712842
Score for alpha 1.0 165704.1358282215
Score for alpha 0.1 161309.30244060006
On the other hand, if the first (largest) alpha is the same across two runs, the result for that value stays the same; using
clf = LassoCV(alphas=np.logspace(-1, 1, 2), cv=KFold(n_splits=10, shuffle=True, random_state=20), max_iter=1000000, tol=0.005)
returns
alphas [10. 0.1]
Score for alpha 10.0 165760.88919712842
Score for alpha 0.1 161330.76311479398
And if I run it with one alpha at a time, I get consistent results. So my guess is that the alphas are somehow changing inside the classifier, but I can't figure out why.
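One plausible explanation (an assumption about the implementation, not something verified here) is that LassoCV fits each fold along the whole descending alpha path with warm starts, so with a loose tol the solution at a given alpha can depend on which larger alphas preceded it. A minimal sketch of that hypothesis, reusing the new_xb/ypb arrays and the same KFold splits as above, and assuming ypb is a 1-D target array:
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

alphas = np.sort(np.logspace(-2, 2, 5))[::-1]   # descending, the order LassoCV uses
cv = KFold(n_splits=10, shuffle=True, random_state=20)

mse_path = np.zeros((len(alphas), cv.get_n_splits()))
for j, (train_idx, val_idx) in enumerate(cv.split(new_xb)):
    # warm_start=True reuses the previous (larger-alpha) solution as the starting
    # point, mimicking path-wise fitting; with tol=0.005 the stopping point, and
    # hence the coefficients, can depend on the preceding alphas.
    lasso = Lasso(warm_start=True, max_iter=1000000, tol=0.005)
    for i, a in enumerate(alphas):
        lasso.set_params(alpha=a)
        lasso.fit(new_xb[train_idx], ypb[train_idx])
        mse_path[i, j] = np.mean((ypb[val_idx] - lasso.predict(new_xb[val_idx])) ** 2)

print(np.mean(mse_path, axis=1))   # compare with np.mean(clf.mse_path_, axis=1)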

PyTorch - sparse tensors do not have strides

I am building my first sentiment analysis model for a small dataset of 1000 reviews, using a TF-IDF approach along with an LSTM, with the code below. I prepare the training data by preprocessing it and feeding it to the vectorizer as follows:
from sklearn.feature_extraction.text import TfidfVectorizer

def tfidf_features(X_train, X_val, X_test):
    tfidf_vectorizer = TfidfVectorizer(analyzer='word', token_pattern=r'(\S+)',
                                       min_df=5, max_df=0.9, ngram_range=(1, 2))
    X_train = tfidf_vectorizer.fit_transform(X_train)
    X_val = tfidf_vectorizer.transform(X_val)
    X_test = tfidf_vectorizer.transform(X_test)
    return X_train, X_val, X_test, tfidf_vectorizer.vocabulary_
I convert the resulting csr_matrix to a PyTorch sparse tensor using the code below:
def spy_sparse2torch_sparse(data):
    samples = data.shape[0]
    features = data.shape[1]
    values = data.data
    coo_data = data.tocoo()
    indices = torch.LongTensor([coo_data.row, coo_data.col])
    t = torch.sparse.FloatTensor(indices, torch.from_numpy(values).float(), [samples, features])
    return t
This gives me the following tensor for the training sentences:
tensor(indices=tensor([[ 0, 0, 1, ..., 599, 599, 599],
[ 97, 131, 49, ..., 109, 65, 49]]),
values=tensor([0.6759, 0.7370, 0.6076, ..., 0.3288, 0.3927, 0.3288]),
size=(600, 145), nnz=1607, layout=torch.sparse_coo)
I create a TensorDataset using the code below, where I also convert my label data from numpy to a torch tensor:
train_data = TensorDataset(train_x, torch.from_numpy(train_y))
I have defined my LSTM network and call it with the following parameters:
n_vocab = len(vocabulary)
n_embed = 100
n_hidden = 256
n_output = 1 # 1 ("positive") or 0 ("negative")
n_layers = 2
net = Sentiment_Lstm(n_vocab, n_embed, n_hidden, n_output, n_layers)
I have also defined the loss and optimizer. Now I train my model using the code below:
print_every = 100
step = 0
n_epochs = 4  # validation loss increases from ~ epoch 3 or 4
clip = 5  # gradient clipping to prevent the exploding-gradient problem in LSTM/RNN
for epoch in range(n_epochs):
    h = net.init_hidden(batch_size)
    for inputs, labels in train_loader:
        step += 1
        # making requires_grad = False for the latest set of h
        h = tuple([each.data for each in h])
        net.zero_grad()
        output, h = net(inputs)
        loss = criterion(output.squeeze(), labels.float())
        loss.backward()
        nn.utils.clip_grad_norm_(net.parameters(), clip)
        optimizer.step()
        if (step % print_every) == 0:
            net.eval()
            valid_losses = []
            v_h = net.init_hidden(batch_size)
            for v_inputs, v_labels in valid_loader:
                v_inputs, v_labels = v_inputs.to(device), v_labels.to(device)
                v_h = tuple([each.data for each in v_h])
                v_output, v_h = net(v_inputs)
                v_loss = criterion(v_output.squeeze(), v_labels.float())
                valid_losses.append(v_loss.item())
            print("Epoch: {}/{}".format((epoch+1), n_epochs),
                  "Step: {}".format(step),
                  "Training Loss: {:.4f}".format(loss.item()),
                  "Validation Loss: {:.4f}".format(np.mean(valid_losses)))
            net.train()
However, I am getting a major error on the line output, h = net(inputs): RuntimeError: sparse tensors do not have strides.
The workarounds given on other websites are hard to follow. I am looking for the exact code change I need to make in order to fix this issue.
PyTorch does not support sparse (S) x sparse (S) matrix multiplication.
Consider torch.sparse.mm(c1, c2), where c1 and c2 are sparse_coo_tensor matrices:
case 1: if both c1 and c2 are S --> it raises RuntimeError: sparse tensors do not have strides.
case 2: if c1 is dense (D) and c2 is S --> it raises the same error.
case 3: only when c1 is S and c2 is D --> it works fine.
Reference: https://blog.csdn.net/w55100/article/details/109086131
I guess the matrix multiplication happening in your Sentiment_Lstm falls under the first two cases, and is therefore throwing this error.
Using a dense input format should make it work.
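A minimal, self-contained sketch of that workaround, densifying the sparse TF-IDF tensor before it goes into a TensorDataset (the tiny matrix here is only a stand-in for the 600 x 145 one in the question):
import numpy as np
import torch
from scipy.sparse import csr_matrix
from torch.utils.data import TensorDataset

# Tiny stand-in for the TF-IDF csr_matrix produced above.
sparse_features = csr_matrix(np.array([[0.0, 0.7, 0.0],
                                       [0.3, 0.0, 0.6]]))
coo = sparse_features.tocoo()
indices = torch.LongTensor([coo.row, coo.col])
train_x = torch.sparse_coo_tensor(indices, torch.from_numpy(coo.data).float(),
                                  sparse_features.shape)

# Densify before building the dataset, so downstream layers never see a sparse layout.
train_y = np.array([1, 0])
train_data = TensorDataset(train_x.to_dense(), torch.from_numpy(train_y))
print(train_data[0])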

LSTM Time-Series produces shifted forecast?

I am doing a time-series forecast with an LSTM NN and Keras. As input features there are two variables (precipitation and temperature), and the single target to be predicted is the groundwater level.
It seems to be working reasonably well, though there is a serious offset between the actual data and the output (see image).
Now I've read that this can be a classic sign of the network not really learning, as it seems to be mimicking the output:
"what the model is actually doing is that when predicting the value at time “t+1”, it simply uses the value at time “t” as its prediction" (https://towardsdatascience.com/how-not-to-use-machine-learning-for-time-series-forecasting-avoiding-the-pitfalls-19f9d7adf424)
However, this is not actually possible in my case, as the target values are not used as an input variable. I am using a multivariate time series with two features, independent of the output feature.
Also, the predicted values are not shifted into the future (t+1) but rather seem to lag behind (t-1).
Does anyone know what could cause this problem?
This is the complete code of my network:
# Split in Input and Output Data
x_1 = data[['MeanT']].values
x_2 = data[['Precip']].values
y = data[['Z_424A_6857']].values
# Scale Data
x = np.hstack([x_1, x_2])
scaler = MinMaxScaler(feature_range=(0, 1))
x = scaler.fit_transform(x)
scaler_out = MinMaxScaler(feature_range=(0, 1))
y = scaler_out.fit_transform(y)
# Reshape Data
x_1, x_2, y = H.create2feature_data(x_1, x_2, y, window)
train_size = int(len(x_1) * .8)
test_size = int(len(x_1)) # * .5
x_1 = np.expand_dims(x_1, 2) # 3D tensor with shape (batch_size, timesteps, input_dim) // (nr. of samples, nr. of timesteps, nr. of features)
x_2 = np.expand_dims(x_2, 2)
y = np.expand_dims(y, 1)
# Split Training Data
x_1_train = x_1[:train_size]
x_2_train = x_2[:train_size]
y_train = y[:train_size]
# Split Test Data
x_1_test = x_1[train_size:test_size]
x_2_test = x_2[train_size:test_size]
y_test = y[train_size:test_size]
# Define Model Input Sets
inputA = Input(shape=(window, 1))
inputB = Input(shape=(window, 1))
# Build Model Branch 1
branch_1 = layers.GRU(16, activation=act, dropout=0, return_sequences=False, stateful=False, batch_input_shape=(batch, 30, 1))(inputA)
branch_1 = layers.Dense(8, activation=act)(branch_1)
#branch_1 = layers.Dropout(0.2)(branch_1)
branch_1 = Model(inputs=inputA, outputs=branch_1)
# Build Model Branch 2
branch_2 = layers.GRU(16, activation=act, dropout=0, return_sequences=False, stateful=False, batch_input_shape=(batch, 30, 1))(inputB)
branch_2 = layers.Dense(8, activation=act)(branch_2)
#branch_2 = layers.Dropout(0.2)(branch_2)
branch_2 = Model(inputs=inputB, outputs=branch_2)
# Combine Model Branches
combined = layers.concatenate([branch_1.output, branch_2.output])
# apply a FC layer and then a regression prediction on the combined outputs
comb = layers.Dense(6, activation=act)(combined)
comb = layers.Dense(1, activation="linear")(comb)
# Accept the inputs of the two branches and then output a single value
model = Model(inputs=[branch_1.input, branch_2.input], outputs=comb)
model.compile(loss='mse', optimizer='adam', metrics=['mse', H.r2_score])
model.summary()
# Training
model.fit([x_1_train, x_2_train], y_train, epochs=epoch, batch_size=batch, validation_split=0.2, callbacks=[tensorboard])
model.reset_states()
# Evaluation
print('Train evaluation')
print(model.evaluate([x_1_train, x_2_train], y_train))
print('Test evaluation')
print(model.evaluate([x_1_test, x_2_test], y_test))
# Predictions
predictions_train = model.predict([x_1_train, x_2_train])
predictions_test = model.predict([x_1_test, x_2_test])
predictions_train = np.reshape(predictions_train, (-1,1))
predictions_test = np.reshape(predictions_test, (-1,1))
# Reverse Scaling
predictions_train = scaler_out.inverse_transform(predictions_train)
predictions_test = scaler_out.inverse_transform(predictions_test)
# Plot results
plt.figure(figsize=(15, 6))
plt.plot(orig_data, color='blue', label='True GWL')
plt.plot(range(train_size), predictions_train, color='red', label='Predicted GWL (Training)')
plt.plot(range(train_size, test_size), predictions_test, color='green', label='Predicted GWL (Test)')
plt.title('GWL Prediction')
plt.xlabel('Day')
plt.ylabel('GWL')
plt.legend()
plt.show()
I am using a batch size of 30 timesteps, a lookback of 90 timesteps, with a total data size of around 7500 time steps.
Any help would be greatly appreciated :-) Thank you!
Probably my answer is not relevant two years later, but I had a similar issue when experimenting with an LSTM encoder-decoder model. I solved my problem by scaling the input data to the range -1 .. 1 instead of 0 .. 1 as in your example.
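For reference, a minimal sketch of that change against the scaling step in the question (only the feature_range argument differs):
from sklearn.preprocessing import MinMaxScaler

# Scale the stacked input features and the target to [-1, 1] instead of [0, 1].
scaler = MinMaxScaler(feature_range=(-1, 1))
x = scaler.fit_transform(x)
scaler_out = MinMaxScaler(feature_range=(-1, 1))
y = scaler_out.fit_transform(y)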

How to get the resulting AUC using scikit-learn

Hi, I want to combine a train/test split with cross-validation and report the results as AUC.
With my first approach I get results, but only as accuracy.
from sklearn.model_selection import train_test_split

# split data into train+validation set and test set
X_trainval, X_test, y_trainval, y_test = train_test_split(dataset.data, dataset.target)
# split train+validation set into training and validation sets
X_train, X_valid, y_train, y_valid = train_test_split(X_trainval, y_trainval)
# train the classifier
clf.fit(X_train, y_train)
# evaluate the classifier on the validation set
score = clf.score(X_valid, y_valid)
# retrain on the combined training + validation set and evaluate on the test set
clf.fit(X_trainval, y_trainval)
test_score = clf.score(X_test, y_test)
But I cannot figure out how to apply roc_auc instead. Please help.
Using scikit-learn you can do:
import numpy as np
from sklearn import metrics
y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, thresholds = metrics.roc_curve(y, scores, pos_label=2)
Now we get:
print(fpr)
array([ 0. , 0.5, 0.5, 1. ])
print(tpr)
array([ 0.5, 0.5, 1. , 1. ])
print(thresholds)
array([ 0.8 , 0.4 , 0.35, 0.1 ])
In your code, after training your classifier, get continuous scores for the test set (a proper ROC curve needs scores or probabilities rather than hard class predictions):
y_scores = clf.decision_function(X_test)  # or clf.predict_proba(X_test)[:, 1]
Then use these, together with the test labels, to calculate the AUC value:
from sklearn.metrics import roc_curve, auc
fpr, tpr, thresholds = roc_curve(y_test, y_scores, pos_label=1)
auc_roc = auc(fpr, tpr)
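If the goal is to combine cross-validation with AUC directly, scikit-learn's scoring interface handles it; a minimal sketch, assuming clf is the classifier from the question and the labels are binary:
from sklearn.model_selection import cross_val_score

# 5-fold cross-validated AUC on the train+validation portion,
# keeping X_test / y_test untouched for the final evaluation.
cv_auc = cross_val_score(clf, X_trainval, y_trainval, cv=5, scoring='roc_auc')
print("CV AUC: %.3f +/- %.3f" % (cv_auc.mean(), cv_auc.std()))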
