I am preparing input for a Keras neural network for a multiclass problem as follows:
encoder = LabelEncoder()
encoder.fit(y)
encoded_Y = encoder.transform(y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
X_train, X_test, y_train, y_test = train_test_split(X, dummy_y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.06, random_state=42)
After having trained the model, I try to run the following lines to obtain a prediction that reflects the original class names:
y_pred = model.predict_classes(X_test)
y_pred = encoder.inverse_transform(y_pred)
y_test = np.argmax(y_test, axis = 1)
y_test = encoder.inverse_transform(y_test)
However, I obtain surprisingly low accuracy on the test set (0.36), as opposed to the training and validation accuracies, which reach 0.98. Is this the right way of transforming classes back into the original labels?
I compute accuracies as:
# For training
history.history['acc']
# For testing
accuracy_score(y_test, y_pred)
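The decoding step can be checked in isolation. The sketch below round-trips a small, made-up label array through the same encode, one-hot, argmax, inverse_transform chain. `np.eye` stands in for `np_utils.to_categorical`, and the label values are placeholders rather than data from the question. If the round trip recovers the original labels, the decoding logic itself is unlikely to be what lowers the score.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical class names, used only for illustration.
y = np.array(["cat", "dog", "bird", "dog", "cat", "bird", "bird"])

encoder = LabelEncoder()
encoded_y = encoder.fit_transform(y)  # strings -> integers 0..n_classes-1

# One-hot encode; np.eye stands in for np_utils.to_categorical here.
dummy_y = np.eye(len(encoder.classes_))[encoded_y]

# Decode exactly as in the question: argmax, then inverse_transform.
decoded_y = encoder.inverse_transform(np.argmax(dummy_y, axis=1))

round_trip_ok = bool(np.array_equal(decoded_y, y))
print(round_trip_ok)
```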
import numpy as np
from sklearn.model_selection import train_test_split
import lazypredict
from lazypredict.Supervised import LazyClassifier
y = np.array(skin_new_df['diagnostic'])
X = np.array(skin_new_df.drop(['diagnostic'], axis=1))
print(X.shape)
print(y.shape)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = LazyClassifier(verbose=0,
                     ignore_warnings=True,
                     custom_metric=None)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)
print(models)
When I run this code, I get an empty DataFrame as output:
(2298, 25)
(2298,)
100%|██████████| 29/29 [00:08<00:00, 3.61it/s]
Accuracy Balanced Accuracy ROC AUC F1 Score Time Taken
Model
I want to get the accuracy of every model.
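While debugging an empty result table, it can help to fit a few scikit-learn classifiers by hand so that any exception raised during fitting is visible. The sketch below does not use lazypredict. The synthetic data from `make_classification` and the choice of three classifiers are assumptions that stand in for `skin_new_df` and the full model list.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with the same shape as the data in the question.
X, y = make_classification(n_samples=2298, n_features=25, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=42),
    "RandomForestClassifier": RandomForestClassifier(random_state=42),
}

scores = {}
for name, model in candidates.items():
    try:
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    except Exception as exc:  # print the failure instead of hiding it
        print(f"{name} failed: {exc!r}")

# Report accuracies from best to worst.
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```

If a model fails on the real data, the printed exception (for example a complaint about non-numeric columns or missing values) usually points at what needs cleaning before the full comparison is rerun.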
I have a dataset with 100 samples. I want to split it into train, validation, and test sets with ratios of 75%, 25%, 25% respectively, and then repeat the split with different ratios such as 80%, 10%, 10%.
For this purpose I was using the code below, but I don't think it splits the data correctly in the second step: the second call splits the remaining 85% again, giving 85% x 85% for train and 85% x 15% for test instead of fractions of the full dataset.
My question is:
Is there a clean way to do the split correctly for any given ratios?
from sklearn.model_selection import train_test_split
# Split Train Test Validate
X_, X_val, Y_, Y_val = train_test_split(X, Y, test_size=0.15, random_state=42)
X_train, X_test, Y_train, Y_test = train_test_split(X_, Y_, test_size=0.15, random_state=42)
You could always do it manually. It is a bit messy, but you can write a function:
import numpy as np

def my_train_test_split(X, y, ratio_train, ratio_val, seed=42):
    # Shuffle the sample indices reproducibly
    idx = np.arange(X.shape[0])
    np.random.seed(seed)
    np.random.shuffle(idx)

    # Cut points, both expressed as fractions of the full dataset
    limit_train = int(ratio_train * X.shape[0])
    limit_val = int((ratio_train + ratio_val) * X.shape[0])

    idx_train = idx[:limit_train]
    idx_val = idx[limit_train:limit_val]
    idx_test = idx[limit_val:]

    X_train, y_train = X[idx_train], y[idx_train]
    X_val, y_val = X[idx_val], y[idx_val]
    X_test, y_test = X[idx_test], y[idx_test]

    return X_train, X_val, X_test, y_train, y_val, y_test
The test ratio is assumed to be 1 - (ratio_train + ratio_val).
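The same result can also be sketched with two calls to `train_test_split`, provided the sizes are computed from the full dataset first. The helper name `split_by_ratio` and the dummy arrays are hypothetical; passing integer counts as `test_size` is a design choice made here to avoid applying a ratio to an already reduced subset.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_by_ratio(X, y, ratio_val, ratio_test, seed=42):
    """Split into train/val/test; the ratios refer to the full dataset."""
    n = len(X)
    # Integer counts keep the ratios tied to the full dataset size.
    n_val = int(round(ratio_val * n))
    n_test = int(round(ratio_test * n))
    # First hold out the test set, then take the validation set from the rest.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=n_test, random_state=seed)
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=n_val, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test


X = np.arange(200).reshape(100, 2)  # 100 dummy samples
Y = np.arange(100)
X_train, X_val, X_test, Y_train, Y_val, Y_test = split_by_ratio(X, Y, 0.10, 0.10)
print(len(X_train), len(X_val), len(X_test))
```

With 100 samples and 10% each for validation and test, this leaves 80 samples for training. The train ratio is implied as whatever remains.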
I have data for a regression task.
The independent features (X_train) are scaled with a StandardScaler.
I built a Keras Sequential model with hidden layers and compiled it.
I then fit the model with model.fit(X_train_scaled, y_train)
and saved it to a .hdf5 file.
How can I include the scaling step in the saved model,
so that the same scaling parameters are applied to unseen test data?
#imported all the libraries for training and evaluating the model
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.3, random_state=42)
sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)
X_test_scaled = sc.transform(X_test)
def build_model():
    model = keras.Sequential([
        # one input per feature column of the scaled training data
        layers.Dense(64, activation=tf.nn.relu, input_shape=[X_train_scaled.shape[1]]),
        layers.Dense(64, activation=tf.nn.relu),
        layers.Dense(1)
    ])
    optimizer = tf.keras.optimizers.RMSprop(0.001)
    model.compile(loss='mean_squared_error',
                  optimizer=optimizer,
                  metrics=['mean_absolute_error', 'mean_squared_error'])
    return model
model = build_model()
EPOCHS = 1000
history = model.fit(X_train_scaled, y_train, epochs=EPOCHS,
                    validation_split=0.2, verbose=0)
loss, mae, mse = model.evaluate(X_test_scaled, y_test, verbose=0)
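One option, sketched below under the assumption that keeping two files is acceptable, is to persist the fitted `StandardScaler` with `joblib` alongside the `.hdf5` model and reload both at prediction time. The random data and the file name `scaler.joblib` are placeholders, and the Keras model itself is left out so the example runs with scikit-learn alone.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

# Dummy training data standing in for X_train from the question.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(50, 3))

sc = StandardScaler().fit(X_train)

# Save the fitted scaler next to the model file (the file name is hypothetical).
scaler_path = os.path.join(tempfile.mkdtemp(), "scaler.joblib")
joblib.dump(sc, scaler_path)

# Later, at prediction time: load the scaler and reuse the training statistics.
sc_loaded = joblib.load(scaler_path)
X_new = rng.normal(loc=5.0, scale=2.0, size=(4, 3))
X_new_scaled = sc_loaded.transform(X_new)
# X_new_scaled would then be passed to the loaded Keras model's predict().
```

The reloaded scaler carries the mean and variance learned from the training set, so unseen data is transformed with exactly the same parameters. Embedding the scaling inside the network itself is a separate approach that is not shown here.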
I am trying to wrap my head around using the last 30% of the entries in the dataset as the test samples, intentionally with no randomness. Is this possible?
Split dataset into train / test:
x_train, x_test, y_train, y_test = model_selection.train_test_split(x, y, test_size=0.3,random_state=0)
Is it possible to explicitly control the split in such a manner that the test split only selects entries from the end of the dataset?
You will achieve your goal if you substitute the line:
x_train, x_test, y_train, y_test = model_selection.train_test_split(x, y, test_size=0.3,random_state=0)
with:
idx_train = int((1-.3)* x.shape[0]) # train is (1-.3) of your data
x_train = x[:idx_train,:]
x_test = x[idx_train:, :]
y_train = y[:idx_train]
y_test = y[idx_train:]
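The same split can also be obtained from `train_test_split` itself by passing `shuffle=False`, which disables shuffling so that the test set is taken from the end of the data. The arrays below are dummy data used only to show which rows end up in each part.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy ordered data standing in for x and y from the question.
x = np.arange(200).reshape(100, 2)
y = np.arange(100)

# shuffle=False keeps the original order, so the test set is the tail of the data.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, shuffle=False)

print(y_train[-1], y_test[0], len(y_test))
```

Because the order is preserved, the last training row is immediately followed by the first test row, which matches the manual slicing above.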
I have been using SVR to predict the values of a time series. My dataset is split into train and test sets, and I use SVR with an RBF kernel to predict the test set. SVR models the training set almost perfectly, but on the test set it always predicts the average value.
I have tried StandardScaler, normalization, and so on, but without success.
Here is my code:
X = np.array(x).reshape(-1,1)
Y = np.array(y).reshape(-1,1)
sc_y = StandardScaler()
Y = sc_y.fit_transform(Y)
Y = np.array(Y).ravel()
# Fit regression model
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.4, random_state=0, shuffle=False)
from sklearn.model_selection import cross_val_score
svr_rbf = SVR(kernel='rbf', C=10, gamma=9.9999999999999995e-08, epsilon=0.1)
print(X_train.shape)
svr_rbf.fit(X_train, Y_train)
y_rbf = svr_rbf.predict(X_train)
y_rbf1 = svr_rbf.predict(X_test)
And here is my result:
The prediction is the final part of the plot, where a constant value is shown.
Do you know what I should do to improve the prediction?
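One possible explanation, which the question does not confirm, is that the input is a time index, so every test point lies outside the range seen during training. An RBF-kernel SVR predicts its intercept once a point is far from all support vectors, because every kernel value decays to zero. The sketch below illustrates this on a synthetic sine series; the data and the hyperparameters `C=10`, `gamma=1.0` are assumptions chosen for the demonstration, not values tuned for the original data.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic series: the input is just a time index, as assumed above.
t_train = np.linspace(0, 10, 200).reshape(-1, 1)
y_train = np.sin(t_train).ravel()

svr_rbf = SVR(kernel="rbf", C=10, gamma=1.0, epsilon=0.1)
svr_rbf.fit(t_train, y_train)

# Points far beyond the training range: every RBF kernel value is ~0 here.
t_far = np.linspace(100, 110, 50).reshape(-1, 1)
y_far = svr_rbf.predict(t_far)

# The predictions collapse to a single constant, the fitted intercept.
spread = float(y_far.max() - y_far.min())
print(spread, svr_rbf.intercept_[0])
```

If this is what happens with the real data, a commonly suggested alternative is to use lagged values of the series as features instead of the raw time index, so that test inputs fall within the range the model has seen.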