Can we calculate model accuracy from MAPE or MAE?

In a linear regression model, if MAPE or MAE is calculated, can we conclude that the regression model's accuracy is (1 - MAE) * 100? I ask because the MAPE value mostly stays under 100, and it is a commonly used error metric in regression.
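For reference, here is a minimal sketch of how MAPE is usually computed (using scikit-learn's mean_absolute_percentage_error, which returns a fraction rather than a percentage); note that it can exceed 1 when errors are large relative to the true values, so a "(1 - MAPE) * 100" style accuracy can go negative:
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

y_true = np.array([100.0, 200.0, 50.0, 10.0])
y_pred = np.array([110.0, 180.0, 60.0, 60.0])

# MAPE = mean(|y_true - y_pred| / |y_true|), returned as a fraction, not *100
mape = mean_absolute_percentage_error(y_true, y_pred)
print(mape)  # 1.35 -- the last point is off by 500% of its true value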

Related

Strange/unexpected behavior with class_weight and LightGBM

I have an LGBM model which does not use the 'class_weight' parameter.
When I run the model I currently achieve a score of 0.81659.
When I apply: class_weight='balanced'
the score drops substantially to 0.78134. A loss of 0.03525.
When I manually compute the weights and apply it with: class_weight={0: 0.61378, 2: 0.86751, 1: 4.58652}
the score drops to 0.78129.
I attribute the minor difference between my manual calculation and 'balanced' to rounding error, since I arbitrarily truncated the weights to 5 decimal places.
There are three labels distributed as follows: Counter({0: 32259, 2: 22824, 1: 4317})
Let's pretend the labels were distributed as 33%, 33%, 34%. In that case I would expect applying class weights, or not applying them, to have virtually the same impact.
But the actual data is quite imbalanced.
I would expect that giving the model knowledge of this imbalance would allow it to make more informed, i.e. better, predictions, even if 'better' is only a slight improvement.
I certainly would not expect a drop of 3.5 percentage points in model performance.
Am I not applying the weights correctly?
Main code block:
model = lgb.LGBMClassifier(learning_rate=i,
                           num_leaves=j,
                           objective='multiclass',
                           colsample_bytree=c,
                           max_bin=512,
                           n_estimators=n,
                           class_weight={0: 0.61378, 2: 0.86751, 1: 4.58652},
                           random_state=13,
                           n_jobs=-1,
                           )
# define evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=13)
# evaluate model
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# summarize performance
# print('Mean accuracy: %.3f' % np.mean(scores))
print(f'For {i} learning_rate, {j} num_leaves, {c} colsample_bytree, {n} n_estimators:\n'
f' The Mean accuracy is: {np.mean(scores):.5f}, The Standard Deviation is: {np.std(scores):.3f}')
mean_accuracy.append(np.mean(scores))
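For reference, the 'balanced' option computes each class weight as n_samples / (n_classes * count_per_class); a quick sketch (using the class counts quoted above) confirms the manually entered weights match it:
counts = {0: 32259, 1: 4317, 2: 22824}
n_samples = sum(counts.values())   # 59400
n_classes = len(counts)            # 3

weights = {k: n_samples / (n_classes * v) for k, v in counts.items()}
print(weights)
# {0: 0.61378..., 1: 4.58652..., 2: 0.86751...} -- matches the weights above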

Different loss values and accuracies of MLP regressor in keras and scikit-learn

I have a neural network with one hidden layer implemented in both Keras and scikit-learn for solving a regression problem. In scikit-learn I used the MLPRegressor class with mostly default parameters, and in Keras I have a hidden Dense layer with parameters set to the same defaults as scikit-learn (which uses Adam with the same learning rate and epsilon, and a batch size of 200). When I train the networks, the scikit-learn model has a loss value about half that of Keras, and its accuracy (measured as mean absolute error) is also better. Shouldn't the loss values be similar, if not identical, and the accuracies also be similar? Has anyone experienced something similar and been able to make the Keras model more accurate?
Scikit-learn model:
clf = MLPRegressor(hidden_layer_sizes=(1600,), max_iter=1000, verbose=True, learning_rate_init=.001)
Keras model:
inputs = keras.Input(shape=(cols,))
x = keras.layers.Dense(1600, activation='relu', kernel_initializer="glorot_uniform", bias_initializer="glorot_uniform", kernel_regularizer=keras.regularizers.L2(.0001))(inputs)
outputs = keras.layers.Dense(1,kernel_initializer="glorot_uniform", bias_initializer="glorot_uniform", kernel_regularizer=keras.regularizers.L2(.0001))(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=keras.optimizers.Adam(epsilon=1e-8, learning_rate=.001),loss="mse")
model.fit(x=X, y=y, epochs=1000, batch_size=200)
This is because the formula for the mean squared error (MSE) loss in scikit-learn is different from that of TensorFlow.
From the source code of scikit-learn:
def squared_loss(y_true, y_pred):
    return ((y_true - y_pred) ** 2).mean() / 2
while the MSE from TensorFlow is:
backend.mean(math_ops.squared_difference(y_pred, y_true), axis=-1)
As you can see, the scikit-learn loss is divided by 2, which is consistent with what you observed:
the scikit-learn model has a loss value that is about half of keras
This implies that the Keras and scikit-learn models actually achieved similar performance. It also implies that a learning rate of 0.001 in scikit-learn is not equivalent to the same learning rate in TensorFlow.
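A quick numeric check of the factor of two (a sketch; any small arrays will do):
import numpy as np
import tensorflow as tf

y_true = np.array([1.0, 2.0, 3.0], dtype=np.float32)
y_pred = np.array([1.5, 1.5, 2.0], dtype=np.float32)

# scikit-learn's squared_loss divides the usual MSE by 2
sklearn_loss = ((y_true - y_pred) ** 2).mean() / 2                 # 0.25
keras_loss = tf.keras.losses.MeanSquaredError()(y_true, y_pred)    # 0.5
print(sklearn_loss, float(keras_loss))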
Another smaller but still significant difference is the formula used for L2 regularization.
From the source code of scikit-learn:
# Add L2 regularization term to loss
values = 0
for s in self.coefs_:
    s = s.ravel()
    values += np.dot(s, s)
loss += (0.5 * self.alpha) * values / n_samples
while that of TensorFlow is loss = l2 * reduce_sum(square(x)).
Therefore, with the same L2 regularization parameter, the TensorFlow model has stronger regularization, which results in a poorer fit to the training data.
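To get a sense of the scale difference, here is a rough comparison of the two penalty terms for the same weight vector (a sketch; the 0.0001 value matches the alpha/L2 setting in the code above, and the 10,000-sample count is just an assumption for illustration):
import numpy as np

alpha = l2 = 1e-4
n_samples = 10_000                               # assumed for illustration
w = np.random.default_rng(0).normal(size=1600)   # some weight vector

sklearn_penalty = 0.5 * alpha * np.dot(w, w) / n_samples
tensorflow_penalty = l2 * np.sum(np.square(w))
print(sklearn_penalty, tensorflow_penalty)
# the TensorFlow penalty is larger by a factor of 2 * n_samples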

Sklearn logistic regression - adjust cutoff point

I have a logistic regression model trying to predict one of two classes: A or B.
My model's accuracy when predicting A is ~85%.
Model's accuracy when predicting B is ~50%.
Prediction of B is not important however prediction of A is very important.
My goal is to maximize the accuracy when predicting A. Is there any way to adjust the default decision threshold when determining the class?
classifier = LogisticRegression(penalty = 'l2',solver = 'saga', multi_class = 'ovr')
classifier.fit(np.float64(X_train), np.float64(y_train))
Thanks!
RB
As mentioned in the comments, selecting the threshold is done after training. You can find the threshold that maximizes a utility function of your choice, for example:
import numpy as np
from sklearn import metrics

preds = classifier.predict_proba(test_data)
fpr, tpr, thresholds = metrics.roc_curve(test_y, preds[:, 1])
print(thresholds)
accuracy_ls = []
for thres in thresholds:
    y_pred = np.where(preds[:, 1] > thres, 1, 0)
    # Apply the desired utility function to y_pred, for example accuracy.
    accuracy_ls.append(metrics.accuracy_score(test_y, y_pred, normalize=True))
After that, choose the threshold that maximizes your chosen utility function. In your case, since class A is what matters, pick the threshold that gives the best performance on A.
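Then pick the threshold with the highest score and apply it at prediction time, for example (a sketch, reusing the variables from the snippet above):
import numpy as np

best_thres = thresholds[np.argmax(accuracy_ls)]
final_pred = np.where(classifier.predict_proba(test_data)[:, 1] > best_thres, 1, 0)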

How can I calculate the loss without the weight decay in Keras?

I defined a convolutional layer and also used L2 weight decay in Keras.
When I look at the loss reported during model.fit(), does it already include the weight decay loss? If so, how can I get the loss without the weight decay during training?
I want to monitor the loss without the weight decay, while still keeping the weight decay active during training.
Yes, weight decay losses are included in the loss value printed on the screen.
The value you want to monitor is the total loss minus the sum of regularization losses.
The total loss is just model.total_loss.
The regularization losses are collected in the list model.losses.
The following lines can be found in the source code of model.compile():
# Add regularization penalties
# and other layer-specific losses.
for loss_tensor in self.losses:
    total_loss += loss_tensor
To get the loss without weight decay, you can reverse the above operations. I.e., the value to be monitored is model.total_loss - sum(model.losses).
Now, how to monitor this value is a bit tricky. Fortunately, the list of metrics used by a Keras model is not fixed until model.fit() is called. So you can append this value to the list, and it'll be printed on the screen during model fitting.
Here's a simple example:
from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dense
from keras.models import Model
from keras.regularizers import l2

input_tensor = Input(shape=(64, 64, 3))
hidden = Conv2D(32, 1, kernel_regularizer=l2(0.01))(input_tensor)
hidden = GlobalAveragePooling2D()(hidden)
out = Dense(1)(hidden)
model = Model(input_tensor, out)
model.compile(loss='mse', optimizer='adam')

loss_no_weight_decay = model.total_loss - sum(model.losses)
model.metrics_tensors.append(loss_no_weight_decay)
model.metrics_names.append('loss_no_weight_decay')
When you run model.fit(), something like this will be printed to the screen:
Epoch 1/1
100/100 [==================] - 0s - loss: 0.5764 - loss_no_weight_decay: 0.5178
You can also verify whether this value is correct by computing the L2 regularization manually:
conv_kernel = model.layers[1].get_weights()[0]
print(np.sum(0.01 * np.square(conv_kernel)))
In my case, the printed value is 0.0585, which is indeed the difference between loss and loss_no_weight_decay (with some rounding error).

Effect of class_weight and sample_weight in Keras

Can someone tell me mathematically how sample_weight and class_weight are used in Keras in the calculation of the loss function and metrics? A simple mathematical expression would be great.
It is a simple multiplication: the loss contributed by a sample is multiplied by its sample weight. For i = 1 to n samples, with a length-n vector of sample weights w and the loss for sample i denoted L_i, sample i's contribution becomes w_i * L_i.
In Keras in particular, the weighted per-sample losses are additionally divided by the fraction of weights that are not 0, so that the loss per batch stays proportional to the number of samples with weight > 0. Letting p be the proportion of non-zero weights, the batch loss is mean(w_i * L_i) / p.
Here's the relevant snippet of code from the Keras repo:
score_array = loss_fn(y_true, y_pred)
if weights is not None:
    score_array *= weights
    score_array /= K.mean(K.cast(K.not_equal(weights, 0), K.floatx()))
return K.mean(score_array)
class_weight is used in the same way as sample_weight; it is just provided as a convenience to specify certain weights across entire classes.
The sample weights are currently not applied to metrics, only loss.
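Concretely, here is a minimal NumPy sketch of the same computation for one batch (L holds the per-sample losses, w the sample weights; the numbers are made up):
import numpy as np

L = np.array([0.2, 0.5, 1.0, 0.1])   # per-sample losses L_i
w = np.array([1.0, 2.0, 0.0, 1.0])   # sample weights w_i (0 masks a sample out)

p = np.mean(w != 0)                  # proportion of non-zero weights
batch_loss = np.mean(L * w) / p      # what Keras reports for this batch
print(batch_loss)                    # 0.4333...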
