This question already has an answer here: Numpy .T syntax for Python
coef_ holds the coefficients of a linear model in scikit-learn. In the code below, coef_ is followed by .T, which I could not find an explanation for. What does .T do here?
from sklearn.linear_model import LogisticRegression
import matplotlib.pyplot as plt

for C, marker in zip([0.001, 1, 100], ['o', '^', 'v']):
    # liblinear is one of the solvers that supports the L1 penalty
    lr_l1 = LogisticRegression(C=C, penalty="l1", solver="liblinear").fit(X_train, y_train)
    print("Training accuracy of l1 logreg with C={:.3f}: {:.2f}".format(
        C, lr_l1.score(X_train, y_train)))
    print("Test accuracy of l1 logreg with C={:.3f}: {:.2f}".format(
        C, lr_l1.score(X_test, y_test)))
    plt.plot(lr_l1.coef_.T, marker, label="C={:.3f}".format(C))
".T" method means Transpose which switches rows & columns
if you have a matrix m:
[1 2 3
4 5 6
7 8 9]
Then m.T would be:
[1 4 7
2 5 8
3 6 9]
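For reference, the same example in NumPy:

import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
print(m.T)
# [[1 4 7]
#  [2 5 8]
#  [3 6 9]]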
It looks like it is used in this line:
plt.plot(lr_l1.coef_.T, ...)
to make sure the coefficients are plotted in the expected way. If the model was built with sklearn's LogisticRegression, you can review the docs here.
coef_ has shape (n_classes, n_features), so that means
coef_.T has shape (n_features, n_classes)
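As a quick sketch of why that shape matters for the plot (using random numbers in place of a fitted model's coefficients; plt.plot draws one line per column of a 2-D array):

import numpy as np
import matplotlib.pyplot as plt

coef = np.random.randn(3, 5)      # stand-in for coef_ with 3 classes and 5 features
print(coef.T.shape)               # (5, 3): one row per feature, one column per class

# plotting the transpose gives one set of markers per class,
# with the feature index on the x-axis
plt.plot(coef.T, 'o')
plt.xlabel("feature index")
plt.show()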
Here is a notebook that shows how this works
Related
I am new to PyTorch, and I have been trying some examples with autograd to see if I understand it. I am confused about why the following code does not work:
import torch

def Loss(a):
    return a**2

a = torch.tensor(3.0, requires_grad=True)
L = Loss(a)
L.backward()

with torch.no_grad():
    a = a + 1.0

L = Loss(a)
L.backward()
print(a.grad)
Instead of outputting 8.0, we get "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn".
There are two things to note regarding your code:
You are performing two backpropagations up to the leaf a, which means the gradients accumulate. In other words, you get a gradient equal to da²/da + d(a+1)²/da, which is 2a + 2(a+1) = 2(2a + 1). For a = 3, a.grad will be equal to 14.
You are using the torch.no_grad context manager, which means you will be unable to backpropagate from any tensor computed inside it, i.e. here the reassigned a itself.
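A minimal sketch of the second point: any tensor produced inside a torch.no_grad() block does not track gradients, so calling backward() on a loss computed from it raises exactly this error.

import torch

a = torch.tensor(3.0, requires_grad=True)
with torch.no_grad():
    b = a + 1.0               # b is computed without gradient tracking

print(b.requires_grad)        # False: b is detached from the graph
# (b ** 2).backward() would raise the same RuntimeError, because b has no grad_fn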
Here is a snippet which yields the desired result, that is 14 as the accumulation of both gradients:
>>> import torch
>>> a = torch.tensor(3.0, requires_grad=True)
>>> L = Loss(a)
>>> L.backward()
>>> a.grad
tensor(6.)
>>> L = Loss(a+1)
>>> L.backward()
>>> a.grad
tensor(14.)  # as 6 + 8
I have a training set training_set of m observations and n features, and three different validation sets val_a, val_b, and val_c which don't leak information to one another.
I would like to perform hyperparameter tuning via HalvingGridSearchCV, where I fit models on training_set, validate on all three validation sets separately, and then take the score to be the average of the three (or the lowest of the three).
The reason is that the three validation sets were observed at three distinct time points (A, B, C), while the training set contains observations from time point A only. Thus, a model trained on training_set and evaluated on val_a would not necessarily be best for val_b and val_c.
Also, concatenating all of the sets via training_set = pd.concat([training_set, val_a, val_b, val_c]) and then performing a variant of GroupShuffleSplit is not ideal, as it leaks information across time points into the model.
Thus far here's what I've tried:
import pandas as pd
from sklearn.model_selection import PredefinedSplit

# Assume each dataset has 4 observations.
tf = [-1] * len(training_set)  # -1 => these rows never appear in a test fold
training_set = pd.concat([training_set, val_a, val_b, val_c])
tf += [0] * len(val_a) + [1] * len(val_b) + [2] * len(val_c)
print("Test fold:", tf)

pds = PredefinedSplit(test_fold=tf)

# gs = HalvingGridSearchCV(estimator=LGBMRegressor(), param_grid=param_grid, cv=pds,
#                          scoring='r2', refit=False, min_resources='exhaust')

for train_index, test_index in pds.split():
    print("TRAIN:", train_index, "TEST:", test_index)
Output:
Test fold: [-1, -1, -1, -1, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
TRAIN: [ 0 1 2 3 8 9 10 11 12 13 14 15] TEST: [4 5 6 7]
TRAIN: [ 0 1 2 3 4 5 6 7 12 13 14 15] TEST: [ 8 9 10 11]
TRAIN: [ 0 1 2 3 4 5 6 7 8 9 10 11] TEST: [12 13 14 15]
As you can see, this generates a 3-fold cross-validation where each validation set is left out once and included in the training set the other times. I know -1 keeps observations out of every test set, but there is no value that keeps observations out of every training set. :(
Thank you!
This question already has answers here: sklearn error ValueError: Input contains NaN, infinity or a value too large for dtype('float64')
from sklearn.ensemble import RandomForestClassifier

classifier = RandomForestClassifier(n_estimators=6, criterion='entropy', random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
Make sure that you have no NaN values in your input, that all of the values are actually floats, and that none of them are Inf; in other words, clean the dataset of NaN, Inf, and missing cells before fitting (see the sketch after the function below).
This error occurs when the following scikit-learn function is called (source on GitHub):
def _assert_all_finite(X):
    """Like assert_all_finite, but only for ndarray."""
    X = np.asanyarray(X)
    # First try an O(n) time, O(1) space solution for the common case that
    # everything is finite; fall back to O(n) space np.isfinite to prevent
    # false positives from overflow in sum method.
    if (X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum())
            and not np.isfinite(X).all()):
        raise ValueError("Input contains NaN, infinity"
                         " or a value too large for %r." % X.dtype)
I am doing a time series forecasting exercise using the window method, but I am struggling to understand how to do the forecast out of sample.
Here is the code:
def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    dataset = tf.data.Dataset.from_tensor_slices(series)
    # sliding windows of window_size + 1 consecutive values
    dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)
    # turn each window (a sub-dataset) into a single tensor
    dataset = dataset.flat_map(lambda window: window.batch(window_size + 1))
    # shuffle the windows and split each into (features, label)
    dataset = dataset.shuffle(shuffle_buffer).map(lambda window: (window[:-1], window[-1]))
    dataset = dataset.batch(batch_size).prefetch(1)
    return dataset

dataset = windowed_dataset(x_train, window_size, batch_size, shuffle_buffer_size)
The windowed_dataset function splits the univariate time series series into a matrix of windows. Imagine we have a dataset as follows:
dataset = tf.data.Dataset.range(10)
for val in dataset:
    print(val.numpy())
0
1
2
3
4
5
6
7
8
9
The windowed_dataset function converts the series into windows, with the x features on the left and the y label on the right (a short sketch reproducing this listing follows below):
[2 3 4 5] [6]
[4 5 6 7] [8]
[3 4 5 6] [7]
[1 2 3 4] [5]
[5 6 7 8] [9]
[0 1 2 3] [4]
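For reference, a minimal sketch of the same windowing steps, leaving out the final batching and prefetch so that each window prints on its own line (the order varies because of the shuffle):

import tensorflow as tf

window_size = 4
series = tf.range(10)

ds = tf.data.Dataset.from_tensor_slices(series)
ds = ds.window(window_size + 1, shift=1, drop_remainder=True)   # windows of 5 consecutive values
ds = ds.flat_map(lambda w: w.batch(window_size + 1))            # each window becomes one tensor
ds = ds.shuffle(10).map(lambda w: (w[:-1], w[-1:]))             # first 4 values -> x, last value -> y

for x, y in ds:
    print(x.numpy(), y.numpy())                                 # e.g. [2 3 4 5] [6]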
In the next step, we fit a neural network model on the training dataset as follows:
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(10, input_shape=[window_size], activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1)
])
model.compile(loss="mse",
              optimizer=tf.keras.optimizers.SGD(learning_rate=1e-6, momentum=0.9))
model.fit(dataset, epochs=100, verbose=0)
Up to here, I am fine with the code. However, I am struggling to understand the out-of-sample forecasting shown below:
forecast = []
for time in range(len(series) - window_size):
    forecast.append(model.predict(series[time:time + window_size][np.newaxis]))

forecast = forecast[split_time - window_size:]
Can someone please explain why we use a loop here (for time in range(len(series) - window_size))? Why not simply do model.predict(dataset_validation) for the validation part and model.predict(dataset) for the training part?
I don't understand the need for the for loop, because this is not a rolling forecast; we are not re-training the model. Can someone please explain?
While I understand why the data science community structures the dataset this way, I personally find it a lot clearer to split X and y, fit with model.fit(X, y, epochs=100, verbose=0), and predict with model.predict(X).
The for loop returns the predictions in order, whereas if you call model.predict(dataset_validation) you'll get the predictions in a shuffled order (assuming you shuffled the dataset).
As for the point of using datasets: they mainly help with code organization. There is no need to ever use one if you don't want to.
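If you do want the single-call style from the question, one possible sketch is to build the window matrix yourself and predict once, which keeps the predictions in time order (this assumes series is a NumPy array and reuses window_size, split_time, and model from the code above):

import numpy as np

# one row per window of window_size consecutive values
X_windows = np.array([series[t:t + window_size]
                      for t in range(len(series) - window_size)])

forecast = model.predict(X_windows)              # shape: (len(series) - window_size, 1)
forecast = forecast[split_time - window_size:]   # keep only the out-of-sample part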
I'm following an exercise from the Hands-on machine learning book by Aurelien Geron.
Assume data is a DataFrame:
income_cat index
0 5.0 0
1 5.0 1
2 5.0 2
3 5.0 3
4 5.0 4
from sklearn.model_selection import StratifiedShuffleSplit
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
Option A (from book):
for test_indices, train_indices in split.split(data, data.income_cat):
print(test_indices, train_indices)
Option B:
test_indices, train_indices = split.split(data, data.income_cat)
print(test_indices, train_indices)
Why doesn't option B work? This is a Python question more than an sklearn question.
A tuple should unpack the same way with or without a loop; what could I be missing?
The only difference between options A and B is the for loop.
Output from option A:
[4 2 1 0] [3]
Output from Option B:
ValueError: not enough values to unpack (expected 2, got 1)
StratifiedShuffleSplit.split returns a generator, not a list. Inside the for loop, the generator's __next__ method is called under the hood, and it returns the next element of the sequence that the generator produces. That element then unpacks into the two variables test_indices, train_indices.
You can achieve the same result by calling __next__ explicitly via the built-in next():
test_indices, train_indices = next(split.split(data, data.income_cat))
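Equivalently, you can materialize the generator into a list; note that scikit-learn actually yields the training indices first and the test indices second:

splits = list(split.split(data, data.income_cat))   # one (train_indices, test_indices) tuple per split
train_indices, test_indices = splits[0]             # n_splits=1, so there is exactly one tuple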