I am writing the following classifier to check out scikit-learn.
...
class MyClassifier():
    def fit(self, x_train, y_train):
        self.x_train = x_train
        self.y_train = y_train
        return

    def predict(self, x_test):
        prediction = []
        for row in x_test:
            label = self.closest(row)
            prediction.append(label)
        return prediction

    def closest(self, row):
        best_dist = euc(row, self.x_train[0])
        best_index = 0
        for i in range(1, len(self.x_train)):
            dist = euc(row, self.x_train[i])
            if dist < best_dist:
                best_dist = dist
                best_index = i
        return self.y_train[best_index]
And later, I want to use my own classifier:
# Use my own Classifier
classifer = MyClassifier()
print(classifer)
classifer = classifer.fit(x_train, y_train)
prediction = classifer.predict(x_test)
print(prediction)
print(y_test)
When I run it, I am getting the following error:
<__main__.MyClassifier object at 0x103ec5668>
Traceback (most recent call last):
File "/.../NewClassifier.py", line 72, in <module>
prediction = classifer.predict(x_test)
AttributeError: 'NoneType' object has no attribute 'predict'
What's wrong with predict() function?
Your method
def fit(self, x_train, y_train):
    self.x_train = x_train
    self.y_train = y_train
    return
returns nothing, so it implicitly returns None.
Therefore, classifer = classifer.fit(x_train, y_train) overwrites the variable classifer, which held your MyClassifier instance, with None.
None has no methods you can call - that's exactly the error message you got.
You should change classifer = classifer.fit(x_train, y_train) to simply
classifer.fit(x_train, y_train)
so you keep the variable classifer bound to your class instance instead of overwriting it with None.
This should fix it:
# Use my own Classifier
classifer = MyClassifier()
print(classifer)
classifer.fit(x_train, y_train)
prediction = classifer.predict(x_test)
print(prediction)
print(y_test)
I recommend using Python's built-in debugger, pdb. If you add import pdb; pdb.set_trace() before your classifer = MyClassifier() statement, you can see every variable and interact with your code.
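For example, a minimal sketch of where the breakpoint could go, using the variable names from your script:

import pdb; pdb.set_trace()  # execution pauses here; inspect variables with `p <name>`, step with `n`

classifer = MyClassifier()
classifer.fit(x_train, y_train)
prediction = classifer.predict(x_test)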
First, you are overwriting your class instance.
-> print(classifer)
(Pdb) n
<__main__.MyClassifier object at 0x7f7fe2f139e8>  # This is your classifer object
-> classifer = classifer.fit("test", "test2")
(Pdb) classifer
-> prediction = classifer.predict(x_test)
(Pdb) classifer
(Pdb)
So, because you are reusing the same variable name, you are overwriting your previous instance.
You have classifer = MyClassifier() and then classifer = classifer.foo, so it loses its original reference to MyClassifier().
Secondly, your fit(x_train, y_train) function doesn't return anything.
Having:
def fit(self, x_train, y_train):
    self.x_train = x_train
    self.y_train = y_train
    return
is the same as:
def fit(self, x_train, y_train):
    self.x_train = x_train
    self.y_train = y_train
    return None
which is what you're getting:
(Pdb) print(classifer)
None
And that's why you're receiving AttributeError: 'NoneType' object has no attribute 'predict' - classifer is None.
I'm not sure what the fit function is supposed to return, but I imagine it's self. So, the following code works for me in getting past your error, but since I don't know what x_train, y_train, x_test, and y_test are supposed to be, I couldn't run all of your code. Still, it fixes the problem you asked about.
class MyClassifier():
    def fit(self, x_train, y_train):
        self.x_train = x_train
        self.y_train = y_train
        return self  # Must return something, and from context, this
                     # seems to be your intention.

    def predict(self, x_test):
        prediction = []
        for row in x_test:
            label = self.closest(row)
            prediction.append(label)
        return prediction

    def closest(self, row):
        best_dist = euc(row, self.x_train[0])
        best_index = 0
        for i in range(1, len(self.x_train)):
            dist = euc(row, self.x_train[i])  # distance to the i-th training example
            if dist < best_dist:
                best_dist = dist
                best_index = i
        return self.y_train[best_index]
classifier = MyClassifier()
print(classifier)
classifier2 = classifier.fit("test", "test2")
prediction = classifier2.predict(x_test)
print(prediction)
print(y_test)
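As a side note, scikit-learn's own estimators return self from fit, which is what makes chained calls possible. A minimal sketch, assuming euc is a Euclidean distance helper (e.g. SciPy's) and that x_train, y_train and x_test are the arrays from your script:

from scipy.spatial.distance import euclidean as euc  # assumed distance helper

clf = MyClassifier()
prediction = clf.fit(x_train, y_train).predict(x_test)  # chaining works because fit returns self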
I am trying my hand at PyTorch.
I am getting this error:
RuntimeError: Expected object of scalar type Double but got scalar type Float for argument #2 'weight' in call to _thnn_conv2d_forward
This is my code (shamelessly copied from an online tutorial):
class Net(Module):
    def __init__(self):
        super(Net, self).__init__()

        self.cnn_layers = Sequential(
            Conv2d(1, 4, kernel_size=3, stride=1, padding=1),
            BatchNorm2d(4),
            ReLU(inplace=True),
            MaxPool2d(kernel_size=2, stride=2),
            Conv2d(4, 4, kernel_size=3, stride=1, padding=1),
            BatchNorm2d(4),
            ReLU(inplace=True),
            MaxPool2d(kernel_size=2, stride=2)
        )

        self.linear_layers = Sequential(
            Linear(900, 10)
        )

    def forward(self, x):
        # self.weights = self.weights.double()
        x = self.cnn_layers(x)
        x = x.view(x.size(0), -1)
        x = self.linear_layers(x)
        return x
tdata = dt.Data("train")
train_x = torch.from_numpy(tdata.get_train()[0].reshape(925,1,300,300))
train_y = torch.from_numpy(tdata.get_train()[1].astype(int))
val_x = torch.from_numpy(tdata.get_test()[0].reshape(102,1,300,300))
val_y = torch.from_numpy(tdata.get_test()[1].astype(int))
print(val_y.shape)
plt.imshow(tdata.get_train()[0][100],cmap='gray')
plt.show()
model = Net()
# defining the optimizer
optimizer = Adam(model.parameters(), lr=0.07)
# defining the loss function
criterion = CrossEntropyLoss()
# checking if GPU is available
if torch.cuda.is_available():
    model = model.cuda()
    criterion = criterion.cuda()
print(model)
def train(epoch):
    model.train()
    tr_loss = 0
    # getting the training set
    x_train, y_train = Variable(train_x.double()), Variable(train_y.double())
    # getting the validation set
    x_val, y_val = Variable(val_x), Variable(val_y)
    # converting the data into GPU format
    if torch.cuda.is_available():
        x_train = x_train.cuda()
        y_train = y_train.cuda()
        x_val = x_val.cuda()
        y_val = y_val.cuda()
    # clearing the gradients of the model parameters
    optimizer.zero_grad()
    # prediction for training and validation set
    output_train = model(x_train.double())
    output_val = model(x_val)
    # computing the training and validation loss
    loss_train = criterion(output_train, y_train)
    loss_val = criterion(output_val, y_val)
    train_losses.append(loss_train)
    val_losses.append(loss_val)
    # computing the updated weights of all the model parameters
    loss_train.backward()
    optimizer.step()
    tr_loss = loss_train.item()
    if epoch % 2 == 0:
        # printing the validation loss
        print('Epoch : ', epoch+1, '\t', 'loss :', loss_val)
n_epochs = 25
# empty list to store training losses
train_losses = []
# empty list to store validation losses
val_losses = []
# training the model
for epoch in range(n_epochs):
    train(epoch)
tdata.get_train() and tdata.get_test() each return a tuple (numpy array with dtype='double', numpy array with dtype='int').
I think the weights are an internal data structure, so their type should be adjusted by PyTorch itself. What is the problem here?
You may just add .to(torch.float32) to your train_x and val_x tensors.
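For example, a minimal sketch of that change, reusing the tensor definitions from the question (the .double() calls inside train() would then also have to go, otherwise the inputs get converted back to float64):

train_x = torch.from_numpy(tdata.get_train()[0].reshape(925, 1, 300, 300)).to(torch.float32)
val_x = torch.from_numpy(tdata.get_test()[0].reshape(102, 1, 300, 300)).to(torch.float32)
# the model's parameters are float32 by default, so the inputs must match that dtype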
I don't completely agree with the other answer; it only partially solves the problem.
Writing:
# converting the target into torch format
train_y = train_y.astype(int)
train_y = torch.from_numpy(train_y).to(torch.float32)
# converting validation images into torch format
val_x = val_x.reshape(6000, 1, 28, 28)
val_x = torch.from_numpy(val_x).to(torch.float32)
# converting the target into torch format
val_y = val_y.astype(int)
val_y = torch.from_numpy(val_y).to(torch.float32)
is important, but so is:
# computing the training and validation loss
y_train = y_train.long()  # convert the targets because they aren't in the right format
y_train = y_train.squeeze_()
y_val = y_val.long()  # convert the targets because they aren't in the right format
y_val = y_val.squeeze_()
loss_train = criterion(output_train, y_train)
loss_val = criterion(output_val, y_val)
train_losses.append(loss_train)
val_losses.append(loss_val)
In my case, this solved the problem.
I am working on an RL problem and I created a class to initialize the model and other parameters. The code is as follows:
class Agent:
    def __init__(self, state_size, is_eval=False, model_name=""):
        self.state_size = state_size
        self.action_size = 20  # measurement, CNOT, bit-flip
        self.memory = deque(maxlen=1000)
        self.inventory = []
        self.model_name = model_name
        self.is_eval = is_eval
        self.done = False
        self.gamma = 0.95
        self.epsilon = 1.0
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995

    def model(self):
        model = Sequential()
        model.add(Dense(units=16, input_dim=self.state_size, activation="relu"))
        model.add(Dense(units=32, activation="relu"))
        model.add(Dense(units=8, activation="relu"))
        model.add(Dense(self.action_size, activation="softmax"))
        model.compile(loss="categorical_crossentropy", optimizer=Adam(lr=0.003))
        return model

    def act(self, state):
        options = self.model.predict(state)
        return np.argmax(options[0]), options
I want to run it for only one iteration, hence I create an object and I pass a vector of length 16 like this:
agent = Agent(density.flatten().shape)
state = density.flatten()
action, probs = agent.act(state)
However, I get the following error:
AttributeError Traceback (most recent call last) <ipython-input-14-4f0ff0c40f49> in <module>
----> 1 action, probs = agent.act(state)
<ipython-input-10-562aaf040521> in act(self, state)
39 # return random.randrange(self.action_size)
40 # model = self.model()
---> 41 options = self.model.predict(state)
42 return np.argmax(options[0]), options
43
AttributeError: 'function' object has no attribute 'predict'
What's the issue? I checked some other people's code as well, like this, and I think mine is very similar.
Let me know.
EDIT:
I changed the argument in Dense from input_dim to input_shape and self.model.predict(state) to self.model().predict(state).
Now when I run the network on one input of shape (16, 1), I get the following error:
ValueError: Error when checking input: expected dense_1_input to have
3 dimensions, but got array with shape (16, 1)
And when I run it with shape (1,16), I get the following error:
ValueError: Error when checking input: expected dense_1_input to have
3 dimensions, but got array with shape (1, 16)
What should I do in this case?
In the last code block,
def act(self, state):
    options = self.model.predict(state)
    return np.argmax(options[0]), options
self.model is a method that returns a model, so the call should be self.model().predict(state).
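Alternatively, a hedged sketch (not the original author's code): build the network once in __init__ and store it on the instance, so that self.model really is a compiled Keras model rather than an unbound method. The builder is renamed here to a hypothetical _build_model, since a method and an attribute cannot share the name model:

import numpy as np
from collections import deque
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

class Agent:
    def __init__(self, state_size, is_eval=False, model_name=""):
        self.state_size = state_size          # input_dim expects an int, e.g. 16
        self.action_size = 20
        self.memory = deque(maxlen=1000)
        self.is_eval = is_eval
        self.model_name = model_name
        self.model = self._build_model()      # built once, stored as an attribute

    def _build_model(self):
        model = Sequential()
        model.add(Dense(units=16, input_dim=self.state_size, activation="relu"))
        model.add(Dense(units=32, activation="relu"))
        model.add(Dense(units=8, activation="relu"))
        model.add(Dense(self.action_size, activation="softmax"))
        model.compile(loss="categorical_crossentropy", optimizer=Adam(lr=0.003))
        return model

    def act(self, state):
        options = self.model.predict(state)   # no () needed: self.model is the model itself
        return np.argmax(options[0]), options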
I used np.reshape. So in this case, I did
density_test = np.reshape(density.flatten(), (1,1,16))
and the network gave the output.
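In other words (reusing the names from the question), the reshaped array is what gets fed to act:

density_test = np.reshape(density.flatten(), (1, 1, 16))
action, probs = agent.act(density_test)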
I am working on a project where I am using a custom callback together with the EarlyStopping callback, and my model's training does not stop even though val_loss is not improving much.
Here is my implementation:
class CustomCallback(keras.callbacks.Callback):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def on_epoch_end(self, epoch, logs={}):
        y_pred = self.model.predict(self.x)
        error_rate = np.sum(self.y == y_pred)
        print(f'Error number:: {error_rate}')
        logs['error_rate'] = error_rate
early_stop = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=2)
custom_callback = CustomCallback(X_data, y_data)
model.fit(train_data, y_train, epochs=100, batch_size=32, validation_data=(cv_data, y_cv), callbacks=[early_stop, custom_callback])
What is wrong in my implementation?
Why not use a custom metric instead of a callback?
def error_rate(y_true, y_pred):
    rate = K.cast(K.equal(y_true, y_pred), K.floatx())
    return keras.backend.sum(rate)
Are you passing label numbers or one-hot tensors as y? Usually the predictions should be rounded first (otherwise nothing will be exactly equal):
def error_rate(y_true, y_pred):
    y_pred = K.cast(K.greater(y_pred, 0.5), K.floatx())
    rate = K.cast(K.equal(y_true, y_pred), K.floatx())
    return keras.backend.sum(rate)
Use it as a metric:
model.compile(......, metrics=[error_rate, ...])
Try passing the min_delta argument to EarlyStopping with some value, so that an absolute change smaller than min_delta counts as no improvement and training stops.
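For example, a small sketch; the threshold value here is only an illustration, tune it to the scale of your loss:

early_stop = EarlyStopping(monitor='val_loss', mode='min', verbose=1,
                           patience=2, min_delta=0.001)  # changes smaller than 0.001 count as no improvement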
I want to check what data is in the input, or to check the output of some layer. For this I do the following:
import tensorflow.keras.backend as K
import tensorflow as tf
import numpy as np
x = [[i, i * 3 + 1] for i in range(100)]
y = [2 * i + 1 for i in range(100)]
x = np.array(x)
y = np.array(y)
print_weights = tf.keras.callbacks.LambdaCallback(
    on_batch_end=lambda batch, logs: print(K.get_value(model.layers[1].input)))

def sobaka():
    a = tf.keras.Input(shape=(2,))
    b = tf.keras.layers.Dense(1)
    c = b(a)
    model = tf.keras.models.Model(a, c)
    optimizer = tf.keras.optimizers.Adam(lr=0.1)
    model.compile(loss='mean_squared_error', optimizer=optimizer, metrics=['accuracy'])
    return model
kek = tf.placeholder(tf.float32, shape=(2,))
model = sobaka()
model.fit(x, y, batch_size=1, epochs=2, callbacks=[print_weights])
So every batch (one training sample) it would print the input tensor. But I got an error:
You must feed a value for placeholder tensor 'input_1' with dtype
float and shape [?,2]
Please help me understand how to feed the placeholder in my code. And is there any possible solution to print information every iteration (when the batch size is, for example, 10)?
One option is to use a custom callback, like so:
class MyCallback(tf.keras.callbacks.Callback):
    def __init__(self, patience=0):
        super(MyCallback, self).__init__()

    def on_epoch_begin(self, epoch, logs=None):
        tf.print(self.model.get_weights())
model.fit(
    x_train,
    y_train,
    epochs=epochs,
    batch_size=batch_size,
    callbacks=[MyCallback()],
    validation_data=(x_test, y_test),
)
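If you only want output every N batches (for example every 10th), a hedged variant of the same idea is to check the batch index in on_train_batch_end (available in reasonably recent tf.keras versions); model, x and y are the names from your snippet:

class PrintEveryN(tf.keras.callbacks.Callback):
    def __init__(self, n=10):
        super(PrintEveryN, self).__init__()
        self.n = n

    def on_train_batch_end(self, batch, logs=None):
        if batch % self.n == 0:
            tf.print(self.model.layers[1].weights)  # or any other tensor you want to inspect

model.fit(x, y, batch_size=1, epochs=2, callbacks=[PrintEveryN(10)])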
I am trying to use sklearn's grid search with a model created by xgboost. To do this, I am creating a custom scorer based on NDCG evaluation. I am successfully able to use Snippet 1, but it is too messy / hacky, so I would prefer to use good old sklearn to simplify the code. I tried to implement GridSearch and the result is completely off: for the same X and y sets I get NDCG@k = 0.8 with Snippet 1 versus 0.5 with Snippet 2. Obviously there's something I am not doing right here ...
The following pieces of code return very different results:
Snippet1:
kf = StratifiedKFold(y, n_folds=5, shuffle=True, random_state=42)
max_depth = [6]
learning_rate = [0.22]
n_estimators = [43]
reg_alpha = [0.1]
reg_lambda = [10]

for md in max_depth:
    for lr in learning_rate:
        for ne in n_estimators:
            for ra in reg_alpha:
                for rl in reg_lambda:
                    xgb = XGBClassifier(objective='multi:softprob',
                                        max_depth=md,
                                        learning_rate=lr,
                                        n_estimators=ne,
                                        reg_alpha=ra,
                                        reg_lambda=rl,
                                        subsample=0.6, colsample_bytree=0.6, seed=0)
                    print([md, lr, ne])
                    score = []
                    for train_index, test_index in kf:
                        X_train, X_test = X[train_index], X[test_index]
                        y_train, y_test = y[train_index], y[test_index]
                        xgb.fit(X_train, y_train)
                        y_pred = xgb.predict_proba(X_test)
                        score.append(ndcg_scorer(y_test, y_pred))
                    print('all scores: %s' % score)
                    print('average score: %s' % np.mean(score))
Snippet2:
from sklearn.grid_search import GridSearchCV
params = {
    'max_depth': [6],
    'learning_rate': [0.22],
    'n_estimators': [43],
    'reg_alpha': [0.1],
    'reg_lambda': [10],
    'subsample': [0.6],
    'colsample_bytree': [0.6]
}
xgb = XGBClassifier(objective='multi:softprob',seed=0)
scorer = make_scorer(ndcg_scorer, needs_proba=True)
gs = GridSearchCV(xgb, params, cv=5, scoring=scorer, verbose=10, refit=False)
gs.fit(X,y)
gs.best_score_
While Snippet 1 gives me the result I expect, the score returned by Snippet 2 is not consistent with the ndcg_scorer.
The problem is with cv in GridSearchCV(xgb, params, cv=5, scoring=scorer, verbose=10, refit=False). It can receive a KFold / StratifiedKFold object instead of an int. Contrary to what the docs say, it seems that by default an int argument does not use StratifiedKFold but some other splitter, maybe KFold.
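A hedged sketch of passing the same stratified folds explicitly, reusing y, X, xgb, params and scorer from the snippets above (this sticks to the old sklearn.cross_validation / sklearn.grid_search API used in the question):

from sklearn.cross_validation import StratifiedKFold
from sklearn.grid_search import GridSearchCV

kf = StratifiedKFold(y, n_folds=5, shuffle=True, random_state=42)  # identical folds to Snippet1
gs = GridSearchCV(xgb, params, cv=kf, scoring=scorer, verbose=10, refit=False)
gs.fit(X, y)
print(gs.best_score_)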