Training a `BoostedTreesClassifier` in TensorFlow? - python-3.x

I am learning TensorFlow and am trying to train a BoostedTreesClassifier (premade estimator). However, I cannot get it to work with my bucketized columns. Below is my bucketized column:
age_bucket_column = tf.feature_column.bucketized_column(tf.numeric_column(key='age'), [20, 40, 60])
Here is my train input function (note features is a Pandas DataFrame):
def train_input_fn(features, labels, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    dataset = dataset.shuffle(buffer_size=1000).repeat(count=None).batch(batch_size)
    return dataset.make_one_shot_iterator().get_next()
Here is my estimator:
boosted_trees_classifier = tf.estimator.BoostedTreesClassifier(
    feature_columns=[age_bucket_column],
    n_batches_per_layer=100
)
And here is my code to train it:
boosted_trees_classifier.train(
    input_fn=lambda: train_input_fn(train_X, train_y, 100),
    steps=1000
)
However, when I run it, I get the following error:
ValueError: Tensor conversion requested dtype float32 for Tensor with dtype int64: 'Tensor("IteratorGetNext:13", shape=(?,), dtype=int64, device=/device:CPU:0)'
Note that when I run the same code but with another model (say a LinearClassifier or DNNClassifier) it works perfectly. What am I doing wrong? Thank you in advance!

This is probably because your labels are of type int64. Cast them to float32:
train_y = pd.Series(train_y, index=np.array(range(1, train_y.shape[0] + 1)), dtype=np.float32)
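A plain astype cast works as well; a minimal sketch, assuming train_y is a pandas Series of integer class labels:
import numpy as np
# Equivalent cast without rebuilding the index
train_y = train_y.astype(np.float32)
# then train as before:
boosted_trees_classifier.train(
    input_fn=lambda: train_input_fn(train_X, train_y, 100),
    steps=1000
)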

Related

PyTorch subset on DataLoader decreases the number of data samples

I'm using random_split():
dataset_train, dataset_valid = random_split(dataset, [int(len(dataset) * 0.8), int(len(dataset) * 0.2+1)])
len(dataset_train) is 1026 and len(dataset_valid) is 257, but putting these two variables into a DataLoader decreases the number of data:
loader_train = DataLoader(dataset_train, batch_size=batch_size, shuffle=True, num_workers=0)
loader_val = DataLoader(dataset_valid, batch_size=batch_size, shuffle=True, num_workers=0)
print (len(loader_train))
print (len(loader_val))
The output is:
257, 65
I don't know why the size of the dataset decreases. Any help is appreciated, thanks.
You can use data.Subset, which works as a wrapper for data.Dataset instances. You need to provide a sequence of indices (e.g. a list or a range) that you want to retain in the constructed dataset.
Here is a minimal setup example:
>>> import torch
>>> from torch.utils import data
>>> dataset_train = data.TensorDataset(torch.arange(100))
Construct the subset by wrapping dataset_train:
>>> subset = data.Subset(dataset_train, range(10))  # selects the first 10 elements of dataset_train
Finally construct your dataloader:
>>> loader_train = data.DataLoader(subset, batch_size=2)
As an illustration, here is what loader_train yields:
>>> for x in loader_train:
... print(x)
[tensor([0, 1])]
[tensor([2, 3])]
[tensor([4, 5])]
[tensor([6, 7])]
[tensor([8, 9])]
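As a side note on the numbers in the question: len(DataLoader) reports the number of batches, not the number of samples. Assuming a batch_size of 4 (the value is not shown in the question), the counts work out as:
>>> import math
>>> math.ceil(1026 / 4)   # len(loader_train): batches over 1026 training samples
257
>>> math.ceil(257 / 4)    # len(loader_val): batches over 257 validation samples
65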

I can predict one image but not a set of images with a PyTorch resnet18 model. How can I predict a set of images in a list using PyTorch models?

x is a list of (36, 60, 3) images. I am trying to predict the output on my images with a pretrained PyTorch resnet18. I took x as a list of 2 images. When I take only 1 image, I get the prediction with no errors, as follows:
im = x[0]
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )])
# Preprocess the image
img_preprocessed = preprocess(im)
# Add a batch dimension so the tensor can be fed to the network for evaluation
batch_img_tensor = torch.unsqueeze(img_preprocessed, 0)
resnet18.eval()
out = resnet18(batch_img_tensor).flatten()
But it does not work when I set im = x. Something goes wrong in the preprocessing line and I get this error:
TypeError: pic should be PIL Image or ndarray. Got <class 'list'>
I tried Variable(torch.tensor(x)) as follows:
x=dataset(source_p)
y=Variable(torch.tensor(x))
print(y.shape)
resnet18(y)
I get the following error :
RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[2, 36, 60, 3] to have 3 channels, but got 36 channels instead
My question is : how can I predict all images in x list at once?
Thanks!
Eventually I created a class that takes x and transforms all of its elements:
class formDataset(Dataset):
    def __init__(self, imgs, transform=None):
        self.imgs = imgs
        self.transform = transform

    def __len__(self):
        return len(self.imgs)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
        image = self.imgs[idx]
        sample = image
        if self.transform:
            sample = self.transform(sample)
        return sample
After that I call:
l_set = formDataset(imgs=x, transform=preprocess)
l_loader = DataLoader(l_set, batch_size=2)
for data in l_loader:
    features = resnet18(data)
You need to batch your images along the 0th dimension:
im = torch.stack(x, 0)
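Combined with the preprocess pipeline from the question, one way to build that batch (a sketch, assuming x is a list of (36, 60, 3) image arrays and preprocess/resnet18 are defined as above) is:
batch = torch.stack([preprocess(im) for im in x], dim=0)  # (N, 3, 36, 60) after ToTensor's HWC -> CHW
resnet18.eval()
with torch.no_grad():
    out = resnet18(batch)  # one prediction per image in x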

Getting the true labels for augmented data from Keras data generator when batch size is greater than 1 using flow_from_dataframe

I'm using Keras version 2.3.1 and TensorFlow on Ubuntu 18.04.4 LTS. I created my data generator as follows:
datagen = ImageDataGenerator(brightness_range=(0.5, 1.5),
                             preprocessing_function=densenet.preprocess_input)
datagen = datagen.flow_from_dataframe(dataframe=df,
                                      x_col="Image Path",
                                      y_col=["class1", "class2", "class3"],
                                      class_mode="raw",
                                      shuffle=True,
                                      batch_size=32,
                                      seed=123,
                                      target_size=(256, 256))
Suppose the number of images in the dataframe (df) is 100. If I make predictions on this data generator as follows:
probs = model.predict_generator(datagen, steps=datagen.n, verbose=1)
I get probs as an array with dimension (3200, 3) because of the data augmentation (100 steps x batch size 32). However, when I try to link the predictions to the true labels using the following command:
y_true = datagen.labels
I get the array y_true with dimension (100, 3), which contains the labels of the images before augmentation. How can I link each augmented sample to its true label? Since I configured my datagen to shuffle images, I'm not sure whether I can replicate each true label in y_true batch_size times using this command or not:
y_true_augmented = np.repeat(y_true, repeats=32, axis=0)
May I have your suggestions?
Since the datagen is a generator and you have specified the random seed, can't you use something like:
y_true = []
for x, y in datagen:
    y_true.append(y)
for n steps, and then pass the datagen to your model? Kindly correct me if I am wrong.
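A slightly fuller sketch of that idea (assuming the generator yields (x, y) batches, as flow_from_dataframe with class_mode="raw" does, and that the fixed seed keeps the order reproducible):
import numpy as np
steps = len(datagen)                      # number of batches in one pass over df
xs, ys = [], []
for i, (x_batch, y_batch) in enumerate(datagen):
    if i >= steps:
        break
    xs.append(x_batch)
    ys.append(y_batch)
y_true = np.concatenate(ys, axis=0)       # (100, 3), aligned with the images below
probs = model.predict(np.concatenate(xs, axis=0))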

Dimension error in feeding Keras with Tensorflow dataset

I have a TFRecords file consisting of 60 examples of six Landsat band values for some pixels plus a label for each pixel, and I want to train a Keras classifier with it. But I get a dimension mismatch when I try to feed the network with the data.
The TFRecords file is generated with the structure below:
# The TFRecords file contains the features below for each example
bands = ['B2', 'B3', 'B4', 'B5', 'B6', 'B7','landcover']
columns = [tf.FixedLenFeature(shape=[1], dtype=tf.float32) for k in bands]
featuresDict = dict(zip(bands, columns))
And my code for defining generator function and Keras model is as follows:
def tfdata_generator_training(fileName, batchSize=None):
    dataset = tf.data.TFRecordDataset(fileName, compression_type='GZIP')

    def parse_tfrecord(example):
        features = tf.parse_single_example(example, featuresDict)
        # Extract landcover and remove it from the dictionary
        labels = features.pop('landcover')
        labels = tf.one_hot(tf.cast(labels, tf.uint8), 3)
        # Return the list of dictionary values (convertible to a numpy array for Keras)
        # and the pixel label in one-hot format
        return list(features.values()), labels

    # Map the parsing function over the dataset
    dataset = dataset.map(parse_tfrecord)
    dataset = dataset.batch(batchSize)
    return dataset
training_data = tfdata_generator_training(fileName=<my_file_path>, batchSize=1)
def keras_model():
    from tensorflow.keras.layers import Dense, Input
    inputs = Input(shape=(6, 1))
    x = Dense(5, activation='relu')(inputs)
    x = Dense(7, activation='relu')(x)
    outputs = Dense(3, activation='softmax')(x)
    return tf.keras.Model(inputs, outputs)
model = keras_model()
model.compile('adam', 'categorical_crossentropy', metrics=['acc'])
model.fit(training_data.make_one_shot_iterator(), steps_per_epoch=60, epochs=8)
But I get the error below when running the code:
ValueError: Error when checking target: expected dense_2 to have shape (6, 3) but got array with shape (1, 3)
What is the problem with my code? I also tried to get the dimensions of the input layer, and the TensorFlow printout was as follows:
(<tf.Tensor 'IteratorGetNext:0' shape=(?, 6, 1) dtype=float32>, <tf.Tensor 'IteratorGetNext:1' shape=(?, 1, 3) dtype=float32>)
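For what it's worth, those printed shapes point at the mismatch: each feature is declared with shape=[1], so the inputs arrive as (batch, 6, 1) and the one-hot labels as (batch, 1, 3); the Dense stack therefore outputs (batch, 6, 3) while the targets are (batch, 1, 3). One possible fix (a sketch only, assuming the same featuresDict as above) is to reshape both sides in parse_tfrecord so the model sees flat (6,) features and (3,) labels, and to use Input(shape=(6,)) in keras_model:
def parse_tfrecord(example):
    features = tf.parse_single_example(example, featuresDict)
    labels = features.pop('landcover')
    labels = tf.one_hot(tf.cast(labels, tf.uint8), 3)
    # Stack the six band values into a flat (6,) vector and drop the
    # extra length-1 axis from the one-hot label so it becomes (3,)
    feats = tf.reshape(tf.stack(list(features.values())), [6])
    labels = tf.reshape(labels, [3])
    return feats, labels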

How to save and restore a tf.estimator model

I would like to save and restore my tf.estimator model. Although I tried to follow other related issues on Stack Overflow, I was not successful. The following input_fn provides the data to be predicted, but I do not know how to use it to save and restore the model for prediction.
Btw, the returned dataset has a shape of [batch_size, dim] with dtype float32:
def predict_input_fn(path, dim, batch_size):
    dataset = ds.get_dataset(path, dim)
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(1)
    return dataset
What I have tried so far is the following, but it did not work as expected. Could you please help me save and restore such a model?
Trial
def serving_input_receiver_fn():
    features = tf.placeholder(
        dtype=tf.float32, shape=[None, batch_size])
    fn = lambda x: predict_input_fn(path, dim, batch_size)
    mapped_fn = tf.map_fn(fn, features)
    return tf.estimator.export.ServingInputReceiver(mapped_fn, features)

estimator.export_savedmodel(model_save_path, serving_input_receiver_fn)
Error:
Failed to convert object of type <class 'tensorflow.python.data.ops.dataset_ops.PrefetchDataset'> to Tensor. Contents: <PrefetchDataset shapes: (?, 1024), types: tf.float32>. Consider casting elements to a supported type
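The error comes from handing a tf.data Dataset to ServingInputReceiver, which expects plain tensors. A minimal sketch of the usual placeholder-based pattern (assuming the model consumes a single float32 feature of shape [None, dim] under a hypothetical key 'features'; adjust the key and shape to match your feature columns):
def serving_input_receiver_fn():
    # Placeholder fed at serving time with a batch of raw inputs
    features_ph = tf.placeholder(dtype=tf.float32, shape=[None, dim], name='input_features')
    receiver_tensors = {'features': features_ph}
    features = {'features': features_ph}
    return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)

estimator.export_savedmodel(model_save_path, serving_input_receiver_fn)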
