Inside my custom dataset, I want to apply transforms.Compose() to a NumPy array.
My images are in a NumPy array format with shape (num_samples, width, height, channels).
How can I apply the following transforms to the full NumPy array?
img_transform = transforms.Compose([
    transforms.Scale((224,224)),
    transforms.ToTensor(),
    transforms.Normalize([0.46, 0.48, 0.51], [0.32, 0.32, 0.32])
])
My attempts fail with multiple errors because the transforms accept a PIL image, not a 4-D NumPy array.
from torchvision import transforms
import numpy as np
import torch
img_transform = transforms.Compose([
    transforms.Scale((224,224)),
    transforms.ToTensor(),
    transforms.Normalize([0.46, 0.48, 0.51], [0.32, 0.32, 0.32])
])
a = np.random.randint(0,256, (299,299,3))
print(a.shape)
img_transform(a)
All torchvision transforms operate on single images, not batches of images, hence a 4D array cannot be used.
Single images given as NumPy arrays, like in your code example, can be used by converting them to a PIL image. You can simply add transforms.ToPILImage to the beginning of the transformation pipeline, as it converts either a tensor or a NumPy array to a PIL image.
img_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224,224)),
    transforms.ToTensor(),
    transforms.Normalize([0.46, 0.48, 0.51], [0.32, 0.32, 0.32])
])
Note: transforms.Scale is deprecated in favour of transforms.Resize.
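In your example you used np.random.randint, which returns NumPy's default integer type (typically int64) unless told otherwise, but images have to be uint8. Libraries such as OpenCV return uint8 arrays when loading an image, so with real images this usually works out of the box. For the random test array, pass the dtype explicitly: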
a = np.random.randint(0,256, (299,299,3), dtype=np.uint8)
Related
I have trained a ResNet50 model on the Intel image multiclass classification task. The task is to predict whether an image shows a building, a street, a glacier, etc. The model trained successfully and is able to make predictions. I have saved the model and am trying to use the saved model on a new image.
Here is the training code:
import os
import torch
import tarfile
import torchvision
import torch.nn as nn
from PIL import Image
import matplotlib.pyplot as plt
import torch.nn.functional as F
from torchvision import transforms
from torchvision.utils import make_grid
from torch.utils.data import random_split
from torchvision.transforms import ToTensor
from torchvision.datasets import ImageFolder
from torch.utils.data import Dataset, DataLoader
from torchvision.datasets.utils import download_url
import PIL
import PIL.Image
import numpy as np
transform_train=transforms.Compose([
    transforms.Resize((150,150)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((.5,.5,.5),(.5,.5,.5))
])
transform_test=transforms.Compose([
    transforms.Resize((150,150)),
    transforms.ToTensor(),
    transforms.Normalize((.5,.5,.5),(.5,.5,.5))
])
...
torch.save(model2.state_dict(),'/content/drive/MyDrive/saved_model/model_resnet.pth')
When I load the model in another file, I use a similar image transformation, but it gives me an error. Here are the code and the error:
model = torch.load('/content/drive/MyDrive/saved_model/model_resnet.pth')
image=Image.open(Path('/content/drive/MyDrive/images/seg_pred/seg_pred/10004.jpg'))
transform_train=transforms.Compose([
    transforms.Resize((150,150)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((.5,.5,.5),(.5,.5,.5))
])
input = transform_train(image)
#input = input.view(1, 3, 150,150)
output = model(input)
prediction = int(torch.max(output.data, 1)[1].numpy())
print(prediction)
The error I get is
TypeError: 'collections.OrderedDict' object is not callable
My pytorch version is
1.9.0+cu102
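torch.save(model2.state_dict(), ...) stores only the weights, so torch.load returns an OrderedDict rather than a model. You need to create the model structure first, the same way you created model2 in your training code. It could look like this: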
model = resnet()
Then load the saved state dict:
model.load_state_dict(torch.load('/content/drive/MyDrive/saved_model/model_resnet.pth'))
model.eval()
Ref:
https://pytorch.org/tutorials/beginner/saving_loading_models.html
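To make the round trip concrete, here is a small sketch of saving a state dict and loading it back into a freshly built model. It is not the asker's code: `build_model` is a hypothetical stand-in for whatever constructs the ResNet50 in the training script, the class count of 6 is an assumption, and the random tensor stands in for a transformed image.

```python
import os
import tempfile

import torch
import torch.nn as nn


def build_model(num_classes: int = 6) -> nn.Module:
    # Hypothetical stand-in for the training-time architecture (model2).
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(8, num_classes),
    )


# Save only the weights, as in the training script.
trained = build_model()
path = os.path.join(tempfile.mkdtemp(), "model.pth")
torch.save(trained.state_dict(), path)

# torch.load gives back a mapping of parameter names to tensors, not a model.
state = torch.load(path)

model = build_model()          # 1. rebuild the architecture
model.load_state_dict(state)   # 2. copy the saved weights into it
model.eval()                   # 3. switch to inference mode

image = torch.rand(3, 150, 150)  # stand-in for transform_test(image)
with torch.no_grad():
    # The model expects a batch, so add a leading dimension: (1, 3, 150, 150).
    output = model(image.unsqueeze(0))
prediction = int(output.argmax(dim=1).item())
print(prediction)
```

Note the `unsqueeze(0)`: a single transformed image has shape (3, 150, 150), and the commented-out `view(1, 3, 150, 150)` line in the question was doing the same job.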
Based on your question, it's clear that you want to run a prediction on a new image. However, you are passing that image through the training transform, which applies random augmentation (the random flips). That is not the proper way to get a prediction.
The code you linked contains plenty that you can reuse in your own code.
I am sharing fast.ai code and some simple TensorFlow code with which you can predict on a new image and see the result.
img = open_image('any_image.jpg')
print(learn.predict(img)[0])
OR you can try this function:
import matplotlib.pyplot as plt # visualization
import matplotlib.image as mpimg
import tensorflow as tf # Deep Learning Framework
import pathlib
def pred_plot(file, model, class_names=class_names, image_size=(150, 150)):
    # Read and decode the file, then resize it to the model's input size
    img = tf.io.read_file(file)
    img = tf.io.decode_image(img, channels=3)
    img = tf.image.resize(img, size=image_size)
    # Add a batch dimension before predicting
    pred_probs = model.predict(tf.expand_dims(img, axis=0))
    pred_class = class_names[pred_probs.argmax()]
    # Scale pixel values to [0, 1] for display
    plt.imshow(img/255.)
    plt.title(f'Pred: {pred_class}')
    plt.axis(False)
Pass any image and you will get the prediction along with a visualization.
url ='dummy.jpg'
pred_plot(url, model=model_2, class_names=class_names)
To load my image dataset, I have written the following code:
from numpy import asarray
from PIL import Image

X=[]
for i in range(1,682):
    image=Image.open(str(i)+'.jpg')
    image=image.resize((100,100))
    temp=asarray(image)
    X.append(temp)
Shape of X is (681,100,100,3) but I want shape of X to be (681,100,100). How can I do that?
You can use OpenCV to read the images. It returns them as NumPy arrays, and with cv2.IMREAD_GRAYSCALE each image is loaded as a single-channel 2D array:
import cv2
X=[]
for i in range(1,682):
    temp = cv2.imread(str(i)+'.jpg', cv2.IMREAD_GRAYSCALE)
    temp = cv2.resize(temp, (100,100))
    X.append(temp)
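After the loop X is still a Python list, so it needs one more step to become a single array. Here is a rough sketch of that step, plus an alternative for data that is already loaded as RGB. The arrays are hypothetical stand-ins (8 images instead of 681), and the weights are the common ITU-R BT.601 luma coefficients, which is one conventional choice rather than the only one.

```python
import numpy as np

# Stand-in for the list built in the loop: 8 grayscale images of 100x100.
X = [np.zeros((100, 100), dtype=np.uint8) for _ in range(8)]
X = np.array(X)  # stack the 2D arrays along a new first axis
print(X.shape)  # (8, 100, 100)

# Alternative: collapse the channel axis of an array that is already RGB.
rgb = np.random.randint(0, 256, (8, 100, 100, 3), dtype=np.uint8)
weights = np.array([0.299, 0.587, 0.114])  # BT.601 luma weights for R, G, B
gray = (rgb @ weights).astype(np.uint8)    # weighted sum over the last axis
print(gray.shape)  # (8, 100, 100)
```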
I can show the image using Image.open, but how do I display it from the binary data?
Trying to use plot gives: ValueError: x and y can be no greater than 2-D, but have shapes (64,) and (64, 64, 3). This makes sense, as that is the shape the result is supposed to have, but how do I display it?
import pathlib
import glob
from os.path import join
import matplotlib.pyplot as plt
from PIL import Image
import tensorflow as tf
def parse(image): # my like ings, but with .png instead of .jpeg.
    image_string = tf.io.read_file(image)
    image = tf.image.decode_png(image_string, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, [64, 64])
    return image
root = "in/flower_photos/tulips"
path = join(root,"*.jpg")
files = sorted(glob.glob(path))
file=files[0]
image = Image.open(file)
image.show()
binary=parse(file)
print(type(binary))
# how do i see this?
#plt.plot(binary) # does not seem to work
#plt.show() # does not seem to work
I found a nice Pillow tutorial:
from matplotlib import image
from matplotlib import pyplot
from PIL import Image
# load the image
filename='Sydney-Opera-House.jpg'
im = Image.open(filename)
# summarize some details about the image
print(im.format)
print(im.mode)
print(im.size)
# show the image
# im.show()
# load image as pixel array
data = image.imread(filename)
# summarize shape of the pixel array
print(data.dtype)
print(data.shape)
# display the array of pixels as an image
pyplot.imshow(data)
pyplot.show()
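Applied to the question itself: plt.plot draws lines, while imshow renders an array as an image, so a (64, 64, 3) array can be passed to imshow directly. The snippet below is only a sketch. The random array is a hypothetical stand-in for `parse(file).numpy()`, and the Agg backend plus the temporary output file are there so it runs without a display; in an interactive session you would drop them and call `plt.show()`.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line to get a window
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for parse(file).numpy(): float32 RGB values in [0, 1].
binary = np.random.rand(64, 64, 3).astype(np.float32)

fig, ax = plt.subplots()
ax.imshow(binary)  # a (H, W, 3) array is interpreted as an RGB image
ax.axis("off")

out_path = os.path.join(tempfile.mkdtemp(), "preview.png")
fig.savefig(out_path)
plt.close(fig)
```

Since the tensor was converted to float32 with convert_image_dtype, its values are already in the [0, 1] range that imshow expects for float RGB data.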
I am trying to use an ExtraTreesClassifier with sparse data, as per the documentation, however I do get a run time TypeError asking for dense data. This is on scikit-learn 0.17.1, and below I am quoting from the documentation:
Parameters:
X : array-like or sparse matrix of shape = [n_samples, n_features]
The code is quite simple:
import pandas as pd
from scipy.sparse import coo_matrix, csr_matrix, hstack
from sklearn.ensemble import ExtraTreesClassifier
import numpy as np
from scipy import *
features = array([[1, 0], [0, 1], [3, 4]])
sparse_features = csr_matrix(features)
labels = array([0, 1, 0])
classifier = ExtraTreesClassifier()
classifier.fit(sparse_features, labels)
And here is the exception: TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array. It works fine when passing in features.
Is the documentation out of date, or is there something wrong with the above code?
Any help will be greatly appreciated. Thank you.
Quoting the documentation:
Internally, it will be converted to dtype=np.float32 and if a sparse matrix is provided to a sparse csc_matrix.
So I expect that passing a csc_matrix should help.
On my setup both versions work normally (csc and csr, sklearn 0.17.1), so I assume the problem could be an older version of scipy.
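To check this on your own setup, a quick sketch like the following fits the classifier on both sparse formats. It uses numpy.array directly instead of relying on `from scipy import *`, and the `n_estimators` and `random_state` values are arbitrary choices to keep the run small and repeatable.

```python
import numpy as np
from scipy.sparse import csc_matrix, csr_matrix
from sklearn.ensemble import ExtraTreesClassifier

features = np.array([[1, 0], [0, 1], [3, 4]])
labels = np.array([0, 1, 0])

# Try both sparse layouts; the docs say sparse input is converted to csc internally.
for make_sparse in (csc_matrix, csr_matrix):
    sparse_features = make_sparse(features)
    classifier = ExtraTreesClassifier(n_estimators=10, random_state=0)
    classifier.fit(sparse_features, labels)
    pred = classifier.predict(sparse_features)
    print(make_sparse.__name__, pred.shape)
```

If this still raises the TypeError, comparing the installed scipy and scikit-learn versions would be the next thing to look at.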
Long story short, I'm just simply trying to get a canny edged image of image.jpg.
The documentation is very spotty so I'm getting very confused. If anyone can help that'd be greatly appreciated.
from scipy import misc
import numpy as np
from skimage import data
from skimage import feature
from skimage import io
im=misc.imread('image1.jpg')
edges1 = feature.canny(im)
...
And I'm getting this error
ValueError: The parameter `image` must be a 2-dimensional array
Can anyone explain how to create a 2D array from an image file?
Thanks!
I suspect image1.jpg is a color image, so im is 3D, with shape (num_rows, num_cols, num_color_channels). One option is to tell imread to flatten the image into a 2D array by giving it the argument flatten=True:
im = misc.imread('image1.jpg', flatten=True)
Or you could apply canny to just one of the color channels, e.g.
im = misc.imread('image1.jpg')
red_edges = feature.canny(im[:, :, 0])
Canny edge detection needs a grayscale (2D) input image in order to work.
You can convert 3D (color) images to 2D (grayscale) using the rgb2gray function in scikit-image's color module.
from skimage import io, feature
from skimage.color import rgb2gray
image = rgb2gray(io.imread("image.png"))
edges = feature.canny(image)
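Here is a self-contained sketch of the same idea that runs without an image file. Note that scipy.misc.imread is not available in recent SciPy releases, so this version stays within scikit-image. The synthetic white square is a hypothetical stand-in for image1.jpg.

```python
import numpy as np
from skimage import feature
from skimage.color import rgb2gray

# Stand-in for a color photo: a white square on a black background.
rgb = np.zeros((64, 64, 3), dtype=np.uint8)
rgb[16:48, 16:48] = 255

gray = rgb2gray(rgb)         # (64, 64, 3) -> (64, 64), float values in [0, 1]
edges = feature.canny(gray)  # boolean mask, True where an edge was detected
print(gray.ndim, edges.shape, edges.dtype)
```

With a real file, replacing the synthetic array with `io.imread("image1.jpg")` should give the same flow, assuming the file decodes to an RGB array.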