For object detection, I'm using detectron2.
I want to fix the input image size, so I made a customized dataloader:
from detectron2.data import DatasetMapper, build_detection_train_loader
import detectron2.data.transforms as T

def build_train_loader(cls, cfg):
    dataloader = build_detection_train_loader(cfg,
        mapper=DatasetMapper(cfg, is_train=True, augmentations=[
            T.Resize((1200, 1200))
        ]))
    return dataloader
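(The cls parameter suggests this method is meant as a classmethod override on a trainer. For context, a minimal sketch of how it could be hooked into a DefaultTrainer subclass; the class name MyTrainer is only illustrative:)

from detectron2.data import DatasetMapper, build_detection_train_loader
from detectron2.engine import DefaultTrainer
import detectron2.data.transforms as T

class MyTrainer(DefaultTrainer):
    @classmethod
    def build_train_loader(cls, cfg):
        # Same fixed-size mapper as above.
        return build_detection_train_loader(cfg,
            mapper=DatasetMapper(cfg, is_train=True, augmentations=[
                T.Resize((1200, 1200))
            ]))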
What I wonder is: for prediction, can I use detectron2's DefaultPredictor and resize my images to (1200, 1200) as preprocessing before sending them to the predictor?
Or does the DefaultPredictor resize the image before prediction, so that I have to override a function to resize to (1200, 1200)?
You have to preprocess the images yourself, or write your own predictor that applies the resize before calling the model.
The DefaultPredictor applies a ResizeShortestEdge transform (which can be configured in the config file), but this is not exactly what you want.
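For example, here is a minimal sketch of such a predictor. It assumes your detectron2 version stores the test-time transform in the predictor's aug attribute (true for recent releases); if yours does not, copy DefaultPredictor's __call__ and swap the transform there instead:

from detectron2.engine import DefaultPredictor
import detectron2.data.transforms as T

class FixedSizePredictor(DefaultPredictor):
    """DefaultPredictor variant that resizes every input to 1200x1200."""
    def __init__(self, cfg):
        super().__init__(cfg)
        # Replace the default ResizeShortestEdge with the same fixed
        # resize used at training time (assumption: the attribute is `aug`).
        self.aug = T.Resize((1200, 1200))

predictor = FixedSizePredictor(cfg)  # cfg is your existing config
outputs = predictor(image)           # image: a BGR numpy array of any size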
Related
I am currently using FaceNet to build a face detection and recognition application. The first part takes images from the webcam and detects the person's face using the MTCNN model, then stores the images in a folder. I then decided to use ImageDataGenerator on that folder to create more images for the dataset, but the datagen gives the resultant images in grayscale format.
Here's the code for it:
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=40, width_shift_range=0.2, height_shift_range=0.2,
                             shear_range=0.2, zoom_range=0.2, horizontal_flip=True,
                             fill_mode='nearest', rescale=False)
This is the flow function:
from keras.preprocessing import image

for train_img in train_images:
    img = image.img_to_array(train_img)  # convert the image to a numpy array
    img = img.reshape((1,) + img.shape)  # add a batch dimension
    i = 0
    # datagen.fit() is only needed for featurewise statistics, which are not
    # used here. flow() loops forever until we break, saving images to train_path.
    for batch in datagen.flow(img, save_format='png', save_to_dir=train_path):
        i += 1
        if i > 10:  # make 10 augmentations of every image
            break
Please help.
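(One thing worth checking, as a diagnostic sketch rather than a fix: flow() keeps whatever channel count the input array has, so grayscale output usually means the arrays entering it already have a single channel.)

for train_img in train_images:
    arr = image.img_to_array(train_img)
    # An RGB face crop should have shape (H, W, 3); shape (H, W, 1)
    # means the image was already grayscale before augmentation.
    print(arr.shape)
    break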
I trained an image classification neural network model written in ml5.js. When I try to use the model files in a p5.js web editor, I get an error 'Based on the provided shape, [1,64,64,4], the tensor should have 16384 values but has 20155392'.
The code is in this p5 sketch - https://editor.p5js.org/konstantina1/sketches/85Ny1SC2J (clicking on the arrow in the top right corner will show the files).
When I run a local server on a web page with the same structure, I see 'model ready!' (a confirmation that the model has loaded) and that's it.
I read a lot of comments saying the bin file may be corrupt, but I saved the model myself, producing the bin file, so it should be OK.
Adding pixelDensity(1) in setup(), as suggested by the author of very similar code (https://www.youtube.com/watch?v=3MqJzMvHE3E), doesn't help.
I am new to machine learning, could someone please help? Thank you in advance.
The model was trained with 64x64 px images, so the input test image must be the same size.
1944 (original image width) * 2592 (original image height) * 4 (number of channels) = 20155392 values, while the tensor should have 64 (image width) * 64 (image height) * 4 (number of channels) = 16384 values. This is what the error refers to.
The copy() method used originally didn't resize the input image properly.
The correct way to resize the image is inputImage.resize(IMAGE_WIDTH, IMAGE_HEIGHT).
Working sketch: https://editor.p5js.org/konstantina1/sketches/85Ny1SC2J
A version of the sketch with image file upload: https://editor.p5js.org/konstantina1/sketches/qMNkkkbIm
I'm going through this PyTorch tutorial: https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html
I've been able to show real images next to the fake ones that I have generated.
# dataloader, device, and img_list come from earlier in the tutorial
import numpy as np
import matplotlib.pyplot as plt
import torchvision.utils as vutils

# Grab a batch of real images from the dataloader
real_batch = next(iter(dataloader))

# Plot the real images
plt.figure(figsize=(15, 15))
plt.subplot(1, 2, 1)
plt.axis("off")
plt.title("Real Images")
plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=5, normalize=True).cpu(), (1, 2, 0)))

# Plot the fake images from the last epoch
plt.subplot(1, 2, 2)
plt.axis("off")
plt.title("Fake Images")
plt.imshow(np.transpose(img_list[-1], (1, 2, 0)))
plt.show()
For my dataset, this results in a side-by-side figure of real and fake image grids.
I was wondering how I can show a single image from the generated fakes. I would also like to show it as a 512 x 512 image if possible.
Edit:
The img_list[-1].shape is torch.Size([3, 530, 530]).
Edit2:
This part of the training shows that img_list is a list of image grids rather than individual images: each entry is one grid of sub-images that I cannot separate (530 = 8 * (64 + 2) + 2, i.e. an 8 x 8 grid of 64 x 64 images with padding 2 from make_grid). Is there a way I can change this so that img_list holds each generated fake image separately?
Here is what I wanted:
# nz, netG, and device are defined earlier in the tutorial
noise = torch.randn(1, nz, 1, 1, device=device)  # a single latent vector
with torch.no_grad():
    newfake = netG(noise).detach().cpu()
plt.axis("off")
plt.imshow(np.transpose(newfake[0], (1, 2, 0)))
plt.show()
This generates a new image from fresh noise; img_list was combining the generated images into one grid.
However, this code still only generates 64 x 64 pixel images.
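To display it at 512 x 512, one option is to upsample the output (a sketch; note that interpolation only enlarges the 64 x 64 image without adding detail, and native 512 x 512 output would require retraining the generator at that resolution):

import torch.nn.functional as F

# Upsample the (1, 3, 64, 64) batch to (1, 3, 512, 512).
big = F.interpolate(newfake, size=(512, 512), mode="bilinear", align_corners=False)
plt.axis("off")
plt.imshow(np.transpose(big[0], (1, 2, 0)))
plt.show()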
I am writing code for the well-known MNIST database of handwritten digits in PyTorch. I downloaded the training and test sets (from the main website), including the label files. The dataset files have the format t10k-images-idx3-ubyte.gz and, after extraction, t10k-images-idx3-ubyte. My dataset folder looks like:
MNIST
  Data
    train-images-idx3-ubyte.gz
    train-labels-idx1-ubyte.gz
    t10k-images-idx3-ubyte.gz
    t10k-labels-idx1-ubyte.gz
Now, I wrote code to load the data like below:
import torch
import torchvision

def load_dataset():
    data_path = "/home/MNIST/Data/"
    xy_trainPT = torchvision.datasets.ImageFolder(
        root=data_path, transform=torchvision.transforms.ToTensor()
    )
    train_loader = torch.utils.data.DataLoader(
        xy_trainPT, batch_size=64, num_workers=0, shuffle=True
    )
    return train_loader
My code shows the error: Supported extensions are: .jpg,.jpeg,.png,.ppm,.bmp,.pgm,.tif,.tiff,.webp
How can I solve this problem? I also want to check that my images are loaded, e.g. a figure containing the first 5 images from the dataset.
Read this: Extract images from .idx3-ubyte file or GZIP via Python
Update
You can import data using this format
xy_trainPT = torchvision.datasets.MNIST(
    root="~/Handwritten_Deep_L/",
    train=True,
    download=True,
    transform=torchvision.transforms.Compose([torchvision.transforms.ToTensor()]),
)
Now, here is what happens with download=True: first, your code checks whether the root directory (the path you gave) already contains the dataset.
If not, the dataset is downloaded from the web.
If the path already contains the dataset, your code uses the existing copy and does not download it from the internet.
You can verify this: first give a path without any dataset (the data will be downloaded from the internet), then give another path that already contains the dataset (the data will not be downloaded).
Welcome to Stack Overflow!
The MNIST dataset is not stored as images, but in a binary format (as indicated by the ubyte extension). Therefore, ImageFolder is not the type of dataset you want. Instead, you will need to use the MNIST dataset class. It can even download the data for you if you have not done so already :)
This is a dataset class, so just instantiate it with the proper root path, then pass it to your dataloader, and everything should work just fine.
If you want to check the images, just index the dataset (its __getitem__ method, e.g. dataset[0]) and save the result as a png file (you may need to convert the tensor to a numpy array first).
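Putting both answers together, a minimal sketch (reusing the root path from the question) that loads MNIST and displays the first 5 images of a batch:

import torch
import torchvision
import matplotlib.pyplot as plt

dataset = torchvision.datasets.MNIST(
    root="/home/MNIST/Data/",
    train=True,
    download=True,
    transform=torchvision.transforms.ToTensor(),
)
loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)

# Plot the first 5 images of the first batch with their labels.
images, labels = next(iter(loader))
fig, axes = plt.subplots(1, 5)
for i, ax in enumerate(axes):
    ax.imshow(images[i].squeeze(0).numpy(), cmap="gray")  # (1, 28, 28) -> (28, 28)
    ax.set_title(int(labels[i]))
    ax.axis("off")
plt.show()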
Hi, I am trying to extract only the handwritten data from an image. For that, I took an empty form and a filled one, and I am using ImageChops.difference to get the handwriting out of it.
The problem right now is the alignment of the images: the two are not equally aligned, so the results are not correct.
from PIL import Image, ImageChops

def compare_images(path_one, path_two, diff_save_location):
    """
    Compares two images and saves a diff image if there
    is a difference.

    :param path_one: the path to the first image
    :param path_two: the path to the second image
    """
    image_one = Image.open(path_one).convert('LA')
    image_two = Image.open(path_two).convert('LA')
    diff = ImageChops.difference(image_one, image_two)
    if diff.getbbox():
        diff.convert('RGB').save(diff_save_location)

if __name__ == '__main__':
    compare_images('images/blank.jpg',
                   'images/filled.jpg',
                   'images/diff.jpg')
This is the result I got.
And this is the result I am looking for:
Can anyone help me with this?
Thanks.
This site may be helpful: https://www.learnopencv.com/image-alignment-feature-based-using-opencv-c-python/. The main idea is to first detect keypoints in both images using SIFT, SURF, or another algorithm; then match the keypoints from the empty image with the keypoints from the handwritten image to get a homography matrix; then use this matrix to align the two images.
After image alignment, post processing may be needed due to illumination or noise.
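A sketch of that pipeline in OpenCV using ORB keypoints (the function name and the max_features/keep_ratio values are illustrative, not from the linked article):

import cv2
import numpy as np

def align_images(filled_path, blank_path, max_features=500, keep_ratio=0.15):
    """Warp the filled form onto the blank template via a homography."""
    filled = cv2.imread(filled_path, cv2.IMREAD_GRAYSCALE)
    blank = cv2.imread(blank_path, cv2.IMREAD_GRAYSCALE)

    # Detect keypoints and compute binary descriptors in both images.
    orb = cv2.ORB_create(max_features)
    kp1, des1 = orb.detectAndCompute(filled, None)
    kp2, des2 = orb.detectAndCompute(blank, None)

    # Match descriptors and keep only the strongest matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    matches = matches[: int(len(matches) * keep_ratio)]

    # Estimate the homography with RANSAC and warp the filled image
    # into the blank template's frame.
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(pts1, pts2, cv2.RANSAC)
    return cv2.warpPerspective(filled, H, (blank.shape[1], blank.shape[0]))

After this, ImageChops.difference (or cv2.absdiff) on the aligned pair should give a much cleaner handwriting mask.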