Python - Show one image from an image array - python-3.x

I'm going through this PyTorch tutorial: https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html
I've been able to show real images next to the fake ones that I have generated:
# Grab a batch of real images from the dataloader
real_batch = next(iter(dataloader))
# Plot the real images
plt.figure(figsize=(15,15))
plt.subplot(1,2,1)
plt.axis("off")
plt.title("Real Images")
plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=5, normalize=True).cpu(),(1,2,0)))
# Plot the fake images from the last epoch
plt.subplot(1,2,2)
plt.axis("off")
plt.title("Fake Images")
plt.imshow(np.transpose(img_list[-1],(1,2,0)))
plt.show()
For my dataset, this results in a side-by-side grid of real and fake images.
I was wondering how I can show one image from the fake images generated. I also want to show it as a 512 x 512 image if possible.
Edit:
The img_list[-1].shape is torch.Size([3, 530, 530]): make_grid has tiled the 64 fakes into an 8 x 8 grid, so each side is 8 x 64 px of images plus 9 x 2 px of padding = 530.
Edit2:
This part of the training shows that img_list is a list of image grids, with each entry a tiling of 64 sub-images that I can't separate. Is there a way I can edit this so that img_list holds each generated fake image individually?
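For reference, the tutorial builds img_list inside its training loop with img_list.append(vutils.make_grid(fake, padding=2, normalize=True)), and that make_grid call is what fuses the batch into one tiled image. A minimal sketch of keeping the batch un-tiled instead, reusing the tutorial's netG, fixed_noise, and img_list:
with torch.no_grad():
    fake = netG(fixed_noise).detach().cpu()
img_list.append(fake)  # img_list[-1][i] is now the i-th 3 x 64 x 64 fake

# Show one of them; the generator's tanh output lies in [-1, 1],
# so shift it into [0, 1] before imshow
plt.axis("off")
plt.imshow(np.transpose((img_list[-1][0] + 1) / 2, (1, 2, 0)))
plt.show()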

Here is what I wanted:
noise = torch.randn(1, nz, 1, 1, device=device)
with torch.no_grad():
    newfake = netG(noise).detach().cpu()
plt.axis("off")
plt.imshow(np.transpose(newfake[0],(1,2,0)))
plt.show()
This generates a new image from new noise, whereas img_list was combining the generated images into one grid.
However, this code still only generates 64 x 64 pixel images.
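The 64 x 64 output size is fixed by the DCGAN generator's architecture, so native 512 x 512 output would require retraining a larger generator. For display only, a minimal sketch that upsamples the generated batch to 512 x 512 (the bilinear mode is my choice, not the tutorial's):
import torch.nn.functional as F

noise = torch.randn(1, nz, 1, 1, device=device)
with torch.no_grad():
    newfake = netG(noise).detach().cpu()
# Upsample the 1 x 3 x 64 x 64 batch to 1 x 3 x 512 x 512
big = F.interpolate(newfake, size=(512, 512), mode='bilinear', align_corners=False)
plt.axis("off")
plt.imshow(np.transpose((big[0] + 1) / 2, (1, 2, 0)))  # shift [-1, 1] into [0, 1]
plt.show()
Note that upsampling only enlarges the 64 x 64 content; it does not add detail.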

Related

My ImageDataGenerator is converting my images to grayscale (from RGB)

I am currently using Facenet to build a facial detection and recognition application. The first part takes images from the webcam and detects the person's face using the MTCNN model, then stores the images in a folder. I then decided to use ImageDataGenerator on that folder to create more images for the dataset, but the datagen gives the resulting images in grayscale format.
Here's the code for it:
datagen = ImageDataGenerator(rotation_range=40,
                             width_shift_range=0.2,
                             height_shift_range=0.2,
                             shear_range=0.2,
                             zoom_range=0.2,
                             horizontal_flip=True,
                             fill_mode='nearest',
                             rescale=False)
This is the flow loop:
for train_img in train_images:
    img = image.img_to_array(train_img)  # convert image to numpy array
    img = img.reshape((1,) + img.shape)  # add a batch dimension
    i = 0
    datagen.fit(img)
    for batch in datagen.flow(img, save_format='png', save_to_dir=train_path):
        # this loop runs forever until we break, saving images to train_path
        i += 1
        if i > 10:  # make 10 augmentations of every image
            break
Please help.
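A minimal diagnostic sketch, reusing train_images and the image module from the snippet above: datagen.flow preserves the channel count of its input, so if the arrays going in are already single-channel, the saved augmentations will be grayscale too.
for train_img in train_images:
    arr = image.img_to_array(train_img)
    # (h, w, 3) expected; (h, w, 1) means the input is already grayscale
    print(arr.shape)
    break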

Image size in DefaultPredictor of Detectron2

For object detection, I'm using detectron2.
I want to fix the input image size, so I made my customized dataloader:
def build_train_loader(cls, cfg):
    dataloader = build_detection_train_loader(cfg,
        mapper=DatasetMapper(cfg, is_train=True, augmentations=[
            T.Resize((1200, 1200))
        ]))
    return dataloader
What I wonder is: for prediction, can I use the DefaultPredictor of detectron2 and resize my images to (1200, 1200) as preprocessing before sending them to the predictor?
Or does the DefaultPredictor resize the image before prediction, meaning I have to override a function to resize to (1200, 1200)?
You have to preprocess the images yourself, or write your own predictor that applies the resize before calling the model.
The DefaultPredictor applies a ResizeShortestEdge transform (which can be configured in the config file), but this is not exactly what you want.
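A minimal sketch of such a predictor, mirroring the structure of detectron2's DefaultPredictor but with its ResizeShortestEdge swapped for the fixed T.Resize from the question (the class name is mine, and this is untested against your config):
import torch
import detectron2.data.transforms as T
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.modeling import build_model

class FixedResizePredictor:
    def __init__(self, cfg):
        self.cfg = cfg.clone()
        self.model = build_model(self.cfg)
        self.model.eval()
        DetectionCheckpointer(self.model).load(cfg.MODEL.WEIGHTS)
        self.aug = T.Resize((1200, 1200))  # fixed size instead of ResizeShortestEdge
        self.input_format = cfg.INPUT.FORMAT

    def __call__(self, original_image):
        with torch.no_grad():
            if self.input_format == "RGB":
                original_image = original_image[:, :, ::-1]  # the model expects BGR
            height, width = original_image.shape[:2]
            image = self.aug.get_transform(original_image).apply_image(original_image)
            image = torch.as_tensor(image.astype("float32").transpose(2, 0, 1))
            inputs = {"image": image, "height": height, "width": width}
            return self.model([inputs])[0]
As in DefaultPredictor, passing the original height and width means the outputs are rescaled back to the input image's coordinates.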

How to display more than 10 images in Tensorboard?

I noticed that it doesn't matter how many images I save to the tensorboard log file: tensorboard will only ever show 10 of them (per tag).
How can we increase the number of images, or at least select which ones are displayed?
To reproduce what I mean, run the following MCVE:
import torch
from torch.utils.tensorboard import SummaryWriter

tb = SummaryWriter(comment="test")
for k in range(100):
    # create an image with some funny pattern
    b = [n for (n, c) in enumerate(bin(k)) if c == '1']
    img = torch.zeros((1, 10, 10))
    img[0, b, :] = 0.5
    img = img + img.permute([0, 2, 1])
    # add the image to the tensorboard file
    tb.add_image(tag="test", img_tensor=img, global_step=k)
This creates a folder runs in which the data is saved. From the same folder execute tensorboard --logdir runs, open the browser and go to localhost:6006 (or replace 6006 with whatever port tensorboard happens to display after starting it). Then go to the tab called "images" and move the slider above the grayscale image.
In my case it only displayed the images from steps
k = 3, 20, 24, 32, 37, 49, 52, 53, 67, 78
which isn't a nice even spacing but looks pretty random. I'd prefer to:
see more than just 10 of the images I saved, and
have a more even spacing of steps between the images displayed.
How can I achieve this?
EDIT: I just found the option --samples_per_plugin and tried tensorboard --logdir runs --samples_per_plugin "images=100". This indeed increased the number of images, but it only showed the images from steps k = 0, 1, 2, 3, ..., 78, and none from above 78.
You probably have to wait a little longer for all the data to be loaded, but this is indeed the correct solution; see --help:
--samples_per_plugin: An optional comma separated list of plugin_name=num_samples pairs to explicitly specify how many samples to keep per tag for that plugin. For unspecified plugins, TensorBoard randomly downsamples logged summaries to reasonable values to prevent out-of-memory errors for long running jobs. This flag allows fine control over that downsampling. Note that 0 means keep all samples of that type. For instance, "scalars=500,images=0" keeps 500 scalars and all images. Most users should not need to set this flag. (default: '')
Regarding the random samples: this is also true; there is some randomness to it. From the FAQ:
Is my data being downsampled? Am I really seeing all the data? TensorBoard uses reservoir sampling to downsample your data so that it can be loaded into RAM. You can modify the number of elements it will keep per tag in tensorboard/backend/application.py.
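Combining both observations, a command that keeps every logged image (the images=0 semantics come straight from the help text above):
tensorboard --logdir runs --samples_per_plugin "images=0"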

How to differentiate Passport and PAN card Scanned images in python

The goal is to identify whether an input scanned image is a passport or a PAN card, using OpenCV.
I have used the structural_similarity (compare_ssim) method of skimage to compare the input scan with template images of a passport and a PAN card,
but in both cases I got a low score.
Here is the code that I have tried:
from skimage.measure import compare_ssim as ssim
import matplotlib.pyplot as plt
import numpy as np
import cv2

img1 = cv2.imread('PAN_Template.jpg', 0)
img2 = cv2.imread('PAN_Sample1.jpg', 0)

def prepare_img(im):
    size = 300, 200
    im = cv2.resize(im, size)
    return im

img1 = prepare_img(img1)
img2 = prepare_img(img2)

def compare_images(imageA, imageB):
    s = ssim(imageA, imageB)
    return s

score = compare_images(img1, img2)  # renamed so it doesn't shadow the ssim import
print(score)
Comparing the PAN card template with a passport, I got an SSIM score of 0.12,
and comparing the PAN card template with a PAN card, the score was 0.20.
Since both scores were very close, I was not able to distinguish between them through the code.
If anyone has any other solution or approach, please help.
Here is a sample image
PAN Scanned Image
You can also compare two images by their mean squared error (MSE).
import numpy as np

def mse(imageA, imageB):
    # the 'Mean Squared Error' between the two images is the
    # sum of the squared differences between them;
    # NOTE: the two images must have the same dimensions
    err = np.sum((imageA.astype("float") - imageB.astype("float")) ** 2)
    err /= float(imageA.shape[0] * imageA.shape[1])
    # return the MSE; the lower the error, the more "similar"
    # the two images are
    return err
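A hedged usage sketch, reusing cv2 and prepare_img from the SSIM snippet above; Passport_Template.jpg is a hypothetical file, and the sample is classified by whichever template gives the lower error:
pan_template = prepare_img(cv2.imread('PAN_Template.jpg', 0))
passport_template = prepare_img(cv2.imread('Passport_Template.jpg', 0))  # hypothetical file
sample = prepare_img(cv2.imread('PAN_Sample1.jpg', 0))
print('PAN' if mse(pan_template, sample) < mse(passport_template, sample) else 'Passport')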
As per my understanding, PAN cards and passports contain different text, so I believe OCR can solve this problem.
All you need to do is extract the text from the images using an OCR library like Tesseract and look for a few predefined keywords in it to differentiate the images.
Here is a simple Python script showing the image pre-processing and OCR using the pytesseract module:
import cv2
import pytesseract
from PIL import Image

img = cv2.imread("D:/pan.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, th1 = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
cv2.imwrite('filterImg.png', th1)
pilImg = Image.open('filterImg.png')  # same case as the file written above
text = pytesseract.image_to_string(pilImg)
print(text.encode("utf-8"))
Below is the binary image used for OCR:
I got the below string data after doing the OCR on the above image:
esraax fram EP aca ae
~ INCOME TAX DEPARTMENT Ld GOVT. OF INDIA
wrtterterad sg
Permanent Account Number. Card \xe2\x80\x98yf
KFWPS6061C
PEF vom ; ae
Reviavs /Father's Name. e.
SUDHIR SINGH : . ,
Though this text contains noise, I believe it is more than enough to get the job done.
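A small follow-up sketch of the keyword lookup this answer suggests, matched against strings that survived the noise above (the keyword list is my own choice):
upper = text.upper()
if 'INCOME TAX' in upper or 'PERMANENT ACCOUNT NUMBER' in upper:
    print('PAN card')
elif 'PASSPORT' in upper:
    print('Passport')
else:
    print('Unknown document')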
Another OCR solution is to use the TextCleaner ImageMagick script from Fred's Scripts. A tutorial explaining how to install and use it (on Windows) is available here.
Script used:
C:/cygwin64/bin/textcleaner -g -e normalize -f 20 -o 20 -s 20 C:/Users/Link/Desktop/id.png C:/Users/Link/Desktop/out.png
Result:
I applied OCR to this with Tesseract (I am using version 4), and this is the result:
fart
INCOME TAX DEPARTMENT : GOVT. OF INDIA
wort cra teat ears -
Permanent Account Number Card
KFWPS6061C
TT aa
MAYANK SUDHIR SINGH el
far aT ary /Father's Name
SUDHIR SINGH
Wa RT /Date of Birth den. +
06/01/1997 genge / Signature
Code for OCR:
import cv2
from PIL import Image
import tesserocr as tr

number_ok = cv2.imread("C:\\Users\\Link\\Desktop\\id.png")
blur = cv2.medianBlur(number_ok, 1)
cv2.imshow('ocr', blur)
pil_img = Image.fromarray(cv2.cvtColor(blur, cv2.COLOR_BGR2RGB))
api = tr.PyTessBaseAPI()
try:
    api.SetImage(pil_img)
    boxes = api.GetComponentImages(tr.RIL.TEXTLINE, True)
    text = api.GetUTF8Text()
finally:
    api.End()
print(text)
cv2.waitKey(0)
Now, this doesn't answer your question (passport or PAN card), but it's a good point to start from.
Doing OCR might be a solution for this type of image classification, but it might fail on blurry or poorly exposed images, and it might be slower than newer deep learning methods.
You can use object detection (TensorFlow or any other library) to train on two separate classes of image, i.e. PAN and passport. For fine-tuning pre-trained models you don't need much data either. And as per my understanding, PAN cards and passports have different background colors, so I guess it will be really accurate.
Tensorflow Object Detection: Link
Nowadays OpenCV also supports object detection without installing any new libraries (i.e. TensorFlow, Caffe, etc.). You can refer to this article for YOLO-based object detection in OpenCV.
We can use:
Histogram Comparison - the simplest and fastest method; it gives the similarity between two images' histograms (see the sketch below).
Template Matching - searching for and finding the location of a template image; with this we can find smaller image parts in a bigger one (like some common patterns on a PAN card).
Feature Matching - features extracted from one image are recognised in another image, even if that image is rotated or skewed.
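A minimal sketch of the histogram comparison, using OpenCV's calcHist/compareHist on hue-saturation histograms; the template and scan file names are hypothetical, and the bin counts are my choice:
import cv2

def histogram_similarity(path_a, path_b):
    hsv_a = cv2.cvtColor(cv2.imread(path_a), cv2.COLOR_BGR2HSV)
    hsv_b = cv2.cvtColor(cv2.imread(path_b), cv2.COLOR_BGR2HSV)
    # 2-D histogram over hue and saturation
    hist_a = cv2.calcHist([hsv_a], [0, 1], None, [50, 60], [0, 180, 0, 256])
    hist_b = cv2.calcHist([hsv_b], [0, 1], None, [50, 60], [0, 180, 0, 256])
    cv2.normalize(hist_a, hist_a)
    cv2.normalize(hist_b, hist_b)
    # correlation: 1.0 means identical histograms
    return cv2.compareHist(hist_a, hist_b, cv2.HISTCMP_CORREL)

pan_score = histogram_similarity('PAN_Template.jpg', 'scan.jpg')
passport_score = histogram_similarity('Passport_Template.jpg', 'scan.jpg')
print('PAN' if pan_score > passport_score else 'Passport')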

Align the Images properly

Hi, I am trying to extract only the handwritten data from an image. For that I took an empty form and a filled one, and I am using ImageChops.difference to get the handwriting out.
The problem right now is the alignment of the images: the two scans are not equally aligned, so the results are not correct.
from PIL import Image, ImageChops

def compare_images(path_one, path_two, diff_save_location):
    """
    Compares two images and saves a diff image if there
    is a difference.
    :param path_one: the path to the first image
    :param path_two: the path to the second image
    """
    image_one = Image.open(path_one).convert('LA')
    image_two = Image.open(path_two).convert('LA')
    diff = ImageChops.difference(image_one, image_two)
    if diff.getbbox():
        diff.convert('RGB').save(diff_save_location)

if __name__ == '__main__':
    compare_images('images/blank.jpg',
                   'images/filled.jpg',
                   'images/diff.jpg')
This is the result I got.
And this is the result I am looking for:
Can anyone help me with this? Thanks.
This site may be helpful: https://www.learnopencv.com/image-alignment-feature-based-using-opencv-c-python/. The main idea is to first detect keypoints with SIFT, SURF, or another algorithm in both images; then match the keypoints from the empty image with those from the handwritten image to get a homography matrix; and finally use this matrix to align the two images.
After image alignment, post-processing may be needed to deal with illumination differences or noise.
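A minimal sketch of that feature-based alignment with ORB (the detector the linked article uses); the match-keeping fraction is my choice, and at least four good matches are needed for the homography:
import cv2
import numpy as np

def align_images(im_to_align, im_reference, max_features=500, keep_fraction=0.15):
    gray1 = cv2.cvtColor(im_to_align, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(im_reference, cv2.COLOR_BGR2GRAY)
    # detect ORB keypoints and descriptors in both images
    orb = cv2.ORB_create(max_features)
    kps1, desc1 = orb.detectAndCompute(gray1, None)
    kps2, desc2 = orb.detectAndCompute(gray2, None)
    # match descriptors and keep only the strongest matches
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(desc1, desc2), key=lambda m: m.distance)
    matches = matches[:max(4, int(len(matches) * keep_fraction))]
    # estimate a homography from the matched point pairs
    pts1 = np.float32([kps1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kps2[m.trainIdx].pt for m in matches])
    H, _ = cv2.findHomography(pts1, pts2, cv2.RANSAC)
    # warp the filled form into the blank form's coordinate frame
    h, w = im_reference.shape[:2]
    return cv2.warpPerspective(im_to_align, H, (w, h))

filled = cv2.imread('images/filled.jpg')
blank = cv2.imread('images/blank.jpg')
cv2.imwrite('images/aligned.jpg', align_images(filled, blank))
After alignment, the ImageChops.difference (or cv2.absdiff) step from the question should isolate mostly the handwriting.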
