I tried to apply a Gaussian filter to 6 images to denoise them using the following code:
import os
import matplotlib.image as img

def load_data(dir_name='C:/Users/ASUS/Desktop/Self_Learning/Coursera/Deep Learning in Computer Vision/plates'):
    im_list = []
    for f in os.listdir(dir_name):
        fpath = os.path.join(dir_name, f)  # path of each file in the directory
        im = img.imread(fpath)
        im_list.append(im)  # collect every image that was read
    return im_list

plates = load_data()
import matplotlib.pyplot as plt

# The auxiliary function `visualize()` displays the images given as argument.
def visualize(imgs, format=None):
    plt.figure(figsize=(20, 40))
    for i, img in enumerate(imgs):
        # Convert channel-first (3, H, W) images to channel-last (H, W, 3)
        if img.shape[0] == 3:
            img = img.transpose(1, 2, 0)
        plt_idx = i + 1
        plt.subplot(3, 3, plt_idx)
        plt.imshow(img, cmap=format)
    plt.show()

visualize(plates, 'gray')
from scipy import ndimage

def noise_reduction(img):
    denoised_list = []
    for i in img:
        gauss_filtered = ndimage.gaussian_filter(i, sigma=1.4, truncate=2.0)
        denoised_list.append(gauss_filtered)  # keep each filtered image
    return denoised_list

denoised_img = noise_reduction(plates)
visualize(denoised_img, 'gray')
plates is the list that holds my images, and visualize is a function that displays them.
The result should have been 6 denoised grayscale images. However, I got bluish ones.
Here are my original images (plates):
This is the result after applying the Gaussian filter:
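One thing that may be worth checking, assuming the plates are loaded as colour arrays of shape (height, width, 3): with a scalar sigma, ndimage.gaussian_filter smooths along every axis, including the channel axis, so the colour channels get blended with each other. The sketch below uses a made-up random array called image (not the real plates) and restricts the smoothing to the two spatial axes by passing one sigma per axis.

```python
import numpy as np
from scipy import ndimage

# Synthetic RGB image (hypothetical stand-in for one plate)
rng = np.random.default_rng(0)
image = rng.random((32, 48, 3))

# Scalar sigma: smooths along height, width AND the channel axis
blurred_all_axes = ndimage.gaussian_filter(image, sigma=1.4, truncate=2.0)

# Per-axis sigma: 0 for the last axis keeps the channels independent
blurred_spatial = ndimage.gaussian_filter(image, sigma=(1.4, 1.4, 0), truncate=2.0)

# Reference: filter the first channel on its own
first_channel_only = ndimage.gaussian_filter(image[..., 0], sigma=1.4, truncate=2.0)

# Compare both variants against the single-channel reference
print(np.allclose(blurred_spatial[..., 0], first_channel_only))
print(np.allclose(blurred_all_axes[..., 0], first_channel_only))
```

Whether this explains the tint depends on how the images were actually loaded. Converting each image to a single-channel grayscale array before filtering is another option.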


How to create image sequence?

My problem is extracting the text from a multi-column PDF.
Common libraries like PyPDF2 didn't work.
I wrote the code below to try reading it with Pytesseract, but that was also unsuccessful because it mixes the two columns.
My idea now, using this code as a base, is to cut out columns 1 and 2 and generate a new image by pasting column 1 first and column 2 below it, so that I can read it with Pytesseract or AWS Textract without problems.
How could I do this with OpenCV?
import fitz
import cv2
import pytesseract
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

SCANNED_FILE = "decreto_santos.pdf"
img = cv2.imread(SCANNED_FILE)
zoom_x = 2.0
zoom_y = 2.0
mat = fitz.Matrix(zoom_x, zoom_y)

# I create an image for each page of the PDF and save it.
doc = fitz.open(SCANNED_FILE)
print("Generated pages: ")
for page in doc:
    pix = page.get_pixmap(matrix=mat)
    png = 'output/' + SCANNED_FILE.split('/')[-1].split('.')[0] + 'page-%i.png' % page.number
    pix.save(png)
    print(png)
# Load an image to crop
original_image = cv2.imread('output/decreto_santospage-1.png')

# Grayscale image
gray_image = cv2.cvtColor(original_image, cv2.COLOR_BGR2GRAY)
plt.figure(figsize=(25, 15))
plt.imshow(gray_image, cmap='gray')
plt.show()

# Otsu thresholding
ret, threshold_image = cv2.threshold(gray_image, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY_INV)
plt.figure(figsize=(25, 15))
plt.imshow(threshold_image, cmap='gray')
plt.show()

rectangular_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))

# Applying dilation on the threshold image
dilated_image = cv2.dilate(threshold_image, rectangular_kernel, iterations=1)
plt.figure(figsize=(25, 15))
plt.imshow(dilated_image, cmap='gray')
plt.show()
# Finding contours
contours, hierarchy = cv2.findContours(dilated_image, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

# Creating a copy of the image
copied_image = original_image.copy()

# Create (or empty) the output text file
with open("output/recognized-kernel-66-66.txt", "w+") as f:
    f.write("")

mask = np.zeros(original_image.shape, np.uint8)

# Looping through the identified contours
# Then the rectangular part is cropped and passed on to pytesseract
# pytesseract extracts the text inside each contour
# Extracted text is then written into a text file
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    # Cropping the text block for giving input to OCR
    cropped = copied_image[y:y + h, x:x + w]
    with open("output/recognized-kernel-66-66.txt", "a") as f:
        # Apply OCR on the cropped image
        text = pytesseract.image_to_string(cropped, lang='por', config='--oem 1 --psm 1')
        f.write(text)
    masked = cv2.drawContours(mask, [cnt], 0, (255, 255, 255), -1)

plt.figure(figsize=(25, 15))
plt.imshow(masked, cmap='gray')
plt.show()
My base for this code was this post
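The cut-and-paste idea described in the question can be sketched with plain array slicing, because an image loaded with cv2.imread is a NumPy array. The function name stack_columns and its split_x parameter are hypothetical, and the sketch assumes the gutter between the two columns sits at the horizontal midpoint of the page unless told otherwise. A real page may need the split position detected first.

```python
import numpy as np

def stack_columns(page, split_x=None):
    """Cut a two-column page at split_x and paste column 2 below column 1."""
    height, width = page.shape[:2]
    if split_x is None:
        split_x = width // 2  # assumption: the gutter is at the page midpoint
    left = page[:, :split_x]
    right = page[:, split_x:]
    # Pad the narrower column with white so both columns have the same width
    target_width = max(left.shape[1], right.shape[1])

    def pad(column):
        missing = target_width - column.shape[1]
        pad_spec = [(0, 0), (0, missing)] + [(0, 0)] * (column.ndim - 2)
        return np.pad(column, pad_spec, constant_values=255)

    return np.vstack([pad(left), pad(right)])

# Synthetic page (hypothetical): dark left column, light right column
page = np.zeros((100, 61, 3), dtype=np.uint8)
page[:, 30:] = 200
stacked = stack_columns(page)
print(stacked.shape)
```

The stacked array can then be passed to pytesseract.image_to_string like any other image. cv2.vconcat would do the same job as np.vstack once both columns have equal width.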

How to increase the image size extracted from visualizing the feature map?

Can someone help me increase the size of the images extracted from a feature map? I recently ran a CNN on a set of images and would like to see the extracted features. I managed to extract them, but I cannot actually see them because the images are too small.
My code:
from matplotlib import pyplot

# summarize feature map shapes
for i in range(len(cnn.layers)):
    layer = cnn.layers[i]
    # check for conv layer
    if 'conv' not in layer.name:
        continue
    print(i, layer.name, layer.output.shape)

from keras import models
from keras.preprocessing import image

model_new = models.Model(inputs=cnn.inputs, outputs=cnn.layers[1].output)
img_path = 'train/1/2NbeGPsQf2Q - 4 0.jpg'
img = image.load_img(img_path, target_size=(img_rows, img_cols))

import numpy as np
from keras.applications.imagenet_utils import decode_predictions, preprocess_input

img = image.img_to_array(img)
img = np.expand_dims(img, axis=0)
img = preprocess_input(img)
features = model_new.predict(img)

square = 10
ix = 1
for _ in range(square):
    for _ in range(square):
        # specify subplot and turn off axis
        ax = pyplot.subplot(square, square, ix)
        ax.set_xticks([])
        ax.set_yticks([])
        # plot filter channel in colour
        pyplot.imshow(features[0, :, :, ix-1], cmap='viridis')
        ix += 1
# show the figure
pyplot.show()
The result is attached (output of feature map, layer 1).
It's too small. How can I make it bigger so I can see what is actually there?
I'd appreciate any input. Thanks!
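One common way to enlarge such a grid is to create the figure explicitly with a bigger figsize (given in inches) before adding the subplots. The sketch below is a minimal illustration under that assumption. The helper plot_feature_maps and the random features array are hypothetical stand-ins for the output of model_new.predict(img), and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
from matplotlib import pyplot
import numpy as np

def plot_feature_maps(features, square, figsize=(20, 20)):
    """Plot the first square*square channels of a (1, H, W, C) feature array."""
    # A larger figsize gives every subplot more room
    fig = pyplot.figure(figsize=figsize)
    for ix in range(square * square):
        ax = fig.add_subplot(square, square, ix + 1)
        ax.set_xticks([])
        ax.set_yticks([])
        ax.imshow(features[0, :, :, ix], cmap='viridis')
    return fig

# Hypothetical stand-in for model_new.predict(img)
features = np.random.rand(1, 26, 26, 16)
fig = plot_feature_maps(features, square=4)
print(fig.get_size_inches())
```

Plotting fewer channels per figure (a smaller square) also makes each map larger, and fig.savefig with a higher dpi gives a file that can be zoomed.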

Hyperparameter optimization in pytorch (currently with sklearn GridSearchCV)

I am following this PyTorch tutorial and would like to add grid search to it with sklearn.model_selection.GridSearchCV in order to optimize the hyperparameters. I am struggling to understand what X and y in fit(X, y) should be. Per the documentation, X and y are supposed to have the following structure, but I have trouble figuring out how to get them from the code. The PennFudanDataset class returns img and target in a form that does not align with the X and y I need.
Are n_samples, n_features within the following block of code or in the tutorial’s block regarding the model?
fit(X, y=None, *, groups=None, **fit_params)
    Run fit with all sets of parameters.

    X : array-like of shape (n_samples, n_features)
        Training vector, where n_samples is the number of samples and n_features is the number of features.
    y : array-like of shape (n_samples, n_output) or (n_samples,), default=None
        Target relative to X for classification or regression; None for unsupervised learning.
Is there something else we could use instead that is easier to implement for this particular tutorial? I've read about Ray Tune, Optuna, etc., but they seem more complex. I am currently also looking into scipy.optimize.brute, which seems simpler.
PennFudanDataset class:
import os
import numpy as np
import torch
from PIL import Image

class PennFudanDataset(object):
    def __init__(self, root, transforms):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))

    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
        mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)
        # convert the PIL Image into a numpy array
        mask = np.array(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]
        # split the color-encoded mask into a set
        # of binary masks
        masks = mask == obj_ids[:, None, None]
        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])
        # convert everything into a torch.Tensor
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)
        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)
        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd
        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target

    def __len__(self):
        return len(self.imgs)
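Regarding the request for something simpler than GridSearchCV: because the dataset yields (img, target) pairs rather than array-like X and y, one option is a plain loop over hyperparameter combinations that calls your own training routine. The sketch below is only an illustration of that idea. The function train_and_evaluate is a hypothetical user-supplied callable (it would train the tutorial's model and return a validation score, higher being better), and the toy objective and parameter names lr and momentum are assumptions made for the example.

```python
from itertools import product

def grid_search(train_and_evaluate, param_grid):
    """Try every combination in param_grid and return the best one.

    train_and_evaluate is a hypothetical user-supplied function that trains
    the model with the given hyperparameters and returns a score
    (higher is better).
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy objective standing in for "train the detector and return a validation score"
def toy_objective(lr, momentum):
    return -(lr - 0.005) ** 2 - (momentum - 0.9) ** 2

param_grid = {"lr": [0.001, 0.005, 0.01], "momentum": [0.8, 0.9]}
best_params, best_score = grid_search(toy_objective, param_grid)
print(best_params, best_score)
```

This evaluates every combination, so the cost grows with the product of the list lengths. It keeps the tutorial's data loading untouched, which is the main reason to prefer it here over reshaping the data into X and y.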

How do I extract all pixel values from a certain ROI, store them as a CSV file, and then use that CSV file to get the ROI back?

[Images: result after applying the mask; original image]
import cv2
import dlib
import numpy as np

img = cv2.imread("Aayush.jpg")
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
msk = np.zeros_like(img_gray)
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
faces = detector(img_gray)
for face in faces:
    landmarks = predictor(img_gray, face)
    lp = []
    for n in range(0, 68):
        x = landmarks.part(n).x
        y = landmarks.part(n).y
        lp.append((x, y))  # collect the landmark coordinates
        cv2.circle(img, (x, y), 3, (0, 0, 255), -1)
    p = np.array(lp, np.int32)
    convexhull = cv2.convexHull(p)
    #cv2.polylines(img, [convexhull], True, (255,0,0), 3)
    cv2.fillConvexPoly(msk, convexhull, 255)
img1 = cv2.bitwise_and(img, img, mask=msk)
img1 contains a completely black image with the face cut out from img. I only need the pixel values of the face region, not the complete image.
As the original image and mask were not provided in the question itself, I am assuming a simple input image and a mask image with a circular cavity:
The mask here is a single-channel matrix with a value of 255 in the central cavity. To get the pixel info inside the cavity only, you can use the following NumPy operation:
pixel_info = original_image[mask == 255]
# You may need to convert the numpy array to Python list.
pixel_info_list = pixel_info.tolist()
Now you may serialize the list to any format you want (CSV in this case).
Full code:
import cv2
import numpy as np
original_image = cv2.imread("/path/to/lena.png")
mask = np.zeros(original_image.shape[:2], dtype=original_image.dtype)
mask = cv2.circle(mask, (256, 256), 100, [255], -1)
pixel_info = original_image[mask == 255]
pixel_info_list = pixel_info.tolist()
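To rebuild the ROI from the CSV later, the pixel values alone are not enough, since boolean indexing discards where each pixel came from. A minimal sketch of one way to handle this, assuming it is acceptable to store one row per pixel with its row and column index: the synthetic image, the circular mask, the file name roi_pixels.csv and the column header are all made up for illustration, and the image shape is assumed to be known when reading the file back.

```python
import os
import tempfile
import numpy as np

# Synthetic 3-channel image and circular mask (hypothetical stand-ins)
original_image = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
yy, xx = np.ogrid[:64, :64]
mask = np.where((yy - 32) ** 2 + (xx - 32) ** 2 <= 20 ** 2, 255, 0).astype(np.uint8)

# One CSV row per ROI pixel: row, col, then the three channel values
rows, cols = np.nonzero(mask == 255)
table = np.column_stack([rows, cols, original_image[rows, cols]])
csv_path = os.path.join(tempfile.gettempdir(), "roi_pixels.csv")
np.savetxt(csv_path, table, fmt="%d", delimiter=",",
           header="row,col,b,g,r", comments="")

# Read the CSV back and paint the pixels onto a black canvas
loaded = np.loadtxt(csv_path, delimiter=",", skiprows=1, dtype=np.int64)
restored = np.zeros_like(original_image)
restored[loaded[:, 0], loaded[:, 1]] = loaded[:, 2:].astype(np.uint8)
print(restored.shape, len(loaded))
```

The restored array is black everywhere except the ROI, which matches what cv2.bitwise_and with the mask would produce for the same image.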

Find image in scaled Image

I have to find an image in a stream of the desktop. My code works, but if the image is resized during the stream, the program stops working. How can I solve this problem?
from PIL import ImageGrab
import numpy as np
import cv2

template = cv2.imread('piccola.png')  # image to find
h, w = template.shape[:-1]  # shape is (rows, cols, channels)
while True:
    img = ImageGrab.grab(bbox=(0, 0, 800, 600))  # bbox = (left, top, right, bottom)
    # PIL returns RGB; convert to BGR to match the template loaded by cv2.imread
    img_np = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
    #frame = cv2.cvtColor(img_np, cv2.COLOR_BGR2GRAY)
    res = cv2.matchTemplate(img_np, template, cv2.TM_CCOEFF_NORMED)
    threshold = .85
    loc = np.where(res >= threshold)
    for pt in zip(*loc[::-1]):  # Switch columns and rows
        cv2.rectangle(img_np, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 2)
    cv2.imshow("output", img_np)
    if cv2.waitKey(25) & 0xFF == ord('q'):
        cv2.destroyAllWindows()
        break
Instead of using cv2.matchTemplate, you can extract features such as SIFT/ORB/KAZE/BRISK from your template image and match them against the same features extracted from the grabbed image. You can set a threshold for the matching criteria.
You can read more about feature description and matching here -
Sample code for your understanding:
import cv2
import numpy as np
img1 = cv2.imread("template.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)
# ORB Detector
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
# Brute Force Matching
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)
matches = sorted(matches, key = lambda x:x.distance)
#drawing the matches
matching_result = cv2.drawMatches(img1, kp1, img2, kp2, matches[:50], None, flags=2)
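You can filter out weak matches, for example by keeping only matches whose distance is below a threshold, or by applying a ratio test (0.7 is a usual threshold) to the two nearest neighbours returned by knnMatch, and then check the percentage of matches that remain. Based on that you can decide how well it is finding similar images.
SIFT was patented, but the patent expired in 2020 and it has been part of the main OpenCV module since version 4.4. It performs well.
ORB is the fastest, but it is only approximately scale invariant (it relies on an image pyramid).
You can also try methods like KAZE and AKAZE.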
