I have the following code in Keras:
from os import listdir
from numpy import asarray
from keras.preprocessing.image import img_to_array, load_img
# load all images in a directory into memory
def load_images(path, size=(256,512)):
    src_list, tar_list = list(), list()
    # enumerate filenames in directory, assume all are images
    for filename in listdir(path):
        # load and resize the image
        pixels = load_img(path + filename, target_size=size)
        # convert to numpy array
        pixels = img_to_array(pixels)
        # split into satellite and map
        sat_img, map_img = pixels[:, :256], pixels[:, 256:]
        src_list.append(sat_img)
        tar_list.append(map_img)
    return [asarray(src_list), asarray(tar_list)]
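I would like to convert it to PyTorch, but I don't know much about it. Any suggestions?
I don't think you have to change anything but the very last line. Note that torch.stack expects tensors, so each NumPy array has to be converted first:
    return [torch.stack([torch.from_numpy(a) for a in src_list]), torch.stack([torch.from_numpy(a) for a in tar_list])]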
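As a rough illustration of what the loading loop produces, the sketch below uses random arrays in place of loaded images (an assumption, so that it runs without Keras or any image files), splits each one into a left and a right half, and stacks the halves into two batches. The helper name split_pair is made up for this example. In PyTorch, the stacked arrays could then be wrapped with torch.from_numpy.

```python
import numpy as np

def split_pair(pixels, split_col=256):
    """Split a side-by-side image (H, W, C) into its left and right halves."""
    return pixels[:, :split_col], pixels[:, split_col:]

# Stand-in for three loaded 256x512 RGB images (random data, not real images).
rng = np.random.default_rng(0)
fake_images = [rng.random((256, 512, 3), dtype=np.float32) for _ in range(3)]

src_list, tar_list = [], []
for pixels in fake_images:
    sat_img, map_img = split_pair(pixels)
    src_list.append(sat_img)
    tar_list.append(map_img)

# Stack the per-image arrays into batches of shape (N, H, W, C).
src = np.stack(src_list)
tar = np.stack(tar_list)
print(src.shape, tar.shape)
```

Both batches keep the channels-last layout that Keras uses; PyTorch models usually expect channels first, so a transpose may be needed depending on the model.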
I would like to read all images found in a PDF file with PyMuPDF as OpenCV images, as close to the source as possible (avoiding funky format conversions that would lead to precision loss). Basically, I would like the result to be exactly the same as if I were doing a cv2.imread(filename), in terms of the type it outputs, the color space, etc.
# Libraries
import os
import cv2
import fitz
import numpy as np
# Input file
filename = "myfile.pdf"
# Read all images in file as a list of opencv images
def read_images(filename):
    images = []
    _, extension = os.path.splitext(filename)
    # If it's a pdf process each image
    if extension == ".pdf":
        pdf = fitz.open(filename)
        for index in range(len(pdf)):
            page = pdf[index]
            for im in page.getImageList():
                xref = im[0]
                pix = fitz.Pixmap(pdf, xref)
                images.append(pix_to_opencv_image(pix)) # DO SOMETHING HERE
    # Otherwise just do an imread
    else:
        images.append(cv2.imread(filename))
    return images
Basically I would like to know what the function pix_to_opencv_image should be:
# Equivalent of doing a "cv2.imread" on a pdf pixmap:
def pix_to_opencv_image(pix):
    # DO SOMETHING HERE
I found examples explaining how to convert PDF pixmaps to NumPy arrays, but nothing that outputs an OpenCV image.
How can I achieve this?
I used the help() function to find the data descriptors associated with the pixmap: help(pix)
pix.samples stores the image information as bytes. Using NumPy's frombuffer, the image array can be obtained from these bytes after reshaping accordingly.
pix.height and pix.width give the height and width of the image array respectively, and pix.n is the number of channels. These can be used for reshaping the resulting array.
Your complete function would be:
def pix_to_opencv_image(pix):
    data = np.frombuffer(pix.samples, dtype=np.uint8)
    img = data.reshape(pix.height, pix.width, pix.n)
    return img
You can display the result using cv2.imshow().
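One caveat: cv2.imread returns colour images in BGR order, while an RGB pixmap stores its samples in RGB order, so the reshaped array may show swapped colours. The sketch below reverses the channel axis to compensate. It uses a SimpleNamespace as a stand-in for a real pixmap (an assumption, so that it runs without PyMuPDF), the name pix_to_bgr is made up for this example, and it only handles grayscale and plain RGB data; pixmaps with alpha or a CMYK colorspace would need extra handling.

```python
from types import SimpleNamespace

import numpy as np

def pix_to_bgr(pix):
    """Convert a pixmap-like object (samples, height, width, n) to an OpenCV-style array."""
    data = np.frombuffer(pix.samples, dtype=np.uint8)
    img = data.reshape(pix.height, pix.width, pix.n)
    if pix.n == 1:
        # Grayscale: drop the trailing channel axis.
        return img[..., 0]
    if pix.n == 3:
        # Reverse the channel axis (RGB -> BGR) and make the result contiguous.
        return np.ascontiguousarray(img[..., ::-1])
    raise ValueError(f"unsupported channel count: {pix.n}")

# Stand-in pixmap: one row with a pure red pixel and a pure green pixel (RGB order).
fake_pix = SimpleNamespace(samples=bytes([255, 0, 0, 0, 255, 0]), height=1, width=2, n=3)
bgr = pix_to_bgr(fake_pix)
print(bgr.shape, bgr.dtype)
```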
I have several .jpeg images with different names that I want to load into a CNN in a Jupyter notebook to have them classified. The only way I found was:
test_image = image.load_img("name_of_picute.jpeg",target_size=(64,64))
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis=0)
result = cnn.predict(test_image)
All the other things found in the Keras API, like tf.keras.preprocessing.image_dataset_from_directory(), seem to only work on labeled data. Sadly I can't "simply" iterate over the names of the pictures as they are all named differently. Is there a way to predict all of them at once without naming every single picture?
Thanks for your help,
Nick
The tf.keras.preprocessing.image_dataset_from_directory solution can be adapted to return both the dataset and the image paths, as explained here -> https://stackoverflow.com/a/63725072/4994352
There are multiple ways. For larger data it is useful to use a tf.data.Dataset, as it can be tweaked for performance quite easily. I will give you the non-performance-optimized code. Replace <YOUR PATH INCL. REGEX> with a path pattern like ../input/pokemon-images-and-types/images/*/*.
import tensorflow as tf
AUTOTUNE = tf.data.experimental.AUTOTUNE
def load(file_path):
    img = tf.io.read_file(file_path)
    img = tf.image.decode_jpeg(img, channels=3)
    ... # do some preprocessing like resizing if necessary
    return img
list_ds = tf.data.Dataset.list_files(str('<YOUR PATH INCL. REGEX>'), shuffle=True) # Get all images from subfolders
train_dataset = list_ds.take(-1)
# Set `num_parallel_calls` so multiple images are loaded/processed in parallel.
train_dataset = train_dataset.map(load, num_parallel_calls=AUTOTUNE)
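If the goal is simply to avoid typing each file name, another option is to collect the paths from the folder first and loop over them. The sketch below does only that part, using the standard library; the helper name list_images and the demo file names are made up for illustration, and the demo runs on a throwaway directory rather than real pictures.

```python
import tempfile
from pathlib import Path

def list_images(folder, extensions=(".jpg", ".jpeg")):
    """Return sorted paths of the files in `folder` whose suffix is in `extensions`."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )

# Demo on a temporary directory with arbitrarily named (empty) files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("cat.jpeg", "IMG_0042.JPG", "notes.txt"):
        (Path(tmp) / name).touch()
    found = [p.name for p in list_images(tmp)]
    print(found)
```

Each returned path could then be passed to image.load_img as in the question, with the resulting arrays stacked into one batch before a single cnn.predict call.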
What I am doing here is encoding an image and then adding the encoding, together with the path of the original image, to a list in the database variable like this:
database.append([path, encoding])
I then want to save this database variable into a pickle for use in other programs. How would I go about doing that? I have had no luck with saving the files correctly so far.
Any help would be appreciated.
Here is the method that I am using to generate the variables I want to save
def embedDatabase(imagePath, model, metadata):
    #Get the metadata
    #Perform embedding
    # calculated by feeding the aligned and scaled images into the pre-trained network.
    '''
    #Go through the database and get the embedding for each image
    '''
    database = []
    embedded = np.zeros((metadata.shape[0], 128))
    print("Embedding")
    for i, m in enumerate(metadata):
        img = imgUtil.loadImage(m.image_path())
        _,img = imgUtil.alignImage(img)
        # scale RGB values to interval [0,1]
        if img is not None:
            img = (img / 255.).astype(np.float32)
            #Get the embedding vectors for the image
            embedded[i] = model.predict(np.expand_dims(img, axis=0))[0]
            database.append([m.image_path(), embedded[i]])
    #return the array of embedded images from the database
    return embedded, database
And this is the load image method
def loadImage(path):
    img = cv2.imread(path, 1)
    if img is not None:
        # OpenCV loads images with color channels
        # in BGR order. So we need to reverse them
        return img[...,::-1]
    else:
        print("There is no image available")
Figured it out.
import pickle
with open("database.pickle", "wb") as f:
    pickle.dump(database, f, pickle.HIGHEST_PROTOCOL)
For some reason I needed the pickle.HIGHEST_PROTOCOL argument.
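For completeness, here is a small round-trip sketch showing the saved list being read back in another program. The database contents are hypothetical stand-ins (made-up paths and constant vectors instead of real embeddings), and a temporary directory is used so nothing is left on disk.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical stand-in for the real database: [image_path, embedding] pairs.
database = [
    ["images/person_a.jpg", np.zeros(128, dtype=np.float32)],
    ["images/person_b.jpg", np.ones(128, dtype=np.float32)],
]

with tempfile.TemporaryDirectory() as tmp:
    pickle_path = os.path.join(tmp, "database.pickle")
    # Pickle files must be opened in binary mode for writing ("wb") ...
    with open(pickle_path, "wb") as f:
        pickle.dump(database, f, pickle.HIGHEST_PROTOCOL)
    # ... and in binary mode for reading ("rb").
    with open(pickle_path, "rb") as f:
        loaded = pickle.load(f)

print(len(loaded), loaded[0][0])
```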
After some image preprocessing I have an image that holds 5 contours
(The image was resized for posting here on Stack Overflow):
I'd like to remove all "islands" except for the actual letter.
At first I tried using cv2.erode and cv2.dilate with all kinds of kernel sizes, but that didn't do the job, so I decided to mask out all contours except the largest one like this:
_, cnts, _ = cv2.findContours(original, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE)
I would expect, according to the given image, that there would be 5 contours.
areas = []
for contour in cnts:
    area = cv2.contourArea(contour)
    areas.append(area)
relevant_indexes = list(range(1, len(cnts)))
relevant_indexes.remove(areas.index(max(areas)))
mask = numpy.zeros(eroded.shape).astype(eroded.dtype)
color = 255
for i in relevant_indexes:
    cv2.fillPoly(mask, cnts[i], color)
cv2.imwrite("mask.png", mask)
# Trying to mask out the noise
result = cv2.bitwise_xor(original, mask)
cv2.imwrite("result.png", result)
But the mask I get is:
It's not what I would expect, and the bottom-left contour is missing.
Can someone PLEASE explain what I am missing here? And what would be the correct approach for getting rid of those "isolated islands"?
Thank you all!
p.s
The original photo I'm working on:
Solution:
It sounds like you want to keep only the largest connected component (a connected component is cv-speak for "island").
Here's an opencv/python script to do that:
#!/usr/bin/env python
import cv2
import numpy as np
# load image in grayscale
img = cv2.imread("img.png", 0)
# get all connected components
_, output, stats, _ = cv2.connectedComponentsWithStats(img, connectivity=4)
# get a list of areas for each group label
group_areas = stats[:, cv2.CC_STAT_AREA]
# get the id of the group with the largest area (ignoring 0, which is the background id)
max_group_id = np.argmax(group_areas[1:]) + 1
# get max_group_id mask and save it as an image
max_group_id_mask = (output == max_group_id).astype(np.uint8) * 255
cv2.imwrite("output.png", max_group_id_mask)
Result:
Here's the result of the above script on your sample image:
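The area-based selection can also be checked on a tiny label array without OpenCV. The sketch below assumes a label image like the one returned as output above (0 for background, 1..N for components) and counts the pixels per label with np.bincount instead of reading the stats array; the function name and the sample labels are made up for illustration.

```python
import numpy as np

def largest_component_mask(labels):
    """Return a 0/255 uint8 mask of the largest non-background label."""
    # areas[k] is the number of pixels carrying label k.
    areas = np.bincount(labels.ravel())
    if areas.size < 2:
        # Only background present: nothing to keep.
        return np.zeros(labels.shape, dtype=np.uint8)
    # Skip label 0 (background), then shift the index back by one.
    largest = np.argmax(areas[1:]) + 1
    return (labels == largest).astype(np.uint8) * 255

# Hand-written label image: component 1 has 5 pixels, components 2 and 3 have 1 each.
labels = np.array([
    [0, 1, 1, 0, 2],
    [0, 1, 1, 0, 0],
    [3, 0, 1, 0, 0],
])
mask = largest_component_mask(labels)
print(mask)
```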
I am trying to read 100 images into filenames[:100], and for every image:
(1) Crop
(2) Resize to 100 x 100
Then convert all these images into a NumPy array and plot a montage of that.
I am quite new to Python; please let me know if "plt.imread(fname)[...,:3]" is nothing but appending.
Thanks.
EDIT:
I want to create a montage of 100 specific pictures,
so my code will be something like this:
import os
import numpy as np
import matplotlib.pyplot as plt
from skimage.transform import resize
dirname = "Images/n02087046-toy_terrier"
filenames = [os.path.join(dirname, fname)
             for fname in os.listdir(dirname)
             if '.jpg' in fname]
filenames = filenames[:100]
assert(len(filenames) == 100)
# Read every filename as an RGB image
imgs = [plt.imread(fname)[...,:3] for fname in filenames]
# Crop
imgs = [utils.imcrop_tosquare(img_i) for img_i in imgs]
# Then resize the square image to 100 x 100 pixels
imgs = [resize(img_i, (100, 100)) for img_i in imgs]
imgs = np.array(imgs).astype(np.float32)
imgs.shape
plt.figure(figsize=(10, 10))
#montage is a utility function.
plt.imshow(utils.montage(imgs, saveto='dataset.png'))
So, regarding "plt.imread(fname)[...,:3]":
what does it do, and how can it be decomposed into something more understandable?
Thanks.
The line
imgs = [plt.imread(fname)[...,:3] for fname in filenames]
creates a list of 100 images.
plt.imread(fname) reads an image and returns a NumPy array. You may look at the shape of the array, which should be something like (n,m,3) or (n,m,4), where n and m are whole numbers.
The last axis, of size 3 or 4, holds the 3 color channels plus possibly a fourth channel, which is the alpha (transparency).
The slicing [...,:3] is then equivalent to [:,:,:3], which translates in words into "Take all values of the first two dimensions and take the first three values of the third dimension". I.e. you neglect the alpha channel, if present. The reason for doing this is most probably that you want to combine different images here and not all may have an alpha channel. So by neglecting it, you make sure not to run into problems later on.
In this example, you only take JPEG images. JPEG images do not have an alpha channel and thus you may actually use
imgs = [plt.imread(fname) for fname in filenames]
which should give you the same result.
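The slicing can be tried on a small array to see what each form returns. The sketch below uses a made-up 2 x 3 array with 4 channels as a stand-in for an RGBA image, and also shows how leaving out the colon changes the result.

```python
import numpy as np

# Tiny stand-in for an RGBA image: height 2, width 3, 4 channels.
rgba = np.arange(2 * 3 * 4).reshape(2, 3, 4)

rgb = rgba[..., :3]     # keep the first three channels, drop alpha
same = rgba[:, :, :3]   # the explicit spelling of the same slice
single = rgba[..., 3]   # without the colon: only channel index 3, as a 2D array

print(rgb.shape, same.shape, single.shape)
```

So [...,:3] is a slice that keeps a 3D array, whereas [...,3] picks one channel and drops the last axis; neither of them appends anything.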