I would like to read all images found in a PDF file with PyMuPDF as OpenCV images, keeping them as close to the source as possible (avoiding format conversions that would lose precision). Basically, I would like the result to be exactly the same as if I had called cv2.imread(filename), in terms of output type, color space, and so on:
# Libraries
import os
import cv2
import fitz
import numpy as np
# Input file
filename = "myfile.pdf"
# Read all images in file as a list of opencv images
def read_images(filename):
    images = []
    _, extension = os.path.splitext(filename)
    # If it's a pdf process each image
    if extension == ".pdf":
        pdf = fitz.open(filename)
        for index in range(len(pdf)):
            page = pdf[index]
            # getImageList() is named get_images() in recent PyMuPDF releases
            for im in page.getImageList():
                xref = im[0]
                pix = fitz.Pixmap(pdf, xref)
                images.append(pix_to_opencv_image(pix)) # DO SOMETHING HERE
    # Otherwise just do an imread
    else:
        images.append(cv2.imread(filename))
    return images
Basically I would like to know what the function pix_to_opencv_image should be:
# Equivalent of doing a "cv2.imread" on a pdf pixmap:
def pix_to_opencv_image(pix):
    # DO SOMETHING HERE
I found examples explaining how to convert PDF pixmaps to numpy arrays, but nothing that outputs an OpenCV image.
How can I achieve this?
I used the help() function, as help(pix), to find the data descriptors associated with the pixmap.
pix.samples stores the image data as bytes. Using numpy's frombuffer, the image array can be obtained from these bytes after reshaping accordingly.
pix.height and pix.width give the height and width of the image array, and pix.n is the number of channels. These can be used for reshaping the resulting array.
Your complete function would be:
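def pix_to_opencv_image(pix):
    # Interpret the raw sample bytes as unsigned 8-bit values
    data = np.frombuffer(pix.samples, dtype=np.uint8)
    # Reshape to (rows, columns, channels)
    img = data.reshape(pix.height, pix.width, pix.n)
    return img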
You can display the result using cv2.imshow().
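One caveat when comparing with cv2.imread: by default OpenCV returns a 3-channel array in BGR order, while the reshaped samples keep the pixmap's own channel order (RGB for an RGB pixmap). Below is a minimal sketch of the reordering, assuming 8-bit samples and a gray, RGB, or RGBA pixmap. The helper name samples_to_bgr is made up for illustration, and it takes plain values so it can be tried without a PDF:

```python
import numpy as np

def samples_to_bgr(samples, height, width, n):
    """Reorder interleaved 8-bit samples into the BGR layout cv2.imread uses by default.

    Hypothetical helper: assumes n == 1 (gray), 3 (RGB) or 4 (RGBA) channels.
    """
    arr = np.frombuffer(samples, dtype=np.uint8).reshape(height, width, n)
    if n == 1:
        # Grayscale: repeat the single channel three times
        return np.repeat(arr, 3, axis=2)
    if n == 3:
        # RGB -> BGR: reverse the channel axis
        return np.ascontiguousarray(arr[:, :, ::-1])
    if n == 4:
        # RGBA -> BGR: take channels 2, 1, 0 and drop alpha
        return np.ascontiguousarray(arr[:, :, 2::-1])
    raise ValueError(f"unsupported channel count: {n}")

# Two pixels, pure red then pure green, given in RGB order
demo = samples_to_bgr(bytes([255, 0, 0, 0, 255, 0]), height=1, width=2, n=3)
print(demo.tolist())
```

With a real pixmap this would be called as samples_to_bgr(pix.samples, pix.height, pix.width, pix.n). CMYK pixmaps are not handled by this sketch.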
I have the following code in keras:
# load all images in a directory into memory
def load_images(path, size=(256,512)):
    src_list, tar_list = list(), list()
    # enumerate filenames in directory, assume all are images
    for filename in listdir(path):
        # load and resize the image
        pixels = load_img(path + filename, target_size=size)
        # convert to numpy array
        pixels = img_to_array(pixels)
        # split into satellite and map
        sat_img, map_img = pixels[:, :256], pixels[:, 256:]
        src_list.append(sat_img)
        tar_list.append(map_img)
    return [asarray(src_list), asarray(tar_list)]
I would like to convert it to PyTorch, but I don't know much about it. Any suggestions?
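I don't think you have to change anything but the very last line. Note that torch.stack expects tensors, so each numpy array has to be converted first, for example with torch.from_numpy:
return [torch.stack([torch.from_numpy(a) for a in src_list]), torch.stack([torch.from_numpy(a) for a in tar_list])]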
I used to use scipy which would load an image from file straight into an ndarray.
from scipy import misc
img = misc.imread('./myimage.jpg')
type(img)
>>> numpy.ndarray
But now it gives me a DeprecationWarning, and the docs say it will be removed in 1.2.0 and that I should use imageio.imread instead. But:
import imageio
img = imageio.imread('./myimage.jpg')
type(img)
>>> imageio.core.util.Image
I could convert it by doing
img = numpy.array(img)
But this seems hacky. Is there any way to load an image straight into a numpy array as I was doing before with scipy's misc.imread (other than using OpenCV)?
The result of imageio.imread is already a NumPy array; imageio.core.util.Image is an ndarray subclass that exists primarily so the array can have a meta attribute holding image metadata.
If you want an object of type exactly numpy.ndarray, you can use asarray:
array = numpy.asarray(img)
Unlike numpy.array(img), this will not copy img's data.
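The difference can be checked without imageio. Here is a small sketch in which TaggedImage is a made-up stand-in for any ndarray subclass such as imageio's Image class:

```python
import numpy as np

class TaggedImage(np.ndarray):
    """Hypothetical ndarray subclass, standing in for a metadata-carrying image type."""

img = np.zeros((2, 3), dtype=np.uint8).view(TaggedImage)

as_view = np.asarray(img)   # plain ndarray that shares img's buffer
as_copy = np.array(img)     # plain ndarray with its own buffer

print(type(as_view) is np.ndarray, np.shares_memory(as_view, img))
print(type(as_copy) is np.ndarray, np.shares_memory(as_copy, img))
```

Both results are exact numpy.ndarray objects; only the asarray result still points at the original data.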
If it is a bitmap or even a JPEG, you can do:
import matplotlib.pyplot as plt
import numpy as np
# 'pip install pillow' but import PIL
from PIL import Image
png_filepath = 'somepng.png'
png_pil_img = Image.open(png_filepath)
# this will print info about the PIL object
print(png_pil_img.format, png_pil_img.size, png_pil_img.mode)
png_np_img = np.asarray(png_pil_img)
plt.imshow(png_np_img) # this will graph it in a jupyter notebook
# or if it's grayscale: plt.imshow(png_np_img, cmap='gray')
# FWIW, this will show the np characteristics
print("shape is ", png_np_img.shape)
print("dtype is ", png_np_img.dtype)
print("ndim is ", png_np_img.ndim)
print("itemsize is ", png_np_img.itemsize) # size in bytes of each array element
print("nbytes is ", png_np_img.nbytes) # total size in bytes of the array data
If you have a JPG, it works the same. PIL's Image module will decode the compressed JPG, and np.asarray converts the result to an array for you. You could perhaps load a raw bitmap with plain file I/O and skip the header yourself, but PIL is popular for a reason.
The output for a grayscale png will look like this:
PNG (3024, 4032) L
shape is (4032, 3024)
dtype is uint8
ndim is 2
itemsize is 1
nbytes is 12192768
The output for a color jpeg will look like this:
JPEG (704, 480) RGB
shape is (480, 704, 3)
dtype is uint8
ndim is 3
itemsize is 1
nbytes is 1013760
In either case, the pixel values are integers in the range 0-255, not floats. The color image has three channels, corresponding to red, green, and blue. The grayscale image has a much higher resolution than the JPG.
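The printed numbers are consistent with each other: nbytes is the element count times itemsize, and the numpy shape is (rows, columns), which is the PIL size (width, height) reversed. A quick check using zero-filled arrays with the same shapes and dtype as the two outputs above (the contents are placeholders, not real images):

```python
import numpy as np

# Same shapes and dtype as the two printed examples; contents are just zeros
gray = np.zeros((4032, 3024), dtype=np.uint8)
color = np.zeros((480, 704, 3), dtype=np.uint8)

for name, arr in (("gray", gray), ("color", color)):
    expected = arr.size * arr.itemsize   # element count times bytes per element
    print(name, arr.shape, arr.nbytes, arr.nbytes == expected)
```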
I am extracting some features from audio files, saving them in lists, and then saving the lists in an HDF5 file, but it causes an error. Previously I was saving the features directly to the HDF5 file, but that just overwrote all the values and saved only the last one.
ampList = []
mffcslist = []
centroidlist = []
i = 0
ampList.append(Xdb) # saving extracted feature in a list
mffcslist.append(mfccs)
centroidlist.append(spectral_centroids)
with h5py.File('C:/Users/Aweem Ashar/Desktop/feature.h5', 'a') as f:
    f.close()
for i in range(len(audio_path)):
    #print(ampList[i])
    f.create_dataset("amplitude", data=ampList[i])
    f.create_dataset("MffC", data=mffcslist[i])
    f.create_dataset("spectral", data=centroidlist[i])
# plt.show() # To view Wave graph
I didn't look at your code that closely when I wrote my comment. I just realized you are loading your list data one element at a time. There are much better and faster ways to do it with Numpy arrays. I don't know what kind of data you're working with, so I created a very simple example with a few floats in ampList. I use np.asarray() to convert the list to a Numpy array and load it into the dataset in one shot, which is much easier and more compact. This method (with np.asarray()) will work for any list with elements of a common type (all floats or all ints).
My simple example:
import h5py
import numpy as np
ampList = [ 20., 11., 33., 40., 100. ]
with h5py.File('SO_58092765.h5','w') as h5f:
    h5f.create_dataset("amplitude", data=np.asarray(ampList) )
A Better Approach:
The example above addresses your basic question (how to copy the list data into a HDF dataset). However, I think there is a better approach for your scenario. I assume you have amplitude, MffC, and spectral data for each and every audio file, AND it would be convenient to have that data associated with the audio file name. If so, that's where HDF5 and mixed format datatypes are so powerful.
I created a second example (below) to show how you can save mixed data in a single dataset. I assumed the following datatypes (to make the example interesting):
Audio file name: String
amplitude: Float
MffC: Integer
Spectral (centroid): Float array of shape (3,)
This example creates 2 HDF5 files:
SO_58092765_3ds.h5: saves each List as a separate dataset.
SO_58092765_1ds.h5: saves all List data in a single dataset, with each List written to a separate Field/Column.
The second method uses a Numpy datatype (dtype) to define the name and datatype of each column of data in the HDF5 dataset. The dtype is then used to create an empty dataset. Each List is written to the dataset by referencing the field name.
Second example:
import h5py
import numpy as np
fileList = [ 'audio1.mp3', 'audio2.mp3', 'audio11.mp3', 'audio21.mp3','audio22.mp3' ]
ampList = [ 20., 11., 33., 40., 100. ]
mffcslist = [ 12, 8, 9, 14, 33 ]
centroidlist = [ (0.,0.,0.), (1.,0.,0.),
                 (0.,1.,0.), (0.,1.,0.),
                 (1.,1.,1.),]
# create SO_58092765_3ds.h5:
with h5py.File('SO_58092765_3ds.h5','w') as h5f:
    h5f.create_dataset("amplitude", data=np.asarray(ampList) )
    h5f.create_dataset("MffC", data=np.asarray(mffcslist) )
    h5f.create_dataset("spectral", data=np.asarray(centroidlist) )
# create SO_58092765_1ds.h5 with ds_dtype:
ds_dtype = np.dtype( [("audiofile",'S20'), ("amplitude",float),
                      ("MffC",int), ("spectral",float, (3,)) ] )
with h5py.File('SO_58092765_1ds.h5','w') as h5f:
    ds = h5f.create_dataset("test_data", shape=(len(ampList),), dtype=ds_dtype )
    ds['audiofile'] = np.asarray(fileList)
    ds['amplitude'] = np.asarray(ampList)
    ds['MffC'] = np.asarray(mffcslist)
    ds['spectral'] = np.asarray(centroidlist)
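The field-by-field assignment can also be tried in memory with a plain Numpy structured array, which may help when experimenting with the dtype before writing a file. This is only a sketch: the variable names and the two-row data are made up, and the file names are encoded to bytes explicitly to match the 'S20' field:

```python
import numpy as np

file_names = ['audio1.mp3', 'audio2.mp3']
amplitudes = [20., 11.]
mfcc_values = [12, 8]
centroids = [(0., 0., 0.), (1., 0., 0.)]

# Same field layout as ds_dtype in the example above
row_dtype = np.dtype([("audiofile", 'S20'), ("amplitude", float),
                      ("MffC", int), ("spectral", float, (3,))])

# One record per audio file; each list fills one named field (column)
table = np.zeros(len(file_names), dtype=row_dtype)
table['audiofile'] = np.asarray(file_names, dtype='S20')  # str -> fixed-length bytes
table['amplitude'] = amplitudes
table['MffC'] = mfcc_values
table['spectral'] = centroids

print(table[1])
print(table['spectral'].shape)
```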
import matplotlib.pyplot as plt
from glob import glob
import librosa as lb
import sklearn
import librosa.display
import librosa
import h5py
import numpy as np
dir = 'C:\\Users\\Aweem Ashar\\Desktop\\recordingd'
audio_path = glob(dir + '/*.wav')
ampList = []
mffcslist = []
centroidlist = []
for file in range(0, len(audio_path)):
    x, sr = lb.load(audio_path[file])
    print(type(x), type(sr))
    librosa.display.waveplot(x, sr=sr)
    X = librosa.stft(x)
    Xdb = librosa.amplitude_to_db(abs(X))
    librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='hz')
    spectral_centroids = librosa.feature.spectral_centroid(x, sr=sr)[0]
    spectral_centroids.shape
    frames = range(len(spectral_centroids))
    t = librosa.frames_to_time(frames)
    def normalize(x, axis=0):
        return sklearn.preprocessing.minmax_scale(x, axis=axis)
    librosa.display.waveplot(x, sr=sr, alpha=0.4)
    plt.plot(t, normalize(spectral_centroids), color='r')
    mfccs = librosa.feature.mfcc(x, sr=sr)
    print(mfccs.shape)
    librosa.display.specshow(mfccs, sr=sr, x_axis='time')
    ampList.append(Xdb)
    mffcslist.append(mfccs)
    centroidlist.append(spectral_centroids)
with h5py.File('C:/Users/Aweem Ashar/Desktop/feature.h5', 'w') as f:
    f.create_dataset("amplitude", data=np.asarray(ampList))
    f.create_dataset("MFCC", data=np.asarray(mffcslist))
    f.create_dataset("SpectralCentroid", data=np.asarray(centroidlist))
Aweem, I'm not familiar with librosa or sklearn, so I can't debug all your code. When working with something new, use a minimal, complete, verifiable example (MCVE) to confirm behavior with simple data sets. They are much easier to diagnose.
To do that, I simplified the for loop in your second post, reorganizing it and removing what I thought was unnecessary. Also, you don't need to loop over all the audio files. Change the glob() call to get 1 (or a few) files. The shape of Xdb (saved to ampList) should show why Numpy asarray() tries to broadcast the shape (and why you get an error). If not, post the output for review.
Finally, you should comment out the create_dataset("amplitude") call to verify that the other 2 create_dataset() calls work. Good luck.
dir = 'C:\\Users\\Aweem Ashar\\Desktop\\recordingd'
# change this to get 1 wav file:
audio_path = glob(dir + '/*.wav')
ampList = []
mffcslist = []
centroidlist = []
for file in range(0, len(audio_path)):
    x, sr = lb.load(audio_path[file])
    print(x, sr)
    X = librosa.stft(x)
    Xdb = librosa.amplitude_to_db(abs(X))
    print (Xdb.shape)
    ampList.append(Xdb)
    spectral_centroids = librosa.feature.spectral_centroid(x, sr=sr)[0]
    print (spectral_centroids.shape)
    centroidlist.append(spectral_centroids)
    mfccs = librosa.feature.mfcc(x, sr=sr)
    print (mfccs.shape)
    mffcslist.append(mfccs)
with h5py.File('C:/Users/Aweem Ashar/Desktop/feature.h5', 'w') as f:
    f.create_dataset("amplitude", data=np.asarray(ampList))
    f.create_dataset("MFCC", data=np.asarray(mffcslist))
    f.create_dataset("SpectralCentroid", data=np.asarray(centroidlist))
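Why the printed shapes matter can be shown with Numpy alone. This is a sketch under an assumption: if the audio files differ in length, the per-file feature arrays will likely differ in their number of frames, and arrays with different shapes cannot be combined into one regular array. The shapes below are made up, and np.stack is used instead of np.asarray because asarray's handling of ragged input differs between Numpy versions:

```python
import numpy as np

# Made-up feature shapes: same number of rows, different frame counts
features = [np.zeros((1025, 40)), np.zeros((1025, 55))]

shapes = {f.shape for f in features}
print(shapes)

try:
    stacked = np.stack(features)
except ValueError as err:
    # Mismatched shapes cannot form one regular array
    stacked = None
    print("cannot stack:", err)

# Arrays that share one shape stack into a single (n_files, ...) array
same = np.stack([np.zeros((1025, 40)), np.zeros((1025, 40))])
print(same.shape)
```

If the set of shapes has more than one entry, one option is to write each file's array to its own dataset instead of one combined dataset.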
In a directory (A), there exist 4 subdirectories (a,b,c,d), each of which contains 5 grayscale MRI images.
I could create a 3-dimensional ndarray from one subdirectory (a) with the following code.
path = "./A/a"
list = []
for root, dirs, files in os.walk(path):
    for name in files:
        list.append(os.path.join(root, name))
list.sort()
refds = pydicom.dcmread(list[0])
constpixeldims = (int(refds.Rows), int(refds.Columns), len(list))
arraydicom = numpy.zeros(constpixeldims,dtype=refds.pixel_array.dtype)
for namedcm in list:
    ds = pydicom.dcmread(namedcm)
    arraydicom[:,:,list.index(namedcm)] = ds.pixel_array
However, I want to create a 4-dimensional ndarray from all 4 subdirectories.
I want to use the 4-dimensional ndarray for neural network analysis of MRI images.
[Another Question]
With the following code, I could create a 3-dimensional array.
for namedcm in dcmlist[0]:
    ds = pydicom.dcmread(namedcm)
    arraydcm[:,:,dcmlist[0].index(namedcm)] = ds.pixel_array
Is it possible to create a 4-dimensional array by changing the index of dcmlist from 0 to 3?
You can do something like this:
import glob
import cv2
import numpy as np
import os
DIR = './A'
images = []
for sub_dir in os.listdir(DIR):
    print(sub_dir)
    # read as single-channel grayscale; the default flag would return 3-channel BGR
    images.append([cv2.imread(img, cv2.IMREAD_GRAYSCALE) for img in glob.glob(DIR+os.sep+sub_dir+os.sep+'*')])
print(np.array(images).shape)
You'll get the following shape (if all the images are 160x160): (4,5,160,160).
If you want, for some reason, the shape (4,160,160,5), you can use numpy.rollaxis:
images = np.rollaxis(np.array(images),1,4)
#images.shape -> (4,160,160,5)
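If you already build one 3-dimensional array per subdirectory, as in the question, the 4-dimensional array can also be made by stacking those volumes. Here is a minimal sketch with synthetic arrays standing in for the data read from ds.pixel_array; the 160x160 size and the uint16 dtype are assumptions for illustration:

```python
import numpy as np

rows, cols, n_slices, n_dirs = 160, 160, 5, 4

# Stand-ins for the per-subdirectory (rows, cols, slices) volumes
volumes = [np.full((rows, cols, n_slices), i, dtype=np.uint16) for i in range(n_dirs)]

# Stack along a new leading axis: one entry per subdirectory
stacked = np.stack(volumes, axis=0)
print(stacked.shape)

# np.moveaxis is an alternative to the rollaxis call above
slices_first = np.zeros((n_dirs, n_slices, rows, cols), dtype=np.uint16)
moved = np.moveaxis(slices_first, 1, -1)
print(moved.shape)
```

This requires every subdirectory to yield a volume of the same shape.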