I've been working on a sequential model that takes images as inputs. The twist is that the input images are determined by keys.
For example, a training sequence is (you may assume fi is the frame id of a video)
{ f1, f2, f3, ..., fn }
and the corresponding image sequence is
{ M[f1], M[f2], M[f3], ..., M[fn] }
where M is a map storing {fi->image} mapping.
Suppose in the next batch, my training sequence becomes
{ f2, f3, ..., fn+1 }
and the image sequence becomes
{ M[f2], M[f3], M[f4], ..., M[fn+1] }
As you can see, if I directly save the image sequences to disk, there is a lot of redundancy (in the above case, M[f2] to M[fn] are saved twice). So it seems necessary that the images be referenced by keys, which means a standard image data loader class cannot be used.
[EDIT]
My model is a 2-class classifier that takes image sequences as input, in which the images are referenced by frame id (fi). Whether an image sequence is positive or negative is pre-generated by my data_preprocess code.
Positive samples may look like this:
{f3, f4, f5, f6, f7} 1
{f4, f5, f6, f7, f8} 1
{f5, f6, f7, f8, f9} 1
...
While negative samples look like this:
{f1, f2, f3, f4, f5} 0
{f2, f3, f4, f5, f6} 0
{f10, f11, f12, f13, f14} 0
...
So it is not like an image classification problem, where each image has exactly one fixed label. In my case, every image is used many times, and whether a sample is positive or negative is determined by the whole sequence, not by any single image.
[EDIT II]
The images are frames of N videos and are stored on disk like this:
|-data_root/
|-Video 1/
| |-frame_1_1.jpg
| |-frame_1_2.jpg
| ...
|-Video 2/
| |-frame_2_1.jpg
| |-frame_2_2.jpg
| ...
...
...
|-Video N/
| |-frame_N_1.jpg
| |-frame_N_2.jpg
...
What I'd like to do is: given two sequences of frames/images of scenes, have the model predict whether the two scenes are of the same kind.
Since a scene may span a long stretch of a video, I divide the whole sequence of a scene into a number of non-overlapping sub-sequences (omitting the video indexes):
Sequence of scene i: frame_1, frame_2, frame_3, ..., frame_n
Sub-sequence i_1: frame_1, frame_2, frame_3, ..., frame_10
Sub-sequence i_2: frame_11, frame_12, frame_13, ..., frame_20
Sub-sequence i_3: frame_21, frame_22, frame_23, ..., frame_30
...
Then, I randomly generate positive samples Pi (pairs of sub-sequences generated from the same sequence), like:
<Pair of sub-sequences> <Labels>
P1 {sub-sequence i_4, sub-sequence i_2}, 1
P2 {sub-sequence i_3, sub-sequence i_5}, 1
... ...
For negative samples, I generate pairs of sub-sequences (Ni) from different scenes:
<Pair of sub-sequences> <Labels>
N1 {sub-sequence i_1, sub-sequence j_6}, 0
N2 {sub-sequence i_2, sub-sequence j_4}, 0
... ...
It is obvious that one frame/image can occur multiple times in different training samples. For example, in the above case, both N2 and P1 contain sub-sequence i_2. So I choose to save the generated sample pairs as sequences of frame ids (fi) and, during training, fetch the corresponding frames/images of a sequence by frame id (fi).
How should I do it elegantly with Keras?
Not sure how you build your sequences, but have you considered using the ImageDataGenerator from keras.preprocessing.image?
Once you have built this object with whatever parameters you want, you can call its flow_from_directory(directory_path) method. The iterator this returns exposes a filenames attribute:
my_generator = ImageDataGenerator(...)
flow = my_generator.flow_from_directory(path_dir)
list_of_file_names = flow.filenames
You now have a mapping between the indexes of the list and its elements (the file paths, which are relative to path_dir).
I hope this helps?
EDIT:
From this, you can build a mapping as a dictionary:
map_images = {str(os.path.splitext(os.path.split(file_path)[1])[0]): file_path for file_path in list_of_file_names}
This takes each file_path retrieved from your image folder via ImageDataGenerator, extracts the file name, removes the file extension, and uses the resulting string as your frame_id.
You now have a map between frame_id and file_path that you can use with load_img() and img_to_array() from keras.preprocessing.image
the function load_img() is defined like this and returns a PIL image instance:
def load_img(path, grayscale=False, target_size=None):
    """Loads an image into PIL format.

    # Arguments
        path: Path to image file
        grayscale: Boolean, whether to load the image as grayscale.
        target_size: Either 'None' (default to original size)
            or tuple of ints '(img_height, img_width)'.

    # Returns
        A PIL Image instance.

    # Raises
        ImportError: if PIL is not available.
    """
Then img_to_array() is defined like this and returns a 3D numpy array to feed your model:
def img_to_array(img, dim_ordering='default'):
    """Converts a PIL Image instance to a Numpy array.

    # Arguments
        img: PIL Image instance.
        dim_ordering: Image data format.

    # Returns
        A 3D Numpy array.

    # Raises
        ValueError: if invalid 'img' or 'dim_ordering' is passed.
    """
So to summarize: 1) build a mapping between your frame_id and the path of the corresponding file; 2) load the file using load_img() and img_to_array(). I hope I have understood your question correctly!
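For illustration, here is a minimal sketch of fetching the images of one training sequence through that map (the frame ids, the target size and the helper name are my own assumptions; note the paths in filenames are relative to path_dir):
import os
import numpy as np
from keras.preprocessing.image import load_img, img_to_array

def load_sequence(frame_ids, target_size=(224, 224)):
    """Fetch the images of one training sequence by frame id."""
    batch = [img_to_array(load_img(os.path.join(path_dir, map_images[fid]),
                                   target_size=target_size))
             for fid in frame_ids]
    return np.stack(batch)  # shape: (sequence_length, height, width, channels)

sequence = load_sequence(['frame_1_3', 'frame_1_4', 'frame_1_5'])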
EDIT 2:
Seeing your new edit, now that I understand the structure of your file system, we can even add the video id to your dictionary like this:
# list of video_id of each frame (classes is an attribute of the iterator,
# one class index per file, derived from the sub-directory names)
videos = flow.classes
# mapping of the frame_id to (path_of_file, vid_id)
map_images = {str(os.path.splitext(os.path.split(file_path)[1])[0]): (file_path, vid_id)
              for file_path, vid_id in zip(list_of_file_names, videos)}
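Putting it all together, one idiomatic way to feed such key-referenced sample pairs to Keras is a custom keras.utils.Sequence that loads the images lazily by frame id, so nothing is ever duplicated on disk. This is only a sketch under my assumptions: samples is a list of ((seq_a_frame_ids, seq_b_frame_ids), label) tuples as produced by your data_preprocess step, map_images is the frame_id -> (file_path, vid_id) dictionary from above (with paths loadable as given), and PairSequence is a hypothetical name.
import numpy as np
from keras.utils import Sequence
from keras.preprocessing.image import load_img, img_to_array

class PairSequence(Sequence):
    def __init__(self, samples, map_images, batch_size=8, target_size=(224, 224)):
        self.samples = samples
        self.map_images = map_images
        self.batch_size = batch_size
        self.target_size = target_size

    def __len__(self):
        return int(np.ceil(len(self.samples) / self.batch_size))

    def _load(self, frame_ids):
        # Stack the frames of one sub-sequence into a single array.
        return np.stack([img_to_array(load_img(self.map_images[fid][0],
                                               target_size=self.target_size))
                         for fid in frame_ids])

    def __getitem__(self, idx):
        batch = self.samples[idx * self.batch_size:(idx + 1) * self.batch_size]
        a = np.stack([self._load(pair[0]) for pair, _ in batch])
        b = np.stack([self._load(pair[1]) for pair, _ in batch])
        y = np.array([label for _, label in batch])
        return [a, b], y
An instance of this class can then be passed to fit_generator() (or fit() in newer Keras versions).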
I think some of my question is answered here [1], but the difference in my case is that I'm wondering whether it is possible to do the slicing step without having to re-write the datasets to another file first.
Here is the code that reads in a single HDF5 file that is given as an argument to the script:
with h5py.File(args.H5file, 'r') as df:
    print('Here are the keys of the input file\n', df.keys())
    # interesting point here: you need the [:] behind each of these and we didn't need it when
    # creating datasets not using the 'with' formalism above. Adding that even handled the cases
    # in the 'hits' and 'truth_hadrons' where there are additional dimensions...go figure.
    jetdset = df['jets'][:]
    haddset = df['truth_hadrons'][:]
    hitdset = df['hits'][:]
Then later I do some slicing operations on these datasets.
Ideally I'd be able to pass a wild-card into args.H5file and then the whole set of files, all with the same data formats, would end up in the three datasets above.
I do not want to store or persist these three datasets at the end of the script, as the outputs are plots that use the information in the slices.
Any help would be appreciated!
There are at least two ways to access multiple files:
1. If all files follow a naming pattern, you can use the glob module. It uses wildcards to find files. (Note: I prefer glob.iglob; it is an iterator that yields values without creating a list. glob.glob creates a list, which you frequently don't need.)
2. Alternatively, you can input a list of filenames and loop over the list.
Example of iglob:
import glob
import h5py

for fname in glob.iglob('img_data_0?.h5'):
    with h5py.File(fname, 'r') as h5f:
        print('Here are the keys of the input file\n', h5f.keys())
Example with a list of names:
filenames = ['img_data_01.h5', 'img_data_02.h5', 'img_data_03.h5']
for fname in filenames:
    with h5py.File(fname, 'r') as h5f:
        print('Here are the keys of the input file\n', h5f.keys())
Next, your code mentions using [:] when you access a dataset. Whether or not you need to add indices depends on the object you want returned.
If you include [()], it returns the entire dataset as a numpy array. Note [()] is now preferred over [:]. You can use any valid slice notation, e.g., [0,0,:] for a slice of a 3-axis array.
If you don't include [:] or [()], it returns an h5py dataset object, which behaves like a numpy array. (For example, you can get dtype and shape, and slice the data.) The advantage? It has a smaller memory footprint. I use h5py dataset objects unless I specifically need an array (for example, when passing image data to another package).
Examples of each method:
jets_dset = h5f['jets']     # without [()]: returns an h5py dataset object
jets_arr = h5f['jets'][()]  # with [()]: returns a numpy array object
Finally, if you want to create a single array that merges values from 3 datasets, you have to create an array big enough to hold the data, then load with slice notation. Alternatively, you can use np.concatenate() (However, be careful, as concatenating a lot of data can be slow.)
A simple example of each method is shown below. It assumes you know the shape of the datasets and that it is the same for all 3 files (a0, a1 are the axis lengths of one dataset). If you don't know them, you can get them from the .shape attribute (see the sketch after the two examples).
Example for method 1 (pre-allocating array jets3x_arr):
a0, a1 = 100, 100
jets3x_arr = np.empty(shape=(a0, a1, 3))  # add dtype= if not float
for cnt, fname in enumerate(glob.iglob('img_data_0?.h5')):
    with h5py.File(fname, 'r') as h5f:
        jets3x_arr[:, :, cnt] = h5f['jets']
Example for method 2 (using np.concatenate()):
a0, a1 = 100, 100
for cnt, fname in enumerate(glob.iglob('img_data_0?.h5')):
    with h5py.File(fname, 'r') as h5f:
        if cnt == 0:
            jets3x_arr = h5f['jets'][()].reshape(a0, a1, 1)
        else:
            jets3x_arr = np.concatenate(
                (jets3x_arr, h5f['jets'][()].reshape(a0, a1, 1)), axis=2)
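If the dataset shapes are not known in advance, you can read them from the first file's .shape attribute before pre-allocating. A small sketch, assuming all files hold a 2-D 'jets' dataset of identical shape, as in the examples above:
import glob
import h5py
import numpy as np

fnames = sorted(glob.glob('img_data_0?.h5'))
with h5py.File(fnames[0], 'r') as h5f:
    a0, a1 = h5f['jets'].shape  # shape is available without loading the data

jets3x_arr = np.empty(shape=(a0, a1, len(fnames)))
for cnt, fname in enumerate(fnames):
    with h5py.File(fname, 'r') as h5f:
        jets3x_arr[:, :, cnt] = h5f['jets']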
I need to calculate the mean average precision (mAP) for specific keypoints only (not for all keypoints, as is done by default).
Here's my code:
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
# https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb
cocoGt = COCO('annotations/person_keypoints_val2017.json') # initialize COCO ground truth api
cocoDt = cocoGt.loadRes('detections/results.json') # initialize COCO pred api
cat_ids = cocoGt.getCatIds(catNms=['person'])
imgIds = cocoGt.getImgIds(catIds=cat_ids)
cocoEval = COCOeval(cocoGt, cocoDt, 'keypoints')
cocoEval.params.imgIds = imgIds
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()
print(cocoEval.stats[0])
This code prints the mAP for all keypoints ['nose', ..., 'right_ankle'], but I need it only for a few specific keypoints, like ['nose', 'left_hip', 'right_hip'].
I recently solved this and evaluated only 13 keypoints, leaving out the eyes and the ears, as my application required.
Just open cocoeval.py under pycocotools, then head over to the computeOKS function, where you will encounter two sets of keypoints, ground-truth keypoints and detection keypoints, both as NumPy arrays.
Make sure to slice those 51-element keypoint arrays properly: 17 keypoints with 3 values each (x, y, visibility).
For example, if you wish to only check the mAP for nose, the slicing would be as follows:
g = np.array(gt['keypoints'][0:3])
Do the same for the dt array.
Also, set the sigma values of the unwanted keypoints to 0 (or slice the sigmas array the same way), as in the sketch below.
You are all set!
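For concreteness, here is a sketch of the kind of edit meant above, restricted to ['nose', 'left_hip', 'right_hip'] (indices 0, 11, 12 in the standard COCO keypoint order). The variable names follow the stock computeOKS in pycocotools/cocoeval.py, but check your version: depending on it, the sigmas are either hard-coded inside computeOKS or taken from self.params.kpt_oks_sigmas. Slicing the sigmas to match (rather than zeroing them) keeps the OKS averaged over only the selected keypoints.
# Inside computeOKS in pycocotools/cocoeval.py (sketch, not a drop-in patch)
keep = [0, 11, 12]  # nose, left_hip, right_hip
idx = np.ravel([[3 * k, 3 * k + 1, 3 * k + 2] for k in keep])

g = np.array(gt['keypoints'])[idx]  # ground truth: 51 values -> 9
d = np.array(dt['keypoints'])[idx]  # detections, sliced the same way
sigmas = sigmas[keep]               # keep only the matching sigmas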
I have the following images at some specific path:
0.png
1.png
2.png
3.png
4.png
5.png
6.png
7.png
8.png
9.png
10.png
11.png
12.png
13.png
14.png
15.png
16.png
17.png
18.png
19.png
20.png
21.png
22.png
23.png
24.png
25.png
26.png
And I want to use the following program (located in the same path as the images) to convert those images into a single GIF file:
from PIL import Image
import glob
# Create the frames
frames = []
imgs = glob.glob("*.png")
for i in imgs:
    new_frame = Image.open(i)
    frames.append(new_frame)
    print(i)

# Save into a GIF file that loops forever
frames[0].save('png_to_gif.gif', format='GIF',
               append_images=frames[1:],
               save_all=True,
               duration=160, loop=0)
After running the program above I got this output:
0.png
1.png
10.png
11.png
12.png
13.png
14.png
15.png
16.png
17.png
18.png
19.png
2.png
20.png
21.png
22.png
23.png
24.png
25.png
26.png
3.png
4.png
5.png
6.png
7.png
8.png
9.png
How can I ensure that the program appends the images in the same order in which they were named?
The computer does not sort numerically but lexicographically: "10" < "2", because the program compares characters from left to right (it first compares 1 with 2; since 1 < 2, it decides that "10" is smaller and stops looking). That's why the order jumps around like that.
It would be better if every file name had the same length (001.png, 002.png, etc.). If you prefer not to do that, you can sort them yourself, remembering that the length is crucial (first compare lengths, then compare the strings).
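In code, that "first compare lengths, then compare the strings" rule can be expressed as a sort key. A small sketch, assuming every name is just digits plus .png:
import glob

imgs = glob.glob("*.png")
imgs.sort(key=lambda name: (len(name), name))  # "2.png" now sorts before "10.png"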
The filenames you're getting appear to be ordered lexicographically, as strings. That's why longer strings that start with small numerals sort ahead of shorter strings with larger numbers. It's just like how ah sorts ahead of b in alphabetical order.
You want the numerical part of the string to be ordered numerically, so that 2 comes before 10. To do that, you'll need to apply your own sorting logic.
Try something like this:
import re  # put this at the top of the file

def keyfunc(filename):
    prefix, numeral, suffix = re.split(r'(\d+)', filename, maxsplit=1)
    return prefix, int(numeral), suffix

imgs = glob.glob("*.png")
for i in sorted(imgs, key=keyfunc):
    ...
An alternative solution might be to rename the files that have only one digit in their names to have a leading zero. That is, 1.png should become 01.png and so on. That way the sorting that's being applied (by either glob.glob, or your filesystem) will order everything correctly (because 02 correctly sorts before 10 even lexicographically).
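If you go the renaming route, a one-off script along these lines would do it (assuming names of the form <number>.png and a two-digit maximum, as in the question):
import os

for name in os.listdir('.'):
    stem, ext = os.path.splitext(name)
    if ext == '.png' and stem.isdigit():
        os.rename(name, stem.zfill(2) + ext)  # '1.png' -> '01.png'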
I decided to modify the code this way; it seems to work for the moment. If there's a way to just create an array that stores only the filenames ending with .png for a given path, it's welcome (see the snippet after the code):
from PIL import Image
# Create the frames
frames = []
imgs = ["0.png", "1.png", "2.png", "3.png", "4.png", "5.png", "6.png", "7.png", "8.png", "9.png", "10.png", "11.png", "12.png", "13.png", "14.png", "15.png", "16.png", "17.png", "18.png", "19.png", "20.png", "21.png", "22.png", "23.png", "24.png", "25.png", "26.png"]
for i in imgs:
    new_frame = Image.open(i)
    frames.append(new_frame)
    print(i)

# Save into a GIF file that loops forever
frames[0].save('png_to_gif.gif', format='GIF',
               append_images=frames[1:],
               save_all=True,
               duration=180, loop=0)
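As for building the list of .png filenames automatically instead of hard-coding it, one possible snippet (the path is a placeholder, and the numeric sort key assumes names of the form <number>.png):
import os

path = "."  # placeholder: the directory holding the frames
imgs = sorted((f for f in os.listdir(path) if f.endswith(".png")),
              key=lambda name: int(os.path.splitext(name)[0]))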
I am looking for a way to encrypt images with keys. The first input should be an image like this:
The second input should look like this:
key = "01010100 01110010 10101110 01110001 10110000 01100001 01010010 10110001 10110011 10110011 10000000 01011100 01010010 01011100 11010011 00011000 10100100"(or i can convert this to another type but this is my raw data)
And after XOR (image ^ key), the output should look like this:
What have I tried so far?
import cv2
import numpy as np
demo = cv2.imread("koala.jpeg")
key = "01010100 01110010 10101110 01110001 10110000 01100001 01010010 10110001 10110011 10110011 10000000 01011100 01010010 01011100 11010011 00011000 10100100"
r, c, t = demo.shape
encryption = cv2.bitwise_xor(demo, key) # encryption
decryption = cv2.bitwise_xor(encryption, key) # decryption
cv2.imshow("encryption", encryption) # Display ciphertext image
cv2.imshow("decryption", decryption) # Display the decrypted image
cv2.waitKey(-1)
cv2.destroyAllWindows()
Here is the output:
Traceback (most recent call last):
File "/Users/kemal/Documents/Python/pyImage/xor.py", line 10, in <module>
encryption = cv2.bitwise_xor(demo, key) # encryption
TypeError: Expected Ptr<cv::UMat> for argument 'src2'
My question is, what is the proper way to xor my image and my key?
I have already achieved successful encryption with two images, but when I want to encrypt my image with a key, it's not working.
Actually, I know this is probably the wrong way to do this encryption; maybe I could convert my key to an image, but I believe there is a proper way to do this, and I want to learn.
If this library is not the proper tool for this encryption, I can also switch to a different library; there are no restrictions. The only important things for me are the inputs and the outputs.
Thank you for any kind of help.
The following Python program:
displays the original image,
displays the encrypted image after pressing a key,
displays the decrypted image after pressing a key,
deletes all images after pressing a key.
Please note that, depending on the size of the image, encryption and decryption can each take a few seconds.
import cv2
import numpy as np
from numpy import random
# Load original image
demo = cv2.imread(<path to jpeg>)
r, c, t = demo.shape
# Display original image
cv2.imshow("Original image", demo)
cv2.waitKey()
# Create random key
key = random.randint(256, size = (r, c, t))
# Encryption
# Iterate over the image
encrypted_image = np.zeros((r, c, t), np.uint8)
for row in range(r):
    for column in range(c):
        for depth in range(t):
            encrypted_image[row, column, depth] = demo[row, column, depth] ^ key[row, column, depth]
cv2.imshow("Encrypted image", encrypted_image)
cv2.waitKey()
# Decryption
# Iterate over the encrypted image
decrypted_image = np.zeros((r, c, t), np.uint8)
for row in range(r):
    for column in range(c):
        for depth in range(t):
            decrypted_image[row, column, depth] = encrypted_image[row, column, depth] ^ key[row, column, depth]
cv2.imshow("Decrypted Image", decrypted_image)
cv2.waitKey()
cv2.destroyAllWindows()
After loading, the original image is stored in a 3D array with r rows, c columns and t color channels, with values between 0 and 255.
Next, a key is generated that corresponds to an identically sized array whose elements are randomly generated.
For encryption, the code iterates over the original image, and each value of the image array is XORed with the corresponding value of the key array.
For decryption, an analogous iteration is performed over the encrypted image and each value of the encrypted image array is XORed with the corresponding value of the key array.
Please note that this logic is not to be considered the right way, but a possible one. It is based on one-time-pad encryption, which is information-theoretically secure when used as specified. However, that is not fulfilled here, since the key would have to be chosen truly randomly, whereas here it is chosen pseudo-randomly. For a one-time pad, the key has to be as long as the message, i.e. the key array has to be as large as the image data. In principle, other algorithms can be used as well, e.g. AES, which is however more complex to implement (with regard to padding, IV, etc.).
Note also that when storing the encrypted image, a format must be used that does not change the image data (which is actually the ciphertext). Otherwise, after loading the encrypted image, decryption may be incorrect or even impossible, depending on the algorithm. A format that generally compresses the data and thus changes the data is e.g. jpg, a format that does not change the data is e.g. bmp, see e.g. here for more details. The format can be controlled by the file extension when saving with imwrite.
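As a side note on the original bit-string key: NumPy can do the whole XOR without explicit loops, which also answers how to use the key as given. A sketch (with the same caveat: repeating a short key across the image is NOT a one-time pad and is not secure):
import cv2
import numpy as np

demo = cv2.imread("koala.jpeg")

key = ("01010100 01110010 10101110 01110001 10110000 01100001 01010010 "
       "10110001 10110011 10110011 10000000 01011100 01010010 01011100 "
       "11010011 00011000 10100100")
key_bytes = np.array([int(b, 2) for b in key.split()], dtype=np.uint8)

# np.resize repeats the 17 key bytes until they cover every byte of the image.
key_stream = np.resize(key_bytes, demo.shape)

encrypted = demo ^ key_stream  # element-wise XOR, no Python loops
decrypted = encrypted ^ key_stream
assert (decrypted == demo).all()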
I'm crawling across a folder of WAV files, with each file having the same sample-rate but different lengths. I'm loading these using Librosa and computing a range of spectral features on them. This results in arrays of different sizes due to the differing durations. Trying to then concatenate all of these arrays fails - obviously because of their different shapes, for example:
shape(1,2046)
shape(1,304)
shape(1,154)
So what I've done is, before loading the files, use librosa to get the duration of each file and pack the durations into a list:
class GetDurations:
    def __init__(self, files, samplerate):
        durations = []
        self.files = files
        self.sampleRate = samplerate
        for file in self.files:
            durations.append(librosa.get_duration(filename=file, sr=44100))
        self.maxFileDuration = np.max(durations)
Then I take the maximum value of the list, to get the maximum possible length of my array, and convert it to frames (which is what the spectral feature extraction functions of Librosa work with):
self.maxDurationInFrames = librosa.time_to_frames(self.getDur.maxFileDuration,
                                                  sr=44100, hop_length=512) + 1
So now I've got a value that I know will account for the longest duration of my input files. I just need to initialise my array with this length.
allSpectralCentroid = np.zeros((1, self.maxDurationInFrames))[1:]  # shape (0, maxDurationInFrames): an empty container
This gives me an empty container for all of my extracted spectral centroid data for all WAV files in the directory. In order to add data to this array, I later do the following:
padValue = allSpectralCentroid.shape[1] - workingSpectralCentroid.shape[1]
workingSpectralCentroid = np.pad(workingSpectralCentroid[0], ((0, padValue)), mode='constant')[np.newaxis]
allSpectralCentroid = np.append(allSpectralCentroid, workingSpectralCentroid, axis=0)
This subtracts the length of the 'working' array from that of the 'all' array to get a pad value, pads the working array with zeros to match the 'all' array's length, and finally appends the two (joining them together) and assigns the result back to the 'all' variable.
So....
My question is - is there a more efficient way to do this?
Bonus question - How do I do this when I 100% can never know the length required??
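For what it's worth, one common pattern avoids pre-computing the maximum duration entirely: collect the per-file arrays in a plain Python list and pad once at the end. This also sidesteps the repeated np.append() calls, which copy the whole 'all' array every time. A sketch, where extract() is a hypothetical stand-in for the spectral-centroid call:
import numpy as np

results = [extract(file) for file in files]  # each array has shape (1, n_frames)

max_len = max(arr.shape[1] for arr in results)
allSpectralCentroid = np.vstack(
    [np.pad(arr, ((0, 0), (0, max_len - arr.shape[1])), mode='constant')
     for arr in results])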