How to identify shot boundary of a video using a threshold value in python opencv? - python-3.x

I want to implement code for shot boundary detection. The difference measure is the sum of the absolute bin-wise histogram differences. A shot boundary is
declared if the histogram difference between consecutive frames exceeds a threshold.
But I was unable to implement it.
It would be great if anyone could help me with this.

You can use cv2.calcHist:
hist = cv2.calcHist([img],[0],None,[256],[0,256])
to calculate the histograms of the current and previous frames.
Since the histograms are of the same size, you can take the absolute value of the difference(numpy.abs()), and then take the sum (np.sum()) to calculate the difference measure.
Caution: in practice, it takes a lot of experimenting to come up with the right bin count and the threshold for shot detection.
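A minimal sketch of that approach, assuming the video is read with cv2.VideoCapture; the bin count and threshold below are placeholders you would tune for your footage:

import cv2
import numpy as np

def detect_shot_boundaries(video_path, threshold=50000.0, bins=64):
    # Flag frames whose grayscale histogram differs from the previous
    # frame's by more than `threshold` (sum of absolute bin-wise differences).
    cap = cv2.VideoCapture(video_path)
    prev_hist = None
    boundaries = []
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [bins], [0, 256])
        if prev_hist is not None and np.sum(np.abs(hist - prev_hist)) > threshold:
            boundaries.append(frame_idx)
        prev_hist = hist
        frame_idx += 1
    cap.release()
    return boundaries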

Related

Identify short signal peaks with rolling time frame in python

I am trying to identify all peaks in my sensor readings data. The smallest peak can be less than 10 in amplitude and the largest can be more than 400. The rolling time window is not fixed, as one peak can arrive in 6 hours and the next in another 3 hours. I tried wavelet transforms and Python peak identification, but those only work for the higher peaks. How do I resolve this? Here is a link to the signal image: the peaks I want to identify are marked in grey, and my algorithm's output is in blue.
Welcome to SO.
It is hard to provide you with a detailed answer without knowing your data's sampling rate and the duration of the peaks. From what I see in your example image they seem all over the place!
I don't think that wavelets will be of any use for your problem.
A recipe that I like to use to despike data (a full sketch follows these steps):
Smooth your input data using a median filter (an 11-point median filter generally does the trick for me): smoothed = scipy.signal.medfilt(data, kernel_size=11)
Compute a noise array by subtracting smoothed from data: noise=data-smoothed
Create a despiked_data array from data:
despiked_data=np.zeros_like(data)
np.copyto(despiked_data, data)
Then every time the noise exceeds a user defined threshold (mythreshold), replace the corresponding value in despiked_data with nan values: despiked_data[np.abs(noise)>mythreshold]=np.nan
You may later interpolate the output despiked_data array but if your intent is simply to identify the spikes, you don't even need to run this extra step.
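A short sketch of the whole recipe; mythreshold is a placeholder you would pick from your own data:

import numpy as np
from scipy.signal import medfilt

def despike(data, kernel_size=11, mythreshold=5.0):
    data = np.asarray(data, dtype=float)
    smoothed = medfilt(data, kernel_size=kernel_size)    # median-filtered baseline
    noise = data - smoothed                              # residual around the baseline
    despiked_data = np.copy(data)
    despiked_data[np.abs(noise) > mythreshold] = np.nan  # blank out the spikes
    return despiked_data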

How to compare images and determine which has more content?

Goal: I want to grab the best frame from an animated GIF and use it as a static preview image. I believe the best frame is one that shows the most content - not necessarily the first or last frame.
Take this GIF for example:
[animated GIF omitted]
This is the first frame:
[first frame image omitted]
Here is the 28th frame:
It's clear that frame 28 represents the entire GIF well.
How could I programmatically determine whether one frame has more pixels/content than another? Any thoughts, ideas, packages/modules, or articles that you can point me to would be greatly appreciated.
One straightforward way this could be accomplished would be to estimate the entropy of each image and choose the frame with maximal entropy.
In information theory, entropy can be thought of as the "randomness" of the image. An image of a single color is very predictable, and the flatter the pixel-intensity distribution, the more random the image. This is closely related to the compression method described by Arthur-R, as entropy is the lower bound on how much data can be losslessly compressed.
Estimating Entropy
One way to estimate the entropy is to approximate the probability mass function for pixel intensities using a histogram. To generate the plot below I first convert the image to grayscale, then compute the histogram using a bin spacing of 1 (for pixel values from 0 to 255). Then, normalize the histogram so that the bins sum to 1. This normalized histogram is an approximation of the pixel probability mass function.
Using this probability mass function we can easily estimate the entropy of the grayscale image which is described by the following equation
H = E[-log(p(x))]
Where H is entropy, E is the expected value, and p(x) is the probability that any given pixel takes the value x.
Programmatically, H can be estimated by simply computing -p(x)*log(p(x)) for each value p(x) in the histogram and then adding them together (use log base 2 if you want the result in bits).
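For instance, a rough NumPy version of that estimate, assuming the frame has already been converted to a grayscale array:

import numpy as np

def image_entropy(gray):
    # Histogram with a bin spacing of 1 for pixel values 0..255.
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()             # approximate pixel probability mass function
    p = p[p > 0]                      # empty bins contribute 0 to the sum
    return -np.sum(p * np.log2(p))    # H = E[-log2 p(x)], in bits

Computing this for every frame of the GIF and taking the argmax gives the "most content" frame under this measure.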
A plot of entropy vs. frame number for your example (plot omitted) shows frame 21 (the 22nd frame) having the highest entropy.
Observations
The entropy computed here is not equal to the true entropy of the image, because it assumes each pixel is independently sampled from the same distribution. To get the true entropy we would need to know the joint distribution of the image, which we can't know without understanding the underlying random process that generated the images (which would include human interaction). However, I don't think the true entropy would be very useful, and this measure should give a reasonable estimate of how much content is in the image.
This method will fail if some not-so-interesting frame contains much more noise (randomly colored pixels) than the most interesting frame, because noise results in high entropy. For example, the following image is pure uniform noise and therefore has maximum entropy (H = 8 bits), i.e. no compression is possible.
Ruby Implementation
I don't know Ruby, but it looks like one of the answers to this question refers to a package for computing the entropy of an image.
From m. simon borg's comment:
"FWIW, using Ruby's File.size() returns 1904 bytes for the 28th frame image and 946 bytes for the first frame image" – m. simon borg
File.size() should be roughly proportional to entropy.
As an aside, if you check the size of the 200x200 noise image on disk you will see that the file is 40,345 bytes even after compression, but the uncompressed data is only 40,000 bytes. Information theory tells us that no compression scheme can ever losslessly compress such images on average.
There are a couple of ways I might go about this. My first thought (this may not be the most practical solution, but it seems theoretically interesting!) would be to try losslessly compressing each frame; in theory, the frame with the least repeatable content (and thus the most unique content) should have the largest compressed size, so you could compare the size in bytes/bits of each compressed frame. The accuracy of this approach would probably depend heavily on the image passed in.
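A quick sketch of that compression idea, using zlib on each frame's raw grayscale pixels; loading the GIF with Pillow is an assumption, and GIFs that store partial (delta) frames may need coalescing first:

import zlib
import numpy as np
from PIL import Image, ImageSequence

def best_frame_by_compressed_size(gif_path):
    # The frame whose raw pixels compress worst holds the most unique content.
    sizes = []
    for frame in ImageSequence.Iterator(Image.open(gif_path)):
        raw = np.asarray(frame.convert("L")).tobytes()
        sizes.append(len(zlib.compress(raw)))
    return int(np.argmax(sizes))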
A more realistic/practical solution might be to grab the predominant color in the GIF (in the example, the background color), and then iterate through each pixel and increment a counter each time the color of the current pixel doesn't match the color of the background.
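And a sketch of the pixel-counting idea, assuming the predominant (most common) color in each frame is the background; the same Pillow caveat about partial frames applies:

from collections import Counter
from PIL import Image, ImageSequence

def best_frame_by_foreground_pixels(gif_path):
    scores = []
    for frame in ImageSequence.Iterator(Image.open(gif_path)):
        pixels = list(frame.convert("RGB").getdata())
        background, count = Counter(pixels).most_common(1)[0]  # predominant color
        scores.append(len(pixels) - count)                     # pixels that differ from it
    return scores.index(max(scores))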
I'm thinking about some more optimized, sample-based solutions, and will edit my response to include them a little later if performance is a concern for you.
You could also use a RESTful web-service API for this, since doing it well from scratch is hard.
For example, these are some well-known APIs:
https://cloud.google.com/vision/
https://www.clarifai.com/
https://vize.ai
https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/
https://imagga.com

A method to find the inconsistency or variation in the data

I am running an experiment (it's an image processing experiment) in which I have a set of paper samples and each sample has a set of lines. For each line in the paper sample, its strength is calculated which is denoted by say 's'. For a given paper sample I have to find the variation amongst the strength values 's'. If the variation is above a certain limit, we have to discard that paper.
I started with the standard deviation of the values, but the problem I am facing is that the order of magnitude of s can differ between samples (because of various properties of a line such as its length, sharpness, darkness, etc.), and the calculated standard deviations therefore also differ a lot in magnitude. So I can't really use this method across different samples.
Is there any way to find a suitable limit that can be applied to all samples?
Since I don't have any history of how the strength values should behave (for a given sample, more variation could be tolerated when the magnitude of the strength values is larger, whereas a sample with smaller magnitudes should show less variation), I think I first need a way of baselining the variation across different samples. I don't know what approaches I could try to get started.
Please note that I have to tell variation between lines within a sample whereas the limit should be applicable for any good sample.
Please help me out.
You seem to have a set of samples. Then, for each sample you want to do two things: 1) compute a descriptive metric and 2) perform outlier detection. Both of these are vast subjects that require some knowledge of the phenomenology and statistics of the underlying problem. However, below are some ideas to get you going.
Compute a metric
Median Absolute Deviation. If your sample strength s has values that can jump by an order of magnitude across a sample then it is understandable that the standard deviation was not a good metric. The standard deviation is notoriously sensitive to outliers. So, try a more robust estimate of dispersion in your data. For example, the MAD estimate uses the median in the underlying computations which is more robust to a large spread in the numbers.
Robust measures of scale. Read up on other robust measures like the Interquartile range.
Perform outlier detection
Thresholding. This is similar to what you are already doing. However, you have to choose a suitable threshold for the metric computed above. You might consider using another robust metric for that: compute a robust estimate of the metrics' mean (e.g., the median) and a robust estimate of their standard deviation (e.g., 1.4826 * MAD), then identify outliers as metric values more than some number of robust standard deviations above the robust mean (see the sketch after this list).
Histogram Another simple method is to histogram your computed metrics from step #1. This is non-parametric, so it doesn't require you to model your data. You can histogram your metric values and then use the top 1% (or some other value) as your threshold limit.
Triangle Method A neat and simple heuristic for thresholding is the triangle method to perform binary classification of a skewed distribution.
Anomaly detection Read up on other outlier detection methods.
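As a minimal sketch of the MAD-based thresholding above, where metrics would hold one dispersion value per paper sample from step #1 and the number of "robust sigmas" is a placeholder:

import numpy as np

def robust_outliers(metrics, n_sigmas=3.0):
    metrics = np.asarray(metrics, dtype=float)
    center = np.median(metrics)                  # robust estimate of the mean
    mad = np.median(np.abs(metrics - center))    # median absolute deviation
    robust_std = 1.4826 * mad                    # matches sigma for normally distributed data
    return metrics > center + n_sigmas * robust_std   # True = discard this sample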

Determining Note Durations based on Onset Locations

I have a question regarding how to determine the Duration of notes given their Onset Locations.
So for example, I have an array of amplitude values (of type short) and another array of the same size that contains a 1 where a note onset is detected and a 0 otherwise. So basically, the distance between each pair of 1s will be used to determine the duration.
How can I do this? I know that I have to use the Sample Rate and other attributes of the audio data, but is there a particular formula that I can use?
Thank you!
So you are starting with a list of ONSETS; what you are really looking for is a list of OFFSETS.
There are many methods for onset detection (here is a paper on it) https://adamhess.github.io/Onset_Detection_Nov302011.pdf
many of the same methods can be applied to Offset Detection:
Since the onset is marked by an INCREASE in spectral content, you can look for the offset as a DECREASE in spectral content.
Take a reasonable time window before and after your onset (0.25-0.5 s).
Chop the window up into smaller segments and take 50%-overlapping Fourier transforms.
Compute the difference between the Fourier coefficients of successive windows, keeping only negative changes (decreases in spectral content).
Multiply your results by -1.
Pick the peaks off of the results.
Voila, offsets.
(Look at page 7 of the paper listed above for more detail about the spectral difference function; you can apply a modified version of it, as above. A rough sketch of these steps follows.)
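A rough NumPy/SciPy sketch of those steps; the window length, segment size, and peak-picking details are assumptions rather than anything taken from the paper:

import numpy as np
from scipy.signal import find_peaks

def detect_offset(signal, onset_idx, fs, window_s=0.5, seg_len=1024):
    # Look at a window of audio after the onset.
    segment = np.asarray(signal[onset_idx : onset_idx + int(window_s * fs)], dtype=float)
    hop = seg_len // 2                                     # 50% overlap between segments
    frames = [segment[i:i + seg_len] for i in range(0, len(segment) - seg_len + 1, hop)]
    if len(frames) < 2:
        return None
    spectra = [np.abs(np.fft.rfft(f)) for f in frames]
    # Spectral difference keeping only decreases in spectral content, then negated.
    sd = np.array([-np.sum(np.minimum(spectra[i + 1] - spectra[i], 0.0))
                   for i in range(len(spectra) - 1)])
    peaks, _ = find_peaks(sd)
    if len(peaks) == 0:
        return None
    best = peaks[np.argmax(sd[peaks])]                     # strongest drop in spectral content
    return onset_idx + (best + 1) * hop                    # approximate offset, in samples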
Well, if your sample rate in Hz is fs, then the time between two notes is equal to
1/fs * <number of zeros between the two onset ones>
Very simple :-)
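For example, assuming onset_flags is the 0/1 array from the question and fs is the sample rate in Hz:

import numpy as np

def note_durations(onset_flags, fs):
    onset_samples = np.flatnonzero(onset_flags)   # sample indices where a note starts
    return np.diff(onset_samples) / fs            # seconds between successive onsets

The last note's duration isn't covered by this, since there is no following onset; you would need an offset marker or the end of the audio for that.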
Regards

Obtaining minimum and maximum rendered UV values (Direct3D)

I need to calculate the minimum and maximum UV values assigned to the pixels produced when a given object is drawn onscreen from a certain perspective. For example, if I have a UV-mapped cube but only the front face is visible, min(UV) and max(UV) should be set to the minimum and maximum UV coordinates assigned to the pixels of the visible face.
I'd like to do this using Direct3D 9 shaders (and the lowest shader model possible) to speed up processing. The vertex shader could be as simple as taking each input vertex's UV coordinates and passing them on, unmodified, to the pixel shader. The pixel shader, on the other hand, would have to take the values produced by the vertex shader and use these to compute the minimum and maximum UV values.
What I'd like to know is:
How do I maintain state (current min and max values) between invocations of the pixel shader?
How do I get the final min and max values produced by the shader into my program?
Is there any way to do this in HLSL, or am I stuck doing it by hand? If HLSL won't work, what would be the most efficient way to do this without the use of shaders?
1) You don't.
2) You would have to do a read back at some point. This will be a fairly slow process and cause a pipeline stall.
In general I can't think of a good way to do this. What exactly are you trying to achieve with this? There may be some other way to achieve the result you are after.
You "may" be able to get something going using multiple render targets and writing the UVs for each pixel to the render target. Then you'd need to pass the render target back to main memory and then parse it for your min and max values. This is a realy slow and very ugly solution.
If you can do it as a couple of separate passes, you may be able to render to a very small render target and use one pass with a Max alpha blend op and one pass with a Min alpha blend op. Again... not a great solution.
