OpenCV - Capture arbitrary frame from video file - python-3.x

I use the following code to extract a specific frame from a video file. In this example, I'm simply getting the middle frame:
import cv2
import tempfile

video_path = '/tmp/wonderwall.mp4'
vidcap = cv2.VideoCapture(video_path)
middle_frame = int(vidcap.get(cv2.CAP_PROP_FRAME_COUNT) / 2)
success = True
count = 0
while success:
    success, image = vidcap.read()
    if count == middle_frame:
        temp_file = tempfile.NamedTemporaryFile(suffix='.jpg', delete=False)
        cv2.imwrite(temp_file.name, image)
    count += 1
However, with this method, capturing the middle frame in a very large file can take a while.
Apparently, in the older cv module, one could do:
import cv
img = cv.QueryFrame(capture)
Is there a similar way in cv2 to grab a specific frame in a video file, without having to iterate through all frames?

You can do it the same way in C++ (the Python conversion should be straightforward).
cv::VideoCapture cap("file.avi");
double number_of_frames = cap.get(CV_CAP_PROP_FRAME_COUNT);
cap.set(CV_CAP_PROP_POS_FRAMES, IndexOfTheFrameYouWant);
cv::Mat frameIwant;
cap.read(frameIwant);  // read() fills the Mat and returns a bool
For reference:
VideoCapture::get(int propId)
get() accepts various property flags and returns nearly anything you could wish for (see http://docs.opencv.org/2.4/modules/highgui/doc/reading_and_writing_images_and_video.html and look for get()).
VideoCapture::set(int propId, double value)
set() will do what you want (same doc, look for set()) if you use the propId CV_CAP_PROP_POS_FRAMES together with the index of the frame you want.
Note that if the index you pass is greater than the last frame of the video, the code will grab the last frame if you are lucky, or crash at run time.
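A direct Python translation of that answer for cv2 (a minimal sketch; note that seeking with CAP_PROP_POS_FRAMES is not frame-accurate with every codec/backend, so verify the result on your files):
import cv2

vidcap = cv2.VideoCapture('/tmp/wonderwall.mp4')
frame_count = int(vidcap.get(cv2.CAP_PROP_FRAME_COUNT))
middle_frame = frame_count // 2

# jump straight to the frame we want instead of reading every frame
vidcap.set(cv2.CAP_PROP_POS_FRAMES, middle_frame)
success, image = vidcap.read()
if success:
    cv2.imwrite('/tmp/middle_frame.jpg', image)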

Related

Removing Gridlines from Scanned Graph Paper Documents

I would like to remove gridlines from a scanned document using Python to make them easier to read.
Here is a snippet of what we're working with:
As you can see, there are inconsistencies in the grid, and to make matters worse the scanning isn't always square. Five example documents can be found here.
I am open to whatever methods you may suggest for this, but using OpenCV and pypdf might be a good place to start before breaking out more involved machine learning techniques.
This post addresses a similar question but does not have a solution. The user there posted the following code snippet, which may be of interest (to be honest I have not tested it; I am just putting it here for your convenience).
import cv2
import numpy as np

def rmv_lines(image_path):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150, apertureSize=3)
    minLineLength, maxLineGap = 100, 15
    # pass the length/gap limits as keywords; positionally they would land
    # in the wrong parameters of HoughLinesP
    lines = cv2.HoughLinesP(edges, 1, np.pi/180, 100,
                            minLineLength=minLineLength, maxLineGap=maxLineGap)
    for line in lines:
        for x1, y1, x2, y2 in line:
            # if x1 != x2 and y1 != y2:
            cv2.line(img, (x1, y1), (x2, y2), (255, 255, 255), 4)
    return cv2.imwrite('removed.jpg', img)
I would prefer the final documents be in pdf format if possible.
(disclaimer: I am the author of pText, the library being used in this answer)
I can help you part of the way (extracting the images from the PDF).
Start by loading the Document.
You'll see that I'm passing an extra parameter to the PDF.loads method.
SimpleImageExtraction acts like an EventListener for PDF instructions. Whenever it encounters an instruction that would render an image, it intercepts the instruction and stores the image.
with open(file, "rb") as pdf_file_handle:
    l = SimpleImageExtraction()
    doc = PDF.loads(pdf_file_handle, [l])
Now that the Document is loaded, and SimpleImageExtraction has had a chance to work its magic, we can output the images. In this example I'm just going to store them.
for i, img in enumerate(l.get_images_per_page(0)):
    output_file = "image_" + str(i) + ".jpg"
    with open(output_file, "wb") as image_file_handle:
        img.save(image_file_handle)
You can obtain pText either from GitHub or via PyPI.
There are a ton more examples; check them out to find out more about working with images.
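As for removing the gridlines themselves once you have the page images, a standard OpenCV approach (a sketch of my own, not from either post; the kernel lengths are guesses you would tune, and skewed scans would benefit from deskewing first) is to isolate long horizontal and vertical strokes with rectangular structuring elements and paint them out:
import cv2

img = cv2.imread('page.png')  # hypothetical extracted page image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# invert-threshold so ink (text and grid) becomes white on black
binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]

# long thin kernels respond only to horizontal/vertical lines, not text
horiz = cv2.morphologyEx(binary, cv2.MORPH_OPEN,
                         cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1)))
vert = cv2.morphologyEx(binary, cv2.MORPH_OPEN,
                        cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40)))

# paint the detected grid white in the original image
grid = cv2.bitwise_or(horiz, vert)
img[grid > 0] = 255
cv2.imwrite('no_grid.png', img)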

Create new raster (.tif) from standard-deviation-stretched bands; works with dstack but not when writing a new file, Python

I am sorry if the title is unclear; I am new to Python and my vocabulary is limited.
What I am trying to do is apply a standard deviation stretch to each band in a .tif raster and then create a new raster (.tif) by stacking those bands using GDAL (Python).
I am able to create new false-color rasters with differing band combinations and save them, and I am able to create my desired image in Python using dstack (first block of code), but I am unable to save that image as a georectified .tif file.
So to create the stretched image using dstack my code looks like:
import os
import numpy as np
import matplotlib.pyplot as plt
import math
from osgeo import gdal

# code from my prof
def std_stretch_data(data, n=2):
    """Applies an n-standard-deviation stretch to data."""
    # Get the mean and n standard deviations.
    mean, d = data.mean(), data.std() * n
    # Calculate new min and max as integers. Make sure the min isn't
    # smaller than the real min value, and the max isn't larger than
    # the real max value.
    new_min = math.floor(max(mean - d, data.min()))
    new_max = math.ceil(min(mean + d, data.max()))
    # Convert any values smaller than new_min to new_min, and any
    # values larger than new_max to new_max.
    data = np.clip(data, new_min, new_max)
    # Scale the data.
    data = (data - data.min()) / (new_max - new_min)
    return data

# open the raster
img = gdal.Open(r'/Users/Rebekah/ThesisData/TestImages/OG/OG_1234.tif')
# open the bands
red = img.GetRasterBand(1).ReadAsArray()
green = img.GetRasterBand(2).ReadAsArray()
blue = img.GetRasterBand(3).ReadAsArray()
# create an alpha band where 0 marks a transparent pixel and 1 an opaque pixel
# (this is from class and I don't FULLY understand it)
alpha = np.where(red + green + blue == 0, 0, 1).astype(np.byte)
red_stretched = std_stretch_data(red, 1)
green_stretched = std_stretch_data(green, 1)
blue_stretched = std_stretch_data(blue, 1)
data_stretched = np.dstack((red_stretched, green_stretched, blue_stretched, alpha))
plt.imshow(data_stretched)
plt.show()
And that gives me a beautiful image of exactly what I want in a separate window. But nowhere in that code is an option to assign a projection or save it as a multiband .tif.
So I took that and applied it as best I could to the code I use to create false-color images, and it fails (code below). If I create a 4-band .tif with the alpha band, the output is an empty .tif; if I create a 3-band .tif and omit the alpha band, the output is an entirely black .tif.
import os
import numpy as np
import matplotlib.pyplot as plt
import math
from osgeo import gdal

# code from my professor
def std_stretch_data(data, n=2):
    """Applies an n-standard-deviation stretch to data."""
    # Get the mean and n standard deviations.
    mean, d = data.mean(), data.std() * n
    # Calculate new min and max as integers. Make sure the min isn't
    # smaller than the real min value, and the max isn't larger than
    # the real max value.
    new_min = math.floor(max(mean - d, data.min()))
    new_max = math.ceil(min(mean + d, data.max()))
    # Convert any values smaller than new_min to new_min, and any
    # values larger than new_max to new_max.
    data = np.clip(data, new_min, new_max)
    # Scale the data.
    data = (data - data.min()) / (new_max - new_min)
    return data

# open image
img = gdal.Open(r'/Users/Rebekah/ThesisData/TestImages/OG/OG_1234.tif')
# get the GTiff driver
gtiff_driver = gdal.GetDriverByName('GTiff')
# read in bands
red = img.GetRasterBand(1).ReadAsArray()
green = img.GetRasterBand(2).ReadAsArray()
blue = img.GetRasterBand(3).ReadAsArray()
# create an alpha band where 0 marks a transparent pixel and 1 an opaque pixel
# (this is from class and I don't FULLY understand it)
alpha = np.where(red + green + blue == 0, 0, 1).astype(np.byte)
# apply the 1-standard-deviation stretch
red_stretched = std_stretch_data(red, 1)
green_stretched = std_stretch_data(green, 1)
blue_stretched = std_stretch_data(blue, 1)
# create an empty tif file
NewImg = gtiff_driver.Create('/Users/riemann/ThesisData/TestImages/FCI_tests/1234_devst1.tif',
                             img.RasterXSize, img.RasterYSize, 4, gdal.GDT_Byte)
if NewImg is None:
    raise IOError('could not create new raster')
# set the projection and geotransform of the new raster to match the original
NewImg.SetProjection(img.GetProjection())
NewImg.SetGeoTransform(img.GetGeoTransform())
# write the new bands to the new raster
band1 = NewImg.GetRasterBand(1)
band1.WriteArray(red_stretched)
band2 = NewImg.GetRasterBand(2)
band2.WriteArray(green_stretched)
band3 = NewImg.GetRasterBand(3)
band3.WriteArray(blue_stretched)
alpha_band = NewImg.GetRasterBand(4)
alpha_band.WriteArray(alpha)
del band1, band2, band3, img, alpha_band
I am not entirely sure where to go from here to create a new file displaying the stretch on the different bands.
The image is just a 4-band raster (NAIP) downloaded from EarthExplorer. I can upload the specific image I am using for my test if needed, but there is nothing inherently special about this file compared to other NAIP images.
You should close the new Dataset (NewImg) as well, by either adding it to the del list you already have or setting it to None.
That properly closes the file and makes sure all data is written to disk.
There is, however, another issue: you are scaling your data between 0 and 1 but storing it as a Byte. So either change the output datatype from gdal.GDT_Byte to something like gdal.GDT_Float32, or multiply your scaled data to fit the output datatype; in the case of Byte, multiply by 255 (don't forget the alpha). You should also round it properly for accuracy, as GDAL will otherwise simply truncate the values.
You can use np.iinfo() to check what the range of a datatype is, in case you are unsure what multiplication to use for other datatypes.
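Putting both fixes together, the tail of the script from the question might look like this (a sketch that keeps gdal.GDT_Byte and reuses the variables defined above):
# scale the 0-1 floats into the Byte range, rounding before the cast
band1.WriteArray(np.round(red_stretched * 255).astype(np.uint8))
band2.WriteArray(np.round(green_stretched * 255).astype(np.uint8))
band3.WriteArray(np.round(blue_stretched * 255).astype(np.uint8))
alpha_band.WriteArray((alpha * 255).astype(np.uint8))

# closing the output Dataset flushes everything to disk
NewImg = None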
Depending on your use case, it might be easiest to use gdal.Translate for the scaling. If you modified your scaling function a little to return the scaling parameters instead of the data, you could use something like:
ds = gdal.Translate(output_file, input_file, outputType=gdal.GDT_Byte, scaleParams=[
[old_min_r, old_max_r, new_min_r, new_max_r], # red
[old_min_g, old_max_g, new_min_g, new_max_g], # green
[old_min_b, old_max_b, new_min_b, new_max_b], # blue
[old_min_a, old_max_a, new_min_a, new_max_a], # alpha
])
ds = None
You could also add the exponents keyword for non-linear stretching.
Using gdal.Translate would save you all the standard file-creation boilerplate, but you would still need to think about the datatype, since it might change compared to the input file.
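For example, the stretch function could be trimmed to return the bounds instead of the rescaled array (a hedged sketch; std_stretch_params is a made-up name, and since gdal.Translate rescales the bands of the input dataset itself, this only covers the three colour bands via bandList):
import math
from osgeo import gdal

def std_stretch_params(data, n=2):
    """Return [src_min, src_max, dst_min, dst_max] for an n-std-dev stretch."""
    mean, d = data.mean(), data.std() * n
    src_min = math.floor(max(mean - d, data.min()))
    src_max = math.ceil(min(mean + d, data.max()))
    return [src_min, src_max, 0, 255]  # map the stretched range into Byte

scale_params = [std_stretch_params(b, 1) for b in (red, green, blue)]
ds = gdal.Translate('/tmp/stretched.tif', img, bandList=[1, 2, 3],
                    outputType=gdal.GDT_Byte, scaleParams=scale_params)
ds = None  # close to flush to disk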

'numpy.ndarray' object has no attribute 'write'

I am writing Python code to calculate the background of an astronomical image of the globular cluster M15 (M15 reduced). My code can calculate the background and plot it using plt.imshow(). To save the background-subtracted image I have to convert it from a numpy.ndarray to a str. I have tried many things, including the np.array2string used here, but the data just stays as an array, which I can't save as the .fits file I need. Any ideas how to get this to a str?
The code:
# imports added for completeness
from pathlib import Path
import numpy as np
import ccdproc as ccdp
from astropy.nddata import CCDData
from astropy.stats import SigmaClip
from photutils.background import Background2D, MedianBackground

# sigma_clip is the number of standard deviations from the centre value
# that a value can be before being rejected
sigma_clip = SigmaClip(sigma=2.)
# used to estimate the background in each of the meshes
bkg_estimator = MedianBackground()
# define path for reading in images
M15red_path = Path('.', 'ObservingData/M15normalised/')
M15red_images = ccdp.ImageFileCollection(M15red_path)
M15reduced = M15red_images.files_filtered(imagetyp='Light Frame', include_path=True)
M15backsub_path = Path('.', 'ObservingData/M15backsub/')
for n in range(0, 59):
    bkg = Background2D(CCDData.read(M15reduced[n]).data, box_size=(20, 20),
                       filter_size=(3, 3),
                       edge_method='pad',
                       sigma_clip=sigma_clip,
                       bkg_estimator=bkg_estimator)
    M15subback = CCDData.read(M15reduced[n]).data - bkg.background
    np.array2string(M15subback)
    # M15subback.write(M15backsub_path / 'M15backsub{}.fits'.format(n))
    print(type(M15subback[1]))
You could try using numpy.save (but it saves a '.npy' file). In your case:
import numpy as np
...
for n in range(0, 59):
    ...
    np.save('M15backsub{}.npy'.format(n), M15subback)  # note: M15subback, as in the question
Since you need to store a numpy array, this should work.
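If the end goal is specifically a .fits file rather than .npy, another option (my addition, not from the original answer; it assumes astropy, which CCDData already comes from) is to wrap the array in a FITS HDU and write it directly, no string conversion needed:
from astropy.io import fits

# inside the loop, write the background-subtracted array as a FITS image
fits.PrimaryHDU(M15subback).writeto(
    str(M15backsub_path / 'M15backsub{}.fits'.format(n)), overwrite=True)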

Change gif characteristics

I have written Python code which creates a gif from a list of images. In order to do this, I used the Python library imageio. Here is my code:
# imports added for completeness
import imageio
from os.path import splitext

def create_gif(files, gif_path):
    """Creates an animated gif from a list of figures

    Args:
        files (list of str): list of the files that are to be used for the
            gif creation. All files should have the same extension, which
            should be either png or jpeg
        gif_path (str): path where the created gif is to be saved

    Raise:
        ValueError: if the files given in argument don't have the proper
            file extension (".png" or ".jpeg" for the images in 'files',
            and ".gif" for 'gif_path')
    """
    images = []
    for image in files:
        # Make sure that the file is a ".png" or a ".jpeg" one
        if splitext(image)[-1] == ".png" or splitext(image)[-1] == ".jpeg":
            pass
        elif splitext(image)[-1] == "":
            image += ".png"
        else:
            raise ValueError("Wrong file extension ({})".format(image))
        # Read the image with imageio and put it into the images list
        images.append(imageio.imread(image))
    # Make sure that the file is a ".gif" one
    if splitext(gif_path)[-1] == ".gif":
        pass
    elif splitext(gif_path)[-1] == "":
        gif_path += ".gif"
    else:
        raise ValueError("Wrong file extension ({})".format(gif_path))
    # imageio writes all the images to a .gif file at gif_path
    imageio.mimsave(gif_path, images)
When I try this code with a list of images, the gif is correctly created, but I have no idea how to change its parameters.
What I mean by that is that I would like to be able to control the delay between the gif's frames, and also to control how long the gif runs.
I have tried to open my gif with the Image module from PIL and change its info, but when I save it my gif turns into just my first image.
Could you please help me understand what I am doing wrong?
Here is the code that I ran to try to change the gif parameters:
# Try to change gif parameters
my_gif = Image.open(my_gif.name)
my_gif_info = my_gif.info
print(my_gif_info)
my_gif_info['loop'] = 65535
my_gif_info['duration'] = 100
print(my_gif.info)
my_gif.save('./generated_gif/my_third_gif.gif')
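For what it's worth, the reason the PIL attempt collapses to the first image is that Image.save writes only the current frame unless you pass save_all=True with the remaining frames in append_images; PIL also expects duration in milliseconds. A sketch of that route (paths are hypothetical):
from PIL import Image, ImageSequence

my_gif = Image.open('my_gif.gif')
frames = [frame.copy() for frame in ImageSequence.Iterator(my_gif)]
# save_all + append_images keeps every frame; duration is per-frame in ms
frames[0].save('./generated_gif/my_third_gif.gif', save_all=True,
               append_images=frames[1:], duration=100, loop=0)
That said, with imageio you don't need PIL at all.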
You can just pass both parameters, loop and duration, to the mimsave/mimwrite method.
imageio.mimsave(gif_name, fileList, loop=4, duration = 0.3)
Next time you want to check which parameters can be used for a format compatible with imageio, you can just use imageio.help(format_name).
imageio.help("gif")
GIF-PIL - Static and animated gif (Pillow)
A format for reading and writing static and animated GIF, based
on Pillow.
Images read with this format are always RGBA. Currently,
the alpha channel is ignored when saving RGB images with this
format.
Parameters for reading
----------------------
None
Parameters for saving
---------------------
loop : int
The number of iterations. Default 0 (meaning loop indefinitely).
duration : {float, list}
The duration (in seconds) of each frame. Either specify one value
that is used for all frames, or one value for each frame.
Note that in the GIF format the duration/delay is expressed in
hundredths of a second, which limits the precision of the duration.
fps : float
The number of frames per second. If duration is not given, the
duration for each frame is set to 1/fps. Default 10.
palettesize : int
The number of colors to quantize the image to. Is rounded to
the nearest power of two. Default 256.
subrectangles : bool
If True, will try and optimize the GIF by storing only the
rectangular parts of each frame that change with respect to the
previous. Default False.

Tesseract OCR is not working on images which have text of length 2 or less; works fine for images with text length greater than 3

import pytesseract
from PIL import Image

def textFromTesseractOCR(croppedImage):
    for i in range(14):
        # note: the deprecated boxes argument has been dropped here
        text = pytesseract.image_to_string(croppedImage, lang='eng',
                                           config='--psm ' + str(i) + ' --oem 3')
        print("PSM Mode", i)
        print("Text detected: ", text)

imgPath = "ImagePath"  # you can use the image I have uploaded
img = Image.open(imgPath)
textFromTesseractOCR(img)
I am working on extracting table data from PDFs. For this I am converting the pdf to png, detecting lines, ascertaining the table from the line intersections, and then cropping the individual cells to get their text.
This all works fine, but tesseract does not work on cell images that contain text of length 2 or less.
Works for this image:
Result from tesseract:
Does not work for this image:
Result from tesseract: returns an empty string.
It also returns empty for numbers of length 2 or less.
I have tried resizing the image (which I knew wouldn't work) and also tried appending dummy text to the image, but the results were bad (it worked only for a few cases, and I didn't know the exact location to append the dummy text in the image).
It would be great if someone could help me with this.
So I finally came up with a workaround for this situation, the situation being tesseract-OCR giving an empty string when the image contains only a 1- or 2-character string (e.g. "1" or "25").
To get output in this situation, I append the same image to the original several times so as to make its text length greater than 2. For example, if the original image contained only "3", I append the "3" image (the same image) 4 more times, thereby making an image that contains the text "33333". We then give this image to tesseract, which outputs "33333" (most of the time). Then we just strip the spaces from the tesseract output and take the first fifth of the resulting string (its length divided by 5) as the final text.
Please see code for reference, hope this helps:
import pytesseract  ## pip3 install pytesseract
import cv2          ## pip3 install opencv-python (used by the workaround below)
import numpy as np
Method which calls tesseract for OCR or calls our workaround code if we get an empty string from tesseract output.
def textFromTesseractOCR(croppedImage):
    text = pytesseract.image_to_string(croppedImage)
    if text.strip() == '':  # program that handles our problem
        if 0 not in croppedImage:
            return ""
        yDir = 3
        xDir = 3
        iterations = 4
        img = generate_blocks_dilation(croppedImage, yDir, xDir, iterations)
        # we dilate to get only the text portion of the image, not the whole image
        kernelH = np.ones((1, 5), np.uint8)
        kernelV = np.ones((5, 1), np.uint8)
        img = cv2.dilate(img, kernelH, iterations=1)
        img = cv2.dilate(img, kernelV, iterations=1)
        image = cropOutMyImg(img, croppedImage)
        # append the cropped text image to itself 4 times -> 5 copies side by side
        concateImg = np.concatenate((image, image), axis=1)
        concateImg = np.concatenate((concateImg, image), axis=1)
        concateImg = np.concatenate((concateImg, image), axis=1)
        concateImg = np.concatenate((concateImg, image), axis=1)
        textA = pytesseract.image_to_string(concateImg)
        textA = textA.strip()
        textA = textA.replace(" ", "")
        textA = textA[0:int(len(textA) / 5)]
        return textA
    return text
Method for dilation. This method dilates only the text region of the image:
def generate_blocks_dilation(img, yDir, xDir, iterations):
    kernel = np.ones((yDir, xDir), np.uint8)
    ret, img = cv2.threshold(img, 0, 1, cv2.THRESH_BINARY_INV)
    return cv2.dilate(img, kernel, iterations=iterations)
Method to crop the dilated part of the image
def cropOutMyImg(gray, OrigImg):
    mask = np.zeros(gray.shape, np.uint8)  # mask for the final image, without small pieces
    # note: this is the OpenCV 3 findContours signature;
    # OpenCV 4 returns only (contours, hierarchy)
    _, contours, hierarchy = cv2.findContours(gray, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    for cnt in contours:
        if cv2.contourArea(cnt) != 0:
            # the [] around cnt and the 3rd argument 0 mean only this contour is drawn
            cv2.drawContours(mask, [cnt], 0, 255, -1)
            # build an ROI to crop the region
            x, y, w, h = cv2.boundingRect(cnt)
            roi = mask[y:y + h, x:x + w]
            # crop the original image based on the ROI
            QR_crop = OrigImg[y:y + h, x:x + w]
            # use the cropped mask image (roi) to get rid of all small pieces
            QR_final = QR_crop * (roi / 255)
    return QR_final
I tried running tesseract on the two given images, but it does not return text for the shorter-text image.
Another thing you can try is to train a machine learning model (probably a neural net) on alphabets, numbers and special characters; then, when you want to get text from an image, feed that image to the model and it will predict the text/characters.
The training dataset would look like pairs of (image of a character, 'character').
The first element of the pair is the independent variable for the model.
The second element of the pair is the corresponding character present in that image; it is the dependent variable for the model.
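A lighter-weight thing to try before training a model (my addition, not from the answers above): tesseract often fails on tightly cropped one- or two-character cells, and padding the crop with a white border plus an appropriate page segmentation mode can be enough. A sketch:
import cv2
import pytesseract

cell = cv2.imread('cell.png', cv2.IMREAD_GRAYSCALE)  # hypothetical cropped cell
# give tesseract some empty margin around the glyphs
padded = cv2.copyMakeBorder(cell, 20, 20, 20, 20,
                            cv2.BORDER_CONSTANT, value=255)
# --psm 7 treats the image as a single text line;
# --psm 10 (single character) may work better for one-character cells
text = pytesseract.image_to_string(padded, config='--psm 7')
print(text.strip())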
