Google Vision API returns empty bounding box vertices and populates normalized_vertices instead - python-3.x

I am using vision.enums.Feature.Type.DOCUMENT_TEXT_DETECTION to extract some dense text in a pdf document. Here is my code:
from google.cloud import vision

# Client used to send the request below
vision_client = vision.ImageAnnotatorClient()


def extract_text(bucket, filename, mimetype):
    print('Looking for text in PDF {}'.format(filename))
    # BATCH_SIZE; How many pages should be grouped into each json output file.
    # """OCR with PDF/TIFF as source files on GCS"""
    # Detect text
    feature = vision.types.Feature(
        type=vision.enums.Feature.Type.DOCUMENT_TEXT_DETECTION)
    # Extract text from source bucket
    gcs_source_uri = 'gs://{}/{}'.format(bucket, filename)
    gcs_source = vision.types.GcsSource(uri=gcs_source_uri)
    input_config = vision.types.InputConfig(
        gcs_source=gcs_source, mime_type=mimetype)
    request = vision.types.AnnotateFileRequest(features=[feature], input_config=input_config)
    print('Waiting for the OCR operation to finish.')
    ocr_response = vision_client.batch_annotate_files(requests=[request])
    print('OCR completed.')
In the response, I expect to find a filled-in list of vertices in ocr_response.responses[1...n].pages[1...n].blocks[1...n].bounding_box, but this list is empty. Instead, there is a normalized_vertices list, which holds the vertices normalised to the range 0 to 1. Why is that? Why is the vertices structure empty?
I am following this article, and the author there uses vertices, but I don't understand why I don't get them.
To convert them to the non-normalised form, I multiply each normalised vertex by the height and width, but the result is awful: the boxes are not well positioned.

To convert a NormalizedVertex to a Vertex, multiply the x field of the NormalizedVertex by the width to get the x field of the Vertex, and multiply the y field of the NormalizedVertex by the height to get the y field of the Vertex.
You get NormalizedVertex values while the author of the Medium article gets Vertex values because the TEXT_DETECTION and DOCUMENT_TEXT_DETECTION models were upgraded to newer versions as of May 15, 2020, whereas the Medium article was written on Dec 25, 2018.
To get the old model results, you must specify "builtin/legacy_20190601" in the model field of a Feature object.
However, Google's documentation mentions that the old models will no longer be offered after November 15, 2020.
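The conversion described above can be sketched as follows. This is a minimal illustration, not the library's own code: NormalizedVertex here is a hypothetical stand-in namedtuple with x and y fields, and page_width and page_height are assumed to be the dimensions of the image the boxes will be drawn on.

```python
from collections import namedtuple

# Hypothetical stand-in for the API's NormalizedVertex message (x and y in 0..1)
NormalizedVertex = namedtuple("NormalizedVertex", ["x", "y"])


def to_pixel_vertices(normalized_vertices, page_width, page_height):
    """Scale normalized (0..1) vertices to integer pixel coordinates."""
    return [
        (int(round(v.x * page_width)), int(round(v.y * page_height)))
        for v in normalized_vertices
    ]


# Example bounding box with made-up values, on an assumed 1000 x 800 image
box = [
    NormalizedVertex(0.25, 0.5),
    NormalizedVertex(0.75, 0.5),
    NormalizedVertex(0.75, 0.625),
    NormalizedVertex(0.25, 0.625),
]
pixels = to_pixel_vertices(box, page_width=1000, page_height=800)
print(pixels)  # [(250, 400), (750, 400), (750, 500), (250, 500)]
```

If the resulting boxes look misplaced, it may be worth checking that the width and height used for scaling belong to the same image the boxes are drawn on.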

Related

Flat-field correction on hyperspectral data

I am working on a hyperspectral data set using the Spectral Python library. I started using Python for the first time on Monday, so everything is taking me a long time.
My data is in ENVI format, and I believe I have successfully read it in and converted it to numpy arrays.
I am attempting a flat-field correction using this code:
corrected_nparr = np.divide(np.subtract(data_nparr, dark_nparr), np.subtract(white_nparr, dark_nparr))
ValueError: operands could not be broadcast together with shapes (1367,384,288) (100,384,288)
This doesn't work because my white reference and dark reference are a different size from the data capture.
print(white_nparr.shape)
(297, 384, 288)
print(dark_nparr.shape)
(100, 384, 288)
print(data_nparr.shape)
(1367, 384, 288)
So, I understand why I am getting the error. The original white and dark references were captured using different image sizes from the dataset. So, my problem is creating a correction for the dataset whilst only having access to references of different sizes.
Has anyone handled this before? What approach did you use?
By the way, the data I am using is mineral hyperspectral data captured from drill core; there is a huge dataset held by Geological Survey Ireland, and it is free upon request.
So, I received an extremely helpful answer, which actually sparked a further question.
# created these files to broadcast as they are a horizontal line of spectra,
#a 2D array which captures the variation
white_nparr_horiz = white_nparr[-2]
dark_nparr_horiz = dark_nparr[-2]
corrected_nparr = np.divide(np.subtract(data_nparr, dark_nparr_horiz), np.subtract(white_nparr_horiz, dark_nparr_horiz))
white_nparr_horiz.shape
Out[28]: (384, 288)
dark_nparr_horiz.shape
Out[29]: (384, 288)
So the shapes of these arrays are broadcastable across the data array, and I have tested on a few different indices that the correction works as I expect:
a = white_nparr_horiz[150, 144]
b = dark_nparr_horiz[150, 144]
c = data_nparr[500, 150, 144]
d = (c - b)/(a-b)
test = d == corrected_nparr[500, 150, 144]
print(test)
The output from this looks much more as I would expect reflectance data for this material to look, so I believe I am on the right path.
What I would like to do now is have white_nparr_horiz be the mean of each band along the original first axis in the white_ref (297, 384, 288), returned in an array of (384, 288), as opposed to a single value as I believe it is now. I am sure that this is possible, but I cannot figure out how.
As I said above, I am very new to Python, numpy and image analysis, so apologies if this is obvious or if I am going in the wrong direction.
The problem is that your white and dark references should each be a single spectrum (1D array with 288 values), whereas yours are both 3-dimensional arrays (likely corresponding to image regions). To convert them to 1D, you can compute the mean, max, or min of each array, as appropriate. For example, to take the min of the dark reference and max of the white reference, you could convert them as follows:
dark_nparr = np.min(dark_nparr.reshape(-1, dark_nparr.shape[-1]), axis=0)
white_nparr = np.max(white_nparr.reshape(-1, white_nparr.shape[-1]), axis=0)
The lines above reshape the arrays to 2 dimensions and compute the max (or min) of the reshaped arrays.
If you prefer to use the spectral mean of each array instead, just replace np.max and np.min above with np.mean.
If you instead want each array to be reduced over its first dimension only (i.e., to have shape (384, 288)), don't reshape the arrays before the reduction. As above, replace np.min and np.max with np.mean to get the mean along the first axis:
dark_nparr = np.min(dark_nparr, axis=0)
white_nparr = np.max(white_nparr, axis=0)
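Putting the pieces together, a flat-field correction that averages each reference over its first axis might look like the sketch below. The function name flat_field_correct and the small synthetic arrays are assumptions for illustration; the zero-denominator guard is an added safety choice, not part of the original answer.

```python
import numpy as np


def flat_field_correct(data, white, dark):
    """Flat-field correct `data` using references averaged along axis 0."""
    white_mean = white.mean(axis=0)  # shape (samples, bands)
    dark_mean = dark.mean(axis=0)    # shape (samples, bands)
    denom = white_mean - dark_mean
    # Mark pixels where white == dark as NaN instead of dividing by zero
    denom = np.where(denom == 0, np.nan, denom)
    # (samples, bands) broadcasts across the leading lines axis of `data`
    return (data - dark_mean) / denom


# Small synthetic stand-ins for the (lines, samples, bands) arrays
data = np.full((5, 4, 3), 6.0)
white = np.full((3, 4, 3), 10.0)  # a different number of lines is fine
dark = np.full((2, 4, 3), 2.0)

corrected = flat_field_correct(data, white, dark)
print(corrected.shape)
```

Because the averaged references have shape (samples, bands), the number of lines in the white and dark captures no longer has to match the data capture.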

combine overlapping labelled objects and modify label values

I have a Z-stack of 2D confocal microscopy images (2D slices) and I want to segment cells. The Z-stack of 2D images is effectively 3D data. The same cells appear in multiple slices along the Z-axis. I am interested in the cell shape in XY, so I want to preserve the largest cell area across the different Z-axis slices. I thought of combining consecutive 2D slices after converting them to labelled binary images, but I am having a few issues and need some help to proceed.
I have two images, img_a and img_b. I first converted them to binary images using OTSU, then applied some morphological operations, and then used cv2.connectedComponentsWithStats() to obtain labelled objects. After labelling the images, I combined them using cv2.bitwise_or(), but this messes up the labels. You can see this in the attached processed image (cells highlighted by red circles): there are multiple labels for an overlapping cell. However, I want to assign one unique label to every combined overlapping object.
What I want at the end is that when I combine two labelled images, I want to assign one single label (a unique value) to the combined overlapping objects and keep the largest cell area by combining both images. Does anyone know how to do it?
Here is the code:
from matplotlib import pyplot as plt
from skimage import io, color, measure
from skimage.util import img_as_ubyte
from skimage.segmentation import clear_border
import cv2
import numpy as np
cells_a=img_a[:,:,1] # get the green channel
#Threshold image to binary using OTSU.
ret_a, thresh_a = cv2.threshold(cells_a, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)
# Morphological operations to remove small noise - opening
kernel = np.ones((3,3),np.uint8)
opening_a = cv2.morphologyEx(thresh_a,cv2.MORPH_OPEN,kernel, iterations = 2)
opening_a = clear_border(opening_a) # Remove objects touching the image border
numlabels_a, labels_a, stats_a, centroids_a = cv2.connectedComponentsWithStats(opening_a)
img_a1 = color.label2rgb(labels_a, bg_label=0)
## now do the same with image_b
cells_b=img_b[:,:,1] # get the green channel
#Threshold image to binary using OTSU.
ret_b, thresh_b = cv2.threshold(cells_b, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)
# Morphological operations to remove small noise - opening
opening_b = cv2.morphologyEx(thresh_b,cv2.MORPH_OPEN,kernel, iterations = 2)
opening_b = clear_border(opening_b) # Remove objects touching the image border
numlabels_b, labels_b, stats_b, centroids_b = cv2.connectedComponentsWithStats(opening_b)
img_b1 = color.label2rgb(labels_b, bg_label=0)
## Now combined two images
combined = cv2.bitwise_or(labels_a, labels_b) ## combined both labelled images to get maximum area per cell
combined_img = color.label2rgb(combined, bg_label=0)
plt.imshow(combined_img)
Images can be found here:
Based on the comments from Christoph Rackwitz and beaker, I started to look around for 3D connected components labelling. I found a Python library that can handle this, installed it and gave it a try. It seems to work pretty well. It assigns labels in each slice and keeps the same label for the same cell in different slices. This is exactly what I wanted.
Here is the link to the library that I used to label objects in 3D:
https://pypi.org/project/connected-components-3d/
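The idea of labelling the stack in 3D can be sketched as follows. Note the substitution: this sketch uses scipy.ndimage.label rather than the connected-components-3d library linked above, only to keep the example small. The function name and the toy masks are assumptions, and the maximum projection assumes that different cells never overlap in XY across slices.

```python
import numpy as np
from scipy import ndimage


def label_stack_and_project(masks):
    """Label binary slices as one 3D volume, then project the labels to 2D."""
    stack = np.stack(masks, axis=0).astype(bool)  # shape (z, y, x)
    # A cell that overlaps itself in adjacent slices receives a single label
    labels_3d, num = ndimage.label(stack)
    # 2D footprint: the union over z of every labelled cell
    footprint = labels_3d.max(axis=0)
    return labels_3d, footprint, num


# Two toy binary slices: one cell shifts between slices, one appears only in slice a
mask_a = np.zeros((10, 10), dtype=bool)
mask_b = np.zeros((10, 10), dtype=bool)
mask_a[1:4, 1:4] = True
mask_a[6:8, 6:8] = True
mask_b[2:5, 2:5] = True

labels_3d, footprint, num = label_stack_and_project([mask_a, mask_b])
areas = np.bincount(footprint.ravel())[1:]  # pixel count per label, background dropped
print(num, sorted(areas.tolist()))
```

Combining the binary masks (or labelling them jointly, as here) and only then assigning labels avoids the mixed label values produced by applying bitwise_or to two independently labelled images.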

OSMNx : get coordinates of nodes/corners/edges of polygons/buildings

I am trying to retrieve the coordinates of all nodes/corners/edges of each commercial building in a list. E.g. for the Aldi supermarket in Macclesfield (UK), I can get 10 nodes from the UI (all the corners/edges of the supermarket), but I can only retrieve 2 of those 10 nodes from osmnx. I need access to the complete list of nodes, but the results are truncated to only 2 of the 10 nodes in this case. I am using the code below:
import osmnx as ox
test = ox.geocode_to_gdf('aldi, Macclesfield, Cheshire, GB')
ax = ox.project_gdf(test).plot()
test.geometry
or
gdf = ox.geometries_from_place('Grosvenor, Macclesfield, Cheshire, GB', tags)
gdf.geometry
Both return just two coordinates and truncate the other info/results that are available in the OpenStreetMap UI (you can see it in the first column of the attached image: geometry > POLYGON > only two coordinates, with the other results truncated...). I would appreciate some help on this, thanks in advance.
It's hard to guess what you're doing here because you didn't provide a reproducible example (e.g., tags is undefined). But I'll try to guess what you're going for.
I am trying to retrieve the coordinates of all nodes/corners/edges of commercial buildings
Here I retrieve all the tagged commercial building footprints in Macclesfield, then extract the first one's polygon coordinates. You could instead filter these by other attribute values as you see fit if you only want certain kinds of buildings. Proper usage of OSMnx's geometries module is described in the documentation.
import osmnx as ox
# get the building footprints in Macclesfield
place = 'Macclesfield, Cheshire, England, UK'
tags = {'building': 'commercial'}
gdf = ox.geometries_from_place(place, tags)
# how many did we get?
print(gdf.shape) # (57, 10)
# extract the coordinates for the first building's footprint
gdf.iloc[0]['geometry'].exterior.coords
Alternatively, if you want a specific building's footprint, you can look up its OSM ID and tell OSMnx to geocode that value:
gdf = ox.geocode_to_gdf('W251154408', by_osmid=True)
polygon = gdf.iloc[0]['geometry']
polygon.exterior.coords
gdf = ox.geocode_to_gdf('W352332709', by_osmid=True)
polygon = gdf.iloc[0]['geometry']
polygon.exterior.coords
list(polygon.exterior.coords)

How to change only array for Dicom file with Simple ITK in python

I have a bunch of medical images in dicom that I want to correct for bias field inhomogeneity using SimpleITK in Python. The workflow is straightforward: I want to (1) open the dicom image, (2) create a binary mask of the object in the image, (3) apply N4 bias field correction to the masked image, (4) write back the corrected image in dicom format. Note that no spatial transformation is applied to the image, but only intensity transformation, so that I could copy all spatial information and all meta data (except for date/hour of creation and instance number) from the original to the corrected image.
I have written this function to achieve my goal:
import time
from pathlib import PurePath
from random import randint

import SimpleITK as sitk


def n4_dcm_correction(dcm_in_file):
    # Tags set explicitly below: creation date, creation time, instance number
    metadata_to_set = ["0008|0012", "0008|0013", "0020|0013"]
    filepath = PurePath(dcm_in_file)
    root_dir = str(filepath.parent)
    file_name = filepath.stem
    dcm_reader = sitk.ImageFileReader()
    dcm_reader.SetFileName(dcm_in_file)
    dcm_reader.LoadPrivateTagsOn()
    inputImage = dcm_reader.Execute()
    metadata_to_copy = [k for k in inputImage.GetMetaDataKeys() if k not in metadata_to_set]
    # Binary mask of the object, with holes filled
    maskImage = sitk.OtsuThreshold(inputImage, 0, 1, 200)
    filledImage = sitk.BinaryFillhole(maskImage)
    floatImage = sitk.Cast(inputImage, sitk.sitkFloat32)
    corrector = sitk.N4BiasFieldCorrectionImageFilter()
    output = corrector.Execute(floatImage, filledImage)
    output.CopyInformation(inputImage)
    for k in metadata_to_copy:
        print("key is: {}; value is {}".format(k, inputImage.GetMetaData(k)))
        output.SetMetaData(k, inputImage.GetMetaData(k))
    output.SetMetaData("0008|0012", time.strftime("%Y%m%d"))
    output.SetMetaData("0008|0013", time.strftime("%H%M%S"))
    # Instance number (0020|0013): offset from the original value
    output.SetMetaData("0020|0013", str(int(inputImage.GetMetaData("0020|0013")) + randint(1, 999)))
    out_file = "{}/{}_biascorrected.dcm".format(root_dir, file_name)
    writer = sitk.ImageFileWriter()
    writer.KeepOriginalImageUIDOn()
    writer.SetFileName(out_file)
    writer.Execute(sitk.Cast(output, sitk.sitkUInt16))
    return


n4_dcm_correction("/path/to/my/dcm/image.dcm")
While the bias correction part works (the bias is removed), the writing part is a mess. I would expect my output dicom to have exactly the same metadata as the original one; however, it is all missing, notably the patient name, the protocol name and the manufacturer. Similarly, something is very wrong with the spatial information: if I try to convert the dicom to the nifti format with dcm2niix, the directions are reversed. Superior is down and inferior is up, forward is back and backward is front. What step am I missing?
I suspect you are working with an MRI series, not a single file. Likely this example does what you want: read-modify-write a volume stored in a set of files.
If the example did not resolve your issue, please post to the ITK discourse which is the primary location for ITK/SimpleITK related discussions.

Creating a map with basemap, filling countries

I'm currently working on my final project for my coding class (my first coding class, so I'm kind of an amateur).
My idea is for the code to search every newspaper in the world for a specific word within the titles (using bs4) and then build a dictionary with the average mentions by country, taking into account the number of newspapers in each country. Afterwards, and this is the part where I'm stuck, I want to put this on a map.
The whole program is already working properly, until the part where I have a CSV with the following form:
'Country','Average'
'Afghanistan',10
'Albania',5
'Algeria',0
'Andorra',2
'Antigua and Barbuda',7
'Argentina',0
'Armenia',4
Now, I want to create a worldmap where the higher the number, the redder (or any other color) the whole polygon of the country. So far I've found many codes that work well placing points in space, but I haven't found one that "appends" the CSV data presented above and then fills each country accordingly. Below is the part of the code that currently created the worldmap:
# Now we proceed with the creation of the map
fig, ax = plt.subplots(figsize=(15,10)) # We define the size of the map
m = Basemap(resolution='c', # c, l, i, h, f or None
            projection='merc', # Mercator projection
            lat_0=24.20, lon_0=-6.67, # The center of the map, so that the whole world is shown without splitting Asia
            llcrnrlon=-180, llcrnrlat=-85, urcrnrlon=180, urcrnrlat=85) # The coordinates of the whole world
m.drawmapboundary(fill_color='#46bcec') # We choose a color for the boundary of the map
m.fillcontinents(color='#f2f2f2',lake_color='#46bcec') # We choose a color for the land and one for the lakes
m.drawcoastlines() # We choose to draw the lines of the map
m.readshapefile('Final project\\vincent_map_data-master\\ne_110m_admin_0_countries\\ne_110m_admin_0_countries', 'areas') # We import the shape file of the whole world
df_poly = pd.DataFrame({ # We define the polygon structure
    'shapes': [Polygon(np.array(shape), True) for shape in m.areas],
    'area': [area['name'] for area in m.areas_info]
})
cmap = plt.get_cmap('Oranges')
pc = PatchCollection(df_poly.shapes, zorder=2)
norm = Normalize()
mapper = matplotlib.cm.ScalarMappable(norm=norm, cmap=cmap)
# We show the map
plt.show()
I opened the shapefile of the countries and the way to identify the countries is with the variable "sovereignty". There might be some non-sensical things within my code, since I've extracted things from many places. Sorry about that.
If someone could help me out, I would really appreciate it.
Thanks
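One possible first step, sketched here under stated assumptions, is to turn the CSV into a country-to-colour lookup that could later be used as the face colour of each country's polygons. The helper names load_averages and to_red_hex are hypothetical, the white-to-red scale is just one choice, and matching the CSV country names to the shapefile's attribute is left open.

```python
import csv
import io

# A few rows in the same single-quoted format as the CSV above
CSV_TEXT = """'Country','Average'
'Afghanistan',10
'Albania',5
'Algeria',0
"""


def load_averages(text):
    """Parse the single-quoted CSV into a {country: average} dictionary."""
    reader = csv.DictReader(io.StringIO(text), quotechar="'")
    return {row["Country"]: float(row["Average"]) for row in reader}


def to_red_hex(value, max_value):
    """Map a value to a hex colour from white (0) to pure red (max_value)."""
    t = value / max_value if max_value > 0 else 0.0
    level = int(255 * (1.0 - t))  # green and blue fade out as the value grows
    return "#ff{0:02x}{0:02x}".format(level)


averages = load_averages(CSV_TEXT)
max_average = max(averages.values())
colors = {country: to_red_hex(v, max_average) for country, v in averages.items()}
print(colors)
```

Each polygon read from the shapefile could then look up its country's entry in colors, with a neutral fallback colour for countries missing from the CSV.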

Resources