I have a bunch of medical images in DICOM format that I want to correct for bias field inhomogeneity using SimpleITK in Python. The workflow is straightforward: I want to (1) open the DICOM image, (2) create a binary mask of the object in the image, (3) apply N4 bias field correction to the masked image, and (4) write the corrected image back in DICOM format. Note that no spatial transformation is applied to the image, only an intensity transformation, so I should be able to copy all spatial information and all metadata (except for the date/time of creation and the instance number) from the original to the corrected image.
I have written this function to achieve my goal:
import time
from pathlib import PurePath
from random import randint

import SimpleITK as sitk


def n4_dcm_correction(dcm_in_file):
    # Tags that get new values instead of being copied:
    # instance creation date, instance creation time, instance number
    metadata_to_set = ["0008|0012", "0008|0013", "0020|0013"]
    filepath = PurePath(dcm_in_file)
    root_dir = str(filepath.parent)
    file_name = filepath.stem
    dcm_reader = sitk.ImageFileReader()
    dcm_reader.SetFileName(str(dcm_in_file))
    inputImage = dcm_reader.Execute()
    metadata_to_copy = [k for k in inputImage.GetMetaDataKeys() if k not in metadata_to_set]
    # Binary mask of the object: Otsu threshold, then fill the holes
    maskImage = sitk.OtsuThreshold(inputImage, 0, 1, 200)
    filledImage = sitk.BinaryFillhole(maskImage)
    floatImage = sitk.Cast(inputImage, sitk.sitkFloat32)
    corrector = sitk.N4BiasFieldCorrectionImageFilter()
    output = corrector.Execute(floatImage, filledImage)
    for k in metadata_to_copy:
        print("key is: {}; value is {}".format(k, inputImage.GetMetaData(k)))
        output.SetMetaData(k, inputImage.GetMetaData(k))
    output.SetMetaData("0008|0012", time.strftime("%Y%m%d"))
    output.SetMetaData("0008|0013", time.strftime("%H%M%S"))
    # Instance number must be an integer string
    output.SetMetaData("0020|0013", str(int(inputImage.GetMetaData("0020|0013")) + randint(1, 999)))
    out_file = "{}/{}_biascorrected.dcm".format(root_dir, file_name)
    writer = sitk.ImageFileWriter()
    writer.SetFileName(out_file)
    writer.Execute(sitk.Cast(output, sitk.sitkUInt16))
The bias correction part works (the bias is removed), but the writing part is a mess. I would expect my output DICOM to have exactly the same metadata as the original; however, it is all missing, notably the patient name, the protocol name, and the manufacturer. Similarly, something is very wrong with the spatial information: if I convert the DICOM to NIfTI format with dcm2niix, the directions are reversed (superior is down and inferior is up, forward is back and backward is front). What step am I missing?

I suspect you are working with an MRI series, not a single file. This example likely does what you want: it reads, modifies, and writes a volume stored in a set of files.
If the example did not resolve your issue, please post to the ITK discourse which is the primary location for ITK/SimpleITK related discussions.
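The read-modify-write pattern the answer refers to can be sketched roughly as follows. This is not the linked example itself, only a minimal sketch under stated assumptions: the names `correct_series`, `orientation_tag`, `in_dir`, `out_dir`, and the `modify` callback are hypothetical, all tags of each input slice are copied wholesale, and the position and orientation tags are then re-derived from the 3D volume so that the spatial information stays consistent.

```python
import os
import time


def orientation_tag(direction):
    """Format a row-major 3x3 direction matrix (9 values) as the value of
    Image Orientation (Patient), tag 0020|0037."""
    # The tag holds the first column of the matrix followed by the second column
    return "\\".join(str(direction[i]) for i in (0, 3, 6, 1, 4, 7))


def correct_series(in_dir, out_dir, modify):
    """Read a DICOM series, apply `modify` to the volume, write it back slice by slice."""
    import SimpleITK as sitk  # imported here so orientation_tag works without it

    reader = sitk.ImageSeriesReader()
    reader.SetFileNames(sitk.ImageSeriesReader.GetGDCMSeriesFileNames(in_dir))
    reader.MetaDataDictionaryArrayUpdateOn()  # expose the tags of every slice
    reader.LoadPrivateTagsOn()
    volume = reader.Execute()

    # Intensity-only change; cast back to the pixel type of the input
    result = sitk.Cast(modify(volume), volume.GetPixelID())

    writer = sitk.ImageFileWriter()
    writer.KeepOriginalImageUIDOn()  # assumption: reuse the UIDs copied from the input
    os.makedirs(out_dir, exist_ok=True)
    stamp_date, stamp_time = time.strftime("%Y%m%d"), time.strftime("%H%M%S")
    for i in range(result.GetDepth()):
        slice_i = result[:, :, i]
        # Copy every tag of the matching input slice
        for key in reader.GetMetaDataKeys(i):
            slice_i.SetMetaData(key, reader.GetMetaData(i, key))
        slice_i.SetMetaData("0008|0012", stamp_date)  # instance creation date
        slice_i.SetMetaData("0008|0013", stamp_time)  # instance creation time
        # Re-derive position and orientation from the 3D volume
        position = result.TransformIndexToPhysicalPoint((0, 0, i))
        slice_i.SetMetaData("0020|0032", "\\".join(map(str, position)))
        slice_i.SetMetaData("0020|0037", orientation_tag(result.GetDirection()))
        writer.SetFileName(os.path.join(out_dir, "{}.dcm".format(i)))
        writer.Execute(slice_i)
```

The `modify` callback would wrap the masking and N4 steps from the question and return the corrected volume. Whether reusing the original UIDs is appropriate depends on how the output files will be used.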


How to store raster with geospatial information in hdf5 format?

I would like to store many PRISM rasters in HDF5 format using h5py in Python, but I'm having difficulty figuring out how to store the coordinate reference system so that GIS software such as ArcGIS or QGIS, or other Python modules (rasterio), can read the file and know where the data sits in space.
Basically, I'm trying to follow the structure used for MODIS data, which is stored as HDF4, but in HDF5 and with h5py.
precip_path = os.path.join(wrk_dir,"prism_data","Yearly_PRISM_PRCP_Clipped_1961_2021","*" )
# Get list of precip rasters
precip_list = sorted(glob(precip_path))
# Open one raster and get all needed information
raster_ds = rxr.open_rasterio(precip_list[0]).squeeze()
ds_size = raster_ds.shape
ds_name = os.path.basename(precip_list[0])[:-4]
x_dims = raster_ds.coords['x'].values
y_dims = raster_ds.coords['y'].values
# # Get georeference information
crs =
# # # Create Groups
hf = h5py.File("precip_hdf.h5",'w')
grp = hf.create_group("PRISM_PRCP")
# Create datasets that contain the x and y coordinates of the prism dataset
x_coords = grp.create_dataset("x_coords",data=x_dims)
y_coords = grp.create_dataset("y_coords", data=y_dims)
dset = grp.create_dataset(ds_name, data=raster_ds)
# The squeezed rioxarray array is ordered (y, x), so axis 0 is y and axis 1 is x
grp[ds_name].dims[0].label = 'y'
grp[ds_name].dims[1].label = 'x'
# Attach a scale to the dimensions of the prism dataset
When I run the above code, all the information is stored: I can view the raster in HDFView and see all the groups with spatial information. But when I try to open the file in GIS software, it doesn't recognize the spatial information. I'm guessing there is a bit of code that handles this, but I can't figure out what it is.

Removing Gridlines from Scanned Graph Paper Documents

I would like to remove gridlines from a scanned document using Python to make them easier to read.
Here is a snippet of what we're working with:
As you can see, there are inconsistencies in the grid, and to make matters worse the scanning isn't always square. Five example documents can be found here.
I am open to whatever methods you may suggest for this, but OpenCV and pypdf might be a good place to start before breaking out more involved machine learning techniques.
This post addresses a similar question, but does not have a solution. The user posted the following code snippet, which may be of interest (to be honest, I have not tested it; I am just putting it here for your convenience).
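import cv2
import numpy as np


def rmv_lines(Image_Path):
    img = cv2.imread(Image_Path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150, apertureSize=3)
    minLineLength, maxLineGap = 100, 15
    # Pass the length and gap by keyword; positionally they would land in the wrong slots
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 100,
                            minLineLength=minLineLength, maxLineGap=maxLineGap)
    if lines is not None:  # HoughLinesP returns None when nothing is found
        for line in lines:
            for x1, y1, x2, y2 in line:
                # if x1 != x2 and y1 != y2:
                # Assumed loop body (missing from the snippet): paint the segment white
                cv2.line(img, (int(x1), int(y1)), (int(x2), int(y2)), (255, 255, 255), 2)
    return cv2.imwrite('removed.jpg', img)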
I would prefer the final documents be in pdf format if possible.
(disclaimer: I am the author of pText, the library being used in this answer)
I can help you part of the way (extracting the images from the PDF).
Start by loading the Document.
You'll see that I'm passing an extra parameter in the PDF.loads method.
SimpleImageExtraction acts like an EventListener for PDF instructions. Whenever it encounters an instruction that would render an image, it intercepts the instruction and stores the image.
with open(file, "rb") as pdf_file_handle:
    l = SimpleImageExtraction()
    doc = PDF.loads(pdf_file_handle, [l])
Now that we have loaded the Document, and SimpleImageExtraction should have had a chance to work its magic, we can output the images. In this example I'm just going to store them.
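for i, img in enumerate(l.get_images_per_page(0)):
    output_file = "image_" + str(i) + ".jpg"
    with open(output_file, "wb") as image_file_handle:
        # Assumption: each extracted image is a PIL Image, so it can save itself
        img.save(image_file_handle)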
You can obtain pText either on GitHub or from PyPI.
There are a ton more examples, check them out to find out more about working with images.

google vision API returns empty bounding box vertexes, instead it returns normalised_vertexes

I am using vision.enums.Feature.Type.DOCUMENT_TEXT_DETECTION to extract some dense text in a pdf document. Here is my code:
from google.cloud import vision

# Assumed: the client is created once at module level
vision_client = vision.ImageAnnotatorClient()


def extract_text(bucket, filename, mimetype):
    print('Looking for text in PDF {}'.format(filename))
    # BATCH_SIZE; How many pages should be grouped into each json output file.
    # """OCR with PDF/TIFF as source files on GCS"""
    # Detect text
    feature = vision.types.Feature(
        type=vision.enums.Feature.Type.DOCUMENT_TEXT_DETECTION)
    # Extract text from source bucket
    gcs_source_uri = 'gs://{}/{}'.format(bucket, filename)
    gcs_source = vision.types.GcsSource(uri=gcs_source_uri)
    input_config = vision.types.InputConfig(
        gcs_source=gcs_source, mime_type=mimetype)
    request = vision.types.AnnotateFileRequest(features=[feature], input_config=input_config)
    print('Waiting for the OCR operation to finish.')
    ocr_response = vision_client.batch_annotate_files(requests=[request])
    print('OCR completed.')
    return ocr_response
In the response, I expect to find a filled-in list of vertices in ocr_response.responses[1...n].pages[1...n].blocks[1...n].bounding_box, but this list is empty. Instead, there is a normalized_vertices list, which holds the vertices normalised to values between 0 and 1. Why is that? Why is the vertices structure empty?
I am following this article, and the author there uses vertices, but I don't understand why I don't get them.
To convert them to non-normalised form, I multiply each normalised vertex by the height and width, but the result is awful: the boxes are not well positioned.
To convert a NormalizedVertex to a Vertex, multiply the x field of the NormalizedVertex by the width to get the x field of the Vertex, and multiply the y field of the NormalizedVertex by the height to get the y field of the Vertex.
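The conversion can be sketched as below. This is a minimal illustration, assuming the vertices have already been pulled out of the response as plain `(x, y)` pairs; the function name `to_pixel_vertices` and the sample numbers are hypothetical.

```python
def to_pixel_vertices(normalized_vertices, page_width, page_height):
    """Scale (x, y) pairs in the range [0, 1] to absolute page coordinates."""
    return [
        (round(x * page_width), round(y * page_height))
        for x, y in normalized_vertices
    ]


# Hypothetical bounding box on a page that is 1000 units wide and 2000 units high
box = [(0.1, 0.2), (0.5, 0.2), (0.5, 0.4), (0.1, 0.4)]
pixels = to_pixel_vertices(box, page_width=1000, page_height=2000)
print(pixels)  # [(100, 400), (500, 400), (500, 800), (100, 800)]
```

The width and height must describe the same surface the boxes are drawn on. Each page in the response reports its own width and height (in points for PDFs), so if the page is rendered to an image of a different size, scaling by the rendered image's dimensions rather than the reported ones may be what fixes misplaced boxes.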
The reason you get NormalizedVertex values while the author of the Medium article gets Vertex values is that the TEXT_DETECTION and DOCUMENT_TEXT_DETECTION models were upgraded to newer versions on May 15, 2020, and the Medium article was written on Dec 25, 2018.
To use legacy models for results, you must specify "builtin/legacy_20190601" in the model field of a Feature object to get the old model results.
However, Google's documentation mentions that the old models will no longer be offered after November 15, 2020.

Open cv compare two face embeddings

I went through the PyImageSearch face recognition tutorial,
but my application only needs to compare two faces.
I have the embeddings of two faces; how do I compare them using OpenCV?
The trained model used to extract the embeddings from a face is described in the link.
I want to know which methods I should try for comparing two face embeddings.
(Note: I am new to this field)
First of all, your case is similar to the tutorial's: instead of multiple images, you have a single image that you need to compare with a test image,
so you don't really need the training step here.
You can do:
import cv2
import face_recognition

# `args` holds the parsed command-line options, as in the tutorial;
# the "image2" key for the second picture is an assumed name
# read 1st image and store encodings
image = cv2.imread(args["image"])
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
boxes = face_recognition.face_locations(rgb, model=args["detection_method"])
encodings1 = face_recognition.face_encodings(rgb, boxes)
# read 2nd image and store encodings
image = cv2.imread(args["image2"])
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
boxes = face_recognition.face_locations(rgb, model=args["detection_method"])
encodings2 = face_recognition.face_encodings(rgb, boxes)
# now you can compare two encodings
# optionally you can pass a tolerance; by default it is 0.6
# the first argument is a list of known encodings, the second a single encoding
matches = face_recognition.compare_faces(encodings1, encodings2[0])
matches is a list of booleans, one per encoding in the first argument: True where the faces match, False otherwise.
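If you already hold the two embeddings as arrays, the same comparison can be done directly with NumPy by thresholding the Euclidean distance between them. The sketch below is an illustration under assumptions: `same_face` is a hypothetical helper, the vectors are synthetic stand-ins for real embeddings, and the 0.6 threshold is the face_recognition default mentioned above, which may not suit embeddings produced by a different model.

```python
import numpy as np


def same_face(embedding_a, embedding_b, tolerance=0.6):
    """Return (match, distance) for two embedding vectors of equal length."""
    a = np.asarray(embedding_a, dtype=float)
    b = np.asarray(embedding_b, dtype=float)
    # Euclidean distance between the two embeddings
    distance = float(np.linalg.norm(a - b))
    return distance <= tolerance, distance


# Synthetic 128-dimensional vectors standing in for real face embeddings
face_a = np.zeros(128)
face_b = np.zeros(128)
face_b[0] = 0.3  # close to face_a
face_c = np.zeros(128)
face_c[0] = 1.0  # far from face_a

print(same_face(face_a, face_b)[0])  # True
print(same_face(face_a, face_c)[0])  # False
```

A smaller distance means more similar faces, so lowering the tolerance makes the comparison stricter.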
Based on the article you mentioned, you can actually compare if two faces are the same using only the face_recognition library.
You can use compare_faces to determine whether two pictures show the same face:
import face_recognition
known_image = face_recognition.load_image_file("biden.jpg")
unknown_image = face_recognition.load_image_file("unknown.jpg")
biden_encoding = face_recognition.face_encodings(known_image)[0]
unknown_encoding = face_recognition.face_encodings(unknown_image)[0]
results = face_recognition.compare_faces([biden_encoding], unknown_encoding)

How do i convert an image read with cv2.imread('img.png',cv2.IMREAD_UNCHANGED) to the format of cv2.imread('img.png',cv2.IMREAD_COLOR)

I'm trying to read an image in unchanged format, do some operations and convert it back to the colored format
im = cv2.imread(fname,cv2.IMREAD_UNCHANGED) # shape(240,240,4)
im2 = cv2.imread(im,cv2.IMREAD_COLOR) # required shape(240,240,3)
But, looks like I can't input the result of first numpy array into the second imread.
So currently I've created a temporary image after the operations and reading that value to get the required im2 value.
im = cv2.imread(fname,cv2.IMREAD_UNCHANGED) # shape(240,240,4)
im2 = cv2.imread('img.png',cv2.IMREAD_COLOR) # required shape(240,240,3)
However, I would like to avoid the step of creating a temporary image. How can I achieve the same result with a better approach?
OpenCV has a function for color conversion cvtColor
im2 = cv2.cvtColor(im, <conversion code>)
Choose the conversion code based on the image format you have. For a 4-channel image such as yours, it is probably cv2.COLOR_BGRA2BGR.
