Receiving coordinates from inference Pytorch - python-3.x

I'm trying to get the coordinates of the pixels inside a mask generated by Detectron2's (PyTorch) DefaultPredictor, to later get the polygon corners and use them in my application.
However, DefaultPredictor produces a tensor of pred_masks in the following format: [[False, False, ..., False], ..., [False, False, ..., False]]
where the length of each individual list is the width of the image, and the total number of lists is the height of the image.
Now, as I need to get the pixel coordinates that are inside the mask, the simple solution seemed to be looping through pred_masks, checking each value, and if it equals True, creating a tuple of the coordinates and adding it to a list. However, as we are talking about images of about 3200 x 1600 (width x height), this is a relatively slow process (~4 seconds to loop through a single 3200x1600 mask, and since there are quite a few objects for which I need the inference in the end, this ends up being incredibly slow).
What would be the smarter way to get the coordinates (mask) of the detected object using the PyTorch (Detectron2) model?
Please find my code below for reference:
from __future__ import print_function
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.data import MetadataCatalog
from detectron2.data.datasets import register_coco_instances
import cv2
import time
# get image
start = time.time()
im = cv2.imread("inputImage.jpg")
# Create config
cfg = get_cfg()
cfg.merge_from_file("detectron2_repo/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5 # Set threshold for this model
cfg.MODEL.WEIGHTS = "model_final.pth" # Set path model .pth
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
cfg.MODEL.DEVICE='cpu'
register_coco_instances("dataset_test",{},"testval.json","Images_path")
test_metadata = MetadataCatalog.get("dataset_test")
# Create predictor
predictor = DefaultPredictor(cfg)
# Make prediction
outputs = predictor(im)
#Loop through the pred_masks and check which ones are equal to TRUE, if equal, add the pixel values to the true_cords_list
outputnump = outputs["instances"].pred_masks.numpy()
true_cords_list = []
x_length = range(len(outputnump[0][0]))
# the y coordinate is the range index
for y_cord in range(len(outputnump[0])):
    # x coordinate
    for x_cord in x_length:
        if str(outputnump[0][y_cord][x_cord]) == "True":
            inputcoords = (x_cord, y_cord)
            true_cords_list.append(inputcoords)
print(str(true_cords_list))
end = time.time()
print(f"Runtime of the program is {end - start}") # 14.29468035697937
EDIT:
After partially changing the for loop to use itertools.compress, I've managed to reduce the runtime of the loop by ~3x; however, ideally I would like to receive this from the predictor itself if possible.
from itertools import compress

y_length = len(outputnump[0])
x_length = len(outputnump[0][0])
true_cords_list = []
for y_cord in range(y_length):
    x_cords = list(compress(range(x_length), outputnump[0][y_cord]))
    if x_cords:
        for x_cord in x_cords:
            inputcoords = (x_cord, y_cord)
            true_cords_list.append(inputcoords)

The problem is easily solvable with some knowledge of NumPy or PyTorch native array handling, which allows roughly 100x speedups compared to Python loops. It is worth studying the NumPy library; PyTorch tensors are similar to NumPy arrays in behaviour.
How to get indices of values in NumPy:
import numpy as np
arr = np.random.rand(3,4) > 0.5
ind = np.argwhere(arr)[:, ::-1]
print(arr)
print(ind)
In your particular case this will be
ind = np.argwhere(outputnump[0])[:, ::-1]
How to get indices of values in PyTorch:
import torch
arr = torch.rand(3, 4) > 0.5
ind = arr.nonzero()
ind = torch.flip(ind, [1])
print(arr)
print(ind)
[::-1] and .flip are used to reverse the order of coordinates from (y, x) to (x, y).
NumPy and PyTorch also allow checking simple conditions and getting the indices of values that meet these conditions; for further details see the corresponding NumPy docs article.
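For example, a small self-contained sketch (with made-up values) of getting the indices where a condition holds, in both libraries:

import numpy as np
import torch

scores = np.array([0.2, 0.9, 0.4, 0.7])
# NumPy: indices where the condition is met
print(np.argwhere(scores > 0.5).ravel())    # [1 3]

t = torch.tensor([0.2, 0.9, 0.4, 0.7])
# PyTorch: same idea on a tensor
print((t > 0.5).nonzero(as_tuple=True)[0])  # tensor([1, 3])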
When asking, you should provide links for your problem's context. This question is actually about Facebook's Detectron2 object detector, for which they provide a nice demo Colab notebook.

Related

Python scipy interpolation meshgrid data

I want to interpolate experimental data in order to give it a higher resolution, but apparently it does not work. I followed the example in this link for mgrid data; the CSV data can be found below.
Csv data
My code
import pandas as pd
import numpy as np
import scipy.interpolate
import matplotlib.pyplot as plt
x=np.linspace(0,2.8,15)
y=np.array([2.1,2,1.9,1.8,1.7,1.6,1.5,1.4,1.3,1.2,1.1,0.9,0.7,0.5,0.3,0.13])
[X, Y]=np.meshgrid(x,y)
Vx_df=pd.read_csv("Vx.csv", header=None)
Vx=Vx_df.to_numpy()
tck=scipy.interpolate.bisplrep(X,Y,Vx)
plt.pcolor(X,Y,Vx, shading='nearest');
plt.show()
xi=np.linspace(0.1, 2.5, 30)
yi=np.linspace(0.15, 2.0, 50)
[X1, Y1]=np.meshgrid(xi,yi)
VxNew = scipy.interpolate.bisplev(X1[:,0], Y1[0,:], tck, dx=1, dy=1)
plt.pcolor(X1,Y1,VxNew, shading='nearest')
plt.show()
CSV DATA:
0.73,,,-0.08,-0.19,-0.06,0.02,0.27,0.35,0.47,0.64,0.77,0.86,0.90,0.93
0.84,,,0.13,0.03,0.12,0.23,0.32,0.52,0.61,0.72,0.83,0.91,0.96,0.95
1.01,1.47,,0.46,0.46,0.48,0.51,0.65,0.74,0.80,0.89,0.99,0.99,1.07,1.06
1.17,1.39,1.51,1.19,1.02,0.96,0.95,1.01,1.01,1.05,1.06,1.05,1.11,1.13,1.19
1.22,1.36,1.42,1.44,1.36,1.23,1.24,1.17,1.18,1.14,1.14,1.09,1.08,1.14,1.19
1.21,1.30,1.35,1.37,1.43,1.36,1.33,1.23,1.14,1.11,1.05,0.98,1.01,1.09,1.15
1.14,1.17,1.22,1.25,1.23,1.16,1.23,1.00,1.00,0.93,0.93,0.80,0.82,1.05,1.09
,0.89,0.95,0.98,1.03,0.97,0.94,0.84,0.77,0.68,0.66,0.61,0.48,,
,0.06,0.25,0.42,0.55,0.55,0.61,0.49,0.46,0.56,0.51,0.40,0.28,,
,0.01,0.05,0.13,0.23,0.32,0.33,0.37,0.29,0.30,0.32,0.27,0.25,,
,-0.02,0.01,0.07,0.15,0.21,0.23,0.22,0.20,0.19,0.17,0.20,0.21,0.13,
,-0.07,-0.05,-0.02,0.06,0.07,0.07,0.16,0.11,0.08,0.12,0.08,0.13,0.16,
,-0.13,-0.14,-0.09,-0.07,0.01,-0.03,0.06,0.02,-0.01,0.00,0.01,0.02,0.04,
,-0.16,-0.23,-0.21,-0.16,-0.10,-0.08,-0.05,-0.11,-0.14,-0.17,-0.16,-0.11,-0.05,
,-0.14,-0.25,-0.29,-0.32,-0.31,-0.33,-0.31,-0.34,-0.36,-0.35,-0.31,-0.26,-0.14,
,-0.02,-0.07,-0.24,-0.36,-0.39,-0.45,-0.45,-0.52,-0.48,-0.41,-0.43,-0.37,-0.22,
The low-resolution image (without interpolation) is linked as "Low resolution" and the image I get after interpolation as "High resolution".
Can you please give me some advice on why it does not interpolate properly?
OK, so to interpolate we need to set up an input and an output grid, and possibly remove values that are missing from the grid. We do that like so:
from io import StringIO
import numpy as np
import pandas as pd
from scipy import interpolate

# csv_string holds the CSV data posted in the question
array = pd.read_csv(StringIO(csv_string), header=None).to_numpy()

def interp(array, scale=1, method='cubic'):
    x = np.arange(array.shape[1]*scale)[::scale]
    y = np.arange(array.shape[0]*scale)[::scale]
    x_in_grid, y_in_grid = np.meshgrid(x, y)
    x_out, y_out = np.meshgrid(np.arange(max(x)+1), np.arange(max(y)+1))
    array = np.ma.masked_invalid(array)
    x_in = x_in_grid[~array.mask]
    y_in = y_in_grid[~array.mask]
    return interpolate.griddata((x_in, y_in), array[~array.mask].reshape(-1),
                                (x_out, y_out), method=method)
Now we need to call this function 3 times. First we fill the missing values in the middle with spline interpolation. Then we fill the boundary values with nearest neighbor interpolation. And finally we size it up by interpreting the pixels as being a few pixels apart and filling in gaps with spline interpolation.
array = interp(array)
array = interp(array, method='nearest')
array = interp(array, 50)
plt.imshow(array)
And we get the following result

How to bin a netcdf data using xarray

I have some spatiotemporal data derived from the CHIRPS database. It is a NetCDF that contains daily precipitation for the whole world with a spatial resolution of 1 x 1 km². The Dataset possesses 3 dimensions ('time', 'longitude', 'latitude').
I would like to bin this precipitation data according to each pixel's ('latitude' & 'longitude') temporal distribution. Therefore, the dimension over which I wish to apply the binning is the 'time' domain.
This is similar to a question already discussed on StackOverflow (see here). The difference between their issue and mine is that, in my case, I need to bin the data according to each specific pixel's temporal distribution, instead of applying a single range of values for binning across all my coordinates (pixels). As a consequence, I expect to have different binning thresholds ('n' sets of thresholds), one for each of the 'n' pixels in my dataset.
As far as I understand, the simplest and fastest way to apply a function over each of the coordinates (pixels) of an xarray DataArray/Dataset is to use xarray.apply_ufunc.
For the binning, I am using the pandas qcut method, which only requires an array of values and some given relative frequencies (e.g. [0.1%, 0.5%, 25%, 99%]) in order to work.
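For reference, here is a minimal, hypothetical sketch of how pd.qcut turns a 1-D array into integer bin codes (the values and quantile edges below are made up for illustration):

import numpy as np
import pandas as pd

# hypothetical 1-D sample standing in for a single pixel's time series
values = np.random.randn(100)

# qcut with explicit quantile edges; .codes gives the integer bin index per value
binned = pd.qcut(values, q=[0, 0.25, 0.5, 0.75, 1.0])
print(binned.codes[:10])  # e.g. [2 0 3 1 ...]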
Since the pandas binning function requires an array of data and also returns an array of binned data, I understand that I have to use the argument vectorize=True in apply_ufunc (described here).
Finally, when I run the analysis, the resulting xarray Dataset ends up losing the 'time' dimension after the processing. Also, I am unsure whether the processing truly returned an xarray Dataset with the data properly classified.
Here is a reproducible code snippet. Notice that the 'time' dimension of "ds_binned" is lost, so I have to later insert the binned data back into the original xarray Dataset (ds). Also notice that the dimensions are not in the proper order, which is also causing problems for my analysis.
import pandas as pd
pd.set_option('display.width', 50000)
pd.set_option('display.max_rows', 50000)
pd.set_option('display.max_columns', 5000)
import numpy as np
import xarray as xr
from dask.diagnostics import ProgressBar
ds = xr.tutorial.open_dataset('rasm').load()
def parse_datetime(time):
    return pd.to_datetime([str(x) for x in time])

ds.coords['time'] = parse_datetime(ds.coords['time'].values)

def binning_function(x, distribution_type='Positive', b=False):
    y = np.where(np.abs(x) == np.inf, 0, x)
    y = np.where(np.isnan(y), 0, y)
    if np.all(y) == 0:
        return x
    else:
        Classified = pd.qcut(y, np.linspace(0.01, 1, 10))
        return Classified.codes

def xarray_parse_extremes(ds, dim=['time'], dask='allowed',
                          new_dim_name=['classes'],
                          kwargs={'b': False, 'distribution_type': 'Positive'}):
    filtered = xr.apply_ufunc(binning_function,
                              ds,
                              dask=dask,
                              vectorize=True,
                              input_core_dims=[dim],
                              #exclude_dims = [dim],
                              output_core_dims=[new_dim_name],
                              kwargs=kwargs,
                              output_dtypes=[float],
                              join='outer',
                              dataset_fill_value=np.nan,
                              ).compute()
    return filtered

with ProgressBar():
    da_binned = xarray_parse_extremes(ds['Tair'],
                                      ['time'],
                                      dask='allowed')

da_binned.name = 'classes'
ds_binned = da_binned.to_dataset()
ds['classes'] = (('y', 'x', 'time'), ds_binned['classes'].values)
mask = (ds['classes'] >= 5) & (ds['classes'] != 0)
ds.where(mask, drop=True).resample({'time': 'Y'}).count('time')['Tair'].isel({'time': -1}).plot()
print(ds)

(ds.where(mask, drop=True).resample({'time': 'Y'}).count('time')['Tair']
 .to_dataframe().dropna().sort_values('Tair', ascending=False)
)

delayed_to_netcdf = ds.to_netcdf(r'F:\Philipe\temp\teste_tutorial.nc',
                                 engine='netcdf4',
                                 compute=False)
print('saving data classified')
with ProgressBar():
    delayed_to_netcdf.compute()

How to deform/scale a 3 dimensional numpy array in one dimension?

I would like to deform/scale a three dimensional numpy array in one dimension. I will visualize my problem in 2D:
I have the original image, which is a 2D numpy array:
Then I want to deform/scale it by some factor in dimension 0, the horizontal dimension:
For PIL images there are a lot of solutions, for example in PyTorch, but what if I have a NumPy array of shape (w, h, d) = (288, 288, 468)? I would like to upsample the width by a factor of 1.04, for example, to (299, 288, 468). Each cell contains a normalized number between 0 and 1.
I am not sure whether I am simply not using the correct vocabulary when searching online, so correcting my question would also help. Or tell me the mathematical background of this problem, and I can write the code on my own.
Thank you!
You can repeat the array along the specific axis a number of times equal to ceil(factor) where factor > 1 and then evenly space indices on the stretched dimension to select int(factor * old_length) elements. This does not perform any kind of interpolation but just repeats some of the elements:
import math

import cv2
import numpy as np

# read the example image as grayscale
# (scipy.ndimage.imread has been removed from SciPy, so cv2 is used here instead)
img = cv2.imread('/tmp/example.png', cv2.IMREAD_GRAYSCALE)
print(img.shape)  # (512, 512)

axis = 1
factor = 1.25

# repeat each element ceil(factor) times along the chosen axis
stretched = np.repeat(img, math.ceil(factor), axis=axis)
print(stretched.shape)  # (512, 1024)

# select int(old_length * factor) evenly spaced indices from the stretched axis
indices = np.linspace(0, stretched.shape[axis] - 1, int(img.shape[axis] * factor))
indices = np.rint(indices).astype(int)
result = np.take(stretched, indices, axis=axis)
print(result.shape)  # (512, 640)

cv2.imwrite('/tmp/stretched.png', result)
This is the result (left is original example.png and right is stretched.png):
Looks like it is as easy as using the torch.nn.functional.interpolate function from PyTorch and choosing 'trilinear' as the interpolation mode:
import torch

# `data` is the (288, 288, 468) NumPy array from the question
PET = torch.tensor(data)
print("Old shape = {}".format(PET.shape))
scale_factor_x = 1.4

# Scaling: interpolate expects a (N, C, D, H, W) tensor, hence the two unsqueeze calls
PET = torch.nn.functional.interpolate(PET.unsqueeze(0).unsqueeze(0),
                                      scale_factor=(scale_factor_x, 1, 1),
                                      mode='trilinear').squeeze().squeeze()
print("New shape = {}".format(PET.shape))
output:
>>> Old shape = torch.Size([288, 288, 468])
>>> New shape = torch.Size([403, 288, 468])
I verified the results by looking at the data, but I can't show them here due to data privacy. Sorry!
This is an example of linear up-sampling of a 3D image with scipy.interpolate; I hope it helps.
(I work quite a lot with np.meshgrid here; if you are not familiar with it, I recently explained it here.)
import numpy as np
import matplotlib.pyplot as plt
import scipy
from scipy.interpolate import RegularGridInterpolator
# should be 1.3.0
print(scipy.__version__)
# =============================================================================
# producing a test image "image3D"
# =============================================================================
def some_function(x, y, z):
    # output is a 3D Gaussian with some periodic modification;
    # it's only for testing, so this part is not important
    out = np.sin(2*np.pi*x)*np.cos(np.pi*y)*np.cos(4*np.pi*z)*np.exp(-(x**2+y**2+z**2))
    return out
# define a grid to evaluate the function on.
# the dimension of the 3D-Image will be (20,20,20)
N = 20
x = np.linspace(-1,1,N)
y = np.linspace(-1,1,N)
z = np.linspace(-1,1,N)
xx, yy, zz = np.meshgrid(x,y,z,indexing ='ij')
image3D = some_function(xx,yy,zz)
# =============================================================================
# plot the test image "image3D"
# you will see 5 images that correspond to slices along the
# z-axis, similar to your example picture:
# https://sites.google.com/site/linhvtlam2/fl7_ctslices.jpg
# =============================================================================
def plot_slices(image_3d):
    f, loax = plt.subplots(1, 5, figsize=(15, 5))
    loax = loax.flatten()
    for ii, i in enumerate([8, 9, 10, 11, 12]):
        loax[ii].imshow(image_3d[:, :, i], vmin=image_3d.min(), vmax=image_3d.max())
    plt.show()
plot_slices(image3D)
# =============================================================================
# interpolate the image
# =============================================================================
interpolation_function = RegularGridInterpolator((x, y, z), image3D, method = 'linear')
# =============================================================================
# evaluate at new grid
# =============================================================================
# create the new grid that you want
x_new = np.linspace(-1,1,30)
y_new = np.linspace(-1,1,40)
z_new = np.linspace(-1,1,N)
xx_new, yy_new, zz_new = np.meshgrid(x_new,y_new,z_new,indexing ='ij')
# change the order of the points to match the input shape of the interpolation
# function. That's a bit messy, but I couldn't figure out a way around it.
evaluation_points = np.rollaxis(np.array([xx_new,yy_new,zz_new]),0,4)
interpolated = interpolation_function(evaluation_points)
plot_slices(interpolated)
The original (20, 20, 20) dimensional 3D image:
And the upsampled (30, 40, 20) dimensional 3D image:

Keras layer for slicing image data into sliding windows

I have a set of images, all of varying widths, but with fixed height set to 100 pixels and 3 channels of depth. My task is to classify if each vertical line in the image is interesting or not. To do that, I look at the line in context of its 10 predecessor and successor lines. Imagine the algorithm sweeping from left to right of the image, detecting vertical lines containing points of interest.
My first attempt at doing this was to manually cut out these sliding windows using numpy before feeding the data into the Keras model. Like this:
# D is one image of shape (w, 100, 3)
# Pad left and right
s = np.repeat(D[:1], 10, axis=0)
e = np.repeat(D[-1:], 10, axis=0)
# D now has shape (w + 20, 100, 3)
D = np.concatenate((s, D, e))
# Sliding windows creation trick from SO question
idx = np.arange(21)[None, :] + np.arange(len(D) - 20)[:, None]
windows = D[idx]
Then all windows and all ground truth 0/1 values for all vertical lines in all images would be concatenated into two very long arrays.
I have verified that this works, in principle. I fed each window to a Keras layer looking like this:
Conv2D(20, (5, 5), input_shape = (21, 100, 3), padding = 'valid', ...)
But the windowing causes the memory usage to increase 21-fold, so doing it this way becomes impractical. I think my scenario is very common in machine learning, so there must be some standard method in Keras to do this efficiently? E.g. I would like to feed Keras my raw image data (w, 100, 3), tell it what the sliding window sizes are, and let it figure out the rest. I have looked at some sample code, but I'm an ML noob so I don't get it.
Unfortunately this isn't an easy problem, because it can involve using a variable-sized input for your Keras model. While I think it is possible to do this with proper use of placeholders, that's certainly no place for a beginner to start. Your other option is a data generator. As with many computationally intensive tasks, there is a trade-off between compute speed and memory requirements: using a generator is more compute-heavy, and the windowing will be done entirely on your CPU (no GPU acceleration), but it won't make the memory usage increase.
The point of a data generator is that it applies the operation to images one at a time to produce a batch, trains on that batch, and then frees up the memory, so you only ever keep one batch's worth of data in memory at any time. Unfortunately, if the generation is time-consuming, this can seriously affect performance.
The generator will be a Python generator (using the 'yield' keyword) and is expected to produce a single batch of data. Keras is very good at handling arbitrary batch sizes, so you can always make one image yield one batch, especially to start.
Here is the Keras page on fit_generator. I warn you, this starts to become a lot of work very quickly; consider buying more memory:
https://keras.io/models/model/#fit_generator
Fine I'll do it for you :P
import numpy as np
import pandas as pd
import keras
from keras.models import Model, model_from_json
from keras.layers import Dense, Concatenate, Multiply,Add, Subtract, Input, Dropout, Lambda, Conv1D, Flatten
from tensorflow.python.client import device_lib
# check for my gpu
print(device_lib.list_local_devices())
# make some fake image data
# 1000 random widths
data_widths = np.floor(np.random.random(1000)*100)
# producing 1000 random images with dimensions w x 100 x 3
# and a vector of which vertical lines are interesting
# I assume your data looks like this
images = []
interesting = []
for w in data_widths:
    images.append(np.random.random([int(w), 100, 3]))
    interesting.append(np.random.random(int(w)) > 0.5)
# this is a generator
def image_generator(images, interesting):
    num = 0
    while num < len(images):
        windows = None
        truth = None
        D = images[num]
        # this should look familiar
        # Pad left and right
        s = np.repeat(D[:1], 10, axis=0)
        e = np.repeat(D[-1:], 10, axis=0)
        # D now has shape (w + 20, 100, 3)
        D = np.concatenate((s, D, e))
        # Sliding windows creation trick from SO question
        idx = np.arange(21)[None, :] + np.arange(len(D) - 20)[:, None]
        windows = D[idx]
        truth = np.expand_dims(1*interesting[num], axis=1)
        yield (windows, truth)
        num += 1
        # the generator MUST loop
        if num == len(images):
            num = 0
# basic model - replace with your own
input_layer = Input(shape = (21,100,3), name = "input_node")
fc = Flatten()(input_layer)
fc = Dense(100, activation='relu',name = "fc1")(fc)
fc = Dense(50, activation='relu',name = "fc2")(fc)
fc = Dense(10, activation='relu',name = "fc3")(fc)
output_layer = Dense(1, activation='sigmoid',name = "output")(fc)
model = Model(input_layer,output_layer)
model.compile(optimizer="adam", loss='binary_crossentropy')
model.summary()
#and training
training_history = model.fit_generator(image_generator(images, interesting),
                                       epochs=5,
                                       initial_epoch=0,
                                       steps_per_epoch=len(images),
                                       verbose=1)

Index 150 out of bounds in axis0 with size 1

I was making a histogram using a NumPy array in Python with OpenCV. The code is as follows:
#finding histogram of an image
import numpy as np
import cv2
img = cv2.imread("cr7.jpg")
gry_img=cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
a=np.zeros((1,256),dtype=np.uint8)
#finding how many times a particular pixel intensity repeats
for x in range(0, 183):  # size of gry_img is (184, 275)
    for y in range(0, 274):
        g = gry_img[x, y]
        a[g] = a[g] + 1
print(a)
Error is as follows:
IndexError: index 150 is out of bounds for axis 0 with size 1
Since you haven't supplied the image, it is only a guess that you've made a mistake with the dimensions of the image. Alternatively, the issue is entirely with the shape of your results array a.
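For illustration, here is a minimal sketch of why indexing a (1, 256)-shaped array this way fails (the index 150 is just the value from the error message):

import numpy as np

a = np.zeros((1, 256), dtype=np.uint8)
print(a.shape)    # (1, 256): axis 0 has size 1, so a[150] is out of bounds
# a[150]          # would raise: IndexError: index 150 is out of bounds for axis 0 with size 1
a[0, 150] += 1    # indexing both axes works

# a 1-D array of length 256 indexes the way the loop intends
a = np.zeros((256,), dtype=np.uint8)
a[150] += 1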
The code you have is rather fragile, and here is a cleaner way to interact with images. I use an image from opencv's data directory: aero1.jpg.
The code here resolves both potential issues identified above, whichever one it was:
import cv2
import numpy as np

fname = 'aero1.jpg'
im = cv2.imread(fname)
gry_img = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
gry_img.shape
>>> (480, 640)
# note that the image is 640pix wide by 480 tall;
# the numpy array shows the number of rows first.
# rows are in y / columns are in x
# NOTE the results array `a` need only be 1-dimensional, not 2d (1x256);
# also use an integer dtype large enough to hold the counts (uint8 would overflow)
a = np.zeros((256,), dtype=np.int64)
# iterating over all pixels, whatever the shape of the image
height, width = gry_img.shape
for x in range(width):
    for y in range(height):
        g = gry_img[y, x]  # NOTE y, x not x, y
        a[g] += 1
But note that you could also achieve this easily with a numpy function np.histogram (docs), with slightly careful handling of the bin edges.
histb, bin_edges = np.histogram(gry_img.reshape(-1), bins=np.arange(257))
# check that we arrived at the same result as iterating manually:
(a == histb).all()
>>> True
