operating on array with condition - python-3.x

Consider the following code,
import numpy as np

xx = np.asarray([1, 0, 1])

def ff(x):
    return np.sin(x)/x

# this emits a warning because of the division by zero at x == 0:
# C:\Users\User\AppData\Local\Temp/ipykernel_2272/525615690.py:4:
# RuntimeWarning: invalid value encountered in true_divide
#   return np.sin(x)/x
yy = ff(xx)
# to avoid the warning, I did the following
def ff_smart(x):
    if (x == 0):
        # because sin(x)/x -> 1 as x -> 0
        return 1
    else:
        return np.sin(x)/x
# but then I cannot do
# yy_smart = ff_smart(xx)
# because of ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
# I therefore have to do:
*yy_smart, = map(ff_smart,xx)
yy_smart = np.asarray(yy_smart)
Is there a way (some NumPy magic) to write ff_smart such that I can call it without using map, while ff_smart remains operable on scalars (non-NumPy arrays)? I'd like to avoid type-checking in ff_smart.

you can do:
yy = [np.sin(x)/x if x != 0 else 1 for x in xx]
If you want to use the power of NumPy, a different approach that is still useful to know is masked arrays:
# initialize x
x = np.array([2, 3, 1, 0, 2])
# compute the masked array of x, masking out 0s
masked_x = np.ma.array(x, mask=(x == 0), dtype=x.dtype)
# perform operation only on non-zero values
y = np.sin(masked_x) / masked_x
# get the value back, filling the masked out values with 1s.
y = np.ma.filled(y, fill_value=1)

For conditional operations like the one you describe, NumPy has the np.where function.
You can do
np.where(x==0, 1, np.sin(x)/x)
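A sketch (not from the original answers) of how ff_smart could be written around np.where so it works on both scalars and arrays; note that np.where still evaluates np.sin(x)/x for every element, so the warning is silenced with np.errstate here:
import numpy as np

def ff_smart(x):
    x = np.asarray(x, dtype=float)
    # np.where evaluates both branches, so the 0/0 at x == 0 still happens;
    # suppress that warning and replace the resulting nan with the limit value 1
    with np.errstate(invalid='ignore', divide='ignore'):
        return np.where(x == 0, 1.0, np.sin(x) / x)

print(ff_smart(np.asarray([1, 0, 1])))  # [0.84147098 1.         0.84147098]
print(ff_smart(0))                      # 1.0 (returned as a 0-d array)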

Related

Curve fitting with known coefficients in Python

I tried using NumPy, SciPy and scikit-learn, but couldn't find what I need in any of them. Basically, I need to fit a curve to a dataset while restricting some of the coefficients to known values. I found how to do it in MATLAB, using fittype, but couldn't do it in Python.
In my case I have a dataset of X and Y and I need to find the best fitting curve. I know it's a polynomial of second degree (ax^2 + bx + c) and I know the values of b and c, so I just need it to find the value of a.
The solution I found in MATLAB was https://www.mathworks.com/matlabcentral/answers/216688-constraining-polyfit-with-known-coefficients, which is the same problem as mine, with the difference that their polynomial was of degree 5. How could I do something similar in Python?
To add some info: I need to fit a curve to a dataset, so things like scipy.optimize.curve_fit that expect a function won't work (at least as far as I tried).
The tools you have available usually expect functions that take only their parameters (a being the only unknown in your case), or their parameters plus some data (a, x, and y in your case).
SciPy's curve_fit handles that use-case just fine, so long as we hand it a function it understands. It expects x first and all your parameters as the remaining arguments:
from scipy.optimize import curve_fit
import numpy as np

b = 0
c = 0

def f(x, a):
    return c + x*(b + x*a)

x = np.linspace(-5, 5)
y = x**2

# params == [1.]
params, _ = curve_fit(f, x, y)
Alternatively you can reach for your favorite minimization routine. The difference here is that you manually construct the error function so that it only takes the parameters you care about, and then you don't need to provide that data to SciPy.
from scipy.optimize import minimize
import numpy as np

b = 0
c = 0
x = np.linspace(-5, 5)
y = x**2

def error(a):
    prediction = c + x*(b + x*a)
    return np.linalg.norm(prediction - y)/len(prediction)**.5

result = minimize(error, np.array([42.]))
assert result.success
# params == [1.]
params = result.x
I don't think scipy has a partially applied polynomial fit function built-in, but you could use either of the above ideas to easily build one yourself if you do that kind of thing a lot.
from scipy.optimize import curve_fit
import numpy as np

def polyfit(coefs, x, y):
    # build a mapping from null coefficient locations to locations in the function
    # coefficients we're passing to curve_fit
    #
    # idx[j]==i means that unknown_coefs[i] belongs in coefs[j]
    _tmp = [i for i, c in enumerate(coefs) if c is None]
    idx = {j: i for i, j in enumerate(_tmp)}

    def f(x, *unknown_coefs):
        # create the entire polynomial's coefficients by filling in the unknown
        # values in the right places, using the aforementioned mapping
        p = [(unknown_coefs[idx[i]] if c is None else c) for i, c in enumerate(coefs)]
        return np.polyval(p, x)

    # we're passing an initial value just so that scipy knows how many parameters
    # to use
    params, _ = curve_fit(f, x, y, np.zeros((sum(c is None for c in coefs),)))

    # return all the polynomial's coefficients, not just the few we just discovered
    return np.array([(params[idx[i]] if c is None else c) for i, c in enumerate(coefs)])

x = np.linspace(-5, 5)
y = x**2

# (unknown)x^2 + 0x + 0
# params == [1., 0., 0.]
params = polyfit([None, 0, 0], x, y)
Similar features exist in nearly every mainstream scientific library; you just might need to reshape your problem a bit to frame it in terms of the available primitives.
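For instance, a small sketch of that kind of reshaping for this particular problem (my own illustration, reusing the b, c, x, y from above): with b and c known, y - (b*x + c) = a*x^2 is linear in the single unknown a, so ordinary least squares recovers it directly.
import numpy as np

b, c = 0, 0
x = np.linspace(-5, 5)
y = x**2

# design matrix with a single column, one unknown coefficient a
A = (x**2).reshape(-1, 1)
rhs = y - (b*x + c)
a, *_ = np.linalg.lstsq(A, rhs, rcond=None)
# a is approximately [1.]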

Receiving coordinates from inference Pytorch

I'm trying to get the coordinates of the pixels inside a mask generated by Detectron2's (PyTorch) DefaultPredictor, so that I can later get the polygon corners and use them in my application.
However, DefaultPredictor produces a tensor of pred_masks in the following format: [False, False ... False], ... [False, False, .. False]
where the length of each individual list is the width of the image, and the number of lists is the height of the image.
Now, as I need to get the pixel coordinates that are inside the mask, the simple solution seemed to be looping through pred_masks, checking each value and, if it is True, creating a tuple of the coordinates and adding it to a list. However, we are talking about images of about 3200 x 1600 pixels, so this is a relatively slow process (~4 seconds to loop through a single 3200x1600 mask), and since there are quite a few objects for which I need the inference in the end, this will end up being incredibly slow.
What would be the smarter way to get the the coordinates (mask) of the detected object using the pytorch (detectron2) model?
Please find my code below for reference:
from __future__ import print_function
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.data import MetadataCatalog
from detectron2.data.datasets import register_coco_instances
import cv2
import time
# get image
start = time.time()
im = cv2.imread("inputImage.jpg")
# Create config
cfg = get_cfg()
cfg.merge_from_file("detectron2_repo/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5 # Set threshold for this model
cfg.MODEL.WEIGHTS = "model_final.pth" # Set path model .pth
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
cfg.MODEL.DEVICE='cpu'
register_coco_instances("dataset_test",{},"testval.json","Images_path")
test_metadata = MetadataCatalog.get("dataset_test")
# Create predictor
predictor = DefaultPredictor(cfg)
# Make prediction
outputs = predictor(im)
#Loop through the pred_masks and check which ones are equal to TRUE, if equal, add the pixel values to the true_cords_list
outputnump = outputs["instances"].pred_masks.numpy()
true_cords_list = []
x_length = range(len(outputnump[0][0]))
# the y coordinate is the range index
for y_cord in range(len(outputnump[0])):
    # x coordinate
    for x_cord in x_length:
        if str(outputnump[0][y_cord][x_cord]) == "True":
            inputcoords = (x_cord, y_cord)
            true_cords_list.append(inputcoords)
print(str(true_cords_list))
end = time.time()
print(f"Runtime of the program is {end - start}") # 14.29468035697937
EDIT:
After partially changing the for loop to use itertools.compress, I've managed to reduce the runtime of the loop by ~3x; however, ideally I would like to receive this from the predictor itself if possible.
from itertools import compress

y_length = len(outputnump[0])
x_length = len(outputnump[0][0])
true_cords_list = []
for y_cord in range(y_length):
    x_cords = list(compress(range(x_length), outputnump[0][y_cord]))
    if x_cords:
        for x_cord in x_cords:
            inputcoords = (x_cord, y_cord)
            true_cords_list.append(inputcoords)
The problem is easily solvable with sufficient knowledge of NumPy's or PyTorch's native array handling, which allows roughly 100x speedups compared to Python loops. It is worth studying the NumPy library; PyTorch tensors behave very similarly to NumPy arrays.
How to get indices of values in NumPy:
import numpy as np
arr = np.random.rand(3,4) > 0.5
ind = np.argwhere(arr)[:, ::-1]
print(arr)
print(ind)
In your particular case this will be
ind = np.argwhere(outputnump[0])[:, ::-1]
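If you still want the same list of (x, y) tuples your loop was building, a small follow-up (my addition, not part of the original answer):
true_cords_list = [tuple(p) for p in ind]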
How to get indices of values in PyTorch:
import torch
arr = torch.rand(3, 4) > 0.5
ind = arr.nonzero()
ind = torch.flip(ind, [1])
print(arr)
print(ind)
[::-1] and .flip are used to reverse the order of coordinates from (y, x) to (x, y).
NumPy and PyTorch also allow checking simple conditions and getting the indices of values that meet those conditions; for further reading, see the corresponding NumPy docs article.
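A hypothetical example of such condition-based index extraction, e.g. all pixels brighter than a threshold, flipped to (x, y) order as above:
import numpy as np

img = np.random.randint(0, 256, size=(4, 5))
bright_xy = np.argwhere(img > 128)[:, ::-1]  # each row reads (x, y)
print(bright_xy)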
When asking, you should provide links for your problem context. This question is actually about Facebook's Detectron2 object detector, for which they provide a nice demo Colab notebook.

Python 3: Getting Value Error, sequence larger than 32

I'm trying to write a script that computes numerical derivatives using the forward, backward, and centered approximations, and plots the results. I've made a linspace from 0 to 2pi with 100 points. I've made many arrays and linspaces in the past, but I've never seen this error: "ValueError: sequence too large; cannot be greater than 32"
I don't understand what the problem is. Here is my script:
import numpy as np
import matplotlib.pyplot as plt

def f(x):
    return np.cos(x) + np.sin(x)

def f_diff(x):
    return np.cos(x) - np.sin(x)

def forward(x, h):  # forward approximation
    return (f(x+h) - f(x))/h

def backward(x, h):  # backward approximation
    return (f(x) - f(x-h))/h

def center(x, h):  # center approximation
    return (f(x+h) - f(x-h))/(2*h)

x0 = 0
x = np.linspace(0, 2*np.pi, 100)
forward_result = np.zeros(x)
backward_result = np.zeros(x)
center_result = np.zeros(x)
true_result = np.zeros(x)
for i in range(x):
    forward_result[i] = forward[x0,i]
    true_result[i] = f_diff[x0]
print('Forward (x0={}) = {}'.format(x0, forward(x0,x)))
#print('Backward (x0={}) = {}'.format(x0, backward(x0,dx)))
#print('Center (x0={}) = {}'.format(x0, center(x0,dx)))
plt.figure()
plt.plot(x, f)
plt.plot(x, f_diff)
plt.plot(x, abs(forward_result-true_result), label='Forward difference')
I did try setting the linspace points to 32, but that gave me another error: "TypeError: 'numpy.float64' object cannot be interpreted as an integer"
I don't understand that one either. What am I doing wrong?
The issue starts at forward_result = np.zeros(x), because x is a NumPy array, not a shape. Since x has 100 entries, np.zeros tries to create an array of shape (x[0], x[1], x[2], ...), i.e. an array with 100 dimensions. The maximum number of dimensions is 32.
You need a flat np array.
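A minimal demonstration of how np.zeros interprets an array argument as a shape (a small illustrative example, not from the question):
import numpy as np

# each entry of the argument becomes one dimension of the result
print(np.zeros(np.array([2, 3, 4])).shape)   # (2, 3, 4)
# a 100-element linspace would therefore request a 100-dimensional array,
# which exceeds NumPy's 32-dimension limit (hence the ValueError); even at
# 32 points its float entries cannot be used as sizes (hence the TypeError)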
UPDATE: On request, I add corrected lines from code above:
forward_result = np.zeros(x.size) creates a one-dimensional array with one entry per point of x.
The functions must be called with round brackets (parentheses), not square brackets. The loop is also fixed:
for i, h in enumerate(x):
    forward_result[i] = forward(x0, h)
    true_result[i] = f_diff(x0)
Finally, in the figure you are plotting a NumPy array against a function object. Fixed version:
plt.plot(x, [f(val) for val in x])
plt.plot(x, [f_diff(val) for val in x])
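Putting these corrections together, a minimal runnable sketch of the whole script might look like the following (my consolidation of the fixes above; the step sizes start just above 0 to avoid dividing by zero in the difference quotient, which is an extra adjustment, not part of the original answer):
import numpy as np
import matplotlib.pyplot as plt

def f(x):
    return np.cos(x) + np.sin(x)

def f_diff(x):
    return np.cos(x) - np.sin(x)

def forward(x, h):  # forward approximation
    return (f(x + h) - f(x)) / h

x0 = 0
h = np.linspace(1e-3, 2*np.pi, 100)   # step sizes, shifted away from 0

forward_result = np.zeros(h.size)
true_result = np.zeros(h.size)
for i, step in enumerate(h):
    forward_result[i] = forward(x0, step)
    true_result[i] = f_diff(x0)

plt.figure()
plt.plot(h, abs(forward_result - true_result), label='Forward difference error')
plt.legend()
plt.show()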

Index 150 out of bounds in axis0 with size 1

I was making a histogram using a NumPy array in Python with OpenCV. The code is as follows:
# finding histogram of an image
import numpy as np
import cv2

img = cv2.imread("cr7.jpg")
gry_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
a = np.zeros((1,256), dtype=np.uint8)
# finding how many times a particular pixel intensity repeats
for x in range(0, 183):  # size of gray_img is (184, 275)
    for y in range(0, 274):
        g = gry_img[x, y]
        a[g] = a[g] + 1
print(a)
Error is as follows:
IndexError: index 150 is out of bounds for axis 0 with size 1
Since you haven't supplied the image, I can only guess, but it seems you've made a mistake with the dimensions of the image. Alternatively, the issue is entirely with the shape of your results array a.
The code you have is rather fragile, and here is a cleaner way to interact with images. I use an image from opencv's data directory: aero1.jpg.
The code here resolves both potential issues identified above, whichever one it was:
import cv2
import numpy as np

fname = 'aero1.jpg'
im = cv2.imread(fname)
gry_img = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
gry_img.shape
>>> (480, 640)
# note that the image is 640 pix wide by 480 tall;
# the numpy array shows the number of rows first.
# rows are in y / columns are in x

# NOTE the results array `a` need only be 1-dimensional, not 2d (1x256),
# and needs a dtype large enough to hold the counts (uint8 would overflow)
a = np.zeros((256,), dtype=np.int64)

# iterating over all pixels, whatever the shape of the image.
height, width = gry_img.shape
for x in range(width):
    for y in range(height):
        g = gry_img[y, x]  # NOTE y, x not x, y
        a[g] += 1
But note that you could also achieve this easily with a numpy function np.histogram (docs), with slightly careful handling of the bin edges.
histb, bin_edges = np.histogram(gry_img.reshape(-1), bins=range(0, 257))
# check that we arrived at the same result as iterating manually:
(a == histb).all()
>>> True

Highlighting arbitrary points in a matplotlib plot

I am new to python and matplotlib.
I am trying to highlight a few points that match a certain criteria in an already existing plot in matplotlib.
The code for the initial plot is as below:
pl.plot(t,y)
pl.title('Damped Sine Wave with %.1f Hz frequency' % f)
pl.xlabel('t (s)')
pl.ylabel('y')
pl.grid()
pl.show()
In the above plot I wanted to highlight some specific points which match the criteria abs(y)>0.5. The code coming up with the points is as below:
markers_on = [x for x in y if abs(x)>0.5]
I tried using the argument 'markevery', but it throws an error saying
'markevery' is iterable but not a valid form of numpy fancy indexing;
The code that was giving the error is as below:
pl.plot(t,y,'-gD',markevery = markers_on)
pl.title('Damped Sine Wave with %.1f Hz frequency' % f)
pl.xlabel('t (s)')
pl.ylabel('y')
pl.grid()
pl.show()
The markevery argument to the plotting function accepts different types of inputs. Depending on the input type, they are interpreted differently. Find a nice list of possibilities in this matplotlib example.
In the case where you have a condition for the markers to show, there are two options. Assuming t and y are numpy arrays and one has imported numpy as np,
Either specify a boolean array,
plt.plot(t,y,'-gD',markevery = np.where(y > 0.5, True, False))
or
an array of indices.
plt.plot(t,y,'-gD',markevery = np.arange(len(t))[y > 0.5])
Complete example
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
t = np.linspace(0,3,14)
y = np.random.rand(len(t))
plt.plot(t,y,'-gD',markevery = np.where(y > 0.5, True, False))
# or
#plt.plot(t,y,'-gD',markevery = np.arange(len(t))[y > 0.5])
plt.xlabel('t (s)')
plt.ylabel('y')
plt.show()
resulting in a plot where markers are drawn only at the points with y > 0.5.
markevery uses boolean values to mark every point where the boolean is True.
So instead of markers_on = [x for x in y if abs(x)>0.5]
you'd do markers_on = [abs(x) > 0.5 for x in y], which returns a list of booleans the same size as y, with True at every point where |x| > 0.5.
Then you'd use your code as is:
pl.plot(t,y,'-gD',markevery = markers_on)
pl.title('Damped Sine Wave with %.1f Hz frequency' % f)
pl.xlabel('t (s)')
pl.ylabel('y')
pl.grid()
pl.show()
I know this question is old, but I found this solution while trying to follow the top answer; I'm not familiar with NumPy, and it seemed to overcomplicate things.
The markevery argument only takes None, integer indices, or boolean arrays as input. Since I was passing the y values directly, it was throwing the error.
I know it is not very pythonic but I used the below code to come up with the indices.
marker_indices = []
for x in range(len(y)):
    if abs(y[x]) > 0.5:
        marker_indices.append(x)
I was having this issue because I was trying to mark some points that were out of the bounds of the data frame.
For example:
some_df.shape
-> (276, 9)

markers = [1000, 1080, 1120]

some_df.plot(
    x='date',
    y=['speed'],
    figsize=(17, 7), title="Performance",
    legend=True,
    marker='o',
    markersize=10,
    markevery=markers,
)
-> ValueError: markevery=[1000, 1080, 1120] is iterable but not a valid numpy fancy index
Just make sure that the values you are giving as markers are within the bounds of the data frame you want to plot.
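A small hypothetical guard along those lines, assuming the markers list and some_df from the example above:
# keep only marker positions that actually exist in the frame
markers = [m for m in markers if 0 <= m < len(some_df)]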
