Size of objects in Numpy Array: Sympy Points example - python-3.x

The code below tries to instantiate a NumPy array of SymPy Points from their coordinates.
from sympy.geometry import Point
import numpy as np
coordinates = np.array([2,0, 1,-1, 0,0, -2,3, -3,-2, 0.01,-0.01]).reshape(6,2)
v1 = np.empty((coordinates.shape[0],),dtype=np.dtype(Point))
for index, value in enumerate(coordinates):
    v1[index] = Point(value)
print(v1)
# [Point2D(2, 0) Point2D(1, -1) Point2D(0, 0) Point2D(-2, 3) Point2D(-3, -2) Point2D(1/100, -1/100)]
v2 = np.empty((coordinates.shape[0],2),dtype=np.dtype(Point))
for index, value in enumerate(coordinates):
    v2[index] = Point(value)
print(v2)
# [[2 0]
# [1 -1]
# [0 0]
# [-2 3]
# [-3 -2]
# [1/100 -1/100]]
v3 = np.empty((coordinates.shape[0],1),dtype=np.dtype(Point))
for index, value in enumerate(coordinates):
    v3[index] = Point(value)
print(v3)
# ValueError: cannot copy sequence with size 2 to array axis with dimension 1
I would have expected a size of 1 for the second axis (v3) to be the correct way to do it, because a Point should be 1 object, but it gives an error.
v1 is the result I expected to get from the v3 approach. Why does it have to be done the v1 way? Why is v3 wrong? Why is v2 not a vector of SymPy Points?
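A minimal sketch of what seems to be happening, under two assumptions: np.dtype(Point) resolves to NumPy's generic object dtype, and NumPy treats a Point like any other length-2 sequence. The namedtuple P below is a hypothetical stand-in for Point so that the example needs only NumPy.

```python
from collections import namedtuple
import numpy as np

# Hypothetical stand-in for sympy's Point: sequence-like, length 2.
P = namedtuple("P", ["x", "y"])
pt = P(2, 0)

# One slot per point: a full integer index stores the object itself.
v1 = np.empty((3,), dtype=object)
v1[0] = pt
print(type(v1[0]).__name__)   # P

# v2[0] is a row of 2 slots, so NumPy unpacks the sequence into it.
v2 = np.empty((3, 2), dtype=object)
v2[0] = pt
print(v2[0])                  # [2 0]

# v3[0] is a row of 1 slot; a length-2 sequence does not fit into it.
v3 = np.empty((3, 1), dtype=object)
try:
    v3[0] = pt
    v3_failed = False
except ValueError as exc:
    v3_failed = True
    print("ValueError:", exc)

# Indexing both axes addresses a single slot, so the object is stored whole.
v3[0, 0] = pt
print(type(v3[0, 0]).__name__)  # P
```

If this reading is right, the (n, 1) layout can still hold Points, but each one has to be assigned with a full index such as v3[index, 0].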

Related

Pytorch tensor changing colors

Why do the image colors change after I put a 3D tensor (an image) into a 4D tensor?
I would appreciate any help with this.
p = "path/to/image"
p = Image.open(p)
p = transforms.PILToTensor()(p)
transforms.ToPILImage()(p).show() # ok (left pic)
temp = torch.zeros(4, p.size()[0], p.size()[1], p.size()[2])
temp[0] = p
transforms.ToPILImage()(temp[0]).show() # not ok (right pic)
The reason is that the first tensor p is an integer tensor with values in the range 0 - 255, while temp is a float tensor (torch.zeros defaults to a floating-point dtype), so temp[0] holds values in the range 0.0 - 255.0. The imshow function expects integer values between 0 and 255 or float values between 0 and 1.
To fix this, you have two options: either pass dtype=torch.uint8 when you create the temp tensor, or divide the tensor values by 255 to scale them to the range 0 - 1.
# cell 1
from PIL import Image
from torchvision import transforms
import torch
from matplotlib import pyplot as plt
p = Image.open("pi.png")
p = transforms.PILToTensor()(p).permute(1, 2, 0)
plt.imshow( p ) #ok
# cell 2
temp = torch.zeros(4, p.size()[0], p.size()[1], p.size()[2], dtype=torch.uint8)
temp[0] = p
plt.imshow(temp[0]) # or you can use plt.imshow(temp[0]/255)

operating on array with condition

Consider the following code,
import numpy as np
xx = np.asarray([1,0,1])
def ff(x):
    return np.sin(x)/x
# this throws an error because of division by zero
# C:\Users\User\AppData\Local\Temp/ipykernel_2272/525615690.py:4:
# RuntimeWarning: invalid value encountered in true_divide
# return np.sin(x)/x
yy = ff(xx)
# to avoid the error, I did the following
def ff_smart(x):
    if (x==0):
        # because sin(x)/x = 1 as x->0
        return 1
    else:
        return np.sin(x)/x
# but then I cannot do
# yy_smart = ff_smart(xx)
# because of ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
# I therefore have to do:
*yy_smart, = map(ff_smart,xx)
yy_smart = np.asarray(yy_smart)
Is there a way (some numpy magic) to write ff_smart so that I can call it without using map, while ff_smart remains usable on scalars (non-NumPy arrays)? I'd like to avoid type-checking in ff_smart.
You can do:
yy = [np.sin(x)/x if x != 0 else 1 for x in xx]
If you want to use the power of NumPy, a different approach that is still useful to know is masked arrays:
# initialize x
x = np.array([2, 3, 1, 0, 2])
# compute the masked array of x, masking out 0s
masked_x = np.ma.array(x, mask= x == 0, dtype=x.dtype)
# perform operation only on non-zero values
y = np.sin(masked_x) / masked_x
# get the value back, filling the masked out values with 1s.
y = np.ma.filled(y, fill_value=1)
For conditional operations like the one you describe, NumPy has the np.where function.
You can do
np.where(x==0, 1, np.sin(x)/x)
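One caveat with np.where is that both branches are evaluated for every element, so np.sin(x)/x is still computed at x == 0 and the RuntimeWarning still appears. Two possible alternatives are sketched below; the function names ff_where and ff_sinc are made up for illustration. The first uses the where= and out= arguments of np.divide, the second relies on np.sinc, which computes sin(pi*t)/(pi*t) and returns 1 at t == 0.

```python
import numpy as np

def ff_where(x):
    # np.asarray turns a Python scalar into a 0-d array, so no type check is needed.
    x = np.asarray(x, dtype=float)
    out = np.ones_like(x)  # value kept where x == 0 (the limit of sin(x)/x)
    # The division is only carried out where the mask is True.
    np.divide(np.sin(x), x, out=out, where=(x != 0))
    return out

def ff_sinc(x):
    # np.sinc(t) = sin(pi*t)/(pi*t), so rescale the argument by 1/pi.
    return np.sinc(np.asarray(x) / np.pi)

xx = np.asarray([1, 0, 1])
print(ff_where(xx))
print(ff_sinc(xx))
print(ff_where(0.0))
```

Both versions accept scalars as well as arrays; for a scalar input, ff_where returns a 0-d array rather than a Python float.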

Is there a way to generate correlated variable array from an existing array in Python 3? [duplicate]

I have a non-generated 1D NumPy array. For now, we will use a generated one.
import numpy as np
arr1 = np.random.uniform(0, 100, 1_000)
I need an array that will be correlated 0.3 with it:
arr2 = '?'
print(np.corrcoef(arr1, arr2))
Out[1]: 0.3
I've adapted this answer by whuber on stats.SE to NumPy. The idea is to generate a second array, noise, at random, and then compute the residuals of a least-squares linear regression of noise on arr1. The residuals necessarily have a correlation of 0 with arr1, and of course arr1 has a correlation of 1 with itself, so an appropriate linear combination a*arr1 + b*residuals will have any desired correlation.
import numpy as np
def generate_with_corrcoef(arr1, p):
    n = len(arr1)
    # generate noise
    noise = np.random.uniform(0, 1, n)
    # least squares linear regression for noise = m*arr1 + c
    m, c = np.linalg.lstsq(np.vstack([arr1, np.ones(n)]).T, noise)[0]
    # residuals have 0 correlation with arr1
    residuals = noise - (m*arr1 + c)
    # the right linear combination a*arr1 + b*residuals
    a = p * np.std(residuals)
    b = (1 - p**2)**0.5 * np.std(arr1)
    arr2 = a*arr1 + b*residuals
    # return a scaled/shifted result to have the same mean/sd as arr1
    # this doesn't change the correlation coefficient
    return np.mean(arr1) + (arr2 - np.mean(arr2)) * np.std(arr1) / np.std(arr2)
The last line scales the result so that the mean and standard deviation are the same as arr1's. However, arr1 and arr2 will not be identically distributed.
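For readers who want to check why these particular weights work, here is a short derivation. The notation is introduced here and is not part of the original answer: x stands for arr1, r for the residuals, and sigma_x, sigma_r for their standard deviations, with a = p*sigma_r and b = sqrt(1 - p^2)*sigma_x as in the code. It assumes sigma_r > 0.

```latex
\begin{aligned}
\operatorname{Cov}(x,\, a x + b r) &= a\,\sigma_x^2
  \qquad \text{since } \operatorname{Cov}(x, r) = 0 \\
\operatorname{Var}(a x + b r) &= a^2 \sigma_x^2 + b^2 \sigma_r^2
  = p^2 \sigma_r^2 \sigma_x^2 + (1 - p^2)\,\sigma_x^2 \sigma_r^2
  = \sigma_x^2 \sigma_r^2 \\
\operatorname{corr}(x,\, a x + b r) &= \frac{a\,\sigma_x^2}{\sigma_x \cdot \sigma_x \sigma_r}
  = \frac{p\,\sigma_r\,\sigma_x^2}{\sigma_x^2\,\sigma_r} = p
\end{aligned}
```

The final rescaling to arr1's mean and standard deviation is a shift plus a positive scaling, which leaves the correlation coefficient unchanged.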
Usage:
>>> arr1 = np.random.uniform(0, 100, 1000)
>>> arr2 = generate_with_corrcoef(arr1, 0.3)
>>> np.corrcoef(arr1, arr2)
array([[1. , 0.3],
       [0.3, 1. ]])

Issue when trying to plot after applying PCA on a dataset

I am trying to plot the results of PCA of the dataset pima-indians-diabetes.csv. My code shows a problem only in the plotting piece:
import numpy
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import pandas as pd
# Dataset Description:
# 1. Number of times pregnant
# 2. Plasma glucose concentration a 2 hours in an oral glucose tolerance test
# 3. Diastolic blood pressure (mm Hg)
# 4. Triceps skin fold thickness (mm)
# 5. 2-Hour serum insulin (mu U/ml)
# 6. Body mass index (weight in kg/(height in m)^2)
# 7. Diabetes pedigree function
# 8. Age (years)
# 9. Class variable (0 or 1)
path = 'pima-indians-diabetes.data.csv'
dataset = numpy.loadtxt(path, delimiter=",")
X = dataset[:,0:8]
Y = dataset[:,8]
features = ['1','2','3','4','5','6','7','8','9']
df = pd.read_csv(path, names=features)
x = df.loc[:, features].values # Separating out the values
y = df.loc[:,['9']].values # Separating out the target
x = StandardScaler().fit_transform(x) # Standardizing the features
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
# principalDf = pd.DataFrame(data=principalComponents, columns=['pca1', 'pca2'])
# finalDf = pd.concat([principalDf, df[['9']]], axis = 1)
plt.figure()
colors = ['navy', 'turquoise', 'darkorange']
lw = 2
for color, i, target_name in zip(colors, [0, 1, 2], ['Negative', 'Positive']):
    plt.scatter(principalComponents[y == i, 0], principalComponents[y == i, 1], color=color, alpha=.8, lw=lw,
                label=target_name)
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.title('PCA of pima-indians-diabetes Dataset')
The error is raised at the following line:
Traceback (most recent call last):
  File "test.py", line 53, in <module>
    plt.scatter(principalComponents[y == i, 0], principalComponents[y == i, 1], color=color, alpha=.8, lw=lw,
IndexError: too many indices for array
How can I fix this?
As the error indicates some kind of shape/dimension mismatch, a good starting point is to check the shapes of the arrays involved in the operation:
principalComponents.shape
yields
(768, 2)
while
(y==i).shape
yields
(768, 1)
This mismatch is what breaks
principalComponents[y==i, 0]
A boolean mask of shape (768, 1) is two-dimensional, so it already counts as two indices into principalComponents. The extra 0 is then a third index into a 2D array, which is why the error says you used too many indices for the array.
You can fix this by forcing the shape of y==i to a 1D array ((768,)), e.g. by changing your call to scatter to
plt.scatter(principalComponents[(y == i).reshape(-1), 0],
            principalComponents[(y == i).reshape(-1), 1],
            color=color, alpha=.8, lw=lw, label=target_name)
which then produces the plot for me.
For more information on the difference between arrays of shape (R, 1) and (R,), this question on StackOverflow provides a nice starting point.
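The same behaviour can be reproduced on a tiny array. In the sketch below, points and y are made-up stand-ins for principalComponents and the label column; only the shapes matter.

```python
import numpy as np

points = np.arange(12, dtype=float).reshape(6, 2)  # stand-in for principalComponents
y = np.array([[0], [1], [0], [1], [1], [0]])        # column vector, shape (6, 1)

mask_2d = (y == 1)          # shape (6, 1): counts as two indices
mask_1d = mask_2d.ravel()   # shape (6,): counts as one index

try:
    points[mask_2d, 0]
    mask_2d_failed = False
except IndexError as exc:
    mask_2d_failed = True
    print("2-D mask fails:", exc)

selected = points[mask_1d, 0]
print(selected)  # [2. 6. 8.]
```

Here ravel() plays the same role as reshape(-1) in the fix above: it turns the (R, 1) mask into an (R,) mask that selects rows only.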

Python Shape function for k-means clustering

I have a grayscale GeoTIFF image that loads as a 2D array of shape (4377, 6172). As a first step, I feed the [:1024, :1024] block (1024 * 1024 = 1048576 values) to my compression algorithm, which gives me a total of 4 values in the list finalmat. I then apply the K-means algorithm to those values. The program is below:
import numpy as np
from osgeo import gdal
from sklearn import cluster
import matplotlib.pyplot as plt
dataset =gdal.Open("1.tif")
band = dataset.GetRasterBand(1)
img = band.ReadAsArray()
finalmat = [255, 0, 2, 2]
#Converting list to array for dimensional change
ay = np.asarray(finalmat).reshape(-1,1)
fig = plt.figure()
k_means = cluster.KMeans(n_clusters=2)
k_means.fit(ay)
cluster_means = k_means.cluster_centers_.squeeze()
a_clustered = k_means.labels_
print('# of observation :',ay.shape)
print('Cluster Means : ', cluster_means)
a_clustered.shape= img.shape
fig=plt.figure(figsize=(125,125))
ax = plt.subplot(2,4,8)
plt.axis('off')
xlabel = str(1) , ' clusters'
ax.set_title(xlabel)
plt.imshow(a_clustered)
plt.show()
fig.savefig('kmeans-1 clust ndvi08jan2010_guj 12 .png')
The program above fails at the line a_clustered.shape = img.shape with the following error:
Error line:
a_clustered.shape= img.shape
ValueError: cannot reshape array of size 4 into shape (4377,6172)
<matplotlib.figure.Figure at 0x7fb7c63975c0>
What I actually want is to visualize the clustering on the original image using the compressed values I obtain. Can you please suggest what to do?
It does not make a lot of sense to use KMeans on 1-dimensional data.
And it makes even less sense to use it on a 4 x 1 array!
Your error then comes from the fact that you can't just reshape a 4 x 1 integer array into a large picture: k_means.labels_ holds one label per fitted sample, so it has 4 entries, not 4377 * 6172.
Just print the array a_clustered you are trying to plot. It probably contains something like [0, 1, 1, 1].
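If the goal is a label image with the same shape as the original, one label is needed per pixel. One possible approach, sketched below with plain NumPy, is to assign every pixel to its nearest cluster center. This is only a guess at what the asker wants; img and centers are small made-up stand-ins for the real image and for the fitted k_means.cluster_centers_.

```python
import numpy as np

# Hypothetical stand-ins: a tiny "image" and two made-up cluster centers.
img = np.array([[250, 3, 1],
                [0, 240, 2]])
centers = np.array([1.3, 255.0])

# Distance from every pixel to every center: shape (rows, cols, n_centers).
dist = np.abs(img[..., np.newaxis] - centers)

# Index of the nearest center for each pixel: same shape as img.
label_img = dist.argmin(axis=-1)

print(label_img.shape)  # (2, 3)
print(label_img)
```

Because label_img already has the image's shape, it can be passed to plt.imshow without any reshaping.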
