Convert numpy array to matrix in rpy2 for kmeans

I have a 2D numpy array, self.sub,
and I want to use it with kmeans in rpy2:
k = robjects.r.kmeans(self.sub,2,20)
I always get the following error:
ValueError: nothing can be done for the type at the moment!
What can I do?

From the rpy2 docs, R matrices are just vectors with their dim attribute set. So for a two-dimensional numpy array x:
import rpy2.robjects as robj
nr, nc = x.shape
xvec = robj.FloatVector(x.transpose().reshape((x.size)))
xr = robj.r.matrix(xvec, nrow=nr, ncol=nc)
You have to transpose the numpy array because R fills matrices by columns.
Edit: Actually, you could just set byrow=True in the R matrix function, and then you wouldn't need to transpose.
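The ordering point can be checked with NumPy alone. This is a minimal sketch, assuming a small 2 x 3 array as a hypothetical stand-in for self.sub; it only shows which flattened order goes with which R call and does not call rpy2 itself.

```python
import numpy as np

# Hypothetical 2 x 3 example array standing in for self.sub.
x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Column-major flattening: the order R's matrix() fills by default.
col_major = x.transpose().reshape(x.size)

# Row-major flattening: the order to pass together with byrow=True.
row_major = x.reshape(x.size)

# NumPy's Fortran order is the same column-major layout.
same_as_fortran = np.array_equal(col_major, x.flatten(order="F"))

print(col_major)
print(row_major)
print(same_as_fortran)
```

Either flattened vector can then be wrapped in a FloatVector, as long as the matching byrow setting is used when building the R matrix.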

Related

Pandas Dataframe Matrix Multiplication using @

I am attempting to perform matrix multiplication between a Pandas DataFrame and a Pandas Series. I've set them up like so:
c = pd.DataFrame({
"s1":[.04, .018, .0064],
"s2":[.018, .0225, .0084],
"s3":[.0064, .0084, .0064],
})
x = pd.Series([0,0,0], copy = False)
I want to perform x @ c @ x, but I keep getting a "ValueError: matrices are not aligned" error. Am I not setting up my matrices properly? I'm not sure where I am going wrong.
x @ c returns a Series whose index differs from that of x: it is labeled with the columns of c (s1, s2, s3), while x is labeled 0, 1, 2. You can use the underlying numpy arrays to do the calculation:
(x @ c).values @ x.values
# 0.39880000000000004
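The index mismatch can be made visible in a short sketch. The values of x below are hypothetical nonzero weights chosen so the result is not trivially zero, and the relabeling in the second option is one possible workaround rather than something the answer proposes.

```python
import numpy as np
import pandas as pd

c = pd.DataFrame({
    "s1": [.04, .018, .0064],
    "s2": [.018, .0225, .0084],
    "s3": [.0064, .0084, .0064],
})
# Hypothetical nonzero weights, for illustration only.
x = pd.Series([0.2, 0.3, 0.5])

# The intermediate product is labeled with c's column names,
# which no longer match x's integer index.
step = x @ c
print(list(step.index))

# Option 1: drop the labels and multiply positionally.
result_np = x.to_numpy() @ c.to_numpy() @ x.to_numpy()

# Option 2 (assumption, not from the answer): relabel c's columns
# so that every operand shares the index 0, 1, 2.
c_aligned = c.copy()
c_aligned.columns = x.index
result_pd = x @ c_aligned @ x

print(result_np, result_pd)
```

Both options compute the same quadratic form; the first ignores labels entirely, while the second keeps pandas alignment checks active.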

Why does .fit() need a 2D array as the first parameter?

import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt
df = pd.read_csv('homeprices.csv')
plt.xlabel('area')
plt.ylabel('price')
plt.scatter(df.area,df.price,color='red',marker='.')
reg = linear_model.LinearRegression()
reg.fit(df.area,df.price)
Error Message:
ValueError: Expected 2D array, got 1D array instead:
array=[2600 3000 3200 3600 4000].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
It works fine if I write it as:
reg.fit(df[['area']],df.price)
I would like to know the reason behind this, because the second argument is still passed as df.price.
My csv file:
area,price
2600,550000
3000,565000
3200,610000
3600,680000
4000,725000
From the documentation, the first argument X should be:
X: {array-like, sparse matrix} of shape (n_samples, n_features)
When you declare:
x = df.area or x = df['area'], x is a Series of shape (5,). The shape should be (n, z), where z can be any positive integer.
x = df[['area']], x is a DataFrame of shape (5, 1), which makes x an acceptable input.
y = df.price, y is a Series of shape (5,), which is an acceptable input:
y: array-like of shape (n_samples,)
But if I were you, I would declare x and y as:
x = [[i] for i in df['area']]
y = [i for i in df['price']]
which makes x a nested list of shape (5, 1) and y a flat list of length 5, so if you later want to use another ML library (tensorflow, pytorch, keras, ...) you won't have any difficulties.
It's all about the input shape: the error was raised because the shape was (N,) while the expected one is (N, 1). That's why the error message suggests reshaping.
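The shapes involved can be inspected directly. A minimal sketch, assuming the CSV values from the question are loaded into plain NumPy arrays instead of a DataFrame:

```python
import numpy as np

# Values copied from the question's CSV file.
area = np.array([2600, 3000, 3200, 3600, 4000])
price = np.array([550000, 565000, 610000, 680000, 725000])

# One feature, five samples: reshape the 1D array into a column.
X = area.reshape(-1, 1)
y = price

print(area.shape)  # 1D, rejected as the first argument of fit
print(X.shape)     # 2D, (n_samples, n_features)
print(y.shape)     # 1D, accepted as the target
```

reshape(-1, 1) is the call the error message itself recommends for data with a single feature; it produces the same (5, 1) layout as df[['area']].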

How to find eigenvalues for two matrices

Here are my two vectors:
y1=[2,3,4,5,6,7]
y2=[1,5,3,6,7,8]
When I solve it with pen and paper,
it gives me the answer y1 = 1.117y2.
When I do that in Python:
import numpy as np
from numpy import linalg as LA
A = np.array([y1,y2])
w, v = LA.eig(A)
print(w)
print(v)
this error occurs: LinAlgError: Last 2 dimensions of the array must be square
How can I solve this problem?
The issue here is that eigenvalues exist only for square matrices, so NumPy expects an n x n matrix and not an n x m matrix such as the 2 x 6 matrix A in your example.
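The shape requirement can be reproduced in a few lines. This sketch assumes the vectors from the question; the 2 x 2 matrix G is only an illustrative square matrix built from the same data and is not claimed to give the pen-and-paper result.

```python
import numpy as np

y1 = [2, 3, 4, 5, 6, 7]
y2 = [1, 5, 3, 6, 7, 8]

A = np.array([y1, y2])
print(A.shape)  # two rows, six columns: not square

# eig rejects the 2 x 6 matrix.
try:
    np.linalg.eig(A)
    raised = False
except np.linalg.LinAlgError:
    raised = True
print(raised)

# Illustrative square matrix from the same data (an assumption for
# demonstration): the 2 x 2 product of A with its transpose.
G = A @ A.T
w, v = np.linalg.eig(G)
print(w)
```

Each column of v is an eigenvector of G paired with the eigenvalue at the same position in w.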

How can I interpolate a numpy array so that it becomes a certain length?

I have three numpy arrays each with different lengths:
A.shape = (3401,)
B.shape = (2200,)
C.shape = (4103,)
I would like to average the three arrays to produce a new array with size of the largest array (in this case C):
D.shape = (4103,)
Problem is, I don't think I can do this without adding "fake" data to A and B, by interpolation.
How can I perform interpolation on the first two numpy arrays so that they are of the same length as array C?
Do I even need to interpolate here?
First thing that comes to mind is zoom from scipy:
The array is zoomed using spline interpolation of the requested order.
Code:
import numpy as np
from scipy.ndimage import zoom
A = np.random.rand(3401)
B = np.random.rand(2200)
C = np.ones(4103)
for arr in [A, B]:
    zoom_rate = C.shape[0] / arr.shape[0]
    arr = zoom(arr, zoom_rate)
    print(arr.shape)
Output:
(4103,)
(4103,)
I think the simplest option is to do the following:
D = np.concatenate([np.average([A[:2200], B, C[:2200]], axis=0),
                    np.average([A[2200:3401], C[2200:3401]], axis=0),
                    C[3401:]])
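For comparison, here is a sketch of a third approach that uses plain linear interpolation with np.interp instead of spline zooming or truncated averaging. The helper name resample and the random input data are assumptions for illustration; it treats each array as samples spread evenly over the same interval, which may or may not suit the real data.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random(3401)
B = rng.random(2200)
C = rng.random(4103)

def resample(arr, n):
    """Linearly interpolate arr onto n evenly spaced points (hypothetical helper)."""
    old_pos = np.linspace(0.0, 1.0, num=arr.shape[0])
    new_pos = np.linspace(0.0, 1.0, num=n)
    return np.interp(new_pos, old_pos, arr)

n = C.shape[0]
# Stretch A and B to the length of C, then average element-wise.
D = np.mean([resample(A, n), resample(B, n), C], axis=0)
print(D.shape)
```

Unlike the concatenation approach, every element of D here is an average of three values, at the cost of introducing interpolated points into A and B.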

Is there a way to return names from R vectors, matrices, etc. in rpy2 >= 3.0.0?

I'd like to get the names from named R vectors (or matrices, etc.) back into Python. In rpy2 < 3.0.0 this was possible, e.g.,
>>> stats.quantile(numpy.array([1,2,3,4]))
R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x7f3e664d6d88 / R:0x55c939a540c8>
[1.000000, 1.750000, 2.500000, 3.250000, 4.000000]
>>> stats.quantile(numpy.array([1,2,3,4])).names
R object with classes: ('character',) mapped to:
<StrVector - Python:0x7f3e66510788 / R:0x55c939a53648>
['0%', '25%', '50%', '75%', '100%']
>>> stats.quantile(numpy.array([1,2,3,4])).rx('25%')
R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x7f3e68770bc8 / R:0x55c938f23ba8>
[1.750000]
But in rpy2 >= 3.0.0, the output is getting converted to a numpy array so of course there is no .names or .rx and therefore the names seem to be lost.
>>> stats.quantile(numpy.array([1,2,3,4]))
array([1. , 1.75, 2.5 , 3.25, 4. ])
rpy2 3.0.0 is trying to simplify its conversion system, and with this make its imperfections easier to both anticipate and mitigate.
Here, what is happening when the numpy conversion layer is active is that:
numpy arrays are converted to R arrays whenever needed by R
R arrays are converted to numpy arrays when returning from R
That symmetry is not a requirement, but just the way the default numpy conversion layer is. One can set up an asymmetrical conversion layer, which will be here converting numpy arrays to R arrays but leaving R arrays as such when returning from R, relatively quickly and easily.
import numpy
from rpy2.rinterface_lib import sexp
from rpy2 import robjects
from rpy2.robjects import conversion
from rpy2.robjects import numpy2ri
# We are going to build our custom converter by subtraction, that is
# starting from the numpy converter and only reverting the part converting R
# objects into numpy arrays to the default conversion. We could have also
# built it by addition.
myconverter = conversion.Converter('assym. numpy',
                                   template=numpy2ri.converter)
myconverter.rpy2py.register(sexp.Sexp,
                            robjects.default_converter.rpy2py)
That custom conversion can then be used when we need it:
# stats is assumed to be the R stats package, e.g. importr('stats')
# from rpy2.robjects.packages
with conversion.localconverter(myconverter):
    res = stats.quantile(numpy.array([1, 2, 3, 4]))
The outcome is:
>>> print(res.names)
[1] "0%" "25%" "50%" "75%" "100%"
If this looks like too much effort, you can also skip the numpy converter altogether, only use the default converter, and manually cast your numpy arrays to suitable R arrays whenever you judge it necessary:
>>> stats.quantile(robjects.vectors.IntVector(numpy.array([1, 2, 3, 4]))).names
R object with classes: ('character',) mapped to:
['0%', '25%', '50%', '75%', '100%']
