RPy2: Convert DataFrame to SpatialGridDataFrame

How can a DataFrame be converted to a SpatialGridDataFrame using the R maptools library? I am new to rpy2, so this might be a very basic question.
The R Code is:
coordinates(dataf)=~X+Y
In Python:
import rpy2
import rpy2.robjects as robjects
r = robjects.r
# Create a Test Dataframe
d = {'TEST': robjects.IntVector((221, 412, 332)), 'X': robjects.IntVector((25, 31, 44)), 'Y': robjects.IntVector((25, 35, 14))}
dataf = robjects.r['data.frame'](**d)
r.library('maptools')
# From here I could not manage to write the above-mentioned R code using the rpy2 documentation
Apart from this particular question, I would be pleased to get some feedback on a more general idea: my final goal is to perform regression kriging on spatial data using the gstat library. The R script works fine, but I would like to call it from Python/ArcGIS. What do you think about this task: is it possible via rpy2?
Thanks a lot!
Richard

In some cases, rpy2 is still unable to dynamically (and automagically) generate smart bindings.
An analysis of the R code will help:
coordinates(dataf)=~X+Y
This can be more explicitly written as:
dataf <- "coordinates<-"(dataf, formula("~X+Y"))
That last expression makes the Python/rpy2 version straightforward:
from rpy2.robjects.packages import importr
sp = importr('sp') # "coordinates<-()" is there
from rpy2.robjects import baseenv, Formula
maptools_set = baseenv.get('coordinates<-')
dataf = maptools_set(dataf, Formula(' ~ X + Y'))
To be (wisely) explicit about where "coordinates<-" is coming from, use:
maptools_set = getattr(sp, 'coordinates<-')
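Putting the pieces together, a minimal end-to-end sketch using the question's test data frame (the final class check is only there to confirm the conversion, and should report a SpatialPointsDataFrame from sp):
import rpy2.robjects as robjects
from rpy2.robjects import Formula
from rpy2.robjects.packages import importr

sp = importr('sp')

# Test data frame from the question, with proper integer vectors
d = {'TEST': robjects.IntVector((221, 412, 332)),
     'X': robjects.IntVector((25, 31, 44)),
     'Y': robjects.IntVector((25, 35, 14))}
dataf = robjects.r['data.frame'](**d)

set_coordinates = getattr(sp, 'coordinates<-')
dataf = set_coordinates(dataf, Formula('~ X + Y'))
print(dataf.rclass)  # should report 'SpatialPointsDataFrame'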

Related

How to get the intercept from model summary in Python linearmodels?

I am running a panel regression using Python linearmodels, something like:
import pandas as pd
from linearmodels.panel import PanelOLS
data = pd.read_csv('data.csv', sep=',')
data = data.set_index(['panel_id', 'date'])
controls = data[['A', 'B', 'C']].copy()  # regressors as a DataFrame
controls['const'] = 1                    # add a constant column
model = PanelOLS(data.Y, controls, entity_effects=True)
result = model.fit(use_lsdv=True)
I really need to pull out the coefficient on the constant, but it looks like this does not work:
intercept = result.summary.const
I could not find the answer in the linearmodels documentation on GitHub.
More generally, does anyone know how to pull the estimated coefficients out of a linearmodels result? Thank you!
result.params['const']
would give the intercept; in general, result.params gives the pandas Series of regression coefficients in linearmodels.
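For example, with the fitted result from the question (a small sketch assuming the model above fit successfully):
intercept = result.params['const']  # coefficient on the constant column
print(result.params)                # pandas Series, one coefficient per regressor
print(result.std_errors)            # matching standard errors (same index)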

What would the equivalent machine learning program in R language of this Python one?

As part of a school assignment on DSLs and code generation, I have to translate the following program, written in Python/scikit-learn, into R (the topic of the exercise is a hypothetical machine-learning DSL).
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_validate
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
df = pd.read_csv('boston.csv', sep=',')
df.head()
y = df["medv"]
X = df.drop(columns=["medv"])
clf = DecisionTreeRegressor()
scoring = ['neg_mean_absolute_error','neg_mean_squared_error']
results = cross_validate(clf, X, y, cv=6, scoring=scoring)
print('mean_absolute_errors = '+str(results['test_neg_mean_absolute_error']))
print('mean_squared_errors = '+str(results['test_neg_mean_squared_error']))
Since I'm a complete newbie in machine learning, and especially in R, I can't do it.
Could someone help me?
Sorry for the late answer; you have probably already finished your school assignment. Of course we cannot just do it for you, and you probably have to figure it out by yourself. Moreover, I don't know exactly what you need to do, but here are some tips:
Read a csv file
data <- read.csv(file="name_of_the_file", header=TRUE, sep=",")
data <- as.data.frame(data)
header=TRUE indicates that the file has a header row with the column names; sep="," is the same as in Python (the separator in the file is ',').
as.data.frame makes sure that your data is kept in data-frame format.
Add/delete a column
data$name_of_the_column_to_be_deleted <- NULL  # delete a column
data$name_of_column_to_be_added <- c(1:10)     # add a column
To add a column you need to supply the elements it will include (here, the numbers 1 to 10). Also, the # symbol indicates the beginning of a comment.
Modelling
For the modelling part I am not sure what you want to achieve, but R offers a huge selection of algorithms to choose from. For example, if you want to grow a tree, take a look at https://www.statmethods.net/advstats/cart.html, which uses the following script (from the rpart package):
library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start,
             method="class", data=kyphosis)

Calculating log base c in Python 3

Is there a way to calculate log base c in Python?
Here c is a variable and may change due to some dependencies.
I am new to programming and also to Python 3.
There is already a built-in function in Python's math module that does this:
from math import log

def logOc(c, num):
    return log(num, c)

print(logOc(3, 3**24))  # 24.0
You can read more about log and the python math module here
Yes, you can simply use math's function log():
import math
c = 100
val = math.log(10000, c)  # the first value is the number, the second the base
print(val)
Example output:
2.0
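For completeness, both answers rely on math.log(num, base); the same value can be computed explicitly with the change-of-base formula log_c(x) = ln(x) / ln(c). A minimal sketch (watch out for small floating-point errors):
import math

def log_base(c, x):
    # change of base: log_c(x) = ln(x) / ln(c)
    return math.log(x) / math.log(c)

print(log_base(100, 10000))  # 2.0, same as math.log(10000, 100)
print(log_base(10, 1000))    # ~3.0 (floating point may give 2.9999999999999996)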

Is there an automatic way for Python to select the variables that make the most sense in an OLS regression?

Suppose I have the following DataFrame:
import pandas as pd, numpy as np, statsmodels.formula.api as smf
# Generate the data
Stocks=100
mean = [0.5, 1000, 10]
Var = [0.5, 60, 3]
A=np.random.normal(loc=0.5,scale=0.5,size=(Stocks, 1))
for a, b in zip(mean, Var):
A=np.concatenate((A, np.random.normal(loc=a,scale=b, size=(Stocks,1))), axis=1)
df1=pd.DataFrame(A, columns=['Betas','M/B','Size', 'P/E'])
df1['PAR_stock']=0.08+0.801*df1['Size']+0.321*df1['M/B']+0.164*df1['P/E']-0.084*df1['Betas']
This gives me the DataFrame above. I want to select the variables among Betas, Size, P/E and M/B that give the best fit.
formula = 'PAR_stock ~ Betas + Size + Q("P/E") + Q("M/B")'
results = smf.ols(formula, df1).fit()
print(results.summary())
I want Python to try each combination and tell me which variables are the best to use in an OLS regression, i.e. which combination gives the best model.
Is there a way to do this in Python using machine-learning code?
To the best of my knowledge there is an R library called glmulti for this; is there something similar in Python?
PS: I am still new to this, so please do not be harsh in your comments. If you have any suggestions, or a book that explains these things explicitly, feel free to share it. Thank you for your cooperation.
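As a starting point, best-subset selection like glmulti's can be written by hand: fit an OLS model for every combination of regressors and keep the one with the lowest AIC. A minimal sketch, assuming df1 and smf from the snippet above (for four candidates this is only 15 fits, but the number of combinations grows exponentially):
from itertools import combinations

candidates = ['Betas', 'Size', 'Q("P/E")', 'Q("M/B")']
best_aic, best_formula = float('inf'), None
for k in range(1, len(candidates) + 1):
    for subset in combinations(candidates, k):
        formula = 'PAR_stock ~ ' + ' + '.join(subset)
        fit = smf.ols(formula, df1).fit()
        if fit.aic < best_aic:
            best_aic, best_formula = fit.aic, formula
print('best model:', best_formula, 'AIC:', best_aic)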

Is it possible to display the Venn diagram within a universal set?

Is it possible to display the universal set using matplotlib-venn? I'm new to both Python and the matplotlib package, so I'm not actually sure what's possible and what's not.
I'm trying to create a Venn diagram generator that accepts the values for each circle and then an operation (e.g. A intersection B), and highlights only the intersection of the two circles. Basically, this is what I want the output to be.
Python has the matplotlib_venn package, but your case needs some tricks:
import matplotlib.pyplot as plt
from matplotlib_venn import venn3

A = set([9, 3, 6])
B = set([2, 4, 6, 8])
C = set([0, 5, 1, 7])

# Draw three sets; the third one (C) stands in for the universal set
v = venn3([A, B, C], ('P', 'Q', 'U'))
v.get_label_by_id('100').set_text('\n'.join(map(str, A - B)))  # only in A
v.get_label_by_id('110').set_text('\n'.join(map(str, A & B)))  # in A and B
v.get_label_by_id('010').set_text('\n'.join(map(str, B - A)))  # only in B
v.get_label_by_id('001').set_text('\n'.join(map(str, C)))      # "universal" elements
v.get_patch_by_id('001').set_color('white')  # hide the fill of the outer set
plt.axis('on')  # keep the axes box around the diagram
plt.show()
You can use plt.annotate to adjust the positions of the values.
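For instance, a hypothetical tweak that moves the values of region '001' outside the circle with an arrow, assuming v and C from the snippet above (place it before plt.show()):
lbl = v.get_label_by_id('001')
x, y = lbl.get_position()
lbl.set_text('')  # clear the in-circle text
plt.annotate('\n'.join(map(str, C)), xy=(x, y), xytext=(x + 0.3, y - 0.2),
             arrowprops=dict(arrowstyle='->'))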
