Is there a way to calculate standard errors when using predict.mppm? - spatstat

I'm using spatstat to run some mppm models and would like to be able to calculate standard errors for the predictions, as in predict.ppm. I could of course use predict.ppm on each point process individually, but I'm wondering whether this is invalid for any reason, or whether there is a better way of doing it?

This is not yet implemented as an option in predict.mppm. (It is on our long list of things to do. I will move it closer to the top of the list.)
However, it is available by applying predict.ppm to each element of subfits(model), where model was the original fitted model of class mppm. Something like:
m <- mppm(......)                    # fit the big model (arguments elided)
fits <- subfits(m)                   # one ppm per row of the data hyperframe
Y <- lapply(fits, predict, se=TRUE)  # predictions with standard errors
Just to clarify: fits[[i]] is the point process model, of class ppm, implied by the big model m for the data in row i of the hyperframe. The parameter estimates and variance estimates in fits[[i]] are based on information from the entire hyperframe. This is not the same as fitting a separate model of class ppm to the data in each row of the hyperframe and calculating predictions and standard errors for those fits.

Related

In the scikit learn implementation of LDA what is the difference between transform and decision_function?

I am currently working on a project that uses Linear Discriminant Analysis to transform some high-dimensional feature set into a scalar value according to some binary labels.
So I train LDA on the data and the labels and then use either transform(X) or decision_function(X) to project the data into a one-dimensional space.
I would like to understand the difference between these two functions. My intuition would be that the decision_function(X) would be transform(X) + bias, but this is not the case.
Also, I found that these two functions give different AUC scores, which would indicate that one is not a monotonic transformation of the other, as I would have thought.
In the documentation, it states that the transform(X) projects the data to maximize class separation, but I would have expected decision_function(X) to do this.
I hope someone could help me understand the difference between these two.
LDA projects your multivariate data onto a 1D space. The projection is a linear combination of all your attributes (columns in X), with the weight of each attribute determined by maximizing the class separation. A threshold value in this 1D space is then determined which gives the best classification results.
transform(X) gives you the value of each observation in this 1D space, x' = transform(X). decision_function(X) gives you the log of the posterior odds of an observation belonging to the positive class, log(P(y=1|x') / P(y=0|x')).
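For a concrete comparison, here is a minimal sketch on synthetic two-class data (the data and variable names here are made up for illustration, not taken from the question). In the binary case both quantities are linear functions of X along the same discriminant direction, so they should be related by an affine rescaling rather than by a simple added bias:
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 4)),
               rng.normal(1.0, 1.0, size=(50, 4))])
y = np.array([0] * 50 + [1] * 50)

lda = LinearDiscriminantAnalysis().fit(X, y)
proj = lda.transform(X).ravel()    # coordinates in the 1D discriminant space
scores = lda.decision_function(X)  # log posterior odds of the positive class
print(np.corrcoef(proj, scores)[0, 1])  # should be (close to) +/- 1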

What's the selection criterion for the "best" model in "regsubsets" & how do I access several "best" models?

I've been playing around with the regsubsets function a bit, using the "forward" method to select variables for a linear regression model. However, despite also reading the documentation, I can't seem to figure out how the leaps.setup underlying this function determines the "best" model for each separate number of variables in a model.
Say I have a model with potentially 10 variables in it (and nvmax = 10): I get exactly one "best" model with 1 variable, one with 2 variables, etc. But how is this model selected by the function? I wonder particularly because, after having run this function, I'm able to extract the best model across all the different(!) sizes by specifying a criterion (e.g., adjr2).
Related to this: if I set, for example, nbest = 5, I understand that the function keeps the five best models for each model size (i.e., for a given number of variables it gives the five variations that perform better than the rest). If I understand that correctly, is there any way to extract these five models for a specific model size, for example, to display the coefficients of those five best models?
I hope I'm being clear about my problems here. Please let me know if example data or any further information would help to clarify the issue!
The "best" model picked by regsubsets is the one that minimizes the residual sum of squares (RSS). For a fixed number of variables this is unambiguous: criteria such as R², adjusted R², Cp and BIC are all monotone functions of the RSS once the model size is held constant, so those criteria only come into play when you compare models of different sizes afterwards, e.g. with adjr2.
I'm still working on the second question...
Addressing the second question: the following code displays the coefficients of the 5 best models for each number of explanatory variables, from 1 to 3 variables. Y is the response variable of the models.
library(leaps)
# nvmax = 3 model sizes, nbest = 5 models per size => up to 15 models stored
best_models = regsubsets(Y ~ ., data = data_set, nvmax = 3, nbest = 5)
coef(best_models, 1:15)  # coefficients of all 15 stored models

Specify log-normal family in BAMBI model

I'm trying to fit a simple Bayesian regression model to some right-skewed data, and thought I'd try setting the family to a log-normal distribution. I'm using the pymc3 wrapper BAMBI. Is there a way to build a custom family with a log-normal distribution?
It depends on what you want the mean function of the model to look like.
If you want a model like
log(Y) = b0 + b1*X + e,  e ~ Normal(0, sigma^2)
then yes, this is easily achieved by simply log-transforming Y and then estimating the usual linear model with a Normal response. Notice that in this model Y is an exponential function of the predictor X, so when plotting Y against X (both untransformed), the regression line can curve up or down. It also has a multiplicative error term, so that the variance is greater for larger predicted Y values. We can say that such a model has a log link function and a lognormal response.
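A minimal sketch of this first approach in plain numpy (outside bambi; the data here are simulated purely for illustration):
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=200)
y = np.exp(0.5 + 0.3 * x + rng.normal(0.0, 0.2, size=200))  # lognormal response

b1, b0 = np.polyfit(x, np.log(y), deg=1)  # usual linear model, on the log scale
y_hat = np.exp(b0 + b1 * x)               # back-transformed (median) prediction
print(b0, b1)                             # should be close to 0.5 and 0.3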
But if you want a model like
Y = b0 + b1*X + e,  e ~ Lognormal(mu, sigma^2)
then no, this kind of model is not currently supported by bambi*. This is a model with a lognormal response but an identity link function. The regression of Y on X is a straight line, but the errors have the same lognormal distribution at every point along X, so the variance does not increase for larger predicted Y values. Note that this is an unusual model that I personally have never actually seen used.
* It's possible in theory to roll your own custom Families (although it would require some slight hacking), but the way this is designed in bambi ultimately depends on the families implemented in statsmodels.genmod, which does not currently include lognormal.
Unless I'm misunderstanding something, I think all you need to do is specify link='log' in the fit() call. If your assumption is correct, the exponentiated linear prediction will be normally distributed, and the default error distribution is gaussian, so I don't think you need to build a custom family for this: the default gaussian family with a log link should work fine. But feel free to clarify if this doesn't address your question.
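In code, that suggestion would look something like the sketch below. This is hypothetical: the exact bambi signature has changed across versions, so treat the family and link arguments as assumptions, and df (a data frame with columns y and x) is made up for illustration.
from bambi import Model

model = Model(df)
results = model.fit('y ~ x', family='gaussian', link='log', samples=1000)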

Improving linear regression model by taking absolute value of predicted output?

I have a particular regression problem that I was able to improve using Python's abs() function. I am still somewhat new when it comes to machine learning, and I wanted to know if what I am doing is actually "allowed," so to speak, for improving a regression problem. The following lines describe my method:
from sklearn import linear_model
from sklearn.model_selection import cross_val_predict

lr = linear_model.LinearRegression()
predicted = abs(cross_val_predict(lr, features, labels_postop_IS, cv=10))
I attempted this solution because linear regression can sometimes produce negative predicted values, even though in my particular case these predictions should never be negative, as they are a physical quantity.
Using the abs() function, my predictions produce a better fit for the data.
Is this allowed?
Why would it not be "allowed"? I mean, if you want to make certain statistical statements (like a 95% CI, for example) you need to be careful, but most ML practitioners do not care too much about the underlying statistical assumptions and just want a black-box model that can be evaluated based on accuracy or some other performance metric. So basically everything is allowed in ML; you just have to be careful not to overfit. Maybe a more sensible solution to your problem would be to use a function that truncates at 0, like f(x) = max(x, 0), as sketched below. This way large negative values don't suddenly become large positive ones.
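A minimal sketch of that truncation idea, reusing the variables from the question (features and labels_postop_IS are assumed to be defined as in the original code):
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import cross_val_predict

lr = linear_model.LinearRegression()
raw = cross_val_predict(lr, features, labels_postop_IS, cv=10)
truncated = np.maximum(raw, 0)  # f(x) = max(x, 0): negatives clipped to 0, not flipped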
On a side note, you should probably try some other models as well, with more parameters, like an SVR with a non-linear kernel. The thing is that an LR fits a line, and if this line is not parallel to your x-axis (thinking in the single-variable case) it will inevitably produce negative values at some point on the line. That's one reason why it is often advised not to use LRs for predictions outside the range of the fitted data.
A straight line y = a + bx will predict negative y for some x unless b = 0 and a ≥ 0. Using a logarithmic scale seems a natural way to fix this.
In the case of linear regression, there is no restriction on your outputs.
If your data are non-negative (as in your case, where the values are physical quantities and cannot be negative), you could model them using a generalized linear model (GLM) with a Poisson family and a log link function. This is known as Poisson regression and is helpful for modeling discrete non-negative counts such as those in the problem you described. The Poisson distribution is parameterized by a single value λ, which describes both the expected value and the variance of the distribution.
I cannot say your approach is wrong, but a better way would be to go with the above method.
With the log link, you are effectively fitting a linear model to the log of the expected value of your observations.
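A minimal sketch of such a Poisson GLM, using statsmodels (my choice of library, not the asker's) and reusing the question's variable names:
import statsmodels.api as sm

X = sm.add_constant(features)  # add an intercept column
poisson_fit = sm.GLM(labels_postop_IS, X, family=sm.families.Poisson()).fit()
preds = poisson_fit.predict(X)  # exp(X @ beta), so predictions are never negative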

How to get started with Tensorflow

I am pretty new to TensorFlow, and I am currently learning it through the guide at https://www.tensorflow.org/get_started/get_started
It is said in the manual that:
"We've created a model, but we don't know how good it is yet. To evaluate the model on training data, we need a y placeholder to provide the desired values, and we need to write a loss function.
A loss function measures how far apart the current model is from the provided data. We'll use a standard loss model for linear regression, which sums the squares of the deltas between the current model and the provided data. linear_model - y creates a vector where each element is the corresponding example's error delta. We call tf.square to square that error. Then, we sum all the squared errors to create a single scalar that abstracts the error of all examples using tf.reduce_sum."
q1. "we don't know how good it is yet." I didn't understand this quote, since the model created is just a simple slope equation. What should it be trained for? Does it require a perfect slope, or what? Why am I training that model, and for what?
q2. What is a loss function? Is the loss function used to determine the accuracy of the model? Why is it required?
q3. I didn't understand "sums the squares of the deltas" between the current model and the provided data.
q4. I didn't understand this part of the code: squared_deltas = tf.square(linear_model - y)
This is the code:
y = tf.placeholder(tf.float32)                # placeholder for the true values
squared_deltas = tf.square(linear_model - y)  # per-example squared error
loss = tf.reduce_sum(squared_deltas)          # total squared error, a single scalar
print(sess.run(loss, {x: [1, 2, 3, 4], y: [0, -1, -2, -3]}))
These may be simple questions, but I am a beginner to TensorFlow and am having a hard time understanding it.
1) So you're kind of right to ask why we should train for such a simple problem, but this is just an introductory piece. With any machine learning task you need to evaluate your model to see how good it is. In this case you are just training to find the coefficients of the line of best fit.
2) A loss function in any machine learning context represents your model's error. It is usually a function of the "distance" between your calculated value and the ground truth value. Think of it as an internal evaluation score. You want to minimise your loss, and the gradients and parameter updates are computed from it.
3/4) Your questions here are really about least squares regression, a statistical method for fitting lines of best fit to points. The deltas are the differences between your calculated values and the truth values. The aim is to minimise the total area of the squares of these deltas, and hence minimise the error, giving a better line of best fit.
What you are doing in this TensorFlow example is creating a machine learning model that will learn the coefficients of the line of best fit automatically, using a least-squares-based system.
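For reference, here is a self-contained version of the quoted snippet (TensorFlow 1.x API, as in the linked guide), with the model definition the tutorial builds in the preceding steps filled in:
import tensorflow as tf

W = tf.Variable([0.3], dtype=tf.float32)   # slope, initialised arbitrarily
b = tf.Variable([-0.3], dtype=tf.float32)  # intercept, initialised arbitrarily
x = tf.placeholder(tf.float32)
linear_model = W * x + b                   # the "simple slope" model

y = tf.placeholder(tf.float32)             # the provided (true) values
squared_deltas = tf.square(linear_model - y)  # one squared error per example
loss = tf.reduce_sum(squared_deltas)          # single scalar: total squared error

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(loss, {x: [1, 2, 3, 4], y: [0, -1, -2, -3]}))  # prints 23.66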
Pretty much all of your questions have to do with the loss function.
The loss function is a function that determines how far apart your outputs are from the expected (correct) outputs.
It has two usages:
Help the algorithm determine whether tweaking the weights is moving things in a good or a bad direction
Determine the accuracy (roughly, the number of times your system guesses the correct answer)
Here the loss function is the sum of the squared deltas, where each delta is the difference between the expected output and the actual output.
The deltas are squared to magnify the errors the algorithm makes (and so that positive and negative errors don't cancel out).
