Prediction in a tobit survival model

Consider a tobit model:
ln(T) = xb + e,
where e ~ iid N(0, s^2). Abusing notation, we obtain ML estimates b and s^2.
Now suppose we want an estimate (prediction) of T. As I understand it, the usual procedure is to exponentiate the linear predictor, i.e. estimate T by exp(xb).
But isn't this wrong? If the tobit model holds, then
T = exp(xb) exp(e),
and if e is normal, then exp(e) is lognormal, with expectation exp(s^2/2) rather than unity, so it depends on (the estimate of) s^2. Shouldn't we take this into account when making a prediction? Am I missing something?

I may be able to answer my own question: when we use exp(xb), we're basing the prediction on the median of T, not the mean. For lognormal T the median is exp(xb), while the mean is exp(xb + s^2/2).
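For illustration, a minimal numeric sketch (with made-up values for xb and s^2) of how the median-based and mean-based predictions differ:

import numpy as np

# Made-up fitted values for one observation.
xb = 1.2          # linear predictor x'b
s2 = 0.49         # ML estimate of the error variance s^2

median_pred = np.exp(xb)            # median of T: exp(xb)
mean_pred = np.exp(xb + s2 / 2.0)   # mean of T: exp(xb + s^2/2)
print(median_pred, mean_pred)       # the mean is always the larger of the two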

Related

Modelling known noise with gaussian process model

I need some help understanding how to model known noise using a Gaussian process model.
I have some noisy data; for the purpose of this discussion let's say the noise is Gaussian.
I can model this noise using a GP model with the following kernel.
k1 = Matern(length_scale=[3, 0.2, 0.2])
k2 = WhiteKernel(noise_level=1)
kernel = k1 + k2
I fit to the data and get a nice result (image omitted; the forum wouldn't let me post it inline).
In this case, I am fitting the kernel hyperparameters to the data, and everything seems to work well. However, this is a rather artificial situation: normally I won't have the luxury of feeding the GP multiple data points at the same parameter values, but I will usually have an estimate of the noise in advance, so I need to figure out how to model this noise without explicitly fitting it.
What I am confused about is how to interpret and set noise_level. From what I have read in the docs, noise_level should be interpreted as 'corresponding to the variance of Gaussian noise'. For the data above, after a model fit, the prediction of the standard deviation is 0.5, corresponding to a variance of 0.25. This is correct; however, the noise_level in gp.kernel_.k2.noise_level is 1.09902. I would expect some correspondence between this number and the predicted std, but I can't figure out what it is. Furthermore, when I perform the same experiment with different data noise levels, the prediction of the standard deviation remains good, but gp.kernel_.k2.noise_level doesn't change at all...
What I would like to do is be able to say "In this data, I know that the noise is roughly x" and then set up the kernel accordingly like this:
k1 = Matern()
k2 = WhiteKernel(noise_level=something_derived_from_x, noise_level_bounds='fixed')
kernel = k1 + k2
However, I cannot figure out how I should do this. I feel like I've fundamentally misunderstood something, can anyone help me out?
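If the docs are right that noise_level is the variance of the Gaussian noise, then a known noise standard deviation x translates to noise_level = x**2 with fixed bounds. A minimal sketch under that assumption (the data here is made up):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

known_noise_std = 0.5  # hypothetical prior knowledge: "the noise is roughly x"

# noise_level is a variance, so square the known standard deviation,
# and fix the bounds so the optimizer cannot change it during fitting.
kernel = Matern(length_scale=1.0) + WhiteKernel(
    noise_level=known_noise_std**2, noise_level_bounds="fixed"
)

# Made-up noisy 1-D data standing in for the real observations.
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 5, size=(50, 1))
y_train = np.sin(X_train).ravel() + known_noise_std * rng.normal(size=50)

gp = GaussianProcessRegressor(kernel=kernel).fit(X_train, y_train)
mean, std = gp.predict(X_train, return_std=True)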

Does scikit-learn Have an In-House Function That Tallies the Accuracy for Each y Value?

I have a LinearSVC model that predicts some stock data. It has a 90% accuracy rating, but I think this might be because some y values are far more likely than others. I want to see whether there is a way to check, for each y I've defined, how accurately that y was predicted.
I haven't seen anything like this in the docs, but it just makes sense to have it.
If what you really want is a measure of confidence rather than actual probabilities, you can use the method LinearSVC.decision_function(). See its documentation, or use probability calibration via CalibratedClassifierCV (see its documentation).
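For example, a minimal sketch of both options, using made-up data in place of the stock features:

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Made-up classification data standing in for the stock problem.
X, y = make_classification(n_samples=200, n_classes=3, n_informative=4,
                           random_state=0)

# Option 1: raw confidence scores (signed decision-function values).
svc = LinearSVC().fit(X, y)
scores = svc.decision_function(X)        # one column per class

# Option 2: calibrated probabilities via CalibratedClassifierCV.
calibrated = CalibratedClassifierCV(LinearSVC()).fit(X, y)
probs = calibrated.predict_proba(X)      # each row sums to 1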
You can use the confusion matrix implemented in scikit-learn to compare the predicted and true values of your classification problem for each individual class. The diagonal holds the counts of correct predictions per class; dividing each diagonal entry by its row total gives that class's accuracy (its recall), which is easily converted to a percentage.
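A minimal sketch of that approach (the labels and predictions here are made up):

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Made-up labels and predictions for a three-class problem.
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 1])

cm = confusion_matrix(y_true, y_pred)
per_class_accuracy = cm.diagonal() / cm.sum(axis=1)   # recall for each class
print(per_class_accuracy)

# classification_report gives per-class precision/recall/F1 in one call.
print(classification_report(y_true, y_pred))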

Specify log-normal family in BAMBI model

I'm trying to fit a simple Bayesian regression model to some right-skewed data, so I thought I'd try setting the family to a log-normal distribution. I'm using BAMBI, the pymc3 wrapper. Is there a way to build a custom family with a log-normal distribution?
It depends on what you want the mean function of the model to look like.
If you want a model like
log(Y) = b0 + b1*X + e, where e ~ Normal(0, s^2),
then yes, this is easily achieved by simply log-transforming Y and then estimating the usual linear model with a Normal response. Notice that in this model Y is an exponential function of the predictor X, so when plotting Y against X (both untransformed), the regression line can curve up or down. It also has a multiplicative error term, so the variance is greater for larger predicted Y values. We can say that such a model has a log link function and a lognormal response.
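A minimal sketch of that first approach (hypothetical data; the bmb.Model formula API shown here is the modern one and may differ from older pymc3-era versions):

import numpy as np
import pandas as pd
import bambi as bmb

# Made-up right-skewed data: Y is exponential in X.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)
y = np.exp(1.0 + 2.0 * x + rng.normal(0, 0.5, 100))
df = pd.DataFrame({"x": x, "log_y": np.log(y)})

# Log-transform Y, then fit the usual Normal-response linear model.
model = bmb.Model("log_y ~ x", df)   # default family is gaussian
results = model.fit()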
But if you want a model like
Y = b0 + b1*X + e, where e ~ Lognormal(0, s^2),
then no, this kind of model is not currently supported by bambi*. This is a model with a lognormal response but an identity link function. The regression of Y on X is a straight line, but the errors have the same lognormal distribution at every point along X, so the variance does not increase for larger predicted Y values. Note that this is an unusual model that I personally have never actually seen used.
* It's possible in theory to roll your own custom Families (although it would require some slight hacking), but the way this is designed in bambi ultimately depends on the families implemented in statsmodels.genmod, which does not currently include lognormal.
Unless I'm misunderstanding something, I think all you need to do is specify link='log' in the fit() call. If your assumption is correct, the exponentiated linear prediction will be normally distributed, and the default error distribution is Gaussian, so I don't think you need to build a custom family for this; the default Gaussian family with a log link should work fine. But feel free to clarify if this doesn't address your question.
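Since bambi's families sit on top of statsmodels.genmod (per the footnote above), here is a minimal statsmodels sketch of what a Gaussian family with a log link amounts to, i.e. E[Y] = exp(Xb) with additive Normal errors (the data is made up):

import numpy as np
import statsmodels.api as sm

# Made-up right-skewed data.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)
y = np.exp(1.0 + 2.0 * x) + rng.normal(0, 0.5, 100)

X = sm.add_constant(x)
glm = sm.GLM(y, X, family=sm.families.Gaussian(link=sm.families.links.Log()))
print(glm.fit().summary())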

How to get started with Tensorflow

I am pretty new to Tensorflow, and I am currently learning it from the website https://www.tensorflow.org/get_started/get_started
The manual says:
We've created a model, but we don't know how good it is yet. To evaluate the model on training data, we need a y placeholder to provide the desired values, and we need to write a loss function.
A loss function measures how far apart the current model is from the provided data. We'll use a standard loss model for linear regression, which sums the squares of the deltas between the current model and the provided data. linear_model - y creates a vector where each element is the corresponding example's error delta. We call tf.square to square that error. Then, we sum all the squared errors to create a single scalar that abstracts the error of all examples using tf.reduce_sum:
q1."we don't know how good it is yet.", I didn't understand this
quote as the simple model created is a simple slope equation and on
what it should train for?, as the model is a simple slope. Is it
require an perfect slope or what? why am I training that model and
for what?
q2.what is a loss function? Is loss function is used to determine the
accuracy of the model? Why is it required?
q3. I didn't understand " 'sums the squares of the deltas' between
the current model and the provided data."
q4.I didn't understood this part of code,"squared_deltas =
tf.square(linear_model - y)
this is the code:
y = tf.placeholder(tf.float32)
squared_deltas = tf.square(linear_model - y)   # per-example squared error
loss = tf.reduce_sum(squared_deltas)           # sum the squared errors into one scalar
print(sess.run(loss, {x:[1,2,3,4], y:[0,-1,-2,-3]}))
These may be simple questions, but I am a beginner to Tensorflow and am having a hard time understanding it.
1) You're kind of right to ask why we should train for such a simple problem, but this is just an introductory piece. With any machine learning task you need to evaluate your model to see how good it is. In this case you are just training to find the coefficients of the line of best fit.
2) A loss function in any machine learning context represents the error of your model, usually as a function of the "distance" from your calculated value to the ground-truth value. Think of it as an internal evaluation score. You want to minimise your loss, so the gradients and parameter updates are based on the loss.
3/4) Your questions here are more to do with least-squares regression, a statistical method for fitting a line of best fit to a set of points. The deltas are the differences between your calculated values and the true values. The aim is to minimise the area of the squares, and hence minimise the error, to get a better line of best fit.
What you are doing in this Tensorflow example is creating a machine learning model that learns the coefficients of the line of best fit automatically, using a least-squares criterion.
Pretty much all of your questions have to do with the loss function.
The loss function is a function that determines how far apart your outputs are from the expected (correct) outputs.
It has two uses:
Helping the algorithm determine whether a tweak of the weights is moving in a good or a bad direction
Estimating the accuracy (roughly, how often your system guesses the correct answer)
Here the loss function is the sum of the squared deltas, where a delta is the difference between the expected output and the actual output.
I think the deltas are squared to magnify the errors the algorithm makes.
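To make the arithmetic concrete, here is a plain numpy sketch of the same loss computation, with hypothetical values for the parameters W and b of linear_model = W*x + b:

import numpy as np

# Hypothetical current parameters of the model.
W, b = 0.3, -0.3
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, -1.0, -2.0, -3.0])    # desired outputs

linear_model = W * x + b                  # current model's outputs
deltas = linear_model - y                 # per-example error deltas
squared_deltas = deltas ** 2              # what tf.square computes
loss = squared_deltas.sum()               # what tf.reduce_sum computes: one scalar
print(loss)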

scikit learn linearsvc confidence

I'm using scikit-learn's LinearSVC SVM implementation, and I'm trying to understand the multi-class prediction. Looking at coef_ and intercept_, I can get the hyperplane weights. For example, on my learning problem with two features and four labels I get
f0 = 1.99861379*x1 - 0.09489263*x2 + 0.89433196
f1 = -2.04309715*x1 - 3.51285420*x2 - 3.1206355
f2 = 0.73536996*x1 + 2.52111207*x2 - 3.04176149
f3 = -0.56607817*x1 - 0.16981337*x2 - 0.92804815
When I use the decision_function method I get the values that correspond to the above functions. But the documentation says
The confidence score for a sample is the signed distance of that
sample to the hyperplane.
But decision_function does not return the signed distance; it just returns f().
To be more specific, I'm assuming that LinearSVC uses the standard trick of adding a constant-1 feature to represent a threshold. (This might be wrong.) For my example problem this gives a three-dimensional feature space where instances always have the form (1, x1, x2). Assuming no other threshold term, the algorithm learns a hyperplane w = (w0, w1, w2) that passes through the origin of this three-dimensional space. Now suppose I have a point to predict, call it z = (1, a, b). What is the signed distance (margin) of this point to the hyperplane? It's dot(w, z)/norm(w), where norm(w) is the Euclidean (2-)norm of w. But the LinearSVC code is returning just dot(w, z).
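If that reading is right, the raw scores can be turned into signed distances by dividing each by the norm of its hyperplane's weight vector, e.g. (with made-up data in place of the real problem):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Made-up data: two features, four labels, as in the question.
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_classes=4, n_clusters_per_class=1,
                           random_state=0)

clf = LinearSVC().fit(X, y)

raw = clf.decision_function(X)               # f(z) = dot(w, z) + intercept, per class
norms = np.linalg.norm(clf.coef_, axis=1)    # Euclidean norm of each class's w
signed_distances = raw / norms               # geometric (signed) distances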
