I have a question regarding time-dependent variables in survival analysis. Do you usually treat age as a time-varying variable?
I am looking at a population of cancer patients who received a certain treatment and were initially cured. They were then followed up for a certain period of time. The total follow-up time is up to 15 years, so it is relatively long, but some patients of course have a much shorter follow-up time. The event of interest is whether or not they developed a recurrence of their cancer.
So in SAS, here is how I am doing this
proc phreg data=have ;
Title 'Cox for cancer recurrence';
class sex tumor_differentiation;
model Time*Recurrence(0)= age sex tumor_size tumor_differentiation/rl;
run;
And in R
library(survival)
surv_object <- Surv(time = df$Time, event = df$Recurrence)
fit.coxph <- coxph(surv_object ~ Age + Sex + TumorSize + TumorDifferentiation,
                   data = df)
The question here is: would you put age in as a time-dependent covariate, or would you just put age at baseline in your model?
Thank you for your insights; I appreciate your help.
In cancer research, age at diagnosis has clinical importance, so in the usual setting we use age at diagnosis as a covariate, and it is fixed!
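To make the contrast concrete, here is a minimal sketch in R; the data frame name df and the variable names are assumed from the question, and the time-transform term is only one way of representing "current age":

library(survival)

# Usual specification: age at diagnosis (baseline) entered as a fixed covariate
fit_fixed <- coxph(Surv(Time, Recurrence) ~ Age + Sex + TumorSize + TumorDifferentiation,
                   data = df)

# "Current age" via a time-transform term: Age + t on the follow-up time scale.
# Because the "+ t" part is identical for everyone in a risk set at time t, it
# cancels out of the partial likelihood, so (with a linear age effect) the
# estimated coefficient is the same as in the fixed-age model.
fit_tt <- coxph(Surv(Time, Recurrence) ~ tt(Age) + Sex + TumorSize + TumorDifferentiation,
                data = df,
                tt = function(x, t, ...) x + t)

In other words, with time-on-study as the analysis time scale, current age carries essentially no information beyond baseline age, which is one reason the fixed baseline covariate is the standard choice.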
I need to fit a mixed-effects model and account for autocorrelation, as the data are a time series. I have a data set that consists of a certain behaviour value (Activity) measured per day in different individuals over several years. I've grouped the behaviour values into 4 different periods.
I want to test whether there are differences in behaviour values between the 4 periods, including individual and year as random effects. I am also interested in the interaction between periods and Sex. Right now I am using "nlme", as "lme4" doesn't allow me to account for autocorrelation as far as I know.
model31 <- lme(Activity ~ periods * Sex, random = ~1|Year/Individual,
data = mydata)
However, when I try to account for autocorrelation I am a bit lost and I am not sure how to do this. So far this is what I have tried:
model32 <- lme(Activity ~ periods * Sex, random = ~1|Year/Individual,
data = mydata, correlation = corAR1()) #what does this do?
Also, I want to account for temporal autocorrelation (activity at time t is influenced by the previous value at t-1). For this I have dates for each sampled value, but these are not included in the original model, and I don't know how to use them. Hopelessly, I thought of this code, but it doesn't work:
model33 <- lme(Activity ~ periods * Sex, random = ~ 1|Year/Individual,
data = mydata, correlation= corCAR1(form = ~ date|periods))
And I get this error:
incompatible formulas for groups in 'random' and 'correlation'
But of course I am not interested in the autocorrelation of the variables included under random effects.
I am a bit lost here and any orientation on these would be very much appreciated.
Thank you
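For what it's worth, here is a minimal sketch of how the correlation structure is usually made compatible with the random-effects grouping in nlme. It assumes date is a numeric time variable (e.g. day of year) with no duplicated values within an individual-year, which corCAR1() requires:

library(nlme)

# corAR1() with no form argument uses the row order within the innermost
# grouping level and assumes the observations are equally spaced in time.
model32 <- lme(Activity ~ periods * Sex, random = ~ 1 | Year/Individual,
               data = mydata,
               correlation = corAR1(form = ~ 1 | Year/Individual))

# corCAR1() takes an explicit continuous time covariate; its grouping must
# match the random-effects grouping, otherwise lme() stops with
# "incompatible formulas for groups in 'random' and 'correlation'".
model33 <- lme(Activity ~ periods * Sex, random = ~ 1 | Year/Individual,
               data = mydata,
               correlation = corCAR1(form = ~ date | Year/Individual))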
My data set has an age range variable, but I would like to calculate the mean and standard deviation of age.
Since your data are categorical, there isn't a way to calculate the "true" sample mean and standard deviation of respondent age. There are a few different ways you could estimate them, depending on how sophisticated you'd like to get.
The simplest way would be to assign an age to each band (say, the mid-point) and summarize on that. The downside is that you will be underestimating the standard deviation (clumping data together tends to do that). To the extent your categories are not uniformly distributed (and from your image they don't appear to be), your estimate of the mean will also be off.
* set point estimates for each age band .
RECODE age (1=22) (2=30) (3=40) (4=50) (5=60) (6=70) (7=80) .
EXE .
* calculate mean and std dev .
MEANS age /CELLS MEAN STDDEV .
More sophisticated estimation techniques might try to account for skews in data (e.g. your sample seems to skew younger) and convert each age band into its own distribution.
For example, instead of assuming 203 respondents are age 22 (as is done in the code above), you might assume roughly 25 respondents each at ages 18, 19, 20, ..., 25. More realistically still, you might assume that the within-band distribution also skews younger (e.g. 50 18-year-olds, 40 19-year-olds, and so on).
Automated approaches to that would be interesting as its own question. :)
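If it helps, here is a rough sketch of the simplest version of that idea in R; the band edges and counts below are hypothetical placeholders (only the 203 comes from the example above), so substitute your own:

# Recycle each band's respondents across the integer ages in the band,
# then summarize the resulting pseudo-ages.
bands  <- list(c(18, 25), c(26, 35), c(36, 45), c(46, 55), c(56, 65), c(66, 75), c(76, 85))
counts <- c(203, 150, 120, 90, 60, 40, 20)   # hypothetical band counts

ages <- unlist(mapply(function(b, n) rep(seq(b[1], b[2]), length.out = n),
                      bands, counts, SIMPLIFY = FALSE))
mean(ages)
sd(ages)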
A car's fuel consumption may be expressed in many different ways. For example, in Europe, it is shown as the amount of fuel consumed per 100 kilometers.
In the USA, it is shown as the number of miles traveled by a car using one gallon of fuel.
Your task is to write a pair of functions converting l/100km into mpg, and vice versa.
The functions:
are named l100kmtompg and mpgtol100km respectively;
take one argument (the value corresponding to their names).
Complete the code in the editor.
Run your code and check whether your output is the same as ours.
Here is some information to help you:
1 American mile = 1609.344 metres;
1 American gallon = 3.785411784 litres.
def l100kmtompg(liters):
    return (100 / 1.609344) / (liters / 3.785411784)   # miles per 100 km divided by gallons used

def mpgtol100km(miles):
    return 100 * 3.785411784 / (miles * 1.609344)      # litres per gallon divided by km per gallon, scaled to 100 km
I know the question is confusing, because I spent hours doing this. Not gonna lie, this is the real question.
3.9 is the value given, so for the first function:
100 * 0.625 / (3.9 * 0.265)
For the second function, the value given is 60.3:
(3.78 / (60.3 * 1.6)) * 100
I am stuck on a statistics assignment, and would really appreciate some qualified help.
We have been given a data set and are asked to find the 10% of firms with the lowest rate of profit, in order to decide what profit rate is the maximum that can still be considered for a program.
The data has:
Mean = 3.61
St. dev. = 8.38
I am thinking that I need to find the 10th percentile, and if I run the PERCENTILE function in Excel it returns -4.71.
However I tried to run the numbers by hand using the z-score.
where z = -1.28 (the 10th percentile of the standard normal distribution)
z = (x - μ) / σ
Solving for x:
x = μ + zσ
x = 3.61 + (-1.28 * 8.38) ≈ -7.12
My question is: which of the two methods is the right one, if either?
I am thoroughly confused at this point, hope someone has the time to help.
Thank you
This is the assignment btw:
"The Danish government introduces a program for economic growth and will
help the 10 percent of the rms with the lowest rate of prot. What rate
of prot is the maximum in order to be considered for the program given
the mean and standard deviation found above and assuming that the data
is normally distributed?"
The Excel formula is giving the actual, empirical 10th percentile of your sample.
If the data you have includes all possible instances of whatever you’re trying to measure, then go ahead and use that.
If you’re sampling from a population and your sample size is small, use a t distribution or increase your sample size. If your sample size is healthy and your data are normally distributed, use z scores.
Short story: the different outcomes suggest the data you've supplied are not normally distributed.
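As a quick numerical check, here is a minimal R sketch contrasting the two calculations; the vector name profit is a placeholder for your raw profit rates:

# Normal-theory 10th percentile implied by the summary statistics
qnorm(0.10, mean = 3.61, sd = 8.38)   # about -7.13, matching the z-score calculation

# Empirical 10th percentile of the raw data (R's default type 7 quantile
# uses the same linear interpolation as Excel's PERCENTILE / PERCENTILE.INC)
quantile(profit, probs = 0.10)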
My objective is to predict what Mortgage Type a person will take on, based on their age, using Azure Machine Learning.
Note that I have 220,000 rows of data. There are several different Mortgage types but Purchases, Remortgages and Buy to Lets dominate the data.
A typical cross-section of the data might be:
Age 20, Purchase
Age 30, Purchase
Age 30, Remortgage
Age 40, Remortgage
Age 55, Buy to Let
Age 55, Equity Release
My Azure Machine Learning Experiment is shown below.
My Metadata Edits are to change the MortgageType column to a label and the Age to an Integer. I have also played around with making them categorical/non-categorical.
When I view the Evaluation results, I get the following.
Does this mean that I can only really predict Buy to Lets and Purchases with a 60% confidence?
Am I doing this correctly and is there any other way of achieving my objective?
The plot shown by AzureML is called a confusion matrix. In your case, it should be interpreted as:
For each mortgage which was actually a Bridging mortgage, there is a 64.7% chance the model predicted a buy to let mortgage, a 17.6% chance a purchase mortgage, and a 17.6% chance a remortgage.
Your model only ever predicts a given mortgage to be a Buy to Let, Purchase or Remortgage. This is probably because you are only using age as a feature, which does not give the model much information. Consider adding additional features to your model in order to increase its predictive power.
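To make the reading of the matrix concrete, here is a tiny R sketch of how a row-normalised confusion matrix of this kind is built; the labels below are made up purely for illustration and are not taken from your experiment:

# Made-up actual and predicted classes, purely illustrative
actual    <- c("Bridging", "Bridging", "Purchase", "Remortgage", "BuyToLet", "Purchase")
predicted <- c("BuyToLet", "Purchase", "Purchase", "Remortgage", "BuyToLet", "Purchase")

cm <- table(actual, predicted)
prop.table(cm, margin = 1)   # each row: proportion of predicted classes given the actual class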