Survival analysis for multiple distinct outcomes - statistics

I have been playing around with a Kaplan-Meier survival analysis. I have 3 mutually exclusive conditions. Let's say:
condition 1 is 'I am not able to work (and not working)'
condition 2 is 'I am able to work and I am working'
condition 3 is 'I am able to work but I am not working'
I am trying to get an overall likelihood of being in C1, C2 or C3.
I have run 3 separate survival analyses (one for each condition) and added the cumulative proportions at the same time point, but the total is slightly greater than 1 (between 1.02 and 1.06, to be exact). I was wondering how to explain this overestimation. Is it something in my logic, the way the censored data are handled, or something else?
Thanks,
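One possible explanation, offered as a sketch rather than a diagnosis of the actual data: if each of the three analyses treats the other two outcomes as censored, then 1 minus each Kaplan-Meier estimate overstates the probability of that outcome, and the values can add up to more than 1. The cumulative incidence (Aalen-Johansen) estimator weights each event by the overall probability of still being event-free, so its values plus the overall survival sum to exactly 1. The toy data and the function name below are made up for illustration.

```python
def naive_km_and_cumulative_incidence(times, causes, n_causes):
    """Compare 1 - KM per cause with the cumulative incidence per cause.

    times  : follow-up time of each subject
    causes : 0 = censored, k = event of type k (1..n_causes)
    """
    overall_surv = 1.0        # P(no event of any type yet), just before time t
    km = [1.0] * n_causes     # per-cause KM with the other causes treated as censored
    cif = [0.0] * n_causes    # cumulative incidence per cause
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        events_any = 0
        for k in range(n_causes):
            d_k = sum(1 for x, c in zip(times, causes) if x == t and c == k + 1)
            cif[k] += overall_surv * d_k / at_risk
            km[k] *= 1.0 - d_k / at_risk
            events_any += d_k
        overall_surv *= 1.0 - events_any / at_risk
    naive = [1.0 - s for s in km]
    return naive, cif, overall_surv


# Hypothetical data: 6 subjects, two event types, one censored subject.
times = [1, 2, 3, 4, 5, 6]
causes = [1, 2, 1, 2, 0, 1]
naive, cif, surv = naive_km_and_cumulative_incidence(times, causes, n_causes=2)
print("sum of 1 - KM per cause:", sum(naive))
print("sum of cumulative incidences + overall survival:", sum(cif) + surv)
```

On this toy data the first sum is above 1 while the second is 1, which is the same kind of gap described in the question, only exaggerated.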

Related

How to deal with non triplicated data in triplicated dataset

My problem is the following. I have a dataset with 10 variables and 8 samples. Each sample has been analysed in triplicate, so I have a dataset of 24 rows. However, some analyses (variables) were not performed in triplicate. Where an analysis was only done once, I have to introduce NA in order to fill the blanks. Where an analysis was performed more than three times, I have to introduce new rows, which adds NA to the analyses that were in fact done three times.
My ultimate goal is to apply ANOVA to this dataset. I have thought about repeating the value where I only have 1 analysis, and randomly eliminating values where I have more than 3 analyses, but I have the feeling this is not the most orthodox way to proceed.
I hope it is clear enough.
Thanks in advance!
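For what it is worth, a one-way ANOVA does not require the same number of replicates in every group, so the NA entries can simply be dropped per variable instead of being imputed or trimmed. The sketch below computes the F statistic for a single variable by hand, assuming the groups are the samples and `None` stands for NA; the function name and the numbers are hypothetical. A p-value would additionally need an F distribution, for example from a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for groups of possibly unequal size.

    Missing replicates are given as None and are dropped, not imputed.
    """
    cleaned = [[x for x in g if x is not None] for g in groups]
    cleaned = [g for g in cleaned if g]            # ignore groups with no data
    all_values = [x for g in cleaned for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in cleaned)
    ss_within = sum((x - mean(g)) ** 2 for g in cleaned for x in g)
    df_between = len(cleaned) - 1
    df_within = len(all_values) - len(cleaned)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical variable measured 3 times, once, and 4 times in three samples.
groups = [[4.1, 3.9, 4.0], [5.2, None, None], [6.0, 6.2, 5.8, 6.1]]
print(one_way_anova_f(groups))
```

A sample with a single replicate still contributes to the between-group sum of squares; it just adds nothing to the within-group estimate.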

Excel logical numeric evaluation

So I have a sheet where I am assigning levels to individuals based on their training, i.e. level 4 is an SME, level 3 is trained and can train others, level 2 is trained, and level 1 is untrained. For each shift, I want at least 2 level 2 individuals; if so, readiness is 100%. Anything above that is over 100% (which is fine), but anything less should come out below 100%. I am trying to do this with formulas but it is not working the way I want.
Table Layout
Formulas
The above example would show more than 100% because there are more than two people at level 2. I wish there was a way to loop in Excel so that I could increment a number for every count of 2.
How about this:
=IF(COUNTIF(B2:E2,">=2")>=2,">=100%","not ready")
You can do a certain amount of looping in the UI with the LAMBDA function, if you have Office 365, although I'm not clear as to your requirement.
=COUNTIF(B2:E2,">=2") * 0.5 --> format as percent
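To make the arithmetic of that last formula explicit, here is the same calculation as a small Python sketch. It assumes, as the formulas above do, that anyone at level 2 or higher counts and that two such people mean 100%; the function and parameter names are made up for illustration.

```python
def readiness(levels, min_level=2, required=2):
    """Fraction of the required headcount that is at min_level or above."""
    qualified = sum(1 for level in levels if level >= min_level)
    return qualified / required


# Hypothetical shift: levels of four individuals (cells B2:E2 in the sheet).
shift = [4, 2, 1, 3]
print(f"{readiness(shift):.0%}")
```

Multiplying the count by 0.5 in the formula is the same as dividing by the required headcount of 2, so one qualified person gives 50%, two give 100%, and three give 150%.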

Small data anomaly detection algo

I have the following 3 cases of a numeric metric on a time series (t, t1, t2, etc. denote different hourly comparisons across periods).
If you look at the 3 graphs, t (the period of interest) clearly has a drop-off in image 1, but not so much in image 2 and image 3. Assume this is some sort of numeric metric (raw or derived). I want to create a system/algorithm which specifically catches case 1 but not case 2 or 3, with t being the point of interest. While visually this makes sense and is very intuitive, I am trying to design a way to do this in Python using the dataframes shown in the picture.
Generally, the problem is how to detect when the time series is behaving very differently from any of the prior weeks.
Edit: When I say different, what I really mean is this: my metric trends together across periods t1 to t4, and if a period does not and instead separates out of the envelope, that to me is an anomaly. In chart 1 you can see t splitting out from the rest of the tn; this is an anomaly for me. In the other cases t is within the bounds of the other time periods. Hope this helps.
With small data, the best approach is to come up with a good transformation into a simpler representation.
In this case I would try the following:
Distance to the median along the time axis, then a summary of that (median, mean squared error, etc.)
Median of the cross-correlation of the signals
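The first suggestion could be sketched roughly as follows. This assumes each period is a list of hourly values of equal length, uses the mean squared distance to the pointwise median of the prior periods as the summary, and flags t when its score is well above the worst prior period. The threshold factor of 3 and all names and numbers are assumptions for illustration, not part of the suggestion itself.

```python
from statistics import median


def mse_to_reference(series, reference):
    """Mean squared distance between a series and a reference curve."""
    return sum((x - r) ** 2 for x, r in zip(series, reference)) / len(series)


def is_anomalous(current, priors, factor=3.0):
    """Flag `current` if it strays from the priors' median far more than any prior does."""
    pointwise_median = [median(values) for values in zip(*priors)]
    prior_scores = [mse_to_reference(p, pointwise_median) for p in priors]
    score = mse_to_reference(current, pointwise_median)
    return score > factor * max(prior_scores), score


# Hypothetical hourly values for prior periods t1..t4 and two candidates for t.
priors = [
    [10, 12, 14, 13, 11, 10],
    [11, 13, 15, 14, 12, 11],
    [9, 11, 13, 12, 10, 9],
    [10, 12, 15, 13, 11, 10],
]
t_with_drop = [10, 12, 14, 8, 5, 4]     # splits away from the envelope
t_in_bounds = [10, 12, 14, 13, 12, 10]  # stays within the envelope

print(is_anomalous(t_with_drop, priors))
print(is_anomalous(t_in_bounds, priors))
```

Scoring the prior periods with the same summary gives a data-driven scale for what "within the envelope" means, which matters when there are only a handful of periods to compare against.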

Excel Solver is messing up my optimization

I have set up an optimization problem, but I must be doing something wrong and I could use your help. I have three firms: Alpha, Bravo, and Charlie. They each complete three tasks: milling, inspecting, and drilling, and they each require a different number of minutes to complete each task. Alpha requires 12 minutes to mill, 5 minutes to inspect and 10 minutes to drill. Bravo requires 10 minutes to mill, 4 to inspect, and 8 to drill. Charlie requires 8 to mill, 4 to inspect, and 16 to drill. After a firm completes all of these tasks it earns a certain amount of profit: Alpha earns $2.40, Bravo earns $2.50, and Charlie earns $3.00. All three firms have a maximum allotted time of 1200 minutes to mill, 900 to inspect, and 1440 to drill. The goal is to maximize the profit of these three firms. I have set it up so that the sums of the task times are subtracted from the available time left when changed by Solver. I have also set constraints within Solver to cap each task at the allotted time allowed per task. I must be missing a vital step, however, because it keeps trying to just max out the allotted time for an individual firm, without taking into account the opportunity cost of the other firms or something. Please help! (shown in photos)
Data
Solver
After executing Solver
I have changed the logic a bit in order to take the minimum unit into consideration:
The UNITS portion holds the variable cells. Since the final number of units produced will be the minimum of these cells, the E9 formula is =MIN(B9:D9), copied down.
The TIME portion is the product of unit times and units, so the formula in B14 is =B9*B2, copied down and right.
I9:I11 are the earnings, calculated by multiplying the unit earning by the minimum units.
I12 is the total earning and is our objective cell.
Please also be careful about the constraints: when you do not set an integer constraint, finding a solution becomes more difficult, and of course our units should be integers in any case.
Also fill cells B9:D11 with some starting values such as 100; otherwise the iteration does not start correctly and Solver ends up with a very small objective cell.
I have just had a go at this and I get a different answer, as I have assumed that to achieve the profit the company must complete a milling process, then inspect, then drill, and only once all three are complete does that count as 1 unit for the profit. I hope that is valid.
But if not, then this layout may help you anyway. Note I have set this as a Linear model for the solver and also note the use of integer and non-negative.
It was fun anyway!
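As a cross-check outside Excel, the same interpretation (one unit means a firm has milled, inspected and drilled once, the time limits are shared across the three firms, and units are non-negative integers) is small enough to solve by plain enumeration. This is only a sketch under those assumptions and not a reproduction of either Solver layout; the names are made up.

```python
# Minutes per unit for (Alpha, Bravo, Charlie) on each task, from the question.
TIMES = {"mill": (12, 10, 8), "inspect": (5, 4, 4), "drill": (10, 8, 16)}
CAPACITY = {"mill": 1200, "inspect": 900, "drill": 1440}
PROFIT_CENTS = (240, 250, 300)  # profit per unit, in cents to keep the arithmetic exact


def best_plan():
    """Enumerate Alpha and Bravo units; give Charlie whatever time is left."""
    best_profit, best_units = -1, None
    max_a = min(CAPACITY[t] // TIMES[t][0] for t in TIMES)
    for a in range(max_a + 1):
        max_b = min((CAPACITY[t] - TIMES[t][0] * a) // TIMES[t][1] for t in TIMES)
        for b in range(max_b + 1):
            # Charlie's profit is positive, so the largest feasible c is best.
            c = min(
                (CAPACITY[t] - TIMES[t][0] * a - TIMES[t][1] * b) // TIMES[t][2]
                for t in TIMES
            )
            profit = PROFIT_CENTS[0] * a + PROFIT_CENTS[1] * b + PROFIT_CENTS[2] * c
            if profit > best_profit:
                best_profit, best_units = profit, (a, b, c)
    return best_profit / 100, best_units


profit, units = best_plan()
print("units (Alpha, Bravo, Charlie):", units, "profit:", profit)
```

Because the objective and the constraints are all linear in the unit counts, the Simplex LP method with integer and non-negative constraints is the matching Solver setup for this formulation.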

Compute statistical significance with Excel

I have 2 columns and multiple rows of data in Excel. Each column represents an algorithm and the values in the rows are the results of these algorithms with different parameters. I want to run a statistical significance test on these two algorithms with Excel. Can anyone suggest a function?
As a result, it will be nice to state something like "Algorithm A performs 8% better than Algorithm B with .9 probability (or 95% confidence interval)"
The Wikipedia article explains accurately what I need:
http://en.wikipedia.org/wiki/Statistical_significance
It seems like a very easy task, but I failed to find a suitable function for it.
Any advice on a built-in Excel function or a function snippet is appreciated.
Thanks..
Edit:
After tharkun's comments, I realized I should clarify some points:
The results are simply real numbers between 1 and 100 (they are percentage values). Each row represents a different parameter, and the values in a row are each algorithm's result for that parameter. The results do not depend on each other.
When I take the average of all values for Algorithm A and Algorithm B, I see that the mean of all results that Algorithm A produced is 10% higher than Algorithm B's. But I don't know if this is statistically significant or not. In other words, maybe for one parameter Algorithm A scored 100 percent higher than Algorithm B and for the rest Algorithm B has higher scores, but just because of this one result the difference in the averages is 10%.
And I want to do this calculation using just Excel.
Thanks for the clarification. In that case you want to do an independent-samples t-test, meaning you want to compare the means of two independent data sets.
Excel has a function TTEST (named T.TEST in Excel 2010 and later); that's what you need.
For your example you should probably use two tails and type 2.
The formula will output a probability value (the p-value): the probability of seeing a difference in means at least this large if the two datasets actually had the same mean. Concluding that the datasets are different when they aren't is known as the alpha error. The lower the p-value, the stronger the evidence that your sets are different.
You should only accept the difference between the two datasets if the value is lower than your chosen significance level: 0.05 is the common convention, 0.01 (1%) is stricter, and for critical outcomes use 0.001 or lower. You should also know that the t-test is usually considered reliable with around 30 or more values per dataset (with fewer, it relies more heavily on the data being roughly normally distributed), and that the type 2 test assumes equal variances of the two datasets. If equal variances are not given, you should use the type 3 test.
http://depts.alverno.edu/nsmt/stats.htm
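To show what the two test types compute, here is a minimal Python sketch of the t statistic and degrees of freedom for the equal-variance test (type 2) and the unequal-variance, or Welch, test (type 3). It stops short of the p-value, which needs the t distribution and is what Excel's function returns directly; the function names and the sample numbers are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Equal-variance two-sample t statistic and degrees of freedom (type 2)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Unequal-variance (Welch) t statistic and approximate degrees of freedom (type 3)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical percentage results for the two algorithms.
algo_a = [72.0, 65.5, 80.1, 77.3, 69.8, 74.2]
algo_b = [61.0, 66.4, 58.9, 70.2, 63.5, 60.7]
print("type 2:", pooled_t(algo_a, algo_b))
print("type 3:", welch_t(algo_a, algo_b))
```

When the two columns have similar variances and equal sizes the two statistics are close; they diverge as the variances or the sample sizes become more unequal, which is when type 3 is the safer choice.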
