I am currently doing some population analyses with the package "FSA" in R.
By using the mrOpen command, I want to get the survival rate.
My raw data is a simple table with one row per individual, one column per sample date, and values of 0 and 1 (for not captured or captured during that respective sampling).
id   total.captures   date1   date2   date3   etc
1    3                1       1       1       ...
2    1                1       0       0       ...
The first two columns contain the individual id and the aggregated number of captures, which is why I excluded them from the analysis.
This is the exact code:
library(FSA)  # capHistSum() and mrOpen() come from the FSA package

hold.data <- capHistSum(data, cols2use = 3:13)  # capture histories are in columns 3-13
est.data  <- mrOpen(hold.data)                  # open-population mark-recapture estimates
summary(est.data)
confint(est.data)
It seems to work out, as I get the tables and summaries with all the parameters. See here as an example:
[screenshot of the summary and confint results]
However, there is a problem with the survival estimate phi.
The phi value is not always between 0 and 1; in some cases it exceeds 1.
Any idea what went wrong here?
Thanks,
Pia
I have a range of cells in Excel. How do I increment numbers when meeting data validation from three different columns?
I tried using the formula COUNTIF($A$2:A2,A2), which creates a number sequence, but I have other data to validate from another column for it to return the correct number sequence.
First validation: count the emp no in the column range A1:A5, which returns a result under the Hierarchy column.
Second validation: check the % value under column L as per the hierarchy levels below, which is where the problem comes from.
1 - 0.25
2 - 0.25
3 - 0.5
4 - 0.5
5 - 1
Third validation: check the type of Relation (see the Relation column), which also needs to be checked when returning the sequence number. Below is the Relation Level table.
I don't know how to combine these three conditions so the result comes out as below.
My real problem here is how to get the sequence number: if a person has 3 children, they should be tagged 2, 3, 4 (next to the spouse, which is 1), and the next relation, parent, should then be tagged with the next number in the sequence after the last child, which would be 5. As per the relation table the Parent level is 3, but it gets adjusted by the count of relations a person has. For this specific instance, even if the Parent ends up as number 5, it should still get 0.5 EE % (see the relation table level vs. the % hierarchy level) even though its sequence number is 5. I hope this makes sense, but let me know if you have any questions.
I hope someone can help me with this, as I am not an expert when it comes to Excel formulas. Thank you!
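To make the intended logic a bit more concrete, here is a rough Python sketch of the sequencing and EE % rules described above. The relation-to-level mapping and the level-to-% table are assumptions reconstructed from the description, not the actual workbook:

# Rough sketch: spouse first, then children, then parent(s); the EE % follows
# the relation's own level in the hierarchy table, not its sequence number.
relation_level = {"Spouse": 1, "Child": 2, "Parent": 3}   # assumed relation table
level_percent = {1: 0.25, 2: 0.25, 3: 0.5, 4: 0.5, 5: 1}  # % hierarchy from the question

def number_dependents(relations):
    """Assign sequence numbers in relation order and look up the EE %."""
    ordered = sorted(relations, key=lambda r: relation_level[r])
    result = []
    for seq, rel in enumerate(ordered, start=1):
        # EE % is looked up by the relation's level, not by the running sequence number
        result.append((seq, rel, level_percent[relation_level[rel]]))
    return result

# Example from the question: spouse, 3 children, then a parent tagged 5 but still 0.5 EE %
print(number_dependents(["Spouse", "Child", "Child", "Child", "Parent"]))
# [(1, 'Spouse', 0.25), (2, 'Child', 0.25), (3, 'Child', 0.25), (4, 'Child', 0.25), (5, 'Parent', 0.5)]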
I have a dataframe like this:
import pandas as pd

df = pd.DataFrame({'id': [10, 20, 30, 40],
                   'text': ['some text', 'another text', 'random stuff', 'my cat is a god'],
                   'A': [0, 0, 1, 1],
                   'B': [1, 1, 0, 0],
                   'C': [0, 0, 0, 1],
                   'D': [1, 0, 1, 0]})
Here I have columns from A to D, but my real dataframe has 100 columns with values of 0 and 1. This real dataframe has 100k records.
For example, column A is related to the 3rd and 4th rows of text because it is labeled 1. In the same way, A is not related to the 1st and 2nd rows of text because it is labeled 0.
What I need to do is sample this dataframe in a way that I have the same, or about the same, number of examples for each feature.
In this case, feature C has only one occurrence, so I need to filter all other columns in a way that I have one text with A, one text with B, one text with C, etc.
Ideally I could set, for example, n=100, meaning I want to sample in a way that I have 100 records for each of the features.
This is a multilabel training dataset and it is highly unbalanced; I am looking for the best way to balance it for a machine learning task.
Important: I don't want to exclude the 0 features. I just want to have ABOUT the same number of 1s and 0s per column.
For example, with a final dataset of 1k records, I would like to have all columns from A to the final_column, and all of these columns with the same numbers of 1s and 0s. To accomplish this I will need to randomly discard rows (text and id only).
The approach I was trying was to look at the feature with the lowest 1 and 0 counts and then use this value as a threshold.
Edit 1: One possible way I thought is to use:
df.sum(axis=0, skipna=True)
Then I can use the column with the lowest sum value as a threshold to filter the text column. I don't know how to do this filtering step.
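Something like the sketch below is roughly what I have in mind (assuming the label columns are everything except id and text, and using the smallest column sum as the cap per label), but I'm not sure this is the right way to do the filtering:

import pandas as pd

# number of 1s per label column, and the rarest label's count as the cap
label_cols = df.columns.difference(['id', 'text'])
counts = df[label_cols].sum(axis=0)
threshold = int(counts.min())        # e.g. 1, because C occurs only once

# keep at most `threshold` randomly chosen rows per label, then drop duplicate ids
sampled = pd.concat(
    [df[df[col] == 1].sample(n=threshold, random_state=0) for col in label_cols]
).drop_duplicates(subset='id')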
Thanks
The exact output you expect is unclear, but assuming you want to get 1 random row per letter with a 1, you could reshape (while dropping the 0s) and use GroupBy.sample:
(df
 .set_index(['id', 'text'])          # keep id/text out of the reshaping
 .replace(0, float('nan'))           # turn 0s into NaN so stack() drops them
 .stack()                            # long format: one row per (id, text, letter)
 .groupby(level=-1).sample(n=1)      # one random row per letter
 .reset_index()
)
NB. you can rename the columns if needed
output:
id text level_2 0
0 30 random stuff A 1.0
1 20 another text B 1.0
2 40 my cat is a god C 1.0
3 30 random stuff D 1.0
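If you need n rows per letter rather than one, the same pattern should work with a larger n, using replace=True so that letters with fewer than n rows can still be sampled (n=100 here is just the value mentioned in the question):

n = 100  # desired rows per label
(df
 .set_index(['id', 'text'])
 .replace(0, float('nan'))
 .stack()
 .groupby(level=-1).sample(n=n, replace=True)
 .reset_index()
)

Note that sampling with replacement duplicates the rarer texts, which may or may not be acceptable for your training setup.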
I am attempting to get the last 3 months' average of out%. So far I could not find any sample that suits my requirement. I appreciate any help that can be provided.
There are multiple columns that I need to consider. I have to add the current month's good% and the out% to the calculation to get the forecast.
This code gives me an error, and the values reflected in the column are incorrect.
Sum([out]) / Sum([in]) over (LastPeriods(3,[month]))
Dt TotalIN TotalOut Good OUT% Good%
2/1/2019 79606 51384 0 64.55% 0
3/1/2019 84194 61211 0 72.70% 0
4/1/2019 92458 67807 0 73.34% 0
5/1/2019 94531 66988 95 70.86% 0.10%
6/1/2019 29623 18181 2903 60.94% 9.73%
Thanks for adding some data I could copy in. Below is the early-morning hack I made as a column of the summary table. I also binned the row axes to make this work.
Column Calculation:
(Sum([Good%]) * ((
Sum([TotalOut]) OVER (Previous([Axis.Rows],3)) + Sum([TotalOut]) OVER (Previous([Axis.Rows],2)) + Sum([TotalOut]) OVER (Previous([Axis.Rows],1)))
/ (
Sum([TotalIn]) OVER (Previous([Axis.Rows],3)) + Sum([TotalIn]) OVER (Previous([Axis.Rows],2)) + Sum([TotalIn]) OVER (Previous([Axis.Rows],1)))))
+ Sum([Out%])
Row Axes:
<BinByDateTime([Dt],"Year.Month",1)>
There might be a cleaner version using the expression below, but I'm not good with THEN. The problem with LastPeriods is that you want NULL if there aren't enough months.
THEN If(Count() OVER (LastPeriods(3,[Axis.X]))=3,[Value],NULL)
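If it helps to sanity-check the numbers outside Spotfire, here is a small pandas sketch of the trailing 3-month ratio the question is after (the LastPeriods(3) idea: the current month plus the two before it; column names are assumed from the sample data above):

import pandas as pd

# sample data copied from the question
df = pd.DataFrame({
    'Dt': pd.to_datetime(['2019-02-01', '2019-03-01', '2019-04-01', '2019-05-01', '2019-06-01']),
    'TotalIn': [79606, 84194, 92458, 94531, 29623],
    'TotalOut': [51384, 61211, 67807, 66988, 18181],
}).sort_values('Dt')

# trailing 3-month OUT%: sum of the last 3 TotalOut over the last 3 TotalIn;
# min_periods=3 leaves NaN when fewer than 3 months exist, mirroring the NULL concern above
df['out_pct_3m'] = (
    df['TotalOut'].rolling(3, min_periods=3).sum()
    / df['TotalIn'].rolling(3, min_periods=3).sum()
)
print(df)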
I am having trouble determining the correct way to calculate a final rank order for four categories. Each of the four metrics makes up a higher group. A Top 10 from each category is applied to the respective product for risk analysis.
CURRENT LOGIC - Assignment of 25% max per category.
Columns - Y4
Parts
0.25
25
=IF(L9=1,$Y$4,IF(L9=2,$Y$4*0.9, IF(L9=3,$Y$4*0.8, IF(L9=4,$Y$4*0.7, IF(L9=5,$Y$4*0.6, IF(L9=6,$Y$4*0.5, IF(L9=7,$Y$4*0.4, IF(L9=8,$Y$4*0.3, IF(L9=9,$Y$4*0.2, IF(L9=10,$Y$4*0.1,0))))))))))
DESIRED...
I would like to use a statement that checks three criteria in order to apply a score (1=100, 2=90, 3=80, etc.):
SUM the rank positions of each of the four categories, then apply the product rank ascending (not including NULL, since it's not in the Top 10).
IF a product is identified in more than one metric, apply a significant contribution weight of (*0.75).
IF a product has the number 1 rank in any of the four metrics, apply a score of (100).
Data - UPDATED EXAMPLE
(Product) Parts Labor Overhead External Final Score
"XYZ" 3 1 7 7 100
"ABC" NULL 6 NULL 2 100
"LMN" 4 NULL NULL NULL 70
This is way beyond my capability. ANY assistance is appreciated greatly!!!
Jim
I figured this is a good start and I can alter the weight as needed to reflect the reality of the situation.
=AVERAGE(G28:I28)+SUM(G28:I28)*0.25
However, I couldn't figure out how to put a cap on the score of no more than 100 points.
I am still unclear on what exactly you are attempting and whether this will work, but how about this simple matrix using an array formula and some conditional formatting?
Array Formula in F2 (make sure to press Ctrl+Shift+Enter when exiting formula edit mode)
=MIN(100,SUM(IF(B2:E2<>"NULL",CHOOSE(B2:E2,100,90,80,70,60,50,40,30,20,10))))
Conditional Formatting defined as shown below.
Red = a 100 value that comes from a rank of 1.
Yellow = a 100 value that comes from more than one factor, but without a rank of 1.
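For clarity, here is a small Python sketch of what the array formula is doing (ranks 1 through 10 map to 100 down to 10, "NULL" entries are skipped, and the total is capped at 100); the example rows are the ones from the updated data above:

# mirror of MIN(100, SUM(IF(rank<>"NULL", CHOOSE(rank, 100, 90, ..., 10))))
def final_score(ranks):
    points = sum(110 - 10 * r for r in ranks if r != "NULL")  # rank 1 -> 100, rank 10 -> 10
    return min(100, points)                                   # cap the total at 100

# Parts, Labor, Overhead, External
print(final_score([3, 1, 7, 7]))                 # XYZ -> 100 (capped)
print(final_score(["NULL", 6, "NULL", 2]))       # ABC -> 100 (50 + 90 = 140, capped)
print(final_score([4, "NULL", "NULL", "NULL"]))  # LMN -> 70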
I have a table that shows summed monthly values grouped by different analysis codes:
TableId Month Value Analysis1ID Analysis2ID
1 1 100 1 NULL
2 1 50 NULL 3
3 1 50 2 NULL
4 1 50 3 NULL
I have set the above as a fact table (also have a dimension for the analysis values).
As you can see, the table has a new row for each unique ID in the analysis columns.
We are then analysing the data in Excel, simply summing the Value column and grouping by Analysis1ID and Month.
This give us :
AnalysisID1 1 = 100
AnalysisID1 2 = 50
AnalysisID1 3 = 50
Unknown = 50
Total = 250
This all looks OK apart from the Unknown, which is the summed total of the NULL values.
I have tried excluding the NULL Value in the Dimension by setting the UnknownMember to "Hidden".
This does work, but it does not exclude the amount from the total. How can I exclude it from the total value?
I am guessing that the table structure is not correct for this data, but I'm unsure how else to structure it.
Any help or guidance would be appreciated
I would not have NULL values in dimension members; in the past I've always used an Unallocated member with a -1 ID.
You could then use Cube Security to filter out the Unknown or Unallocated members.
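As a rough illustration of that approach (table and column names are assumed from the example above), the NULLs could be replaced with a -1 "Unallocated" key when the fact rows are prepared, for instance in a small pandas step:

import pandas as pd

# fact rows copied from the question; -1 stands in for the 'Unallocated' member
# so every row joins to a real dimension member instead of NULL
fact = pd.DataFrame({
    'TableId': [1, 2, 3, 4],
    'Month': [1, 1, 1, 1],
    'Value': [100, 50, 50, 50],
    'Analysis1ID': [1, None, 2, 3],
    'Analysis2ID': [None, 3, None, None],
})
fact[['Analysis1ID', 'Analysis2ID']] = (
    fact[['Analysis1ID', 'Analysis2ID']].fillna(-1).astype(int)
)
print(fact)

The Unallocated member can then be filtered or secured away without leaving an Unknown bucket in the totals.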
I would filter that row out using Excel. Right-click on the cell labelled 'Unknown' and choose Filter / Hide Selected Items.