How to resolve the "object 'NOTE' not found" error when using the rmst2 function from the survRM2 package? - survival-analysis

I aim to compare the restricted mean survival time between the two treatment groups in the Anderson dataset.
Here is the structure of my data frame:
'data.frame': 42 obs. of 5 variables:
$ survt : num 19 17 13 11 10 10 9 7 6 6 ...
$ status: num 0 0 1 0 0 1 0 1 0 1 ...
$ sex : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 1 1 1 1 ...
$ logwbc: 'labelled' num 2.05 2.16 2.88 2.6 2.7 2.96 2.8 4.43 3.2 2.31 ...
..- attr(*, "label")= Named chr "log WBC"
.. ..- attr(*, "names")= chr "logwbc"
$ rx : Factor w/ 2 levels "New treatment",..: 1 1 1 1 1 1 1 1 1 1 ...
..- attr(*, "label")= Named chr "Treatment"
.. ..- attr(*, "names")= chr "rx"
- attr(*, "codepage")= int 65001
I used the following code to compare the restricted mean survival time between the two treatment groups ("New treatment" vs. "Standard treatment"):
time <- anderson$survt
status <- anderson$status
arm <- anderson$rx
rmst2(time, status, arm )
I get the following error:
Error in rmst2(time, status, arm) : object 'NOTE' not found
In addition: Warning messages:
1: In max(tt) : no non-missing arguments to max; returning -Inf
2: In min(ss[tt == tt0max]) :
no non-missing arguments to min; returning Inf
3: In max(tt) : no non-missing arguments to max; returning -Inf
4: In min(ss[tt == tt1max]) :
no non-missing arguments to min; returning Inf
Thanks

I converted the sex and rx variables from factor to numeric and the function worked.

Related

Forest plot for coxme models?

I have a mixed-effects coxme model and want to draw a forest plot for it (similar to ggforest for coxph). I'm fairly new to this, so I'm not sure how to do it.
My df:
str(cats_52weeks)
'data.frame': 487 obs. of 50 variables:
$ Cat_ID : chr "Mor02" "Mor03" "Mor04" "Mor05" ...
$ Sex : chr "female" "male" "male" "male" ...
$ Weight_Initialcapture.kg. : num 2.45 5.1 5 4.9 5.95 4.4 4.8 5.5 5.6 5 ...
$ Study_region : chr "Central Kimberley" "Central Kimberley" "Central Kimberley" "Central Kimberley" ...
$ cat_density : num 0.17 0.17 0.17 0.17 0.17 0.17 0.17 0.17 0.17 0.17 ...
$ Study_length : Factor w/ 3 levels "short","baiting",..: 3 3 3 3 3 3 3 3 3
$ Rabbits_present : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
$ Fox_present : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
$ Dingo_present : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
$ Habitat_type : Factor w/ 4 levels "Savannah","Desert",..: 1 1 1 1 1 1 1
$ Time2 : num 36 7 27 52 52 28 52 36 40 52 ...
$ Time1 : int 1 1 1 1 1 1 1 1 1 1 ...
$ Status : num 1 0 0 0 0 1 0 0 0 0 ...
$ age_category : Factor w/ 4 levels "tom","female",..: 4 1 1 1 1 3 1 1 1 1
And the model for which I want to produce a forest plot is:
m52_all <- coxme(Surv(cats_52weeks$Time2,cats_52weeks$Status) ~ Habitat_type+
Fox_present + Dingo_present + Rabbits_present + Weight_Initialcapture.kg. +
cat_density + (1|Study_region) + (1|Study_length),data=cats_52weeks)
Any help would be appreciated, thanks!!

Calibrate with cph function (with external validation)

I have two questions about using calibrate with the cph function.
My data have 5 independent variables (from BMI to RT) and 2 dependent variables (Time, Event).
> head(data)
BMI Taxanes Surgery LND RT Event Time
1 19 0 0 2 5 0 98
2 20 0 0 3 3 0 97
3 21 0 0 8 2 0 17
4 18 0 0 1 3 0 35
5 20 1 0 3 1 0 27
6 20 1 0 2 3 1 2
> str(data)
$ BMI : num 19 20 21 18 20 20 20 ...
$ Taxanes: int 0 0 0 0 1 1 1 0 0 0 ...
$ Surgery: num 0 0 0 0 0 0 1 0 0 0 ...
$ LND : int 2 3 8 1 3 2 2 2 5 2 ...
$ RT : Factor w/ 7 levels "0","1","2","3",..: 5 3 2 3 1 3 ...
$ Event : int 0 0 0 0 0 1 0 0 0 0 ...
$ Time : num 98 97 17 35 27 2 22 ...
(1) With this data, I fitted a survival model with cph, and I want to make a calibration plot from it. But I get the error 'Error in x(x) : argument "y" is missing, with no default'. I have searched through a lot of material, but I don't know the reason for this error. I also looked up the calibrate function online, but I can't find an argument named 'y'. Please help me with this question.
> ddist <- datadist(data)
> options(datadist='ddist')
>
> fit = cph(Surv(Time,Event) ~ BMI + Surgery + Taxanes + RT + LND, data=data, x=TRUE, y=TRUE, surv=TRUE, dxy=TRUE, time.inc=36)
> plot(calibrate(fit))
Using Cox survival estimates at 36 Days
Error in x(x) : argument "y" is missing, with no default
(2) Eventually I want to do external validation of this cph model (fit).
If the new data set is called dat2 (and has the same variables as data), how do I get the observed and predicted survival? I know that the predicted values can be calculated with this code:
val<-val.surv(fit, newdata=dat2, S=Surv(dat2$Time,dat2$Event))
But how do I get the actual (observed) survival in the new data (dat2)? Please help with this problem. Thank you so much in advance!

Use # (Copy) as selection or filter on 2-d array

The J primitive Copy (#) can be used as a filter function, such as
k =: i.8
(k>3) # k
4 5 6 7
That's essentially
0 0 0 0 1 1 1 1 # i.8
The question is: if the right-hand side of # is a 2-d or higher-rank array, how can a selection be made using #, if that is possible at all? For example:
k =: 2 4 $ i.8
(k > 3) # k
I get a length error.
What is the right way to make such a selection?
You can use the appropriate verb rank to get something like a 2d-selection:
(2 | k) #"1 1 k
1 3
5 7
but the requested axes have to be filled with 0s (or !.) to keep the correct shape:
(k > 3) #("1 1) k
0 0 0 0
4 5 6 7
(k > 2) #("1 1) k
3 0 0 0
4 5 6 7
You have to define more precisely what selection means for rank > 1, because now the array has structure. How do you discard values? Do you keep empty "cells"? Do you replace them with 0s? Is the structure important for the result?
If, for example, you only need the "values where", then just ravel (,) the array:
(,k > 2) # ,k
3 4 5 6 7
If you need to "replace where", then you can use amend }:
u =: 4 :'I. , 5 > y' NB. indices where 5 > y
0 u } k
0 0 0 0
0 5 6 7
z =: 3 2 4 $ i.25
u =: 4 :'I. , (5 > y) +. (0 = 3|y)' NB. indices where 5>y or 3 divides y
_999 u } z
_999 _999 _999 _999
_999 5 _999 7
8 _999 10 11
_999 13 14 _999
16 17 _999 19
20 _999 22 23

In Python Pandas using cumsum with groupby and reset of cumsum when value is 0

I'm rather new to Python.
I am trying to compute a cumulative sum for each client to see the consecutive months of inactivity (flag: 1 or 0). The cumulative sum of the 1's therefore needs to be reset whenever we have a 0. The reset also needs to happen when we reach a new client. See the example below, where a is the column of clients and b holds the dates.
After some research, I found the questions 'Cumsum reset at NaN' and 'In Python Pandas using cumsum with groupby'. I assume that I need to combine them.
Adapting the code of 'Cumsum reset at NaN' so that it resets at 0 works:
cumsum = v.cumsum().ffill()
reset = -cumsum[v.isnull() !=0].diff().fillna(cumsum)
result = v.where(v.notnull(), reset).cumsum()
However, I haven't succeeded in adding a groupby. My count just keeps going...
So, a dataset would be like this:
import pandas as pd
df = pd.DataFrame({'a' : [1,1,1,1,1,1,1,2,2,2,2,2,2,2],
                   'b' : [1/15,2/15,3/15,4/15,5/15,6/15,1/15,2/15,3/15,4/15,5/15,6/15,7/15,8/15],
                   'c' : [1,0,1,0,1,1,0,1,1,0,1,1,1,1]})
This should result in a dataframe with the columns a, b, c and d with
'd' : [1,0,1,0,1,2,0,1,2,0,1,2,3,4]
Please note that I have a very large dataset, so calculation time is really important.
Thank you for helping me
Use groupby.apply and, within each group, label the runs of contiguous equal values with a cumsum over the change points. Then use groupby.cumcount to count the position within each run, adding 1 because cumcount starts at 0.
Multiplying by the original values acts as a logical AND: it cancels the zeros and keeps the counts only where the flag is 1.
# group_keys=False keeps the original row index, so the result aligns with df
df['d'] = df.groupby('a', group_keys=False)['c'] \
            .apply(lambda x: x * (x.groupby((x != x.shift()).cumsum()).cumcount() + 1))
print(df['d'])
0 1
1 0
2 1
3 0
4 1
5 2
6 0
7 1
8 2
9 0
10 1
11 2
12 3
13 4
Name: d, dtype: int64
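To see why the multiplication trick works, here is a minimal sketch of the intermediate steps, using the flags of the first client from the question. The names run_id and pos_in_run are illustrative and do not appear in the answer above.

```python
import pandas as pd

# Flags of client 1 from the question
x = pd.Series([1, 0, 1, 0, 1, 1, 0])

# A new run starts wherever the value differs from the previous row;
# the cumulative sum of those change points gives each run an id.
run_id = (x != x.shift()).cumsum()

# Position of each row inside its run, counted from 1
pos_in_run = x.groupby(run_id).cumcount() + 1

# Multiplying by the flag zeroes out the rows where the flag is 0
d = x * pos_in_run

print(pd.DataFrame({"c": x, "run_id": run_id, "pos_in_run": pos_in_run, "d": d}))
```

Rows 4 and 5 share a run id, so the second of them gets position 2, which is the count the question asks for.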
Another way would be to apply a function after Series.expanding on the groupby object, which evaluates the function on the series from the first index up to the current index.
Then use reduce, which applies a function of two arguments cumulatively to the items of an iterable so as to reduce them to a single value.
from functools import reduce
df.groupby('a')['c'].expanding() \
.apply(lambda i: reduce(lambda x, y: x+1 if y==1 else 0, i, 0))
a
1 0 1.0
1 0.0
2 1.0
3 0.0
4 1.0
5 2.0
6 0.0
2 7 1.0
8 2.0
9 0.0
10 1.0
11 2.0
12 3.0
13 4.0
Name: c, dtype: float64
Timings:
%%timeit
df.groupby('a')['c'].apply(lambda x: x * (x.groupby((x != x.shift()).cumsum()).cumcount() + 1))
100 loops, best of 3: 3.35 ms per loop
%%timeit
df.groupby('a')['c'].expanding().apply(lambda s: reduce(lambda x, y: x+1 if y==1 else 0, s, 0))
1000 loops, best of 3: 1.63 ms per loop
I think you need a custom function with groupby:
#change row with index 6 to 1 for better testing
df = pd.DataFrame({'a' : [1,1,1,1,1,1,1,2,2,2,2,2,2,2],
'b' : [1/15,2/15,3/15,4/15,5/15,6/15,1/15,2/15,3/15,4/15,5/15,6/15,7/15,8/15],
'c' : [1,0,1,0,1,1,1,1,1,0,1,1,1,1],
'd' : [1,0,1,0,1,2,3,1,2,0,1,2,3,4]})
print (df)
a b c d
0 1 0.066667 1 1
1 1 0.133333 0 0
2 1 0.200000 1 1
3 1 0.266667 0 0
4 1 0.333333 1 1
5 1 0.400000 1 2
6 1 0.066667 1 3
7 2 0.133333 1 1
8 2 0.200000 1 2
9 2 0.266667 0 0
10 2 0.333333 1 1
11 2 0.400000 1 2
12 2 0.466667 1 3
13 2 0.533333 1 4
def f(x):
    # .ix has been removed from pandas; .loc does the same label-based assignment
    x.loc[x.c == 1, 'e'] = 1
    a = x.e.notnull()
    x.e = a.cumsum()-a.cumsum().where(~a).ffill().fillna(0).astype(int)
    return (x)
print (df.groupby('a', group_keys=False).apply(f))
a b c d e
0 1 0.066667 1 1 1
1 1 0.133333 0 0 0
2 1 0.200000 1 1 1
3 1 0.266667 0 0 0
4 1 0.333333 1 1 1
5 1 0.400000 1 2 2
6 1 0.066667 1 3 3
7 2 0.133333 1 1 1
8 2 0.200000 1 2 2
9 2 0.266667 0 0 0
10 2 0.333333 1 1 1
11 2 0.400000 1 2 2
12 2 0.466667 1 3 3
13 2 0.533333 1 4 4
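Since the question stresses calculation time, the following is a sketch of a variant that avoids apply and Python-level lambdas altogether. It assumes the flag column contains only 0 and 1; the function name consecutive_ones and the helper name block are hypothetical, and no timings were measured for it here.

```python
import pandas as pd

def consecutive_ones(df: pd.DataFrame) -> pd.Series:
    """Running count of 1s in column 'c', reset at every 0 and at every new client 'a'."""
    # Each 0 opens a new block, so the cumulative count of zeros works as a block id.
    block = df["c"].eq(0).cumsum().rename("block")
    # Grouping by client and block restarts the cumulative sum in both cases.
    return df.groupby(["a", block])["c"].cumsum()

df = pd.DataFrame({
    "a": [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
    "c": [1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1],
})
df["d"] = consecutive_ones(df)
print(df)
```

Because the leading 0 of each block contributes nothing to the sum, the cumulative sum inside a block is exactly the number of consecutive 1s seen so far.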

Combining pairs in a string (Matlab)

I have a string:
sup_pairs = 'BA CE DF EF AE FC GD DA CG EA AB BG'
How can I chain pairs into strings so that the last character of one pair is the first character of the next pair? Each new string must contain all of the characters 'A', 'B', 'C', 'D', 'E', 'F', 'G', that is, every character that appears in the sup_pairs string.
The expected output should be:
S1 = 'BAEFCGD' % because BA will be followed by AE in sup_pairs string, so we combine BAE, and so on...we continue the rule to generate S1
S2 = 'DFCEABG'
If I have AB, BC and BD, both strings should be generated: ABC and ABD.
If a character repeats along the chain, as in AB BC CA CE, we skip the second A and get ABCE.
This, like all good things in life, is a graph problem. Each letter is a node, and each pair is an edge.
First we must transform your string of pairs into a numeric format so we can use the letters as subscripts. I will use A=2, B=3, ..., G=8:
sup_pairs = 'BA CE DF EF AE FC GD DA CG EA AB BG';
p=strsplit(sup_pairs,' ');
m=cell2mat(p(:));
m=m-'?';
A=sparse(m(:,1),m(:,2),1);
The sparse matrix A is now the adjacency matrix (actually, more like an adjacency list) representing our pairs. If you look at the full matrix of A, it looks like this:
>> full(A)
ans =
0 0 0 0 0 0 0 0
0 0 1 0 0 1 0 0
0 1 0 0 0 0 0 1
0 0 0 0 0 1 0 1
0 1 0 0 0 0 1 0
0 1 0 0 0 0 1 0
0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0
As you can see, the edge BA, which translates to subscript (3,2) is equal to 1.
Now you can use your favorite implementation of Depth-first Search (DFS) to perform a traversal of the graph from your starting node of choice. Each path from the root to a leaf node represents a valid string. You then transform the path back into your letter sequence:
treepath=[3,2,6,7,4,8,5];
S1=char(treepath+'?');
Output:
S1 = BAEFCGD
Here's a recursive implementation of DFS to get you going. Normally in MATLAB you have to worry about not hitting the default limitation on recursion depth, but you're finding Hamiltonian paths here, which is NP-complete. If you ever get anywhere near the recursion limit, the computation time will be so huge that increasing the depth will be the least of your worries.
function full_paths = dft_all(A, current_path)
    % A - adjacency matrix of graph
    % current_path - initially just the start node (root)
    % full_paths - cell array containing all paths from initial root to a leaf
    n = size(A, 1); % number of nodes in graph
    full_paths = cell(1,0); % return cell array
    unvisited_mask = ones(1, n);
    unvisited_mask(current_path) = 0; % mask off already visited nodes (path)
    % multiply mask by array of nodes accessible from last node in path
    unvisited_nodes = find(A(current_path(end), :) .* unvisited_mask);
    % add restriction on length of paths to keep (numel == n)
    if isempty(unvisited_nodes) && (numel(current_path) == n)
        full_paths = {current_path}; % we've found a leaf node
        return;
    end
    % otherwise, still more nodes to search
    for node = unvisited_nodes
        new_path = dft_all(A, [current_path node]); % add new node and search
        if ~isempty(new_path) % if this produces a new path...
            full_paths = {full_paths{1,:}, new_path{1,:}}; % add it to output
        end
    end
end
This is a normal depth-first traversal except for the added condition on the length of the path:
if isempty(unvisited_nodes) && (numel(current_path) == n)
The first half of the if condition, isempty(unvisited_nodes) is standard. If you only use this part of the condition you'll get all paths from the start node to a leaf, regardless of path length. (Hence the cell array output.) The second half, (numel(current_path) == n) enforces the length of the path.
I took a shortcut here because n is the number of nodes in the adjacency matrix, which in the sample case is 8 rather than 7, the number of characters in your alphabet. But there are no edges into or out of node 1 because I was apparently planning on using a trick that I never got around to telling you about. Rather than run DFS starting from each of the nodes to get all of the paths, you can make a dummy node (in this case node 1) and create an edge from it to all of the other real nodes. Then you just call DFS once on node 1 and you get all the paths. Here's the updated adjacency matrix:
A =
0 1 1 1 1 1 1 1
0 0 1 0 0 1 0 0
0 1 0 0 0 0 0 1
0 0 0 0 0 1 0 1
0 1 0 0 0 0 1 0
0 1 0 0 0 0 1 0
0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0
If you don't want to use this trick, you can change the condition to n-1, or change the adjacency matrix not to include node 1. Note that if you do leave node 1 in, you need to remove it from the resulting paths.
Here's the output of the function using the updated matrix:
>> dft_all(A, 1)
ans =
{
[1,1] =
1 2 3 8 5 7 4 6
[1,2] =
1 3 2 6 7 4 8 5
[1,3] =
1 3 8 5 2 6 7 4
[1,4] =
1 3 8 5 7 4 6 2
[1,5] =
1 4 6 2 3 8 5 7
[1,6] =
1 5 7 4 6 2 3 8
[1,7] =
1 6 2 3 8 5 7 4
[1,8] =
1 6 7 4 8 5 2 3
[1,9] =
1 7 4 6 2 3 8 5
[1,10] =
1 8 5 7 4 6 2 3
}
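For readers who want to check the result outside MATLAB, here is a minimal Python sketch of the same idea: build the graph from the pairs, run a depth-first search from every letter, and keep only the paths that visit every letter once. The function name hamiltonian_paths is hypothetical, and the sketch starts a search from each node instead of using the dummy-node trick.

```python
def hamiltonian_paths(pairs):
    """Return every string that chains the pairs and uses each letter exactly once."""
    # Adjacency list: first letter of a pair -> letters that may follow it
    edges = {}
    for pair in pairs.split():
        edges.setdefault(pair[0], []).append(pair[1])
    nodes = sorted(set(pairs.replace(" ", "")))
    results = []

    def dfs(path):
        # A path that contains every node is complete
        if len(path) == len(nodes):
            results.append("".join(path))
            return
        # Otherwise extend it along every edge to a node not yet visited
        for nxt in edges.get(path[-1], []):
            if nxt not in path:
                dfs(path + [nxt])

    for start in nodes:
        dfs([start])
    return results

paths = hamiltonian_paths("BA CE DF EF AE FC GD DA CG EA AB BG")
print(paths)
```

Mapping the ten numeric paths listed above back to letters (2 is A, 3 is B, and so on, with the dummy node 1 dropped) gives the same set of strings, including BAEFCGD and DFCEABG from the question.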
