I am trying to learn what the various outputs of predict.coxph() mean. I am currently attempting to fit a cox model on a training set then use the resulting coefficients from the training set to make predictions in a test set (new set of data).
I see from the predict.coxph() help page that I could use type = "survival" to extract and individual's survival probability-- which is equal to exp(-expected).
Here is a code block of what I have attempted so far, using the ISLR2 BrainCancer data.
set.seed(123)
n.training = round(nrow(BrainCancer) * 0.70) # 70:30 split
idx = sample(1:nrow(BrainCancer), size = n.training)
d.training = BrainCancer[idx, ]
d.test = BrainCancer[-idx, ]
# fit a model using the training set
fit = coxph(Surv(time, status) ~ sex + diagnosis + loc + ki + gtv + stereo, data = d.training)
# get predicted survival probabilities for the test set
pred = predict(fit, type = "survival", newdata = d.test)
The predictions generated:
predict(fit, type = "survival", newdata = d.test)
[1] 0.9828659 0.8381164 0.9564982 0.2271862 0.2883800 0.9883625 0.9480138 0.9917512 1.0000000 0.9974775 0.7703657 0.9252100 0.9975044 0.9326234 0.8718161 0.9850815 0.9545622 0.4381646 0.8236644
[20] 0.2455676 0.7289031 0.9063336 0.9126897 0.9988625 0.4399697 0.9360874
Are these survival probabilities associated with a specific time point? From the help page, it sounds like these are survival probabilities at the follow-up times in the newdata argument. Is this correct?
Additional questions:
How is the baseline hazard estimated in predict.coxph? Is it using the Breslow estimator?
If type = "expected" is used, are these values the cumulative hazard? If yes, what are the relevant time points for these?
Thank you!
Related
I am trying to Schoenfeld residuals from competing risk regression using a Fine-and-Gray model. The model works fine, but I cant find a way to calculate the Schoenfeld residuals. The model looks like this:
fg.multi <- FGR(formula = Hist(FUdur30, cause.mort) ~ emp + AGE_JR + CCMTOTAL + CLIN_SEV_SEPS + def.focus + Bacteraemiaclass , data = total.pop, cause = 2)
I have recently started exploring the QuantLib option pricing libraries for python and have come across an error that I don't seem to understand. Basically, I am trying to price an Up&Out Barrier option using the Heston model. The code that I have written has been taken from examples found online and adapted to my specific case. Essentially, the problem is that when I run the code below I get an error that I believe is triggered at the last line of the code, i.e. the european_option.NPV() function
*** RuntimeError: wrong argument type
Can someone please explain me what I am doing wrong?
# option inputs
maturity_date = ql.Date(30, 6, 2020)
spot_price = 969.74
strike_price = 1000
volatility = 0.20
dividend_rate = 0.0
option_type = ql.Option.Call
risk_free_rate = 0.0016
day_count = ql.Actual365Fixed()
calculation_date = ql.Date(26, 6, 2020)
ql.Settings.instance().evaluationDate = calculation_date
# construct the option payoff
european_option = ql.BarrierOption(ql.Barrier.UpOut, Barrier, Rebate,
ql.PlainVanillaPayoff(option_type, strike_price),
ql.EuropeanExercise(maturity_date))
# set the Heston parameters
v0 = volatility*volatility # spot variance
kappa = 0.1
theta = v0
hsigma = 0.1
rho = -0.75
spot_handle = ql.QuoteHandle(ql.SimpleQuote(spot_price))
# construct the Heston process
flat_ts = ql.YieldTermStructureHandle(ql.FlatForward(calculation_date,
risk_free_rate, day_count))
dividend_yield = ql.YieldTermStructureHandle(ql.FlatForward(calculation_date,
dividend_rate, day_count))
heston_process = ql.HestonProcess(flat_ts, dividend_yield,
spot_handle, v0, kappa,
theta, hsigma, rho)
# run the pricing engine
engine = ql.AnalyticHestonEngine(ql.HestonModel(heston_process),0.01, 1000)
european_option.setPricingEngine(engine)
h_price = european_option.NPV()
The problem is that the AnalyticHestonEngine is not able to price Barrier options.
Check here https://www.quantlib.org/reference/group__barrierengines.html for a list of Barrier Option pricing engines.
I'm trying to replicate the network described in this link: https://arxiv.org/pdf/1806.07492.pdf. I've been able to replicate the patch networks of the model so far, so now I have 9 models of (32,32,3) input size each.
However, the next step is to merge those 9 models into a single one to obtain a single (96,96,3) input network. The idea is that of the 27 neurons of the first layer, the first 3 correspond to the pretrained model of the first patch, and so on. Here is an image of how it should result (each colored row is a different model):
Final concatenated model
In this image, each row before the fully-connected layer represents a previously trained model with the following characteristics:
(32,32,3) architecture
As you can see, this network analyzes images of (32,32,3). However, the complete model needs to follow this structure that uses a (96,96,3) model:
(96,96,3) architecture
I already have the (32,32,3) models in their respective hdf5 file, but I do not know how to merge them into a single model to obtain the one of size (96,96,3). I tried using the concatenate function of Keras, but I got an error that said that my inputs needed to be tensors.
Here is the code that I used:
in1 = Input(shape=(32,32,3))
model_patch1 = load_model('patch1.hdf5')
in2 = Input(shape=(32,32,3))
model_patch2 = load_model('patch2.hdf5')
in3 = Input(shape=(32,32,3))
model_patch3 = load_model('patch3.hdf5')
in4 = Input(shape=(32,32,3))
model_patch4 = load_model('patch4.hdf5')
in5 = Input(shape=(32,32,3))
model_patch5 = load_model('patch5.hdf5')
in6 = Input(shape=(32,32,3))
model_patch6 = load_model('patch6.hdf5')
in7 = Input(shape=(32,32,3))
model_patch7 = load_model('patch7.hdf5')
in8 = Input(shape=(32,32,3))
model_patch8 = load_model('patch8.hdf5')
in9 = Input(shape=(32,32,3))
model_patch9 = load_model('patch9.hdf5')
model_final_concat = Concatenate(axis=-1)([model_patch1, model_patch2,
model_patch3, model_patch4, model_patch5, model_patch6, model_patch7,
model_patch8, model_patch9])
model_final_dense_1 = Dense(1, activation='sigmoid')(model_final_concat)
lsCnn_faces = Model(inputs=[in1,in2,in3,in4,in5,in6,in7,in8,in9],
outputs=model_final_dense_1) lsCnn_faces.summary()
Any help will be very much appreciated.
Given that we could use self-defined metric in LightGBM and use parameter 'feval' to call it during training.
And for given metric, we could define it in the parameter dict like metric:(l1, l2)
My question is that how call several self-defined metric at the same time? I cannot use feval=(my_metric1, my_metric2) to get the result
params = {}
params['learning_rate'] = 0.003
params['boosting_type'] = 'goss'
params['objective'] = 'multiclassova'
params['metric'] = ['multi_error', 'multi_logloss']
params['sub_feature'] = 0.8
params['num_leaves'] = 15
params['min_data'] = 600
params['tree_learner'] = 'voting'
params['bagging_freq'] = 3
params['num_class'] = 3
params['max_depth'] = -1
params['max_bin'] = 512
params['verbose'] = -1
params['is_unbalance'] = True
evals_result = {}
aa = lgb.train(params,
d_train,
valid_sets=[d_train, d_dev],
evals_result=evals_result,
num_boost_round=4500,
feature_name=f_names,
verbose_eval=10,
categorical_feature = f_names,
learning_rates=lambda iter: (1 / (1 + decay_rate * iter)) * params['learning_rate'])
Lets' discuss on the code I share here. d_train is my training set. d_dev is my validation set (I have a different test set.) evals_result will record our multi_error and multi_logloss per iteration as a list. verbose_eval = 10 will make LightGBM print multi_error and multi_logloss of both training set and validation set at every 10 iterations. If you want to plot multi_error and multi_logloss as a graph:
lgb.plot_metric(evals_result, metric='multi_error')
plt.show()
lgb.plot_metric(evals_result, metric='multi_logloss')
plt.show()
You can find other useful functions from LightGBM documentation. If you can't find what you need, go to XGBoost documentation, a simple trick. If there is something missing, please do not hesitate to ask more.
I'm trying to make tensorflow mfcc give me the same results as python lybrosa mfcc
i have tried to match all the default parameters that are used by librosa
in my tensorflow code and got a different result
this is the tensorflow code that i have used :
waveform = contrib_audio.decode_wav(
audio_binary,
desired_channels=1,
desired_samples=sample_rate,
name='decoded_sample_data')
sample_rate = 16000
transwav = tf.transpose(waveform[0])
stfts = tf.contrib.signal.stft(transwav,
frame_length=2048,
frame_step=512,
fft_length=2048,
window_fn=functools.partial(tf.contrib.signal.hann_window,
periodic=False),
pad_end=True)
spectrograms = tf.abs(stfts)
num_spectrogram_bins = stfts.shape[-1].value
lower_edge_hertz, upper_edge_hertz, num_mel_bins = 0.0,8000.0, 128
linear_to_mel_weight_matrix =
tf.contrib.signal.linear_to_mel_weight_matrix(
num_mel_bins, num_spectrogram_bins, sample_rate, lower_edge_hertz,
upper_edge_hertz)
mel_spectrograms = tf.tensordot(
spectrograms,
linear_to_mel_weight_matrix, 1)
mel_spectrograms.set_shape(spectrograms.shape[:-1].concatenate(
linear_to_mel_weight_matrix.shape[-1:]))
log_mel_spectrograms = tf.log(mel_spectrograms + 1e-6)
mfccs = tf.contrib.signal.mfccs_from_log_mel_spectrograms(
log_mel_spectrograms)[..., :20]
the equivalent in librosa:
libr_mfcc = librosa.feature.mfcc(wav, 16000)
the following are the graphs of the results:
I'm the author of tf.signal. Sorry for not seeing this post sooner, but you can get librosa and tf.signal.stft to match if you center-pad the signal before passing it to tf.signal.stft. See this GitHub issue for more details.
I spent a whole 1 day trying to make them match. Even the rryan's solution didn't work for me (center=False in librosa), but I finally found out, that TF and librosa STFT's match only for the case win_length==n_fft in librosa and frame_length==fft_length in TF. That's why rryan's colab example is working, but you can try that if you set frame_length!=fft_length, the amplitudes are very different (although visually, after plotting, the patterns look similar). Typical example - if you choose some win_length/frame_length and then you want to set n_fft/fft_length to the smallest power of 2 greater than win_length/frame_length, then the results will be different. So you need to stick with the inefficient FFT given by your window size... I don't know why it is so, but that's how it is, hopefully it will be helpful for someone.
The output of contrib_audio.decode_wav should be DecodeWav with { audio, sample_rate } and audio shape is (sample_rate, 1), so what is the purpose for getting first item of waveform and do transpose?
transwav = tf.transpose(waveform[0])
No straight forward way, since librosa stft uses center=True which does not comply with tf stft.
Had it been center=False, stft tf/librosa would give near enough results. see colab sniff
But even though, trying to import the librosa code into tf is a big headache. Here is what I started and gave up. Near but not near enough.
def pow2db_tf(X):
amin=1e-10
top_db=80.0
ref_value = 1.0
log10 = 2.302585092994046
log_spec = (10.0/log10) * tf.log(tf.maximum(amin, X))
log_spec -= (10.0/log10) * tf.log(tf.maximum(amin, ref_value))
pow2db = tf.maximum(log_spec, tf.reduce_max(log_spec) - top_db)
return pow2db
def librosa_feature_like_tf(x, sr=16000, n_fft=2048, n_mfcc=20):
mel_basis = librosa.filters.mel(sr, n_fft).astype(np.float32)
mel_basis = mel_basis.reshape(1, int(n_fft/2+1), -1)
tf_stft = tf.contrib.signal.stft(x, frame_length=n_fft, frame_step=hop_length, fft_length=n_fft)
print ("tf_stft", tf_stft.shape)
tf_S = tf.matmul(tf.abs(tf_stft), mel_basis);
print ("tf_S", tf_S.shape)
tfdct = tf.spectral.dct(pow2db_tf(tf_S), norm='ortho'); print ("tfdct", tfdct.shape)
print ("tfdct before cut", tfdct.shape)
tfdct = tfdct[:,:,:n_mfcc];
print ("tfdct afer cut", tfdct.shape)
#tfdct = tf.transpose(tfdct,[0,2,1]);print ("tfdct afer traspose", tfdct.shape)
return tfdct
x = tf.placeholder(tf.float32, shape=[None, 16000], name ='x')
tf_feature = librosa_feature_like_tf(x)
print("tf_feature", tf_feature.shape)
mfcc_rosa = librosa.feature.mfcc(wav, sr).T
print("mfcc_rosa", mfcc_rosa.shape)
For anyone still looking for this: I had a similar problem some time ago: Matching librosa's mel filterbanks/mel spectrogram to a tensorflow implementation. The solution was to use a different windowing approach for the spectrogram and librosa's mel matrix as constant tensor. See here and here.