I'm trying to implement Double DQN (not to be confused with DQN with a slightly delayed Q-target network) in PyTorch to train an agent to play an Atari OpenAI Gym game. Here I discuss the implementation of the following formula:
Update of Q-network, formula taken from Sutton & Barto.
My first implementation is:
Q_pred = self.Q_1.forward(s_now)[T.arange(batch_size), actions.long()]
Q_next_all = self.Q_1.forward(s_next)
maxA_id = T.argmax(Q_next_all, dim=1)
Q_pred2 = self.Q_2.forward(s_next)[T.arange(batch_size), maxA_id]
Q_target = (rewards + (~dones) * self.GAMMA * Q_pred2).detach()
self.Q_1.optimizer.zero_grad()
self.Q_1.loss(Q_target, Q_pred).backward()
self.Q_1.optimizer.step()
(Q_1 and Q_2 are nn.Module classes, and all of the variables involved here are already torch tensors lying in the GPU.)
I noticed that my program ran much slower than a previous implementation which used plain DQN.
I realized that I can combine the batches entering Q_1, so there will be one combined batch being forwarded in the neural network, instead of two batches in sequence. The code becomes:
s_combined = T.cat((s_now, s_next))
Q_combined = self.Q_1.forward(s_combined)
Q_pred = Q_combined[T.arange(batch_size), actions.long()]
Q_next_all = Q_combined[batch_size:]
Q_pred2_all = self.Q_2.forward(s_next)
maxA_id = T.argmax(Q_next_all, dim=1)
Q_pred2 = Q_pred2_all[T.arange(batch_size), maxA_id]
Q_target = (rewards + (~dones) * self.GAMMA * Q_pred2).detach()
self.Q_1.optimizer.zero_grad()
self.Q_1.loss(Q_target, Q_pred).backward()
self.Q_1.optimizer.step()
(This proves that I understand how to do batch training in PyTorch, so don't mark this as a duplicate of this question.)
Furthermore, I realized that Q_1 and Q_2 can process their batches in parallel. So I looked up how to do multiprocessing in PyTorch. Unfortunately, I couldn't find a good example. I tried to adapt a code that looks similar to my scenario, and my code becomes:
def spawned():
s_combined = T.cat((s_now, s_next))
Q_combined = self.Q_1.forward(s_combined)
Q_pred = Q_combined[T.arange(batch_size), actions.long()]
Q_next_all = Q_combined[batch_size:]
mp.set_start_method('spawn', force=True)
p = mp.Process(target=spawned)
p.start()
Q_pred2_all = self.Q_2.forward(s_next)
p.join()
maxA_id = T.argmax(Q_next_all, dim=1)
Q_pred2 = Q_pred2_all[T.arange(batch_size), maxA_id]
Q_target = (rewards + (~dones) * self.GAMMA * Q_pred2).detach()
self.Q_1.optimizer.zero_grad()
self.Q_1.loss(Q_target, Q_pred).backward()
self.Q_1.optimizer.step()
This crashes with the error message:
AttributeError: Can't pickle local object 'Agent.learn.<locals>.spawned'
So how do I make this work?
(Achieving this in CUDA programming is trivial. One simply launches two device kernels using a sequential host code, and the two kernels are automatically computed in parallel in the GPU.)
Related
I know we can use torch profiler with tensorboard using something like this:
with torch.profiler.profile(
schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=2),
on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/resnet18'),
record_shapes=True,
with_stack=True
) as prof:
for step, batch_data in enumerate(train_loader):
if step >= (1 + 1 + 3) * 2:
break
train(batch_data)
prof.step() # Need to call this at the end of each step to notify profiler of steps' boundary.
It works perfectly with pytorch, but the problem is I have to use pytorch lightning and if I put this in my training step, it just doesn't create the log file nor does it create an entry for profiler. All I get is lightning_logs which isn't the profiler output. I couldn't find anything in the docs about lightning_profiler and tensorboard so does anyone have any idea?
Here's what my training function looks like:
def training_step(self, train_batch, batch_idx):
with torch.profiler.profile(
activities=[ProfilerActivity.CPU],
schedule=torch.profiler.schedule(
wait=1,
warmup=1,
active=2,
repeat=1),
with_stack=True,
on_trace_ready=torch.profiler.tensorboard_trace_handler('./logs'),
) as profiler:
x, y = train_batch
x = x.float()
logits = self.forward(x)
loss = self.loss_fn(logits, y)
profiler.step()
return loss
You don't have to use raw torch.profiler at all. There is a whole page in Lightning Docs dedicated to Profiling ..
.. and its as easy as passing a trainer flag called profiler like
# other profilers are "simple", "advanced" etc
trainer = pl.Trainer(profiler="pytorch")
Also, set TensorBoardLogger as your preferred logger as you normally do
trainer = pl.Trainer(profiler="pytorch", logger=TensorBoardLogger(..))
I have recently started exploring the QuantLib option pricing libraries for python and have come across an error that I don't seem to understand. Basically, I am trying to price an Up&Out Barrier option using the Heston model. The code that I have written has been taken from examples found online and adapted to my specific case. Essentially, the problem is that when I run the code below I get an error that I believe is triggered at the last line of the code, i.e. the european_option.NPV() function
*** RuntimeError: wrong argument type
Can someone please explain me what I am doing wrong?
# option inputs
maturity_date = ql.Date(30, 6, 2020)
spot_price = 969.74
strike_price = 1000
volatility = 0.20
dividend_rate = 0.0
option_type = ql.Option.Call
risk_free_rate = 0.0016
day_count = ql.Actual365Fixed()
calculation_date = ql.Date(26, 6, 2020)
ql.Settings.instance().evaluationDate = calculation_date
# construct the option payoff
european_option = ql.BarrierOption(ql.Barrier.UpOut, Barrier, Rebate,
ql.PlainVanillaPayoff(option_type, strike_price),
ql.EuropeanExercise(maturity_date))
# set the Heston parameters
v0 = volatility*volatility # spot variance
kappa = 0.1
theta = v0
hsigma = 0.1
rho = -0.75
spot_handle = ql.QuoteHandle(ql.SimpleQuote(spot_price))
# construct the Heston process
flat_ts = ql.YieldTermStructureHandle(ql.FlatForward(calculation_date,
risk_free_rate, day_count))
dividend_yield = ql.YieldTermStructureHandle(ql.FlatForward(calculation_date,
dividend_rate, day_count))
heston_process = ql.HestonProcess(flat_ts, dividend_yield,
spot_handle, v0, kappa,
theta, hsigma, rho)
# run the pricing engine
engine = ql.AnalyticHestonEngine(ql.HestonModel(heston_process),0.01, 1000)
european_option.setPricingEngine(engine)
h_price = european_option.NPV()
The problem is that the AnalyticHestonEngine is not able to price Barrier options.
Check here https://www.quantlib.org/reference/group__barrierengines.html for a list of Barrier Option pricing engines.
Given that we could use self-defined metric in LightGBM and use parameter 'feval' to call it during training.
And for given metric, we could define it in the parameter dict like metric:(l1, l2)
My question is that how call several self-defined metric at the same time? I cannot use feval=(my_metric1, my_metric2) to get the result
params = {}
params['learning_rate'] = 0.003
params['boosting_type'] = 'goss'
params['objective'] = 'multiclassova'
params['metric'] = ['multi_error', 'multi_logloss']
params['sub_feature'] = 0.8
params['num_leaves'] = 15
params['min_data'] = 600
params['tree_learner'] = 'voting'
params['bagging_freq'] = 3
params['num_class'] = 3
params['max_depth'] = -1
params['max_bin'] = 512
params['verbose'] = -1
params['is_unbalance'] = True
evals_result = {}
aa = lgb.train(params,
d_train,
valid_sets=[d_train, d_dev],
evals_result=evals_result,
num_boost_round=4500,
feature_name=f_names,
verbose_eval=10,
categorical_feature = f_names,
learning_rates=lambda iter: (1 / (1 + decay_rate * iter)) * params['learning_rate'])
Lets' discuss on the code I share here. d_train is my training set. d_dev is my validation set (I have a different test set.) evals_result will record our multi_error and multi_logloss per iteration as a list. verbose_eval = 10 will make LightGBM print multi_error and multi_logloss of both training set and validation set at every 10 iterations. If you want to plot multi_error and multi_logloss as a graph:
lgb.plot_metric(evals_result, metric='multi_error')
plt.show()
lgb.plot_metric(evals_result, metric='multi_logloss')
plt.show()
You can find other useful functions from LightGBM documentation. If you can't find what you need, go to XGBoost documentation, a simple trick. If there is something missing, please do not hesitate to ask more.
I'm trying to make tensorflow mfcc give me the same results as python lybrosa mfcc
i have tried to match all the default parameters that are used by librosa
in my tensorflow code and got a different result
this is the tensorflow code that i have used :
waveform = contrib_audio.decode_wav(
audio_binary,
desired_channels=1,
desired_samples=sample_rate,
name='decoded_sample_data')
sample_rate = 16000
transwav = tf.transpose(waveform[0])
stfts = tf.contrib.signal.stft(transwav,
frame_length=2048,
frame_step=512,
fft_length=2048,
window_fn=functools.partial(tf.contrib.signal.hann_window,
periodic=False),
pad_end=True)
spectrograms = tf.abs(stfts)
num_spectrogram_bins = stfts.shape[-1].value
lower_edge_hertz, upper_edge_hertz, num_mel_bins = 0.0,8000.0, 128
linear_to_mel_weight_matrix =
tf.contrib.signal.linear_to_mel_weight_matrix(
num_mel_bins, num_spectrogram_bins, sample_rate, lower_edge_hertz,
upper_edge_hertz)
mel_spectrograms = tf.tensordot(
spectrograms,
linear_to_mel_weight_matrix, 1)
mel_spectrograms.set_shape(spectrograms.shape[:-1].concatenate(
linear_to_mel_weight_matrix.shape[-1:]))
log_mel_spectrograms = tf.log(mel_spectrograms + 1e-6)
mfccs = tf.contrib.signal.mfccs_from_log_mel_spectrograms(
log_mel_spectrograms)[..., :20]
the equivalent in librosa:
libr_mfcc = librosa.feature.mfcc(wav, 16000)
the following are the graphs of the results:
I'm the author of tf.signal. Sorry for not seeing this post sooner, but you can get librosa and tf.signal.stft to match if you center-pad the signal before passing it to tf.signal.stft. See this GitHub issue for more details.
I spent a whole 1 day trying to make them match. Even the rryan's solution didn't work for me (center=False in librosa), but I finally found out, that TF and librosa STFT's match only for the case win_length==n_fft in librosa and frame_length==fft_length in TF. That's why rryan's colab example is working, but you can try that if you set frame_length!=fft_length, the amplitudes are very different (although visually, after plotting, the patterns look similar). Typical example - if you choose some win_length/frame_length and then you want to set n_fft/fft_length to the smallest power of 2 greater than win_length/frame_length, then the results will be different. So you need to stick with the inefficient FFT given by your window size... I don't know why it is so, but that's how it is, hopefully it will be helpful for someone.
The output of contrib_audio.decode_wav should be DecodeWav with { audio, sample_rate } and audio shape is (sample_rate, 1), so what is the purpose for getting first item of waveform and do transpose?
transwav = tf.transpose(waveform[0])
No straight forward way, since librosa stft uses center=True which does not comply with tf stft.
Had it been center=False, stft tf/librosa would give near enough results. see colab sniff
But even though, trying to import the librosa code into tf is a big headache. Here is what I started and gave up. Near but not near enough.
def pow2db_tf(X):
amin=1e-10
top_db=80.0
ref_value = 1.0
log10 = 2.302585092994046
log_spec = (10.0/log10) * tf.log(tf.maximum(amin, X))
log_spec -= (10.0/log10) * tf.log(tf.maximum(amin, ref_value))
pow2db = tf.maximum(log_spec, tf.reduce_max(log_spec) - top_db)
return pow2db
def librosa_feature_like_tf(x, sr=16000, n_fft=2048, n_mfcc=20):
mel_basis = librosa.filters.mel(sr, n_fft).astype(np.float32)
mel_basis = mel_basis.reshape(1, int(n_fft/2+1), -1)
tf_stft = tf.contrib.signal.stft(x, frame_length=n_fft, frame_step=hop_length, fft_length=n_fft)
print ("tf_stft", tf_stft.shape)
tf_S = tf.matmul(tf.abs(tf_stft), mel_basis);
print ("tf_S", tf_S.shape)
tfdct = tf.spectral.dct(pow2db_tf(tf_S), norm='ortho'); print ("tfdct", tfdct.shape)
print ("tfdct before cut", tfdct.shape)
tfdct = tfdct[:,:,:n_mfcc];
print ("tfdct afer cut", tfdct.shape)
#tfdct = tf.transpose(tfdct,[0,2,1]);print ("tfdct afer traspose", tfdct.shape)
return tfdct
x = tf.placeholder(tf.float32, shape=[None, 16000], name ='x')
tf_feature = librosa_feature_like_tf(x)
print("tf_feature", tf_feature.shape)
mfcc_rosa = librosa.feature.mfcc(wav, sr).T
print("mfcc_rosa", mfcc_rosa.shape)
For anyone still looking for this: I had a similar problem some time ago: Matching librosa's mel filterbanks/mel spectrogram to a tensorflow implementation. The solution was to use a different windowing approach for the spectrogram and librosa's mel matrix as constant tensor. See here and here.
I am trying to use tf.train.batch to run enqueue images in multiple threads. When the number of threads is 1, the code works fine. But when I set a higher number of threads I receive an error:
Failed precondition: Attempting to use uninitialized value Variable
[[Node: Variable/read = Identity[T=DT_INT32, _class=["loc:#Variable"], _device="/job:localhost/replica:0/task:0/cpu:0"](Variable)]]
The main thread has to run for some time under one second to index the database of folders and put it into tensor.
I tried to use sess.run([some_image]) before running tf.train.bath loop. In that case workers fail in the background first with the same error, and after that I receive my images.
I tried to use time.sleep(), but it does not seem to be possible to delay the workers.
I tried adding a dependency to the batch:
g = tf.get_default_graph()
with g.control_dependencies([init_one,init_two]):
example_batch = tf.train.batch([my_image])
where init_one, and init_two are tf.initialize_all(variables) and tf.initialize_local_variables()
the most relevant issue I could find is at: https://github.com/openai/universe-starter-agent/issues/44
Is there a way I could ask the synchronize worker threads with the main thread so that they don't race first and die out ?
A similar and easy to reproduce error with variable initialization happens when set up the epoch counter to anything that is not None Are there any potential solutions ? I've added the code needed to reproduce the error below:
def index_the_database(database_path):
"""indexes av4 database and returns two tensors of filesystem path: ligand files, and protein files"""
ligand_file_list = []
receptor_file_list = []
for ligand_file in glob(os.path.join(database_path, "*_ligand.av4")):
receptor_file = "/".join(ligand_file.split("/")[:-1]) + "/" + ligand_file.split("/")[-1][:4] + '.av4'
if os.path.exists(receptor_file):
ligand_file_list.append(ligand_file)
receptor_file_list.append(receptor_file)
index_list = range(len(ligand_file_list))
return index_list,ligand_file_list, receptor_file_list
index_list,ligand_file_list,receptor_file_list = index_the_database(database_path)
ligand_files = tf.convert_to_tensor(ligand_file_list,dtype=tf.string)
receptor_files = tf.convert_to_tensor(receptor_file_list,dtype=tf.string)
filename_queue = tf.train.slice_input_producer([ligand_files,receptor_files],num_epochs=10,shuffle=True)
serialized_ligand = tf.read_file(filename_queue[0])
serialized_receptor = tf.read_file(filename_queue[1])
image_one = tf.reduce_sum(tf.exp(tf.decode_raw(serialized_receptor,tf.float32)))
image_batch = tf.train.batch([image_one],100,num_threads=100)
init_two = tf.initialize_all_variables()
init_one = tf.initialize_local_variables()
sess = tf.Session()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess,coord=coord)
sess.run([init_one])
sess.run([init_two])
while True:
print "next"
sess.run([image_batch])