Running multiple inferences in parallel with PyTorch - multithreading

I'm trying to implement Double DQN (not to be confused with DQN with a slightly delayed Q-target network) in PyTorch to train an agent to play an Atari OpenAI Gym game. I am implementing the following update formula (taken from Sutton & Barto):
Q_1(S, A) <- Q_1(S, A) + alpha * [R + gamma * Q_2(S', argmax_a Q_1(S', a)) - Q_1(S, A)]
My first implementation is:
Q_pred = self.Q_1.forward(s_now)[T.arange(batch_size), actions.long()]  # Q_1(s, a) for the actions taken
Q_next_all = self.Q_1.forward(s_next)                                   # Q_1(s', .) to select the argmax action
maxA_id = T.argmax(Q_next_all, dim=1)
Q_pred2 = self.Q_2.forward(s_next)[T.arange(batch_size), maxA_id]       # evaluate that action with Q_2
Q_target = (rewards + (~dones) * self.GAMMA * Q_pred2).detach()
self.Q_1.optimizer.zero_grad()
self.Q_1.loss(Q_target, Q_pred).backward()
self.Q_1.optimizer.step()
(Q_1 and Q_2 are nn.Module classes, and all of the variables involved here are already torch tensors on the GPU.)
I noticed that my program ran much slower than a previous implementation which used plain DQN.
I realized that I can combine the batches entering Q_1, so there will be one combined batch being forwarded in the neural network, instead of two batches in sequence. The code becomes:
s_combined = T.cat((s_now, s_next))        # one combined batch of 2 * batch_size states
Q_combined = self.Q_1.forward(s_combined)  # single forward pass through Q_1
Q_pred = Q_combined[T.arange(batch_size), actions.long()]  # first half: Q_1(s, a)
Q_next_all = Q_combined[batch_size:]                       # second half: Q_1(s', .)
Q_pred2_all = self.Q_2.forward(s_next)
maxA_id = T.argmax(Q_next_all, dim=1)
Q_pred2 = Q_pred2_all[T.arange(batch_size), maxA_id]
Q_target = (rewards + (~dones) * self.GAMMA * Q_pred2).detach()
self.Q_1.optimizer.zero_grad()
self.Q_1.loss(Q_target, Q_pred).backward()
self.Q_1.optimizer.step()
(This shows that I understand how to do batch training in PyTorch, so please don't mark this as a duplicate of that question.)
Furthermore, I realized that Q_1 and Q_2 can process their batches in parallel, so I looked up how to do multiprocessing in PyTorch. Unfortunately, I couldn't find a good example. I tried to adapt code from a scenario that looks similar to mine, and my code became:
def spawned():
    s_combined = T.cat((s_now, s_next))
    Q_combined = self.Q_1.forward(s_combined)
    Q_pred = Q_combined[T.arange(batch_size), actions.long()]
    Q_next_all = Q_combined[batch_size:]

mp.set_start_method('spawn', force=True)
p = mp.Process(target=spawned)           # Q_1's half of the work in a child process
p.start()
Q_pred2_all = self.Q_2.forward(s_next)   # Q_2's half of the work in the parent
p.join()
maxA_id = T.argmax(Q_next_all, dim=1)
Q_pred2 = Q_pred2_all[T.arange(batch_size), maxA_id]
Q_target = (rewards + (~dones) * self.GAMMA * Q_pred2).detach()
self.Q_1.optimizer.zero_grad()
self.Q_1.loss(Q_target, Q_pred).backward()
self.Q_1.optimizer.step()
This crashes with the error message:
AttributeError: Can't pickle local object 'Agent.learn.<locals>.spawned'
So how do I make this work?
(Achieving this in CUDA programming is trivial. One simply launches two device kernels from sequential host code, and the two kernels are automatically computed in parallel on the GPU.)
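For what it's worth, the closest PyTorch analogue to launching two kernels from sequential host code is CUDA streams, not Python processes. A minimal sketch of that idea, with q1_net and q2_net standing in for Q_1 and Q_2 (whether the two forward passes actually overlap on the device depends on GPU occupancy):

import torch

s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()
torch.cuda.synchronize()                  # make sure the inputs are ready
with torch.cuda.stream(s1):
    Q_combined = q1_net(s_combined)       # Q_1's forward pass, queued on stream 1
with torch.cuda.stream(s2):
    Q_pred2_all = q2_net(s_next)          # Q_2's forward pass, queued on stream 2
torch.cuda.synchronize()                  # wait for both streams before using the results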

Related

How to integrate pytorch lightning profiler with tensorboard?

I know we can use torch profiler with tensorboard using something like this:
with torch.profiler.profile(
    schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=2),
    on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/resnet18'),
    record_shapes=True,
    with_stack=True
) as prof:
    for step, batch_data in enumerate(train_loader):
        if step >= (1 + 1 + 3) * 2:
            break
        train(batch_data)
        prof.step()  # need to call this at the end of each step to notify the profiler of the step boundary
It works perfectly with PyTorch, but the problem is that I have to use PyTorch Lightning, and if I put this in my training step, it doesn't create the log file, nor does it create an entry for the profiler. All I get is lightning_logs, which isn't the profiler output. I couldn't find anything in the docs about lightning_profiler and TensorBoard, so does anyone have any idea?
Here's what my training function looks like:
def training_step(self, train_batch, batch_idx):
    # assumes ProfilerActivity was imported via: from torch.profiler import ProfilerActivity
    with torch.profiler.profile(
        activities=[ProfilerActivity.CPU],
        schedule=torch.profiler.schedule(
            wait=1,
            warmup=1,
            active=2,
            repeat=1),
        with_stack=True,
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./logs'),
    ) as profiler:
        x, y = train_batch
        x = x.float()
        logits = self.forward(x)
        loss = self.loss_fn(logits, y)
        profiler.step()
    return loss
You don't have to use raw torch.profiler at all. There is a whole page in the Lightning docs dedicated to profiling, and it's as easy as passing a trainer flag called profiler, like:
# other profilers are "simple", "advanced" etc
trainer = pl.Trainer(profiler="pytorch")
Also, set TensorBoardLogger as your preferred logger as you normally do
trainer = pl.Trainer(profiler="pytorch", logger=TensorBoardLogger(..))
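For completeness, a minimal sketch of how the two pieces could fit together; the dirpath, filename, and logger names are placeholders, and in older Lightning versions the import path is pytorch_lightning.profiler:

import pytorch_lightning as pl
from pytorch_lightning.loggers import TensorBoardLogger
from pytorch_lightning.profilers import PyTorchProfiler  # pytorch_lightning.profiler in older versions

# a configured profiler object can be passed in place of the "pytorch" string
profiler = PyTorchProfiler(dirpath="logs/profiler", filename="perf")
logger = TensorBoardLogger(save_dir="logs", name="my_model")
trainer = pl.Trainer(profiler=profiler, logger=logger, max_epochs=1)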

QuantLib-python pricing barrier option using Heston model

I have recently started exploring the QuantLib option pricing libraries for Python and have come across an error that I don't understand. Basically, I am trying to price an up-and-out barrier option using the Heston model. The code I have written was taken from examples found online and adapted to my specific case. Essentially, the problem is that when I run the code below I get an error that I believe is triggered at the last line of the code, i.e. the european_option.NPV() call:
*** RuntimeError: wrong argument type
Can someone please explain what I am doing wrong?
# option inputs
maturity_date = ql.Date(30, 6, 2020)
spot_price = 969.74
strike_price = 1000
volatility = 0.20
dividend_rate = 0.0
option_type = ql.Option.Call
risk_free_rate = 0.0016
day_count = ql.Actual365Fixed()
calculation_date = ql.Date(26, 6, 2020)
ql.Settings.instance().evaluationDate = calculation_date
# construct the option payoff
# (Barrier and Rebate are assumed to be defined elsewhere in the asker's script)
european_option = ql.BarrierOption(ql.Barrier.UpOut, Barrier, Rebate,
                                   ql.PlainVanillaPayoff(option_type, strike_price),
                                   ql.EuropeanExercise(maturity_date))
# set the Heston parameters
v0 = volatility*volatility # spot variance
kappa = 0.1
theta = v0
hsigma = 0.1
rho = -0.75
spot_handle = ql.QuoteHandle(ql.SimpleQuote(spot_price))
# construct the Heston process
flat_ts = ql.YieldTermStructureHandle(ql.FlatForward(calculation_date,
                                                     risk_free_rate, day_count))
dividend_yield = ql.YieldTermStructureHandle(ql.FlatForward(calculation_date,
                                                            dividend_rate, day_count))
heston_process = ql.HestonProcess(flat_ts, dividend_yield,
                                  spot_handle, v0, kappa,
                                  theta, hsigma, rho)
# run the pricing engine
engine = ql.AnalyticHestonEngine(ql.HestonModel(heston_process),0.01, 1000)
european_option.setPricingEngine(engine)
h_price = european_option.NPV()
The problem is that the AnalyticHestonEngine is not able to price Barrier options.
Check here https://www.quantlib.org/reference/group__barrierengines.html for a list of Barrier Option pricing engines.
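For example, a minimal sketch using the finite-difference Heston barrier engine from that list; the barrier level and rebate below are placeholder values, since they were not shown in the question:

# reuses heston_process and the option inputs from the question
barrier_level = 1100.0   # placeholder: not given in the question
rebate = 0.0             # placeholder: not given in the question
barrier_option = ql.BarrierOption(ql.Barrier.UpOut, barrier_level, rebate,
                                  ql.PlainVanillaPayoff(option_type, strike_price),
                                  ql.EuropeanExercise(maturity_date))
barrier_option.setPricingEngine(ql.FdHestonBarrierEngine(ql.HestonModel(heston_process)))
print(barrier_option.NPV())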

Using multiple self-defined metrics in LightGBM

We can use a self-defined metric in LightGBM and pass it to training via the feval parameter.
For built-in metrics, we can list several in the parameter dict, e.g. 'metric': ['l1', 'l2'].
My question is: how do I call several self-defined metrics at the same time? I cannot use feval=(my_metric1, my_metric2) to get the result.
params = {}
params['learning_rate'] = 0.003
params['boosting_type'] = 'goss'
params['objective'] = 'multiclassova'
params['metric'] = ['multi_error', 'multi_logloss']
params['sub_feature'] = 0.8
params['num_leaves'] = 15
params['min_data'] = 600
params['tree_learner'] = 'voting'
params['bagging_freq'] = 3
params['num_class'] = 3
params['max_depth'] = -1
params['max_bin'] = 512
params['verbose'] = -1
params['is_unbalance'] = True
evals_result = {}
aa = lgb.train(params,
               d_train,
               valid_sets=[d_train, d_dev],
               evals_result=evals_result,
               num_boost_round=4500,
               feature_name=f_names,
               verbose_eval=10,
               categorical_feature=f_names,
               learning_rates=lambda iter: (1 / (1 + decay_rate * iter)) * params['learning_rate'])
Let's discuss the code I share here. d_train is my training set, and d_dev is my validation set (I have a separate test set). evals_result will record our multi_error and multi_logloss per iteration as a list. verbose_eval=10 makes LightGBM print the multi_error and multi_logloss of both the training set and the validation set every 10 iterations. If you want to plot multi_error and multi_logloss as a graph:
lgb.plot_metric(evals_result, metric='multi_error')
plt.show()
lgb.plot_metric(evals_result, metric='multi_logloss')
plt.show()
You can find other useful functions in the LightGBM documentation. If you can't find what you need there, try the XGBoost documentation, a simple trick. If there is something missing, please do not hesitate to ask more.
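As for the original question about several self-defined metrics at once: recent LightGBM versions accept a list of callables for feval (and a single callable may also return a list of tuples). A minimal self-contained sketch, using a toy regression setup rather than the multiclass one above, with made-up metric names:

import numpy as np
import lightgbm as lgb

def my_mae(preds, eval_data):
    # each self-defined metric returns (name, value, is_higher_better)
    y = eval_data.get_label()
    return 'my_mae', float(np.mean(np.abs(preds - y))), False

def my_mse(preds, eval_data):
    y = eval_data.get_label()
    return 'my_mse', float(np.mean((preds - y) ** 2)), False

X = np.random.rand(200, 5)
y = np.random.rand(200)
dtrain = lgb.Dataset(X, label=y)
booster = lgb.train({'objective': 'regression', 'metric': 'None'},  # 'None' disables built-in metrics
                    dtrain,
                    valid_sets=[dtrain],
                    feval=[my_mae, my_mse])   # list of self-defined metrics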

is it possible to get exactly the same results from tensorflow mfcc and librosa mfcc?

I'm trying to make TensorFlow's MFCC give me the same results as librosa's MFCC in Python. I have tried to match all the default parameters that librosa uses in my TensorFlow code, but I still get a different result. This is the TensorFlow code that I used:
sample_rate = 16000  # moved above its first use in decode_wav
waveform = contrib_audio.decode_wav(
    audio_binary,
    desired_channels=1,
    desired_samples=sample_rate,
    name='decoded_sample_data')
transwav = tf.transpose(waveform[0])
stfts = tf.contrib.signal.stft(transwav,
                               frame_length=2048,
                               frame_step=512,
                               fft_length=2048,
                               window_fn=functools.partial(tf.contrib.signal.hann_window,
                                                           periodic=False),
                               pad_end=True)
spectrograms = tf.abs(stfts)
num_spectrogram_bins = stfts.shape[-1].value
lower_edge_hertz, upper_edge_hertz, num_mel_bins = 0.0, 8000.0, 128
linear_to_mel_weight_matrix = tf.contrib.signal.linear_to_mel_weight_matrix(
    num_mel_bins, num_spectrogram_bins, sample_rate, lower_edge_hertz,
    upper_edge_hertz)
mel_spectrograms = tf.tensordot(
    spectrograms,
    linear_to_mel_weight_matrix, 1)
mel_spectrograms.set_shape(spectrograms.shape[:-1].concatenate(
    linear_to_mel_weight_matrix.shape[-1:]))
log_mel_spectrograms = tf.log(mel_spectrograms + 1e-6)
mfccs = tf.contrib.signal.mfccs_from_log_mel_spectrograms(
    log_mel_spectrograms)[..., :20]
the equivalent in librosa:
libr_mfcc = librosa.feature.mfcc(wav, 16000)
[Graphs of the two results were attached here as images; the outputs clearly differ.]
I'm the author of tf.signal. Sorry for not seeing this post sooner, but you can get librosa and tf.signal.stft to match if you center-pad the signal before passing it to tf.signal.stft. See this GitHub issue for more details.
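A sketch of that center-padding idea, using the parameter values from the question (librosa's center=True reflect-pads n_fft // 2 samples on each side before the STFT):

n_fft = 2048
# reflect-pad the (1, samples) tensor the way librosa does with center=True
padded = tf.pad(transwav, [[0, 0], [n_fft // 2, n_fft // 2]], mode='REFLECT')
stfts = tf.contrib.signal.stft(padded, frame_length=n_fft, frame_step=512,
                               fft_length=n_fft, pad_end=False)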
I spent a whole day trying to make them match. Even rryan's solution (center=False in librosa) didn't work for me, but I finally found out that the TF and librosa STFTs match only when win_length == n_fft in librosa and frame_length == fft_length in TF. That's why rryan's colab example works; if you set frame_length != fft_length, the amplitudes are very different (although visually, after plotting, the patterns look similar). A typical example: if you choose some win_length/frame_length and then want to set n_fft/fft_length to the smallest power of 2 greater than win_length/frame_length, the results will differ. So you need to stick with the inefficient FFT given by your window size. I don't know why it is so, but that's how it is; hopefully this will be helpful for someone.
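To make the claim concrete, a small sketch of the two configurations; wav and wav_tensor are assumed to hold the same signal as a NumPy array and a TF tensor respectively, and the sizes are arbitrary:

# reportedly matches: win_length == n_fft (librosa) and frame_length == fft_length (TF)
ok_rosa = librosa.stft(wav, n_fft=1024, hop_length=512, win_length=1024, center=False)
ok_tf = tf.contrib.signal.stft(wav_tensor, frame_length=1024, frame_step=512, fft_length=1024)
# reportedly diverges: window shorter than the FFT size
bad_rosa = librosa.stft(wav, n_fft=1024, hop_length=512, win_length=600, center=False)
bad_tf = tf.contrib.signal.stft(wav_tensor, frame_length=600, frame_step=512, fft_length=1024)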
The output of contrib_audio.decode_wav should be a DecodeWav with { audio, sample_rate }, and the audio shape is (sample_rate, 1), so what is the purpose of taking the first item of waveform and transposing it?
transwav = tf.transpose(waveform[0])
There is no straightforward way, since librosa's stft uses center=True, which does not comply with tf's stft.
Had it been center=False, the tf/librosa stft would give near enough results; see the colab snippet.
But even so, trying to port the librosa code into tf is a big headache. Here is what I started and gave up on. Near, but not near enough.
def pow2db_tf(X):
    # power-to-dB conversion mirroring librosa.power_to_db
    amin = 1e-10
    top_db = 80.0
    ref_value = 1.0
    log10 = 2.302585092994046
    log_spec = (10.0 / log10) * tf.log(tf.maximum(amin, X))
    log_spec -= (10.0 / log10) * tf.log(tf.maximum(amin, ref_value))
    pow2db = tf.maximum(log_spec, tf.reduce_max(log_spec) - top_db)
    return pow2db

def librosa_feature_like_tf(x, sr=16000, n_fft=2048, n_mfcc=20, hop_length=512):
    # hop_length added as a parameter; it was undefined in the original snippet
    mel_basis = librosa.filters.mel(sr, n_fft).astype(np.float32)
    mel_basis = mel_basis.reshape(1, int(n_fft / 2 + 1), -1)
    tf_stft = tf.contrib.signal.stft(x, frame_length=n_fft, frame_step=hop_length, fft_length=n_fft)
    print("tf_stft", tf_stft.shape)
    tf_S = tf.matmul(tf.abs(tf_stft), mel_basis)
    print("tf_S", tf_S.shape)
    tfdct = tf.spectral.dct(pow2db_tf(tf_S), norm='ortho')
    print("tfdct before cut", tfdct.shape)
    tfdct = tfdct[:, :, :n_mfcc]
    print("tfdct after cut", tfdct.shape)
    #tfdct = tf.transpose(tfdct, [0, 2, 1]); print("tfdct after transpose", tfdct.shape)
    return tfdct
x = tf.placeholder(tf.float32, shape=[None, 16000], name ='x')
tf_feature = librosa_feature_like_tf(x)
print("tf_feature", tf_feature.shape)
mfcc_rosa = librosa.feature.mfcc(wav, sr).T
print("mfcc_rosa", mfcc_rosa.shape)
For anyone still looking for this: I had a similar problem some time ago, matching librosa's mel filterbanks/mel spectrogram to a TensorFlow implementation. The solution was to use a different windowing approach for the spectrogram and to use librosa's mel matrix as a constant tensor. See here and here.

tensorflow workers start before the main thread could initialize variables

I am trying to use tf.train.batch to enqueue images in multiple threads. When the number of threads is 1, the code works fine. But when I set a higher number of threads I receive an error:
Failed precondition: Attempting to use uninitialized value Variable
[[Node: Variable/read = Identity[T=DT_INT32, _class=["loc:@Variable"], _device="/job:localhost/replica:0/task:0/cpu:0"](Variable)]]
The main thread has to run for some time (under one second) to index the database of folders and put it into a tensor.
I tried to run sess.run([some_image]) before the tf.train.batch loop. In that case the workers fail in the background first with the same error, and after that I receive my images.
I tried to use time.sleep(), but it does not seem to be possible to delay the workers.
I tried adding a dependency to the batch:
g = tf.get_default_graph()
with g.control_dependencies([init_one, init_two]):
    example_batch = tf.train.batch([my_image])
where init_one and init_two are tf.initialize_all_variables() and tf.initialize_local_variables()
the most relevant issue I could find is at: https://github.com/openai/universe-starter-agent/issues/44
Is there a way I could synchronize the worker threads with the main thread so that they don't race ahead and die out?
A similar and easy-to-reproduce error with variable initialization happens when the epoch counter is set to anything that is not None. Are there any potential solutions? I've added the code needed to reproduce the error below:
def index_the_database(database_path):
    """indexes av4 database and returns two lists of filesystem paths: ligand files and protein files"""
    ligand_file_list = []
    receptor_file_list = []
    for ligand_file in glob(os.path.join(database_path, "*_ligand.av4")):
        receptor_file = "/".join(ligand_file.split("/")[:-1]) + "/" + ligand_file.split("/")[-1][:4] + '.av4'
        if os.path.exists(receptor_file):
            ligand_file_list.append(ligand_file)
            receptor_file_list.append(receptor_file)
    index_list = range(len(ligand_file_list))
    return index_list, ligand_file_list, receptor_file_list

index_list, ligand_file_list, receptor_file_list = index_the_database(database_path)
ligand_files = tf.convert_to_tensor(ligand_file_list, dtype=tf.string)
receptor_files = tf.convert_to_tensor(receptor_file_list, dtype=tf.string)
filename_queue = tf.train.slice_input_producer([ligand_files, receptor_files], num_epochs=10, shuffle=True)
serialized_ligand = tf.read_file(filename_queue[0])
serialized_receptor = tf.read_file(filename_queue[1])
image_one = tf.reduce_sum(tf.exp(tf.decode_raw(serialized_receptor, tf.float32)))
image_batch = tf.train.batch([image_one], 100, num_threads=100)
init_two = tf.initialize_all_variables()
init_one = tf.initialize_local_variables()
sess = tf.Session()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
sess.run([init_one])
sess.run([init_two])
while True:
    print "next"
    sess.run([image_batch])
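The usual fix for this class of error, sketched under the assumption that the race is between the queue-runner threads and the initializers, is simply to run the init ops before starting the queue runners:

sess = tf.Session()
# initialize first, so the queue-runner threads never see uninitialized variables
sess.run([tf.initialize_all_variables(), tf.initialize_local_variables()])
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)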
