Optimizer parameters missing in PyTorch

I have two nets sharing one optimizer with different learning rates. Simple code is shown below:
optim = torch.optim.Adam([
    {'params': A.parameters(), 'lr': args.A},
    {'params': B.parameters(), 'lr': args.B}])
Is this right? I ask because when I check the parameters in the optimizer (using the code below), I find only 2 parameters.
for p in optim.param_groups:
    outputs = ''
    for k, v in p.items():
        if k == 'params':
            outputs += (k + ': ' + str(v[0].shape).ljust(30) + ' ')
        else:
            outputs += (k + ': ' + str(v).ljust(10) + ' ')
    print(outputs)
Only 2 parameters are printed:
params: torch.Size([16, 1, 80]) lr: 1e-05 betas: (0.9, 0.999) eps: 1e-08 weight_decay: 0 amsgrad: False
params: torch.Size([30, 10]) lr: 1e-05 betas: (0.9, 0.999) eps: 1e-08 weight_decay: 0 amsgrad: False
Actually, the two nets have more than 100 parameters. I thought all parameters would be printed. Why is this happening? Thank you!

You only print the first tensor of each param group:
if k == 'params':
    outputs += (k + ': ' + str(v[0].shape).ljust(30) + ' ')  # only v[0] is printed!
Instead, loop over and print all the parameters (also note that string comparison should use ==, not is):
if k == 'params':
    outputs += (k + ': ')
    for vp in v:
        outputs += (str(vp.shape).ljust(30) + ' ')
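If you just want to confirm that every parameter was registered, a minimal sketch (reusing A, B, and optim from the question) is to count the tensors and elements per group:
for i, group in enumerate(optim.param_groups):
    # each group carries its own hyperparameters plus a list of parameter tensors
    print('group', i, 'lr =', group['lr'], 'num tensors =', len(group['params']))
# the group totals should match the models themselves
print(sum(p.numel() for p in A.parameters()), sum(p.numel() for p in B.parameters()))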

Related

Keras RNN univariate multi-step-ahead, t+2 has better performance than t+1, input/timestep/RNN structure problem

I have a problem when using an RNN to predict multiple steps ahead.
The code is 'working', but the output does not make sense: t+2 is a lot more accurate than t+1, and the same goes for t+3. It is very counterintuitive that the one-step-ahead output should be significantly less accurate.
The data setup is as follows:
We want to predict the total sales (across multiple platforms) for a given hour. The total sales data arrives with a lag, so we do not have it continuously.
The sales on the internal platform are real-time, so there is no delay on this data.
Lastly, we have an external forecast; the forecast is static and not revised very often, but we have it far into the future.
The forecasting problem is: we want to predict the next 4 hours of total sales. However, because the total sales data is delayed, we already know what our internal sales are for the first hour, and we also have the external forecast for all 4 hours. How do I incorporate this into my model? Is my current setup the right method for this?
The input variables have shape (Samples, TimeSteps, Features), where X_array[:,0,:] corresponds to the observations that come first in the sequence (the oldest part), and similarly for Y_train --> Y_train[TargetNames].columns = ['t + 1', 't + 2', 't + 3', 't + 4']
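As a sanity check on that layout, here is a toy sketch (hypothetical numbers, independent of the code below) showing how one window and its multi-step targets line up:
import numpy as np

series = np.arange(10)  # toy series with TimeSteps=3 and HORIZON=2
X = np.stack([series[t - 3:t] for t in range(3, len(series) - 1)])  # (samples, timesteps)
Y = np.stack([series[t:t + 2] for t in range(3, len(series) - 1)])  # columns: t + 1, t + 2
print(X[0], Y[0])  # [0 1 2] [3 4] -> oldest observation first, targets follow the window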
The code below is a simulation of the problem; it reproduces the issue in that the second element of AE/MAPE comes out lower than the first element --> t+2 has a lower error than t+1:
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.preprocessing import StandardScaler, MinMaxScaler
import math
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
import datetime as dt
pd.options.mode.chained_assignment = None # default='warn'
Make data
df = pd.date_range('2021-01-01', '2021-12-01', freq='H')
df = pd.DataFrame(df[0:len(df) - 1], columns=['DateTime'])
df['TotalSales'] = 0
np.random.seed(1)
for i in df.index:
    if i == df.index[0]:
        df['TotalSales'].iloc[i] = 1
    else:
        x = df['TotalSales'].iloc[i - 1] + np.random.normal(0, 1, 1)
        if x < 0:
            df['TotalSales'].iloc[i] = 1 + 0.1 * math.exp(x)
        else:
            df['TotalSales'].iloc[i] = 1 + 0.9 * x
df['InternalSales'] = 0.2 * df['TotalSales'] + np.random.normal(0, 0.2, len(df))
df['ExternalForecast'] = df['TotalSales'] + np.random.normal(0, 2, len(df))
df['ExternalForecast'][df['ExternalForecast']<0] = 0.1
df['InternalSales'].iloc[len(df)-3:] = np.nan # We do not know these observations
df['TotalSales'].iloc[len(df)-4:] = np.nan # We do not know these observations
df.set_index('DateTime', inplace=True)
df.tail()
Align data
df['InternalSales_Lead1'] = df['InternalSales'].shift(-1)
df['ExternalForecast_Lead4'] = df['ExternalForecast'].shift(-4)
pd.set_option('display.max_columns', 5)
df.tail()
Setting
valid_start = '2021-10-01'
test_start = '2021-11-01'
Gran = 60 # minutes
Names = ['InternalSales_Lead1', 'ExternalForecast_Lead4']
Target = 'TotalSales'
AlternativeForecast = 'ExternalForecast'
TimeSteps = 24 # hours
HORIZON = 4 # step ahead
X_array = df.copy()
X_array = X_array[Names]
df.reset_index(inplace=True)
Data = df[df['DateTime'].dt.date.astype(str) < test_start]
scaler = StandardScaler().fit(Data[Names])
yScaler = MinMaxScaler().fit(np.array(Data[Target]).reshape(-1, 1))
df['Scaled_' + Target] = yScaler.transform(np.array(df[Target]).reshape(-1, 1))
X_array = pd.DataFrame(scaler.transform(X_array), index=X_array.index,columns=X_array.columns)
def LSTM_structure(Y, X, timestep, horizon, TargetName):
    if TargetName is None:
        print('TargetName must be specified')
    Array_X = np.zeros(((len(X) - timestep + 1), timestep, len(X.columns)))
    for variable in range(0, len(X.columns)):
        col = X.columns[variable]
        for t in range(timestep, len(X) + 1):
            # Array_X[t - timestep, :, variable] = np.array(X[col].iloc[(t - timestep):t]).T
            Array_X[t - timestep, :, variable] = X[col].iloc[(t - timestep):t].values
    if horizon == 1:
        Y_LSTM = Y[(timestep - 1):]
        Y_LSTM['t' + str(horizon)] = Y_LSTM[TargetName]
    else:
        Y_LSTM = Y[(timestep - 1):]
        for t in range(1, horizon + 1):
            Y_LSTM['t + ' + str(t)] = Y_LSTM[TargetName].shift(-(t - 1))
    return Y_LSTM, Array_X
Y_total, X_array = LSTM_structure(Y=df[['DateTime', Target, 'Scaled_' + Target, AlternativeForecast]], X=X_array, timestep=TimeSteps, horizon=HORIZON, TargetName='Scaled_' + Target)
# X_array.shape = (7993, 24, 2)
Y_total.reset_index(drop=True, inplace=True)
Y_train = Y_total[Y_total['DateTime'].dt.date.astype(str) < valid_start]
X_train_scale = X_array[Y_train.index,:,:]
Y_Val = Y_total[(Y_total['DateTime'].dt.date.astype(str) >= valid_start) & (Y_total['DateTime'].dt.date.astype(str) < test_start)]
X_val_scale = X_array[Y_Val.index,:,:]
Y_test = Y_total[Y_total['DateTime'].dt.date.astype(str) >= test_start]
X_test_scale = X_array[Y_test.index,:,:]
Model
TargetNames = Y_total.filter(like='t + ').columns
LATENT_DIM = 5
BATCH_SIZE = 32
EPOCHS = 10
try:
    del model
except Exception:
    pass
model = keras.Sequential()
model.add(keras.layers.GRU(LATENT_DIM, input_shape=(TimeSteps, X_train_scale.shape[2])))
model.add(keras.layers.RepeatVector(HORIZON))
model.add(keras.layers.GRU(LATENT_DIM, return_sequences=True))
model.add(keras.layers.TimeDistributed(keras.layers.Dense(1)))
model.add(keras.layers.Flatten())
model.compile(optimizer='SGD', loss='mse')
model.summary()
earlystop = EarlyStopping(monitor='val_loss', min_delta=0, patience=3, restore_best_weights=True)
i = 1
np.random.seed(i)
tf.random.set_seed(i)
hist = model.fit(X_train_scale,
                 Y_train[TargetNames],
                 batch_size=BATCH_SIZE,
                 epochs=EPOCHS,
                 validation_data=(X_val_scale, Y_Val[TargetNames]),
                 callbacks=[earlystop],
                 verbose=1)
y_hat_scaled = model.predict(X_test_scale)
for i in range(1, HORIZON + 1):
    Y_test['Predict_t + ' + str(i)] = yScaler.inverse_transform(np.array(y_hat_scaled[:, i - 1]).reshape(-1, 1))
Make format correct
for i in range(1, HORIZON + 1):
    if i == 1:
        Performance = Y_test[['DateTime', Target, AlternativeForecast, 'Predict_t + ' + str(i)]]
    else:
        Temp = Y_test[['DateTime', 'Predict_t + ' + str(i)]]
        Temp['DateTime'] = Temp['DateTime'] + dt.timedelta(minutes=Gran * (i - 1))
        Performance = pd.merge(Performance, Temp[['DateTime', 'Predict_t + ' + str(i)]], how='left', on='DateTime')
Plot
from matplotlib import pyplot as plt
plt.plot(Performance['DateTime'], Performance[Target], label=Target)
for i in range(1, HORIZON + 1):
    plt.plot(Performance['DateTime'], Performance['Predict_t + ' + str(i)], label='Predict_t + ' + str(i))
plt.title('Model Performance')
plt.ylabel('MW')
plt.xlabel('Time')
plt.legend()
plt.show()
Performance
for i in range(1, HORIZON + 1):
    ae = (Performance['Predict_t + ' + str(i)] - Performance[Target]).abs().mean()
    mape = ((Performance['Predict_t + ' + str(i)] - Performance[Target]).abs() / Performance[Target]).mean() * 100
    if i == 1:
        AE = ae
        MAPE = round(mape, 2)
    else:
        AE = np.append(AE, ae)
        MAPE = np.append(MAPE, round(mape, 2))
# Alternative forecast
ae = (Performance[AlternativeForecast] - Performance[Target]).abs().mean()
mape = ((Performance[AlternativeForecast] - Performance[Target]).abs() / Performance[Target]).mean() * 100
AE = np.append(AE, ae)
MAPE = np.append(MAPE, round(mape, 2))
AE
MAPE
I hope one of you has time to help me with this problem of mine :-)

Python 3: encoding Russian text

I have a dataset with Russian text, which looks like this:
I am trying to pre-process this dataset and split it into train, dev, and test datasets using the following code:
# coding=utf-8
import os
import argparse
import xml.etree.ElementTree as ET
import random
import math
from collections import Counter
from utils import semeval2014term_to_aspectsentiment_hr
from copy import copy, deepcopy
parser = argparse.ArgumentParser(description='Generate finetuning corpus for restaurants.')
parser.add_argument('--noconfl',
                    action='store_true',
                    default=False,
                    help='Remove conflicting sentiments from labels')
parser.add_argument('--istrain',
                    action='store_true',
                    default=False,
                    help='If this is a training set, we split off 10% and output train_full, train_split, dev. Default is a test set, creating no split')
parser.add_argument("--files",
                    type=str,
                    nargs='+',
                    action="store",
                    help="File that contains the data used for training. Multiple paths will mix the datasets.")
parser.add_argument("--output_dir",
                    type=str,
                    action="store",
                    default="data/transformed/untitled",
                    help="output dir of the dataset(s)")
parser.add_argument("--upsample",
                    type=str,
                    action="store",
                    default=None,
                    help="please add a string with 3 numbers like '0.5 0.3 0.2' representing relative numbers of 'POS NEG NEU' adding to 1"
                         " which represents the target distribution - only valid in the non-confl case")
parser.add_argument("--seed",
                    type=int,
                    action="store",
                    default=41,
                    help="random seed, affects upsampling and the validation set")
args = parser.parse_args()
# 1. Load The Dataset
# 2. Create Bert-Pair Style Format
# 3. Save Train, Validation and so on
def split_shuffle_array(ratio, array, rseed):
    # split_ratio_restaurant = .076  # for 150 sentences in the conflicting case
    # split_ratio_laptops = .101  # for 150 sentences in the conflicting case
    random.Random(rseed).shuffle(array)
    m = math.floor(ratio * len(array))
    return array[0:m], array[m::]
def create_sentence_pairs(sents, aspect_term_sentiments):
    # create sentence pairs
    all_sentiments = []
    sentence_pairs = []
    labels = []
    for ix, ats in enumerate(aspect_term_sentiments):
        s = sents[ix]
        for k, v in ats:
            all_sentiments.append(v)
            sentence_pairs.append((s, k))
            labels.append(v)
    counts = Counter(all_sentiments)
    return sentence_pairs, labels, counts
def upsample_data(sentence_pairs, labels, target_ratios={'POS': 0.53, 'NEG': 0.21, 'NEU': 0.26}):
    # one question: should we upsample sentence pairs where the sentence only occurs once?
    print('Upsampling data ...')
    # print(sentence_pairs, labels)  # is a list of pairs -> decide which pair to upsample ...
    # 0. compute index subsets for every example
    # 1. compute how many samples to sample ->
    ix_subsets = {
        'POS': [],
        'NEG': [],
        'NEU': []
    }
    ratios_subsets = {
        'POS': 0,
        'NEG': 0,
        'NEU': 0
    }
    examples_to_add = {
        'POS': 0,
        'NEG': 0,
        'NEU': 0
    }
    n = float(len(labels))
    for ix, l in enumerate(labels):
        ix_subsets[l].append(ix)
        ratios_subsets[l] += (1.0 / n)
    t_keys = target_ratios.keys()
    tmp = [math.floor(target_ratios[k] * n) - len(ix_subsets[k]) for k in t_keys]
    class_nothing_to_add = list(t_keys)[tmp.index(min(tmp))]
    print(t_keys)
    print(ratios_subsets)
    print(tmp)
    print(class_nothing_to_add)
    # print(ix_subsets)
    m = len(ix_subsets[class_nothing_to_add]) / target_ratios[class_nothing_to_add]
    total_to_add = m - n
    print(n, math.floor(m))
    examples_to_add = {k: math.floor(target_ratios[k] * m - len(ix_subsets[k])) for k in t_keys}
    print(examples_to_add)  # so we need to add more neutral examples and more positive ones
    # downsampling would instead zero out the over-represented (negative) class
    # now select all the indices, with replacement because it can be more than double
    new_samples = []
    for k in t_keys:
        new_samples.extend(random.Random(args.seed).choices(ix_subsets[k], k=examples_to_add[k]))
    print(len(new_samples))
    # now add all new samples to the dataset and shuffle it
    new_sentence_pairs = copy(sentence_pairs)
    new_labels = labels.copy()
    for ix in new_samples:
        new_sentence_pairs.append(copy(sentence_pairs[ix]))
        new_labels.append(labels[ix])
    random.Random(args.seed).shuffle(new_sentence_pairs)
    random.Random(args.seed).shuffle(new_labels)
    print(len(set(new_sentence_pairs)))
    print(len(set(sentence_pairs)))
    return new_sentence_pairs, new_labels
def export_dataset_to_xml(fn, sentence_pairs, labels):
    # export in SemEval 2014 format - incomplete, but enough for loading with existing dataloaders for ATSC
    sentences_el = ET.Element('sentences')
    sentimap_reverse = {
        'POS': 'positive',
        'NEU': 'neutral',
        'NEG': 'negative',
        'CONF': 'conflict'
    }
    for ix, (sentence, aspectterm) in enumerate(sentence_pairs):
        # print(sentence)
        sentiment = labels[ix]
        sentence_el = ET.SubElement(sentences_el, 'sentence')
        sentence_el.set('id', str(ix))
        text = ET.SubElement(sentence_el, 'text')
        text.text = str(sentence).strip()
        aspect_terms_el = ET.SubElement(sentence_el, 'aspectTerms')
        aspect_term_el = ET.SubElement(aspect_terms_el, 'aspectTerm')
        aspect_term_el.set('term', aspectterm)
        aspect_term_el.set('polarity', sentimap_reverse[sentiment])
        aspect_term_el.set('from', str('0'))
        aspect_term_el.set('to', str('0'))

    def indent(elem, level=0):
        i = "\n" + level * " "
        j = "\n" + (level - 1) * " "
        if len(elem):
            if not elem.text or not elem.text.strip():
                elem.text = i + " "
            if not elem.tail or not elem.tail.strip():
                elem.tail = i
            for subelem in elem:
                indent(subelem, level + 1)
            if not elem.tail or not elem.tail.strip():
                elem.tail = j
        else:
            if level and (not elem.tail or not elem.tail.strip()):
                elem.tail = j
        return elem

    indent(sentences_el)
    # mydata = ET.dump(sentences_el)
    mydata = ET.tostring(sentences_el)
    with open(fn, "wb") as f:
        # f.write('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>')
        f.write(mydata)
def save_dataset_to_tsv(fn, data):
    pass
sentence_pairs_train_mixed = []
sentence_pairs_trainsplit_mixed = []
sentence_pairs_dev_mixed = []
sentence_pairs_test_mixed = []
labels_train_mixed = []
labels_trainsplit_mixed = []
labels_dev_mixed = []
labels_test_mixed = []
for fn in args.files:
    print(args.output_dir)
    if not os.path.exists(args.output_dir):
        os.makedirs(args.output_dir)
    print(fn)
    sents_train, ats_train, idx2labels = semeval2014term_to_aspectsentiment_hr(fn, remove_conflicting=args.noconfl)
    sentence_pairs_train, labels_train, counts_train = create_sentence_pairs(sents_train, ats_train)
    if args.istrain:
        sents_dev, sents_trainsplit = split_shuffle_array(.1, sents_train, 41)
        ats_dev, ats_trainsplit = split_shuffle_array(.1, ats_train, 41)
        sentence_pairs_dev, labels_dev, counts_dev = create_sentence_pairs(sents_dev, ats_dev)
        sentence_pairs_trainsplit, labels_trainsplit, counts_trainsplit = create_sentence_pairs(sents_trainsplit, ats_trainsplit)
        print_dataset_stats('Train', sents_train, sentence_pairs_train, counts_train)
        print_dataset_stats('Dev', sents_dev, sentence_pairs_dev, counts_dev)
        print_dataset_stats('TrainSplit', sents_trainsplit, sentence_pairs_trainsplit, counts_trainsplit)
        sentence_pairs_trainsplit_mixed += sentence_pairs_trainsplit
        sentence_pairs_train_mixed += sentence_pairs_train
        sentence_pairs_dev_mixed += sentence_pairs_dev
        labels_trainsplit_mixed += labels_trainsplit
        labels_train_mixed += labels_train
        labels_dev_mixed += labels_dev
        if len(args.files) == 1:
            if args.upsample:
                distro_arr = args.upsample.split(' ')
                pos = float(distro_arr[0])
                neg = float(distro_arr[1])
                neu = float(distro_arr[2])
                assert pos + neg + neu == 1.0, 'upsampling target distribution does not sum to 1'
                target_distro = {'POS': pos, 'NEG': neg, 'NEU': neu}
                print('Target Sampling Distribution for Training Set:', target_distro)
                sentence_pairs_train, labels_train = upsample_data(sentence_pairs_train, labels_train, target_ratios=target_distro)
            export_dataset_to_xml(args.output_dir + '/train.xml', sentence_pairs_train, labels_train)
            export_dataset_to_xml(args.output_dir + '/dev.xml', sentence_pairs_dev, labels_dev)
            export_dataset_to_xml(args.output_dir + '/train_split.xml', sentence_pairs_trainsplit, labels_trainsplit)
    else:
        sentence_pairs_test_mixed += sentence_pairs_train
        labels_test_mixed += labels_train
        print_dataset_stats('Test', sents_train, sentence_pairs_train, counts_train)
        if len(args.files) == 1:
            export_dataset_to_xml(args.output_dir + '/test.xml', sentence_pairs_train, labels_train)
if len(args.files) > 1:
    if args.istrain:
        export_dataset_to_xml(args.output_dir + '/train.xml', sentence_pairs_train_mixed, labels_train_mixed)
        export_dataset_to_xml(args.output_dir + '/dev.xml', sentence_pairs_dev_mixed, labels_dev_mixed)
        export_dataset_to_xml(args.output_dir + '/train_split.xml', sentence_pairs_trainsplit_mixed, labels_trainsplit_mixed)
    else:
        export_dataset_to_xml(args.output_dir + '/test.xml', sentence_pairs_test_mixed, labels_test_mixed)
After running the code above, I get this result:
For English text it works just fine. Could someone help me fix this and get normal text?
By default, ET.tostring() serializes to US-ASCII and escapes non-ASCII characters such as Cyrillic into numeric character references. Pass an explicit encoding instead:
ET.tostring(sentences_el, encoding='UTF-8')
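A minimal self-contained illustration of the difference (with a hypothetical Russian string):
import xml.etree.ElementTree as ET

el = ET.Element('text')
el.text = 'Пример'  # hypothetical sample text

print(ET.tostring(el))                    # b'<text>&#1055;&#1088;&#1080;&#1084;&#1077;&#1088;</text>'
print(ET.tostring(el, encoding='UTF-8'))  # UTF-8 bytes with an XML declaration, Cyrillic preserved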

In fit_generator, the training generator was influenced by the validation generator

I ran into a strange problem with fit_generator:
model.fit_generator(generate_arrays_from_file(trainSizeListImgDic[s], s, batchSize),
                    steps_per_epoch=math.floor(size / batchSize), epochs=20,
                    verbose=2, validation_data=generate_arrays_from_file(testSizeListImgDic[s], s, batchSize),
                    validation_steps=vs,
                    callbacks=[EarlyStoppingByAccVal(monitor='val_acc', value=0.90, verbose=1), checkpointer])
and my generate_arrays_from_file reads:
def generate_arrays_from_file(SizeListImg, img_size, batch):
    size = len(SizeListImg.images)
    dim = re.split('[,()]', img_size)
    dataX = np.zeros((batch, int(dim[1]), int(dim[2]), 1), dtype=np.float32)
    dataY = np.zeros((batch, num_classes), dtype=np.float32)
    loopcount = math.floor(size / batch) - 1
    if loopcount == 0:
        loopcount = 1
    counter = 0
    while True:
        i = random.randint(0, loopcount)
        for ind in range(i * batch, (i + 1) * batch):
            try:
                dataX[counter, :, :, 0] = SizeListImg.images[ind]
            except Exception:
                print('dim=' + str(dim))
                print('error counter=' + str(counter) + " i=" + str(i) + " ind=" + str(ind) + " batch=" + str(batch) + "\n")
                print("SizeListImg.images=" + str(len(SizeListImg.images)))
                print("img0 = " + SizeListImg.images_names[0])
                print("img1 = " + SizeListImg.images_names[1])
                print("img2 = " + SizeListImg.images_names[2])
                print("img3 = " + SizeListImg.images_names[3])
            for j, imgClass in enumerate(imgClasses):
                dataY[counter, j] = (SizeListImg.labels[ind] == imgClass)
            counter += 1
            if counter >= batch:
                yield (dataX, dataY)
                counter = 0
                dataX = np.zeros((batch, int(dim[1]), int(dim[2]), 1), dtype=np.float32)  # note: np.zeros, not tf.zeros((25, 200, 200, 1))
                dataY = np.zeros((batch, num_classes), dtype=np.float32)
During training I found the training image count (139) reduced to the validation image count (22), which led to the wrong index, even though the images indeed came from the training set. However, if I reduce batch from 20 to 10, there is no error.
Is there some conspiracy of fit_generator against me?

Why does TensorFlow 1.1 get slower and slower during training? Is it a memory leak or queue starvation?

I trained an ESPCN in TensorFlow 1.1, and the time cost per patch increases nearly linearly during training. The first 100 epochs take only 4-5 seconds, but by the 70th logging interval (of 100 epochs each) it takes about half a minute. See the training result below:
I've searched for this question on Google and Stack Overflow and tried the solutions below, but they did not seem to work:
1. add tf.reset_default_graph() after every sess.run();
2. add time.sleep(5) to prevent queue starvation.
I know the general idea: reduce the operations in the Session. But how? Does anyone have a solution?
Here's part of my code:
L3, var_w_list, var_b_list = model_train(IN, FLAGS)
cost = tf.reduce_mean(tf.reduce_sum(tf.square(OUT - L3), reduction_indices=0))
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(FLAGS.base_lr, global_step * FLAGS.batch_size, FLAGS.decay_step, 0.96, staircase=True)
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost, global_step=global_step, var_list=var_w_list + var_b_list)
# optimizer = tf.train.MomentumOptimizer(learning_rate, 0.9).minimize(cost, var_list=var_w_list + var_b_list)
cnt = 0
with tf.Session() as sess:
    init_op = tf.initialize_all_variables()
    sess.run(init_op)
    saver = tf.train.Saver()
    ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
    print('\n\n\n =========== All initialization finished, now training begins ===========\n\n\n')
    t_start = time.time()
    t1 = t_start
    for i in range(1, FLAGS.max_Epoch + 1):
        LR_batch, HR_batch = batch.__next__()
        global_step += 1
        [_, cost1] = sess.run([optimizer, cost], feed_dict={IN: LR_batch, OUT: HR_batch})
        # tf.reset_default_graph()
        if i % 100 == 0 or i == 1:
            print_step = i
            print_loss = cost1 / FLAGS.batch_size
            test_LR_batch, test_HR_batch = test_batch.__next__()
            test_SR_batch = test_HR_batch.copy()
            test_SR_batch[:, :, :, 0:3] = sess.run(L3, feed_dict={IN: test_LR_batch[:, :, :, 0:3]})
            # tf.reset_default_graph()
            psnr_tmp = 0.0
            ssim_tmp = 0.0
            for k in range(test_SR_batch.shape[0]):
                com1 = test_SR_batch[k, :, :, 0]
                com2 = test_HR_batch[k, :, :, 0]
                psnr_tmp += get_psnr(com1, com2, FLAGS.HR_size, FLAGS.HR_size)
                ssim_tmp += get_ssim(com1, com2, FLAGS.HR_size, FLAGS.HR_size)
            psnr[cnt] = psnr_tmp / test_SR_batch.shape[0]
            ssim[cnt] = ssim_tmp / test_SR_batch.shape[0]
            ep[cnt] = print_step
            t2 = time.time()
            print_time = t2 - t1
            t1 = t2
            print(("[Epoch] : {0:d} [Current cost] : {1:5.8f} \t [Validation PSNR] : {2:5.8f} \t [Duration time] : {3:10.8f} s \n").format(print_step, print_loss, psnr[cnt], print_time))
            # tf.reset_default_graph()
            cnt += 1
        if i % 1000 == 0:
            L3_test = model_test(IN_TEST, var_w_list, var_b_list, FLAGS)
            output_img = single_HR.copy()
            output_img[:, :, :, 0:3] = sess.run(L3_test, feed_dict={IN_TEST: single_LR[:, :, :, 0:3]})
            tf.reset_default_graph()
            subname = FLAGS.img_save_dir + '/' + str(i) + ".jpg"
            img_gen(output_img[0, :, :, :], subname)
            print(('================= Saving model to {}/model.ckpt ================= \n').format(FLAGS.checkpoint_dir))
            time.sleep(5)
            # saver.save(sess, FLAGS.checkpoint_dir + '/model.ckpt', print_step)
t_tmp = time.time() - t_start
My configuration is: Windows 10 + TensorFlow 1.1 + Python 3.5 + CUDA 8.0 + cuDNN 5.1
================================================================
Besides, I used a pixel-shuffle (PS) layer instead of deconvolution in the last layer. I copied the PS code from others; it is shown below:
def _phase_shift(I, r):
    bsize, a, b, c = I.get_shape().as_list()
    bsize = tf.shape(I)[0]  # handle Dimension(None) type for an undefined batch dim
    X = tf.reshape(I, (bsize, a, b, r, r))
    X = tf.transpose(X, (0, 1, 2, 4, 3))  # bsize, a, b, 1, 1
    X = tf.split(X, a, 1)  # a, [bsize, b, r, r]
    X = tf.concat([tf.squeeze(x, axis=1) for x in X], 2)  # bsize, b, a*r, r
    X = tf.split(X, b, 1)  # b, [bsize, a*r, r]
    X = tf.concat([tf.squeeze(x, axis=1) for x in X], 2)  # bsize, a*r, b*r
    return tf.reshape(X, (bsize, a*r, b*r, 1))

def PS(X, r, color=False):
    if color:
        Xc = tf.split(X, 3, 3)
        X = tf.concat([_phase_shift(x, r) for x in Xc], 3)
    else:
        X = _phase_shift(X, r)
    return X
Here X is the 4-dimensional image tensor, r is the up-scaling factor, and color determines whether the images have 3 channels (YCbCr format) or 1 channel (grayscale format).
Using the layer is very simple, just like tf.nn.relu():
L3_ps = PS(L3, scale, True)
Now I'm wondering whether this layer causes the slowdown, because the program runs well when using a deconvolution layer. Using a deconvolution layer may be a solution, but I have to use the PS layer for some reason.
I suspect this line is causing a memory leak (although without seeing the code, I can't say for certain):
L3_test = model_test(IN_TEST, var_w_list, var_b_list, FLAGS)
L3_test seems to be a tf.Tensor (because you later pass it to sess.run()), so it seems likely that model_test() is adding new nodes to the graph each time it is called (every 1000 steps), which causes more work to be done over time.
The solution is quite simple though: since model_test() does not depend on anything calculated in the training loop, you can move the call outside the training loop, so it is only called once.
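A minimal sketch of that fix, reusing the names from the question (that model_test() has no dependency on the loop is an assumption based on your description):
# build the test graph once, before the Session and the training loop
L3_test = model_test(IN_TEST, var_w_list, var_b_list, FLAGS)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.graph.finalize()  # optional: raises an error if anything later tries to add nodes
    for i in range(1, FLAGS.max_Epoch + 1):
        # ... training step ... (note: avoid Python-level `global_step += 1` here; on a
        # tf.Variable it adds a new op each iteration, and the optimizer already
        # increments global_step for you)
        if i % 1000 == 0:
            output_img[:, :, :, 0:3] = sess.run(L3_test, feed_dict={IN_TEST: single_LR[:, :, :, 0:3]})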

How to modify Keras Siamese Network example?

I've tried to change the code from the Keras example on Siamese networks, but the weird thing is that the accuracy is always 0.5000, regardless of the loss decreasing. My hypothesis for now is that I wrongly modified the create_pairs function when I tried to change the number of classes to 4:
Original:
def create_pairs(x, digit_indices):
    '''Positive and negative pair creation.
    Alternates between positive and negative pairs.
    '''
    pairs = []
    labels = []
    n = min([len(digit_indices[d]) for d in range(10)]) - 1
    for d in range(10):
        for i in range(n):
            z1, z2 = digit_indices[d][i], digit_indices[d][i + 1]
            pairs += [[x[z1], x[z2]]]
            inc = random.randrange(1, 10)
            dn = (d + inc) % 10
            z1, z2 = digit_indices[d][i], digit_indices[dn][i]
            pairs += [[x[z1], x[z2]]]
            labels += [1, 0]
    return np.array(pairs), np.array(labels)
and, in lines 93-97:
digit_indices = [np.where(y_train == i)[0] for i in range(10)]
tr_pairs, tr_y = create_pairs(x_train, digit_indices)
digit_indices = [np.where(y_test == i)[0] for i in range(10)]
te_pairs, te_y = create_pairs(x_test, digit_indices)
This is my code:
def create_pairs(x, digit_indices):
    '''Positive and negative pair creation.
    Alternates between positive and negative pairs.
    '''
    pairs = []
    labels = []
    n = min([len(digit_indices[d]) for d in range(4)]) - 1
    for d in range(4):
        for i in range(n):
            z1, z2 = digit_indices[d][i], digit_indices[d][i + 1]
            pairs += [[x[z1], x[z2]]]
            inc = random.randrange(1, 4)
            dn = (d + inc) % 4
            z1, z2 = digit_indices[d][i], digit_indices[dn][i]
            pairs += [[x[z1], x[z2]]]
            labels += [1, 0]
    return np.array(pairs), np.array(labels)
and, in lines 93-97:
digit_indices = [np.where(y_train == i)[0] for i in range(4)]
tr_pairs, tr_y = create_pairs(x_train, digit_indices)
digit_indices = [np.where(y_test == i)[0] for i in range(4)]
te_pairs, te_y = create_pairs(x_test, digit_indices)
And here's my base_network (the one that uses an RNN, not the conv net I talked about in the comment reply; both give the same result, 50% accuracy):
def create_base_network(embedding_layer):
    seq = Sequential()
    seq.add(embedding_layer)
    seq.add(GRU(512, use_bias=True, dropout=0.5, recurrent_dropout=0.5, return_sequences=True))
    seq.add(GRU(512, use_bias=True, dropout=0.5, recurrent_dropout=0.5))
    seq.add(Dense(512, activation='relu'))
    seq.add(Dropout(0.1))
    seq.add(Dense(512, activation='relu'))
    return seq
The embedding layer is just a simple GloVe matrix. I also add another Dense layer with a sigmoid activation function after the merge.
Anything missing? Or is that not how I should change it? Thanks in advance.
The Siamese example code is wrong and has not yet been fixed. The problem is that the loss function is not symmetric under switching the 0 and 1 labels, but the Keras code assumes that it is.
Change this line
return K.mean(y_true * K.square(y_pred) + (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))
into
return K.mean((1 - y_true) * K.square(y_pred) + y_true * K.square(K.maximum(margin - y_pred, 0)))
and
labels += [1, 0]
into
labels += [0, 1]
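Put together, a sketch of the corrected loss for the relabeled pairs (y_true == 1 now marks a dissimilar pair; margin = 1 is the example's default):
from keras import backend as K

def contrastive_loss(y_true, y_pred, margin=1.0):
    # similar pairs (y_true == 0): penalize any distance;
    # dissimilar pairs (y_true == 1): penalize only distances inside the margin
    return K.mean((1 - y_true) * K.square(y_pred) +
                  y_true * K.square(K.maximum(margin - y_pred, 0)))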
