Can someone explain why these seemingly identical pieces of code yield different results when I use them on test data from the MNIST dataset (http://yann.lecun.com/exdb/mnist/)?
Model 1:
def fit_generative_model1(x,y):
    k = 10  # labels 0,1,...,k-1
    d = (x.shape)[1]  # number of features
    mu = np.zeros((k,d))
    sigma = np.zeros((k,d,d))
    pi = np.zeros(k)
    for label in range(0,k):
        c = 4000
        indices = (y == label)
        mu[label] = np.mean(x[indices,:], axis=0)
        sigma[label] = np.cov(x[indices,:], rowvar=0, bias=1) + c*np.identity(784, dtype=float)  # regularizing the matrix
    return mu, sigma, pi

mu, sigma, pi = fit_generative_model1(train_data, train_labels)
Model 2:
def fit_generative_model2(x,y):
    k = 10  # labels 0,1,...,k-1
    d = (x.shape)[1]  # number of features
    mu = np.zeros((k,d))
    sigma = np.zeros((k,d,d))
    pi = np.zeros(k)
    for label in range(0,k):
        indices = (y == label)
        mu[label] = np.mean(x[indices,:], axis=0)
        sigma[label] = np.cov(x[indices,:], rowvar=0, bias=1) + 4000*np.eye(784)  # Regularized
        pi[label] = float(sum(indices))/float(len(y))
    return mu, sigma, pi

mu, sigma, pi = fit_generative_model2(train_data, train_labels)
Next, making predictions on the test data.
# Compute log Pr(label|image) for each [test image,label] pair.
k = 10
score = np.zeros((len(test_labels),k))
for label in range(0,k):
    rv = multivariate_normal(mean=mu[label], cov=sigma[label])
    for i in range(0,len(test_labels)):
        score[i,label] = np.log(pi[label]) + rv.logpdf(test_data[i,:])
predictions = np.argmax(score, axis=1)
# Finally, tally up score
errors = np.sum(predictions != test_labels)
print("Your model makes " + str(errors) + " errors out of 10000")
print("This is " + str(errors/100) + "% error rate")
#with model1 -??
Your model makes 9020 errors out of 10000
This is 90.2% error rate
#with model2 - all good.
Your model makes 431 errors out of 10000
This is 4.31% error rate
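For what it's worth, the only line model 2 adds is the one that fills in pi, so model 1 returns pi as all zeros; np.log(0) is -inf, every score entry becomes -inf, and np.argmax then returns index 0 for every test image, which is consistent with the ~90% error rate. A minimal sketch of that behavior (plain NumPy, nothing from the original notebook):

import numpy as np

pi = np.zeros(10)            # what fit_generative_model1 returns for pi
print(np.log(pi[0]))         # -inf (with a RuntimeWarning)

row = np.full(10, -np.inf)   # a score row when every log prior is -inf
print(np.argmax(row))        # 0 -- argmax of all-equal values returns the first index,
                             # so every test image gets predicted as the same class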
I am also struggling to make the loop work for finding the best c.
for c in [20, 2000, 4000]:
    k = 10
    score = np.zeros((len(test_labels),k))
    for label in range(0,k):
        rv = multivariate_normal(mean=mu[label], cov=sigma[label])
        for i in range(0,len(test_labels)):
            score[i,label] = np.log(pi[label]) + rv.logpdf(test_data[i,:])
    predictions = np.argmax(score, axis=1)
    errors = np.sum(predictions != test_labels)
    print("Model with "+ str(c) + " has a " + str(errors/100) + " error rate")
Model with 20 has a 4.31 error rate
Model with 2000 has a 4.31 error rate
Model with 4000 has a 4.31 error rate
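One thing to note: the loop body never uses c, because mu, sigma and pi were fit once (with c = 4000) before the loop, so all three iterations score the same model and the error rate cannot change. A minimal sketch of refitting for each c, assuming a variant of fit_generative_model2 that takes c as an argument (fit_generative_model2_c below is a hypothetical helper, not from the original code):

import numpy as np
from scipy.stats import multivariate_normal

def fit_generative_model2_c(x, y, c):
    # Same as fit_generative_model2, but the regularizer c is a parameter.
    k, d = 10, x.shape[1]
    mu, sigma, pi = np.zeros((k, d)), np.zeros((k, d, d)), np.zeros(k)
    for label in range(k):
        indices = (y == label)
        mu[label] = np.mean(x[indices, :], axis=0)
        sigma[label] = np.cov(x[indices, :], rowvar=0, bias=1) + c * np.eye(d)
        pi[label] = float(sum(indices)) / float(len(y))
    return mu, sigma, pi

for c in [20, 2000, 4000]:
    mu, sigma, pi = fit_generative_model2_c(train_data, train_labels, c)  # refit for this c
    score = np.zeros((len(test_labels), 10))
    for label in range(10):
        rv = multivariate_normal(mean=mu[label], cov=sigma[label])
        score[:, label] = np.log(pi[label]) + rv.logpdf(test_data)        # logpdf is vectorized over rows
    errors = np.sum(np.argmax(score, axis=1) != test_labels)
    print("Model with c = " + str(c) + " makes " + str(errors) + " errors (" + str(errors / 100) + "%)")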
Related
I have a problem when using an RNN to predict multiple steps ahead.
The code is 'working', but the output does not make sense: t+2 is a lot more accurate than t+1, and the same goes for t+3; it is very counterintuitive that the one-step-ahead output should be significantly less accurate.
The data setup is as follows:
We want to predict the total sales (across multiple platforms) for a given hour. The total sales data arrives with a lag, so we do not have it continuously.
The sales on the internal platform are real-time, so no delay on this data.
Lastly, we have an external forecast; however, the forecast is static and is not revised very often, but we have it far into the future.
The forecasting problem is that we want to predict the next 4 hours of total sales. However, because the total sales data is delayed, we already know what our internal sales are for the first hour, and we also have the external forecast for all 4 hours. How do I incorporate this into my model? Is my current setup the right method for this?
The input variables have shape (Samples, TimeSteps, Features), where X_array[:,0,:] corresponds to the observations that come first in the sequence (the oldest part), and similarly for Y_train --> Y_train[TargetNames].columns = ['t + 1', 't + 2', 't + 3', 't + 4']
The code below is a simulation of the problem, meaning that the second element of AE/MAPE ends up lower than the first element --> t+2 has a lower error than t+1:
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.preprocessing import StandardScaler, MinMaxScaler
import math
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
import datetime as dt
pd.options.mode.chained_assignment = None # default='warn'
Make data
df = pd.date_range('2021-01-01', '2021-12-01', freq='H')
df = pd.DataFrame(df[0:len(df)-1], columns={'DateTime'})
df['TotalSales'] = 0
np.random.seed(1)
for i in df.index:
    if i == df.index[0]:
        df['TotalSales'].iloc[i] = 1
    else:
        x = df['TotalSales'].iloc[i-1] + np.random.normal(0, 1, 1)
        if x < 0:
            df['TotalSales'].iloc[i] = 1 + 0.1 * math.exp(x)
        else:
            df['TotalSales'].iloc[i] = 1 + 0.9 * x
df['InternalSales'] = 0.2 * df['TotalSales'] + np.random.normal(0, 0.2, len(df))
df['ExternalForecast'] = df['TotalSales'] + np.random.normal(0, 2, len(df))
df['ExternalForecast'][df['ExternalForecast']<0] = 0.1
df['InternalSales'].iloc[len(df)-3:] = np.nan # We do not know these observations
df['TotalSales'].iloc[len(df)-4:] = np.nan # We do not know these observations
df.set_index('DateTime', inplace=True)
df.tail()
Align data
df['InternalSales_Lead1'] = df['InternalSales'].shift(-1)
df['ExternalForecast_Lead2'] = df['ExternalForecast'].shift(-4) # typo df['ExternalForecast_Lead4'] =..
pd.set_option('display.max_columns', 5)
df.tail()
Setting
valid_start = '2021-10-01'
test_start = '2021-11-01'
Gran = 60 # minutes
Names = ['InternalSales_Lead1', 'ExternalForecast_Lead2']
Target = 'TotalSales'
AlternativeForecast = 'ExternalForecast'
TimeSteps = 24 # hours
HORIZON = 4 # step ahead
X_array = df.copy()
X_array = X_array[Names]
df.reset_index(inplace=True)
Data = df[df['DateTime'].dt.date.astype(str) < test_start]
scaler = StandardScaler().fit(Data[Names])
yScaler = MinMaxScaler().fit(np.array(Data[Target]).reshape(-1, 1))
df['Scaled_' + Target] = yScaler.transform(np.array(df[Target]).reshape(-1, 1))
X_array = pd.DataFrame(scaler.transform(X_array), index=X_array.index,columns=X_array.columns)
def LSTM_structure(Y, X, timestep, horizon, TargetName):
    if TargetName is None:
        print('TargetName must be specified')
    Array_X = np.zeros(((len(X) - timestep + 1), timestep, len(X.columns)))
    for variable in range(0, len(X.columns)):
        col = X.columns[variable]
        for t in range(timestep, len(X)+1):
            # Array_X[t - timestep,:,variable] = np.array(X[col].iloc[(t - timestep):t]).T
            Array_X[t - timestep, :, variable] = X[col].iloc[(t - timestep):t].values
    if horizon == 1:
        Y_LSTM = Y[(timestep - 1):]
        Y_LSTM['t'+str(horizon)] = Y_LSTM[TargetName]
    else:
        Y_LSTM = Y[(timestep - 1):]
        for t in range(1, horizon+1):
            Y_LSTM['t + ' + str(t)] = Y_LSTM[TargetName].shift(-(t-1))
    return Y_LSTM, Array_X
Y_total, X_array = LSTM_structure(Y=df[['DateTime', Target, 'Scaled_' + Target, AlternativeForecast]], X=X_array, timestep=TimeSteps, horizon=HORIZON, TargetName='Scaled_' + Target)
# X_array.shape = (7993, 24, 2)
Y_total.reset_index(drop=True, inplace=True)
Y_train = Y_total[Y_total['DateTime'].dt.date.astype(str) < valid_start]
X_train_scale = X_array[Y_train.index,:,:]
Y_Val = Y_total[(Y_total['DateTime'].dt.date.astype(str) >= valid_start) & (Y_total['DateTime'].dt.date.astype(str) < test_start)]
X_val_scale = X_array[Y_Val.index,:,:]
Y_test = Y_total[Y_total['DateTime'].dt.date.astype(str) >= test_start]
X_test_scale = X_array[Y_test.index,:,:]
Model
TargetNames = Y_total.filter(like='t + ').columns
LATENT_DIM = 5
BATCH_SIZE = 32
EPOCHS = 10
try:
    del model
except Exception:
    pass
model = keras.Sequential()
model.add(keras.layers.GRU(LATENT_DIM, input_shape=(TimeSteps, X_train_scale.shape[2])))
model.add(keras.layers.RepeatVector(HORIZON))
model.add(keras.layers.GRU(LATENT_DIM, return_sequences=True))
model.add(keras.layers.TimeDistributed(keras.layers.Dense(1)))
model.add(keras.layers.Flatten())
model.compile(optimizer='SGD', loss='mse')
model.summary()
earlystop = EarlyStopping(monitor='val_loss', min_delta=0, patience=3, restore_best_weights=True)
i = 1
np.random.seed(i)
tf.random.set_seed(i)
hist = model.fit(X_train_scale,
Y_train[TargetNames],
batch_size=BATCH_SIZE,
epochs=EPOCHS,
validation_data=(X_val_scale, Y_Val[TargetNames]),
callbacks=[earlystop],
verbose=1)
y_hat_scaled = model.predict(X_test_scale)
for i in range(1, HORIZON+1):
    Y_test['Predict_t + ' + str(i)] = yScaler.inverse_transform(np.array(y_hat_scaled[:,i-1]).reshape(-1, 1))
Make format correct
for i in range(1, HORIZON + 1):
    if i == 1:
        Performance = Y_test[['DateTime', Target, AlternativeForecast, 'Predict_t + '+ str(i)]]
    else:
        Temp = Y_test[['DateTime', 'Predict_t + '+ str(i)]]
        Temp['DateTime'] = Temp['DateTime'] + dt.timedelta(minutes=Gran * (i-1))
        Performance = pd.merge(Performance, Temp[['DateTime', 'Predict_t + '+ str(i)]], how='left', on='DateTime')
Plot
from matplotlib import pyplot as plt
plt.plot(Performance['DateTime'], Performance[Target], label=Target)
for i in range(1, HORIZON + 1):
    plt.plot(Performance['DateTime'], Performance['Predict_t + '+ str(i)], label='Predict_t + '+ str(i))
plt.title('Model Performance')
plt.ylabel('MW')
plt.xlabel('Time')
plt.legend()
plt.show()
Performance
for i in range(1, HORIZON + 1):
    ae = (Performance['Predict_t + '+ str(i)] - Performance[Target]).abs().mean()
    mape = ((Performance['Predict_t + '+ str(i)] - Performance[Target]).abs()/Performance[Target]).mean() * 100
    if i == 1:
        AE = ae
        MAPE = round(mape, 2)
    else:
        AE = np.append(AE, ae)
        MAPE = np.append(MAPE, round(mape, 2))
# Alternative forecast
ae = (Performance[AlternativeForecast] - Performance[Target]).abs().mean()
mape = ((Performance[AlternativeForecast] - Performance[Target]).abs()/Performance[Target]).mean() * 100
AE= np.append(AE, ae)
MAPE = np.append(MAPE, round(mape, 2))
AE
MAPE
I hope one of you has time to help me with this problem of mine :-)
Python code:
I have used the Python code below. Here, the model is trained using the Logistic Regression algorithm on the wine dataset. The problem is that the weights are not getting updated, and I don't understand where the problem is.
from sklearn import datasets
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
dataset = datasets.load_wine()
x = dataset.data
y = dataset.target
y = y.reshape(178,1)
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.15,shuffle=True)
print(x_train.shape)
class log_reg():
    def __init__(self):
        pass

    def sigmoid(self,x):
        return 1 / (1 + np.exp(-x))

    def train(self,x,y,w1,w2,alpha,iterations):
        cost_history = [0] * iterations
        Y_train = np.zeros([y.shape[0],3])
        for i in range(Y_train.shape[0]):
            for j in range(Y_train.shape[1]):
                if(y[i] == j):
                    Y_train[i,j] = 1
        for iteration in range(iterations):
            z1 = x.dot(w1)
            a1 = self.sigmoid(z1)
            z2 = a1.dot(w2)
            a2 = self.sigmoid(z2)
            sig_sum = np.sum(np.exp(a2),axis=1)
            sig_sum = sig_sum.reshape(a2.shape[0],1)
            op = np.exp(a2) / sig_sum
            loss = (Y_train * np.log(op))
            dl = (op-Y_train)
            dz1 = ((dl*(self.sigmoid(z2))*(1-self.sigmoid(z2))).dot(w2.T))*(self.sigmoid(z1))*(1-self.sigmoid(z1))
            dz2 = (dl * (self.sigmoid(z2))*(1-self.sigmoid(z2)))
            dw1 = x.T.dot(dz1)
            dw2 = a1.T.dot(dz2)
            w1 += alpha * dw1
            w2 += alpha * dw2
            cost_history[iteration] = (np.sum(loss)/len(loss))
        return w1,w2,cost_history

    def predict(self,x,y,w1,w2):
        z1 = x.dot(w1)
        a1 = self.sigmoid(z1)
        z2 = a1.dot(w2)
        a2 = self.sigmoid(z2)
        sig_sum = np.sum(np.exp(a2),axis=1)
        sig_sum = sig_sum.reshape(a2.shape[0],1)
        op = np.exp(a2) / sig_sum
        y_preds = np.argmax(op,axis=1)
        acc = self.accuracy(y_preds,y)
        return y_preds,acc

    def accuracy(self,y_preds,y):
        y_preds = y_preds.reshape(len(y_preds),1)
        correct = (y_preds == y)
        accuracy = (np.sum(correct) / len(y)) * 100
        return (accuracy)
if __name__ == "__main__":
    network = log_reg()
    w1 = np.random.randn(14,4) * 0.01
    w2 = np.random.randn(4,3) * 0.01
    X_train = np.ones([x_train.shape[0],x_train.shape[1]+1])
    X_train[:,:-1] = x_train
    X_test = np.ones([x_test.shape[0],x_test.shape[1]+1])
    X_test[:,:-1] = x_test
    new_w1,new_w2,cost = network.train(X_train,y_train,w1,w2,0.0045,10000)
    y_preds,accuracy = network.predict(X_test,y_test,new_w1,new_w2)
    print(y_preds,accuracy)
In the above code, the parameters are as follows:
x -- training set,
y -- target (output),
w1 -- weights for the first layer,
w2 -- weights for the second layer.
I used logistic regression with 2 hidden layers.
I am trying to train on the wine dataset from sklearn. I don't know where the problem is, but the weights are not updating. Any help would be appreciated.
Your weights are updating, but I think you can't see them changing because you are printing them after execution. Python passes NumPy arrays by reference, so when you passed w1 in, its values were changed in place as well, and new_w1 and w1 end up being the same array.
Take this example
import numpy as np
x=np.array([1,2,3,4])
def change(x):
    x += 3
    return x
print(x)
change(x)
print(x)
If you look at the output, it comes out as
[1 2 3 4]
[4 5 6 7]
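As a side note (general NumPy usage, not part of the original answer), passing a copy keeps the caller's array unchanged:

import numpy as np

x = np.array([1, 2, 3, 4])

def change(x):
    x += 3            # in-place update mutates the array that was passed in
    return x

y = change(x.copy())  # work on a copy, so x keeps its original values
print(x)              # [1 2 3 4]
print(y)              # [4 5 6 7]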
I recommend that you add a bias and fix your accuracy function, as I get an accuracy of 1000.
When I run the code, the w1 and w2 values are indeed changing.
The only thing I changed was the main code, using the original dataset (without the added bias column); please do the same and tell me if your weights are still not updating:
if __name__ == "__main__":
    network = log_reg()
    w1 = np.random.randn(13,4) * 0.01
    w2 = np.random.randn(4,3) * 0.01
    print(w1)
    print(" ")
    print(w2)
    print(" ")
    new_w1,new_w2,cost = network.train(x_train,y_train,w1,w2,0.0045,10000)
    print(w1)
    print(" ")
    print(w2)
    print(" ")
    y_preds,accuracy = network.predict(x_test,y_test,new_w1,new_w2)
    print(y_preds,accuracy)
Edit:
I made the following changes and got an accuracy of about 80%:
Changed epochs to 2300.
Changed the learning rate to 0.000006.
Changed np.random.rand to np.random.randn.
Epochs and learning rate I can understand, but I have no idea what's up with rand and randn.
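For reference, and this is general NumPy behavior rather than something established in this thread: np.random.rand draws from a uniform distribution on [0, 1), so every initial weight is positive, while np.random.randn draws from a standard normal, so the weights are zero-centered, which tends to behave better with sigmoid/softmax training. A quick comparison:

import numpy as np

np.random.seed(0)
uniform_w = np.random.rand(1000) * 0.01    # uniform on [0, 0.01): strictly positive
normal_w = np.random.randn(1000) * 0.01    # normal with mean 0: positive and negative values

print(uniform_w.min(), uniform_w.mean())   # minimum near 0, mean around 0.005
print(normal_w.min(), normal_w.mean())     # negative minimum, mean around 0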
I am attempting digit classification on the MNIST database with a neural network of this structure:
first layer => 784 neurons. Inputs not normalized.
hidden layer 1 => 30 neurons => activation function tanh
hidden layer 2 => 30 neurons => activation function tanh
output layer => 10 neurons => activation function softmax
cost function => cross entropy
learning rate is 0.000001,
epochs are 30,
batch size = 1
But the accuracy it attains is just 11%, and the error stalls at a constant value (138328, which is definitely bad) after 20-25 epochs.
I haven't put comments in the code, so if you find the code difficult to read, just say so in the comments and I'll add them.
The NeuralNetwork class is:
import numpy as np
import sys
import time
class NueralNetwork():
    def __init__(self, input_size, hidden1_size, hidden2_size, output_size):
        self.input_layer_size = input_size
        self.hidden_layer1_size = hidden1_size
        self.hidden_layer2_size = hidden2_size
        self.output_layer_size = output_size
        np.random.seed(1)
        self.W1 = np.random.rand(self.input_layer_size, self.hidden_layer1_size)
        self.W2 = np.random.rand(self.hidden_layer1_size, self.hidden_layer2_size)
        self.W3 = np.random.rand(self.hidden_layer2_size, self.output_layer_size)
        self.b1 = np.zeros((1, self.hidden_layer1_size))
        self.b2 = np.zeros((1, self.hidden_layer2_size))
        self.b3 = np.zeros((1, self.output_layer_size))

    def train(self, input_values, label, batch_size, epochs, learning_rate, error=False):
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        print("training...")
        starting_time = time.time()
        N = len(input_values)
        for j in range(epochs):
            # "j" is the epoch number
            i = 0
            print("## Epoch: ", j+1, " ##")
            print("Time Lapsed: ", time.time() - starting_time)
            self.error(input_values, label)
            while i+batch_size < N:
                sys.stdout.write("\rCompleted " + str((100*i/N)) + " in epoch " + str(j+1))
                sys.stdout.flush()
                X = input_values[i:i+batch_size]
                y = label[i:i+batch_size]
                i = i+batch_size
                z1 = X @ self.W1 + self.b1
                a1 = np.tanh(z1)
                z2 = a1 @ self.W2 + self.b2
                a2 = np.tanh(z2)
                z3 = a2 @ self.W3 + self.b3
                a3_temp = np.exp(z3)
                a3 = a3_temp / np.sum(a3_temp, keepdims=True, axis=1)
                delta4 = a3
                delta4[range(len(y)), y] -= 1
                # print(delta3)
                # print(delta3.shape)
                dw3 = a2.T @ delta4
                db3 = np.sum(delta4, axis=0, keepdims=True)
                # print(delta4.shape, " d4")
                # print(db3.shape, " db3")
                delta3 = (delta4 @ self.W3.T) * (1 - np.power(a2, 2))
                dw2 = np.dot(a1.T, delta3)
                db2 = np.sum(delta3, axis=0)
                delta2 = (delta3 @ self.W2.T) * (1 - np.power(a1, 2))
                dw1 = np.dot(X.T, delta2)
                db1 = np.sum(delta2, axis=0)
                self.W1 += -learning_rate * dw1
                self.b1 += -learning_rate * db1
                self.W2 += -learning_rate * dw2
                self.b2 += -learning_rate * db2
                self.W3 += -learning_rate * dw3
                self.b3 += -learning_rate * db3
            if error:
                self.error(input_values, label)
            print("")

    def error(self, X, y):
        z1 = X @ self.W1 + self.b1
        a1 = np.tanh(z1)
        z2 = a1 @ self.W2 + self.b2
        a2 = np.tanh(z2)
        z3 = a2 @ self.W3 + self.b3
        a3_temp = np.exp(z3)
        a3 = a3_temp / np.sum(a3_temp, keepdims=True, axis=1)
        correct_loss = -np.log(a3[range(len(y)), y])
        data_loss = np.sum(correct_loss)
        print(data_loss)

    def show_performance(self, X, y):
        print("Error is: ", end="")
        z1 = X @ self.W1 + self.b1
        a1 = np.tanh(z1)
        z2 = a1 @ self.W2 + self.b2
        a2 = np.tanh(z2)
        z3 = a2 @ self.W3 + self.b3
        a3_temp = np.exp(z3)
        a3 = a3_temp / np.sum(a3_temp, keepdims=True, axis=1)
        y_p = np.argmax(a3, axis=1)
        self.error(X, y)
        score = 0
        for i in range(len(y)):
            if y_p[i] == y[i]:
                score += 1
        print("Score is ", score/len(y))
train_images, train_label, test_images, test_label = initial.load_data_mnist()
nn = NueralNetwork(784, 30, 30, 10)
nn.train(train_images, train_label, 1, 20, 0.000001)
nn.show_performance(test_images, test_label)
The data is coming from here; the 'data' directory has the MNIST database files:
from mnist import MNIST
import numpy as np
def load_data_mnist():
    mdata = MNIST('data')
    train_images, train_labels = mdata.load_training()
    test_images, test_labels = mdata.load_testing()
    return np.array(train_images), np.array(train_labels), np.array(test_images), np.array(test_labels)
I am trying to use scikit-learn to evaluate the goodness of a KNeighborsClassifier on my data.
My code is below (X is a matrix with NUM_MATCHES rows and NUM_FEATURES columns, Y is a column vector with NUM_MATCHES rows). I keep getting the error
TypeError: Partition index must be integer
on this line of the code below
rad_prob = estimator.predict_proba(np.reshape(radiant_query,(1,-1)))[0][1]
I am new to scikit-learn and not sure what the issue is.
from sklearn.neighbors import KNeighborsClassifier
from sklearn import cross_validation
import numpy as np
K=2
FOLDS_FINISHED=0
NUM_HEROES = 78
NUM_FEATURES = NUM_HEROES*2
def score(estimator, X, y):
    global FOLDS_FINISHED
    correct_predictions = 0
    for i, radiant_query in enumerate(X):
        dire_query = np.concatenate((radiant_query[NUM_HEROES:NUM_FEATURES], radiant_query[0:NUM_HEROES]))
        rad_prob = estimator.predict_proba(np.reshape(radiant_query,(1,-1)))[0][1]
        dire_prob = estimator.predict_proba(np.reshape(dire_query,(1,-1)))[0][0]
        overall_prob = (rad_prob + dire_prob) / 2
        prediction = 1 if (overall_prob > 0.5) else -1
        result = 1 if prediction == y[i] else 0
        correct_predictions += result
    FOLDS_FINISHED += 1
    accuracy = float(correct_predictions) / len(X)
    print('Accuracy: %f' % accuracy)
    return accuracy
preprocessed = np.load('train_9000.npz')
X = preprocessed['X']
Y = preprocessed['Y']
NUM_MATCHES = 3000
X = X[0:NUM_MATCHES]
Y = Y[0:NUM_MATCHES]
k_fold = cross_validation.KFold(n=NUM_MATCHES, n_folds=K, shuffle=True)
d_tries = [3, 4, 5]
d_accuracy_pairs = []
for d_index, d in enumerate(d_tries):
    model = KNeighborsClassifier(n_neighbors=NUM_MATCHES/K, metric=my_distance, weights=poly_param(d))
    model_accuracies = cross_validation.cross_val_score(model, X, Y, scoring=score, cv=k_fold)
    model_accuracy = model_accuracies.mean()
    d_accuracy_pairs.append((d, model_accuracy))
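A likely cause of the TypeError, though I cannot verify it without the data: under Python 3, NUM_MATCHES/K evaluates to 1500.0 (a float), and KNeighborsClassifier eventually hands n_neighbors to np.argpartition, which requires an integer partition index. A minimal sketch of the fix (my_distance and poly_param are assumed to be defined elsewhere in the original code):

for d_index, d in enumerate(d_tries):
    model = KNeighborsClassifier(n_neighbors=NUM_MATCHES // K,   # integer division: 1500, not 1500.0
                                 metric=my_distance,
                                 weights=poly_param(d))
    model_accuracies = cross_validation.cross_val_score(model, X, Y, scoring=score, cv=k_fold)
    d_accuracy_pairs.append((d, model_accuracies.mean()))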
I trained an ESPCN in TensorFlow 1.1, and the time cost per patch increases nearly linearly during training. The first 100 epochs take only 4-5 seconds, but the 70th epoch takes about half a minute. See the training result below:
I've searched for the same question on Google and Stack Overflow and tried the solutions below, but they did not seem to work:
1. Add tf.reset_default_graph() after every sess.run().
2. Add time.sleep(5) to prevent queue starvation.
I know the general idea is to reduce the operations in the Session(), but how? Does anyone have a solution?
Here's part of my code:
L3, var_w_list, var_b_list = model_train(IN, FLAGS)
cost = tf.reduce_mean(tf.reduce_sum(tf.square(OUT - L3), reduction_indices=0))
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(FLAGS.base_lr, global_step * FLAGS.batch_size, FLAGS.decay_step, 0.96, staircase=True)
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost, global_step = global_step, var_list = var_w_list + var_b_list)
# optimizer = tf.train.MomentumOptimizer(learning_rate, 0.9).minimize(cost, var_list = var_w_list + var_b_list)
cnt = 0
with tf.Session() as sess:
    init_op = tf.initialize_all_variables()
    sess.run(init_op)
    saver = tf.train.Saver()
    ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
    print('\n\n\n =========== All initialization finished, now training begins ===========\n\n\n')
    t_start = time.time()
    t1 = t_start
    for i in range(1, FLAGS.max_Epoch + 1):
        LR_batch, HR_batch = batch.__next__()
        global_step += 1
        [_, cost1] = sess.run([optimizer, cost], feed_dict = {IN: LR_batch, OUT: HR_batch})
        # tf.reset_default_graph()
        if i % 100 == 0 or i == 1:
            print_step = i
            print_loss = cost1 / FLAGS.batch_size
            test_LR_batch, test_HR_batch = test_batch.__next__()
            test_SR_batch = test_HR_batch.copy()
            test_SR_batch[:,:,:,0:3] = sess.run(L3, feed_dict = {IN: test_LR_batch[:,:,:,0:3]})
            # tf.reset_default_graph()
            psnr_tmp = 0.0
            ssim_tmp = 0.0
            for k in range(test_SR_batch.shape[0]):
                com1 = test_SR_batch[k, :, :, 0]
                com2 = test_HR_batch[k, :, :, 0]
                psnr_tmp += get_psnr(com1, com2, FLAGS.HR_size, FLAGS.HR_size)
                ssim_tmp += get_ssim(com1, com2, FLAGS.HR_size, FLAGS.HR_size)
            psnr[cnt] = psnr_tmp / test_SR_batch.shape[0]
            ssim[cnt] = ssim_tmp / test_SR_batch.shape[0]
            ep[cnt] = print_step
            t2 = time.time()
            print_time = t2 - t1
            t1 = t2
            print(("[Epoch] : {0:d} [Current cost] : {1:5.8f} \t [Validation PSNR] : {2:5.8f} \t [Duration time] : {3:10.8f} s \n").format(print_step, print_loss, psnr[cnt], print_time))
            # tf.reset_default_graph()
            cnt += 1
        if i % 1000 == 0:
            L3_test = model_test(IN_TEST, var_w_list, var_b_list, FLAGS)
            output_img = single_HR.copy()
            output_img[:,:,:,0:3] = sess.run(L3_test, feed_dict = {IN_TEST: single_LR[:,:,:,0:3]})
            tf.reset_default_graph()
            subname = FLAGS.img_save_dir + '/' + str(i) + ".jpg"
            img_gen(output_img[0,:,:,:], subname)
            print(('================= Saving model to {}/model.ckpt ================= \n').format(FLAGS.checkpoint_dir))
            time.sleep(5)
            # saver.save(sess, FLAGS.checkpoint_dir + '/model.ckpt', print_step)
    t_tmp = time.time() - t_start
My configuration is: windows10 + tf1.1 + python3.5 + cuda8.0 + cudnn5.1
================================================================
Besides, I used a pixel-shuffle (PS) layer instead of deconvolution in the last layer. I copied the PS code from others, which is shown below:
def _phase_shift(I, r):
    bsize, a, b, c = I.get_shape().as_list()
    bsize = tf.shape(I)[0]  # Handling Dimension(None) type for undefined batch dim
    X = tf.reshape(I, (bsize, a, b, r, r))
    X = tf.transpose(X, (0, 1, 2, 4, 3))  # bsize, a, b, 1, 1
    X = tf.split(X, a, 1)  # a, [bsize, b, r, r]
    X = tf.concat([tf.squeeze(x, axis=1) for x in X], 2)  # bsize, b, a*r, r
    X = tf.split(X, b, 1)  # b, [bsize, a*r, r]
    X = tf.concat([tf.squeeze(x, axis=1) for x in X], 2)  # bsize, a*r, b*r
    return tf.reshape(X, (bsize, a*r, b*r, 1))

def PS(X, r, color=False):
    if color:
        Xc = tf.split(X, 3, 3)
        X = tf.concat([_phase_shift(x, r) for x in Xc], 3)
    else:
        X = _phase_shift(X, r)
    return X
Here X is the 4-dimensional image tensor, r is the up-scaling factor, and color determines whether the images have 3 channels (YCbCr format) or 1 channel (grayscale).
Using the layer is very simple, just like tf.nn.relu():
L3_ps = PS(L3, scale, True)
Now I'm wondering whether this layer causes the slow-down, because the program runs fine when using a deconvolution layer. Using a deconvolution layer may be a solution, but I have to use the PS layer for some reason.
I suspect this line is causing a memory leak (although without seeing the code, I can't say for certain):
L3_test = model_test(IN_TEST, var_w_list, var_b_list, FLAGS)
L3_test seems to be a tf.Tensor (because you later pass it to sess.run()), so it seems likely that model_test() is adding new nodes to the graph each time it is called (every 1000 steps), which causes more work to be done over time.
The solution is quite simple though: since model_test() does not depend on anything calculated in the training loop, you can move the call to outside the training loop, so it is only called once.
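A minimal sketch of that restructuring, reusing the names from the question (the rest of the training loop stays exactly as it was; sess.graph.finalize() is an optional extra that makes any accidental graph growth raise an error immediately):

# Build every graph node once, before the session and the training loop.
L3, var_w_list, var_b_list = model_train(IN, FLAGS)
L3_test = model_test(IN_TEST, var_w_list, var_b_list, FLAGS)  # moved out of the loop

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    sess.graph.finalize()  # optional: fail fast if anything still adds nodes during training
    for i in range(1, FLAGS.max_Epoch + 1):
        # ... training step and periodic validation exactly as before ...
        if i % 1000 == 0:
            output_img = single_HR.copy()
            # Run the pre-built tensor instead of calling model_test() again.
            output_img[:, :, :, 0:3] = sess.run(L3_test, feed_dict={IN_TEST: single_LR[:, :, :, 0:3]})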