Implementation of k-fold cross-validation in Python - python-3.x

I am trying to implement the logic behind k-fold cross-validation on a test matrix, without using a library. Somehow, my rotated matrices are not coming out right.
I have taken k to be 5.
import numpy as np

X = np.matrix([[1,2,3,4,5],[7,8,9,4,5],[4,9,6,4,2],[9,5,1,2,3],[7,5,3,4,6]])
P = np.ones((5,5))
target = np.matrix([[1,2,3,4,5]]).T
#def k_fold(X,target,k):
r = X.shape[0]
k = 5
step = r//k
last_row_train = step*(k-1)
for i in range(5):
    X_train = X[0:last_row_train,:]
    tempX = X_train
    X_test = X[last_row_train:r,:]
    temp_X_test = X_test
    t_train = target[0:last_row_train,:]
    temp_t_train = t_train
    t_test = target[last_row_train:r,:]
    temp_test = t_test
    X[step:r,:] = tempX          # On running this line, it changes the value of
                                 # temp_X_test, which is very weird and not
                                 # supposed to happen
    X[0:step,:] = temp_X_test
    target[0:step,:] = temp_test
    target[step:r,:] = temp_t_train
    print(X)
    print(target)

tempX = X_train
This statement does not create a new object named tempX holding a copy of X_train. It makes both names, tempX and X_train, refer to the same object, so any change made through one is visible through the other. On top of that, X_train is itself a slice of X, and slicing a NumPy matrix returns a view that shares X's data, which is why assigning into X also changes temp_X_test. This is a recurring problem in your code.
When you want an independent copy, use the code below.
tempX = X_train.copy()
Here's a link to a similar question with more solutions.
How to clone or copy a list?
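For a quick illustration of the difference between an alias, a view, and an independent copy (the numbers are just an example):
import numpy as np

a = np.matrix([[1, 2], [3, 4]])
alias = a                 # same object, just another name
view = a[0:1, :]          # slicing gives a view that shares a's data
indep = a[0:1, :].copy()  # .copy() allocates independent data

a[0, 0] = 99
print(alias[0, 0])        # 99 - changes along with a
print(view[0, 0])         # 99 - views see the change too
print(indep[0, 0])        # 1  - the copy is unaffected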


Kerastuner search doesn't get restarted even with Overwrite flag set to true

Tuning is done as follows:
tuner = kt.RandomSearch(
    MyHyperModel(),
    objective="mae",
    max_trials=30,
    overwrite=True,
    directory=results_dir,
    project_name="tune_hypermodel",
)
And I'm iterating over how many features to use:
data[name]=pd.read_pickle(os.path.join(root, name)+'/'+name+'.data.reindexed_by_pc.pkl')
CpG_num_lst=[100,500,1000,5000,10000,20000,40000]
train_score, valid_score = [], []
hps_by_CpG_num = []
for CpG_num in CpG_num_lst:
    print("CpG_num:",CpG_num)
    # force overwrite tune search (to start new search). Cause even with overwrite=True it doesn't overwrite
    if os.path.exists(results_dir+'/tune_hypermodel'):
        shutil.rmtree(results_dir+'/tune_hypermodel')
    # initialize
    X1, y1, X2, y2 = [dict() for _ in range(4)]
    X1[name] = data[name][fold1_ids].head(CpG_num).values.T
    X2[name] = data[name][fold2_ids].head(CpG_num).values.T
    # get the ages of the corresponding persons. Notice info_1 and info_2 only contain "Ctrl" and not "Test" samples
    y1[name] = info_1[name].query("`Train.Test`=='Train'")['Age'].values.astype(float)
    y2[name] = info_2[name].query("`Train.Test`=='Train'")['Age'].values.astype(float)
    # Split the data
    X1_train, X1_valid, y1_train, y1_valid = train_test_split(X1[name], y1[name], test_size=0.2, shuffle=True)
    # Grid search
    tuner.search(X1_train, y1_train, validation_data=(X1_valid, y1_valid))
    best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
    # Get best hyperparameters
    hp_dict = dict()
    hp_dict['num_layers'] = best_hps.get('num_layers')
    hp_dict['batch_size'] = best_hps.get('batch_size')
    hp_dict['act_1'] = best_hps.get('act_1')
    hp_dict['act_2'] = best_hps.get('act_2')
    hp_dict['units_1'] = best_hps.get('units_1')
    hp_dict['units_2'] = best_hps.get('units_2')
    hps_by_CpG_num.append(hp_dict)
    # Build best model
    best_model = MyHyperModel().build(hp=best_hps)
    history = best_model.fit(X1_train, y1_train, validation_data=(X1_valid, y1_valid), batch_size=best_hps.get('batch_size'), epochs=200)
However, for each element of the for loop the tuner search doesn't restart; it just reuses the best hyperparameters from the first search (CpG_num = 500).
What am I missing? Why is Keras Tuner reusing the old hyperparameters?
The solution was to include the tuner instantiation within the for loop.
tuner = kt.RandomSearch(
    MyHyperModel(),
    objective="mae",
    max_trials=30,
    overwrite=True,
    directory=results_dir,
    project_name="tune_hypermodel",
)
Not sure why, but it works... If somebody has any insight on this, let me know. I thought the tuner would be overwritten.
Is there a better way of doing this?
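For reference, a trimmed sketch of the working arrangement (assuming MyHyperModel, results_dir, and the data preparation from the question; only the tuner-related lines are shown):
for CpG_num in CpG_num_lst:
    # ... prepare X1_train, y1_train, X1_valid, y1_valid as in the question ...

    # a fresh tuner per iteration, so each feature count gets its own search
    tuner = kt.RandomSearch(
        MyHyperModel(),
        objective="mae",
        max_trials=30,
        overwrite=True,
        directory=results_dir,
        project_name="tune_hypermodel",
    )
    tuner.search(X1_train, y1_train, validation_data=(X1_valid, y1_valid))
    best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]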

RuntimeError: Trying to backward through the graph a second time. Saved intermediate values of the graph are freed when you call .backward()

I am trying to train SRGAN from scratch. I have read solutions for this type of problem, but it would be great if someone could help me debug my code. The exact error is: "RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad()" Here is the snippet I am trying to train:
gen_model = Generator().to(device, non_blocking=True)
disc_model = Discriminator().to(device, non_blocking=True)
opt_gen = optim.Adam(gen_model.parameters(), lr=0.01)
opt_disc = optim.Adam(disc_model.parameters(), lr=0.01)
from torch.nn.modules.loss import BCELoss
def train_model(gen, disc):
    for epoch in range(20):
        run_loss_disc = 0
        run_loss_gen = 0
        for data in train:
            low_res, high_res = data[0].to(device, non_blocking=True, dtype=torch.float).permute(0, 3, 1, 2), data[1].to(device, non_blocking=True, dtype=torch.float).permute(0, 3, 1, 2)
            #--------Discriminator-----------------
            gen_image = gen(low_res)
            gen_image = gen_image.detach()
            disc_gen = disc(gen_image)
            disc_real = disc(high_res)
            p = nn.BCEWithLogitsLoss()
            loss_gen = p(disc_real, torch.ones_like(disc_real))
            loss_real = p(disc_gen, torch.zeros_like(disc_gen))
            loss_disc = loss_gen + loss_real
            opt_disc.zero_grad()
            loss_disc.backward()
            run_loss_disc += loss_disc
            #---------Generator--------------------
            cont_loss = vgg_loss(high_res, gen_image)
            adv_loss = 1e-3*p(disc_gen, torch.ones_like(disc_gen))
            gen_loss = cont_loss+(10^-3)*adv_loss
            opt_gen.zero_grad()
            gen_loss.backward()
            opt_disc.step()
            opt_gen.step()
            run_loss_gen += gen_loss
        print("Run Loss Discriminator: %d", run_loss_disc)
        print("Run Loss Generator: %d", run_loss_gen)
train_model(gen_model, disc_model)
Apparently the saved intermediate values behind your disc_gen tensor were freed by the first backward() call, as the error message says.
It should work if you change the discriminator part a bit:
gen_image = gen(low_res)
disc_gen = disc(gen_image.detach())
and add this at the start of the generator part:
disc_gen = disc(gen_image)
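Putting the suggestion together, the inner loop could look roughly like this (a sketch, not tested against the full training script; it keeps the question's names, accumulates the running losses with .item(), and drops the stray (10^-3) factor, which is bitwise XOR in Python, since adv_loss is already scaled by 1e-3):
#--------Discriminator-----------------
gen_image = gen(low_res)
disc_gen = disc(gen_image.detach())     # detached: the discriminator loss does not backprop into the generator
disc_real = disc(high_res)
p = nn.BCEWithLogitsLoss()
loss_real = p(disc_real, torch.ones_like(disc_real))   # real images should be classified as real
loss_fake = p(disc_gen, torch.zeros_like(disc_gen))    # generated images should be classified as fake
loss_disc = loss_real + loss_fake
opt_disc.zero_grad()
loss_disc.backward()
opt_disc.step()
run_loss_disc += loss_disc.item()

#---------Generator--------------------
disc_gen = disc(gen_image)              # re-run the discriminator on the non-detached generator output
cont_loss = vgg_loss(high_res, gen_image)
adv_loss = 1e-3 * p(disc_gen, torch.ones_like(disc_gen))
gen_loss = cont_loss + adv_loss
opt_gen.zero_grad()
gen_loss.backward()
opt_gen.step()
run_loss_gen += gen_loss.item()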

AttributeError: 'DType' object has no attribute 'type' Tensorflow Serving

I am trying to use a function (from another module) inside TensorFlow. The function accepts a numpy array and returns the changepoints. My main goal is to deploy this model on TensorFlow Serving. I am running into the error
AttributeError: 'DType' object has no attribute 'type'
There are two functions: create_data(), which creates a numpy array and returns it, and change(), which accepts a numpy array and uses the aforementioned function to return changepoints. I have created a placeholder to accept input data and an operation to execute the function. The problem is, if I try to send data through the placeholder, I run into the error. If I send the data directly into the function, it runs. Following is my code.
def create_data():
    np.random.seed(0)
    size = 100
    mean_a = 0.0
    mean_b = 10.0
    mean_c = 0
    var = 0.1
    data_a = np.random.normal(mean_a, var, size)
    data_b = np.random.normal(mean_b, var, size)
    data_c = np.random.normal(mean_c, var, size)
    data = np.concatenate([data_a, data_b, data_c])
    return data

def change(data):
    # what else i tried
    # data = np.array(data, dtype=np.float)
    # above line gives another error mentioned after code
    cpts = (pelt(normal_mean(x, np.var(x)), len(x)))
    return cpts

sess = tf.Session()
x = tf.placeholder(tf.float32, shape=[300, ], name="myInput")
y = tf.convert_to_tensor(change(x), np.float32, name="myOutput")
z = sess.run(y, feed_dict={x: create_data()})
If I try the code data = np.array(data, dtype=np.float) in the function change(), it gives me the error
ValueError: setting an array element with a sequence.
I also tried data = np.hstack((data)).astype(np.float) and data = np.vstack((data)).astype(np.float), but they run into a separate error that says to use tf.map_fn. I also tried to use tf.eval() to convert the numbers, but I couldn't get them to run inside a function with placeholders.
But if I send in the output directly,
y = tf.convert_to_tensor(change(create_data()),np.float32,name="myOutput")
It works.
How should I send in the input to make it work?
EDIT: The function in question is this if anyone wants to know.
This error is raised when you try to pass a Tensor into a numpy function.
You need to use tf.py_func to include a Python function in the TensorFlow graph.
(Also, your change() function uses data as its argument instead of x.)
Here is the code that worked for me:
import numpy as np
import tensorflow as tf
from changepy import pelt
from changepy.costs import normal_mean

def create_data():
    np.random.seed(0)
    size = 100
    mean_a = 0.0
    mean_b = 10.0
    mean_c = 0
    var = 0.1
    data_a = np.random.normal(mean_a, var, size)
    data_b = np.random.normal(mean_b, var, size)
    data_c = np.random.normal(mean_c, var, size)
    data = np.concatenate([data_a, data_b, data_c])
    return data

def change(x):
    # what else i tried
    # data = np.array(data, dtype=np.float)
    # above line gives another error mentioned after code
    cpts = (pelt(normal_mean(x, np.var(x)), len(x)))
    return cpts

sess = tf.Session()
x = tf.placeholder(tf.float32, shape=[300, ], name="myInput")
y = tf.convert_to_tensor(tf.compat.v1.py_func(change, [x], 3*[tf.int64]), np.float32, name="myOutput")
z = sess.run(y, feed_dict={x: create_data()})
print(z)

reading textfile returning empty variable in tensorflow

I have a text file which has 110 rows and 1024 columns of float values. I am trying to load the text file, but it doesn't read anything.
filename = '300_faults.txt'
filename_queue = tf.train.string_input_producer([filename])
reader = tf.TextLineReader()
_, a = reader.read(filename_queue)
#x = np.loadtxt('300_faults.txt') # working
#a = tf.constant(x,tf.float32) # working
model = tf.initialize_all_variables()
with tf.Session() as session:
    session.run(model)
    print(session.run(tf.shape(a)))
Printing the shape of the variable returns [].
Firstly, tf.shape(a) == [] doesn't mean that the variable is empty. All scalars and strings have shape [].
https://www.tensorflow.org/programmers_guide/dims_types
Maybe you can check the rank instead - it would be 0 for scalars and strings.
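A tiny check, assuming the TF 1.x API used in the question:
import tensorflow as tf

s = tf.constant("hello")                 # a scalar string tensor
with tf.Session() as session:
    print(session.run(tf.shape(s)))      # [] - scalars (including strings) have an empty shape
    print(session.run(tf.rank(s)))       # 0  - rank 0 tells you it is a scalar, not an empty tensor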
Other than that, it looks like string_input_producer is a queue, and it needs additional wiring to make it work.
Please try this:
filename = '300_faults.txt'
filename_queue = tf.train.string_input_producer([filename])
reader = tf.TextLineReader()
_, a = reader.read(filename_queue)
#x = np.loadtxt('300_faults.txt') # working
#a = tf.constant(x,tf.float32) # working
model = tf.initialize_all_variables()
with tf.Session() as session:
    session.run(model)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    print(session.run(tf.shape(a)))
    print(session.run((a)))
    coord.request_stop()
    coord.join(threads)

How to print the values of tensors inside a while loop?

I'm very new to tensorflow, and I couldn't figure this one out.
I have this while loop:
def process_tree_tf(n_child, reprs, weights, bias, embed_dim, activation=tf.nn.relu):
    n_child, reprs = n_child, reprs
    parent_idxs = generate_parents_numpy(n_child)
    loop_idx = reprs.shape[0] - 1
    loop_vars = loop_idx, reprs, parent_idxs, weights, embed_dim

    def loop_condition(loop_ind, *_):
        return tf.greater(0, loop_idx)

    def loop_body(loop_ind, reprs, parent_idxs, weights, embed_dim):
        x = reprs[loop_ind]
        x_expanded = tf.expand_dims(x, axis=-1)
        w = weights
        out = tf.squeeze(tf.add(tf.matmul(x_expanded, w, transpose_a=True), bias))
        activated = activation(out)
        par_idx = parent_idxs[loop_ind]
        reprs = update_parent(reprs, par_idx, embed_dim, activated)
        reprs = tf.Print(reprs, [reprs])  # This doesn't work
        loop_ind = loop_ind - 1
        return loop_ind, reprs, parent_idxs, weights, embed_dim

    return tf.while_loop(loop_condition, loop_body, loop_vars)
And I'm evaluating it this way:
embed_dim = 2
hidden_dim = 2
n_nodes = 4
batch = 2
reprs = np.ones((n_nodes, embed_dim+hidden_dim))
n_child = np.array([1, 1, 1, 0])
weights = np.ones((embed_dim+hidden_dim, hidden_dim))
bias = np.ones(hidden_dim)
with tf.Session() as sess:
    _, r, *_ = process_tree_tf(n_child, reprs, weights, bias, embed_dim, activation=tf.nn.relu)
    print(r.eval())
I want to check the value of reprs inside the while loop, but tf.Print doesn't seem to work, and print just tells me it's a tensor and gives me its shape.
How do I go about doing this?
Thank you so much!
Take a look at this webpage: https://www.tensorflow.org/api_docs/python/tf/Print
You can see that tf.Print is an identity operator with the side effect of printing data when evaluating. You should therefore use this line to print:
reprs = tf.Print(reprs, [reprs])
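If the output is easy to miss among the rest of the logs, you can also pass a message and a summarize limit (a small variation on the same call; the values here are just illustrative):
reprs = tf.Print(reprs, [reprs], message="reprs inside loop: ", summarize=100)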
Hope this helps, and good luck!
The approach suggested by rmeertens is the one I think is correct. I would just add (as a response to your comments) that if something is printing Tensor("while/update_parent:0", ...), that implies that the value in the graph is not being evaluated.
You are likely seeing that as the output of your "print(r.eval())" statement, NOT the tf.Print() statement.
Note that the output of tf.Print() appears in PyCharm (the IDE I am using) in red, while the output of a normal python print operation appears in black. So the tf.Print() output looks like a warning message. It could be that it is indeed printing out, but you are simply overlooking it.
