I am pre-processing a NumPy array and want to load it into a TensorFlow Variable. I've tried following advice from other Stack Exchange posts, but so far without success. I would like to see if I'm doing something uniquely wrong here.
npW = np.zeros((784,10))
npW[0,0] = 20
W = tf.Variable(tf.convert_to_tensor(npW, dtype = tf.float32))
sess = tf.InteractiveSession()
print("npsum", np.sum(npW))
And this is the result.
npsum 20.0
Tensor("Sum:0", shape=(), dtype=float32)
I don't know why the reduced sum of the W variable remains zero. Am I missing something here?

You need to understand that TensorFlow differs from traditional computing. First, you declare a computational graph. Then, you run operations through the graph.
Taking your example, you have your NumPy variables:
npW = np.zeros((784,10))
npW[0,0] = 20
Next, these instructions are a definition of tensorflow variables, i.e. nodes in the computational graph:
W = tf.Variable(tf.convert_to_tensor(npW, dtype = tf.float32))
sum = tf.reduce_sum(W)
And to be able to compute the operation, you need to run the op through the graph with a session, i.e.:
sess = tf.InteractiveSession()
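sess.run(tf.global_variables_initializer()) # variables must be initialized before they are read
result = sess.run(sum)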
print(result) # print 20
Another way is to call eval instead of sess.run:
print(sum.eval()) # print 20
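To make the define-then-run idea concrete without needing a TensorFlow installation, here is a toy sketch in plain Python. The `Node` class, its `run` method, and the `constant` and `reduce_sum` helpers are hypothetical names invented for this illustration, not TensorFlow API. They only mimic the behaviour described above: building an op returns a description of a computation, and a separate call produces the number.

```python
# Toy illustration of deferred ("define, then run") evaluation.
# Node, run(), constant() and reduce_sum() are made-up names for this
# sketch; they are NOT part of TensorFlow.

class Node:
    """A node in a tiny computation graph: a function plus its input nodes."""

    def __init__(self, name, fn, inputs=()):
        self.name = name
        self.fn = fn
        self.inputs = inputs

    def __repr__(self):
        # Printing a node shows its description, not its value,
        # much like printing a graph tensor shows Tensor("Sum:0", ...).
        return f'Node("{self.name}")'

    def run(self):
        # Evaluate the inputs first, then apply this node's function.
        return self.fn(*(node.run() for node in self.inputs))


def constant(name, value):
    return Node(name, lambda: value)


def reduce_sum(node):
    return Node("Sum", lambda rows: sum(sum(row) for row in rows), (node,))


# Build the graph: nothing is computed yet.
W = constant("W", [[20.0, 0.0], [0.0, 0.0]])
total = reduce_sum(W)

description = repr(total)  # what print(total) would display
value = total.run()        # the actual number, computed on demand
print(description, value)
```

Printing `total` only shows the node description, which is the same situation as seeing `Tensor("Sum:0", ...)` in the question; the value appears only after the explicit `run()` call.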

So I tested it a bit differently and found out that the variable is getting assigned properly, but the reduce_sum function isn't working as expected. If anyone has an explanation for that, it would be much appreciated.
npW = np.zeros((2,2))
npW[0,0] = 20
W = tf.Variable(npW, dtype = tf.float32)
A= tf.constant([[20,0],[0,0]])
sess = tf.InteractiveSession()
# Train
print("npsum", np.sum(npW))
This had output
npsum 20.0
Tensor("Sum:0", shape=(2,), dtype=float32)
Tensor("Sum_1:0", shape=(), dtype=int32)
[[ 20. 0.],
[ 0. 0.]]
[[20 0],
[ 0 0]]


Autograd Pytorch

I am new to PyTorch, and I have been trying some examples with autograd to see if I understand it. I am confused about why the following code does not work:
def Loss(a):
    return a**2
a = torch.tensor(3.0, requires_grad=True)
with torch.no_grad(): a = a + 1.0
Instead of outputting 8.0, we get "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn".
There are two things to note regarding your code:
You are performing two backpropagations up to the leaf a, which means the gradients should accumulate. In other words, you should get a gradient equal to da²/da + d(a+1)²/da, which is 2a + 2(a+1) = 2(2a + 1). If a=3, then a.grad will be equal to 14.
You are using a torch.no_grad context manager, which means you will be unable to backpropagate from any tensor produced inside it, i.e. here the reassigned a itself.
Here is a snippet which yields the desired result, that is 14 as the accumulation of both gradients:
>>> L = Loss(a)
>>> L.backward()
>>> a.grad
>>> L = Loss(a+1)
>>> L.backward()
>>> a.grad
14 # as 6 + 8
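As a sanity check on the arithmetic (6 + 8 = 14), the two derivatives can be approximated numerically without autograd. This is a minimal sketch in plain Python; `central_difference` is a hypothetical helper written for this illustration and is not part of PyTorch.

```python
def loss(a):
    return a ** 2


def central_difference(f, x, h=1e-5):
    """Numerical derivative of f at x (helper invented for this sketch)."""
    return (f(x + h) - f(x - h)) / (2 * h)


a = 3.0
# First backward pass: d(a^2)/da = 2a
grad_first = central_difference(loss, a)
# Second backward pass: d((a+1)^2)/da = 2(a+1)
grad_second = central_difference(lambda t: loss(t + 1.0), a)
# Autograd adds each new gradient to a.grad, so the two contributions sum.
accumulated = grad_first + grad_second

print(round(grad_first, 6), round(grad_second, 6), round(accumulated, 6))
```

The sum matches the accumulated value of a.grad described in the answer; to get each gradient separately in PyTorch, a.grad would have to be reset between the two backward calls.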

What is Mean_test_score and STD_Test_Score used for [duplicate]

Hello, I'm doing a GridSearchCV and I'm printing the result with the .cv_results_ attribute from scikit-learn.
My problem is that when I compute the mean of all the test score splits by hand, I obtain a different number from the one reported in 'mean_test_score'. Why is it different from the standard np.mean()?
I attach here the code with the result:
n_estimators = [100]
max_depth = [3]
learning_rate = [0.1]
param_grid = dict(max_depth=max_depth, n_estimators=n_estimators, learning_rate=learning_rate)
gkf = GroupKFold(n_splits=7)
grid_search = GridSearchCV(model, param_grid, scoring=score_auc, cv=gkf)
grid_result = grid_search.fit(X, Y, groups=patients)
The result of this operation is:
{'mean_fit_time': array([ 8.92773601]),
'mean_score_time': array([ 0.04288721]),
'mean_test_score': array([ 0.83490629]),
'mean_train_score': array([ 0.95167036]),
'param_learning_rate': masked_array(data = [0.1],
mask = [False],
fill_value = ?),
'param_max_depth': masked_array(data = [3],
mask = [False],
fill_value = ?),
'param_n_estimators': masked_array(data = [100],
mask = [False],
fill_value = ?),
'params': ({'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 100},),
'rank_test_score': array([1]),
'split0_test_score': array([ 0.74821666]),
'split0_train_score': array([ 0.97564995]),
'split1_test_score': array([ 0.80089016]),
'split1_train_score': array([ 0.95361201]),
'split2_test_score': array([ 0.92876979]),
'split2_train_score': array([ 0.93935856]),
'split3_test_score': array([ 0.95540287]),
'split3_train_score': array([ 0.94718634]),
'split4_test_score': array([ 0.89083901]),
'split4_train_score': array([ 0.94787374]),
'split5_test_score': array([ 0.90926355]),
'split5_train_score': array([ 0.94829775]),
'split6_test_score': array([ 0.82520379]),
'split6_train_score': array([ 0.94971417]),
'std_fit_time': array([ 1.79167576]),
'std_score_time': array([ 0.02970254]),
'std_test_score': array([ 0.0809713]),
'std_train_score': array([ 0.0105566])}
As you can see, taking np.mean of all the split test scores gives approximately 0.8655122606479532, while 'mean_test_score' is 0.83490629.
Thanks for your help.
I will post this as a new answer since it's so much code:
The test and train scores of the folds are: (taken from the results you posted in your question)
test_scores = [0.74821666,0.80089016,0.92876979,0.95540287,0.89083901,0.90926355,0.82520379]
train_scores = [0.97564995,0.95361201,0.93935856,0.94718634,0.94787374,0.94829775,0.94971417]
The numbers of training and test samples in those folds are: (taken from the output of print([(len(train), len(test)) for train, test in gkf.split(X, groups=patients)]))
train_len = [41835, 56229, 56581, 58759, 60893, 60919, 62056]
test_len = [24377, 9983, 9631, 7453, 5319, 5293, 4156]
Then the train and test means, with the number of samples per fold as weights, are:
train_avg = np.average(train_scores, weights=train_len)
-> 0.95064898361714389
test_avg = np.average(test_scores, weights=test_len)
-> 0.83490628649308296
So this is exactly the value sklearn gives you. It is also the correct mean accuracy of your classification. The mean of the folds is incorrect in that it depends on the somewhat arbitrary splits/folds you chose.
So in conclusion, both explanations were indeed identical and correct.
If you look at the original code of GridSearchCV in their GitHub repository, they don't use np.mean(); instead they use np.average() with weights. Hence the difference. Here's their code:
n_splits = 3
test_sample_counts = np.array(test_sample_counts[:n_splits],
                              dtype=np.int)
weights = test_sample_counts if self.iid else None
means = np.average(test_scores, axis=1, weights=weights)
stds = np.sqrt(np.average((test_scores - means[:, np.newaxis]) ** 2,
                          axis=1, weights=weights))
cv_results = dict()
for split_i in range(n_splits):
    cv_results["split%d_test_score" % split_i] = test_scores[:,
                                                             split_i]
cv_results["mean_test_score"] = means
cv_results["std_test_score"] = stds
In case you want to know more about the difference between them, take a look at the question "Difference between np.mean() and np.average()".
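The weighted mean and weighted standard deviation can be reproduced directly from the numbers posted in this thread. The sketch below assumes only NumPy; the scores and fold sizes are copied from the posts above, and the variable names are my own.

```python
import numpy as np

# Per-fold test scores and test-fold sizes, copied from the thread.
test_scores = np.array([0.74821666, 0.80089016, 0.92876979, 0.95540287,
                        0.89083901, 0.90926355, 0.82520379])
test_len = np.array([24377, 9983, 9631, 7453, 5319, 5293, 4156])

# Unweighted mean: every fold counts the same.
plain_mean = np.mean(test_scores)

# Weighted mean: every sample counts the same.
weighted_mean = np.average(test_scores, weights=test_len)

# Weighted standard deviation, using the same formula as the snippet above.
weighted_std = np.sqrt(
    np.average((test_scores - weighted_mean) ** 2, weights=test_len)
)

print(plain_mean, weighted_mean, weighted_std)
```

The unweighted value corresponds to the hand-computed mean in the question, while the weighted values line up with 'mean_test_score' and 'std_test_score' in the posted cv_results_.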
I suppose the reason for the different means is the different weighting factors used in the mean calculation.
The mean_test_score that sklearn returns is the mean calculated on all samples where each sample has the same weight.
If you calculate the mean by taking the mean of the folds (splits), then you only get the same results if the folds are all of equal size. If they are not, then all samples of larger folds will automatically have a smaller impact on the mean of the folds than smaller folds, and the other way around.
Small numeric example:
mean([2,3,5,8,9]) = 5.4 # mean over all samples ('mean_test_score')
mean([2,3,5]) = 3.333 # mean of fold 1
mean([8,9]) = 8.5 # mean of fold 2
mean([3.333, 8.5]) = 5.917 # mean of means of folds
5.4 != 5.917
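The same small example can be checked in a few lines of plain Python. It also shows the missing step: weighting each fold mean by its fold size brings the result back to the mean over all samples. The variable names are illustrative only.

```python
from statistics import fmean

fold_1 = [2, 3, 5]
fold_2 = [8, 9]

# Every sample weighted equally (what 'mean_test_score' corresponds to).
overall_mean = fmean(fold_1 + fold_2)

# Every fold weighted equally (the hand-computed mean of fold means).
mean_of_means = fmean([fmean(fold_1), fmean(fold_2)])

# Weighting each fold mean by its fold size recovers the overall mean.
sizes = [len(fold_1), len(fold_2)]
weighted_mean_of_means = (
    sizes[0] * fmean(fold_1) + sizes[1] * fmean(fold_2)
) / sum(sizes)

print(overall_mean, mean_of_means, weighted_mean_of_means)
```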

Array entry used in function turns from nan to 0 numpy python

I made a simple function that produces a weighted average of several time series using supplied weights. It is designed to handle missing values (NaNs), which is why I am not using numpy's supplied average function.
However, when I feed it my array containing missing values, the array has its NaN values replaced by 0s! I would have assumed that since I am changing the name of the array and it is not a global variable, this should not happen. I want my X array to retain its original form, including the NaN value.
I am a relative novice using python (obviously).
X = np.array([[1, 2, 3], [1, 2, 3], [1, 2, np.nan]]) # 3 time series to be weighted together
weights = np.array([[1,1,1]]) # simple example with weights for each series as 1
def WeightedMeanNaN(Tseries, weights):
    ## calculates weighted mean
    N_Tseries = Tseries
    Weights = np.repeat(weights, len(N_Tseries), axis=0) # make a vector of weights matching size of time series
    loc = np.where(np.isnan(N_Tseries)) # get location of nans
    Weights[loc] = 0
    N_Tseries[loc] = 0
    Weights = Weights/Weights.sum(axis=1)[:,None] # normalize each row so that weights sum to 1
    WeightedAve = np.multiply(N_Tseries,Weights)
    WeightedAve = WeightedAve.sum(axis=1)
    return WeightedAve
WeightedMeanNaN(Tseries = X, weights = weights)
Out[161]: array([2. , 2. , 1.5])
array([[1., 2., 3.],
[1., 2., 3.],
[1., 2., 0.]]) # no longer nan!!
Where you call
loc = np.where(np.isnan(N_Tseries)) # get location of nans
Weights[loc] = 0
N_Tseries[loc] = 0
You remove all NaNs and set them to zeros.
To reverse this, you could iterate over the array and replace zeros with NaNs.
However, this would also turn genuine zeros into NaNs.
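Rather than undoing the replacement afterwards, one option is to never write into the input at all. The following is a minimal sketch, assuming NumPy and the same X and weights as in the question; `weighted_mean_nan` is a hypothetical rewrite for illustration, not the asker's final code. It uses `np.where` to build new arrays, so the caller's array keeps its NaNs.

```python
import numpy as np


def weighted_mean_nan(tseries, weights):
    """Row-wise weighted mean that ignores NaNs and leaves the input untouched."""
    tseries = np.asarray(tseries, dtype=float)
    # Broadcast the (1, n) weights to the shape of the data (read-only view).
    w = np.broadcast_to(np.asarray(weights, dtype=float), tseries.shape)
    valid = ~np.isnan(tseries)
    # np.where returns new arrays, so neither input is modified.
    w = np.where(valid, w, 0.0)
    values = np.where(valid, tseries, 0.0)
    # Note: a row with no valid entries would give 0/0 (NaN) here.
    return (values * w).sum(axis=1) / w.sum(axis=1)


X = np.array([[1, 2, 3], [1, 2, 3], [1, 2, np.nan]])
weights = np.array([[1, 1, 1]])

result = weighted_mean_nan(X, weights)
print(result)
print(X)  # the NaN in the last row is still there
```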
So it turns out this is a mistake caused by me being used to working in Matlab. Python treats arguments supplied to the function as pointers to the original object. In contrast, Matlab creates copies that are discarded when the function ends.
I solved my problem by adding ".copy()" when assigning variables in the function, so that the first line in the function above becomes:
N_Tseries = Tseries.copy().
However, one thing that puzzles me is that some people have suggested that using Tseries[:] should also create a copy of Tseries rather than a pointer to the original variable. This did not work for me though.
I found this answer useful:
Python function not supposed to change a global variable
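On the puzzle about `Tseries[:]`: for a Python list, slicing with `[:]` makes a shallow copy, but for a NumPy array basic slicing returns a view that shares memory with the original. That would explain why `[:]` did not help here while `.copy()` did. A small sketch, assuming only NumPy, with illustrative variable names:

```python
import numpy as np

original = np.array([1.0, 2.0, np.nan])

alias = original        # same object, just another name
view = original[:]      # new array object, but it shares the same memory
copy = original.copy()  # independent data

copy[2] = 0.0           # does not touch the original
still_nan = bool(np.isnan(original[2]))

view[2] = 0.0           # writes through to the original
now_zero = bool(original[2] == 0.0)

# A Python list behaves differently: slicing makes a shallow copy.
py_list = [1.0, 2.0, 3.0]
list_copy = py_list[:]
list_copy[2] = 0.0      # py_list is unchanged

print(np.shares_memory(original, view), np.shares_memory(original, copy))
print(still_nan, now_zero, py_list)
```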

Loop over tensor dimension 0 (NoneType) with second tensor values

I have a tensor a, I'd like to loop over the rows and index values based on another tensor l. i.e. l suggests the length of the vector I need.
sess = tf.InteractiveSession()
a = tf.constant(np.random.rand(3,4)) # shape=(3,4)
array([[0.35879311, 0.35347166, 0.31525201, 0.24089784],
[0.47296348, 0.96773956, 0.61336239, 0.6093023 ],
[0.42492552, 0.2556728 , 0.86135674, 0.86679779]])
l = tf.constant(np.array([3,2,4])) # shape=(3,)
array([3, 2, 4])
Expected output:
[array([0.35879311, 0.35347166, 0.31525201]),
array([0.47296348, 0.96773956]),
array([0.42492552, 0.2556728 , 0.86135674, 0.86679779])]
The tricky part is the fact that a could have None as first dimension since it's what is usually defined as batch size through placeholder.
I can not just use mask and condition as below since I need to compute the variance of each row individually.
condition = tf.sequence_mask(l, tf.reduce_max(l))
a_true = tf.boolean_mask(a, condition)
array([0.35879311, 0.35347166, 0.31525201, 0.47296348, 0.96773956,
0.42492552, 0.2556728 , 0.86135674, 0.86679779])
I also tried to use tf.map_fn but can't get it to work.
elems = (a, l)
tf.map_fn(lambda x: x[0][:x[1]], elems)
Any help will be highly appreciated!
A TensorArray object can store tensors of different shapes. However, it is still not that simple. Take a look at this example, which does what you want using tf.while_loop() with tf.TensorArray and the tf.slice() function:
import tensorflow as tf
import numpy as np
batch_data = np.array([[0.35879311, 0.35347166, 0.31525201, 0.24089784],
                       [0.47296348, 0.96773956, 0.61336239, 0.6093023 ],
                       [0.42492552, 0.2556728 , 0.86135674, 0.86679779]])
batch_idx = np.array([3, 2, 4]).reshape(-1, 1)
x = tf.placeholder(tf.float32, shape=(None, 4))
idx = tf.placeholder(tf.int32, shape=(None, 1))
n_items = tf.shape(x)[0]
init_ary = tf.TensorArray(dtype=tf.float32,
                          size=n_items,
                          infer_shape=False) # infer_shape=False allows elements of different shapes
def _first_n(i, ta):
    ta = ta.write(i, tf.slice(input_=x[i],
                              begin=tf.convert_to_tensor([0], tf.int32),
                              size=idx[i]))
    return i+1, ta
_, first_n = tf.while_loop(lambda i, ta: i < n_items,
                           _first_n,
                           [0, init_ary])
first_n = [first_n.read(i) # <-- extracts the tensors
           for i in range(batch_data.shape[0])] # that you're looking for
with tf.Session() as sess:
    res = sess.run(first_n, feed_dict={x:batch_data, idx:batch_idx})
    print(res)
    # [array([0.3587931 , 0.35347167, 0.315252 ], dtype=float32),
    # array([0.47296348, 0.9677396 ], dtype=float32),
    # array([0.4249255 , 0.2556728 , 0.86135674, 0.8667978 ], dtype=float32)]
We still had to use the batch size to extract elements one by one from the first_n TensorArray using the read() method. We can't use any other method that returns a Tensor because we have rows of different sizes (except the TensorArray.concat method, but it will return all elements stacked in one dimension).
If the TensorArray has fewer elements than the index you pass to read(), you will get an InvalidArgumentError.
You can't use tf.map_fn because it returns a tensor whose elements must all have the same shape.
The task is simpler if you only need to compute the variance of the first n elements of each row (without actually gathering elements of different sizes together). In this case we can directly compute the variance of the sliced tensor, write it to a TensorArray and then stack it into a tensor:
n_items = tf.shape(x)[0]
init_ary = tf.TensorArray(dtype=tf.float32,
                          size=n_items)
def _variances(i, ta, begin=tf.convert_to_tensor([0], tf.int32)):
    mean, varian = tf.nn.moments(
        tf.slice(input_=x[i], begin=begin, size=idx[i]),
        axes=[0]) # <-- compute variance
    ta = ta.write(i, varian) # <-- write variance of each row to `TensorArray`
    return i+1, ta
_, variances = tf.while_loop(lambda i, ta: i < n_items,
                             _variances,
                             [ 0, init_ary])
variances = variances.stack() # <-- read from `TensorArray` to `Tensor`
with tf.Session() as sess:
    res = sess.run(variances, feed_dict={x:batch_data, idx:batch_idx})
    print(res) # [0.0003761 0.06120085 0.07217039]
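If only the per-row variances are needed, a loop may not be required at all: a length mask can restrict each row's mean and squared deviations to its first n entries. The sketch below shows the idea in NumPy rather than TensorFlow, so it is an illustration of the arithmetic and not a drop-in graph implementation; in TensorFlow the mask would presumably come from tf.sequence_mask, which is an assumption on my part. It uses the population variance (dividing by n), which is what tf.nn.moments computes.

```python
import numpy as np

batch_data = np.array([[0.35879311, 0.35347166, 0.31525201, 0.24089784],
                       [0.47296348, 0.96773956, 0.61336239, 0.6093023 ],
                       [0.42492552, 0.2556728 , 0.86135674, 0.86679779]])
lengths = np.array([3, 2, 4])

# mask[i, j] is True for the first lengths[i] entries of row i.
mask = np.arange(batch_data.shape[1])[None, :] < lengths[:, None]
counts = mask.sum(axis=1)

# Per-row mean over the valid entries only.
means = np.where(mask, batch_data, 0.0).sum(axis=1) / counts

# Per-row population variance over the valid entries only.
squared_dev = np.where(mask, (batch_data - means[:, None]) ** 2, 0.0)
variances = squared_dev.sum(axis=1) / counts

print(variances)
```

The values agree with the output of the TensorArray version above, up to float32 rounding.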

Tensorflow Adagrad optimizer isn't working

When I run the following script, I notice the following couple of errors:
import tensorflow as tf
import numpy as np
import seaborn as sns
import random
#set random seed:
def potential(N):
    points = np.random.rand(N,2)*10
    values = np.array([np.exp((points[i][0]-5.0)**2 + (points[i][1]-5.0)**2) for i in range(N)])
    return points, values
def init_weights(shape,var_name):
    """
    Xavier initialisation of neural networks
    """
    init = tf.contrib.layers.xavier_initializer()
    return tf.get_variable(initializer=init,name = var_name,shape=shape)
def neural_net(X):
    with tf.variable_scope("model",reuse=tf.AUTO_REUSE):
        w_h = init_weights([2,10],"w_h")
        w_h2 = init_weights([10,10],"w_h2")
        w_o = init_weights([10,1],"w_o")
        ### bias terms:
        bias_1 = init_weights([10],"bias_1")
        bias_2 = init_weights([10],"bias_2")
        bias_3 = init_weights([1],"bias_3")
        h = tf.nn.relu(tf.add(tf.matmul(X, w_h),bias_1))
        h2 = tf.nn.relu(tf.add(tf.matmul(h, w_h2),bias_2))
        return tf.nn.relu(tf.add(tf.matmul(h2, w_o),bias_3))
X = tf.placeholder(tf.float32, [None, 2])
with tf.Session() as sess:
    model = neural_net(X)
    ## define optimizer:
    opt = tf.train.AdagradOptimizer(0.0001)
    values =tf.placeholder(tf.float32, [None, 1])
    squared_loss = tf.reduce_mean(tf.square(model-values))
    ## define model variables:
    model_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,"model")
    train_model = opt.minimize(squared_loss,var_list=model_vars)
    sess.run(tf.global_variables_initializer()) # variables must be initialized before training
    for i in range(10):
        points, val = potential(100)
        train_feed = {X : points,values: val.reshape((100,1))}
        sess.run(train_model,feed_dict = train_feed)
        print(sess.run(model,feed_dict = {X:points}))
    ### plot the approximating model:
    res = 0.1
    xy = np.mgrid[0:10:res, 0:10:res].reshape(2,-1).T
    values = sess.run(model, feed_dict={X: xy})
On the first run I get:
[nan] [nan] [nan] [nan] [nan] [nan] [nan]]
Traceback (most recent call last):
line 485, in heatmap
yticklabels, mask)
line 167, in init
cmap, center, robust)
line 206, in _determine_cmap_params
vmin = np.percentile(calc_data, 2) if robust else calc_data.min()
line 29, in _amin
return umr_minimum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation minimum which has
no identity
On the second run I have:
ValueError: Variable model/w_h/Adagrad/ already exists, disallowed.
Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?
It's not clear to me why I get either of these errors. Furthermore, when I use:
for i in range(10):
    points, val = potential(10)
    train_feed = {X : points,values: val.reshape((10,1))}
    sess.run(train_model,feed_dict = train_feed)
    print(sess.run(model,feed_dict = {X:points}))
I find that on the first run, I sometimes get a network that has collapsed to the constant function with output 0. Right now my hunch is that this might simply be a numerics problem but I might be wrong.
If so, it's a serious problem as the model I have used here is very simple.
"Right now my hunch is that this might simply be a numerics problem"
Indeed, when running potential(100) I sometimes get values as large as 1E21. The largest points will dominate your loss function and will drive the network parameters.
Even when normalizing your target values e.g. to unit variance, the problem of the largest values dominating the loss would still remain (look e.g. at plt.hist(np.log(potential(100)[1]), bins = 100)).
If you can, try learning the log of val instead of val itself. Note however that then you are changing the assumption of the loss function from 'predictions follow a normal distribution around the target values' to 'log predictions follow a normal distribution around log of the target values'.
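To see the scale problem and the effect of the log transform, here is a small NumPy-only sketch. It re-implements the question's potential function in vectorised form and additionally returns the squared distance, which is my own addition for the comparison; the seeded generator is likewise only there to make the run repeatable.

```python
import numpy as np


def potential(n, rng):
    """Vectorised variant of the question's potential(); also returns the exponent."""
    points = rng.random((n, 2)) * 10
    sq_dist = ((points - 5.0) ** 2).sum(axis=1)
    return points, np.exp(sq_dist), sq_dist


rng = np.random.default_rng(0)
points, values, sq_dist = potential(100, rng)

# Largest possible target: a corner of the square, where the exponent is 5^2 + 5^2.
upper_bound = np.exp(50.0)

# Taking the log turns the target into the plain squared distance, bounded by 50.
log_values = np.log(values)

print("ratio of largest to median target:", values.max() / np.median(values))
print("range of log targets:", log_values.min(), log_values.max())
```

With raw targets, a handful of points near the corners are many orders of magnitude larger than the rest, so a squared loss is driven almost entirely by them; the log targets stay within a range the network can fit.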
