How does TensorFlow deal with np.nan? - python-3.x

Please consider the following code:
import numpy as np
import tensorflow as tf

x = tf.constant([[[1, np.nan, np.nan], [4, 3, -1]], [[10, np.nan, 3], [20, 5, -7]], [[5, np.nan, 3], [np.nan, 15, -17]]])
x_max = tf.reduce_max(x, reduction_indices=[0])
with tf.Session() as sess:
    print(np.shape(sess.run(x)))
    print(sess.run(x))
    print(sess.run(x_max))
The output is as follows:
(3, 2, 3)
[[[ 1. nan nan]
  [ 4. 3. -1.]]
 [[ 10. nan 3.]
  [ 20. 5. -7.]]
 [[ 5. nan 3.]
  [ nan 15. -17.]]]
[[ 10. -inf 3.]
 [ 20. 15. -1.]]
My question is: how does TensorFlow handle np.nan in reductions? Does it ignore NaN values the way numpy.nanmax does, or does it do something else?

Quoting this link (credit goes to Yaroslav Bulatov):
Different parts of TensorFlow treat them differently:
* Float computations (usually?) propagate them.
* Int conversion treats them as 0.
* Int computations fail.
* Python parts of TensorFlow often raise an error on "NaN", i.e., trying to add a NaN summary to a histogram will fail with a Python exception.
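Here is an example of some float operations:
import numpy as np
import tensorflow as tf

a = tf.constant([1.0, np.nan])
b = tf.constant(np.nan)
r = tf.reduce_min(a)  # reduction over a vector that contains a NaN
m = a * b  # element-wise product with a NaN scalar
with tf.Session() as sess:
    print(sess.run(r))  # prints 1.0
    print(sess.run(m))  # array([nan, nan], dtype=float32)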
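The reduce_max output in the question is consistent with NaN entries being skipped, with -inf appearing where every element along the reduced axis is NaN. As a rough illustration in NumPy rather than TensorFlow (so it says nothing definitive about TensorFlow's internals), the sketch below contrasts np.max, which propagates NaN, with a reduction that masks NaN as -inf first and reproduces the output shown in the question. The names propagated and skipped are illustrative.

```python
import numpy as np

x = np.array([[[1, np.nan, np.nan], [4, 3, -1]],
              [[10, np.nan, 3], [20, 5, -7]],
              [[5, np.nan, 3], [np.nan, 15, -17]]])

# np.max propagates NaN: any NaN along axis 0 makes that result NaN.
propagated = np.max(x, axis=0)

# Masking NaN as -inf before reducing skips NaN entries instead;
# a slice that is entirely NaN then reduces to -inf.
skipped = np.where(np.isnan(x), -np.inf, x).max(axis=0)

print(propagated)
print(skipped)
```

np.nanmax(x, axis=0) behaves like the masked version, except that it returns NaN (with a RuntimeWarning) for a slice that is entirely NaN.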

Related

What is 1. in Math or numpy? Is it supposed to mean 1.0?

import numpy as np

A = np.array([[1, -2, 1], [2, 1, -3], [1, -3, 3]])
b = np.array([6, -3, 10])
x = np.linalg.solve(A, b)
print(x)
# [ 1. -2. 1.]
What format is this? This is my first time seeing it. How does it translate to normal numbers?
The number 1. is a shorter way of writing 1.0; the trailing dot indicates that the value is a floating point number rather than an integer. Consider the following outputs:
>>> import numpy as np
>>> print(np.array([1,-2,1], dtype=float))
[ 1. -2. 1.]
>>> print(np.array([1,-2,1], dtype=int))
[ 1 -2 1]
>>> print(np.array([1,-2,1], dtype=np.float32))
[ 1. -2. 1.]
>>> print(np.array([1,-2,1], dtype=np.float16))
[ 1. -2. 1.]
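To make the point concrete, here is a small sketch showing that 1. and 1.0 are the same Python float, and how a float array can be converted when integer output is wanted. The variable names are illustrative.

```python
import numpy as np

# 1. and 1.0 are two spellings of the same float literal.
print(1. == 1.0, type(1.))  # True <class 'float'>

x = np.array([1., -2., 1.])
print(x.dtype)     # float64
print(x.tolist())  # [1.0, -2.0, 1.0]

# Cast to integers only when the values are known to be whole numbers;
# astype(int) truncates any fractional part.
as_int = x.astype(int)
print(as_int.tolist())  # [1, -2, 1]
```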

PyTorch: when dividing by zero, set the result value to 0

In PyTorch, when values are divided by zero, I want to replace the result with 0, since the division otherwise outputs NaN. Here is an example:
import numpy as np
import torch as th

a = th.from_numpy(np.array([[1, 0], [0, 1], [1, 1]]))
b = th.zeros_like(a)
b[0, :] = 2
a = a / b
How can I do that?
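You can replace the NaN values produced by the division with 0 as follows.
First, build a mask marking the positions of NaN in the quotient a (that is, after a = a / b). NaN is the only value that does not compare equal to itself, so a != a is True exactly at those positions. Recent PyTorch versions return a bool tensor here; older ones return a ByteTensor.
a != a
>> tensor([[False, False],
           [ True, False],
           [False, False]])
Then use that mask to overwrite the NaN entries with 0:
a = a / b
>> tensor([[0.5000, 0.0000],
           [   nan,    inf],
           [   inf,    inf]])
a[a != a] = 0
>> tensor([[0.5000, 0.0000],
           [0.0000,    inf],
           [   inf,    inf]])
Note that this also replaces any NaN values that were already present before the division, and that it leaves the inf entries (a nonzero value divided by zero) untouched.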
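If the goal is a 0 wherever the divisor is zero (not only where the result is NaN), one possible alternative is to select on the divisor with torch.where. The following is a minimal sketch, assuming a reasonably recent PyTorch and float inputs; the names quotient, nan_only and safe are illustrative.

```python
import torch

a = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = torch.zeros_like(a)
b[0, :] = 2

quotient = a / b  # contains nan (0/0) and inf (1/0)

# Replace only NaN, as in the masking approach above; inf is kept.
nan_only = torch.where(torch.isnan(quotient), torch.zeros_like(quotient), quotient)

# Replace every entry whose divisor is zero, covering both nan and inf.
safe = torch.where(b == 0, torch.zeros_like(quotient), quotient)

print(nan_only)
print(safe)
```

Selecting on b == 0 rather than on the result keeps any NaN that was already present in a before the division, which may or may not be what is wanted.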

Creating a dictionary from two tensors in TensorFlow

I have two tensors:
top_k_values = [[0.1, 0.2, 0.3],
                [0.4, 0.5, 0.6]]
top_k_indices = [[1, 3, 5],
                 [2, 5, 3]]
I want to take the indices and the values and create a dictionary like this:
dict[1] = 0.1
dict[2] = 0.4
dict[3] = 0.2 + 0.6
dict[5] = 0.3 + 0.5
I then want to order this dictionary by key and select the top 3 indices.
Could someone please help me?
I have been trying to use map_fn, but it does not seem to work.
Is this problem solvable with TensorFlow?
You can use a Counter to accumulate the values for each index. Counter comes from the Python standard library; I don't know whether the same can be done with the TensorFlow library.
>>> from collections import Counter
>>> d = Counter()
>>> for indice_list, value_list in zip(top_k_indices, top_k_values):
...     for indice, value in zip(indice_list, value_list):
...         d[indice] += value
...
>>> d
Counter({3: 0.8, 5: 0.8, 2: 0.4, 1: 0.1})
# this is your expected result
# a counter is a kind of dict, but if you need a real dict:
>>> dict(d)
{1: 0.1, 3: 0.8, 5: 0.8, 2: 0.4}
# 3 indices with maximum values
>>> d.most_common(3)
[(3, 0.8), (5, 0.8), (2, 0.4)]
>>> sorted([indice for indice, value in d.most_common(3)])
[2, 3, 5]
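For array inputs, the same accumulation can be done without a Python loop. The sketch below uses NumPy's np.bincount with weights, which sums the values that share an index; it is an illustration in NumPy, not a TensorFlow solution. In TensorFlow, tf.math.unsorted_segment_sum performs a comparable grouped sum and might be a starting point, though that is not verified here.

```python
import numpy as np

top_k_values = np.array([[0.1, 0.2, 0.3],
                         [0.4, 0.5, 0.6]])
top_k_indices = np.array([[1, 3, 5],
                          [2, 5, 3]])

# sums[i] is the total of all values whose index equals i
# (positions that never occur, such as 0 and 4, stay at 0).
sums = np.bincount(top_k_indices.ravel(), weights=top_k_values.ravel())

# Indices of the three largest sums, reported in ascending index order.
top3 = sorted(np.argsort(sums)[-3:].tolist())
print(top3)  # [2, 3, 5]
```

np.bincount requires non-negative integer indices and allocates one slot per possible index, so it suits small, dense index ranges best.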

sklearn: create sparse feature vector from multiple choice features

I have data which looks like this:
20003.0, 20003.1, 20004.0, 20004.1, 34
1, 2, 3, 4, 5
The columns whose names end in .* are multiple choice, and the values they take are categorical.
Ideally, I would be able to pass [{'20003': ['1', '2'], '20004': ['3', '4'], '34': 5}] to DictVectorizer but this is not supported yet.
How should this data be loaded to create sparse feature vectors?
If it is acceptable to sum the values within each group of columns:
In [63]: df
Out[63]:
   20003.0  20003.1  20004.0  20004.1  34
0        1        2        3        4   5
In [64]: d = df.groupby(df.columns.str.split('.').str[0], axis=1).sum().to_dict('records')
In [65]: d
Out[65]: [{'20003': 3, '20004': 7, '34': 5}]
In [66]: from sklearn.feature_extraction import DictVectorizer
In [67]: v = DictVectorizer()
In [68]: X = v.fit_transform(d)
In [69]: X.toarray()
Out[69]: array([[ 3., 7., 5.]])
In [70]: v.inverse_transform(X)
Out[70]: [{'20003': 3.0, '20004': 7.0, '34': 5.0}]
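Explanation: splitting each column name on '.' and keeping the first part gives the group key, so columns 20003.0 and 20003.1 are summed into a single 20003 column before the row is converted to a dict:
In [71]: df.groupby(df.columns.str.split('.').str[0], axis=1).sum()
Out[71]:
   20003  20004  34
0      3      7   5
In [72]: df.groupby(df.columns.str.split('.').str[0], axis=1).sum().to_dict('records')
Out[72]: [{'20003': 3, '20004': 7, '34': 5}]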
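Summing treats the choices as numbers, which may not suit categorical values. A possible alternative is to turn each multiple-choice column into a one-hot style key before vectorizing. The helper below is a hypothetical sketch (expand_multiple_choice is not part of scikit-learn); it assumes that a dot in a column name marks a multiple-choice column.

```python
def expand_multiple_choice(header, row):
    """Build a feature dict with one 'name=value' key per selected choice."""
    features = {}
    for column, value in zip(header, row):
        base, dot, _ = column.partition(".")
        if dot:
            # Multiple-choice column: record the chosen category as a flag.
            features[f"{base}={value}"] = 1
        else:
            # Ordinary column: keep the value as is.
            features[base] = value
    return features


header = ["20003.0", "20003.1", "20004.0", "20004.1", "34"]
row = [1, 2, 3, 4, 5]
features = expand_multiple_choice(header, row)
print(features)
```

A list of such dicts could then be passed to DictVectorizer().fit_transform(...) to obtain a sparse matrix with one column per (feature, choice) pair.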

How to handle None in a list comprehension?

I want to turn [2,3,None,4,None] into [4,6,None,8,None]. How can I do this?
>>> v= [2,3,None,4,None]
>>> [x*2 for x in v if not x is None]
[4, 6, 8]
You're almost there. Just add an else clause to include the Nones:
>>> [x*2 if x is not None else x for x in v]
[4, 6, None, 8, None]
Note that you must put the if-else before the for x in v; otherwise you'd get a syntax error.
Personally, I'd rewrite it a little further to make the behavior more explicit, but it's all up to you.
>>> [None if x is None else x*2 for x in v]
[4, 6, None, 8, None]
If you are able to use NaN instead of None, you can simplify your code, since NaN already behaves in the desired way, i.e. 2*nan evaluates to nan again, so you don't need to special-case it:
>>> nan = float('nan')
>>> v = [2, 3, nan, 4, nan]
>>> [2 * x for x in v]
[4, 6, nan, 8, nan]
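One caveat with the NaN approach is that NaN never compares equal to itself, so missing entries cannot be found with == afterwards. A small sketch using math.isnan from the standard library (the variable names are illustrative):

```python
import math

nan = float("nan")
doubled = [2 * x for x in [2, 3, nan, 4, nan]]

# nan == nan is False, so equality tests cannot detect missing entries.
print(nan == nan)  # False

# math.isnan is the reliable check.
missing_positions = [i for i, x in enumerate(doubled) if math.isnan(x)]
print(missing_positions)  # [2, 4]
```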
