How can I make torch tensor? - pytorch

I want to make a torch tensor composed of only 1 or -1.
I just try to use torch.empty() and torch.randn().
tmp = torch.randint(low=0, high=2, size=(4, 4))
tmp[tmp==0] = -1
tmp
>> tensor([[ 1, -1, -1, 1],
[-1, 1, 1, -1],
[-1, -1, 1, 1],
[-1, 1, -1, 1]])
However, I don`t know what method is most efficient in time.
And I want to make this code to one line as possible because this code is positioned at __init__ ()

Why not:
tmp = -1 + 2 * torch.randint(low=0, high=2, size=(4, 4))

Related

How to select rows from two different Numpy arrays conditionally?

I have two Numpy 2D arrays and I want to get a single 2D array by selecting rows from the original two arrays. The selection is done conditionally. Here is the simple Python way,
import numpy as np
a = np.array([4, 0, 1, 2, 4])
b = np.array([0, 4, 3, 2, 0])
y = np.array([[0, 0, 0, 0],
[0, 0, 0, 1],
[0, 0, 1, 0],
[0, 0, 1, 1],
[0, 0, 1, 0]])
x = np.array([[0, 0, 0, 0],
[1, 1, 1, 0],
[1, 1, 0, 0],
[1, 1, 1, 1],
[0, 0, 1, 0]])
z = np.empty(shape=x.shape, dtype=x.dtype)
for i in range(x.shape[0]):
z[i] = y[i] if a[i] >= b[i] else x[i]
print(z)
Looking at numpy.select, I tried, np.select([a >= b, a < b], [y, x], -1) but got ValueError: shape mismatch: objects cannot be broadcast to a single shape. Mismatch is between arg 0 with shape (5,) and arg 1 with shape (5, 4).
Could someone help me write this in a more efficient Numpy manner?
This should do the trick, but it would be helpful if you could show an example of your expected output:
>>> np.where((a >= b)[:, None], y, x)
array([[0, 0, 0, 0],
[1, 1, 1, 0],
[1, 1, 0, 0],
[0, 0, 1, 1],
[0, 0, 1, 0]])

Count Unique elements in pytorch Tensor

Suppose I have the following tensor: y = torch.randint(0, 3, (10,)). How would you go about counting the 0's 1's and 2's in there?
The only way I can think of is by using collections.Counter(y) but was wondering if there was a more "pytorch" way of doing this. A use case for example would be when building the confusion matrix for predictions.
You can use torch.unique with the return_counts option:
>>> x = torch.randint(0, 3, (10,))
tensor([1, 1, 0, 2, 1, 0, 1, 1, 2, 1])
>>> x.unique(return_counts=True)
(tensor([0, 1, 2]), tensor([2, 6, 2]))

Bitwise operations in Pytorch

Could someone help me how to perform bitwise AND operations on two tensors in Pytorch 1.4?
Apparently I could only find NOT and XOR operations in official document
I don't see them in the docs, but it looks like &, |, __and__, __or__, __xor__, etc are bit-wise:
>>> torch.tensor([1, 2, 3, 4]).__xor__(torch.tensor([1, 1, 1, 1]))
tensor([0, 3, 2, 5])
>>> torch.tensor([1, 2, 3, 4]) | torch.tensor([1, 1, 1, 1])
tensor([1, 3, 3, 5])
>>> torch.tensor([1, 2, 3, 4]) & torch.tensor([1, 1, 1, 1])
tensor([1, 0, 1, 0])
>>> torch.tensor([1, 2, 3, 4]).__and__(torch.tensor([1, 1, 1, 1]))
tensor([1, 0, 1, 0])
See https://github.com/pytorch/pytorch/pull/1556
Check this. There is no bitwise and/or operation for tensors in Torch. There are element-wise operations implemented in Torch, but not bite-wise ones.
However, if you could convert each bit as a separate Tensor dimension, you can use element-wise operation.
For an example,
a = torch.Tensor{0,1,1,0}
b = torch.Tensor{0,1,0,1}
torch.cmul(a,b):eq(1)
0
1
0
0
[torch.ByteTensor of size 4]
torch.add(a,b):ge(1)
0
1
1
1
[torch.ByteTensor of size 4]
Hope this will help you.

Not able to use Stratified-K-Fold on multi label classifier

The following code is used to do KFold Validation but I am to train the model as it is throwing the error
ValueError: Error when checking target: expected dense_14 to have shape (7,) but got array with shape (1,)
My target Variable has 7 classes. I am using LabelEncoder to encode the classes into numbers.
By seeing this error, If I am changing the into MultiLabelBinarizer to encode the classes. I am getting the following error
ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead.
The following is the code for KFold validation
skf = StratifiedKFold(n_splits=10, shuffle=True)
scores = np.zeros(10)
idx = 0
for index, (train_indices, val_indices) in enumerate(skf.split(X, y)):
print("Training on fold " + str(index+1) + "/10...")
# Generate batches from indices
xtrain, xval = X[train_indices], X[val_indices]
ytrain, yval = y[train_indices], y[val_indices]
model = None
model = load_model() //defined above
scores[idx] = train_model(model, xtrain, ytrain, xval, yval)
idx+=1
print(scores)
print(scores.mean())
I don't know what to do. I want to use Stratified K Fold on my model. Please help me.
MultiLabelBinarizer returns a vector which is of the length of your number of classes.
If you look at how StratifiedKFold splits your dataset, you will see that it only accepts a one-dimensional target variable, whereas you are trying to pass a target variable with dimensions [n_samples, n_classes]
Stratefied split basically preserves your class distribution. And if you think about it, it does not make a lot of sense if you have a multi-label classification problem.
If you want to preserve the distribution in terms of the different combinations of classes in your target variable, then the answer here explains two ways in which you can define your own stratefied split function.
UPDATE:
The logic is something like this:
Assuming you have n classes and your target variable is a combination of these n classes. You will have (2^n) - 1 combinations (Not including all 0s). You can now create a new target variable considering each combination as a new label.
For example, if n=3, you will have 7 unique combinations:
1. [1, 0, 0]
2. [0, 1, 0]
3. [0, 0, 1]
4. [1, 1, 0]
5. [1, 0, 1]
6. [0, 1, 1]
7. [1, 1, 1]
Map all your labels to this new target variable. You can now look at your problem as simple multi-class classification, instead of multi-label classification.
Now you can directly use StartefiedKFold using y_new as your target. Once the splits are done, you can map your labels back.
Code sample:
import numpy as np
np.random.seed(1)
y = np.random.randint(0, 2, (10, 7))
y = y[np.where(y.sum(axis=1) != 0)[0]]
OUTPUT:
array([[1, 1, 0, 0, 1, 1, 1],
[1, 1, 0, 0, 1, 0, 1],
[1, 0, 0, 1, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 0],
[1, 0, 0, 0, 1, 1, 1],
[1, 1, 0, 0, 0, 1, 1],
[1, 1, 1, 1, 0, 1, 1],
[0, 0, 1, 0, 0, 1, 1],
[1, 0, 1, 0, 0, 1, 1],
[0, 1, 1, 1, 1, 0, 0]])
Label encode your class vectors:
from sklearn.preprocessing import LabelEncoder
def get_new_labels(y):
y_new = LabelEncoder().fit_transform([''.join(str(l)) for l in y])
return y_new
y_new = get_new_labels(y)
OUTPUT:
array([7, 6, 3, 3, 2, 5, 8, 0, 4, 1])

'Tensor' object has no attribute 'assign_add'

I encountered error 'Tensor' object has no attribute 'assign_add' when I try to use the assign_add or assign_sub function.
The code is shown below:
I defined two tensor t1 and t2, with the same shape, and same data type.
>>> t1 = tf.Variable(tf.ones([2,3,4],tf.int32))
>>> t2 = tf.Variable(tf.zeros([2,3,4],tf.int32))
>>> t1
<tf.Variable 'Variable_4:0' shape=(2, 3, 4) dtype=int32_ref>
>>> t2
<tf.Variable 'Variable_5:0' shape=(2, 3, 4) dtype=int32_ref>
then I use the assign_add on t1 and t2 to create t3
>>> t3 = tf.assign_add(t1,t2)
>>> t3
<tf.Tensor 'AssignAdd_4:0' shape=(2, 3, 4) dtype=int32_ref>
then I try to create a new tensor t4 using t1[1] and t2[1], which are tensors with same shape and same data type.
>>> t1[1]
<tf.Tensor 'strided_slice_23:0' shape=(3, 4) dtype=int32>
>>> t2[1]
<tf.Tensor 'strided_slice_24:0' shape=(3, 4) dtype=int32>
>>> t4 = tf.assign_add(t1[1],t2[1])
but got error,
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/admin/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/state_ops.py", line 245, in assign_add
return ref.assign_add(value)
AttributeError: 'Tensor' object has no attribute 'assign_add'
same error when using assign_sub
>>> t4 = tf.assign_sub(t1[1],t2[1])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/admin/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/state_ops.py", line 217, in assign_sub
return ref.assign_sub(value)
AttributeError: 'Tensor' object has no attribute 'assign_sub'
Any idea where is wrong?
Thanks.
The error is because t1 is a tf.Variable object , while t1[1] is a tf.Tensor.(you can see this in the outputs to your print statements.).Ditto for t2 and t[[2]]
As it happens, tf.Tensor can't be mutated(it's read only) whereas tf.Variable can be(read as well as write)
see here.
Since tf.scatter_add,does an inplace addtion, it doesn't work with t1[1] and t2[1] as inputs, while there's no such problem with t1 and t2 as inputs.
What you are trying to do here is a little bit confusing. I don't think you can update slices and create a new tensor at the same time/line.
If you want to update slices before creating t4, use tf.scatter_add() (or tf.scatter_sub() or tf.scatter_update() accordingly) as suggested here. For example:
sa = tf.scatter_add(t1, [1], t2[1:2])
Then if you want to get a new tensor t4 using new t1[1] and t2[1], you can do:
with tf.control_dependencies([sa]):
t4 = tf.add(t1[1],t2[1])
Here are some examples for using tf.scatter_add and tf.scatter_sub
>>> t1 = tf.Variable(tf.ones([2,3,4],tf.int32))
>>> t2 = tf.Variable(tf.zeros([2,3,4],tf.int32))
>>> init = tf.global_variables_initializer()
>>> sess.run(init)
>>> t1.eval()
array([[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]],
[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]]], dtype=int32)
>>> t2.eval()
array([[[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]],
[[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]]], dtype=int32)
>>> t3 = tf.scatter_add(t1,[0],[[[2,2,2,2],[2,2,2,2],[2,2,2,2]]])
>>> sess.run(t3)
array([[[3, 3, 3, 3],
[3, 3, 3, 3],
[3, 3, 3, 3]],
[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]]], dtype=int32)
>>>t4 = tf.scatter_sub(t1,[0,0,0],[t1[1],t1[1],t1[1]])
Following is another example, which can be found at https://blog.csdn.net/efforever/article/details/77073103
Because few examples illustrating scatter_xxx can be found on the web, I paste it below for reference.
import tensorflow as tf
import numpy as np
with tf.Session() as sess1:
c = tf.Variable([[1,2,0],[2,3,4]], dtype=tf.float32, name='biases')
cc = tf.Variable([[1,2,0],[2,3,4]], dtype=tf.float32, name='biases1')
ccc = tf.Variable([0,1], dtype=tf.int32, name='biases2')
#对应label的centers-diff[0--]
centers = tf.scatter_sub(c,ccc,cc)
#centers = tf.scatter_sub(c,[0,1],cc)
#centers = tf.scatter_sub(c,[0,1],[[1,2,0],[2,3,4]])
#centers = tf.scatter_sub(c,[0,0,0],[[1,2,0],[2,3,4],[1,1,1]])
#即c[0]-[1,2,0] \ c[0]-[2,3,4]\ c[0]-[1,1,1],updates要减完:indices与updates元素个数相同
a = tf.Variable(initial_value=[[0, 0, 0, 0],[0, 0, 0, 0]])
b = tf.scatter_update(a, [0, 1], [[1, 1, 0, 0], [1, 0, 4, 0]])
#b = tf.scatter_update(a, [0, 1,0], [[1, 1, 0, 0], [1, 0, 4, 0],[1, 1, 0, 1]])
init = tf.global_variables_initializer()
sess1.run(init)
print(sess1.run(centers))
print(sess1.run(b))
[[ 0. 0. 0.]
[ 0. 0. 0.]]
[[1 1 0 0]
[1 0 4 0]]
[[-3. -4. -5.]
[ 2. 3. 4.]]
[[1 1 0 1]
[1 0 4 0]]
You can also use tf.assign() as a workaround as sliced assign was implemented for it, unlike for tf.assign_add() or tf.assign_sub(), as of TensorFlow version 1.8. Please note, you can only do one slicing operation (slice into slice is not going to work) and also this is not atomic, so if there are multiple threads reading and writing to the same variable, you don't know which operation will be the last one to write unless you explicitly code for it. tf.assign_add() and tf.assign_sub() are guaranteed to be thread safe. Still, this is better that nothing: consider this code (tested):
import tensorflow as tf
t1 = tf.Variable(tf.zeros([2,3,4],tf.int32))
t2 = tf.Variable(tf.ones([2,3,4],tf.int32))
assign_op = tf.assign( t1[ 1 ], t1[ 1 ] + t2[ 1 ] )
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run( init_op )
res = sess.run( assign_op )
print( res )
will output:
[[[0 0 0 0]
[0 0 0 0]
[0 0 0 0]]
[[1 1 1 1]
[1 1 1 1]
[1 1 1 1]]]
as desired.

Resources