How can a tensor be flipped in Theano?

Given a tensor v = t.vector(), how can I flip it?
For instance, [1, 2, 3, 4, 5, 6] flipped is [6, 5, 4, 3, 2, 1].

You can simply do v[::-1].eval(), or just v[::-1] in the middle of your computational graph.
Minimal example:
import numpy as np
import theano
from theano import tensor as T
# note: shared lives on the theano module itself, not on theano.tensor
X_values = np.arange(10).astype(theano.config.floatX)
X = theano.shared(X_values, 'X')
print(X.eval())        # original order: 0 ... 9
print(X[::-1].eval())  # reversed: 9 ... 0
See the indexing section of the Theano documentation for more details.

OK, I know I'm late to the party here, but I've just started playing with Theano and thought I'd throw in this variation, since a shared variable isn't actually necessary:
from theano import tensor as T
from theano import function as Tfunc
z = T.vector()           # symbolic input vector
f = Tfunc([z], z[::-1])  # compiled function that reverses it
This gives:
>>> f([1,3,5,7,9])
array([ 9., 7., 5., 3., 1.])
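The same slicing generalizes to matrices and higher-rank tensors, since Theano supports NumPy-style basic indexing. A small sketch of flipping a matrix along each axis (the variable names are mine, not from the answers above):
import numpy as np
import theano
from theano import tensor as T
m = T.matrix('m')
flip_rows = theano.function([m], m[::-1])     # reverse along axis 0
flip_cols = theano.function([m], m[:, ::-1])  # reverse along axis 1
data = np.arange(6).reshape(2, 3).astype(theano.config.floatX)
print(flip_rows(data))  # rows swapped
print(flip_cols(data))  # each row reversed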

Related

Large rounding errors in python plots

I am trying to plot the simple sequence
$a_n = \frac{3^n + 1}{7^n + 8}$,
which should tend to 0, but the plot shows a weird effect for values of $n$ near 20.
I use the code
import numpy as np
import matplotlib.pyplot as plt
def f(n):
    return (3**n + 1) / (7**n + 8)
n = np.arange(0, 25, 1)
plt.plot(n, f(n), 'bo-')
On the other hand, computing the sequence numerically does not produce such large values:
for i in range(0, 25):
    print([i, f(i)])
[0, 0.2222222222222222]
[1, 0.26666666666666666]
[2, 0.17543859649122806]
[3, 0.07977207977207977]
[4, 0.034039020340390205]
[5, 0.014510853404698185]
[6, 0.0062044757218015075]
[7, 0.0026567874970706124]
[8, 0.0011382857610720493]
[9, 0.00048778777316480816]
[10, 0.00020904485804220367]
[11, 8.958964415487241e-05]
[12, 3.8395417418579486e-05]
[13, 1.6455158259653074e-05]
[14, 7.05220773432529e-06]
[15, 3.022374322043928e-06]
[16, 1.295303220696569e-06]
[17, 5.551299431298911e-07]
[18, 2.3791283154177113e-07]
[19, 1.0196264191387531e-07]
[20, 4.3698275080881505e-08]
[21, 1.872783217393992e-08]
[22, 8.026213788319863e-09]
[23, 3.439805909206865e-09]
[24, 1.4742025325067883e-09]
Why is this happening?
The issue is not with matplotlib but with the datatype of the numbers that arange produces. You are not specifying a dtype, so, as the arange docs state, it is inferred from the inputs. Your inputs are integers, so the array gets a fixed-width integer dtype (32-bit on this platform), which you can verify by checking the type:
print(type(n[0]))
<class 'numpy.int32'>
If the dtype is changed to single-precision floats, you get the behavior you expect:
n = np.arange(0,25,1, dtype=np.float32)
print(type(n[0]))
<class 'numpy.float32'>
plt.plot(n,f(n),'bo-')
Alternatively, you could simply append a decimal point to the 1 (i.e. 1.) so that double-precision floats are inferred, even though the resulting array contains integer-valued numbers ([0., 1., 2., ...]).
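To see concretely why the plot misbehaves: around $n = 20$, both 3**n and 7**n exceed the 32-bit integer range, so the fixed-width arithmetic wraps around and the quotient becomes garbage. A minimal sketch of the wrap-around (the exact wrapped value shown is what int32 produces):
import numpy as np
print(3 ** 20)                 # 3486784401, exact: Python ints are arbitrary precision
n = np.arange(0, 25, 1).astype(np.int32)
print((3 ** n)[20])            # wraps around int32, e.g. -808182895
print(np.iinfo(np.int32).max)  # 2147483647, the largest representable int32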

Using Pytorch how to define a tensor with indices and corresponding values

Problem
I have a list of indices and a list of values like so:
i = torch.tensor([[2, 2, 1], [2, 0, 2]])
v = torch.tensor([1, 2, 3])
I want to define a (3x3 for the example) matrix which contains the values v at the indices i (1 at position (2,2), 2 at position (2, 0) and 3 at position (1,2)):
tensor([[0, 0, 0],
        [0, 0, 3],
        [2, 0, 1]])
What I have tried
I can do it using a trick, with torch.sparse and .to_dense(), but I feel that it's neither the "pytorchic" way to do it nor the most efficient:
f = torch.sparse.FloatTensor(i, v, torch.Size([3, 3]))
print(f.to_dense())
Any idea for a better solution?
Ideally I would appreciate a solution at least as fast as the one provided above.
Of course this was just an example; no particular structure in the tensors i and v is assumed (nor in their dimensions).
There is an alternative, as below:
import torch
i = torch.tensor([[2, 2, 1], [2, 0, 2]])
v = torch.tensor([1, 2, 3], dtype=torch.float)   # enforcing the same data type
target = torch.zeros([3, 3], dtype=torch.float)  # enforcing the same data type
target.index_put_(tuple([k for k in i]), v)      # rows of i act as per-dimension indices
print(target)
The target tensor will be as follows:
tensor([[0., 0., 0.],
        [0., 0., 3.],
        [2., 0., 1.]])
This medium.com blog article provides a comprehensive list of all index functions for PyTorch Tensors.
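For the common case where i has one row per target dimension, plain advanced indexing does the same job and is arguably the most "pytorchic" option. A quick sketch under that assumption:
import torch
i = torch.tensor([[2, 2, 1], [2, 0, 2]])
v = torch.tensor([1, 2, 3], dtype=torch.float)
target = torch.zeros(3, 3)
target[i[0], i[1]] = v  # row 0 of i indexes dim 0, row 1 indexes dim 1
print(target)
Note that with duplicate indices plain assignment keeps an arbitrary one of the written values; index_put_ with accumulate=True sums them instead.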

LDA covariance matrix not match calculated covariance matrix

I'm looking to better understand the covariance_ attribute returned by scikit-learn's LDA object.
I'm sure I'm missing something, but I expect it to be the covariance matrix associated with the input data. However, when I compare .covariance_ against the covariance matrix returned by numpy.cov(), I get different results.
Can anyone help me understand what I am missing? Thanks and happy to provide any additional information.
Please find a simple example illustrating the discrepancy below.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
# Sample Data
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 0, 0, 0])
# Covariance matrix via np.cov
print(np.cov(X.T))
# Covariance matrix via LDA
clf = LinearDiscriminantAnalysis(store_covariance=True).fit(X, y)
print(clf.covariance_)
In sklearn.discriminant_analysis.LinearDiscriminantAnalysis, the covariance is computed as follows:
In [1]: import numpy as np
   ...: cov = np.zeros(shape=(X.shape[1], X.shape[1]))
   ...: for c in np.unique(y):
   ...:     Xg = X[y == c, :]
   ...:     cov += np.count_nonzero(y == c) / len(y) * np.cov(Xg.T, bias=1)
   ...: cov
Out[1]:
array([[0.66666667, 0.33333333],
       [0.33333333, 0.22222222]])
So it corresponds to the sum of the covariance matrices of the individual classes, each weighted by its prior, which by default is the class frequency. Note that the priors are a parameter of LinearDiscriminantAnalysis.
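To double-check, one can compare the two directly; a quick sketch reusing X, y, and clf from the question and cov from the snippet above:
import numpy as np
print(np.allclose(clf.covariance_, cov))  # expected to print True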

sklearn train_test_split returns some elements in both test/train

I have a data-set X with 260 unique observations.
When running x_train, x_test, _, _ = train_test_split(X, y, test_size=0.2), I would assume that
[p for p in x_test if p in x_train] would be empty, but it is not. Actually, it turns out that only two observations in x_test are not in x_train.
Is that intended, or...?
EDIT (posted the data I am using):
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split as split
import numpy as np
DATA = load_breast_cancer()
X = DATA.data
y = DATA.target
y = np.array([1 if p == 0 else 0 for p in DATA.target])
x_train, x_test, y_train, y_test = split(X, y, test_size=0.2, stratify=y, random_state=42)
len([p for p in x_test if p in x_train])  # is not 0
EDIT 2.0: Showing that the test works
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[1, 2, 3], [11, 12, 13]])
len([p for p in a if p in b])  # 1
This is not a bug in sklearn's implementation of train_test_split, but a peculiarity of how the in operator works on NumPy arrays: it performs an elementwise comparison between the two arrays and returns True if ANY of the elements match.
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[6, 7, 8], [5, 5, 5]])
a in b # True
The correct way to test for this kind of overlap is using the equality operator and np.all and np.any. As a bonus, you also get the indices that overlap for free.
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[6, 7, 8], [5, 5, 5], [7, 8, 9]])
a in b # True
z = np.any(np.all(a == b[:, None, :], -1)) # False
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[6, 7, 8], [1, 2, 3], [7, 8, 9]])
a in b # True
overlap = np.all(a == b[:, None, :], -1)
z = np.any(overlap) # True
indices = np.nonzero(overlap)  # (array([1]), array([0])): b[1] matches a[0]
You need to check it using the following:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split as split
import numpy as np
DATA = load_breast_cancer()
X = DATA.data
y = DATA.target
y = np.array([1 if p == 0 else 0 for p in DATA.target])
x_train, x_test, y_train, y_test = split(X, y, test_size=0.2, stratify=y, random_state=42)
len([p for p in x_test.tolist() if p in x_train.tolist()])
0
With x_test.tolist(), the in operator compares whole rows by value, as intended.
Reference: testing whether a Numpy array contains a given row
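If exact duplicate rows are all you care about, another safe option is to compare the rows as tuples; a quick sketch using the arrays from EDIT 2.0:
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[1, 2, 3], [11, 12, 13]])
# tolist() yields plain Python ints, so the tuples hash and compare cleanly
overlap = set(map(tuple, a.tolist())) & set(map(tuple, b.tolist()))
print(overlap)  # {(1, 2, 3)}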

array-like shape (n_samples,) vs [n_samples] in sklearn documents

For sample_weight, the required shape is sometimes given as array-like, shape (n_samples,) and sometimes as array-like, shape [n_samples]. Does (n_samples,) mean a 1d array and [n_samples] a list? Or are they equivalent to each other?
Both forms can be seen here: http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html
You can use a simple example to test this:
import numpy as np
from sklearn.naive_bayes import GaussianNB
#create some data
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])
#create the model and fit it
clf = GaussianNB()
clf.fit(X, Y)
#check the type of some attributes
type(clf.class_prior_)
type(clf.class_count_)
#check the shapes of these attributes
clf.class_prior_.shape
clf.class_count_
Or check more explicitly:
#verify that it is a numpy nd array and NOT a list
isinstance(clf.class_prior_, np.ndarray)
isinstance(clf.class_prior_, list)
Similarly, you can check all the attributes.
Results
numpy.ndarray
numpy.ndarray
(2,)
array([ 3., 3.])
True
False
The results indicate that these attributes are NumPy ndarrays. In the documentation, (n_samples,) and [n_samples] are used interchangeably: both describe a one-dimensional array-like of length n_samples, and array-like means a list is accepted just as well as an ndarray.
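A quick sketch confirming that both forms are accepted for sample_weight, reusing X and Y from above (the weight values themselves are arbitrary):
clf = GaussianNB()
clf.fit(X, Y, sample_weight=[1, 1, 1, 1, 1, 2])  # plain Python list
clf.fit(X, Y, sample_weight=np.ones(6))          # 1d NumPy array
Both calls succeed; internally scikit-learn converts array-likes to ndarrays before fitting.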
