PyTorch: find unique vectors in a tensor

I have a tensor containing binary values, e.g.:
T1 = torch.tensor([[1., 0., 1.],
[0., 1., 0.],
[1., 0., 1.]])
I need to convert this to:
tensor([[1., 0., 1.],
[0., 1., 0.]])
I looked into torch.unique, but it seems to work only on individual values.
Is there a way to apply the unique operation across entire vectors (rows)?

The PyTorch documentation is not very clear about it, but the dim parameter of torch.unique achieves this. I found the solution in the post "Delete duplicated rows in torch.tensor".
Hence
torch.unique(T1, dim=0)
solves the problem.
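A short sketch of the call above. One thing to watch, as far as I know: with dim set, torch.unique returns the unique rows in sorted order rather than in order of first appearance, so [0., 1., 0.] may be listed first. If the original order matters, the inverse indices can be used to recover it. The names first_idx and ordered are only illustrative.

```python
import torch

T1 = torch.tensor([[1., 0., 1.],
                   [0., 1., 0.],
                   [1., 0., 1.]])

# Unique rows; inverse[i] is the index in unique_rows of row i of T1.
unique_rows, inverse = torch.unique(T1, dim=0, return_inverse=True)

# Position of the first occurrence of each unique row in T1.
first_idx = torch.stack([(inverse == k).nonzero()[0, 0]
                         for k in range(unique_rows.size(0))])

# Unique rows, listed in order of first appearance in T1.
ordered = T1[first_idx.sort().values]

print(unique_rows)
print(ordered)
```

The Python loop runs once per unique row, which is fine for small tensors; for very large inputs a vectorized reduction would be preferable.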

Related

Scikit learn preprocessing cannot understand the output using min_frequency argument in OneHotencoder class

Consider the array t below. When using the min_frequency kwarg of the OneHotEncoder class, I cannot understand why the category snake is still present when transforming a new array. Only 2 of the 40 samples have this label, so shouldn't the shape of e be (4, 3) instead?
import numpy as np
from sklearn.preprocessing import OneHotEncoder
# sklearn.__version__ == '1.1.1'
t = np.array([['dog'] * 8 + ['cat'] * 20 + ['rabbit'] * 10 +
              ['snake'] * 2], dtype=object).T
enc = OneHotEncoder(min_frequency=4/40,
                    sparse=False).fit(t)
print(enc.infrequent_categories_)
# [array(['snake'], dtype=object)]
e = enc.transform(np.array([['dog'], ['cat'], ['dog'], ['snake']]))
array([[0., 1., 0., 0.],
[1., 0., 0., 0.],
[0., 1., 0., 0.],
[0., 0., 0., 1.]]) # snake is present?
Check out enc.get_feature_names_out():
array(['x0_cat', 'x0_dog', 'x0_rabbit', 'x0_infrequent_sklearn'],
dtype=object)
"snake" isn't considered its own category anymore, but lumped into the infrequent category. If you added some other rare categories, they'd be assigned to the same, and if you additionally set handle_unknown="infrequent_if_exist", you would also encode unseen categories to the same.

UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach()

I'm new to PyTorch and I'm trying to write some code with it.
I have a function called OH which takes a number and returns a vector like this:
def OH(x,end=10,l=12):
    x = T.LongTensor([[x]])
    end = T.LongTensor([[end]])
    one_hot_x = T.FloatTensor(1,l)
    one_hot_end = T.FloatTensor(1,l)
    first=one_hot_x.zero_().scatter_(1,x,1)
    second=one_hot_end.zero_().scatter_(1,end,1)
    vector=T.cat((one_hot_x,one_hot_end),dim=1)
    return vector
OH(0)
output:
tensor([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 1., 0.]])
Now I have a NN that takes this output and returns a number, but this warning always appears when I run it:
online.act(OH(obs))
output:
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:17: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
4
I tried to use online.act(OH(obs).clone().detach()), but it gives me the same warning.
The code works fine and gives good results, but I need to understand this warning.
Edit
The following is my NN, which has the act function:
class Network(nn.Module):
    def __init__(self,lr,n_action,input_dim):
        super(Network,self).__init__()
        self.f1=nn.Linear(input_dim,128)
        self.f2=nn.Linear(128,64)
        self.f3=nn.Linear(64,32)
        self.f4=nn.Linear(32,n_action)
        #self.optimizer=optim.Adam(self.parameters(),lr=lr)
        #self.loss=nn.MSELoss()
        self.device=T.device('cuda' if T.cuda.is_available() else 'cpu')
        self.to(self.device)
    def forward(self,x):
        x=F.relu(self.f1(x))
        x=F.relu(self.f2(x))
        x=F.relu(self.f3(x))
        x=self.f4(x)
        return x
    def act(self,obs):
        state=T.tensor(obs).to(device)
        actions=self.forward(state)
        action=T.argmax(actions).item()
        return action
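The problem is that act already receives a tensor (the output of OH) and then passes it to T.tensor again, which copy-constructs a new tensor and triggers the warning. Calling .clone().detach() on the argument does not help, because the T.tensor call inside act is still executed.
Just remove the T.tensor call in act, like this:
def act(self,obs):
    #state=T.tensor(obs).to(device)
    state=obs.to(device)
    actions=self.forward(state)
    action=T.argmax(actions).item()
    return action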
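If act may sometimes receive a plain list or NumPy array instead of a tensor, one possible alternative is torch.as_tensor, which converts non-tensor input and reuses tensor input instead of copy-constructing it. The sketch below uses a hypothetical helper to_state, not part of the original code, and records warnings to show that none are raised.

```python
import warnings
import torch

def to_state(obs, device="cpu"):
    # as_tensor converts lists/arrays and passes through tensors that
    # already have the requested dtype and device, without a copy warning.
    return torch.as_tensor(obs, dtype=torch.float32, device=device)

one_hot = torch.zeros(1, 24)
one_hot[0, 0] = 1.0

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    state_from_tensor = to_state(one_hot)           # tensor input
    state_from_list = to_state([[1.0, 0.0, 0.0]])   # plain list input

print(len(caught))  # number of warnings raised by the two calls
```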

How to interpret ACF and PACF functions from statsmodels?

I'm trying to determine p and q values for an ARMA model. The time series is already stationary and I was looking at ACF and PACF plots, but I need to get those p and q values "on the go" (like when performing a simulation).
I noticed that statsmodels actually has two functions, acf and pacf, but I don't understand how to use them properly.
This is what the code looks like:
from statsmodels.tsa.stattools import acf, pacf
>>>acf(data,qstat=True)
(array([1. , 0.98707179, 0.9809318 , 0.9774078 , 0.97436479,
0.97102392, 0.96852746, 0.96620799, 0.9642253 , 0.96288455,
0.96128443, 0.96026672, 0.95912503, 0.95806287, 0.95739194,
0.95622575, 0.9545498 , 0.95381055, 0.95318588, 0.95203675,
0.95096276, 0.94996035, 0.94892427, 0.94740811, 0.94582933,
0.94420572, 0.9420396 , 0.9408416 , 0.93969163, 0.93789606,
0.93608273, 0.93413445, 0.93343312, 0.93233588, 0.93093149,
0.93033546, 0.92983324, 0.92910616, 0.92830326, 0.92799811,
0.92642784]),
array([ 2916.11296684, 5797.02377904, 8658.22999328, 11502.6002944 ,
14328.44503612, 17140.72034976, 19940.48013538, 22729.69637912,
25512.09429552, 28286.18290207, 31055.33003897, 33818.82409725,
36577.1270353 , 39332.49361223, 42082.0755955 , 44822.94911057,
47560.49941212, 50295.38504714, 53024.59880222, 55748.57526173,
58467.72758802, 61181.8659989 , 63888.25003765, 66586.53110019,
69276.46332225, 71954.97102175, 74627.57217707, 77294.54406888,
79952.23080669, 82600.54514273, 85238.73829645, 87873.86209917,
90503.68343426, 93126.47509834, 95746.79574474, 98365.17422285,
100980.34471949, 103591.88164688, 106202.58634768, 108805.3453693 ]),
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0.]))
>>>pacf(data)
array([ 1. , 0.98740203, 0.26463067, 0.18709112, 0.11351714,
0.0540612 , 0.06996315, 0.05159168, 0.05358487, 0.06867607,
0.03915513, 0.06099868, 0.04020074, 0.0390229 , 0.05198753,
0.01873783, -0.00169158, 0.04387457, 0.03770717, 0.01360295,
0.01740693, 0.01566421, 0.01409722, -0.00988412, -0.00860644,
-0.00905181, -0.0344616 , 0.0199406 , 0.01123293, -0.02002155,
-0.01415968, -0.0266674 , 0.03583483, 0.0065682 , -0.00483241,
0.0342638 , 0.02353691, 0.01704061, 0.01292073, 0.03163407,
-0.02838961])
How can I get p and q with these functions? The acf function returns only one array if qstat is set to False.
Selecting the order of an ARMA(p,q) model using estimated ACFs/PACFs is usually not the best approach. This is simply because in case of an ARMA process both the ACF and PACF slowly decay (in absolute terms) for increasing lags. So you cannot really infer the lag order from it. Instead they are mostly used for pure AR/MA models in which you observe a clear cutoff in either of the two series (but even then it is more of a graphical approach).
If you want to determine p and q "on the fly" for an ARMA model it seems more reasonable to use information criteria (e.g. AIC, BIC, etc.). statsmodels provides the function arma_order_select_ic() for this very purpose. So what you want is something like this:
from statsmodels.tsa.stattools import arma_order_select_ic
arma_order_select_ic(data, max_ar=4, max_ma=4, ic='bic')

How do I create a torch diagonal matrices with different element in each batch?

I want to create a tensor like
tensor([[[1,0,0],[0,1,0],[0,0,1]],[[2,0,0],[0,2,0],[0,0,2]]])
That is, when a torch tensor B of size (1,n) is given, I want to create a torch tensor A of size (n,3,3) such that A[i] is B[i] * (the 3x3 identity matrix).
How do I create this without using a for loop?
Use torch.einsum (Einstein summation notation):
A = torch.eye(3)
b = torch.tensor([1.0, 2.0, 3.0])
torch.einsum('ij,k->kij', A, b)
This returns:
tensor([[[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]],
[[2., 0., 0.],
[0., 2., 0.],
[0., 0., 2.]],
[[3., 0., 0.],
[0., 3., 0.],
[0., 0., 3.]]])
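For comparison, the same result can also be obtained with plain broadcasting, which some readers may find easier to follow than the einsum string. This is only a sketch; if B has shape (1, n) as in the question, it would first need flattening, e.g. with B.reshape(-1).

```python
import torch

b = torch.tensor([1.0, 2.0, 3.0])
eye = torch.eye(3)

# b[:, None, None] has shape (n, 1, 1); multiplying by the (3, 3)
# identity broadcasts to shape (n, 3, 3).
A = b[:, None, None] * eye

# Same result via einsum, as in the answer above.
A_einsum = torch.einsum('ij,k->kij', eye, b)

print(A.shape)
print(torch.equal(A, A_einsum))
```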

How to slice a matrix with a boolean condition?

I can apply the following code to an array.
from numpy import *
A = eye(4)
A[A[:,1] > 0.5,:]
But how can I apply a similar method to a mat?
A = mat(eye(4))
A[A[:,1] > 0.5,:]
I know the above code is wrong, but what should I do instead?
The problem is that, when A is a numpy.matrix, A[:,1] returns a 2-d matrix, and therefore A[:,1] > 0.5 is also 2-d. Anything that makes this expression look like what is created when A is an ndarray will work. For example, you can write A.A[:,1] > 0.5 (the .A attribute returns an ndarray view of the matrix), or (A[:,1] > 0.5).A1 (the .A1 attribute returns a flattened ndarray).
For example,
In [119]: A
Out[119]:
matrix([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.]])
In [120]: A[(A[:, 1] > 0.5).A1,:]
Out[120]: matrix([[ 0., 1., 0., 0.]])
In [121]: A[A.A[:, 1] > 0.5,:]
Out[121]: matrix([[ 0., 1., 0., 0.]])
Because of quirks like these, I (and many others) recommend avoiding the numpy.matrix class. Most code can be written just as easily by using ndarrays throughout.
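A side-by-side sketch of the two cases, showing the shape difference that causes the problem. It uses np.asmatrix rather than mat, since to my knowledge the np.mat alias was removed in NumPy 2.0; on older versions mat(eye(4)) behaves the same way.

```python
import numpy as np

# ndarray: the column slice is 1-D, so the mask selects whole rows.
A = np.eye(4)
mask_1d = A[:, 1] > 0.5          # shape (4,)
rows = A[mask_1d, :]

# matrix: the column slice stays 2-D, so the mask must be flattened.
M = np.asmatrix(np.eye(4))
mask_2d = M[:, 1] > 0.5          # shape (4, 1)
rows_m = M[mask_2d.A1, :]        # .A1 gives a flat ndarray mask

print(mask_1d.shape, mask_2d.shape)
print(rows)
print(rows_m)
```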
