How to select 5% of total values from a tensor randomly? - pytorch

import torch

a = torch.rand(2, 5, 10)
I want to randomly select at most 5% of the values from tensor a and multiply those values by -1. How can I do that? Kindly give a generic solution, as the shape of the tensor is not fixed.

This worked for me
out = (torch.rand_like(a) - 0.05).sign().type_as(a) * a
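This works because torch.rand_like(a) draws uniform values in [0, 1): subtracting 0.05 leaves roughly 5% of the entries negative, and .sign() maps those to -1 and the rest to +1. A minimal sketch of the same idea, spelled out step by step:

```python
import torch

a = torch.rand(2, 5, 10)
# rand_like(a) < 0.05 holds for ~5% of entries, regardless of a's shape
signs = (torch.rand_like(a) - 0.05).sign().type_as(a)
out = signs * a  # ~5% of the values are negated, the rest unchanged
```

Note this negates 5% of values in expectation, not exactly 5%; for small tensors the actual fraction can vary noticeably.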

Related

How to query a matrix for multiple values and receive value

I am a bit lost with Excel. I have two values
Length
Weight
I am looking for a solution for the problem below:
Take the length and weight from the input values, find the next upper value for both in the matrix, and return the value from the value column. If there is no upper value for my input, I want a fallback that always uses the value 1.0.
How to solve this?
I have a matrix which has these values inside:
You could try:
=IFERROR(INDEX(SORT(SORT(FILTER(MATRIX, (LENGTHS>length)*(WEIGHTS>weight)),1,1),2,1),1,3),1)
, where MATRIX, LENGTHS, and WEIGHTS refer to the data on the right, whilst length and weight refer to the cells containing 80 and 450.
Just seen you said H6 was what I referred to as length - if so:
=IFERROR(INDEX(SORT(SORT(FILTER(K6:M11, (K6:K11>H6)*(L6:L11>H7)),1,1),2,1),1,3),1)
in H8.
You can use XMATCH to get the next value as follow:
=LET(f, FILTER(D2:F7, D2:D7>B1),
IFNA(INDEX(INDEX(f,,3), XMATCH(B2, INDEX(f,,2),1)), 1))
If you want to treat the error when the length condition is not satisfied, you can modify it as follows:
=LET(f, FILTER(D2:F7, D2:D7>B1, ""), IF(@f="", "No match for length condition",
IFNA(INDEX(INDEX(f,,3),XMATCH(B2, INDEX(f,,2),1)), 1)))
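Since the matrix values from the question were not included, here is a sketch of the same lookup logic in plain Python with made-up (length, weight, value) rows, which may help check the formulas against your own data:

```python
# Hypothetical matrix rows: (length, weight, value). The real values were
# not included in the question, so these are placeholders.
matrix = [(50, 300, 0.2), (100, 500, 0.5), (150, 700, 0.8)]

def lookup(length, weight, matrix):
    # keep only rows where BOTH dimensions are strictly larger than the input
    candidates = [row for row in matrix if row[0] > length and row[1] > weight]
    if not candidates:
        return 1.0  # fallback when there is no upper value
    # the smallest qualifying row is the "next upper value"
    return min(candidates, key=lambda r: (r[0], r[1]))[2]
```

For the question's inputs of length 80 and weight 450, this picks the (100, 500) row; with inputs larger than every row it falls back to 1.0.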

How can I apply dropout per row of a 2d tensor in pytorch

I have a (relatively sparse) 2d tensor U of shape (B, I) of 1s and 0s. Each row represents a user and each column an item where the cell is 1 if the user has interacted with said item and 0 if not.
I want to apply dropout (or a similar tensor operation to it) so that, at random, p% of the 1s in each row (i.e. per user) are set to 0.
How can I go about doing that efficiently without a for-loop along the B dimension (where I would just use pytorch's dropout on the row 1d tensors, after accounting for the 0s)?
If I understand the question correctly, you want to build the network out manually? One way to do this would be to create a boolean array (the same size as your weights) each run, then multiply it with the weights before using them.
import torch

dropout = torch.randint(2, (10,))  # random 0/1 mask
weights = torch.randn(10)
dr_wt = dropout * weights          # zeroes roughly half the weights
Edit
You can create an array with 10% 1s and the rest 0s, then shuffle it every run before multiplying it with the weights.
import numpy as np
import torch

a = np.zeros(10)
a[0] = 1              # exactly 10% ones
np.random.shuffle(a)  # random position each run
a = torch.as_tensor(a)
Correct me if I'm wrong, but if you say you want p% of the 1s to turn into 0s per row, then row 0 might have 10 ones while row 1 has 100. When you apply dropout, on average only one of the 1s in the first row gets affected by the mask, while about 10 get affected in the second row.
import torch
from torch import Tensor

def dropout(input: Tensor, p: float = 0.5):
    mask = torch.rand_like(input) > p  # bool tensor; True keeps the element
    return input * mask
I don't know how you would be able to guarantee that exactly 10% get nulled without using some sort of row-based sampling of nonzero indices, which in turn requires a for loop.
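As a quick sanity check, the mask idea can be applied directly to a 2D 0/1 interaction matrix; as noted, the fraction dropped per row is only p in expectation, not exactly p. A self-contained sketch (the shape and sparsity here are made-up):

```python
import torch

def dropout(input: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    # keep each entry with probability 1 - p
    mask = torch.rand_like(input) > p
    return input * mask

torch.manual_seed(0)
U = (torch.rand(4, 1000) < 0.3).float()  # hypothetical sparse 0/1 interactions
dropped = dropout(U, p=0.1)
# per-row fraction of surviving 1s: ~0.9 in expectation, but not exact
survival = dropped.sum(dim=1) / U.sum(dim=1)
```

Only existing 1s can be zeroed (0 * mask stays 0), so no loop over the B dimension is needed for the approximate version.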

Math.Net Weighted Least Squares dimensions issue

I'm trying to run a univariate WLS - using WeightedRegression.Weighted(X,y,W) - and am getting the error message Matrix dimensions must agree: op1 is 5836x1, op2 is 5836x1. Whether I make Y a column vector or a matrix (with only one column) does not matter.
From the error message, you can see the matrix (or the vector and matrix) dimensions agree - both have 5836 rows and 1 column.
What am I doing wrong?
What are the dimensions of your weight matrix? I think it expects a diagonal matrix, so in your case 5836x5836.
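As a sketch of the shapes involved (in numpy, not the Math.NET API), weighted least squares with n observations takes the weights as an n-by-n diagonal matrix, not an n-by-1 column:

```python
import numpy as np

n = 6  # stand-in for the 5836 observations
rng = np.random.default_rng(0)
X = rng.normal(size=(n, 1))                    # design matrix: n x 1
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=n)
w = rng.uniform(0.5, 1.5, size=n)              # per-observation weights
W = np.diag(w)                                 # n x n diagonal, NOT n x 1

# closed-form WLS: beta = (X' W X)^-1 X' W y
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

Passing the weights as an n-by-1 column would make the X' W product dimensionally inconsistent, which is the kind of mismatch the error message is complaining about.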

Apply this function to a 2D numpy matrix using vector operations only

guys, I have this function
def averageRating(a, b):
    avg = (float(a) + float(b)) / 2
    return round(avg / 25) * 25
Currently, I am looping over my np array, which is just a 2D array of numerical values. What I want is to have "a" be the 1st array and "b" be the 2nd array, take the average per row, and return just an array of the resulting values. I have used mean, but could not find a way to keep the round(avg / 25) * 25 part.
My goal is to get rid of the looping and replace it with vectorized operations, because of how slow looping is.
Sorry for the question, I'm new to Python and numpy.
import numpy as np

def averageRating(a, b):
    avg = (np.average(a, axis=1) + np.average(b, axis=1)) / 2
    return np.round(avg / 25) * 25
This should do what you are looking for, if I understand the question correctly. Specifying axis=1 in np.average gives the average of each row (axis=0 would average the columns), and np.round(avg / 25) * 25 keeps the round-to-the-nearest-25 behaviour from your original function. Hope that helps!
def averageRating(a, b):
    averages = []
    for i in range(len(a)):
        averages.append((a[i] + b[i]) / 2)
    return averages
Given that your arrays are of equal length, this is a simple solution.
It doesn't eliminate the for-loop, but it is computationally cheaper than the current approach.
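If the intent is an elementwise average of two equal-shape arrays, the loop can be removed entirely with numpy broadcasting, while keeping the round-to-the-nearest-25 step from the original function. A sketch:

```python
import numpy as np

def average_rating(a, b):
    # elementwise average of two equal-shape arrays (or lists)
    avg = (np.asarray(a, dtype=float) + np.asarray(b, dtype=float)) / 2
    # quantize to the nearest multiple of 25, as in the original function
    return np.round(avg / 25) * 25

average_rating([10, 60], [30, 90])  # → array([25., 75.])
```

np.round rounds halfway cases to the nearest even value, so ties land differently than Python 2's round; for this quantization that is usually acceptable.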

Normalizing Vectors with Negative values

I want to represent each text-based item I have in my system as a vector in a vector space model. The values of the terms can be negative or positive, reflecting the frequency of the term in the positive or negative class; a zero value means neutral.
for example:
Item1 (-1,0,-5,4.5,2)
Item2 (2,6,0,-4,0.5)
My questions are:
1- How can I normalize my vectors to the range [0, 1] such that:
0.5 corresponds to zero before normalization,
values greater than 0.5 correspond to positive values, and
values less than 0.5 correspond to negative values?
I want to know if there is a mathematical formula to do such a thing.
2- Will the choice of similarity measure be different after the normalization? For example, can I still use cosine similarity?
3- Will it be difficult if I perform dimensionality reduction after the normalization?
Thanks in advance
One solution could be to use MinMaxScaler, which scales each feature into the [0, 1] range, and then divide each row by its sum. In Python, using sklearn, you can do something like this:
from sklearn.preprocessing import MinMaxScaler, normalize

scaler = MinMaxScaler()               # scales each column into [0, 1]
scaled_X = scaler.fit_transform(X)    # X is your item-by-term matrix
normalized_X = normalize(scaled_X, norm='l1', axis=1, copy=True)  # each row sums to 1
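Using the two example items from the question, the above could be run as follows (note that MinMaxScaler works per column, so this does not by itself map every pre-normalization zero to exactly 0.5):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, normalize

# Item1 and Item2 from the question
X = np.array([[-1, 0, -5, 4.5, 2],
              [ 2, 6,  0, -4, 0.5]])

scaler = MinMaxScaler()                 # each column scaled into [0, 1]
scaled_X = scaler.fit_transform(X)
normalized_X = normalize(scaled_X, norm='l1', axis=1)  # each row sums to 1
```

After the L1 step, cosine similarity remains usable since all values are non-negative, but distances between items are no longer the same as in the original signed space.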
