I have a sparse matrix of type torch.sparse_coo and I want to find the top-k elements of each row. However, the torch.topk() function can't be applied to a sparse matrix.
Could anyone give me some advice on what to do?
Thanks!
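One possible workaround (a sketch, not a dedicated sparse API): a coalesced COO tensor exposes its stored entries via indices() and values(), so you can slice out each row's stored values and run the dense torch.topk on those. Note this only ranks the explicitly stored entries; implicit zeros are ignored, which matters if a row holds fewer than k entries.

```python
import torch

# toy 2x3 sparse COO matrix
i = torch.tensor([[0, 0, 0, 1, 1],
                  [0, 1, 2, 0, 2]])
v = torch.tensor([3., 1., 2., 5., 4.])
m = torch.sparse_coo_tensor(i, v, (2, 3)).coalesce()

rows, cols = m.indices()
vals = m.values()

k = 2
topk_per_row = {}
for r in range(m.size(0)):
    mask = rows == r                  # stored entries belonging to row r
    rv, rc = vals[mask], cols[mask]
    kk = min(k, rv.numel())           # a row may store fewer than k entries
    tv, ti = torch.topk(rv, kk)
    topk_per_row[r] = (tv, rc[ti])    # (top-k values, their column indices)
```

The Python-level loop over rows is not vectorized, but for moderately sized sparse matrices it is usually acceptable.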
Suppose I have the matrices [[1,0],[0,1]] and [[2,0],[0,2]]. I want to combine them horizontally into [[1,0,2,0],[0,1,0,2]]. I cannot find an appropriate constructor in the documentation (https://www.nalgebra.org/docs/user_guide/vectors_and_matrices/#matrix-construction) that would help me solve this problem.
I also tried declaring an empty dynamic matrix (DMatrix) first and then appending rows with insert_row(), but it seems that insert_row() can only insert rows filled with a single constant value.
I want to sort the indexes of an np.ndarray such as
[[.5, .7, .9], [.6, .0, .8]]
so that the result would look like this:
[[1,1],[0,0],[1,0],[0,1],[1,2],[0,2]]
Applying those indexes yields the correct sorting order, and they can also be applied to other structures that match the data's shape.
I tried np.argsort, but it doesn't return multi-dimensional indexes.
You can use np.argsort on the flattened array and then use np.divmod to recover the indexes for your original shape.
Edit: np.unravel_index is the divmod alternative for higher dimensions; see https://numpy.org/doc/stable/reference/generated/numpy.unravel_index.html
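To make the answer concrete, here is a short sketch on the array from the question:

```python
import numpy as np

a = np.array([[.5, .7, .9], [.6, .0, .8]])

flat_order = np.argsort(a, axis=None)        # argsort over the flattened array
idx = np.unravel_index(flat_order, a.shape)  # back to (row, col) coordinates
pairs = np.stack(idx, axis=1)                # one [row, col] pair per element
# pairs is [[1,1],[0,0],[1,0],[0,1],[1,2],[0,2]]
# and a[idx] is the array's values in ascending order
```

The tuple form idx can be used directly as a fancy index into any array of the same shape.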
Can someone please explain how to compute the inner product of two tensors in Python to get a one-dimensional array? For example, I have two tensors with sizes (6,6,6,6,6) and (6,6,6,6), and I need a one-dimensional array of size (6,1) or (1,6).
NumPy has a function tensordot; check it out:
https://numpy.org/doc/stable/reference/generated/numpy.tensordot.html
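For the shapes in the question, contracting the last four axes of the first tensor against all four axes of the second leaves a length-6 vector. A sketch with random data:

```python
import numpy as np

a = np.random.rand(6, 6, 6, 6, 6)
b = np.random.rand(6, 6, 6, 6)

# contract axes 1..4 of a against axes 0..3 of b
out = np.tensordot(a, b, axes=([1, 2, 3, 4], [0, 1, 2, 3]))
# out has shape (6,); reshape to (6, 1) or (1, 6) if needed
```

Which axes you pair up depends on the intended inner product; the axes argument makes that choice explicit.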
I need to multiply all rows of a matrix column by column; I think an example will make it clearer:
the matrix is:
1,2,3
4,5,6
7,8,9
and I need an operation that returns:
28,80,162
But I can't find anything in the documentation or blogs, and other SO questions are only about matrix multiplication and the dot product, which is not what I need here. How can this be achieved in a vectorized fashion (instead of with a for loop)?
For example, this is easy to achieve for a sum:
the_matrix.sum(dim=0)
But there's no such thing as:
the_matrix.mul(dim=0)
I found the solution: there is no
the_matrix.mul(dim=0)
but there is
the_matrix.prod(dim=0)
which does exactly what is needed.
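For the record, on the example matrix:

```python
import torch

the_matrix = torch.tensor([[1., 2., 3.],
                           [4., 5., 6.],
                           [7., 8., 9.]])

# product down each column: 1*4*7, 2*5*8, 3*6*9
col_products = the_matrix.prod(dim=0)  # tensor([ 28.,  80., 162.])
```

As with sum, dim=0 reduces over rows (giving one product per column), and dim=1 would reduce over columns instead.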
What is a general guideline for handling missing categorical feature values when using a Random Forest regressor (or any ensemble learner, for that matter)? I know that scikit-learn has imputation (like a mean strategy or proximity-based filling) for missing numerical values. But how does one handle a missing categorical value, like industry (oil, computer, auto, None) or major (bachelors, masters, doctoral, None)?
Any suggestion is appreciated.
Breiman and Cutler, the inventors of Random Forest, suggest two possible strategies (see http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#missing1):
Random forests has two ways of replacing missing values. The first way
is fast. If the mth variable is not categorical, the method computes
the median of all values of this variable in class j, then it uses
this value to replace all missing values of the mth variable in class
j. If the mth variable is categorical, the replacement is the most
frequent non-missing value in class j. These replacement values are
called fills.
The second way of replacing missing values is computationally more
expensive but has given better performance than the first, even with
large amounts of missing data. It replaces missing values only in the
training set. It begins by doing a rough and inaccurate filling in of
the missing values. Then it does a forest run and computes
proximities.
Alternatively, leaving your label variable aside for a minute, you could train a classifier on rows that have non-null values for the categorical variable in question, using all of your features in the classifier. Then use this classifier to predict values for the categorical variable in question in your 'test set'. Armed with a more complete data set, you can now return to the task of predicting values for your original label variable.
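That classifier-based imputation might look like the following sketch (the feature names, class values, and hyperparameters are all made up for illustration):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# hypothetical data: 'industry' is a categorical feature with missing values
df = pd.DataFrame({
    "salary": [50, 60, 80, 90, 55, 85],
    "years":  [2, 3, 10, 12, 2, 11],
    "industry": ["oil", "oil", "computer", "computer", None, None],
})

features = ["salary", "years"]
known = df[df["industry"].notna()]      # rows to train the imputing classifier on
missing = df[df["industry"].isna()]     # rows whose category we will predict

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(known[features], known["industry"])

# fill the gaps with the classifier's predictions
df.loc[df["industry"].isna(), "industry"] = clf.predict(missing[features])
```

With the categorical column now complete, it can be encoded as usual and fed to the original regressor.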