PyTorch error isDifferentiableType(variable.scalar_type()) for calculating det of a complex matrix - pytorch

Following up on this post: when I try to use the complex_det function to calculate the determinant of a complex matrix, I get this error:
RuntimeError: isDifferentiableType(variable.scalar_type()) INTERNAL ASSERT FAILED at "/pytorch/torch/csrc/autograd/functions/utils.h":59, please report a bug to PyTorch.
Any idea how I can fix it?
<ipython-input-76-246d142f8871> in complex_det(A)
3 return torch.view_as_complex(torch.stack((A.real.diag(), A.imag.diag()),dim=1))
4 #Perform LU decomposition to matrix A:
----> 5 A_LU, pivots = A.lu()
6 P, A_L, A_U = torch.lu_unpack(A_LU, pivots)
7 #Det. of multiplied matrices is multiplcation of det.:
/usr/local/lib/python3.6/dist-packages/torch/tensor.py in lu(self, pivot, get_infos)
332 r"""See :func:`torch.lu`"""
333 # If get_infos is True, then we don't need to check for errors and vice versa
--> 334 LU, pivots, infos = torch._lu_with_info(self, pivot=pivot, check_errors=(not get_infos))
335 if get_infos:
336 return LU, pivots, infos
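For reference, on recent PyTorch releases (roughly 1.8 and later) torch.linalg.det accepts complex inputs directly and supports autograd, which sidesteps the manual LU-based complex_det workaround. A minimal sketch:
import torch

# torch.linalg.det handles complex dtypes and is differentiable on newer PyTorch,
# so no manual LU decomposition is needed
A = torch.randn(3, 3, dtype=torch.complex64, requires_grad=True)
d = torch.linalg.det(A)
d.abs().backward()  # reduce to a real scalar before calling backward
print(d, A.grad)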

Related

type error in functions to run point in polygon query on RAPIDS

I want to create a point-in-polygon query for 14 million NYC taxi trips and find out in which of the 263 taxi zones each trip was located.
I want to run the code on RAPIDS cuspatial. I read a few forums and posts and came across a cuspatial limitation: users can only query 32 polygons in each run. So I did the following to split my polygons into batches.
This is my taxi zone polygon file
cusptaxizone
(0 0
1 1
2 34
3 35
4 36
...
258 348
259 349
260 350
261 351
262 353
Name: f_pos, Length: 263, dtype: int32,
0 0
1 232
2 1113
3 1121
4 1137
...
349 97690
350 97962
351 98032
352 98114
353 98144
Name: r_pos, Length: 354, dtype: int32,
x y
0 933100.918353 192536.085697
1 932771.395560 191317.004138
2 932693.871591 191245.031174
3 932566.381345 191150.211914
4 932326.317026 190934.311748
... ... ...
98187 996215.756543 221620.885314
98188 996078.332519 221372.066989
98189 996698.728091 221027.461362
98190 997355.264443 220664.404123
98191 997493.322715 220912.386162
[98192 rows x 2 columns])
There are 263 polygons/taxi zones in total. I want to run the queries in 24 batches with 11 polygons in each iteration.
def create_iterations(start, end, batches):
    iterations = list(np.arange(start, end, batches))
    iterations.append(end)
    return iterations
pip_iterations = create_iterations(0, 264, 24)
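(As an aside, the third argument here is the batch size, not the number of batches; a quick check of the boundaries this produces:)
boundaries = create_iterations(0, 264, 24)
print(boundaries)  # 0, 24, 48, ..., 240, 264 -> eleven batches of 24 polygons each
# for 24 batches of 11 polygons, a step of 11 would be needed:
# create_iterations(0, 264, 11)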
#loop to do point in polygon query in a table
def perform_pip(cuda_df, cuspatial_data, polygon_name, iter_batch):
    cuda_df['borough'] = " "
    for i in range(len(iter_batch) - 1):
        start = pip_iterations[i]
        end = pip_iterations[i+1]
        pip = cuspatial.point_in_polygon(cuda_df['pickup_longitude'], cuda_df['pickup_latitude'],
                                         cuspatial_data[0][start:end], #poly_offsets
                                         cuspatial_data[1], #poly_ring_offsets
                                         cuspatial_data[2]['x'], #poly_points_x
                                         cuspatial_data[2]['y'] #poly_points_y
                                         )
        for i in pip.columns:
            cuda_df['borough'].loc[pip[i]] = polygon_name[i]
    return cuda_df
When I ran the function I received a type error. I wonder what might cause the issue?
pip_pickup = perform_pip(cutaxi, cusptaxizone, pip_iterations)
TypeError: perform_pip() missing 1 required positional argument: 'iter_batch'
It seems like you are passing cutaxi for cuda_df, cusptaxizone for cuspatial_data, and pip_iterations for the polygon_name parameter of the perform_pip function. No value is passed for iter_batch, which is defined in the perform_pip signature:
def perform_pip(cuda_df, cuspatial_data, polygon_name, iter_batch):
Hence you get the above error, which states that iter_batch is missing. As noted in the comment above as well, you are not passing the right number of arguments to the perform_pip function.
If you edit your code to pass the right number of arguments to perform_pip, the error
TypeError: perform_pip() missing 1 required positional argument: 'iter_batch'
will be resolved, as sketched below.
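A minimal sketch of a corrected call, reusing the question's variables; taxi_zone_names is a hypothetical list holding one zone name per polygon:
# hypothetical: one name per polygon, in the same order as the polygon offsets
taxi_zone_names = ["Newark Airport", "Jamaica Bay", "Allerton/Pelham Gardens"]  # ... 263 entries in the real data

pip_pickup = perform_pip(cutaxi, cusptaxizone, taxi_zone_names, pip_iterations)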

sklearn, Keras, DeepStack - ValueError: multi_class must be in ('ovo', 'ovr')

I trained a set of DNNs and I want to use them in a deep ensemble. The code is implemented in TF2, but the package deepstack works with Keras as well. The code looks something like this
from deepstack.base import KerasMember
from deepstack.ensemble import DirichletEnsemble
dirichletEnsemble = DirichletEnsemble(N=2000 * ensemble_size)
for net_idx in range(0, ensemble_size):
    member = KerasMember(name=model_name, keras_model=model,
                         train_batches=(train_images, train_labels),
                         val_batches=(valid_images, valid_labels))
    dirichletEnsemble.add_member(member)
dirichletEnsemble.fit()
where 'model' is essentially a Keras model, so one model needs to be loaded at each loop iteration (I am using my own implementation), and 'ensemble_size' is the number of DNNs used in the ensemble.
As a result, I get the following error
ValueError: multi_class must be in ('ovo', 'ovr')
which is generated by the sklearn package.
FURTHER DETAILS: deepstack creates a metric
metric = metrics.roc_auc_score
and then returns it as
return metric(y_t, y_p)
which then calls sklearn:
if multi_class == 'raise':
    raise ValueError("multi_class must be in ('ovo', 'ovr')")
In my specific case, the labels are respectively y_t
[ 7 10 18 52 10 13 10 4 7 7 24 26 7 26 13 13]
and y_p
[ 73 250 250 250 281 281 250 281 281 174 281 250 281 250 250 250]
How do I set multi_class as 'ovo' or 'ovr'?
The documentation for roc_auc_score indicates the following:
roc_auc_score(
    y_true,
    y_score,
    *,
    average='macro',
    sample_weight=None,
    max_fpr=None,
    multi_class='raise',
    labels=None
)
The second-to-last parameter there is multi_class, which has the following explanation:
Multiclass only. Determines the type of configuration to use. The default value raises an error, so either 'ovr' or 'ovo' must be passed explicitly.
So it seems that there is some variation in how ROC AUC can be calculated for multiclass problems, and sklearn forces you to choose explicitly which variation you want it to use. If you don't make the choice, the default will result in an exception being raised, and that exception is the error you are reporting in your question title.
If you are getting this error while using sklearn's roc_auc_score, try roc_auc_score(YTEST, YPRED, multi_class='ovr'). 'ovr' is One-vs-Rest, which decomposes your multiclass problem into one binary problem per class.
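Note that for multiclass inputs roc_auc_score expects per-class probability scores (e.g. the output of predict_proba or a softmax layer), not hard label predictions like the y_p shown above. A minimal sketch, assuming three classes:
import numpy as np
from sklearn.metrics import roc_auc_score

# y_true: integer class labels; y_score: one probability per class, rows summing to 1
y_true = np.array([0, 2, 1, 2])
y_score = np.array([[0.8, 0.1, 0.1],
                    [0.2, 0.2, 0.6],
                    [0.3, 0.5, 0.2],
                    [0.1, 0.2, 0.7]])
print(roc_auc_score(y_true, y_score, multi_class='ovr'))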

linearK error in seq.default(): 'to' cannot be NA, NaN

I am trying to compute linearK estimates on a small linnet object from the CRC spatstat book (chapter 17), and when I use the linearK function, spatstat throws an error. I have documented the process in the comments in the R code below. The error is as below.
Error in seq.default(from = 0, to = right, length.out = npos + 1L) : 'to' cannot be NA, NaN or infinite
I do not understand how to resolve this. I am following this process:
# I have data of points for each day of the week
# d1 is district 1 of the city.
# I did the step below, otherwise it was giving me a tbl class
d1_data = lapply(split(d1, d1$openDatefactor), as.data.frame)
# I previously created a linnet and divided it into districts of the city
d1_linnet = districts_linnet[["d1"]]
# I create a point pattern for each day
d1_ppp = lapply(d1_data, function(x) as.ppp(x, W=Window(d1_linnet)))
plot(d1_ppp[[1]], which.marks="type")
# I am then converting the point pattern to a point pattern on linear network
d1_lpp <- as.lpp(d1_ppp[[1]], L=d1_linnet, W=Window(d1_linnet))
d1_lpp
Point pattern on linear network
3 points
15 columns of marks: ‘status’, ‘number_of_’, ‘zip’, ‘ward’,
‘police_dis’, ‘community_’, ‘type’, ‘days’, ‘NAME’,
‘DISTRICT’, ‘openDatefactor’, ‘OpenDate’, ‘coseDatefactor’,
‘closeDate’ and ‘instance’
Linear network with 4286 vertices and 6183 lines
Enclosing window: polygonal boundary
enclosing rectangle: [441140.9, 448217.7] x [4640080, 4652557] units
# the errors start from plotting this lpp object
plot(d1_lpp)
"show.all" is not a graphical parameter
Error in plot.window(...) : need finite 'xlim' values
coords(d1_lpp)
x y seg tp
441649.2 4649853 5426 0.5774863
445716.9 4648692 5250 0.5435492
444724.6 4646320 677 0.9189631
3 rows
And then consequently, I also get error on linearK(d1_lpp)
Error in seq.default(from = 0, to = right, length.out = npos + 1L) : 'to' cannot be NA, NaN or infinite
I feel the lpp object has the problem, but I find it hard to interpret these errors and how to resolve them. Could someone please guide me?
Thanks
I can confirm there is a bug in plot.lpp when trying to plot a marked point pattern on a linear network. That will hopefully be fixed soon. You can plot the unmarked point pattern using
plot(unmark(d1_lpp))
I cannot reproduce the problem with linearK. Which version of spatstat are you running? In the development version on my laptop (spatstat 1.51-0.073) everything works. There have been changes to this code recently, so it is likely that this will be solved by updating to the development version (see https://github.com/spatstat/spatstat).

svm-train output file has fewer lines than the input file

I am currently building a binary classification model and have created an input file for svm-train (svm_input.txt). This input file has 453 lines, 4 features, and 2 classes [0, 1], i.e.:
0 1:15.0 2:40.0 3:30.0 4:15.0
1 1:22.73 2:40.91 3:36.36 4:0.0
1 1:31.82 2:27.27 3:22.73 4:18.18
0 1:22.73 2:13.64 3:36.36 4:27.27
1 1:30.43 2:39.13 3:13.04 4:17.39
...
My problem is that when I count the number of lines in the output model generated by svm-train (svm_train_model.txt), it has fewer lines than the input file. The line count shows 450, of which 9 lines at the beginning show the various parameters generated,
i.e.
svm_type c_svc
kernel_type rbf
gamma 1
nr_class 2
total_sv 441
rho -0.156449
label 0 1
nr_sv 228 213
SV
Therefore 12 data lines in total from the original input of 453 have gone (450 lines minus the 9-line header leaves 441, matching total_sv). I am new to SVMs and was hoping that someone could shed some light on why this might have happened?
Thanks in advance
Update:
I now believe that, in generating the model, svm-train has removed lines where the label and all the feature values are exactly the same.
To explain: my input is a set of miRNAs which have been classified as 1 or 0 depending on whether or not they are involved in a particular process (i.e. 1 = Yes, 0 = No). The input file looks something like:
0 1:22 2:30 3:14 4:16
1 1:26 2:15 3:17 4:25
0 1:22 2:30 3:14 4:16
Here, lines one and three are exactly the same and as a result will be removed from the output model. My question is both why the output model would do this and how I can get around it (whilst using the same features).
Whilst some of the labels and their corresponding feature values are identical within the input file, these are still different miRNAs.
NOTE: The input file does not have a feature for the miRNA name (which would clearly show the differences between lines). In terms of the features used (nucleotide percentage content), some of the miRNAs do have exactly the same percentage content of A, U, G & C, and as a result they are treated as duplicates and removed from the output model, even though they are not duplicates (hence there are fewer lines in the output model).
The format of the input file is:
Column 0 - label (1 or 0): 1 = Yes, 0 = No
Column 1 - Feature 1 = Percentage Content "A"
Column 2 - Feature 2 = Percentage Content "U"
Column 3 - Feature 3 = Percentage Content "G"
Column 4 - Feature 4 = Percentage Content "C"
The input file actually looks something like this (see the first two lines below: they appear identical, yet each represents a different miRNA):
1 1:23 2:36 3:23 4:18
1 1:23 2:36 3:23 4:18
0 1:36 2:32 3:5 4:27
1 1:14 2:41 3:36 4:9
1 1:18 2:50 3:18 4:14
0 1:36 2:23 3:23 4:18
0 1:15 2:40 3:30 4:15
In terms of software, I am using libsvm-3.22 and python 2.7.5
Align your input file properly, is my first observation. The libsvm code doesn't look for exactly 4 features; it identifies them by the index:value strings you provide separating the feature values from the labels. I suggest manually converting your input file to create the desired input format.
Try the following Python code. Requirement: h5py, if your input comes from MATLAB (.mat files):
pip install h5py
import h5py
import numpy as np

# Load the training labels (give your label .mat file here)
f = h5py.File('traininglabel.mat', 'r')
for var in f.items():
    data = var[1]
    labels = data.value[0]  # on newer h5py, use data[()][0] instead of .value

# Convert labels such as 0.0/1.0 into the '0'/'1' strings libsvm expects
trainlabels = []
for i in labels:
    trainlabels.append(str(i))
trainlabels = np.array(trainlabels)
for i in range(0, len(trainlabels)):
    if trainlabels[i] == '0.0':
        trainlabels[i] = '0'
    if trainlabels[i] == '1.0':
        trainlabels[i] = '1'
    print(trainlabels[i])

# Load the feature matrix (give your features .mat file here)
f = h5py.File('training_features.mat', 'r')
for var in f.items():
    data = var[1]
    features = data.value  # on newer h5py, use data[()]

# Write the data out in libsvm format: label 1:value 2:value ...
out = open('traindata.txt', 'w+')
for i in range(0, 1000):  # number of training samples in features.mat
    out.write(trainlabels[i])
    out.write(' ')
    for j in range(0, 49):  # number of features per sample
        out.write(str(j + 1) + ':' + str(features[j][i]))  # index:value pairs
        out.write(' ')
    out.write('\n')
out.close()

Input contains NaN, infinity or a value too large for dtype('float64')

I am running into the above error message when fitting the model on X and Y, taken from the training data and the ground truth respectively. I have verified that the data does not contain NaN or Inf.
I tried subsetting the data into a 20x3 matrix and eyeballed it; nothing seemed out of place. How can I fix it?
Here is the data subset I am working on:
1 2 3
12235 0.0369 -0.1415 -0.4381
11008 0.4285 0.2449 0.7858
15983 0.5557 0.0466 -0.2477
15881 0.8825 1.3252 -0.2296
14037 1.6551 0.5298 0.1924
4860 0.7082 -0.3576 0.5771
13475 0.0103 0.1030 1.4402
7226 0.5135 1.2396 0.9988
2862 0.5454 -0.1530 1.5451
1401 0.7960 0.9605 0.8021
3988 0.2682 0.9393 -0.1930
16346 -0.2303 0.5633 0.5991
15293 0.9816 0.6522 0.1207
895 0.6816 0.6819 0.5101
14781 0.2243 0.0350 -0.6212
14791 0.1902 0.2113 0.4330
4869 0.5471 1.4235 0.4891
1770 0.5270 0.4097 0.3691
15483 1.0364 0.8619 0.6298
17033 0.9304 -0.3223 0.9128
1 2 3
9909 0.0884 0.3513 0.7508
4307 0.3094 0.8885 1.2935
14128 -0.5162 1.0465 -1.1435
15694 0.6993 0.3426 0.9185
3709 -0.6405 -0.3263 0.2199
16190 0.7642 0.4764 0.3143
15877 0.6836 0.2586 0.8664
3319 -0.3437 -0.1538 0.5070
8135 0.1876 0.9128 -0.1812
13035 0.7733 1.7522 0.4158
12168 -0.0617 -0.0897 0.3686
10469 1.1860 0.3772 0.4178
6211 0.8808 1.0333 0.5994
9491 0.5110 0.6489 0.6749
8310 0.5609 0.1232 0.7549
171 1.3448 -0.7569 -0.1178
2068 0.4097 -0.1648 0.1831
4393 -0.2469 -0.4033 0.2077
2134 0.9408 0.2473 0.2176
12191 0.1368 1.5374 0.7149
I was passing a DataFrame; using .values in fit() fixed the issue. Also, my Y_Predict was producing a few NaNs. It worked after I fixed these things. Thanks!
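For completeness, a minimal sketch of the fix, using hypothetical stand-in data and a scikit-learn estimator:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# hypothetical toy frames standing in for the real training data and truth
X_df = pd.DataFrame(np.random.randn(20, 3), columns=['1', '2', '3'])
y_df = pd.DataFrame(np.random.randn(20))

# sanity-check for NaN/Inf before fitting
assert np.isfinite(X_df.values).all()
assert np.isfinite(y_df.values).all()

model = LinearRegression()
model.fit(X_df.values, y_df.values.ravel())  # pass plain arrays rather than DataFrames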
