I need to specify the dtype (data type) for sklearn's Kernel Density Function within a definition block from nvidia's rapids cudf library. In Python 3.7, I am able to find type information, but for some reason, it is not considered an accepted data type with nvidia's rapids def block. I am including my code and error message below so that anyone can reproduce the error message.
Here is the code for the typical implementation of Kernel Density function:
from sklearn.neighbors import KernelDensity
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
kde = KernelDensity(kernel='gaussian', bandwidth=0.2).fit(X)
kde.score_samples(X)
array([-0.41075698, -0.41075698, -0.41076071, -0.41075698, -0.41075698,
-0.41076071])
type(kde)
<class 'sklearn.neighbors.kde.KernelDensity'>
Here is the NVIDIA Rapids Def block that I used with Sklearn's Kernel Density Function:
import cudf, math
import numpy as np
df = cudf.DataFrame()
nelem = 10
df['in1'] = np.arange(nelem) * 1.5
df['in2'] = np.arange(nelem) * 1.45
#Define input columns for the kernel
in1 = df['in1']
in2 = df['in2']
def kernel(in1, in2, out1, out2, out3, out4, kwarg1, kwarg2):
    for i, (x, y) in enumerate(zip(in1, in2)):
        out1[i] = [math.tan(i) for i in x]
        out2[i] = np.array(out1[i].to_pandas())
        out3[i] = ((KernelDensity(kernel='gaussian', bandwidth=kwarg1).fit(out2[i])).score_samples(out2[i]))
        out4[i] = [i >= kwarg2 for i in out3[i]]
Results = cudf.DataFrame()
Results = df.apply_rows(kernel, incols=['in1','in2'], outcols=dict(out1='float', out2='float64', out3='float64', out4='float'), kwargs=dict(kwarg1=0.1, kwarg2=0.33))
Here is the error message (perhaps if I get the dtype correct for x and out3, this will resolve all of the errors):
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/cudf/dataframe/dataframe.py", line 2707, in apply_rows
self, func, incols, outcols, kwargs, cache_key=cache_key
File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/cudf/utils/applyutils.py", line 64, in apply_rows return applyrows.run(df)
File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/cudf/utils/applyutils.py", line 128, in run self.launch_kernel(df, bound.args, **launch_params)
File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/cudf/utils/applyutils.py", line 152, in launch_kernel self.kernel[blkct, blksz](*args)
File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/cuda/compiler.py", line 806, in __call__ kernel = self.specialize(*args)
File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/cuda/compiler.py", line 817, in specialize kernel = self.compile(argtypes)
File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/cuda/compiler.py", line 833, in compile **self.targetoptions)
File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler_lock.py", line 32, in _acquire_compile_lock return func(*args, **kwargs)
File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/cuda/compiler.py", line 62, in compile_kernel
cres = compile_cuda(pyfunc, types.void, args, debug=debug, inline=inline)
File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler_lock.py", line 32, in _acquire_compile_lock, return func(*args, **kwargs)
File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/cuda/compiler.py", line 51, in compile_cuda, locals={})
File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 972, in compile_extra, return pipeline.compile_extra(func)
File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 390, in compile_extra, return self._compile_bytecode()
File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 903, in _compile_bytecode, return self._compile_core()
File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 890, in _compile_core, res = pm.run(self.status)
File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler_lock.py", line 32, in _acquire_compile_lock, return func(*args, **kwargs)
File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 266, in run
raise patched_exception
File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 257, in run
stage()
File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 515, in stage_nopython_frontend self.locals)
File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/compiler.py", line 1124, in type_inference_stage, infer.propagate()
File "/anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/typeinfer.py", line 927, in propagate, raise errors[0]
numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Invalid use of Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x7f2679e6f9e8>) with argument(s) of type(s): (array(float64, 1d, A), array(float64, 1d, A), array(float64, 1d, A), array(float64, 1d, A), array(float64, 1d, A), array(float64, 1d, A), float64, float64) * parameterized
In definition 0:
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name 'x': cannot determine Numba type of <class 'numba.ir.UndefinedType'>
File "<stdin>", line 2:
<source missing, REPL/exec in use?>
raised from /anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/typeinfer.py:1254
In definition 1:
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name 'x': cannot determine Numba type of <class 'numba.ir.UndefinedType'>
File "<stdin>", line 2:
<source missing, REPL/exec in use?>
raised from /anaconda3/envs/rapidsAI/lib/python3.7/site-packages/numba/typeinfer.py:1254
This error is usually caused by passing an argument of a type that is unsupported by the named function.
[1] During: resolving callee type: Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x7f2679e6f9e8>)
[2] During: typing of call at <string> (11)
File "<string>", line 11:
<source missing, REPL/exec in use?>
The code that works is below. Some of your lines are incompatible with cudf:
Using i on its own, rather than as an index, does not work. It is always zero, so out1 is all zeros as well.
Classes from sklearn are not compatible with numba's nopython mode. The same holds for any library that numba does not specifically support. I do not know of any library with kernel density estimation that numba supports. NumPy is supported, but it has no kernel density estimation.
df.apply_rows() does not let you apply a function across multiple rows at once, which a kernel density calculation needs. You probably need df.apply_chunks() instead.
To implement a kernel density estimation you will need to:
Use df.apply_chunks()
Create a custom function that calculates the kernel density. You could base it on parts of this code: KernelDensity source code
Make the custom function apply a kernel to a np.array so that it computes the value for every window
Set up apply_chunks() so that the chunks are rolling windows
Code:
import cudf, math
import numpy as np
df = cudf.DataFrame()
nelem = 10
df['in1'] = np.arange(nelem) * 1.5
df['in2'] = np.arange(nelem) * 1.45
#Define input columns for the kernel
in1 = df['in1']
in2 = df['in2']
def kernel(in1, in2, out1, out2, out3, out4, kwarg1, kwarg2):
    for i, (x, y) in enumerate(zip(in1, in2)):
        out1[i] = math.tan(float(i))
        out2[i] = out1[i]
        out3[i] = 1 #((KernelDensity(kernel='gaussian', bandwidth=kwarg1).fit(out2[i])).score_samples(out2[i]))
        out4[i] = out3[i] >= kwarg2
Results = cudf.DataFrame()
Results = df.apply_rows(kernel, incols=['in1','in2'], outcols=dict(out1=np.float64, out2=np.float64, out3=np.float64, out4=np.float64), kwargs=dict(kwarg1=0.1, kwarg2=0.33))
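The custom kernel density function suggested above could start from the math alone. The following is a minimal plain-NumPy sketch of a Gaussian kernel density log-score, run on the CPU. The name gaussian_kde_log_density is hypothetical, and the code has not been adapted to numba or cudf, so treat it as a reference for the formula rather than a drop-in GPU kernel.

```python
import numpy as np

def gaussian_kde_log_density(train, query, bandwidth):
    """Log-density of a Gaussian KDE fitted on `train`, evaluated at `query`.

    Hypothetical helper (not part of sklearn, numba or cudf).
    Both inputs are 2D arrays of shape (n_points, n_features).
    """
    train = np.asarray(train, dtype=np.float64)
    query = np.asarray(query, dtype=np.float64)
    n, d = train.shape

    # Squared Euclidean distance between every query and every training point.
    diff = query[:, None, :] - train[None, :, :]
    sq_dist = np.sum(diff * diff, axis=2)

    # Log of the unnormalised Gaussian kernel for each pair.
    log_kernel = -0.5 * sq_dist / bandwidth ** 2

    # Normalisation: Gaussian constant in d dimensions, averaged over n points.
    log_norm = -0.5 * d * np.log(2.0 * np.pi * bandwidth ** 2) - np.log(n)

    # Log-sum-exp over the training points, shifted by the row maximum for stability.
    row_max = log_kernel.max(axis=1, keepdims=True)
    log_sum = row_max[:, 0] + np.log(np.exp(log_kernel - row_max).sum(axis=1))
    return log_norm + log_sum

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
scores = gaussian_kde_log_density(X, X, bandwidth=0.2)
print(scores)
```

On the six sample points from the question, this should agree with the score_samples output shown there to within rounding.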
I'm learning how to use Python to draw some 3D pictures.
When I call plt.show(), I get an error:
ValueError: max() arg is an empty sequence
However, when I run the same code in IDLE, there is no error.
What should I do to fix this problem in PyCharm? I would really appreciate any help.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
x = np.linspace(-6 * np.pi, 6 * np.pi, 1000)
y = np.sin(x)
z = np.cos(x)
fig = plt.figure()
ax = Axes3D(fig)
ax.plot(x, y, z)
plt.show()
I also tried it in the Python Console. The error only appears when I run plt.show().
[<mpl_toolkits.mplot3d.art3d.Line3D at 0x111b09c88>]
plt.show()
/Users/harry./Library/Python/3.6/lib/python/site-packages/matplotlib/figure.py:1743: UserWarning: This figure includes Axes that are not compatible with tight_layout, so its results might be incorrect.
warnings.warn("This figure includes Axes that are not "
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3267, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-6-1eb00ff78cf2>", line 1, in <module>
plt.show()
File "/Users/harry./Library/Python/3.6/lib/python/site-packages/matplotlib/pyplot.py", line 253, in show
return _show(*args, **kw)
File "/Applications/PyCharm.app/Contents/helpers/pycharm_matplotlib_backend/backend_interagg.py", line 27, in __call__
manager.show(**kwargs)
File "/Applications/PyCharm.app/Contents/helpers/pycharm_matplotlib_backend/backend_interagg.py", line 99, in show
self.canvas.show()
File "/Applications/PyCharm.app/Contents/helpers/pycharm_matplotlib_backend/backend_interagg.py", line 64, in show
self.figure.tight_layout()
File "/Users/harry./Library/Python/3.6/lib/python/site-packages/matplotlib/figure.py", line 1753, in tight_layout
rect=rect)
File "/Users/harry./Library/Python/3.6/lib/python/site-packages/matplotlib/tight_layout.py", line 326, in get_tight_layout_figure
max_nrows = max(nrows_list)
ValueError: max() arg is an empty sequence
I have a NumPy array with the following shape:
(1611216, 2)
I tried reshaping it to (804, 2004)
using:
df = np.reshape(df, (804, 2004))
but it gives an error:
Traceback (most recent call last):
File "Z:/Seismic/Geophysical/99_Personal/Abhishake/RMS_Machine_learning/RMS_data_analysis.py", line 19, in <module>
df = np.reshape(df, (804, 2004))
File "C:\python36\lib\site-packages\numpy\core\fromnumeric.py", line 232, in reshape
return _wrapfunc(a, 'reshape', newshape, order=order)
File "C:\python36\lib\site-packages\numpy\core\fromnumeric.py", line 57, in _wrapfunc
return getattr(obj, method)(*args, **kwds)
ValueError: cannot reshape array of size 3222432 into shape (804,2004)
You cannot reshape a (1611216, 2) NumPy array into (804, 2004), because a reshape must keep the total number of elements unchanged.
The array holds 1611216 x 2 = 3,222,432 elements, while the target shape holds only 804 x 2004 = 1,611,216, exactly half as many. You will have to choose another set of dimensions for your array, and which one depends on how you want to use it.
Hint: (1608, 2004) is a valid reshape.
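The element-count arithmetic can be checked directly. This is a small sketch using a zero-filled stand-in array with the same shape as in the question. The last step assumes, purely for illustration, that the goal might be to reshape one of the two columns on its own.

```python
import numpy as np

# Stand-in for the real data: same shape, tiny dtype to keep memory use low.
a = np.zeros((1611216, 2), dtype=np.int8)

print(a.size)        # 3222432 elements in total
print(804 * 2004)    # 1611216, half of a.size, so (804, 2004) cannot work

# Any shape whose dimensions multiply to a.size is valid.
b = a.reshape(1608, 2004)

# Passing -1 lets NumPy infer the missing dimension from the total size.
c = a.reshape(-1, 2004)
print(b.shape, c.shape)

# Assumption: if only one column is needed, it has exactly 804 * 2004 elements.
first_column = a[:, 0].reshape(804, 2004)
print(first_column.shape)
```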
I'm playing around with PyTorch with the aim of learning it, and I have a very dumb question: how can I multiply a matrix by a single vector?
Here's what I've tried:
>>> import torch
>>> a = torch.rand(4,4)
>>> a
0.3162 0.4434 0.9318 0.8752
0.0129 0.8609 0.6402 0.2396
0.5720 0.7262 0.7443 0.0425
0.4561 0.1725 0.4390 0.8770
[torch.FloatTensor of size 4x4]
>>> b = torch.rand(4)
>>> b
0.1813
0.7090
0.0329
0.7591
[torch.FloatTensor of size 4]
>>> a.mm(b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: invalid argument 2: dimension 1 out of range of 1D tensor at /Users/soumith/code/builder/wheel/pytorch-src/torch/lib/TH/generic/THTensor.c:24
>>> a.mm(b.t())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: t() expects a 2D tensor, but self is 1D
>>> b.mm(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: matrices expected, got 1D, 2D tensors at /Users/soumith/code/builder/wheel/pytorch-src/torch/lib/TH/generic/THTensorMath.c:1288
>>> b.t().mm(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: t() expects a 2D tensor, but self is 1D
On the other hand, if I do
>>> b = torch.rand(4,2)
then my first attempt, a.mm(b), works fine. So the problem is just that I'm multiplying a vector rather than a matrix --- but how can I do this?
You're looking for
torch.mv(a,b)
Note that for the future, you may also find torch.matmul() useful. torch.matmul() infers the dimensionality of your arguments and accordingly performs either dot products between vectors, matrix-vector or vector-matrix multiplication, matrix multiplication or batch matrix multiplication for higher order tensors.
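As a quick illustration of the two functions mentioned above, here is a minimal sketch with random inputs of the same shapes as in the question. The variable names are only for illustration.

```python
import torch

torch.manual_seed(0)
a = torch.rand(4, 4)   # matrix
b = torch.rand(4)      # 1D vector

# Matrix-vector product: the result is a 1D tensor of size 4.
mv_result = torch.mv(a, b)

# matmul picks matrix-vector multiplication from the argument dimensions.
matmul_result = torch.matmul(a, b)

# With the 1D tensor on the left, matmul performs vector-matrix multiplication.
vec_mat_result = torch.matmul(b, a)

print(mv_result.shape, matmul_result.shape, vec_mat_result.shape)
```

Both torch.mv(a, b) and torch.matmul(a, b) return a 1D tensor here, so no reshaping of b is needed.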
This is a self-answer to supplement #mexmex's correct and useful answer.
In PyTorch, unlike numpy, 1D Tensors are not interchangeable with 1xN or Nx1 tensors. If I replace
>>> b = torch.rand(4)
with
>>> b = torch.rand((4,1))
then I will have a column vector, and matrix multiplication with mm will work as expected.
But this is not necessary, because as #mexmex points out there is an mv function for matrix-vector multiplication, as well as a matmul function that dispatches the appropriate function depending on the dimensions of its input.
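The shape difference can be made concrete with a short sketch. It assumes an existing 1D tensor that should be reused with mm rather than recreated with a new shape.

```python
import torch

torch.manual_seed(0)
a = torch.rand(4, 4)
b = torch.rand(4)            # 1D tensor, size 4

# Turn the 1D tensor into an explicit 4x1 column vector.
column = b.unsqueeze(1)      # size 4x1

# mm now works because both arguments are 2D; the result is also 4x1.
product = a.mm(column)

# Drop the trailing dimension to get back a 1D tensor.
flat = product.squeeze(1)

print(column.shape, product.shape, flat.shape)
```

The flattened result matches what torch.mv(a, b) returns directly, which is why the explicit column vector is optional.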
I am trying to run the detect_ts function from pyculiarity package but getting this error on passing a two-dimensional dataframe in python.
>>> import pandas as pd
>>> from pyculiarity import detect_ts
>>> data=pd.read_csv('C:\\Users\\nikhil.chauhan\\Desktop\\Bosch_Frame\\dataset1.csv',usecols=['time','value'])
>>> data.head()
time value
0 0 32.0
1 250 40.5
2 500 40.5
3 750 34.5
4 1000 34.5
>>> results = detect_ts(data,max_anoms=0.05,alpha=0.001,direction = 'both')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Windows\System32\pyculiar-0.0.5\pyculiarity\detect_ts.py", line 177, in detect_ts
verbose=verbose)
File "C:\Windows\System32\pyculiar-0.0.5\pyculiarity\detect_anoms.py", line 69, in detect_anoms
decomp = stl(data.value, np=num_obs_per_period)
File "C:\Windows\System32\pyculiar-0.0.5\pyculiarity\stl.py", line 35, in stl
res = sm.tsa.seasonal_decompose(data.values, model='additive', freq=np)
File "C:\Anaconda3\lib\site-packages\statsmodels\tsa\seasonal.py", line 88, in seasonal_decompose
trend = convolution_filter(x, filt)
File "C:\Anaconda3\lib\site-packages\statsmodels\tsa\filters\filtertools.py", line 303, in convolution_filter
result = _pad_nans(result, trim_head, trim_tail)
File "C:\Anaconda3\lib\site-packages\statsmodels\tsa\filters\filtertools.py", line 28, in _pad_nans
return np.r_[[np.nan] * head, x, [np.nan] * tail]
TypeError: 'numpy.float64' object cannot be interpreted as an integer
The traceback points at the expression [np.nan] * head in statsmodels' _pad_nans. Repeating a list requires an integer count, and the error message indicates that head (or tail) arrives as a numpy.float64 instead.
np.nan itself does not appear to be the problem. It is only the value being repeated, and np.r_[] accepts float entries.
Casting np.nan to an integer, as in
return np.r_[[(int)(np.nan)] * head, x, [(int)(np.nan)] * tail]
would not help either, because NaN cannot be cast to an integer type:
ValueError: cannot convert float NaN to integer
What has to be an integer is the repeat count, so head and tail would need to be converted with int() before the multiplication. Since that line lives inside statsmodels rather than in your own code, it may be worth checking whether a newer statsmodels release already handles this for your NumPy version.
If the error persists, please add more detail about your environment and your data, and you are sure to get help.
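The failing expression in the traceback is [np.nan] * head, where list repetition needs an integer count. Below is a minimal sketch of padding with NaNs when the counts arrive as floats. The function pad_nans_int is a hypothetical stand-in written for illustration and is not the statsmodels implementation.

```python
import numpy as np

def pad_nans_int(x, head=None, tail=None):
    """Pad `x` with NaNs at both ends; hypothetical helper, not statsmodels code."""
    x = np.asarray(x, dtype=np.float64)
    # A list can only be repeated an integer number of times,
    # so float counts (for example numpy.float64) are cast first.
    n_head = int(head) if head is not None else 0
    n_tail = int(tail) if tail is not None else 0
    return np.r_[[np.nan] * n_head, x, [np.nan] * n_tail]

padded = pad_nans_int([1.0, 2.0, 3.0], head=np.float64(1.0), tail=np.float64(2.0))
print(padded)
```

The NaN values stay floats throughout. Only the repeat counts are converted.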