How to use the same kde used by seaborn - python-3.x

I'm trying to use a kernel density estimator to obtain the distribution of my data. With seaborn, I can simply call sns.kdeplot(temp, shade=True) and that will plot the KDE of my data. However, seaborn gives me no way to obtain scores for new data points, whereas with the sklearn library I can simply call kde.score_samples(data). How can I achieve the same thing with seaborn? Or, is there a way to get back the KDE that seaborn fits?
Any help is much appreciated!

After looking through the source https://github.com/mwaskom/seaborn/blob/a9577e705023873de7c7bbf3e9b6ae0dc1880b51/seaborn/distributions.py#L2641, the bandwidth of the kernel is calculated as: bw = stats.gaussian_kde(a).scotts_factor() * a.std(ddof=1). Consequently, I used this bw with the KDE from sklearn: kde = KernelDensity(kernel='gaussian', bandwidth=bw).fit(temp.reshape(-1, 1)).
Thus, now I'm able to call: kde.score_samples(x_axis.reshape(-1, 1)).
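Putting this together, a minimal end-to-end sketch (temp and x_axis are stand-in names; note that score_samples returns the log density, so exponentiate to compare against the curve seaborn draws):
import numpy as np
from scipy import stats
from sklearn.neighbors import KernelDensity

temp = np.random.randn(1000)  # stand-in for your data

# bandwidth matching seaborn's rule: Scott's factor times the sample std
bw = stats.gaussian_kde(temp).scotts_factor() * temp.std(ddof=1)

kde = KernelDensity(kernel='gaussian', bandwidth=bw).fit(temp.reshape(-1, 1))

x_axis = np.linspace(temp.min(), temp.max(), 200)
log_density = kde.score_samples(x_axis.reshape(-1, 1))
density = np.exp(log_density)  # comparable to seaborn's plotted curve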

Related

Does fitting a Weibull distribution to data using scipy.stats perform poorly?

I am working on fitting a Weibull distribution to some integer data and estimating the relevant shape, scale, and location parameters. However, I noticed poor performance of the scipy.stats library while doing so.
So, I took a different direction and checked the fit performance using the code below. I first create 100 numbers from a Weibull distribution with parameters shape=3, scale=200, location=1. Subsequently, I estimate the best distribution fit using the fitter library.
from fitter import Fitter
import numpy as np
from scipy.stats import weibull_min
# generate numbers
x = weibull_min.rvs(3, scale=200, loc=1, size=100)
# make them integers
data = np.asarray(x, dtype=int)
# fit one of the four distributions
f = Fitter(data, distributions=["gamma", "rayleigh", "uniform", "weibull_min"])
f.fit()
f.summary()
I expect the best fit to be the Weibull distribution. I have tried re-running this test. Sometimes the Weibull fit is a good estimate; however, most of the time the Weibull fit is reported as the worst result. In this case, the estimated parameters are (0.13836651040093312, 66.99999999999999, 1.3200752378443505). I assume these parameters correspond to shape, scale, and location, in that order. Below is the summary of the fit procedure.
$ f.summary()
             sumsquare_error          aic          bic  kl_div
gamma               0.001601  1182.739756 -1090.410631     inf
rayleigh            0.001819  1154.204133 -1082.276256     inf
uniform             0.002241  1113.815217 -1061.400668     inf
weibull_min         0.004992  1558.203041  -976.698452     inf
Additionally, a plot of the fitted distributions is produced (not reproduced here).
Also, the Rayleigh distribution is a special case of the Weibull distribution with shape parameter = 2, so I expect the resulting Weibull fit to be at least as good as the Rayleigh one.
Update
I ran the tests above on a Linux/Ubuntu 20.04 machine with numpy version 1.19.2 and scipy version 1.5.2. The code above runs as expected and returns proper results for the Weibull distribution on a Mac machine.
I have also tested fitting a Weibull distribution to the data x generated above on the Linux machine, using the R library fitdistrplus:
fit.weib <- fitdist(x, "weibull")
and observed that the estimated shape and scale values are very close to the initially given values. The best guess so far is that the problem is due to some Python-Ubuntu bug/incompatibility.
I can be considered a newbie in this area. So, I am wondering: am I doing something wrong here, or is this result somehow expected? Any help is greatly appreciated.
Thank you.
The fitter library doesn't allow you to fix distribution parameters such as a, loc, etc. And strangely, Mac produces a better fit while Linux heavily skews the results for the best fit, for the same versions of numpy and scipy. Underlying reasons may include the different BLAS/LAPACK implementations used on Linux and Mac (https://stackoverflow.com/a/49274049/6806531), or that weibull_min does not initialize the parameter a = 1 (which is discussed online), or default floating-point accuracy. However, one can work around this inside the fitter library. Knowing that weibull_min is exponweib with the parameter a fixed at 1, change the run function inside _timed_run in fitter.py to
def run(self):
    try:
        if distribution == "exponweib":
            self.result = func(args, floc=0, fa=1, **kwargs)
        else:
            self.result = func(args, floc=0, **kwargs)
    except Exception as err:
        self.exc_info = sys.exc_info()
and using exponweib in place of weibull_min gives nearly the same results as R's fitdist.
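As a standalone check (a sketch, not part of the fitter patch): since exponweib with a fixed at 1 reduces to weibull_min, fitting it directly with scipy should recover the shape and scale:
import numpy as np
from scipy.stats import exponweib, weibull_min

x = weibull_min.rvs(3, scale=200, loc=1, size=100)

# fixing a=1 and loc=0 (as in the patch above) leaves shape c and scale free
a, c, loc, scale = exponweib.fit(x, fa=1, floc=0)
print(c, scale)  # should land near the sampling values 3 and 200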
I am not familiar with the Fitter library, but in order to draw some conclusions I would suggest:
Retry your code, but taking size=10000. In this case, there are sufficient data points for the fitting methods to use; theoretically, you would then expect the Weibull distribution to deliver the best fit.
I noticed that the location parameter can sometimes be a pain. You could try to run your fits while fixing the location parameter with floc=1 (i.e. equal to your sampling parameter for location); see the sketch after this list. What do you get? Additionally, FYI, with MLE it suffices to take loc=min(x), where x is your dataset. For the exponential distribution, this is in fact the MLE of the location parameter. For other distributions I am not sure, but I wouldn't be surprised if this holds for them as well. This would reduce the fitting procedure by one parameter.
Lastly, I noticed that if you take small values for location/scale/shape for some distributions, the logpdf and logcdf functions of scipy.stats distributions result in np.inf values. In this scenario, you could perhaps use the Powell optimization algorithm and set bounds on the values of your parameters.
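A minimal sketch of the second suggestion, fitting with scipy directly and a fixed location (floc=1 matches the sampling parameters used above):
import numpy as np
from scipy.stats import weibull_min

x = weibull_min.rvs(3, scale=200, loc=1, size=10000)  # larger sample, per point 1

# fix the location parameter so only shape and scale are estimated
shape, loc, scale = weibull_min.fit(x, floc=1)
print(shape, loc, scale)  # expect roughly 3, 1, 200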

Medical image processing using DICOM images

I'm new to medical image processing. How can I convert 3D DICOM medical images to a numerical matrix format using either Python or C++?
Another option, if you really want "3D" DICOM image support (i.e. CT/MR/NM/PET 3D series, as opposed to purely 2D image handling) and you want to do anything really 3D-related and/or more complex, is to check out SimpleITK.
That gives you very powerful true 3D handling and is fast (it's a wrapper around compiled C++). It includes, for example, full 3D image registration and various filters/tools.
It can read an entire series at once and automatically create a fully spatially aware 3D numpy array for you (i.e. it takes care of processing all the DICOM 3D spatial orientation/spacing tags for you).
However, because it's a lot more powerful than pydicom, it also has a much steeper learning curve - but it does have many examples and online Jupyter notebook tutorials.
...so, depending on your needs, it might be good for you. However, if you only really want basic 2D image-at-a-time processing, pydicom is the way to go.
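A minimal SimpleITK sketch of reading a series into a numpy array (the folder path is a placeholder):
import SimpleITK as sitk

dicom_dir = "path/to/dicom/series"  # placeholder: folder holding one series

reader = sitk.ImageSeriesReader()
reader.SetFileNames(reader.GetGDCMSeriesFileNames(dicom_dir))
image = reader.Execute()

array = sitk.GetArrayFromImage(image)  # spatially ordered (z, y, x) volume
print(image.GetSpacing(), array.shape)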
You can use the pydicom package in Python. You can install it with:
pip install pydicom
Here is a simple example of reading DICOM slices and converting them to a numpy array:
import os
import pydicom
import numpy as np

dicom_dir = your_dicom_folder_of_slices  # path to the folder of DICOM slices
file_names = os.listdir(dicom_dir)
file_names.sort()
dicom_data = []
for name in file_names:
    path = os.path.join(dicom_dir, name)
    dicom_data.append(pydicom.dcmread(path))  # read_file is deprecated
array = [data.pixel_array for data in dicom_data]
array = np.stack(array, axis=-1)  # or axis=0 if 'channel_first'
Here is a detailed example.
I prefer using SimpleElastix for medical image processing. It has many methods for segmentation and many other helpful methods, and it is available in both Python and C++. In my experience, SimpleElastix handled DICOMs and NIfTIs better than other packages.

Point Cloud triangulation using marching-cubes in Python 3

I'm working on a 3D reconstruction system and want to generate a triangular mesh from the registered point cloud data using Python 3. My objects are not convex, so the marching cubes algorithm seems to be the solution.
I prefer to use an existing implementation of such a method, so I tried scikit-image and Open3D, but neither API accepts raw point clouds as input (note that I'm no expert on those libraries). My attempts to convert my data failed, and I'm running out of ideas since the documentation does not clarify the input format of the functions.
These are my desired snippets where pcd_to_volume is what I need.
scikit-image
import numpy as np
from skimage.measure import marching_cubes_lewiner

N = 10000
pcd = np.random.rand(N, 3)

def pcd_to_volume(pcd, voxel_size):
    # TODO
    pass

volume = pcd_to_volume(pcd, voxel_size=0.05)
verts, faces, normals, values = marching_cubes_lewiner(volume, 0)
open3d
import numpy as np
import open3d

N = 10000
pcd = np.random.rand(N, 3)

def pcd_to_volume(pcd, voxel_size):
    # TODO
    pass

volume = pcd_to_volume(pcd, voxel_size=0.05)
mesh = volume.extract_triangle_mesh()
I'm not able to find a way to properly write the pcd_to_volume function. I have no preference between the two libraries, so either solution is fine to me.
Do you have any suggestions on how to properly convert my data? A point cloud is an Nx3 matrix with dtype=float.
Do you know another implementation [of the marching cube algorithm] that works on raw point cloud data? I would prefer libraries like scikit and open3d, but I will also take into account github projects.
Do you know another implementation [of the marching cube algorithm] that works on raw point cloud data?
Hoppe's paper Surface reconstruction from unorganized points might contain the information you need, and it's open-sourced.
Also, the latest Open3D seems to contain surface reconstruction algorithms like alpha shape, ball pivoting, and Poisson reconstruction.
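For instance, a minimal Poisson reconstruction sketch with Open3D (API names per recent Open3D releases; the point array is a stand-in):
import numpy as np
import open3d as o3d

points = np.random.rand(10000, 3)  # stand-in for your Nx3 point cloud

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)

# Poisson reconstruction needs consistently oriented normals
pcd.estimate_normals()
pcd.orient_normals_consistent_tangent_plane(30)

mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=8)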
From what I know, marching cubes is usually used for extracting a polygonal mesh of an isosurface from a three-dimensional discrete scalar field (that's what is meant by volume). The algorithm does not work on raw point cloud data.
Hoppe's algorithm works by first generating a signed distance function field (an SDF volume) and then passing it to marching cubes. This can be seen as one implementation of your pcd_to_volume, and it's not the only way!
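To make the idea concrete, here is a naive distance-field sketch of pcd_to_volume (unsigned distance via a KD-tree; Hoppe's method additionally estimates oriented normals to sign the distance, which this sketch does not do):
import numpy as np
from scipy.spatial import cKDTree

def pcd_to_volume(pcd, voxel_size):
    # regular grid of voxel centers covering the cloud's bounding box
    mins = pcd.min(axis=0) - voxel_size
    maxs = pcd.max(axis=0) + voxel_size
    dims = np.ceil((maxs - mins) / voxel_size).astype(int)
    axes = [mins[i] + voxel_size * np.arange(dims[i]) for i in range(3)]
    grid = np.stack(np.meshgrid(*axes, indexing='ij'), axis=-1)

    # unsigned distance from each voxel center to the nearest point
    dist, _ = cKDTree(pcd).query(grid.reshape(-1, 3))
    return dist.reshape(dims)

# marching cubes at a small positive level then approximates the surface,
# e.g. marching_cubes_lewiner(volume, level=voxel_size)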
If the raw point cloud is all you have, then the situation is a little constrained. As you might see, the Poisson reconstruction and screened Poisson reconstruction algorithms both implement pcd_to_volume in their own way (they are highly related). However, they need additional point normal information, and the normals have to be consistently oriented. (For consistent orientation you can read this question.)
While some Delaunay-based algorithms (which do not use marching cubes), like alpha shape and this, may not need point normals as input, for surfaces with complex topology it's hard to get a satisfactory result due to the orientation problem. Graph-cut methods can use visibility information to solve that.
Having said that, if your data comes from depth images, you will usually have visibility information, and you can use a TSDF to build a good surface mesh. Open3D has already implemented that.
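A minimal TSDF sketch with Open3D, assuming you have RGB-D frames with known camera intrinsics and extrinsics (module paths per Open3D >= 0.13):
import open3d as o3d

volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.01,
    sdf_trunc=0.04,
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)

# for each frame: volume.integrate(rgbd_image, intrinsic, extrinsic)

mesh = volume.extract_triangle_mesh()  # the call from the question's snippet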

Matlab, is there any way to manipulate a random variable?

In Maple, there is a feature that allows you to calculate the PDF of a function of a random variable. For example, if X is exponentially distributed and you want to know the distribution of X^2, there is a function that will do that for you.
My question is: is there functionality in MATLAB that allows you to do so? I have looked through MATLAB's guide, but I didn't see it.
The Statistics Toolbox includes many probability distributions for you to choose from, both parametric and non-parametric. For each it provides functions for the PDF, CDF, fitting, random number generation, etc.
I suggest you start with the Distribution Fitting app: dfittool.
EDIT:
In addition, MuPAD has support for a number of distributions, which you can manipulate symbolically. For example, the function intlib::changevar might be of interest here, though it seems intended for integrals...
Also, if you're interested in getting the values of the PMF (or discrete PDF), then, given x, some RV with some distribution,
my_pmf = hist(x) / numel(x); % normalize bin counts by the sample size
So try,
doc hist

2d geometry drawing tool

I'm looking for a tool/library that can draw simple 2D geometries from a text file or programmatically. I already found the List of interactive geometry software, but that's not quite what I'm looking for. I would prefer something more similar in usage to graphviz or gnuplot. I have already written some scripts for gnuplot, but that tool was designed for different purposes. Required functionality:
support for different kind of 2D geometries: points, segments, lines, circles, polygons
simple input format, maybe similar to PostGIS Well-Known Text
support for objects additional data like tags and colors definition
output in a common image format or some kind of interactive GUI (with zoom in/out and object selection)
configurable grid
autoscale or draw in defined area
I will use it for testing geometry algorithms and don't want to reinvent the wheel.
Matplotlib. I'm not familiar with all the aspects of this Python library but I've heard it is pretty good.
To quote their introduction,
matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. matplotlib can be used in python scripts, the python and ipython shell (a la MATLAB or Mathematica), web application servers, and six graphical user interface toolkits.
matplotlib tries to make easy things easy and hard things possible. You can generate plots, histograms, power spectra, bar charts, errorcharts, scatterplots, etc, with just a few lines of code. For a sampling, see the screenshots, thumbnail gallery, and examples directory.
For example, using "ipython -pylab" to provide an interactive environment, to generate 10,000 gaussian random numbers and plot a histogram with 100 bins, you simply need to type
x = randn(10000)
hist(x, 100)
For the power user, you have full control of line styles, font properties, axes properties, etc, via an object oriented interface or via a set of functions familiar to MATLAB users. The pylab mode provides all of the pyplot plotting functions listed below, as well as non-plotting functions from numpy and matplotlib.mlab.
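For the geometry use case in the question, a minimal matplotlib sketch might look like this (all coordinates are made up for illustration):
import matplotlib.pyplot as plt
from matplotlib.patches import Circle, Polygon

fig, ax = plt.subplots()

ax.plot([1, 3], [1, 2], 'ko')  # points
ax.plot([0, 4], [0, 3], 'b-')  # a segment
ax.add_patch(Circle((2, 2), radius=1, fill=False, color='r'))
ax.add_patch(Polygon([(0, 0), (1, 2), (2, 0)], closed=True, alpha=0.3, color='g'))

ax.grid(True)                # configurable grid
ax.set_aspect('equal')       # keep shapes undistorted
ax.autoscale()               # or set_xlim/set_ylim for a fixed drawing area
plt.savefig('geometry.png')  # or plt.show() for an interactive, zoomable GUI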
Maybe dia, with its SVG output option, is what you're looking for? It can be scripted in Python.
