I've started doing more and more work with the new Mathematica statistics and data analysis features.
I attended the "Statistics & Data Analysis with Mathematica" online seminar on Tuesday (great presentation; I highly recommend it), but I've run into some problems that I hope someone on this forum might have a few moments to consider.
I've created a rather extensive notebook to streamline my data analysis, call it "AnalysisNotebook". It outputs an extensive series of charts and data including: histograms, PDF and CDF plots, Q-Q plots, plots to study tail fit, hypothesis test data, etc.
This works great as long as I stay with Mathematica's off-the-shelf distributions, and it probably works fine for simple MixtureDistributions and even ParameterMixtureDistributions, since for these Mathematica can likely figure out the moments, PDF, CDF, FindDistributionParameters, etc. by breaking the mixtures down into pieces.
I run into trouble when I attempt to define and use even a simple TransformedDistribution, e.g.:
LogNormalNormalDistribution[gamma_, sigma_, delta_] :=
  TransformedDistribution[u*v + delta,
    {Distributed[u, LogNormalDistribution[Log[gamma], sigma]],
     Distributed[v, NormalDistribution[0, Sqrt[2]]]}
  ];
I'd like to do a lot of things along these lines with TransformedDistributions. I appreciate the challenges something like this presents (some of which I've learned about on this forum - thank you all):
They may not have closed forms;
PDF and CDF calculation may need interpolation, work-arounds, or custom approaches;
FindDistributionParameters and DistributionFitTest won't know how to deal with this kind of thing.
Basically, the standard things one would want to use don't or can't work here, and one can't fairly expect them to.
One can write custom code to do these sorts of things (again, this forum has helped me a lot), but incorporating all of the complexity of custom alternatives into my AnalysisNotebook just seems silly: the notebook would grow with each new custom function.
It would help me immensely if I could write my custom versions of PDF, CDF, FindDistributionParameters, DistributionFitTest, and anything else I might need to standards such that the more general built-in versions would simply call them seamlessly. That way, something like my AnalysisNotebook could remain simple and uncluttered, a standard component in my toolbox, and I could spend my time working on the math rather than the plumbing, if you take my meaning.
To clarify what I mean: just as one can define multiple versions of a function to do different things (taking different numbers of arguments, or with other kinds of situational awareness), Mathematica must do something similar for the functions that take distributions as arguments, in order to know which solution to use for a particular built-in distribution. I want the ability to add to or extend the functionality of PDF[], CDF[], FindDistributionParameters[], DistributionFitTest[], and related functions at that level -- adding functionality for custom distributions and their supporting code, which the built-in functions would call seamlessly.
Perhaps just a dream, but if anyone knows of any way I could approach this, I'd very much appreciate your feedback.
EDIT: The kinds of problems I've encountered.
The following code never finishes executing:
r1 = RandomVariate[LogNormalNormalDistribution[0.01, 0.4, 0.0003], 1000];
FindDistributionParameters[r1, LogNormalNormalDistribution[gamma, sigma, delta]]
To work around this, I wrote the following function:
myLNNFit[data_] := Module[{costFunction, moments},
  moments = Moment[EmpiricalDistribution[data], #] & /@ Range[5];
  costFunction[gamma_, sigma_, delta_] =
    Sqrt@Total[((Moment[LogNormalNormalDistribution[gamma, sigma, delta], #] & /@ Range[5]) - moments)^2];
  NMinimize[{costFunction[gamma, sigma, delta], gamma > 0, sigma > 0}, {gamma, sigma, delta}]]
This works fine by itself, but doesn't play well with everything else.
You can use TagSet to specify the symbol to which you want to associate a definition. This lets you define the PDF of a distribution even though PDF is Protected. Here's a trivial example. Note that TriangleWave is a built-in symbol, and TriangleDistribution is something I just made up. This fails:
PDF[TriangleDistribution[x_]] := TriangleWave[x]
This works:
TriangleDistribution /: PDF[TriangleDistribution[x_]] := TriangleWave[x]
Now you can do:
Plot[PDF[TriangleDistribution[x]], {x, 0, 1}]
Dear Jarga, the following tutorial in the Mathematica documentation describes how you would enable random number generation for your distribution; look near the bottom of that document for the section 'Defining Distributional Generators'.
It is quite similar to what Joe suggested. You would need to define:
In[1]:= Random`DistributionVector[
LogNormalNormalDistribution[gamma_, sigma_, delta_], len_, prec_] ^:=
RandomVariate[LogNormalDistribution[Log[gamma], sigma], len,
WorkingPrecision -> prec]*
RandomVariate[NormalDistribution[0, Sqrt[2]], len,
WorkingPrecision -> prec] + delta
In[2]:= RandomVariate[
LogNormalNormalDistribution[0.01, 0.4, 0.0003], 5]
Out[2]= {-0.0013684, 0.00400979, 0.00960139, 0.00524952, 0.012049}
I am not aware of any documented way to insert a new distribution into the estimation framework. The hypothesis testing should work if CDF is defined for your distribution and works correctly.
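Building on the TagSet idea above, here is a hedged sketch (my own, not from the documentation) of attaching a numeric CDF to your distribution: since u > 0, P(u v + delta <= x) = E_u[P(v <= (x - delta)/u)], which NIntegrate can evaluate. With something like this in place, DistributionFitTest has a chance of working.

LogNormalNormalDistribution /:
 CDF[LogNormalNormalDistribution[gamma_?NumericQ, sigma_?NumericQ, delta_?NumericQ],
   x_?NumericQ] :=
 (* integrate the LogNormal density against the shifted Normal CDF *)
 NIntegrate[
  PDF[LogNormalDistribution[Log[gamma], sigma], u]*
   CDF[NormalDistribution[0, Sqrt[2]], (x - delta)/u],
  {u, 0, Infinity}]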
I'm working with the VTK library in C++.
I have a mesh given as an unstructured grid, and certain data given on the integration points of Gaussian quadrature on the cells (created by an external solver). For the sake of simplicity, let's assume that we are talking about scalar data.
I also have a tool which displays VTK data graphically. What I want is to display the mentioned data with that tool, simply as interpolated/extrapolated scalar data on the whole grid.
My question is, is there something native to VTK with which I can give the mesh the scalar data at the integration points in some way and VTK handles the interpolation and extrapolation?
I mean, I could write an algorithm that processes the data, creates a new grid in which the cells do not share nodes (as the extrapolated values might not be continuous there), extrapolates the scalars to those nodes for each cell, and then displays that. However, doing this forgoes the native capabilities of the VTK library (which seems to be quite strong in most other regards), and I don't want to reinvent the wheel anyway.
From https://vtk.org/Wiki/images/7/78/VTK-Quadrature-Point-Design-Doc.pdf, I am aware that there is the vtkQuadratureSchemeDefinition class and I think I know how to handle it, and I noticed vtkQuadraturePointInterpolator, which seems to do the opposite of what I'm searching for - interpolation to the integration points rather than extrapolating from them.
Otherwise, the newest entry in the VTK wiki seems to be https://vtk.org/Wiki/VTK/VTK_integration_point_support, which appears to be quite old, given that it pleads for the existence of some sort of quadrature point support in general, which by now already exists.
Also, there is a question on the VTK mailing list which looks just like my question here:
https://public.kitware.com/pipermail/vtkusers/2013-January/078077.html, which seems to have gone unanswered.
Likewise, the issue https://gitlab.kitware.com/vtk/vtk/issues/17124 also seems to be about what I want to do, and it might hint at this currently not being possible; but the fact that something exists as an issue does not imply that it is not already solved (especially with no assignee on the issue).
I apologize if the answer is already out there somewhere; searching the web did not turn up the answer(s) I was looking for.
Situation: I have a small graph (a set of edges and nodes, that is). I want to display it in an interactive manner, and I would like to manipulate the display styles and symbols used for edges and nodes programmatically.
Hence kgraphviewer won't work - I want to do it programmatically, as stated.
I noticed VTK comes with a lot of built-in graph drawing algorithms, but it seems to be a really large library.
Question: What are some alternatives to VTK? Graphviz is probably one, but I cannot confirm that Graphviz comes with all the graph drawing algorithms that VTK has - any other, possibly smaller, options with all the built-in algorithms?
Side note: Some systems use a static drawing widget, i.e., once the drawing is displayed in a widget that the system comes with, you cannot interact with the drawing using your mouse. The GNU implementation of IDL, GDL, has this problem. I would like to avoid this.
Yes, I agree with you regarding VTK: it's a powerful toolkit, but it is (maybe) too big, and it's not so easy to configure a working VTK environment.
I don't have a great experience in the field of graphs, but a search leads to this other StackOverflow post. I think that Prefuse, listed under the Java section, could be of some interest. C++ itself seems to have a lot of choices, listed in various answers, here. I hope that it will help.
I used the Gephi public-domain graph visualization software on Linux. It was a quick way to get a 3-D picture which can be modified with line thickness to show edge weight - good for comm network work.
I have come up with an idea for an audio project, and it looks like Go is a useful language for implementing it. However, it requires the ability to apply filters to incoming audio, and Go doesn't appear to have any sort of audio processing package. I can use cgo to call C code, but every signal processing library I find uses C++ classes, which cgo cannot handle. It looks like libsox may work. Are there any others?
As for what libsox can provide and what I need: I want to take an incoming audio stream and divide it into frequency bands. If I can do this while only reading the file once, then bonus! I am not sure whether libsox can do this.
If you want to use a C++ library you could try SWIG, but you'll have to get it out of Subversion. The next release (2.0.1) will be the first released version to support Go. In my experience the Go support is still a little rough, but then again the library I tried to wrap is a monster.
Alternatively, you could still create your own bindings through cgo using the same method SWIG does, but it will be painful and tedious. The basic idea is that you first create a C wrapper, then let cgo create a Go wrapper around your C wrapper.
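To make that concrete, here is a minimal sketch of the cgo pattern, assuming a hypothetical C shim (filter_wrapper.h, with FilterHandle, filter_new, filter_process, and filter_free all made up for illustration) that you would write around the C++ class:

package filter

// #cgo LDFLAGS: -lfilterwrapper -lstdc++
// #include "filter_wrapper.h" // hypothetical C header wrapping the C++ class
import "C"

// Filter wraps an opaque handle returned by the C shim.
type Filter struct {
	handle C.FilterHandle // hypothetical opaque typedef from the C header
}

// New creates the underlying C++ filter object via the C shim.
func New(cutoffHz float64) *Filter {
	return &Filter{handle: C.filter_new(C.double(cutoffHz))}
}

// Process runs one sample through the filter.
func (f *Filter) Process(sample float64) float64 {
	return float64(C.filter_process(f.handle, C.double(sample)))
}

// Close releases the C++ object; Go's GC will not do it for you.
func (f *Filter) Close() {
	C.filter_free(f.handle)
}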
I don't know anything about signal processing or libsox, though. Sorry.
There is a relatively new project called ZikiChombo, which so far contains some basic DSP functionality geared toward audio; see here.
The dsp part of the project has filters on its roadmap, but they are not there yet. On the other hand, some infrastructure for implementing filters, such as real FFT and block convolution, is there. This means that if you want FIRs and can compute the coefficients by some other means, you can currently run them via convolution in zc with sound in real time (a generic sketch of the idea is below).
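For illustration, a minimal sketch of running an FIR filter by direct convolution in plain Go - this is not zc's API, just the underlying idea; for long filters you would switch to FFT-based block convolution:

package main

import "fmt"

// firFilter applies an FIR filter to a signal by direct convolution:
// out[n] = sum_k coeffs[k] * signal[n-k]. The coefficients are assumed
// to have been designed elsewhere.
func firFilter(signal, coeffs []float64) []float64 {
	out := make([]float64, len(signal))
	for n := range signal {
		var acc float64
		for k, c := range coeffs {
			if n-k >= 0 {
				acc += c * signal[n-k]
			}
		}
		out[n] = acc
	}
	return out
}

func main() {
	// 3-tap moving average as a trivial FIR low-pass.
	coeffs := []float64{1.0 / 3, 1.0 / 3, 1.0 / 3}
	fmt.Println(firFilter([]float64{1, 2, 3, 4, 5}, coeffs))
}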
Basic filter design support (FIR, biquad), for example using an ideal filter as a starting point, will be the next step for zc. There are numerous small, self-contained open-source projects for basic and more advanced FIR and IIR filter design, most notably Iowa Hills, which might be more accessible than a larger project for computing filter coefficients outside of Go.
More advanced filtering, such as Butterworth filters and filters based on polynomial solving and the bilinear transform, will take more time for zc.
There are also some software-defined radio Go projects with code related to filtering; sorry, I don't have the links offhand, but a search for the topic may lead you to them.
Finally, there is a gonum Fourier package, which also supplies an FFT.
So Go is growing some interesting and potentially useful stuff in this domain, but it still has quite a way to go compared to older projects (which are mostly in C/C++, or perhaps wrapped in Python via numpy, for example).
I am using this pure-Go repo to perform Fourier transforms with good effect: https://github.com/mjibson/go-dsp. Just supply the FFT call with a slice of float64 samples:
package main

import (
	"github.com/mjibson/go-dsp/fft" // https://github.com/mjibson/go-dsp
)

func main() {
	var audio_wave []float64
	// ... now populate audio_wave with your audio PCM samples

	// input: time domain; output: frequency domain of equally spaced freq bins
	complex_fft := fft.FFTReal(audio_wave)
	_ = complex_fft // see the follow-up below for using the spectrum
}
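Since the original question was about dividing a stream into frequency bands, a small follow-up sketch: take the magnitude of each bin. For real input only the first half of the bins is independent, and bin k corresponds to k*sampleRate/N Hz:

import "math/cmplx"

// magnitudes converts the complex FFT output into per-bin magnitudes.
func magnitudes(spectrum []complex128) []float64 {
	out := make([]float64, len(spectrum)/2)
	for k := range out {
		out[k] = cmplx.Abs(spectrum[k])
	}
	return out
}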
My realtime app generates a data log: 100 words of data at 10 kHz. I need to analyze it and produce some plots of the results. There are intermediate calculations involved - I need to take some differences, averages, etc. Excel would work fine, except for:
the 32000 item limit on graph data series is too small - that's only 3 seconds of data.
the glacial speed at which it processes changes to graphs containing large data series is unbearable.
What are good alternatives to Excel for manipulating and plotting large quantities of data? I'm looking for something interactive, not a library.
For this sort of stuff we typically roll our own, but I know that isn't the solution you want. Can you use a good-quality database (e.g., Oracle) to do the manipulation, then maybe put the summarized data back into Excel for the plotting? I believe Excel will link to databases these days, so you could make it quite automated.
Otherwise there are statistical tools like SAS (http://www.sas.com/technologies/analytics/statistics/stat/index.html), but get your cheque book out first.
There are also several free tools for analysing and plotting (see below), but I am not sure whether they have components to handle data in real time.
R (similar to SAS) for statistical computations
octave (similar to Matlab) for mathematical computations
R (for data manipulation) and its ggplot2 package for creating sexy graphs. Incredibly useful.
If you need real-time graphics, then I'd look at building something using matplotlib. It's a Python module, and you can link it to R using rpy2 if required.
In particle and nuclear physics the big tool is ROOT, which I have seen used in an "update every two seconds as the data comes in" mode with a lot of data and a modest amount of intermediate processing.
Mind you, the student who wrote that module was a very slick programmer, and it took a while to shake the bugs out, even so.
ROOT is available for free, and provides all kinds of tools and support.
We have many WIDE HTML grids which scroll horizontally within a DIV in our web application.
I would like to find the best strategy for printing these grids on a portrait A4 page.
What I would like to know is what is the best way to present/display grids/data like this.
This question is not HTML-specific; I am looking for design strategies, not CSS @page directives.
There's actually a whole book dedicated (amongst other things) to fast methods for the computation of \pi: 'Pi and the AGM', by Jonathan and Peter Borwein (available on Amazon).
I studied the AGM and related algorithms quite a bit: it's quite interesting (though sometimes non-trivial).
Note that to implement most modern algorithms to compute \pi, you will need a multiprecision arithmetic library (GMP is quite a good choice, though it's been a while since I last used it).
The time-complexity of the best algorithms is in O(M(n)log(n)), where M(n) is the time-complexity for the multiplication of two n-bit integers (M(n)=O(n log(n) log(log(n))) using FFT-based algorithms, which are usually needed when computing digits of \pi, and such an algorithm is implemented in GMP).
Note that even though the mathematics behind the algorithms might not be trivial, the algorithms themselves are usually a few lines of pseudo-code, and their implementation is usually very straightforward (if you chose not to write your own multiprecision arithmetic :-) ).
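To illustrate just how short these algorithms are, here is a hedged sketch of the Gauss-Legendre (AGM-based) iteration in Go, using the standard library's math/big for multiprecision arithmetic instead of GMP. The recurrence is a' = (a+b)/2, b' = sqrt(a*b), t' = t - p*(a - a')^2, p' = 2p, with pi ~ (a+b)^2/(4t); the number of correct digits roughly doubles each iteration:

package main

import (
	"fmt"
	"math/big"
)

// gaussLegendrePi approximates pi at the given precision (in bits).
func gaussLegendrePi(prec uint, iters int) *big.Float {
	one := big.NewFloat(1).SetPrec(prec)
	two := big.NewFloat(2).SetPrec(prec)
	four := big.NewFloat(4).SetPrec(prec)

	a := big.NewFloat(1).SetPrec(prec)
	b := new(big.Float).Quo(one, new(big.Float).Sqrt(two))
	t := new(big.Float).Quo(one, four)
	p := big.NewFloat(1).SetPrec(prec)

	for i := 0; i < iters; i++ {
		an := new(big.Float).Add(a, b)
		an.Quo(an, two) // a' = (a+b)/2
		bn := new(big.Float).Mul(a, b)
		bn.Sqrt(bn) // b' = sqrt(a*b)
		d := new(big.Float).Sub(a, an)
		d.Mul(d, d)
		d.Mul(d, p)
		t.Sub(t, d)   // t' = t - p*(a-a')^2
		p.Mul(p, two) // p' = 2p
		a, b = an, bn
	}

	pi := new(big.Float).Add(a, b)
	pi.Mul(pi, pi)
	pi.Quo(pi, new(big.Float).Mul(four, t)) // pi ~ (a+b)^2/(4t)
	return pi
}

func main() {
	// 200 bits is roughly 60 decimal digits; 5 iterations saturate that.
	fmt.Println(gaussLegendrePi(200, 5).Text('f', 60))
}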
I guess it really depends on what your purpose is.
In a book format: I usually try to span two facing pages.
For a conference or poster: Find an extra wide printer and print it out on a large sheet of paper.
Something more informal: Span regular pages and tape them together.
Powerpoint: Don't show the whole chart - they won't be able to read the details anyway; just show the relevant information.