I implemented a multi-series line chart like the one given here by M. Bostock and ran into a curious issue which I cannot explain. When I choose linear interpolation and set my scales and axes, everything is correct and the values are well aligned.
But when I change my interpolation to basis, without any modification of my axes and scales, the values no longer line up correctly between the lines and the axes.
What is happening here? With the monotone setting I can achieve pretty much the same effect as the basis interpolation, but without the alignment problem between lines and axes. Still, I would like to understand what is happening.
The basis interpolation implements a B-spline (basis spline), which people like to use as an interpolation function precisely because it smooths out extreme peaks. This is useful when you are modeling something you expect to vary smoothly but only have sharp, infrequently sampled data. A consequence is that the resulting line does not pass through all of the data points, which changes the appearance of extreme values.
In your case, the sharp peaks are the interesting features, the exception to the typically 0 baseline value. When you use a spline interpolation, you are smoothing over these peaks.
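To make that concrete, here is a small numerical sketch (Python with numpy, purely for illustration; the data values and the helper name are made up) of the uniform cubic B-spline blending that basis interpolation performs. A flat series with a single spike of 100 never rises above roughly 67 on the smoothed curve:

    import numpy as np

    def basis_curve(points, steps=20):
        """Evaluate a uniform cubic B-spline the way 'basis' interpolation does:
        each output point is a weighted blend of 4 consecutive control points."""
        pts = np.asarray(points, dtype=float)
        out = []
        for i in range(len(pts) - 3):
            p0, p1, p2, p3 = pts[i:i + 4]
            for t in np.linspace(0.0, 1.0, steps, endpoint=False):
                b0 = (1 - t) ** 3 / 6.0
                b1 = (3 * t**3 - 6 * t**2 + 4) / 6.0
                b2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6.0
                b3 = t**3 / 6.0
                out.append(b0 * p0 + b1 * p1 + b2 * p2 + b3 * p3)
        return np.array(out)

    # Flat baseline with one sharp peak: the smoothed curve never reaches it.
    y = np.array([0, 0, 0, 0, 100, 0, 0, 0, 0], dtype=float)
    curve = basis_curve(np.column_stack([np.arange(len(y)), y]))
    print(curve[:, 1].max())  # about 66.7, well below the 100 in the data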
Here is a fun demo to play with the different types of line interpolations:
http://bl.ocks.org/mbostock/4342190
You can drag the data points around so they resemble a sharp peak like yours, and even click to add new points. Then switch to the basis interpolation and watch the peak get averaged out.
I am sketching out a new simulation that will involve thousands of ships moving around on Earth's oceans and interacting over long periods of time. So, lots of "intersection detection" for sensor and communications ranges, as well as region detection for various environmental conditions. We'll assume a spherical earth, not WGS84. This is an event-step simulation that spits out metrics, not a real time game or anything like that.
One question is whether to use Cartesian coordinates (Earth-Centered, Earth-Fixed) or geodetic/polar coordinates. With polar coordinates, a ship's track would be internally represented as a series of lat/lon waypoints with times and great-circle paths between them. With a Cartesian representation, the waypoints would be connected with polyline renderings of the great circle between them.
The reason this is a question is that I suspect sticking to a Cartesian data model makes it possible to use various performance-tuned geometry libraries, which may even offer SIMD/GPU advantages. Polar coordinates would probably be the more natural way to proceed if writing everything from scratch, but I suspect that by keeping things Cartesian I will have greater access to better and faster libraries. Is this an invalid line of thought? Another consideration is that polar-coordinate calculations tend to get really screwy near the poles.
Just curious if somebody with experience could save me a whole lot of time prototyping some scenarios both ways.
It often works well to represent directions as unit vectors instead of angles. Rotating a vector then becomes a 2x2 or 3x3 matrix multiply (efficient with SIMD, though still more expensive than adding two angles stored in radians), but you very rarely need sin/cos.
You may occasionally want atan2 to get an angle, but usually not inside tight loops.
Intersection detection can be very efficient (with SIMD) for XYZ coordinates, given another XYZ point and a range. I'm not sure how efficiently you could check which lat/lon pairs are within range of a given point; that's not a problem I've looked at.
IDK what kind of stuff you'd find in existing libraries, or what you'd want to do with it.
I have a data set that I receive from an outside source, and have no real control over.
The data, when plotted, shows two clumps of points with several sparse, irrelevant points. Here is a sample plot:
There is a clump of points on the left, clustered around (1, 16). This clump is actually part of a set of points that lies on (or near to) a line stretching from (1, 17.5) to (2.4, 13).
There is also an apparent curve from (1.75, 18) to (2.75, 12.5).
Finally, there are some sparse points above the second curve, around (2.5, 17).
Visually, it's not difficult to separate these groups of points. However, I need to separate these points within the data file into three groups, which I'll call Line, Curve, and Other (the Curve group is the one I actually need). I'd like to write a program that can do this reasonably well without needing to visually see the plot.
Now, I'm going to add a couple of items that make this much worse. This is only a sample set of data. While the shapes of the curve and line are relatively constant from one data set to the next, the positions are not. These regions can (and do) shift, both horizontally and vertically. The only real constant is that there is a negative-slope line from the top-left to the bottom-right of the plot, a rough curve from the top-center to the bottom-right, and most of the sparse points are in the top-right corner, above the curve.
I'm on Linux, and I'm out of ideas. I can tell you the approaches that I've tried, though they have not done well.
First, I cleaned up the data set and sorted it in ascending order by x-coordinate. I thought that maybe the points were sorted in some sort of a logical way that would allow me to 'head' or 'tail' the data to achieve the desired result, but this was not the case.
I can write code in anything (Python, Fortran, C, etc.) that removes a point if it's not within X distance of the previous point. This would be just fine, except that the scatter of the points is such that two points very near each other in x can be separated by an appreciable distance in y. It also doesn't help that the Line and Curve draw near one another for larger x-values.
I can fit a curve to a partial data set. When I sort the data by x-coordinate, for example, I can choose to only plot the first 30 points, or the last 200, or some set of 40 in the middle somewhere. That's not a problem. But the Line points tuck underneath the Curve points, which causes a problem.
If the Line points were fairly constant (which they're not), I could rotate my plot by some angle so that the Line is vertical, look only at the points to the right of that line, then rotate back. This may be the best way to go about doing this, but in order to do that, I need to be able to isolate the linear points, which is more or less the essence of the problem.
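One standard tool for isolating the linear points without knowing where the Line sits is a robust fit such as RANSAC, which treats everything far from the dominant straight line as outliers. This is a rough sketch, assuming scikit-learn is available and a hypothetical two-column input file; the threshold is illustrative and would need tuning per data set:

    import numpy as np
    from sklearn.linear_model import RANSACRegressor

    data = np.loadtxt("points.dat")              # hypothetical x, y columns
    x, y = data[:, [0]], data[:, 1]

    # residual_threshold is a made-up value; it controls how far a point may
    # sit from the fitted line and still count as part of the Line group.
    ransac = RANSACRegressor(residual_threshold=0.5).fit(x, y)
    line_points = data[ransac.inlier_mask_]      # candidate Line group
    rest = data[~ransac.inlier_mask_]            # Curve + Other, to split next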
The other idea that seems plausible to me is to try to identify point density and split the data into separate files by those parameters. I think this is the best candidate for this problem, since it is independent of point location. However, I'm not sure how to go about doing this, especially because the Line and Curve do come quite close together for larger x-values (in the sample plot, x-values greater than about 2).
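If you want to try the density idea, one off-the-shelf option is DBSCAN, which clusters points by local density and labels sparse points as noise (roughly your Other group). A sketch, assuming scikit-learn and the same hypothetical input file as above; eps and min_samples are made-up starting values that need tuning, and the two dense groups may still merge where Line and Curve approach each other:

    import numpy as np
    from sklearn.cluster import DBSCAN

    data = np.loadtxt("points.dat")       # hypothetical two-column x, y file
    # Standardize so x and y contribute comparably to the distance metric.
    scaled = (data - data.mean(axis=0)) / data.std(axis=0)

    # eps and min_samples are illustrative values, not tuned to your data.
    labels = DBSCAN(eps=0.15, min_samples=5).fit_predict(scaled)
    for lbl in np.unique(labels):
        name = "other" if lbl == -1 else "cluster%d" % lbl
        np.savetxt("%s.dat" % name, data[labels == lbl])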
I know this does not exactly fall in with the request for an MWE, but I don't know how I'd go about providing a more classical MWE. If there's something else I can provide that would help, please ask. Thank you in advance.
I have a spatial dataset that consists of a large number of point measurements (n = 10^4) taken along regular grid lines (500 m x 500 m) and some arbitrary lines and blocks in between. Single measurements were taken with a spacing of about 0.3-1.0 m (varying) along these lines (see the example showing every 10th point).
The data can be assumed to be normally distributed but shows strong small-scale variability in some regions. There is also some trend with elevation (r = 0.5) that can easily be removed.
Regardless of the coding platform, I'm looking for a good or "the optimal" way to interpolate these points to a regular 25 x 25m grid over the entire area of interest (5000 x 7000m). I know about the wide range of kriging techniques but I wondered if somebody has a specific idea on how to handle the "oversampling along lines" with rather large gaps between the lines.
Thank you for any advice!
Leo
Kriging does not perform well when the points to interpolate from are taken on a regular grid, because estimating the covariance model well requires a wide range of different inter-point distances.
Your case is a bit particular... The oversampling along the lines is not a problem at all. The main problem is the big holes you have in your grid. I think these holes will create problems whatever interpolation technique you use.
However, it is difficult to predict a priori whether kriging will behave well. I advise you to try it anyway.
Kriging is only suited for interpolating. You cannot extrapolate with a kriging metamodel, so you won't be able to predict values in, for example, the bottom-left part of your figure (because you have no points there).
To perform kriging, I advise you to use the following tools (depending on the language you're most familiar with):
the DiceKriging package in R (the one I prefer to use)
the fields package in R (which is more specialized for spatial fields)
the DACE toolbox in MATLAB
Bonus: a link to a reference book about kriging which is available online: http://www.gaussianprocess.org/
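If Python is also an option (an assumption on my part; it is not one of the tools listed above), scikit-learn's Gaussian-process regressor gives an essentially kriging-style fit, and its prediction standard deviation will flag the gaps between your survey lines. Note that a dense fit scales as O(n^3), so with n = 10^4 points you would likely need to thin the data or use a local-neighbourhood variant. A minimal sketch with made-up values:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern, WhiteKernel

    # Hypothetical measurements: xy is (n, 2) coordinates in metres, z the
    # values (with the elevation trend already removed, as in the question).
    xy = np.random.uniform(0, 5000, size=(500, 2))
    z = np.sin(xy[:, 0] / 800.0) + 0.1 * np.random.randn(500)

    # The kernel plays the role of the covariance model in kriging; the
    # length scale and noise level below are placeholders to be re-estimated.
    kernel = Matern(length_scale=500.0, nu=1.5) + WhiteKernel(noise_level=0.01)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(xy, z)

    # Predict on the 25 m target grid; the returned standard deviation will be
    # large in the gaps between lines, where the estimate reverts to the mean.
    gx, gy = np.meshgrid(np.arange(0, 5000, 25), np.arange(0, 7000, 25))
    grid = np.column_stack((gx.ravel(), gy.ravel()))
    z_hat, z_std = gp.predict(grid, return_std=True)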
PS: This type of question is more statistics oriented than programming and may be better suited to the stats.stackexchange.com website.
Given an input of 2D points, I would like to segment them into lines. So if you draw a zig-zag style line, each of the segments should be recognized as a line. Usually I would use OpenCV's
cvHoughLines or a similar approach (PCA with an outlier remover), but in this case the program is not allowed to make false-positive errors. If the user draws a line and it's not recognized, that's OK, but if the user draws a circle and it comes out as a square, that's not OK. So I have an upper bound on the error; but if it's a long line and some of the points are a greater distance from the approximated line, that's OK again. Summed up:
- line detection
- no false positives
- bounded, dynamically adjusting error
Oh, and the points are drawn in sequence, just like hand drawing.
At least it does not have to be fast; it's for a sketching tool. Does anyone have an idea?
This has the same difficulty as voice and gesture recognition. In other words, you can never be 100% sure that you've found all the corners/junctions, and among those you've found you can never be 100% sure they are correct. The reason you can't be absolutely sure is because of ambiguity. The user might have made a single stroke, intending to create two lines that meet at a right angle. But if they did it quickly, the 'corner' might have been quite round, so it wouldn't be detected.
So you will never be able to avoid false positives. The best you can do is mitigate them by exploring several possible segmentations, and using contextual information to decide which is the most likely.
There are lots of papers on sketch segmentation every year. This seems like a very basic thing to solve, but it is still an open topic. The one I use is out of Texas A&M, called MergeCF. It is nicely summarized in this paper: http://srlweb.cs.tamu.edu/srlng_media/content/objects/object-1246390659-1e1d2af6b25a2ba175670f9cb2e989fe/mergeCF-sbim09-fin.pdf.
Basically, you find the areas that have high curvature (higher than some fraction of the mean curvature) and slow speed (so you need timestamps). Combining curvature and speed improves the initial fit quite a lot. That will give you clusters of points, which you reduce to a single point in some way (e.g. the one closest to the middle of the cluster, or the one with the highest curvature, etc.). This is an 'over fit' of the stroke, however. The next stage of the algorithm is to iteratively pick the smallest segment, and see what would happen if it is merged with one of its neighboring segments. If merging doesn't increase the overall error too much, you remove the point separating the two segments. Rinse, repeat, until you're done.
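As a rough sketch of that first stage (not the MergeCF code itself; the thresholds and names below are made up, and the merge step is only described in a comment), assuming each stroke arrives as a list of (x, y, t) samples:

    import numpy as np

    def candidate_corners(stroke, curv_factor=2.0, speed_factor=0.75):
        """Flag interior samples where curvature is high and pen speed is low."""
        pts = np.asarray(stroke, dtype=float)
        xy, t = pts[:, :2], pts[:, 2]

        d = np.diff(xy, axis=0)
        seg_len = np.hypot(d[:, 0], d[:, 1])
        speed = seg_len / np.maximum(np.diff(t), 1e-9)        # per segment

        heading = np.arctan2(d[:, 1], d[:, 0])
        turn = np.abs(np.diff(np.unwrap(heading)))            # turn angle at interior points
        curvature = turn / np.maximum((seg_len[:-1] + seg_len[1:]) / 2.0, 1e-9)

        slow = (speed[:-1] + speed[1:]) / 2.0 < speed_factor * np.median(speed)
        sharp = curvature > curv_factor * np.median(curvature)

        # Interior sample indices that look like corners. A real segmenter
        # would now cluster these, keep one point per cluster, then merge
        # adjacent segments whose removal barely increases the fit error.
        return np.nonzero(slow & sharp)[0] + 1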
It has been a while since I've looked at the new segmenters, but I don't think there have been any breakthroughs.
In my implementation I use curvature median rather than mean in my initial threshold, which seems to give me better results. My heavily modified implementation is here, which is definitely not a self-contained thing, but it might give you some insight. http://code.google.com/p/pen-ui/source/browse/trunk/thesis-code/src/org/six11/sf/CornerFinder.java
I have this graph created with gnuplot
However, the red line at the bottom looks very straight because of the y-axis range, although it is not (it should look like the blue one). How can I make the y-axis range very fine grained (lots of ticks) so that very small values of the red graph become visible? Hope I was clear, thanks.
I can think of two possible solutions to your question.
Use a logarithmic scale with set logscale y. This would change the look of your plot quite a bit, but you would still have all the data on a single scale, and it would most probably give your red line a "higher resolution".
Introduce a second y-axis like in this example.
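For reference, both ideas look like this in Python/matplotlib, purely as an illustration with made-up data (in gnuplot itself, option 1 is set logscale y, and option 2 uses set y2tics and plotting the red data with axes x1y2):

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(0, 10, 200)
    blue = 1000 * np.exp(-x / 3)          # large-valued series
    red = 0.5 * np.sin(x) + 0.6           # small-valued series

    # Option 1: one logarithmic y-axis for everything.
    fig1, ax = plt.subplots()
    ax.plot(x, blue, "b", x, red, "r")
    ax.set_yscale("log")

    # Option 2: a second y-axis so the red series gets its own scale.
    fig2, ax1 = plt.subplots()
    ax1.plot(x, blue, "b")
    ax2 = ax1.twinx()
    ax2.plot(x, red, "r")
    plt.show()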
As far as I know, it is not possible to increase the resolution only on a specific part of an axis. I think this would lead to more confusion than it would do good.