I have a data set that I receive from an outside source, and have no real control over.
The data, when plotted, shows two clumps of points with several sparse, irrelevant points. Here is a sample plot:
There is a clump of points on the left, clustered around (1, 16). This clump is actually part of a set of points that lies on (or near to) a line stretching from (1, 17.5) to (2.4, 13).
There is also an apparent curve from (1.75, 18) to (2.75, 12.5).
Finally, there are some sparse points above the second curve, around (2.5, 17).
Visually, it's not difficult to separate these groups of points. However, I need to separate these points within the data file into three groups, which I'll call Line, Curve, and Other (the Curve group is the one I actually need). I'd like to write a program that can do this reasonably well without needing to visually see the plot.
Now, I'm going to add a couple of items that make this much worse. This is only a sample set of data. While the shapes of the curve and line are relatively constant from one data set to the next, the positions are not. These regions can (and do) shift, both horizontally and vertically. The only real constant is that there's a negative-slope line from the top-left to the bottom-right of the plot, a curve-like band from the top-center to the bottom-right, and most of the sparse points are in the top-right corner, above the curve.
I'm on Linux, and I'm out of ideas. I can tell you the approaches that I've tried, though they have not done well.
First, I cleaned up the data set and sorted it in ascending order by x-coordinate. I thought that maybe the points were sorted in some sort of a logical way that would allow me to 'head' or 'tail' the data to achieve the desired result, but this was not the case.
I can write code in anything (Python, Fortran, C, etc.) that removes a point if it's not within X distance of the previous point. This would be just fine, except that the scattering of the points is such that two points very near each other in x are separated by an appreciable distance in y. It also doesn't help that the Line and Curve draw near one another for larger x-values.
I can fit a curve to a partial data set. When I sort the data by x-coordinate, for example, I can choose to only plot the first 30 points, or the last 200, or some set of 40 in the middle somewhere. That's not a problem. But the Line points tuck underneath the Curve points, which causes a problem.
If the Line points were fairly constant (which they're not), I could rotate my plot by some angle so that the Line is vertical, keep only the points to the right of that line, then rotate back. This may be the best way to go about doing this, but in order to do that, I need to be able to isolate the linear points, which is more or less the essence of the problem.
The other idea that seems plausible to me is to try to identify point density and split the data into separate files based on that. I think this is the best candidate for this problem, since it is independent of point location. However, I'm not sure how to go about doing this, especially because the Line and Curve do come quite close together for larger x-values (in the sample plot, x-values greater than about 2).
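To make the density idea concrete, something like the following is roughly what I have in mind (a sketch only; it leans on scikit-learn's DBSCAN as a stand-in for the density split, and the file name and the eps/min_samples values are placeholders that would need tuning per data set):

```python
import numpy as np
from sklearn.cluster import DBSCAN

xy = np.loadtxt("data.txt")              # two columns: x, y (placeholder file name)

# Scale the axes so "distance" means the same thing in x and y;
# the raw x range (~1-3) and y range (~12-18) are very different.
scaled = (xy - xy.mean(axis=0)) / xy.std(axis=0)

labels = DBSCAN(eps=0.15, min_samples=5).fit_predict(scaled)

# DBSCAN labels dense clusters 0, 1, ... and sparse points -1.
# Ideally the Line and Curve come out as two clusters and the sparse
# points as noise, but that would have to be checked per data set.
for label in sorted(set(labels)):
    group = xy[labels == label]
    name = "other" if label == -1 else f"cluster_{label}"
    np.savetxt(f"{name}.txt", group)
```

Whether the two clusters stay separated where the Line and Curve approach each other (x greater than about 2) is exactly the part I can't judge without trying it on more data sets.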
I know this does not exactly fall in with the request for an MWE, but I don't know how I'd go about providing a more classical MWE. If there's something else I can provide that would help, please ask. Thank you in advance.
It's been a while now that I've been stuck with an apparently "simple" problem. My goal is to build the envelope of a set of lines that are "attached" to a curve. Let's say a curve like this:
For the above example I would expect the envelope of lines (whose directions are depicted by arrows and are orthogonal to the edges of the red curve) to be an arc of a circle.
I thought of doing this in two computationally separate ways:
Intersection of consecutive lines: In an ideal smooth world, the envelope of the attached lines is a curve to which the red lines are all tangent. Now, coming back to the discrete world, I try to obtain the envelope curve by intersecting consecutive lines (for example, the first line with the second line would give the first vertex of the envelope); see the sketch after these two approaches.
Evolute of the red curve: Again in an ideal smooth world, one can think of such an envelope as the evolute of the red curve (see Evolute - wikipedia). Therefore, all I had to do in addition to the current info was to compute the curvature and then build the evolute (naturally I had to use a discrete version of curvature, whose definition you can find here: Discrete Curvature - wikipedia).
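To be concrete about the first approach, the intersection step looks roughly like this (a sketch in Python rather than MATLAB, just to show the computation; points are the vertices of the red curve and directions the attached line directions):

```python
import numpy as np

def intersect(p0, d0, p1, d1):
    """Intersection of the 2D lines p0 + t*d0 and p1 + s*d1."""
    # Solve [d0 | -d1] [t, s]^T = p1 - p0.
    A = np.column_stack((d0, -d1))
    t, _ = np.linalg.solve(A, p1 - p0)
    return p0 + t * d0

def envelope_vertices(points, directions):
    """Intersect each attached line with the next one."""
    return np.array([
        intersect(points[i], directions[i], points[i + 1], directions[i + 1])
        for i in range(len(points) - 1)
    ])
```

Since consecutive lines are nearly parallel, the 2x2 solve above is badly conditioned, which is presumably where the distortion comes from.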
With either of the above approaches I get the following result:
However, finding the "correct arc" is heavily dependent on the accuracy of the initial data, which is the red curve. As soon as the red curve has some "noise" in its vertices, the envelope is heavily distorted. Here I add a picture (where the red curve looks intact, but is not, and the envelope is distorted):
My Question: How can I rectify this? I believe there should be a numerical approach to solve this issue as I badly need this envelope to be correctly built. I'm a mathematician and am not fully aware of the numerical tricks that might exist in dealing with cases like this.
However, I believe this should be a standard question in the computer graphics community, though I could not find anything properly relevant after searching for months.
It would be great if the solutions were in MATLAB. Please let me know if you want me to be more precise about any part of this.
For the line intersection method, yes: because the lines are nearly parallel, any small error in the defining data for a line will produce a dramatic error in their intersection points.
I suggest the following:
1. Calculate all lines.
2. Calculate all intersection points of adjacent lines.
3. Calculate the distances between all adjacent intersection points.
4. Plot the distances in sequence, and identify all distances that are more than, perhaps, 2 standard deviations from the trend line of the distances (see the sketch after this list).
5. If the data is not "too bad", I think the identified distances will mostly come in pairs, i.e. there is one "bad" intersection line causing two "bad" distances.
6. Exclude the "bad" lines and reprocess the remaining intersection points.
These steps assume the data is sampled more densely where the base curve is curvier.
If the intersection point distances seem to form two trend lines, especially if they look like two diverging or two converging trend lines, then group the intersection lines accordingly, plot two envelopes, and take the average of the two envelopes as "the envelope". (Or perhaps even more trend lines, if there is a regular error in the data.)
But, if there are signs of regular data errors, then a contextual assessment and analysis of the data source and how it was generated/gathered/measured might be required to correctly determine which data should be excluded.
I am dealing with a reverse-engineering problem regarding road geometry and estimation of design conditions.
Suppose you have a set of points obtained from the measurement of positions of a road. This road has straight sections as well as curve sections. Straight sections are, of course, represented by lines, and curves are represented by circles of unknown center and radius. There are, as well, transition sections, which may be clothoids / Euler spirals or any other usual track transition curve. A representation of the track may look like this:
We know in advance that the road / track was designed taking this transition + circle + transition principle into account for every curve, yet we only have the measurement points, and the goal is to find the parameters describing every curve on the track, that is, the transition parameters as well as the circle's center and radius.
I have written some code using a nonlinear optimization algorithm, where a user can select start and end points and fit a circle to the arc section between them, as shown in the next figure:
However, I can't find a suitable way to take the transition into account. After giving it some thought, I came to think that this is because, given a set of discrete points (with their measurement error) representing a full curve, it is not entirely clear where it "begins" and where it "ends", and, moreover, it is even less clear where the entry transition, the circle proper, and the exit transition "begin" and "end".
Is there any work on this subject that I may have missed? Is there a proper way to fit the whole transition + curve + transition structure to the set of points?
As far as I know, there's no standard method to fit a clothoid-circle-clothoid sequence to a given set of points.
The basic facts are that two points define a straight line, and three points define a unique circle.
The clothoid is far more complex, because you need: the parameter A, the final radius Rf, an initial point (px, py), the radius Ri at that point, and the tangent T (the angle with the X-axis) at that point.
These are five pieces of data you may use to find the solution.
Because clothoid coordinates are calculated from expanded Fresnel integrals (see https://math.stackexchange.com/a/3359006/688039 for a brief explanation), followed by a translation and rotation, there is no easy way to fit this spiral to a given set of points.
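For reference, generating clothoid points from the parameter A goes roughly like this (a Python sketch using SciPy's Fresnel integrals; the translation and rotation onto the adjacent element still have to be applied afterwards):

```python
import numpy as np
from scipy.special import fresnel

def clothoid_points(A, length, n=200):
    """Points of a clothoid with parameter A (R * L = A^2), starting at the
    origin with zero curvature and its tangent along +x.  The translation
    and rotation onto the adjacent element are applied separately."""
    s = np.linspace(0.0, length, n)        # arc length along the spiral
    t = s / (A * np.sqrt(np.pi))           # scaled Fresnel argument
    S, C = fresnel(t)                      # scipy returns (S, C) in this order
    x = A * np.sqrt(np.pi) * C
    y = A * np.sqrt(np.pi) * S
    return np.column_stack((x, y))
```

Fitting then means iterating over A (and the entry point, tangent and radii) until these points match the measured ones as well as possible, which is the manual loop described below.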
When I've had to deal with your issue, what I've done is:
Calculate the radius for each triplet of consecutive points: p1p2p3, p2p3p4, p3p4p5, etc. (a short sketch of this radius scan appears after these steps).
Observe the sequence of radii. Similar values mean a circle, steadily increasing/decreasing values mean a clothoid, and very large values mean a straight.
For each basic element (line, circle), find the most probable characteristics (angles, vertices, radius) by hand or by some regression method. Often common sense works best.
For a spiral you may start with approximate values taken from the adjacent elements. These values may very well be the initial angle and point, and the initial and final radius. Then you need to iterate, playing with the Fresnel integrals and the change of coordinates, until you find a "good" parameter A. Then repeat with small variations of the other values, the ones you took from the adjacent elements.
Make whatever adjustments seem reasonable. For example, many values (A, radius) tend to be whole numbers, without decimals, simply because that was easier for the designer to type.
If you can write a small applet to do these steps, that's enough. Typical road-design software helps, but it doesn't spare you the iteration process.
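The radius scan from the first step might look like this (a Python sketch; in practice some smoothing of the radius sequence helps before classifying):

```python
import numpy as np

def circumradius(p1, p2, p3):
    """Radius of the circle through three 2D points (np.inf if collinear)."""
    a = np.linalg.norm(p2 - p3)
    b = np.linalg.norm(p1 - p3)
    c = np.linalg.norm(p1 - p2)
    # Twice the triangle area, via the cross product.
    cross = (p2[0] - p1[0]) * (p3[1] - p1[1]) - (p2[1] - p1[1]) * (p3[0] - p1[0])
    if abs(cross) < 1e-12:
        return np.inf
    return (a * b * c) / (2.0 * abs(cross))

def radius_sequence(points):
    """Circumradius for each triplet p1p2p3, p2p3p4, ... of consecutive points."""
    pts = np.asarray(points, dtype=float)
    return np.array([circumradius(pts[i], pts[i + 1], pts[i + 2])
                     for i in range(len(pts) - 2)])
```

Roughly constant values in the output suggest a circle, a steady ramp suggests a clothoid, and very large values suggest a straight.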
If the points are dense compared to the effective radii of curvature, estimate the local curvature by least-squares fitting of a circle to a small number of points, taking into account that the curvature is zero most of the time.
You will obtain a plot with constant values and ramps that connect them. You can use an estimate of the slope at the inflection points to figure out the transition points.
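A minimal sketch of that local fit (the algebraic Kasa least-squares circle through a sliding window of points, returning curvature, which stays near zero on the straight sections; the window size is a tuning parameter):

```python
import numpy as np

def circle_curvature(points):
    """Algebraic (Kasa) least-squares circle fit to a small window of 2D
    points; returns the curvature 1/R."""
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack((2 * x, 2 * y, np.ones(len(x))))
    b = x**2 + y**2
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    r = np.sqrt(c + cx**2 + cy**2)
    return 1.0 / r

def local_curvature(points, window=7):
    """Curvature estimate at each interior point from a sliding window."""
    pts = np.asarray(points, dtype=float)
    half = window // 2
    return np.array([circle_curvature(pts[i - half:i + half + 1])
                     for i in range(half, len(pts) - half)])
```

Plotting the output gives the constant-value plateaus and connecting ramps described above; the ends of the ramps mark the transition points.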
Example image:
Given a set of connected lines (see the thick black lines in the example image), how can you generate a set of offset contour lines that form loops (see the thin blue lines)? The offset is constant across all lines, and the contours are always parallel to their associated lines.
The input line topology is arbitrary: i.e. it may contain cycles. Note that the number of contour loops is equal to the number of cycles plus one. A solution that deals only with tree topologies (no cycles) would also be of interest.
Any papers or relevant algorithms out there that tackle this problem?
The basic method is to construct the bisector of each angle (on the appropriate side) and mark off a length along it such that it achieves the desired offset (a little trigonometry), then link these points in loop-traversal order. Different capping rules can be used at free endpoints.
For this to be possible, you need a representation of the geometry as a planar graph (quad-edge for instance). Maybe have a look here: https://mathoverflow.net/q/23811.
Anyway, this method will not avoid the overlaps that can arise, nor self-intersecting offsets. These are much more difficult problems that require a global approach, and are similar to the polygon union problem.
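A sketch of the bisector construction for one open polyline (Python; it offsets to one side with miter joins, does nothing about the overlaps or self-intersections mentioned above, and the flat end caps are just one possible capping rule):

```python
import numpy as np

def offset_polyline(points, d):
    """Offset an open 2D polyline by distance d to its left side, placing
    each interior offset vertex on the angle bisector (miter join)."""
    pts = np.asarray(points, dtype=float)
    tangents = np.diff(pts, axis=0)
    tangents /= np.linalg.norm(tangents, axis=1, keepdims=True)
    normals = np.column_stack((-tangents[:, 1], tangents[:, 0]))   # left-hand normals

    out = [pts[0] + d * normals[0]]                    # flat cap at the start
    for i in range(1, len(pts) - 1):
        n1, n2 = normals[i - 1], normals[i]
        m = n1 + n2                                    # bisector direction
        norm = np.linalg.norm(m)
        if norm < 1e-9:                                # 180 degree turn: bisector undefined
            out.append(pts[i] + d * n1)
            continue
        m /= norm
        out.append(pts[i] + m * (d / np.dot(m, n1)))   # miter length from the trigonometry
    out.append(pts[-1] + d * normals[-1])              # flat cap at the end
    return np.array(out)
```

Handling cycles means closing the loop (treating the last segment as adjacent to the first) instead of capping, and the number of resulting loops then depends on the topology as noted in the question.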
I implemented a multi-series line chart like the one given here by M. Bostock and ran into a curious issue which I cannot explain myself. When I choose linear interpolation and set my scales and axis everything is correct and values are well-aligned.
But when I change my interpolation to basis, without any modification of my axis and scales, values between the lines and the axis are incorrect.
What is happening here? With the monotone setting I can achieve pretty much the same effect as the basis interpolation but without the syncing problem between lines and axis. Still I would like to understand what is happening.
The basis interpolation is implementing a B-spline (basis spline), which people like to use as an interpolation function precisely because it smooths out extreme peaks. This is useful when you are modeling something you expect to vary smoothly but only have sharp, infrequently sampled data. A consequence of this is that the resulting line will not pass through all data points, changing the appearance of extreme values.
In your case, the sharp peaks are the interesting features, the exception to the typically 0 baseline value. When you use a spline interpolation, you are smoothing over these peaks.
Here is a fun demo for playing with the different types of line interpolation:
http://bl.ocks.org/mbostock/4342190
You can drag the data points around so they resemble a sharp peak like yours, and even click to add new points. Then switch to a basis interpolation and watch the peak get averaged out.
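You can also see the effect numerically. A quick sketch (Python/SciPy rather than D3, with made-up data): build a clamped cubic B-spline that uses a spiky series as its control points, and compare the curve's maximum with the data's maximum.

```python
import numpy as np
from scipy.interpolate import BSpline

# Mostly-zero baseline with one sharp peak, like the chart in question.
y = np.array([0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0])
k = 3                                   # cubic, same family as d3's "basis"
n = len(y)

# Clamped knot vector: len(t) must be n + k + 1.
t = np.concatenate(([0.0] * (k + 1),
                    np.linspace(0.0, 1.0, n - k + 1)[1:-1],
                    [1.0] * (k + 1)))

spline = BSpline(t, y, k)
u = np.linspace(0.0, 1.0, 200)

print("max of data:  ", y.max())            # 10.0
print("max of spline:", spline(u).max())    # noticeably smaller: the peak is smoothed
```

The curve treats the data as control points rather than as points it must pass through, which is exactly the mismatch you see between the lines and the axis.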
Currently I am drawing a 3D curve consisting of 1200-1500 straight micro-lines defined by an array of 3D points (x, y, z), but rendering is a bit slow regardless of the technology used (Adobe Flash, Three.js).
The curve is a kind of 3D arc with a 180-degree loop at the end, so I thought that skipping some points in places where the curve is smoother and more predictable would speed up rendering.
Could you suggest some algorithm which can determine how close to a straight line the specific piece of 3D curve is?
Update
I tried making Three.js render these points as a single curve, and it works really fast. But the different pieces of this curve need to be colored differently, so I have to draw it as a bunch of separate lines, and the only thing I can do to speed it up is to skip every second point in regions where the curve is close to a straight line.
I can not use OpenGL (WebGL) because not all browsers support it.
How far three points deviate from a straight line can be quantified by the distance of the middle one from the line on which the other two rest. Probably the easiest way to turn this into a single number is to take the two lengths along the line from either end point to the middle one, divide the distance by each, and sum the two results.
So:
as the middle point gets closer to the line, the number goes down;
as the line segment grows longer, variations by the mid point need to be proportionally more extreme; and
greater local slope (as if the middle point were very close to either end) produces a greater error.
You can get the distance from a position to a line by taking the vector from any point on the line to the position, using the dot product to work out how much of it is movement along the line, and subtracting that component from the total. If you don't normalise your direction vector along the line first, the result comes out multiplied by the square of its length, so no square root is needed on that account. For the length comparisons implied above you can likewise keep and compare everything as squared values.
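A sketch of that computation, keeping everything squared (the tolerance for deciding whether the middle point can be skipped is up to you, and this is applied per original triplet, matching the "skip every second point" idea rather than a full simplification pass):

```python
import numpy as np

def squared_deviation(a, b, c):
    """Squared distance of the middle point b from the line through a and c (3D)."""
    line = c - a
    to_b = b - a
    # Component of to_b along the line; dividing by dot(line, line)
    # avoids normalising the direction, so no square root is needed.
    along = (np.dot(to_b, line) / np.dot(line, line)) * line
    off = to_b - along
    return np.dot(off, off)

def keep_mask(points, tol_sq):
    """True for points to keep: endpoints always, tested middle points only
    when their squared deviation from their neighbours' line exceeds tol_sq."""
    pts = np.asarray(points, dtype=float)
    keep = np.ones(len(pts), dtype=bool)
    for i in range(1, len(pts) - 1, 2):       # test every second point
        if squared_deviation(pts[i - 1], pts[i], pts[i + 1]) < tol_sq:
            keep[i] = False
    return keep
```

The error measure from the first paragraph (the deviation divided by each of the two along-line lengths, summed) can be swapped in for the plain squared deviation if you want the length-proportional behaviour described above.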