I'm trying to find powerlines in LIDAR points clouds with skimage.measures ransac() function. This is my very first time meddling with these modules in python so bear with me.
So far all I knew how to do reliably was filtering low or 'ground' points from the cloud to reduce the number of points to deal with.
def filter_Z(las, threshold):
filtered = laspy.create(point_format = las.header.point_format, file_version = las.header.version)
filtered.points = las.points[las.Z > las.Z.min() + threshold]
print(f'original size: {len(las.points)}')
print(f'filtered size: {len(filtered.points)}')
filtered.write('filtered_points2.las')
return filtered
The threshold is something I put in by hand since in the las files I worked with are some nasty outliers that prevent me from dynamically calculating it.
The filtered point cloud, or one of them atleast looks like this:
Note the evil red outliers on top, maybe they're birds or something. Along with them are trees and roofs of buildings. If anyone wants to take a look at the .las files, let me know. I can't put a wetransfer link in the body of the question.
A top down view:
I've looked into it as much as I could, and found the skimage.measure module and the ransac function that comes with it. I played around a bit to get a feel for it and currently I'm stumped on how to continue.
def ransac_linefit_sklearn(points):
model_robust, inliers = ransac(points, LineModelND, min_samples=2, residual_threshold=1000, max_trials=1000)
return model_robust, inliers
The result is quite predictable (I ran ransac on a 2D view of the cloud just to make it a bit easier on the pc)
Using this doesn't really yield any good results in examples like the one I posted. The vegetation clusters have too many points and the line is fitted through it because it has the highest point density.
I tried DBSCAN() to cluster up the points but it didn't work. I also attempted OPTICS() but as I write it still hasn't finished running.
From what I've read on various articles, the best course of action would be to cluster up the points and perform RANSAC on each individual cluster to find lines, but I'm not really sure on how to do that or what clustering method to use in situations like these.
One thing I'm also curious about doing is just filtering out the big blobs of trees that mess with model fititng.
Inadequacy of RANSAC
RANSAC works best whenever your data fits a mono-modal distribution around your model. In the case of this point cloud, it works best whenever there is only one line with outliers, but there are at least 5 lines when viewed birds-eye. Check out this older SO post that discusses your problem. Francesco's response suggests an iterative RANSAC based approach.
Octrees and SVD
Colleagues worked on a similar problem in my previous job. I am not fluent in the approach, but I know enough to provide some hints.
Their approach resembled Francesco's suggestion. They partitioned the point-cloud into octrees and calculated the singular value decomposition (SVD) within each partition. The three resulting singular values will correspond to the geometric distribution of the data.
If the first singular value is significantly greater than the other two, then the points are line-like.
If the first and second singular values are significantly greater than the other, then the points are plane-like
If all three values are of similar magnitude, then the data is just a "glob" of points.
They used these rules iteratively to rule out which points were most likely NOT part of the lines.
Literature
If you want to look into published methods, maybe this paper is a good starting point. Power lines are modeled as hyperbolic functions.
Related
I'm working on a simple project in which I'm trying to describe the relationship between two positively correlated variables and determine if that relationship is changing over time, and if so, to what degree. I feel like this is something people probably do pretty often, but maybe I'm just not using the correct terminology because google isn't helping me very much.
I've plotted the variables on a scatter plot and know how to determine the correlation coefficient and plot a linear regression. I thought this may be a good first step because the linear regression tells me what I can expect y to be for a given x value. This means I can quantify how "far away" each data point is from the regression line (I think this is called the squared error?). Now I'd like to see what the error looks like for each data point over time. For example, if I have 100 data points and the most recent 20 are much farther away from where the regression line/function says it should be, maybe I could say that the relationship between the variables is showing signs of changing? Does that make any sense at all or am I way off base?
I have a suspicion that there is a much simpler way to do this and/or that I'm going about it in the wrong way. I'd appreciate any guidance you can offer!
I can suggest two strands of literature that study changing relationships over time. Typing these names into google should provide you with a large number of references so I'll stick to more concise descriptions.
(1) Structural break modelling. As the name suggest, this assumes that there has been a sudden change in parameters (e.g. a correlation coefficient). This is applicable if there has been a policy change, change in measurement device, etc. The estimation approach is indeed very close to the procedure you suggest. Namely, you would estimate the squared error (or some other measure of fit) on the full sample and the two sub-samples (before and after break). If the gains in fit are large when dividing the sample, then you would favour the model with the break and use different coefficients before and after the structural change.
(2) Time-varying coefficient models. This approach is more subtle as coefficients will now evolve more slowly over time. These changes can originate from the time evolution of some observed variables or they can be modeled through some unobserved latent process. In the latter case the estimation typically involves the use of state-space models (and thus the Kalman filter or some more advanced filtering techniques).
I hope this helps!
I have a spatial dataset that consists of a large number of point measurements (n=10^4) that were taken along regular grid lines (500m x 500m) and some arbitrary lines and blocks in between. Single measurements taken with a spacing of about 0.3-1.0m (varying) along these lines (see example showing every 10th point).
The data can be assumed to be normally distributed but shows a strong small-scale variability in some regions. And there is some trend with elevation (r=0.5) that can easily be removed.
Regardless of the coding platform, I'm looking for a good or "the optimal" way to interpolate these points to a regular 25 x 25m grid over the entire area of interest (5000 x 7000m). I know about the wide range of kriging techniques but I wondered if somebody has a specific idea on how to handle the "oversampling along lines" with rather large gaps between the lines.
Thank you for any advice!
Leo
Kriging technique does not perform well when the points to interpolate are taken on a regular grid, because it is necessary to have a wide range of different inter-points distances in order to well estimate the covariance model.
Your case is a bit particular... The oversampling over the lines is not a problem at all. The main problem is the big holes you have in your grid. If think that these holes will create problems whatever the interpolation technique you use.
However it is difficult to predict a priori if kriging will behave well. I advise you to try it anyway.
Kriging is only suited for interpolating. You cannot extrapolate with kriging metamodel, so that you won't be able to predict values in the bottom left part of your figure for example (because you have no point here).
To perform kriging, I advise you to use the following tools (depending the languages you're more familiar with):
DiceKriging package in R (the one I use preferably)
fields package in R (which is more specialized on spatial fields)
DACE toolbox in matlab
Bonus: a link to a reference book about kriging which is available online: http://www.gaussianprocess.org/
PS: This type of question is more statistics oriented than programming and may be better suited to the stats.stackexchange.com website.
Given an input of 2D points, I would like to segment them in lines. So if you draw a zig-zag style line, each of the segments should be recognized as a line. Usually, I would use OpenCV's
cvHoughLines or a similar approach (PCA with an outlier remover), but in this case the program is not allowed to make "false-positive" errors. If the user draws a line and it's not recognized - it's ok, but if the user draws a curcle and it comes out as a square - it's not ok. So I have an upper bound on the error - but if it's a long line and some of the points have a greater distance from the approximated line, it's ok again. Summed up:
-line detection
-no false positives
-bounded, dynamically adjusting error
Oh, and the points are drawn in sequence, just like hand drawing.
At least it does not have to be fast. It's for a sketching tool. Anyone has an idea?
This has the same difficulty as voice and gesture recognition. In other words, you can never be 100% sure that you've found all the corners/junctions, and among those you've found you can never be 100% sure they are correct. The reason you can't be absolutely sure is because of ambiguity. The user might have made a single stroke, intending to create two lines that meet at a right angle. But if they did it quickly, the 'corner' might have been quite round, so it wouldn't be detected.
So you will never be able to avoid false positives. The best you can do is mitigate them by exploring several possible segmentations, and using contextual information to decide which is the most likely.
There are lots of papers on sketch segmentation every year. This seems like a very basic thing to solve, but it is still an open topic. The one I use is out of Texas A&M, called MergeCF. It is nicely summarized in this paper: http://srlweb.cs.tamu.edu/srlng_media/content/objects/object-1246390659-1e1d2af6b25a2ba175670f9cb2e989fe/mergeCF-sbim09-fin.pdf.
Basically, you find the areas that have high curvature (higher than some fraction of the mean curvature) and slow speed (so you need timestamps). Combining curvature and speed improves the initial fit quite a lot. That will give you clusters of points, which you reduce to a single point in some way (e.g. the one closest to the middle of the cluster, or the one with the highest curvature, etc.). This is an 'over fit' of the stroke, however. The next stage of the algorithm is to iteratively pick the smallest segment, and see what would happen if it is merged with one of its neighboring segments. If merging doesn't increase the overall error too much, you remove the point separating the two segments. Rinse, repeat, until you're done.
It has been a while since I've looked at the new segmenters, but I don't think there have been any breakthroughs.
In my implementation I use curvature median rather than mean in my initial threshold, which seems to give me better results. My heavily modified implementation is here, which is definitely not a self-contained thing, but it might give you some insight. http://code.google.com/p/pen-ui/source/browse/trunk/thesis-code/src/org/six11/sf/CornerFinder.java
I have n points in R^3 that I want to cover with k ellipsoids or cylinders (I don't really care; whichever is easier). I want to approximately minimize the union of the volumes. Let's say n is tens of thousands and k is a handful. Development time (i.e. simplicity) is more important than runtime.
Obviously I can run k-means and use perfect balls for my ellipsoids. Or I can run k-means, then use minimum enclosing ellipsoids per cluster rather than covering with balls, though in the worst case that's no better. I've seen talk of handling anisotropy with k-means but the links I saw seemed to think I had a tensor in hand; I don't, I just know the data will be a union of ellipsoids. Any suggestions?
[Edit: There's a couple votes for fitting a mixture of multivariate Gaussians, which seems like a viable thing to try. Firing up an EM code to do that won't minimize the volume of the union, but of course k-means doesn't minimize volume either.]
So you likely know k-means is NP-hard, and this problem is even more general (harder). Because you want to do ellipsoids it might make a lot of sense to fit a mixture of k multivariate gaussian distributions. You would probably want to try and find a maximum likelihood solution, which is a non-convex optimization, but at least it's easy to formulate and there is likely code available.
Other than that you're likely to have to write your own heuristic search algorithm from scratch, this is just a huge undertaking.
I did something similar with multi-variate gaussians using this method. The authors use kurtosis as the split measure, and I found it to be a satisfactory method for my application, clustering points obtained from a laser range finder (i.e. computer vision).
If the ellipsoids can overlap a lot,
then methods like k-means that try to assign points to single clusters
won't work very well.
Part of each ellipsoid has to fit the surface of your object,
but the rest may be inside it, don't-cares.
That is, covering algorithms
seem to me quite different from clustering / splitting algorithms;
unions are not splits.
Gaussian mixtures with lots of overlaps ?
No idea, but see the picture and code on Numerical Recipes p. 845.
Coverings are hard even in 2d, see
find-near-minimal-covering-set-of-discs-on-a-2-d-plane.
I'm working on a data mining algorithm that considers features in their n-dimensional feature-space and allows surrounding training examples to block the 'visibility' of other training examples effectively taking them out of the effective training set for this particular query.
I've been trying to find an efficient way to determine which points are 'visible' to the query. I though the realm of computer graphics might offer some insight but there is a lot of information to peruse and much of it either can't be generalized to multiple dimensions or is only efficient when the number of dimensions is low.
I was hoping I could get some pointers from those of you who are more intimately knowledgeable in the domain.
The solution I found is to convert the euclidean coordinates into 'hyper-spherical' coordinates. Its similar to the spherical coordinate system except you add an additional angle with a range [0, pi) for each additional dimension beyond three.
After that I can sort the list of points based on their distance from the origin and iterate through comparing each point in the list to the first item looking for angles that overlap. after each iteration you remove the first item in the list and any items that were discovered to have been blocked. then start over with the new first item (closest item).
Don't know if anyone will ever find this useful but I thought I should put the answer up anyways.