as.linnet function causes the machine to hang - spatstat

I am using the following code to read a shapefile as a linnet object. On the last line, where I call as.linnet, the process hangs and I have to force quit RStudio. I do not know what is wrong. I tried the packages from both CRAN and GitHub, with the same outcome: RStudio hangs.
library(spatstat)
library(maptools)
library(sp)
setwd("~/documents/rwork/traced")
roads<-readShapeSpatial('NLroads')
spatstat.roads<-as.psp(roads)
#when I do head(spatstat.roads), it gives me only 5 line segments
#while the shapefile has 174 line segments
plot(spatstat.roads)
final_roads<-as.linnet(spatstat.roads)
I do not know whether the problem is with my shapefile. I also do not know what this warning means:
In as.psp.SpatialLinesDataFrame(roads) : 1 columns of data frame discarded
Here is the line data I am reading in. Any help would be great. Thanks.

The short answer: your dataset is large; set the argument sparse=TRUE in the last line, and give the computer a few minutes.
The long answer: a SpatialLines object is basically a list of curves, each curve consisting of a sequence of straight line segments. Your dataset roads has 174 curves, consisting of a total of 38635 straight line segments (so there is an average of over 200 line segments in each curve). When you do as.psp(roads), you extract just the straight line segments, so there are 38635 of them (this would be printed out if you typed the name of the object, or you could use nsegments to count them). When you type head(spatstat.roads) you are just getting the first 5 entries.
Your dataset is a SpatialLinesDataFrame in which each curve carries additional data. These additional columns of data are ignored by as.psp currently, so it issues a warning that it has ignored them. You can extract them from the original object if you need them.
The command as.linnet(spatstat.roads) calls the function as.linnet.psp. This tries to guess which of the line segments you intended to be joined in the linear network. It does this by finding cases where two different segments have identical endpoints or very close endpoints. The argument eps controls the closeness threshold. More importantly the argument sparse determines whether to use a sparse matrix representation of the network topology. For this size of dataset, you definitely need the sparse matrices, so set sparse=TRUE.

Related

How can I use 2D infinite lines as keys of an associative container that can be queried by proximity?

I have thousands of line segments that I'd like to cluster by collinearity. One way to do this is to make an associative container whose keys are infinite lines. With such a container I could use a collection of line segments as values, and add a line segment by determining the infinite line of which it is a segment and inserting it into the corresponding bin.
Given such a set up, what is the best way to characterize the infinite lines for supporting the ability to query the data structure for line keys that are near a given line?
For example I was thinking of using an R-tree of points (Elsewhere in this project I am already using Boost.Geometry R-trees) where each point is the x-intercept and y-intercept of an infinite line. However, this only works for non-vertical and non-horizontal lines. I could handle vertical and horizontal lines as special cases but then I would not be able to easily query for lines that are "near" a vertical or horizontal line the way that I will be able to query for lines that are near a non-axis aligned line by doing a 2D range query of the intercept points in the R-tree.
I'm wondering if there is some elegant way of handling this problem. How can I represent infinite 2D lines as points such that horizontal and vertical lines are no different than any other kind of line and such that lines that are near each other map to points that are near each other?
I have two solutions.
The first is a simple one with some limitations:
For each infinite line, you could compute the point on the line where the perpendicular drawn from the origin meets the line. You could store the coordinates of this point as a "signature" of that line. This solution will work for all lines except those that pass through the origin. That is because when the line passes through the origin, the "signature" point will always be the origin no matter the slope of the line.
The second solution extends the first one to solve that problem:
In addition to the coordinates of the point described above, you can also store the angle the normal of the line makes with the x-axis. So you'd be representing each line with an ordered triplet (x, y, theta). You can store these triplets in an rtree for 3d points and query that tree.
Two lines that pass through the origin could have a theta value of pi/4 radians and 5*pi/4 respectively. They'd be coincident, but the way they are stored in the rtree doesn't reflect that. So just for the lines that pass through the origin, you could enforce a convention, say - theta must be between 0 and pi. Such a convention would fix the problem. This convention should only be enforced for lines that pass through the origin.
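A minimal sketch of the (x, y, theta) signature described above, assuming each infinite line is given by two distinct points on it. The function name line_signature and the tolerance tol are hypothetical choices for illustration, not part of any library:

```python
import math

def line_signature(p, q, tol=1e-12):
    """Return (x, y, theta) for the infinite line through points p and q.

    (x, y) is the foot of the perpendicular from the origin to the line,
    theta is the angle of the line's normal with the x-axis.
    """
    (x1, y1), (x2, y2) = p, q
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("p and q must be distinct points")

    # Unit normal to the line, and signed distance from the origin along it.
    nx, ny = -dy / length, dx / length
    rho = nx * x1 + ny * y1

    # Make the normal point from the origin towards the line.
    if rho < 0:
        nx, ny, rho = -nx, -ny, -rho

    # Assumed convention for lines through the origin: theta in [0, pi).
    if rho <= tol:
        rho = 0.0
        if ny < 0 or (ny == 0 and nx < 0):
            nx, ny = -nx, -ny

    theta = math.atan2(ny, nx)
    return (rho * nx, rho * ny, theta)
```

Horizontal and vertical lines need no special handling here, and the two points may be given in either order. The resulting triplets could then be stored in an R-tree of 3D points, as suggested above.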
Update:
Coming up with a solution that is better optimized for your use-case will require a clear definition of how you measure the "proximity" between two infinite lines.

Relative risk estimation in spatstat

I am running into problems when computing the relative risk estimation (relrisk.ppp) of two point patterns: One with four marks in a rectangular region and the other with two marks in a circular region.
For the first pattern with four marks, I am able to get the relative risk, and the resulting object is a large imlist with 4 elements, one for each mark.
However, for the second pattern, it gives a list of 10 elements, of which the first, a matrix v, is empty with NA entries. I cannot work out what could be wrong, since the two point pattern objects seem to be identical in structure. Any help will be appreciated. Thanks.
For your first dataset, the result is a list of image objects (a list of four objects of class im). For your second dataset, the result of relrisk.ppp is a single image (object of class im). This is the default behaviour when there are only two possible types of points (two possible mark values). See help(relrisk.ppp).
In all cases, you should just be able to plot and print the resulting object. You don't need to examine the internal data of the image.
More explanation: when there are only two possible types of points, the default behaviour of relrisk.ppp is to treat them as case-control data, where the points belonging to the first type are treated as controls (e.g. non-infected people), and the points of the second type are treated as cases (e.g. infected people). The ratio of intensities (cases divided by controls) is estimated as an image.
If you don't want this to happen, set the argument casecontrol=FALSE and then relrisk.ppp will always return a list of images, with one image for each possible mark. Each image gives the spatially-varying probability of that type of point.
It's all explained in help(relrisk.ppp) or in the book.

Fitting multiple curves to one data set

I have a data set that I receive from an outside source, and have no real control over.
The data, when plotted, shows two clumps of points with several sparse, irrelevant points. Here is a sample plot:
There is a clump of points on the left, clustered around (1, 16). This clump is actually part of a set of points that lies on (or near to) a line stretching from (1, 17.5) to (2.4, 13).
There is also an apparent curve from (1.75, 18) to (2.75, 12.5).
Finally, there are some sparse points above the second curve, around (2.5, 17).
Visually, it's not difficult to separate these groups of points. However, I need to separate these points within the data file into three groups, which I'll call Line, Curve, and Other (the Curve group is the one I actually need). I'd like to write a program that can do this reasonably well without needing to visually see the plot.
Now, I'm going to add a couple items that make this much worse. This is only a sample set of data. While the shapes of the curve and line are relatively constant from one data set to the next, the positions are not. These regions can (and do) shift, both horizontally and vertically. The only real constant is that there's a negative-slope line from the top-left to the bottom-right of the plot, an almost curve from the top-center to the bottom-right, and most of the sparse points are in the top-right corner, above the curve.
I'm on Linux, and I'm out of ideas. I can tell you the approaches that I've tried, though they have not done well.
First, I cleaned up the data set and sorted it in ascending order by x-coordinate. I thought that maybe the points were sorted in some sort of a logical way that would allow me to 'head' or 'tail' the data to achieve the desired result, but this was not the case.
I can write code in any language (Python, Fortran, C, etc.) that removes a point if it's not within X distance of the previous point. This would be just fine, except that the scattering of the points is such that two points very near each other in x are separated by an appreciable distance in y. It also doesn't help that the Line and Curve draw near one another for larger x-values.
I can fit a curve to a partial data set. When I sort the data by x-coordinate, for example, I can choose to only plot the first 30 points, or the last 200, or some set of 40 in the middle somewhere. That's not a problem. But the Line points tuck underneath the Curve points, which causes a problem.
If the Line points were fairly constant (which they're not), I could rotate my plot by some angle so that the Line is vertical, look only at the points to the right of that line, then rotate back. This may be the best way to go about doing this, but in order to do that, I need to be able to isolate the linear points, which is more or less the essence of the problem.
The other idea that seems plausible to me, is to try to identify point density and split the data into separate files by those parameters. I think this is the best candidate for this problem, since it is independent of point location. However, I'm not sure how to go about doing this, especially because the Line and Curve do come quite close together for larger x-values (In the sample plot, it's x-values greater than about 2).
I know this does not exactly fall in with the request of a MWE, but I don't know how I'd go about providing a more classical MWE. If there's something else I can provide that would help, please ask. Thank you in advance.

Fourier transform of a time sequence data shows a diagonal line in frequency space

I have sets of data with 1000 equally spaced points (in time-space) and was able to get its Fourier transform (in frequency-space), but the problem is that one set of data shows a diagonal line which passes from the last point of the right hand side back to the last point of the left hand side. I tried lowering the samples by only taking the last 500 points, but it seems to be around even after taking only the last 100 points. Thus maybe it's not sample dependent, but rather something's lacking/wrong with my syntax.
FFT was called by the 3 lines below (which I got from other posts)
sp = np.fft.fft(y1_500)
freq= np.fft.fftfreq(y1_500.shape[-1])
plt.plot(freq, np.abs(sp))
Can anyone tell me what's with the diagonal line?
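One likely cause, assuming the spectrum is drawn as a connected line as in the call above: np.fft.fftfreq returns frequencies in the standard FFT order (zero first, then the positive frequencies, then the negative ones), so plt.plot joins the last positive-frequency point to the first negative-frequency point with a straight line across the plot. A minimal sketch that reorders both arrays with np.fft.fftshift before plotting follows; the helper name sorted_spectrum and the demo signal are made up for illustration:

```python
import numpy as np

def sorted_spectrum(y):
    """Return (freq, amplitude) with frequencies in increasing order."""
    y = np.asarray(y, dtype=float)
    sp = np.fft.fft(y)
    freq = np.fft.fftfreq(y.shape[-1])
    # fftshift moves the negative-frequency half in front of the
    # zero and positive frequencies, so freq becomes monotonic.
    return np.fft.fftshift(freq), np.fft.fftshift(np.abs(sp))

# Demo signal (an assumption standing in for y1_500):
# 500 samples of a sine at 0.05 cycles per sample.
t = np.arange(500)
y_demo = np.sin(2 * np.pi * 0.05 * t)
freq_sorted, amp_sorted = sorted_spectrum(y_demo)

# Plotting the reordered arrays avoids the connecting line:
# plt.plot(freq_sorted, amp_sorted)
```

For real-valued input, another option is to plot only the non-negative half of the spectrum, which np.fft.rfft and np.fft.rfftfreq return directly.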

Importing Transient Data into Paraview

I have a 3D triangulated surface. Nodes and Conn variables store the coordinates and connectivity of the triangles. At each vertex, a scalar quantity, S, and a vector with three components, V, are stored. These data are time-dependent. Also, my geometry does not change over time and I have one surface for all the timesteps.
How should I go about writing a VTK file that holds the transient data over this surface? In other words, I want to write the values of S and V at different timesteps on this 3D surface in a single VTK file. I ultimately want to import this VTK file into ParaView for visualization. vtkTemporalDataSet seems to be the solution for me, but I could not find an example of how to write an ASCII or binary file for this VTK class. Could vtkPolyData somehow be used to define time so that ParaView knows the transient nature of my dataset? I would appreciate any help or comment.
The VTK file format does not support transient data. However, you can write a series of files that ParaView will interpret as a time sequence. This will work fine with poly data in the VTK file. The file series is defined as files of the same name with a number identifier in them. For example, if you have a series of files named:
MyFile_000.vtk
MyFile_001.vtk
MyFile_002.vtk
ParaView will group these files together in its file browser and when you read them together, it will treat them as a file sequence with 3 time steps.
The bad part of this representation is that you will have to replicate the Nodes and Conn in each file. If that is a problem, you will have to use a different file format that supports multiple time steps using the same connection information (such as the Exodus II file format).
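The file series described above can be sketched in plain Python, writing one legacy ASCII VTK poly data file per time step. This is only an illustration under stated assumptions: the function names are hypothetical, Conn is assumed to hold zero-based vertex indices, and the S and V values are assumed to be available as one sequence per time step.

```python
import os

def write_polydata_step(path, nodes, conn, scalars, vectors):
    """Write one time step as a legacy ASCII VTK poly data file."""
    with open(path, "w") as f:
        f.write("# vtk DataFile Version 3.0\n")
        f.write("surface data for one time step\n")
        f.write("ASCII\n")
        f.write("DATASET POLYDATA\n")
        # The geometry is repeated in every file of the series.
        f.write(f"POINTS {len(nodes)} float\n")
        for x, y, z in nodes:
            f.write(f"{x} {y} {z}\n")
        # Each triangle takes 4 integers: the count 3 plus 3 indices.
        f.write(f"POLYGONS {len(conn)} {4 * len(conn)}\n")
        for a, b, c in conn:
            f.write(f"3 {a} {b} {c}\n")
        # Per-vertex scalar S and vector V.
        f.write(f"POINT_DATA {len(nodes)}\n")
        f.write("SCALARS S float 1\n")
        f.write("LOOKUP_TABLE default\n")
        for s in scalars:
            f.write(f"{s}\n")
        f.write("VECTORS V float\n")
        for vx, vy, vz in vectors:
            f.write(f"{vx} {vy} {vz}\n")

def write_series(directory, basename, nodes, conn, s_steps, v_steps):
    """Write basename_000.vtk, basename_001.vtk, ... and return the paths."""
    paths = []
    for step, (s, v) in enumerate(zip(s_steps, v_steps)):
        path = os.path.join(directory, f"{basename}_{step:03d}.vtk")
        write_polydata_step(path, nodes, conn, s, v)
        paths.append(path)
    return paths
```

Calling write_series(".", "MyFile", Nodes, Conn, S_steps, V_steps) would produce files named like the ones listed above, which ParaView should then offer as a single grouped series in its file browser.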
