Is it possible to limit the number of points per segment? - spatstat

I have a network of approximately 8K segments, each 20 m long. A random Poisson process creates a realization on this network. Next I want to add another Poisson process, but the points from the second process should not fall on segments that already contain points from the first process.
Q1> Largely, I am curious to know whether the number of points per segment can be limited. My understanding is that this is not possible because of the properties of the Poisson process, but maybe there is an optional argument to limit points per segment? I know it is possible to limit the number of points on the whole linnet object, but I am wondering if this is possible per segment of the linnet.
Q2> I thought of excluding the segments that contain points from the first process. My understanding is that I cannot exclude segments from a linnet because the network becomes disconnected / disjoint, and this is not preferred in spatstat.
Please correct me on these two issues.
Currently I plan to use a random Poisson process, but later, when some surveys are finished, I will use covariates to model the intensity of points.
Thank you.

What do you mean by avoiding locations picked by the first process? If you are just talking about the exact locations, it is already very unlikely that a previously picked point will be chosen again unless you have massive amounts of data. These are random double-precision numbers, so there is room for a lot of distinct points along the linear network.
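If the goal in Q2 is just to keep the second realization off the occupied segments, one option is to thin the second realization after simulation instead of deleting segments from the linnet (the thinned result is still Poisson on the remaining segments). Below is a minimal sketch of that rejection logic, written in Python with made-up segment-index arrays; in spatstat, if I remember correctly, the segment index of each point can be read off the lpp object's coordinate data frame, so the same idea carries over.

```python
import numpy as np

# Hypothetical arrays standing in for the simulation output:
#   seg1[i] = segment index of the i-th point of the first process
#   seg2[j] = segment index of the j-th point of the second process
rng = np.random.default_rng(1)
n_segments = 8000
seg1 = rng.integers(0, n_segments, size=400)       # first realization
seg2 = rng.integers(0, n_segments, size=400)       # second realization

occupied = set(seg1.tolist())                      # segments used by process 1
keep = np.array([s not in occupied for s in seg2])
seg2_thinned = seg2[keep]                          # second process, restricted
print(len(seg2), "->", len(seg2_thinned), "points kept")
```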

Parallelizing search in a 2D array on CUDA

I have a 500 x 500 2D array of floats. I wish to search in the vertical and horizontal directions from the middle of the array for the first zero element in both directions. The output should be 4 indices for the first zero element in the North, South, East and West directions. Is there a way to parallelize this search operation with CUDA?
Thanks.
(This answer assumes that you are not searching entire quadrants, but only the straight lines in each direction)
1. In case the array is in CPU memory
In fact, you have a search space of just 1,000 elements. The overhead of copying the data, launching the kernel and waiting for the result is such that it is not worth your trouble.
Do it on the CPU. One of your axes already has the data nicely laid out consecutively; it's probably best to work on that axis first. The other axis will be painful in terms of memory access, but that's life. You could go multi-threaded here, but I'm not sure it's worth the trouble for so little work. If you did, each thread could take one of the search directions.
As for the algorithm: since your data isn't sorted, it's basically a linear search (up to vectorization). If you've gone multi-threaded, perhaps use a shared variable which each thread occasionally polls to see if a "closer-to-the-center" thread has found a zero yet; when a thread finds a zero, it updates that variable to let the other threads know to stop working.
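For reference, here is a minimal single-threaded sketch of that CPU search. It is written in Python purely to show the logic (the question is about C/CUDA, so treat it as pseudocode); the array, function name and center convention are all illustrative.

```python
import numpy as np

def first_zero_in_each_direction(a):
    """Scan outward from the center of a 2D array and return the index of the
    first zero element in the North, South, East and West directions
    (None if that half-axis contains no zero)."""
    rows, cols = a.shape
    cr, cc = rows // 2, cols // 2          # center element

    def scan(cells):
        for r, c in cells:
            if a[r, c] == 0:
                return (r, c)
        return None

    return {
        "north": scan(((r, cc) for r in range(cr - 1, -1, -1))),
        "south": scan(((r, cc) for r in range(cr + 1, rows))),
        "west":  scan(((cr, c) for c in range(cc - 1, -1, -1))),
        "east":  scan(((cr, c) for c in range(cc + 1, cols))),
    }

# example
a = np.ones((500, 500), dtype=np.float32)
a[100, 250] = 0.0                          # a zero north of the center
print(first_zero_in_each_direction(a))
```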
2. In case the array is in GPU global memory
Now you get lots of (CUDA) 'threads', so it makes less sense to use an atomic variable, polling, etc.
We treat each of the four directions separately (although it doesn't have to be 4 separate kernels).
As @RobertCrovella notes, you can treat this problem as a parallel reduction, with each thread assigned one input element: initially, each thread holds a value of infinity (if its corresponding element is non-zero) or its distance from the center (if its corresponding array value is 0). The reduction operator is then "minimum".
This is not entirely optimal, because when warp or block results are collected (as part of a parallel reduction), this problem allows for short-circuiting once the lowest non-infinity value is located. You can read up on how parallel reduction is implemented - but I really wouldn't bother, because you have a very small amount of computational work here.
Note: It is also possible that your array is in GPU array memory; in that case you would get better locality in both dimensions.
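The map-then-min-reduce idea can be sketched compactly in NumPy. A real CUDA kernel would assign one element per thread and reduce within warps/blocks, but the operation being reduced is exactly this; the array and the helper name here are illustrative.

```python
import numpy as np

def nearest_zero_along_axis_half(values, distances):
    """Map each element to its distance from the center if it is zero,
    otherwise to +inf, then reduce with 'min' - the same operation a
    block-wise CUDA parallel reduction would perform."""
    keyed = np.where(values == 0, distances, np.inf)
    return keyed.min()                 # +inf means: no zero on this half-axis

# e.g. the northern half of the middle column, ordered away from the center
a = np.ones((500, 500), dtype=np.float32)
a[200, 250] = 0.0
north = a[249::-1, 250]                # elements going north from the center
dist = np.arange(1, north.size + 1)    # 1, 2, ... steps from the center
print(nearest_zero_along_axis_half(north, dist))   # -> 50.0
```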
It's not really clear how you define "first zero element in the North, South, East and West directions" but I could imagine a rectangular data set broken into 4 quadrants along the diagonals.
We could label the top region the "north region" and we could label the other regions similarly.
With that assumption, in the worst case you have to check every element of the array.
Therefore one possible approach is a parallel reduction.
You would then do a parallel reduction on each region, such that the distance from the center (using the standard distance formula) is minimized, considering the zero elements in the region.
If you are actually only interested in the elements associated with the vertical axis and horizontal axis that pass through the center of the image, then another approach may be better.
Even in that case, I think a parallel reduction would be a typical approach: two per axis (one for each half), considering only the zero elements on that half of the axis.

How to interpret Random Effects Plot from mgcv

I have a few questions regarding using a random effect in a GAM. First, how do you interpret and communicate the output graph?
I have fire modeled as a random effect in this GAM because it is largely a random occurrence at my different field sites and I only recorded it as a binary variable. It wouldn't work as a normal variable since it has too few levels, and there are also relatively few sites with fire. However, it greatly improved the variance captured by the model when included, so I don't want to simply exclude it. I don't know how to interpret the output, and I am also not entirely confident that there isn't another way to include it in the model other than as a random effect. Any help would be greatly appreciated!
The effect has been modelled as a random slope if you didn't code it as a factor in the data. The value on the y axis is the estimated slope; it will be a little smaller in absolute value than if you use Fire as a linear fixed effect in the model formula because it is being penalised (shrunk) towards zero.
This likely should have been fitted as a binary fixed effect; code Fire as a factor with two levels (Yes/No, or Burned/Unburned, say). Just because a variable represents something that is random over the data doesn't mean it is a suitable random effect; fire here has some average effect, and a fixed effect describes that well. There's nothing stopping you from using Fire coded as a factor as a random effect via the smooth, but with only two levels the two intercepts aren't going to be estimated that precisely.
Now, if you had repeated observations on n sites and you thought the Fire effect varied across the n sites, then you could do s(Site, Fire, bs = 're'), where both Site and Fire are factors, and you'd get a different Fire effect for each Site. Then the plot you show would have many points on it, as it is a QQ-plot of the estimated values of the Fire effect in each Site, hence one point per Site. Given the way this model is estimated, these values are assumed to be distributed Gaussian with some variance that is inversely proportional to the smoothness parameter selected by gam() when fitting this random-effect smoother. That's why the default plot is as it is: it's a QQ-plot comparing the observed distribution of the estimated random effects against the theoretical expectation.

How to design a score or signature function based on the time series data

I want to design a score or signature function based on a time series signal. Usually, the signal has ups and downs.
For a given time window, I want to design the score function based on the number of times the signal fluctuates, the duration of the fluctuations, and the magnitude of the fluctuations. I am wondering what kind of math I can use to design the function. I am not sure whether statistical features (mean, median, and so on) would be enough to design a unique function such that two time windows would be distinguishable.
Thanks!
Summary statistics will not give you what you want... but they can still be useful.
Things you can try:
Zero crossings of the signal will give you the number of fluctuations. You'll have to subtract some central tendency value to move the signal about the 0 line before doing this. Alternatively, you can run an FFT on the original signal and use the dominant frequency as part of the score.
You could define the duration of a fluctuation as the difference between successive zero crossings divided by two (since one full fluctuation crosses the 0 line twice).
Magnitude can be obtained by finding the local minima and maxima - check out packages with peak-finding functions. You might want to use the mean or median to rule out local minima and maxima that fall on the wrong side of the line. Alternatively, finding the zero crossings of the derivative signal and mapping them back to the original will also give you all the local minima and maxima.
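Here is a rough Python/NumPy sketch that puts those three ingredients together; how you weight them into a single score is up to you, and the median-centering and peak-finding choices are just one option, not the only way to do it.

```python
import numpy as np
from scipy.signal import find_peaks

def fluctuation_features(x, fs):
    """Basic ingredients for a fluctuation-based score of one time window:
    number of fluctuations, their durations, and their magnitudes."""
    centered = x - np.median(x)                  # move the signal about the 0 line

    # zero crossings: sign changes of the centered signal
    crossings = np.nonzero(np.diff(np.signbit(centered)))[0]
    n_fluctuations = len(crossings) / 2          # one fluctuation crosses zero twice

    # duration of each fluctuation = gap between successive crossings, in seconds
    durations = np.diff(crossings) / fs

    # magnitude: local maxima of the centered signal and of its negation (minima)
    peaks, _ = find_peaks(centered)
    troughs, _ = find_peaks(-centered)
    magnitudes = np.concatenate([centered[peaks], -centered[troughs]])

    return n_fluctuations, durations, magnitudes

# toy example: a 2 Hz oscillation sampled at 100 Hz for 1 second
fs = 100.0
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 2 * t) + 0.1
print(fluctuation_features(x, fs))
```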

What to do when KMeans returns fewer than K clusters?

I've implemented K-Means in Java and have a bit of a head-scratcher. I select my initial centroids by choosing a random value in each dimension within the range of values of the data points. I've run into cases where this results in one or more of these centroids not ending up being the closest centroid to any data point. So what do I do in the next iteration? Just leave it at its original randomized value? Pick a new random value? Compute it as the average of the other centroids? It seems like this isn't accounted for in the original algorithm, but probably I've just missed something.
Most implementations of k-means define initial centroids using actual data points, not random points in the bounding box drawn by the variables. However, some suggestions for solving your actual problem are below.
You could take another data point at random and make it the new cluster centroid. This is very simple and fast to implement, and shouldn't affect the algorithm adversely.
You could also try making a smarter initial selection of cluster centroids using kmeans++. This algorithm chooses the first centroid randomly and picks each remaining centroid from the data with probability proportional to its squared distance from the nearest centroid already chosen, which tends to spread the centroids out. By picking smarter centroids, you are much less likely to encounter the problem of a centroid being assigned zero data points.
If you wanted to be slightly more clever, you could use the same kmeans++ selection rule to create a new centroid whenever an existing centroid gets assigned zero data points, as in the sketch below.
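The question is in Java, but the reseeding logic ports directly; here is a hedged NumPy sketch of the kmeans++-style replacement of an empty centroid (the function and variable names are illustrative, not from any particular library).

```python
import numpy as np

def reseed_empty_centroid(X, centroids, empty_idx, rng):
    """Replace a centroid that attracted no points, kmeans++-style:
    pick a data point with probability proportional to its squared
    distance from the nearest remaining centroid."""
    others = np.delete(centroids, empty_idx, axis=0)
    d2 = ((X[:, None, :] - others[None, :, :]) ** 2).sum(-1).min(axis=1)
    probs = d2 / d2.sum()
    centroids[empty_idx] = X[rng.choice(len(X), p=probs)]
    return centroids

# usage inside the Lloyd iteration (sketch):
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
centroids = rng.normal(size=(3, 2)) * 10       # deliberately bad initial centroids
labels = ((X[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(axis=1)
for k in range(len(centroids)):
    if not np.any(labels == k):                # cluster k got zero points
        centroids = reseed_empty_centroid(X, centroids, k, rng)
```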
The way I've used it, the initial values were taken as random points from the data set, not random points in the spanned space. That means each cluster has at least one point in it initially. You could still get unlucky with outliers but with any luck you'll be able to detect this and restart with different points. (Provided "K clusters of points" is an adequate description of your data)
Instead of picking random values (which can be pretty meaningless if the space of possible values is large in comparison to the clusters), many implementations pick random points from the dataset as the initial centroids.

Determining Note Durations based on Onset Locations

I have a question regarding how to determine the Duration of notes given their Onset Locations.
So for example, I have an array of amplitude values (of type short) and another array of the same size that contains a 1 if a note onset is detected at that sample and a 0 if not. So basically, the distance between consecutive 1s will be used to determine the duration.
How can I do this? I know that I have to use the Sample Rate and other attributes of the audio data, but is there a particular formula that I can use?
Thank you!
So you are starting with a list of ONSETS; what you are really looking for is a list of OFFSETS.
There are many methods for onset detection (here is a paper on them: https://adamhess.github.io/Onset_Detection_Nov302011.pdf).
Many of the same methods can be applied to offset detection:
Since an onset is marked by an INCREASE in spectral content, you can detect an offset by measuring a decrease in spectral content.
Take a reasonable time window before and after your onset (0.25-0.5 s).
Chop the window up into smaller segments and take 50%-overlapping Fourier transforms.
Compute the spectral difference (SD) between the Fourier coefficients of successive windows, keeping only the negative changes.
Multiply the result by -1.
Pick the peaks off the result.
Voila, offsets.
(Look at page 7 of the paper listed above for more detail on the spectral difference function; you can apply a modified version of it, as above.)
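A rough sketch of those steps in Python/NumPy follows. The frame length, window size and the exact spectral-difference formula are assumptions on my part, so treat this as an outline rather than a finished detector.

```python
import numpy as np
from scipy.signal import find_peaks

def offsets_near_onset(x, fs, onset_sample, window_s=0.5, frame=1024):
    """Look for offsets in a window after a detected onset by measuring
    *decreases* in spectral content between 50%-overlapping frames
    (a sign-flipped spectral difference, as described above)."""
    hop = frame // 2
    segment = x[onset_sample:onset_sample + int(window_s * fs)]

    # magnitude spectra of 50%-overlapping frames
    frames = [segment[i:i + frame] for i in range(0, len(segment) - frame, hop)]
    spectra = [np.abs(np.fft.rfft(f)) for f in frames]

    # spectral difference between successive frames, keeping only decreases
    sd = np.array([np.sum(np.minimum(b - a, 0.0))
                   for a, b in zip(spectra, spectra[1:])])
    sd = -sd                                   # multiply by -1 so decreases become peaks

    peaks, _ = find_peaks(sd)                  # candidate offsets
    return onset_sample + (peaks + 1) * hop    # back to sample indices
```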
Well, if your sample rate in Hz is fs, then the time between two notes is equal to
1/fs * <number of samples between the two onset 1s>
Very simple :-)
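In code, assuming the 0/1 onset array from the question (the names are illustrative):

```python
def note_durations(onsets, fs):
    """Durations (in seconds) between successive note onsets.
    `onsets` is the 0/1 array from the question; the gap between two 1s
    in samples, divided by the sample rate fs, gives the duration."""
    onset_samples = [i for i, v in enumerate(onsets) if v == 1]
    return [(b - a) / fs for a, b in zip(onset_samples, onset_samples[1:])]

# e.g. fs = 44100 Hz, onsets at samples 0 and 22050 -> one 0.5 s note
print(note_durations([1] + [0] * 22049 + [1], 44100))   # [0.5]
```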
Regards
