Error message after spatstat::Gcross function: Error in marks.ppp(X, ...) : Sorry, not implemented when the marks are a data frame

I read a shapefile using the maptools package and then converted it to a ppp object:
hfmd <- readShapeSpatial("NEWMERGE.shp")
hfmd2 <- as(hfmd, 'ppp')
When I typed hfmd2, I received this:
Marked planar point pattern: 1092 points
Mark variables: X, Y, Status, ID
window: rectangle = [492623.7, 609905.3] x [444011.4, 645190.4] units
And when I typed Gcross(hfmd2) to run the cross-type G function, I received this error:
Error in marks.ppp(X, ...) :
Sorry, not implemented when the marks are a data frame
My questions are:
Does Gcross() only work with Multitype Marked planar point pattern object?
How do I convert a .ppp object to a Multitype Marked planar point pattern object?

Your current ppp object has four columns of marks: X, Y, Status, ID. Which one do you want to use for Gcross? For now I will assume it is Status. Then you can replace the data.frame of marks with a single mark vector like this:
# use Status as the only mark; it must be a factor for the pattern to be multitype
marks(hfmd2) <- factor(marks(hfmd2)$Status)
Now you should be able to run Gcross:
Gc <- Gcross(hfmd2)
This will estimate Gcross between the first two types in Status.

Related

How do you format a hyperframe in spatstat to be able to run the "mppm" function for point process models?

Below is a list of 3 point patterns with measured tree data.
ppp_list
[[1]]
Marked planar point pattern: 3 points
Mark variables: SPCD, DIA, HT
window: polygonal boundary
enclosing rectangle: [-9215316, -9215301] x [8549428, 8549443] units
[[2]]
Marked planar point pattern: 4 points
Mark variables: SPCD, DIA, HT
window: polygonal boundary
enclosing rectangle: [-8942245, -8942230] x [8838323, 8838337] units
[[3]]
Marked planar point pattern: 7 points
Mark variables: SPCD, DIA, HT
window: polygonal boundary
enclosing rectangle: [-8320491, -8320476] x [9268799, 9268813] units
Below are the covariates that I am including in the hyperframe.
temp <- c(50.75,65.75,48)
prec <- c(85.75,56.42,38.25)
myhyperframe <- hyperframe(trees=ppp_list, temp=temp, prec=prec)
Below is the structure of my hyperframe.
Hyperframe:
trees temp prec
1 (ppp) 50.75 85.75
2 (ppp) 65.75 56.42
3 (ppp) 48.00 38.25
When I tried to run a point process model using the "mppm" function, it returned an error saying that my point pattern is not multitype. I am unsure whether I am approaching point process models correctly. The "mppm" function works for the "simba" dataset, and for the "waterstriders" dataset once I turned it into a hyperframe.
mppm(trees ~ 1, myhyperframe)
# Error in (function (data, dummy, method = c("grid", "dirichlet"), ...) :
# data pattern is not multitype
Any feedback would be great. Thank you!
This is not a problem of formatting.
mppm and ppm do not yet support the analysis of point patterns with multiple columns of marks, or point patterns with continuous numeric marks.
The error message says that the point pattern is not multitype. A multitype point pattern is one which has a single column of marks, which are a factor.
You can either remove the marks, or convert them to a factor, before applying mppm.

What is the error on the value corresponding to the maximum of a function?

This is my problem:
The first input is the observed data from MUSE, an astronomical instrument that provides cubes, i.e. an image for each wavelength within a certain range. This means that, taking all the wavelengths corresponding to a pixel i,j, I can extract the spectrum of that pixel. Since these images are observed, each pixel comes with an error.
The second input is a spectrum template, i.e. a model of a spectrum. This template is assumed to be error-free. I map this spectrum to various redshifts (this means multiplying the wavelengths by a factor 1+z, where z belongs to a certain range).
The core of my code is the cross-correlation between the cube, i.e. the spectra extracted from each pixel, and the template mapped to the different redshifts. The result is a cross-correlation function for each pixel and each z; let's call this computed function f(z). Taking, for each pixel, the argmax of f(z), I get the best redshift.
This is a common and widely used procedure, and indeed it works well.
My question:
Since my input, i.e. the MUSE cube, has an error, I have propagated this error through the cross-correlation, obtaining an error on f(z), i.e. each f_i has an error sigma_i. So, how can I compute the error on z_max, the value of z corresponding to the maximum of f?
Maybe a solution could be to implement a bootstrap method: I can draw, within the errors of f, a certain number of functions, compute the argmax for each of them, and so get an idea of the scatter of z_max.
By the way, I'm using Python (3.x), and TensorFlow has been used to compute the cross-correlation function.
Thanks!
EDIT
Following @TF_Support's suggestion, I'm trying to add some code and some figures to better explain the problem. But before that, maybe a little math is useful.
I computed the cross-correlation with this expression:
xcorr_ik = sum_j S_ij * T_jk / N_ik,   with   N_ik = sqrt( (sum_j S_ij^2) * (sum_j T_jk^2) )
where S is the spectra, T is the template and N is the normalization coefficient. Since S has an error, I propagated these errors through the previous relation, finding:
sigma_xcorr_ik^2 = sum_j ( T_jk / N_ik - xcorr_ik * S_ij * SST_k / N_ik^2 )^2 * sigma_ij^2
where SST_k is the sum of the squared template (sum_j T_jk^2) and sigma_ij is the error on S_ij (strictly, sigma_S_ij).
The following function (implemented with TensorFlow 2.1) computes the cross-correlation between one template and the spectra of a batch of pixels, together with the error on the cross-correlation function:
@tf.function
def make_xcorr_err1(T, S, sigma_S):
    sum_spectra_sq = tf.reduce_sum(tf.square(S), 1)   # shape (batch,)
    sum_template_sq = tf.reduce_sum(tf.square(T), 0)  # shape (Nz,)
    norm = tf.sqrt(tf.reshape(sum_spectra_sq, (-1, 1)) * tf.reshape(sum_template_sq, (1, -1)))  # shape (batch, Nz)
    xcorr = tf.matmul(S, T) / norm
    # the three terms of the propagated variance sigma_xcorr^2
    foo1 = tf.matmul(sigma_S**2, T**2) / norm**2
    foo2 = xcorr**2 * tf.reshape(sum_template_sq**2, (1, -1)) * tf.reshape(tf.reduce_sum((S * sigma_S)**2, 1), (-1, 1)) / norm**4
    foo3 = -2 * xcorr * tf.reshape(sum_template_sq, (1, -1)) * tf.matmul(S * sigma_S**2, T) / norm**3
    sigma_xcorr = tf.sqrt(tf.maximum(foo1 + foo2 + foo3, 0.))
    return xcorr, sigma_xcorr
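For completeness, here is a hypothetical call to the function above (relying on the return statement added there); the shapes, a batch of 8 pixels with 200 wavelength samples and 50 trial redshifts, are assumptions for illustration only:
import tensorflow as tf

# hypothetical shapes: 8 pixels, 200 wavelength samples, 50 trial redshifts
batch, Nl, Nz = 8, 200, 50
S = tf.random.uniform((batch, Nl))       # spectra, one row per pixel
sigma_S = 0.05 * tf.ones((batch, Nl))    # per-pixel errors on the spectra
T = tf.random.uniform((Nl, Nz))          # template mapped at Nz redshifts

xcorr, sigma_xcorr = make_xcorr_err1(T, S, sigma_S)
print(xcorr.shape, sigma_xcorr.shape)    # (8, 50) (8, 50)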
Maybe, in order to understand my problem, an image of the output is more important than the code. This is the cross-correlation function for a single pixel; the red point is the maximum value, call it z_best, i.e. the best cross-correlated value. The figure also shows the 3-sigma errors (the grey limits are +3 sigma and -3 sigma).
If I zoom in near the peak, I get this:
As you can see, the maximum (like any other value) oscillates within a certain range. I would like to find a way to map these fluctuations of the maximum (or the fluctuations around the maximum, or of the whole function) to an error on the value corresponding to the maximum, i.e. an error on z_best.
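To make the bootstrap idea from the question concrete, here is a minimal sketch in plain NumPy. It assumes z_grid, f_z and sigma_f are 1-D arrays holding the redshift grid, the cross-correlation curve of one pixel and its propagated errors (hypothetical names): it perturbs the curve within its errors many times, takes the argmax each time, and uses the scatter of the resulting redshifts as the error on z_best. Note that it treats the errors as independent between redshift bins, which ignores correlations introduced by the cross-correlation itself.
import numpy as np

def zbest_error(z_grid, f_z, sigma_f, n_draws=1000, seed=None):
    # draw perturbed versions of f(z) within its (assumed Gaussian) errors
    rng = np.random.default_rng(seed)
    draws = rng.normal(loc=f_z, scale=sigma_f, size=(n_draws, f_z.size))
    # argmax of each perturbed curve gives a distribution of z_best
    z_best_draws = z_grid[np.argmax(draws, axis=1)]
    return z_best_draws.mean(), z_best_draws.std()

# hypothetical usage with a fake Gaussian-shaped cross-correlation curve
z_grid = np.linspace(0.0, 1.0, 500)
f_z = np.exp(-0.5 * ((z_grid - 0.42) / 0.05) ** 2)
sigma_f = np.full_like(f_z, 0.05)
z_best_mean, z_best_err = zbest_error(z_grid, f_z, sigma_f)
print(z_best_mean, z_best_err)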

Is there an equivalent to ppp marks for pp3?

I would like to run a mark correlation function on a simple 3D dataset, but it seems like you cannot assign marks with pp3... Am I missing something, or is there another way?
The function pp3 doesn’t accept a marks argument, but you can assign marks to the resulting object:
x <- runif(100)
y <- runif(100)
z <- runif(100)
X <- pp3(x, y, z, c(0, 1), c(0,1), c(0,1))
marks(X) <- rnorm(100)
This uses the method marks<-.ppx. Methods for class ppx apply to class pp3.
However, mark correlation functions are not yet implemented for pp3 objects. The help for markcorr says that the point pattern must be two-dimensional.

Read an undirected graph from a list of edges

I am having problems with reading an undirected graph from a list of edges. I have my list of edges in a txt file like this:
BND IEF 0.943176118
BND LQD 0.885572253
BND TIP 0.83072059
BND TLT 0.897231452
DBC USO 0.885015182
etc.
And then my code is:
G0 = nx.Graph()
G0 = nx.read_edgelist(place_holder + "edges_for_graph.txt", nodetype = str, data = (('weight', int),))
But when I run the code I have this problem:
TypeError: Failed to convert weight data 0.943176118 to type <class 'int'>.
I have tried changing the txt file (with only one space between each value), but it is not working. Does anyone know how to fix it, because the values are ints?
Well, from the data snippet, your weights aren't ints. You can either convert them to ints somehow or store them as floats instead (the second is probably what you want, as the values seem to be between 0 and 1).
For the first option you could preprocess your file to drop the weights or turn them into values of 1. But to just read them correctly, use ('weight', float) instead of ('weight', int), as in the example here. A corrected call is sketched below.
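For reference, a minimal sketch of the corrected call; the place_holder prefix and file name are taken from the question, and the printed edge is one of the lines shown above:
import networkx as nx

place_holder = "./"  # hypothetical directory prefix, as in the question

# read the edge list with float weights -- the third column holds floats, not ints
G0 = nx.read_edgelist(
    place_holder + "edges_for_graph.txt",
    nodetype=str,
    data=(("weight", float),),
)

print(G0["BND"]["IEF"]["weight"])  # 0.943176118, read as a float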

Percentage difference between two strings of different lengths

I have a problem where I am trying to prevent repeats of a string. So far the best solution is to compare the strings for a percentage similarity and check if it is above a certain fixed point.
I've looked up Levenshtein distance, but so far I believe it does not accomplish my goal, since it compares strings of the same length. Both of my strings are more than likely to be of significantly different lengths (stack traces). I'm looking for content or word comparison rather than char-to-char comparison. A percentage answer is the most important part of this.
I assume someone has an algorithm or would be willing to point me in the right direction? Thank you for reading and even more so for helping!
An indirect example... think of them as being stacktraces in py.test form.
I have filepaths and am comparing them
/test/opt/somedir/blah/something
def do_something(self, x):
return x
SomeError: do_something in 'filepath' threw some exception or something
vs
/test/opt/somedir/blah2/somethingelse
def do_another_thing(self, y):
return y
SomeError: do_another_thing in 'different filepath' threw some exception
But also when you have the same filepath, but different errors. The traces are hundreds of lines long, so showing a full example isn't reasonable. This example is as close as I can get without the actual trace.
One way of going at this would be to apply the Jaro-Winkler string similarity metric. Happily, it has a PyPI package.
Let's start off with three strings: your two examples, and the beginning of your question:
s1 = u'''
/test/opt/somedir/blah/something
def do_something(self, x):
return x
SomeError: do_something in 'filepath' threw some exception or something'''
s2 = u'''
/test/opt/somedir/blah2/somethingelse
def do_another_thing(self, y):
return y
SomeError: do_another_thing in 'different filepath' threw some exception'''
q = u'''
I have a problem where I am trying to prevent repeats of a string. So far the best solution is to compare the strings for a percentage and check if it is above a certain fixed point.'''
Then the similarities are:
>>> import jaro
>>> jaro.jaro_metric(s1, s2)
0.8059572665529058
>>> jaro.jaro_metric(s1, q)
0.6562121541167517
However, since you know something about the problem domain (it is a sequence of lines of a stack trace), you could do better by calculating line-by-line similarities, perhaps:
>>> import itertools
>>> [jaro.jaro_metric(l1, l2) for l1, l2 in itertools.izip(s1.split('\n'), s2.split('\n'))]
[1.0,
0.9353471118177001,
0.8402824228911184,
0.9444444444444443,
0.8043725314852076]
So, you need to experiment with this, but you could try, given two stack traces, calculating a "distance" matrix whose i-j entry is the similarity between the i-th line of the first and the j-th line of the second. (This is a bit computationally expensive.) See if there's a threshold on the percentage or number of entries obtaining very high scores, as in the sketch below.
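A minimal sketch of that matrix idea, assuming the jaro package used above and a purely hypothetical per-line threshold of 0.9: build the full line-by-line similarity matrix and score two traces by the fraction of entries above the threshold.
import jaro

def trace_similarity(trace_a, trace_b, line_threshold=0.9):
    # keep only non-empty lines of each stack trace
    lines_a = [l for l in trace_a.splitlines() if l.strip()]
    lines_b = [l for l in trace_b.splitlines() if l.strip()]
    # i-j entry: similarity of the i-th line of the first trace
    # to the j-th line of the second
    matrix = [[jaro.jaro_metric(a, b) for b in lines_b] for a in lines_a]
    # fraction of line pairs that score very high
    high = sum(sim > line_threshold for row in matrix for sim in row)
    return high / (len(lines_a) * len(lines_b))

# hypothetical usage with the s1 and s2 strings from above;
# the 0.25 cutoff is again just an assumption to experiment with
if trace_similarity(s1, s2) > 0.25:
    print("probably duplicates")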
