is it possible to run spatstat functions on multiple processors - spatstat

I am curious to know if spatstat functions like envelope or MAD tests can be run on multiple processors on a machine to speed up calculations? Is there any document or tutorial on how to do this?
Thank you.

Unfortunately, parallelisation is not an integrated part of spatstat,
but rather left to the user. For envelopes and MAD tests the easiest
option is probably to run envelope with a smaller number of
realisations on each core and then combine the results using
pool.envelope. How to run envelope in parallel may depend on your
setup. A simple possibility is to use parallel::mclapply, which I know
works out of the box on Linux, but much better cross-platform
alternatives are surely available in packages on CRAN:
library(spatstat)
ppplist <- replicate(4, cells, simplify = FALSE)
envlist <- parallel::mclapply(ppplist, spatstat::envelope, savefuns = TRUE, nsim = 10)
envfinal <- do.call(pool, envlist)
envfinal
#> Pointwise critical envelopes for K(r)
#> and observed value for 'X[[i]]'
#> Obtained from 40 simulations of CSR
#> Alternative: two.sided
#> Significance level of pointwise Monte Carlo test: 2/41 = 0.0488
#> .....................................................................
#> Math.label Description
#> r r distance argument r
#> obs hat(K)[obs](r) observed value of K(r) for data pattern
#> theo K[theo](r) theoretical value of K(r) for CSR
#> lo hat(K)[lo](r) lower pointwise envelope of K(r) from simulations
#> hi hat(K)[hi](r) upper pointwise envelope of K(r) from simulations
#> .....................................................................
#> Default plot formula: .~r
#> where "." stands for 'obs', 'theo', 'hi', 'lo'
#> Columns 'lo' and 'hi' will be plotted as shading (by default)
#> Recommended range of argument r: [0, 0.25]
#> Available range of argument r: [0, 0.25]
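For the cross-platform alternatives hinted at above, here is a minimal sketch using a PSOCK cluster from the base parallel package; the number of workers and nsim values are purely illustrative:
library(spatstat)
library(parallel)
ppplist <- replicate(4, cells, simplify = FALSE)
cl <- makeCluster(4)                    # PSOCK cluster, also works on Windows
clusterEvalQ(cl, library(spatstat))     # load spatstat on each worker
envlist <- parLapply(cl, ppplist, function(X) envelope(X, savefuns = TRUE, nsim = 10))
stopCluster(cl)
envfinal <- do.call(pool, envlist)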

Related

Is there any "better way" to train a quadruped to walk using reinforcement Learning

I have been chasing the problem of using RL to train a quadruped to walk, but have had no noteworthy success. Following are the details of the Gym env I am using.
Sim: pybullet
env.action_space = box(shape=(12,), upper = 1, lower = -1)
Selected actions are scaled by multiplying them by the max_actions specified for each joint.
The action space is the 3 motor positions (hip_joint_y, hip_joint_x, knee_joint) x 4 legs of the robot.
env.observation_space = box(shape=(12,), upper = np.inf, lower = -np.inf)
The observation space includes:
roll, pitch of the body [r, p]
angular vel [x, y, z]
linear acc [x, y, z]
Binary contact forces for each leg [1 if in contact else 0]. [1, 1, 1, 1]
reward = (
+ distance_reward
- body_rotation_reward
- energy_usage_reward
- body_drift_from x-axis reward
- body_shake_reward)
I have tried the following approaches.
Using PPO from stable-baselines3 for 20 million timesteps [No Distinct improvement]
Using DDPG, TD3, SAC, A2C, and PPO with 5 million timesteps on each algo increasing policy network up to 4 layers of 1024 neurons each [1024, 1024, 1024, 1024] for qf and vf, or actor and critic.
Using the Discrete Delta concept to scale action limits, changing the action_space from Box to MultiDiscrete with each action taking a value from 0 to 6. discrete_delta_vals = [-0.3, -0.1, -0.03, 0, 0.03, 0.1, 0.3]. Each joint value is decided by choosing one value from the discrete_delta_vals list and adding it to the previous action.
Keeping hip_joint_y of all legs at zero and changing the action space from box(shape=(12,)) to box(shape=(8,)). I trained this agent for another 6M timesteps; there seems to be a small improvement at first, then the eps_length and mean_reward settle with no significant improvements afterwards.
I have generated half-ellipsoid trajectories with IK and that works, but that is explicitly a robotics approach to this problem. I am currently looking into DeepMimic to use those trajectories to guide RL towards a stable walking gait. No significant breakthrough.
Here is the Repo Link
Check the scripts folder and go through the start_training_v(x).py scripts. Thanks in advance. If you feel like discussing the entire topic to sort this out, please drop your email in a comment and I'll reach out to you.
Hi, try using Nvidia IsaacGym. It uses PyTorch end to end on the GPU with PPO. I was able to train a custom URDF to walk in about 10 minutes of training.

Statistical tests for two random datasets

I need to compare two data sets that I randomly created in Julia with rand. I want to know if there is some statistical test (that can be performed in Julia/JuMP) that tells me how different the distributions are (making no assumptions about the original distribution).
Why would you want to perform this in JuMP?
This is really a job for the HypothesisTests package:
https://github.com/JuliaStats/HypothesisTests.jl
julia> using HypothesisTests
julia> x, y = rand(100), rand(100);
julia> test = HypothesisTests.ApproximateTwoSampleKSTest(x, y)
Approximate two sample Kolmogorov-Smirnov test
----------------------------------------------
Population details:
parameter of interest: Supremum of CDF differences
value under h_0: 0.0
point estimate: 0.11
Test summary:
outcome with 95% confidence: fail to reject h_0
two-sided p-value: 0.5806
Details:
number of observations: [100,100]
KS-statistic: 0.7778174593052022
julia> pvalue(test)
0.5806177304235198
https://juliastats.org/HypothesisTests.jl/stable/nonparametric/#HypothesisTests.ApproximateTwoSampleKSTest

How to make a dataset similar to the Murchison data in spatstat for ppm and AUC analysis

I have four variables: a point process pattern of species
occurrences, rivers, pond polygons and land image data. I would like
to make a dataset similar to the Murchison dataset using these shape
layers, but I have failed to manage it.
I need to make a data frame from these polygon shape layers of
rivers, ponds and land cover images together with the point pattern
data of species occurrences. I tried using a hyperframe, but I am
unable to use a distance function from the rivers or the ponds.
rivers <- readShapeSpatial("river.shp")
ponds <- readShapeSpatial("pond.shp")
fro <- read.table("fro.txt", header = TRUE)
image <- raster("image.tif")
I would like to combine these four files into a single spatstat
object like the Murchison data which comes with the spatstat package.
If I can put them in a frame, then ponds, land cover and rivers are
covariates.
I have used the analyst function but it returns errors that they
cannot be used as covariates, for example "x is a list and cannot be
used as a covariate", particularly for ponds and rivers when I call
the dist function.
Why do you need a hyperframe? You refer to the murchison data and that is not
a hyperframe. It is simply a standard R list (with extended classes
listof, anylist and solist for better printing and plotting in
spatstat, but the actual data structure is just a plain list).
To recreate the murchison data:
library(spatstat)
P <- murchison$gold # Points
L <- murchison$faults # Lines
W <- murchison$greenstone # Windows
mur <- solist(points = P, lines = L, windows = W)
mur
#> List of spatial objects
#>
#> points:
#> Planar point pattern: 255 points
#> window: rectangle = [352782.9, 682589.6] x [6699742, 7101484] metres
#>
#> lines:
#> planar line segment pattern: 3252 line segments
#> window: rectangle = [352782.9, 682589.6] x [6699742, 7101484] metres
#>
#> windows:
#> window: polygonal boundary
#> enclosing rectangle: [352782.9, 681699.6] x [6706467, 7100804] metres
To use the data in a model they don’t have to be collected in a single list,
but it may be convenient. The following two models are identical:
(mod1 <- ppm(P ~ W))
#> Nonstationary Poisson process
#>
#> Log intensity: ~W
#>
#> Fitted trend coefficients:
#> (Intercept) WTRUE
#> -21.918688 3.980409
#>
#> Estimate S.E. CI95.lo CI95.hi Ztest Zval
#> (Intercept) -21.918688 0.1666667 -22.24535 -21.592028 *** -131.51213
#> WTRUE 3.980409 0.1798443 3.62792 4.332897 *** 22.13252
(mod2 <- ppm(points ~ windows, data = mur))
#> Nonstationary Poisson process
#>
#> Log intensity: ~windows
#>
#> Fitted trend coefficients:
#> (Intercept) windowsTRUE
#> -21.918688 3.980409
#>
#> Estimate S.E. CI95.lo CI95.hi Ztest Zval
#> (Intercept) -21.918688 0.1666667 -22.24535 -21.592028 *** -131.51213
#> windowsTRUE 3.980409 0.1798443 3.62792 4.332897 *** 22.13252
If you insist on a hyperframe you should have a column for each measured
variable, but hyperframes are primarily used when you have several replications
of an experiment, and they are not of much use here. The function call is simply:
murhyp <- hyperframe(points = P, lines = L, windows = W)
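Since the question also asks about distance covariates (distance to rivers or ponds), a minimal sketch along the same lines, using the murchison objects defined above purely for illustration: distfun() converts a line segment pattern into a function of location that ppm() can use directly as a covariate.
D <- distfun(L)         # distance to the nearest fault line, as a spatial covariate
mod3 <- ppm(P ~ D + W)  # intensity depends on distance to lines and on the window covariate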

How to convert intensities to Probabilities in a point pattern using Spatstat in R?

I have two points pattern (ppp) objects p1 and p2. There are X and Y points in p1 and p2 respectively. I have fitted a ppm model (with location coordinates as independent variables) in p1 and then used it to predict "intensity" for each of the Y points in p2.
Now I want to get the probability of event occurrence at each point/zone in p2. How can I use the predicted intensities for this purpose?
Can I do this using spatstat?
Are there any other alternatives?
The intensity is the expected number of points per unit area. In small areas (such as pixels) you can just multiply the intensity by the pixel area to get the probability of presence of a point in the pixel.
fit <- ppm(p1, .......)
inten <- predict(fit)
pixarea <- with(inten, xstep * ystep)
prob <- inten * pixarea
This rule is accurate provided the prob values are smaller than about 0.4.
In a larger region W, the expected number of points is the integral of the intensity function over that region:
EW <- integral(inten, domain = W)
The result EW is a numeric value, the expected total number of points in W. To get the probability of at least one point,
P <- 1- exp(-EW)
You can also compute prediction intervals for the number of points, using predict.ppm with argument interval="prediction".
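A minimal sketch of that last point, assuming the fitted model fit and a window W as above; type = "count" asks for the number of points in W rather than the intensity surface:
predict(fit, window = W, type = "count", interval = "prediction")  # 95% prediction interval by default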
Your question, objective and current method are not very clear to me. It
would be beneficial if you could provide code and graphics that explain
more clearly what you have done and what you are trying to obtain. If you
cannot share your data you can use e.g. the built-in dataset chorley as an
example (or simply simulate artificial data):
library(spatstat)
plot(chorley, cols = c(rgb(0,0,0,1), rgb(.8,0,0,.2)))
X <- split(chorley)
X1 <- X$lung
X2 <- X$larynx
mod <- ppm(X1 ~ polynom(x, y, 2))
inten <- predict(mod)
summary(inten)
#> real-valued pixel image
#> 128 x 128 pixel array (ny, nx)
#> enclosing rectangle: [343.45, 366.45] x [410.41, 431.79] km
#> dimensions of each pixel: 0.18 x 0.1670312 km
#> Image is defined on a subset of the rectangular grid
#> Subset area = 315.291058349571 square km
#> Subset area fraction = 0.641
#> Pixel values (inside window):
#> range = [0.002812544, 11.11172]
#> integral = 978.5737
#> mean = 3.103715
plot(inten)
Predicted intensities at the 58 locations in X2:
intenX2 <- predict.ppm(mod, locations = X2)
summary(intenX2)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.1372 4.0025 6.0544 6.1012 8.6977 11.0375
These predicted intensities intenX2[i] say that in a small neighbourhood
around each point X2[i] the estimated number of points from X1 is Poisson
distributed with mean intenX2[i] times the area of the small neighbourhood.
So in fact you have estimated a model where in any small area you have a
probability distribution for any number of points happening in that area. If
you want the distribution in a bigger region you just have to integrate the
intensity over that region.
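As a short sketch of that last point, continuing the chorley example above (the region R is made up purely for illustration): integrating the fitted intensity over a region gives the Poisson mean for the number of points there.
R <- disc(radius = 2, centre = c(355, 420))  # hypothetical region of interest (km)
mu <- integral(inten, domain = R)            # expected number of X1 points in R
dpois(0:3, lambda = mu)                      # P(exactly 0, 1, 2, 3 points in R)
1 - dpois(0, lambda = mu)                    # P(at least one point in R)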
To get a better answer you have to provide more details about your problem.
Created on 2018-12-12 by the reprex package (v0.2.1)

Ways to calculate similarity

I am doing a community website that requires me to calculate the similarity between any two users. Each user is described with the following attributes:
age, skin type (oily, dry), hair type (long, short, medium), lifestyle (active outdoor lover, TV junky) and others.
Can anyone tell me how to go about this problem or point me to some resources?
Another way is to compute (in R) all the pairwise dissimilarities (distances) between observations in the data set. The original variables may be of mixed types. The handling of nominal, ordinal, and (a)symmetric binary data is achieved by using the general dissimilarity coefficient of Gower (Gower, J. C. (1971) A general coefficient of similarity and some of its properties, Biometrics 27, 857–874). For more, check out this on page 47. If x contains any columns of these data types, Gower's coefficient will be used as the metric.
For example
x1 <- factor(c(10, 12, 25, 14, 29))
x2 <- factor(c("oily", "dry", "dry", "dry", "oily"))
x3 <- factor(c("medium", "short", "medium", "medium", "long"))
x4 <- factor(c("active outdoor lover", "TV junky", "TV junky", "active outdoor lover", "TV junky"))
x <- cbind(x1,x2,x3,x4)
library(cluster)
daisy(x, metric = "euclidean")
you'll get :
Dissimilarities :
1 2 3 4
2 2.000000
3 3.316625 2.236068
4 2.236068 1.732051 1.414214
5 4.242641 3.741657 1.732051 2.645751
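To actually use Gower's coefficient, as the text above suggests, daisy() needs a data frame with proper variable types rather than a matrix of factor codes; a sketch with the same illustrative attributes:
library(cluster)
users <- data.frame(
  age  = c(10, 12, 25, 14, 29),   # numeric, not a factor
  skin = factor(c("oily", "dry", "dry", "dry", "oily")),
  hair = factor(c("medium", "short", "medium", "medium", "long"),
                levels = c("short", "medium", "long"), ordered = TRUE),
  lifestyle = factor(c("active outdoor lover", "TV junky", "TV junky",
                       "active outdoor lover", "TV junky"))
)
daisy(users, metric = "gower")    # Gower dissimilarities, all in [0, 1]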
If you are interested on a method for dimensionality reduction for categorical data (also a way to arrange variables into homogeneous clusters) check this
Give each attribute an appropriate weight, and add the differences between values.
enum SkinType
    Dry, Medium, Oily
enum HairLength
    Bald, Short, Medium, Long
UserDifference(user1, user2)
    total := 0
    total += abs(user1.Age - user2.Age) * 0.1
    total += abs((int)user1.Skin - (int)user2.Skin) * 0.5
    total += abs((int)user1.Hair - (int)user2.Hair) * 0.8
    # etc...
    return total
If you really need a similarity instead of a difference, use 1 / (1 + UserDifference(a, b)) so that identical users do not cause a division by zero.
You should probably take a look at:
Data Mining and Data Warehousing (Essential)
Machine Learning (Extra)
Artificial Neural Networks (Especially SOM)
Pattern Recognition (Related)
These topics will let your program recognize similarities and clusters in your user collection and adapt to them...
You can then discover hidden common groups of related users (e.g. users with green hair usually do not like watching TV).
As a piece of advice, try to use ready-made tools for this feature instead of implementing it yourself...
Take a look at Open Directory Data Mining Projects
Three steps to achieve a simple subjective metric for difference between two datapoints that might work fine in your case:
Capture all your variables in a representative numeric variable, for example: skin type (oily=-1, dry=1), hair type (long=2, short=0, medium=1),lifestyle (active outdoor lover=1, TV junky=-1), age is a number.
Scale all numeric ranges so that they fit the relative importance you give them for indicating difference. For example: an age difference of 10 years is about as different as the difference between long and medium hair, and the difference between oily and dry skin. So 10 on the age scale is as different as 1 on the hair scale is as different as 2 on the skin scale, so scale the difference in age by 0.1, that in hair by 1 and that in skin by 0.5.
Use an appropriate distance metric to combine the differences between two people on the various scales into one overall difference. The smaller this number, the more similar they are. I'd suggest a simple quadratic difference as a first attempt at your distance function.
Then the difference between two people could be calculated with (I assume Person.age, .skin, .hair, etc. have already gone through step 1 and are numeric):
double Difference(Person p1, Person p2) {
    double agescale = 0.1;
    double skinscale = 0.5;
    double hairscale = 1;
    double lifestylescale = 1;
    double agediff = (p1.age - p2.age) * agescale;
    double skindiff = (p1.skin - p2.skin) * skinscale;
    double hairdiff = (p1.hair - p2.hair) * hairscale;
    double lifestylediff = (p1.lifestyle - p2.lifestyle) * lifestylescale;
    double diff = sqrt(agediff*agediff + skindiff*skindiff + hairdiff*hairdiff + lifestylediff*lifestylediff);
    return diff;
}
Note that diff in this example is not on a nice scale like (0..1). Its value can range from 0 (no difference) to something large (high difference). Also, this method is almost completely unscientific; it is just designed to quickly give you a working difference metric.
Look at algorithms for computing string difference (e.g. Hamming or edit distance). It's very similar to what you need. Store your attributes as a bit string and compute the distance between the strings.
You should read up on these two topics:
The most popular clustering algorithm, k-means
Similarity matrices, which are essential in clustering
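As a rough sketch of how those pieces fit together (attribute names and codings are made up, following the numeric encoding suggested earlier in this thread): scale the numeric attributes and let k-means group the users.
users <- data.frame(
  age  = c(20, 25, 40, 31, 58),
  skin = c(-1, 1, 1, -1, 1),     # oily = -1, dry = 1
  hair = c(1, 0, 2, 1, 0)        # short = 0, medium = 1, long = 2
)
km <- kmeans(scale(users), centers = 2)
km$cluster                        # cluster membership for each user
# Alternatively, a dissimilarity matrix (e.g. from daisy) can feed hclust() for hierarchical clustering.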
