I am writing a ray tracer and I wish to fire rays from a point p into a hemisphere above that point according to some distribution.
1) I have derived a method to uniformly sample within a solid angle (defined by theta) above p:
phi = 2*pi*X_1
alpha = arccos (1-(1-cos(theta))*X_2)
x = sin(alpha)*cos(phi)
y = sin(alpha)*sin(phi)
z = -cos(alpha)
Where X_1 and X_2 are independent uniform random numbers in [0, 1].
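For reference, the formulas above can be sketched in Python as follows. This is only an illustration using the standard library; the function name sample_cone and the rng parameter are made up for this sketch, and z keeps the sign convention used above.

```python
import math
import random

def sample_cone(theta, rng=random):
    """Return a unit vector sampled uniformly (in solid angle) within the
    cone of half-angle theta, using z = -cos(alpha) as in the post."""
    x1 = rng.random()
    x2 = rng.random()
    phi = 2.0 * math.pi * x1
    # cos(alpha) is uniform on (cos(theta), 1], which is uniform in solid angle
    cos_alpha = 1.0 - (1.0 - math.cos(theta)) * x2
    sin_alpha = math.sqrt(max(0.0, 1.0 - cos_alpha * cos_alpha))
    x = sin_alpha * math.cos(phi)
    y = sin_alpha * math.sin(phi)
    z = -cos_alpha
    return (x, y, z)
```

Working with cos(alpha) directly avoids the arccos followed by sin/cos, but gives the same distribution as the formulas above.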
That works and I'm pretty happy with that. But my question is: what happens if I do not want a uniform distribution?
I have used the algorithm on page 27 from here and I can draw samples from a piecewise arbitrary distribution. However if I simply say:
alpha = arccos (1-(1-cos(theta))*B_1)
Where B_1 is a random number generated from an arbitrary distribution.
It doesn't behave nicely...What am I doing wrong? Thanks in advance. I really really need help on this
Additional:
Perhaps I am asking a leading question. Taking a step back:
Is there a way to generate points on a hemisphere according to an arbitrary distribution? I have a method for uniformly sampling a hemisphere and one for cosine-weighted hemisphere sampling (pg 663-669, pbrt.org).
With a uniform distribution, you can just average the sample results and obtain the correct result (up to a constant factor). This is equivalent to dividing each sample result by the sample's Probability Density Function (PDF) and then averaging: for a uniform distribution the PDF is the same constant for every sample (1/(2*pi) over the full hemisphere), so dividing by it only rescales the average.
With an arbitrary distribution, you still have to divide each sample result by its PDF, but the PDF now depends on the arbitrary distribution you are using and varies from sample to sample. I assume your error is here.
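A minimal Python sketch of this idea, assuming an integrand that depends only on the polar angle (here f = cos(theta), whose integral over the hemisphere is pi). The function names are invented for the illustration; the point is only that each sample is divided by the PDF of the distribution it was drawn from.

```python
import math
import random

def sample_uniform(rng):
    """Uniform direction on the hemisphere; returns (cos_theta, pdf)."""
    cos_theta = rng.random()                    # cos(theta) is uniform on [0, 1)
    return cos_theta, 1.0 / (2.0 * math.pi)     # constant PDF over solid angle

def sample_cosine(rng):
    """Cosine-weighted direction; returns (cos_theta, pdf)."""
    cos_theta = math.sqrt(1.0 - rng.random())   # in (0, 1]
    return cos_theta, cos_theta / math.pi       # PDF varies with the sample

def estimate(f, sampler, n, rng):
    """Monte Carlo estimate: average of f(sample) / pdf(sample)."""
    total = 0.0
    for _ in range(n):
        cos_theta, pdf = sampler(rng)
        total += f(cos_theta) / pdf
    return total / n

rng = random.Random(1)
uniform_est = estimate(lambda c: c, sample_uniform, 20000, rng)
cosine_est = estimate(lambda c: c, sample_cosine, 20000, rng)
```

Both estimates converge to pi. If the division by the PDF is dropped for the cosine-weighted sampler, the result is biased, which is the kind of error suspected above.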
Apologies for the overlap with existing questions; mine is at a more basic skill level. I am working with very sparse occurrences spanning very large areas, so I would like to calculate probability at pixels using the density.ppp function (as opposed to relrisk.ppp, where specifying presences+absences would be computationally intractable). Is there a straightforward way to convert density (intensity) to probabilities at each point?
Maxdist=50
dtruncauchy=function(x,L=60) L/(diff(atan(c(-1,1)*Maxdist/L)) * (L^2 + x^2))
dispersfun=function(x,y) dtruncauchy(sqrt(x^2+y^2))
n=1e3; PPP=ppp(1:n,1:n, c(1,n),c(1,n), marks=rep(1,n));
density.ppp(PPP,cutoff=Maxdist,kernel=dispersfun,at="points",leaveoneout=FALSE) #convert to probabilities?
Thank you!!
I think there is a misunderstanding about fundamentals. The spatstat package is designed mainly for analysing "mapped point patterns", datasets which record the locations where events occurred or things were located. It is designed for "presence-only" data, not "presence/absence" data (with some exceptions).
The relrisk function expects input data about the presence of two different types of events, such as the mapped locations of trees belonging to two different species, and then estimates the spatially-varying probability that a tree will belong to each species.
If you have 'presence-only' data stored in a point pattern object X of class "ppp", then density(X, ....) will produce a pixel image of the spatially-varying intensity (expected number of points per unit area). For example if the spatial coordinates were expressed in metres, then the intensity values are "points per square metre". If you want to calculate the probability of presence in each pixel (i.e. for each pixel, the probability that there is at least one presence point in the pixel), you just need to multiply the intensity value by the area of one pixel, which gives the expected number of points in the pixel. If pixels are small (the usual case) then the presence probability is just equal to this value. For physically larger pixels the probability is 1 - exp(-m) where m is the expected number of points.
Example:
X <- redwood
D <- density(X, 0.2)
pixarea <- with(D, xstep * ystep)
M <- pixarea * D
p <- 1 - exp(-M)
Then M and p are images which should be almost equal, and both can be interpreted as the probability of presence.
For more information see Chapter 6 of the spatstat book.
If, instead, you had a pixel image of presence/absence data, with pixel values equal to 1 or 0 for presence or absence respectively, then you can just use the function blur in the spatstat package to perform kernel smoothing of the image, and the resulting pixel values are presence probabilities.
I am trying to implement a gradient descent method to fit an EMG as shown in this paper. The paper describes two versions of the EMG equation:
(1)
$$F(t)= \frac{h\cdot\sigma}{\tau}\sqrt{\frac{\pi}{2}}\, e^{\left(\frac{\mu-t}{\tau}+\frac{\sigma^2}{2\tau^2}\right)}\cdot \operatorname{erfc}\left(\frac{1}{\sqrt{2}}\left(\frac{\mu-t}{\sigma}+\frac{\sigma}{\tau}\right)\right)$$
and
(2)
$$F(t)= h\cdot e^{\frac{-(\mu-t)^2}{2\sigma^2}}\cdot \frac{\sigma}{\tau}\sqrt{\frac{\pi}{2}} \cdot \operatorname{erfcx}\left(\frac{1}{\sqrt{2}}\left(\frac{\mu-t}{\sigma}+\frac{\sigma}{\tau}\right)\right)$$
The paper defines z as:
$$z = \frac{1}{\sqrt{2}}\left(\frac{\mu-t}{\sigma}+\frac{\sigma}{\tau}\right)$$
If z is negative equation (1) is used, otherwise equation (2) is used to prevent the function from blowing up. I'm implementing a batch gradient algorithm following the outline from this site, knowing my objective function is a bit different. I'm setting my theta as the following:
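A minimal Python sketch of the switch between equations (1) and (2), using only the standard library. The erfcx helper, its cutoff of 25, and the truncated asymptotic series are assumptions made for this illustration; a library routine such as scipy.special.erfcx would normally be the better choice.

```python
import math

def erfcx(z):
    """Scaled complementary error function exp(z^2) * erfc(z), for z >= 0."""
    if z < 25.0:
        return math.exp(z * z) * math.erfc(z)
    # Leading terms of the asymptotic series, so exp(z^2) never overflows
    inv = 1.0 / (z * z)
    return (1.0 - 0.5 * inv + 0.75 * inv * inv) / (z * math.sqrt(math.pi))

def emg(t, mu, h, sigma, tau):
    """Exponentially modified Gaussian, choosing the form by the sign of z."""
    z = ((mu - t) / sigma + sigma / tau) / math.sqrt(2.0)
    scale = h * sigma / tau * math.sqrt(math.pi / 2.0)
    if z < 0.0:
        # Equation (1): the exponent is negative whenever z < 0
        expo = (mu - t) / tau + sigma ** 2 / (2.0 * tau ** 2)
        return scale * math.exp(expo) * math.erfc(z)
    # Equation (2): Gaussian factor times erfcx, safe for large positive z
    gauss = math.exp(-((mu - t) ** 2) / (2.0 * sigma ** 2))
    return scale * gauss * erfcx(z)
```

For example, emg(10.0, 10.0, 1.0, 1.0, 0.01) stays finite and is close to h, whereas evaluating equation (1) directly with those values would require exp(5000) and overflow.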
$$\theta = [\mu, h, \sigma, \tau]$$
So my gradient update formula is the following:
$$\theta_j^{k+1} = \theta_j^{k} - \frac{1}{m}\alpha\sum_{i=1}^{m} (F(t_i) - y_i)\frac{\partial F(t_i)}{\partial \theta_j^{k}}$$
I've used Wolfram Alpha to determine all the partial derivatives. My total samples are >4000, so I'm using a batch of around 300 samples to speed up the process. I've found that I have to adjust the initial parameters very slightly to get the best results, otherwise the gradient will just blow up.
The paper also discusses finding the time coordinate of the EMG peak, though I'm not sure how that is useful in the gradient descent algorithm. It is found with the following:
$$t_0 = \mu + y \cdot \sigma \cdot \sqrt{2}-\frac{\sigma^2}{\tau}$$
where y is:
$$\operatorname{erfcx}(y) = \frac{\tau}{\sigma}\sqrt{\frac{2}{\pi}}$$
So my main questions are:
Am I setting up the gradient descent incorrectly for a non-linear equation?
How does the determination of the time coordinate help with the algorithm?
Please let me know if there is any additional information I can provide. The algorithm itself is each and I didn't want to clog up the post with my function definitions of each gradient.
I am working on a simple AI program that classifies shapes using an unsupervised learning method. Essentially, I use the number of sides and the angles between the sides to generate aggregate percentages relative to an ideal value for each shape. This helps me create some fuzziness in the result.
The problem is: how do I represent the degree of error or confidence in the classification? For example, a small rectangle that looks very much like a square would yield high membership values for both categories, but how can I represent the degree of error?
Thanks
Your confidence depends on the model used. For example, if you are simply applying some rules based on the number of angles (or sides), you have some multi-dimensional representation of objects:
feature 0, feature 1, ..., feature m
Nice, statistical approach
You can define some kind of confidence interval based on your empirical results. For example, you can fit a multi-dimensional Gaussian distribution to your empirical observations of "rectangle objects", and once you get a new object you simply check the probability of such a value under your Gaussian distribution and take that as your confidence (which would be quite well justified under the assumption that your "observation" errors are normally distributed).
Distance based, simple approach
A less statistical approach would be to take your model's decision factor directly and compress it to the [0,1] interval. For example, if you simply measure the distance from some perfect shape to your new object in some metric (which yields results in [0,inf)), you could map it using some sigmoid-like function, e.g.
conf( object, perfect_shape ) = 1 - tanh( distance( object, perfect_shape ) )
The hyperbolic tangent will "squash" values from [0,inf) into the [0,1] interval, and the only remaining thing to do is to select a scaling factor for the distance (as tanh saturates quite quickly).
Such an approach would be less valid in mathematical terms, but would be similar to the approach taken in neural networks.
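As a rough Python sketch of the distance-based idea: the feature vectors, the Euclidean metric, and the scale parameter below are all assumptions chosen for illustration, not part of any particular model.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def confidence(obj, perfect_shape, scale=1.0):
    """Map a distance in [0, inf) to a confidence in [0, 1].

    scale is a hypothetical tuning factor: smaller values make the
    confidence fall off more slowly with distance.
    """
    return 1.0 - math.tanh(scale * euclidean(obj, perfect_shape))
```

An object identical to the perfect shape gets confidence 1.0, and the value decreases monotonically towards 0 as the distance grows.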
Relative approach
A more probabilistic approach can also be defined using your distance metric. If you have the distances to each of your "perfect shapes", you can calculate the probability of an object being classified as some class under the assumption that classification is performed at random, with probability proportional to the inverse of the distance to the perfect shape.
dist(object, perfect_shape1) = d_1
dist(object, perfect_shape2) = d_2
dist(object, perfect_shape3) = d_3
...
conf(object, class_i) = inv( d_i ) / sum_j inv( d_j )
where
inv( d_i ) = max( d_j ) - d_i
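A small Python sketch of this relative confidence, taking inv(d_i) = max(d_j) - d_i literally. The uniform fallback when all distances are equal is an assumption added here to avoid dividing by zero.

```python
def relative_confidence(distances):
    """Confidence per class from distances to each perfect shape."""
    d_max = max(distances)
    inv = [d_max - d for d in distances]
    total = sum(inv)
    if total == 0.0:
        # All distances equal (assumption): no class is preferred
        return [1.0 / len(distances)] * len(distances)
    return [v / total for v in inv]

# Distances to three hypothetical perfect shapes
conf = relative_confidence([1.0, 1.0, 3.0])   # [0.5, 0.5, 0.0]
```

Note that with this definition the farthest class always gets confidence 0, so with only two classes the nearer one gets 1.0 however close the two distances are. The measure is only informative with three or more classes.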
Conclusions
The first two ideas can also be incorporated into the third one to make use of knowledge of all the classes. In your particular example, the third approach should result in a confidence of around 0.5 for both rectangle and square, while the first approach would give something closer to 0.01 (depending on how many such small objects you have in the "training" set). This shows the difference: the first two approaches show your confidence in classifying an object as a particular shape by itself, while the third one shows relative confidence (so it can be low iff it is high for some other class, while the first two can simply answer "no classification is confident").
Building slightly on what lejlot has put forward, my preference would be to use the Mahalanobis distance with some squashing function. The Mahalanobis distance M(V, p) allows you to measure the distance between a distribution V and a point p.
In your case, I would use "perfect" examples of each class to generate the distribution V, and p is the object whose classification you want the confidence of. You can then use something along the lines of the following as your confidence.
1-tanh( M(V, p) )
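A hedged Python sketch for the two-feature case, using only the standard library. It assumes each example is a pair of features and estimates V by the sample mean and sample covariance of the "perfect" examples; the function names are invented for the illustration.

```python
import math

def mahalanobis_2d(samples, p):
    """Mahalanobis distance from point p to the distribution of 2-D samples."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0.0:
        raise ValueError("covariance matrix is singular")
    dx, dy = p[0] - mx, p[1] - my
    # d^T S^-1 d with the 2x2 inverse written out explicitly
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(q, 0.0))

def mahalanobis_confidence(samples, p):
    """Squash the distance into [0, 1] as suggested above."""
    return 1.0 - math.tanh(mahalanobis_2d(samples, p))
```

With more than two features the same computation needs a general matrix inverse, for which a linear algebra library would be the natural choice.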
I apologise for the newbishness of this question in advance but I am stuck. I am trying to solve this question,
I can do parts i)-iv) but I am stuck on v). I know that to calculate the margin y, you do
y=2/||W||
and I know that W is the normal to the hyperplane, I just don't know how to calculate it. Is this always
W=[1;1] ?
Similarly, the bias, W^T * x + b = 0
how do I find the value x from the data points? Thank you for your help.
Consider building an SVM over the (very small) data set shown in the picture. For an example like this, the maximum margin weight vector will be parallel to the shortest line connecting points of the two classes, that is, the line between and , giving a weight vector of . The optimal decision surface is orthogonal to that line and intersects it at the halfway point. Therefore, it passes through . So, the SVM decision boundary is:
Working algebraically, with the standard constraint that , we seek to minimize . This happens when this constraint is satisfied with equality by the two support vectors. Further we know that the solution is for some . So we have that:
Therefore a=2/5 and b=-11/5, and . So the optimal hyperplane is given by
and b= -11/5 .
The margin boundary is
This answer can be confirmed geometrically by examining picture.
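The algebra can be checked with a short Python sketch. The data points did not survive in the text above, so the two support vectors used here, (1, 1) with label -1 and (2, 3) with label +1, are an assumption: they were picked only because they reproduce a = 2/5 and b = -11/5. The closed form below is valid only when the closest pair of opposite-class points are the only support vectors.

```python
import math

def two_point_svm(x_neg, x_pos):
    """Max-margin hyperplane w.x + b = 0 for exactly two support vectors.

    Solves w.x_pos + b = 1 and w.x_neg + b = -1 with w parallel to
    (x_pos - x_neg).
    """
    diff = [p - n for p, n in zip(x_pos, x_neg)]
    norm_sq = sum(d * d for d in diff)
    w = [2.0 * d / norm_sq for d in diff]
    b = -sum(wi * (p + n) for wi, p, n in zip(w, x_pos, x_neg)) / 2.0
    margin = 2.0 / math.sqrt(sum(wi * wi for wi in w))
    return w, b, margin

# Hypothetical support vectors (assumed, see above)
w, b, margin = two_point_svm((1.0, 1.0), (2.0, 3.0))
```

Under these assumed points, w is approximately (2/5, 4/5), b is approximately -11/5, and the margin 2/||w|| is sqrt(5). So w is not [1;1] in general; it depends on the support vectors.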
My original question
I read that to convert a RGB pixel into greyscale RGB, one should use
r_new = g_new = b_new = r_old * 0.3 + g_old * 0.59 + b_old * 0.11
I also read, and understand, that g has a higher weighting because the human eye is more sensitive to green. Implementing that, I saw the results were the same as I would get from setting an image to 'greyscale' in an image editor like the Gimp.
Before I read this, I imagined that to convert a pixel to greyscale, one would convert it to HSL or HSV, then set the saturation to zero (hence, removing all colour). However, when I did this, I got a quite different image output, even though it also lacked colour.
How does s = 0 exactly differ from the 'correct' way I read, and why is it 'incorrect'?
Ongoing findings based on answers and other research
It appears that which luminance coefficients to use is the subject of some debate. Various combinations and to-greyscale algorithms have different results. The following are some presets used in areas like TV standards:
the coefficients defined by ITU-R BT.601 (NTSC?) are 0.299r + 0.587g + 0.114b
the coefficients defined by ITU-R BT.709 (newer) are 0.2126r + 0.7152g + 0.0722b
the coefficients of equal thirds, (1/3)(rgb), is equivalent to s = 0
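To compare these presets side by side, here is a minimal Python sketch. It applies the weights directly to the stored channel values and ignores gamma; the names BT601, BT709, EQUAL and to_grey are just labels chosen for this example.

```python
BT601 = (0.299, 0.587, 0.114)
BT709 = (0.2126, 0.7152, 0.0722)
EQUAL = (1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0)

def to_grey(rgb, weights):
    """Weighted sum of the channels; use the result for r_new = g_new = b_new."""
    r, g, b = rgb
    wr, wg, wb = weights
    return wr * r + wg * g + wb * b

pure_green = (0, 255, 0)
grey_601 = to_grey(pure_green, BT601)   # about 149.7
grey_709 = to_grey(pure_green, BT709)   # about 182.4
grey_avg = to_grey(pure_green, EQUAL)   # 85.0
```

A saturated green pixel shows the difference clearly: the equal-thirds average renders it much darker than either of the luminance-based weightings.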
This scientific article details various greyscale techniques and their results for various images, plus a subjective survey of 119 people.
However, when converting an image to greyscale, to achieve the 'best' artistic effect, one will almost certainly not be using these predefined coefficients, but tweaking the contribution from each channel to produce the best output for the particular image.
Although these transformation coefficients exist, nothing binds you to using them. As long as the total intensity of each pixel is unchanged, the contributions from each channel can be anything, ranging from 0 to 100%.
Photographers converting images to grayscale use channel mixers to adjust levels of each channel (RGB or CMYK). In your image, there are many reds and greens, so it might be desirable (depending on your intent) to have those channels more highly represented in the gray level intensity than the blue.
This is what distinguishes "scientific" transformation of the image from an "artistic" combination of the bands.
An additional consideration is the dynamic range of values in each band, and attempting to preserve them in the grayscale image. Boosting shadows and/or highlights might require increasing the contribution of the blue band, for example.
An interesting article on the topic here.... "because human eyes don't detect brightness linearly with color".
http://www.scantips.com/lumin.html
Looks like these coefficients come from old CRT technology and are not well adapted to today's monitors, from the Color FAQ:
The coefficients 0.299, 0.587 and 0.114 properly computed luminance for monitors having phosphors that were contemporary at the introduction of NTSC television in 1953. They are still appropriate for computing video luma to be discussed below in section 11. However, these coefficients do not accurately compute luminance for contemporary monitors.
Couldn't find the right conversion coefficient, however.
See also RGB to monochrome conversion
Using s = 0 in HSL/HSV and converting to RGB results in R = G = B, so is the same as doing r_old * 1/3 + g_old * 1/3 + b_old * 1/3.
To understand why, have a look at the Wikipedia page that describes conversion HSV->RGB. Saturation s will be 0, so C and X will be, too. You'll end up with R_1,G_1,B_1 being (0,0,0) and then add m to the final RGB values which results in (m,m,m) = (V,V,V). Same for HSL, result will be (m,m,m) = (L,L,L).
EDIT: OK, just figured out the above is not the complete answer, although it's a good starting point. RGB values will be all the same, either L or V, but it still depends on how L and V were originally calculated, again, see Wikipedia. Seems the program/formulas you've used for converting used the 1/3 * R + 1/3 * G + 1/3 * B solution or one of the other two (hexcone/bi-hexcone).
So after all, using HSL/HSV just means you'll have to decide which formula to use earlier and conversion to RGB grayscale values later is just isolating the last component.
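This can be tried out with Python's standard colorsys module, which works on channel values in [0, 1]. The sketch below assumes colorsys's definitions (lightness L = (max + min) / 2, value V = max); other tools may define these components differently.

```python
import colorsys

def desaturate_hsl(r, g, b):
    """Set saturation to 0 in HSL and convert back to RGB."""
    h, l, s = colorsys.rgb_to_hls(r, g, b)   # note the H, L, S order
    return colorsys.hls_to_rgb(h, l, 0.0)

def desaturate_hsv(r, g, b):
    """Set saturation to 0 in HSV and convert back to RGB."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return colorsys.hsv_to_rgb(h, 0.0, v)

red_hsl = desaturate_hsl(1.0, 0.0, 0.0)   # (0.5, 0.5, 0.5)
red_hsv = desaturate_hsv(1.0, 0.0, 0.0)   # (1.0, 1.0, 1.0)
```

For pure red, HSL gives mid grey, HSV gives white, the equal-thirds average would give 1/3, and the 0.3/0.59/0.11 weighting gives 0.3. So "s = 0" is not a single conversion; the result depends on how the last component is defined.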