How to calculate added path length (APL) image segmentation metric? - geometry

Background
I'm trying to calculate the added path length (APL) metric used in semantic segmentation for radiotherapy treatment planning. The metric originated in this paper, but I can't find any explanation of how to calculate it, just the following figure that indicates shape A (black) and the surface edits (dotted yellow lines) required to create shape B:
(Figure: added path length)
Currently I'm calculating this metric by summing all the surface pixels that are in shape B but not in shape A (similar to this) and multiplying by pixel width (assuming isotropic pixels) to obtain a value in mm. I've also added a tolerance parameter that allows some deviation between the surfaces of shape A and B before considering the surface as "edited".
Questions
Any good references for how the original authors calculated this metric?
Any thoughts on going from voxel to mm version of this metric?

I had the same problem, and implemented a version of the APL metric in platipy. The code is available here. This is open-source Python code.
$ pip install platipy
(Note - you may have to update pip with pip install -U pip if you get errors).
from platipy.imaging.label.comparison import compute_metric_total_apl
compute_metric_total_apl(label_A, label_B, distance_threshold=3)
This will return the (total) APL in millimetres. You may also find the function compute_metric_mean_apl useful, which computes the slice-wise averaged APL.
You may notice the added variable distance_threshold. If the two contours are closer than this distance, they are considered identical. It is used to make the APL more sensitive to true differences (i.e. ignore negligible, voxel-scale variations). Just set it to zero to get the APL as per the original definition.
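To make the idea concrete without relying on a library, here is a minimal 2D sketch of the approach described in the question: count the surface pixels of B that lie farther than a tolerance from the surface of A, then multiply by the pixel width. This is not platipy's implementation and not necessarily the original authors' definition. The names boundary and added_path_length, the 4-neighbour boundary definition, and the brute-force distance search are all assumptions made for illustration.

```python
import numpy as np


def boundary(mask):
    """Return the pixels of a 2D boolean mask that touch the background (4-neighbourhood)."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    # A pixel is interior if it and its four neighbours are all foreground.
    interior = (
        padded[1:-1, 1:-1]
        & padded[:-2, 1:-1]
        & padded[2:, 1:-1]
        & padded[1:-1, :-2]
        & padded[1:-1, 2:]
    )
    return mask & ~interior


def added_path_length(mask_a, mask_b, pixel_mm=1.0, tolerance_mm=0.0):
    """Length (mm) of B's contour that is not already part of A's contour."""
    edge_a = np.argwhere(boundary(mask_a))
    edge_b = np.argwhere(boundary(mask_b))
    if len(edge_b) == 0:
        return 0.0
    if len(edge_a) == 0:
        return float(len(edge_b) * pixel_mm)
    # Brute-force distance (mm) from every B edge pixel to the nearest A edge pixel.
    diff = edge_b[:, None, :] - edge_a[None, :, :]
    nearest = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1) * pixel_mm
    return float((nearest > tolerance_mm).sum() * pixel_mm)


a = np.zeros((10, 10), dtype=bool)
a[2:7, 2:7] = True          # 5x5 square
b = a.copy()
b[2:7, 7] = True            # B extends A by one column on the right

print(added_path_length(a, a))                     # 0.0, identical shapes
print(added_path_length(a, b))                     # 5.0, the new right-hand edge
print(added_path_length(a, b, tolerance_mm=1.0))   # 0.0, a one-pixel shift is tolerated
```

The pairwise distance matrix grows with the product of the two contour lengths, so for large 3D volumes a distance transform would be the more practical choice.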

Related

Can I tell Excel to keep an automated quadratic model y>= 0, without trying to run a manual model?

I am trying to create a quadratic model in Excel with tennis ranks data.
When running the automatic model trendline function it gives me a model with negative y values, which can obviously not occur for ranks.
How do I tell Excel to keep model y-values >=0?
Thank you!
See the screenshot below/here.
There are several advantages to understanding the formulation / construct of the quadratic trend. For instance, replicating the 'automatic trend' using 'linest' as follows provides the user with additional control over the individual terms, and can highlight any graphical errors:
=$L$3+$K$3*D3+$J$3*D3^2+$I$3*D3^3
This demonstrates a cubic regression (white dots) - which coincide with Excel's 'automatic' trend line.
Summary of possible issues
There are several potential issues | remedies you can consider, depending on the goodness of fit, the data in question, etc. A non-exhaustive list of issues you may be encountering includes the following:
Issue | Resolve
1) Overfitting | Reduce # terms (e.g. order = 2 instead of 3 etc.)
2) Wrong fit | Attempt Lognormal
3) Negative left | Set Intercept
4) Graphical error | Use scatter chart, sort x values (ascending)
5) Outliers | Various: exclude/adjust, fit separate curve (Extreme Value Theory), manually adjust polynomial terms noting reduction to goodness of fit etc.
1) Overfitting
Trendline options: reduce the order per screenshot:
2) Lognormal | Other
Transform / consider other fits/curves (you can also place the y and x axes on a logarithmic scale, which will automatically remove negatives, although consider outliers and the impact upon R-squared / goodness of fit).
3) Negative left
In certain circumstances, a negative left may be removed by setting the intercept to an appropriate value.
4) Graphical error
It's often easier to use a scatter chart, with x-values ordered per description (regression parameters may be affected otherwise).
5) Outliers
It may be the case that you're fitting to 1 or 2 outliers. Consider reducing complexity/number of terms, or adjusting / omitting outliers suitably. There is an entire branch of statistics that deals with the distribution of extreme values/outliers (Extreme Value Theory - beyond the scope of the present answer).
Other remarks:
Rounding errors in the automatic trend-line function can lead to inaccuracies, as can human error when replicating the 'automatic trend-line' displayed on the chart (suggesting linest / an exact formulation is preferable).
Reference(s)
Data / formulation for first screenshot here
Useful video content: here
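The "transform / other fit" remedy can also be prototyped outside Excel. The sketch below uses Python with NumPy rather than Excel: it fits a quadratic to log(y) instead of y, so the back-transformed prediction exp(...) cannot be negative. The rank data are invented for illustration, and this is one possible approach under those assumptions, not what Excel's trendline does internally.

```python
import numpy as np

# Hypothetical data: x = week number, y = ranking (always positive).
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([120, 80, 45, 30, 12, 6, 4, 3], dtype=float)

# Ordinary quadratic fit, comparable to a second-order trendline.
plain = np.polyfit(x, y, 2)

# Quadratic fit to log(y); predictions are exp(polynomial), hence strictly positive.
log_coeffs = np.polyfit(x, np.log(y), 2)

# Evaluate both models slightly beyond the observed range.
grid = np.linspace(x.min(), x.max() + 2, 50)
plain_pred = np.polyval(plain, grid)
log_pred = np.exp(np.polyval(log_coeffs, grid))

print("plain quadratic minimum:", plain_pred.min())
print("log-space quadratic minimum:", log_pred.min())
```

Note that the goodness of fit of the two models is not directly comparable, because the second one minimises squared error in log(y) rather than in y.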

How can I center a continuous raster layer by subtracting the mean and dividing by the standard deviation using qgis raster calculator?

I'm trying to extrapolate a spatial model to a map in QGIS using the raster calculator. The model predicts the value of a pixel given the environmental conditions at this location (e.g. elevation, slope, etc.). Hence, each explanatory variable is a raster layer. For the model I centered my covariates by subtracting the mean and dividing by the standard deviation. Now I want to do the same to my raster layers. As trivial as this seems, I didn't manage to do so. I tried (among similar attempts, like changing upper case, etc.):
(my.raster-mean(my.raster))/sd(my.raster)
The calculator tells me the expression is valid, however, mean() and sd() don't work. And I could not find a site with the built-in functions for the raster calculator.
Any help with this is most appreciated! Many thanks in advance!
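For comparison, the arithmetic itself (subtract the layer mean, divide by the layer standard deviation) looks like this in Python with NumPy. This is only a sketch: it does not use any QGIS function, it assumes the raster has already been read into an array, and the elevation values and the nodata marker of -9999 are invented for illustration.

```python
import numpy as np

NODATA = -9999.0  # assumed nodata marker; check the layer's actual value

# Hypothetical elevation raster with one nodata cell.
elevation = np.array([[100.0, 150.0, 200.0],
                      [250.0, NODATA, 350.0]])

# Compute the statistics from valid cells only.
valid = elevation != NODATA
mean = elevation[valid].mean()
sd = elevation[valid].std(ddof=1)  # sample standard deviation

# Standardize valid cells; leave nodata cells unchanged.
scaled = np.where(valid, (elevation - mean) / sd, NODATA)
print(scaled)
```

If the model was fitted on covariates centered with the mean and standard deviation of the training data, those same two constants are probably the ones to apply here, rather than statistics recomputed from the whole raster.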

Suggestion on matplotlib colors : Need distinct shades.

I'm plotting a Voronoi diagram in which I shade the polygons according to a proportional probability (by which I mean that the probabilities of the plotted polygons might sum to 1). This is my code, where I give the facecolor as the probability value.
matplotlib.patches.Polygon(poly, facecolor= probList[i])
The problem is that the shades are not distinct enough to reflect my probability values. I'm fine with using any colors as long as the shades reflect probability. StackOverflow people, please throw in your suggestions. Thanks!
Picking from matplotlib's colormaps is probably a good start. The link shows all of the preset colormaps.
My favorites (and a common choice) for ordered values (like probability, which goes from zero to one) are hot or afmhot, because they show good distinction of intermediate values and have clear perceptual ordering.
Below are the sequential colormaps from matplotlib (taken from the reference above). Or, see the full set (again, at the reference above) if you want more distinction at the cost of less obvious ordering. (Even if you choose to not use a sequential colormap, you might still want to avoid the unfortunately popular jet colormap because, amongst other reasons, it starts and ends with dark colors, making it hard to understand).
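A minimal sketch of how a colormap could be applied to the polygons follows. The probabilities and the triangle coordinates are made up, and prob_list and polys stand in for the question's probList and polygon data. The Normalize step is an addition not mentioned above: it stretches the colormap over the observed range of probabilities, which helps when all values are small because they sum to 1.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
from matplotlib.patches import Polygon

# Hypothetical probabilities (summing to 1) and one triangle per probability.
prob_list = [0.05, 0.10, 0.15, 0.30, 0.40]
polys = [[(i, 0), (i + 1, 0), (i + 0.5, 1)] for i in range(len(prob_list))]

cmap = plt.get_cmap("afmhot")
# Stretch the colormap over the observed range instead of the full 0..1 interval.
norm = Normalize(vmin=min(prob_list), vmax=max(prob_list))

fig, ax = plt.subplots()
for poly, p in zip(polys, prob_list):
    # cmap(norm(p)) returns an RGBA tuple that facecolor accepts directly.
    ax.add_patch(Polygon(poly, facecolor=cmap(norm(p)), edgecolor="black"))
ax.set_xlim(0, len(prob_list))
ax.set_ylim(0, 1)
fig.colorbar(plt.cm.ScalarMappable(norm=norm, cmap=cmap), ax=ax, label="probability")
```

Swapping "afmhot" for any other colormap name only requires changing the argument to plt.get_cmap.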

How do I deal with color spaces and gamma in the PNG file format?

Here's my problem:
I'm doing some rendering using spectral samples, and want to save an image showing the results. I am weighting my spectral power function by the CIE XYZ color matching functions to obtain an XYZ color space result. I multiply this XYZ color tuple by the matrix given by this page for converting to sRGB, and clamp the results to (0,1).
To save the image, I scale the converted tuple by 255 and cast it to bytes, and pass the array to libpng's png_write_image(). When I view a uniform intensity, pure-color spectrum rendered this way, it looks wrong; there are dark bands in the transitions between the colors. This is perhaps not surprising, because to convert from XYZ to sRGB, the color components must be raised to 2.4 after the matrix multiply (or linearly scaled if they are small enough). But if I do this, it looks worse! Only after raising to 1/2.2 does it start to look right. It seems like, in the absence of me doing anything, the decoded images are having a gamma of ~2.2 applied twice.
Here's the way I would expect it to work: I apply the matrix to XYZ, and I have a roughly energy-linear RGB tuple. I raise this to 2.2, and now have a perceptually linear color tuple. I encode these numbers as they are (thus making efficient use of the file precision), and store a field in the file that says "these bytes have been encoded with gamma 2.2". Then at image load time, the decoding system un-applies the encoded gamma, then applies the system gamma before display. (And thus from an authoring perspective, I shouldn't have to care what the viewer's system gamma is). But the results I'm getting suggest it doesn't work this way.
Worse, I have tried calling png_set_gAMA() with both 2.2 and 1/2.2 and see no difference in the image. I get similar results with png_set_sRGB() (which I believe should force the gamma to 1/2.2).
There must be something I have backwards or don't understand with regards to either how I should be converting my color values, or how PNG handles gamma and color spaces. To break this down into a few clarifying questions:
What is the color space of the byte values I am expected to pass to write_png()?
What calls, if any, must I make to libpng in order to specify the color space and gamma of the passed bytes, to ensure proper display? Why might they fail?
How does the gamma field in the PNG file relate to the exponent I have applied to the passed color channel values, if any?
If I am expected to invert a gamma curve before sending my image data (which I doubt, but seems necessary now), should that inversion include the linear part of the sRGB curve?
Furthermore, I see hints that "white point" matters in conversion between XYZ and sRGB. It is unclear to me whether the matrices in the site given above include a renormalization to D65 (it does not match Wikipedia's matrix)-- or even when such a conversion is necessary. Most of the literature I've found glosses over the details. Is there yet another step in the conversion not mentioned in the wiki article, or will this be handled automatically?
It is pretty much the way you expected. png_set_gAMA() causes libpng to write a gAMA chunk in the output PNG file. It doesn't change the pixels themselves. A PNG-compliant viewer is supposed to use the gamma value from the chunk, combined with the gamma of the display, to write the pixel intensity values properly on the display. Most decoders won't actually do the two-step method you described (unapply the image gamma, then apply the system gamma), although the result is conceptually the same: they combine the image gamma with the system gamma to create a lookup table, then use that table to convert the pixels in one step.
From what you observed (gamma=2.2 and gamma=1/2.2 behaving the same), it appears that you are using a viewer that doesn't do anything with the PNG gAMA chunk data.
You said:
because to convert from XYZ to sRGB, the color components must be raised to 2.4 after the matrix multiply...
No, this is incorrect. Going from linear (XYZ) to sRGB, you do NOT raise to 2.4 or 2.2; that is for going FROM sRGB to linear.
Going from linear to sRGB, you raise to the power 1/2.2, or, if using the sRGB piecewise function, you'll see 1/2.4; the effective gamma you are applying is about 0.45455 (1/2.2).
On the wikipedia page you linked, this is the FORWARD transformation.
From XYZ to sRGB:
That of course being after the correct matrix is applied. Assuming everything is in D65, then:
Straight Talk about Linear
Light in the real world is linear. If you triple 100 photons, you then have 300 photons. But the human eye does not see a tripling; we see only a modest increase by comparison.
This is in part why transfer curves or "gamma" are used: to make the most of the available code space in an 8-bit image (an oversimplification on my part, I know).
To do this, a linear light value is raised to the power of 0.455, and to get that sRGB value back to a linear space, we raise it to the inverse, i.e. ^(1/0.455), otherwise known as ^2.2.
The matrices must be applied in linear space, but after applying the matrix, you need to apply the TRC or "gamma" encoding. Based on your statements, no, things are not having 2.2 applied twice; you are simply going the wrong way.
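The forward direction described in this answer can be sketched as follows: apply the matrix in linear space, clamp, then apply the sRGB piecewise encoding (linear segment for small values, exponent 1/2.4 otherwise). The matrix is the commonly published XYZ (D65) to linear sRGB matrix rounded to four decimals. The function names srgb_encode and xyz_to_srgb_bytes are made up for this sketch, and it assumes XYZ is scaled so that white has Y = 1.0.

```python
# XYZ (D65) to linear sRGB matrix, rounded to 4 decimals.
XYZ_TO_LINEAR_SRGB = (
    (3.2406, -1.5372, -0.4986),
    (-0.9689, 1.8758, 0.0415),
    (0.0557, -0.2040, 1.0570),
)


def srgb_encode(c):
    """Apply the sRGB transfer curve to one linear component in [0, 1]."""
    if c <= 0.0031308:
        return 12.92 * c                      # linear segment near black
    return 1.055 * c ** (1 / 2.4) - 0.055     # power segment


def xyz_to_srgb_bytes(x, y, z):
    """Convert one XYZ colour (white at Y = 1.0) to 8-bit sRGB values."""
    out = []
    for row in XYZ_TO_LINEAR_SRGB:
        linear = row[0] * x + row[1] * y + row[2] * z
        linear = min(max(linear, 0.0), 1.0)   # clamp out-of-gamut values
        out.append(round(srgb_encode(linear) * 255))
    return tuple(out)


print(xyz_to_srgb_bytes(0.9505, 1.0, 1.089))   # D65 white
print(srgb_encode(0.5))                        # linear mid-grey, roughly 0.735 after encoding
```

Bytes produced this way are already sRGB-encoded, which is the kind of data a PNG tagged with png_set_sRGB() is meant to hold.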
You wrote: "It seems like, in the absence of me doing anything, the decoded images are having a gamma of ~2.2 applied twice."
I think your monitor (its hardware or your system's ICC profile) already has a gamma setting of its own.

Create CDF for Anderson Darling test for Octave forge Statistics package function

I am using Octave and I would like to use the anderson_darling_test from the Octave forge Statistics package to test if two vectors of data are drawn from the same statistical distribution. Furthermore, the reference distribution is unlikely to be "normal". This reference distribution will be the known distribution referred to in the help for the above function: 'If you are selecting from a known distribution, convert your values into CDF values for the distribution and use "uniform".'
My question therefore is: how would I convert my data values into CDF values for the reference distribution?
Some background information for the problem: I have a vector of raw data values from which I extract the cyclic component (this will be the reference distribution); I then wish to compare this cyclic component with the raw data itself to see if the raw data is essentially cyclic in nature. If the null hypothesis that the two are the same can be rejected, I will then know that most of the movement in the raw data is not due to cyclic influences but is due to either trend or just noise.
If your data has a specific distribution, for instance beta(3,3) then
p = betacdf(x, 3, 3)
will be uniform by the definition of a CDF. If you want to transform it to a normal, you can just call the inverse CDF function
x=norminv(p,0,1)
on the uniform p. Once transformed, use your favorite test. I'm not sure I understand your data, but you might consider using a Kolmogorov-Smirnov test instead, which is a nonparametric test of distributional equality.
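The same two steps (data to CDF values, then CDF values to a normal via the inverse CDF) can be illustrated in Python using only the standard library. The sketch swaps in an exponential distribution with a known rate instead of beta(3,3), simply because its CDF has a closed form. The rate, the sample size, and the clamping constant eps are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

random.seed(0)
RATE = 2.0  # known parameter of the assumed reference distribution

# Hypothetical data drawn from an exponential distribution with the known rate.
x = [random.expovariate(RATE) for _ in range(10_000)]

# CDF values: p = F(x) = 1 - exp(-RATE * x) is uniform on (0, 1) if x really follows F.
p = [1.0 - math.exp(-RATE * xi) for xi in x]

# Optional second step: map the uniform values to a standard normal via the inverse CDF.
eps = 1e-12  # keep p strictly inside (0, 1) so inv_cdf is defined
z = [NormalDist().inv_cdf(min(max(pi, eps), 1.0 - eps)) for pi in p]

print(sum(p) / len(p))   # close to 0.5 for uniform values
print(sum(z) / len(z))   # close to 0 for standard normal values
```

This only works as intended when the distribution and its parameters are known in advance, not estimated from the same data.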
Your approach is misguided in multiple ways. Several points:
The Anderson-Darling test implemented in Octave forge is a one-sample test: it requires one vector of data and a reference distribution. The distribution should be known - not come from data. While you quote the help-file correctly about using a CDF and the "uniform" option for a distribution that is not built in, you are ignoring the next sentence of the same help file:
Do not use "uniform" if the distribution parameters are estimated from the data itself, as this sharply biases the A^2 statistic toward smaller values.
So, don't do it.
Even if you found or wrote a function implementing a proper two-sample Anderson-Darling or Kolmogorov-Smirnov test, you would still be left with a couple of problems:
Your samples (the data and the cyclic part estimated from the data) are not independent, and these tests assume independence.
Given your description, I assume there is some sort of time predictor involved. So even if the distributions coincided, that would not mean they coincide at the same time-points, because comparing distributions collapses over time.
The distribution of cyclic trend + error would not be expected to be the same as the distribution of the cyclic trend alone. Suppose the trend is sin(t). Then it will never go above 1. Now add a normally distributed random error term with standard deviation 0.1 (small, so that the trend is dominant). Obviously you could get values well above 1.
We do not have enough information to figure out the proper thing to do, and it is not really a programming question anyway. Look up time series theory - separating cyclic components is a major topic there. But many reasonable analyses will probably be based on the residuals: (observed value - predicted from cyclic component). You will still have to be careful about auto-correlation and other complexities, but at least it will be a move in the right direction.
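The sin(t) example and the residual idea from the points above can be simulated in a few lines of Python. This is only an illustration: the cyclic component is taken as known rather than estimated from the data, the sampling step and seed are arbitrary, and lag1_autocorrelation is a helper written for this sketch.

```python
import math
import random

random.seed(1)

t = [0.05 * i for i in range(2000)]
trend = [math.sin(ti) for ti in t]                       # cyclic component, bounded by 1
observed = [c + random.gauss(0.0, 0.1) for c in trend]   # trend plus normal noise, sd 0.1

print(max(trend) <= 1.0)   # the trend alone never exceeds 1
print(max(observed))       # the noisy series does exceed 1

# Residuals: observed value minus the value predicted from the cyclic component.
residuals = [o - c for o, c in zip(observed, trend)]


def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation, a first check for serial dependence."""
    mean = sum(values) / len(values)
    num = sum((a - mean) * (b - mean) for a, b in zip(values, values[1:]))
    den = sum((v - mean) ** 2 for v in values)
    return num / den


print(lag1_autocorrelation(residuals))   # near 0 here because the simulated noise is independent
```

With real data the residuals will often show clear autocorrelation, which is the complication the answer warns about.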
