Histogram plots in pymc, what do different aspects mean?

I have defined a stochastic random variable (and many more, but for the sake of this question one is enough):
tau = pm.DiscreteUniform("tau", lower = 0, upper = 74)
After sampling using MCMC, when I plot the trace of tau, I get the following figure:
Now my question is: what do the black line and the two dotted lines denote?
In all the earlier figures I had seen, the black line divided the area under the histogram into two (almost) equal halves, and the dotted lines covered almost the same area on either side of the black line, so I took the bold line to be the mean and the two dotted lines to be the 95% confidence interval (quite obviously I am wrong).
I would also like to verify my understanding of the height of the histogram.
As I understand it, the height of the histogram at 45 denotes the number of times the sampler picked the value 45. Please correct me if I am wrong.

The lines are the median (solid line) and the interquartile range (dotted lines). The histograms just illustrate the frequencies of the sample values.
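To check this reading against your own trace, the same summaries can be computed directly from the samples. The following is only a minimal sketch using the Python standard library; the list `trace` is a made-up stand-in for the sampled values of tau, and `summarize` is a hypothetical helper, not part of pymc.

```python
import random
import statistics
from collections import Counter


def summarize(samples):
    """Return the median, the quartiles and the per-value counts of a trace."""
    q1, _, q3 = statistics.quantiles(samples, n=4)  # 25%, 50%, 75% cut points
    return {
        "median": statistics.median(samples),
        "iqr": (q1, q3),
        "counts": Counter(samples),  # bar height = how often each value was drawn
    }


random.seed(0)
# Hypothetical trace: integer draws clustered around 45 (an assumption, not real output)
trace = [random.choice([43, 44, 44, 45, 45, 45, 46]) for _ in range(1000)]

summary = summarize(trace)
print(summary["median"], summary["iqr"], summary["counts"][45])
```

With one bin per integer value, `summary["counts"][45]` is the height of the bar at 45, which matches the reading of the histogram given in the question.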

Related

What is the endpoint calculation in the Xiaolin Wu algorithm doing?

The Xiaolin Wu algorithm draws an anti-aliased line between two points. The points can be at sub-pixel, i.e. non-integer coordinates. I'll assume the reader is familiar with the algorithm and just recall the important features. We loop across the major (longer) axis of the line, let's say it's the x-axis, basically proceeding column-by-column. In each column we color two pixels. The computation is equivalent to this: place a 1x1 square centered on the line, at the point whose x coordinate is the center of the given column of pixels. Let's call it S. If we think of each pixel as a 1x1 square in the plane, we now calculate the area of intersection between S and each of the two pixels it straddles, and use those areas as the intensities with which to color each pixel.
That's nice and clear, but what is going on with the calculations for the endpoints? Because the endpoints can be at non-integer positions, they have to be treated as a special case. Here's the pseudocode from the linked Wikipedia article for handling the first endpoint x0, y0:
// handle first endpoint
xend := round(x0)
yend := y0 + gradient * (xend - x0)
xgap := rfpart(x0 + 0.5)
xpxl1 := xend // this will be used in the main loop
ypxl1 := ipart(yend)
plot(ypxl1, xpxl1, rfpart(yend) * xgap)
plot(ypxl1+1, xpxl1, fpart(yend) * xgap)
I edited out the if (steep) condition, so this is the code for the case when the slope of the line is less than 1. rfpart is 1-fpart, and fpart is the fractional part. ipart is the integer part.
I just have no idea what this calculation is supposed to be doing, and I can't find any explanations online. I can see that yend is the y-coordinate of the line above xend, and xend is the x coordinate of the pixel that the starting point (x0, y0) is inside of. Why are we even bothering to calculate yend? It's as if we're extending the line until the nearest integer x-coordinate.
I realize that we're coloring both the pixel that the endpoint is in, and the pixel either immediately above or below it, using certain intensities. I just don't understand the logic behind where those intensities come from.

With the Xiaolin Wu algorithm (and sub-pixel rendering techniques in general) we imagine that the screen is a continuous geometric plane, and each pixel is a 1x1 square region of that plane. We identify the centers of the pixels as being the points with integer coordinates.
First, we find the so-called "major axis" of the line, the axis along which the line is longest. Let's say that it's the x axis. We now loop across each one-pixel-wide column that the line passes through. For each column, we find the point on the line which is at the center of that column, i.e. such that its x-coordinate is an integer. We imagine there's a 1x1 square centered at that point. That square will completely fill the width of that column and will overlap two different pixels. We color each of those pixels according to the area of the overlap between the square and the pixel.
For the endpoints, we do things slightly differently: we still draw a square centered at the place where the line crosses the centerline of the column, but we cut that square off in the horizontal direction at the endpoint of the line. This is illustrated below.
This is a zoomed-in view of four pixels. The black crosses represent the centers of those pixels, and the red line is the line we want to draw. The red circle (x0, y0) is the starting point for the line; the line should extend from that point off to the right.
You can see the grey squares centered on the red crosses. Each pixel is going to be colored according to the area of overlap with those squares. However, in the left-hand column, we cut off the square at x-coordinate x0. In light grey you can see the entire square, but only the part in dark grey is used for the area calculation. There are probably other ways we could have handled the endpoints; for instance, we could have shifted the dark grey region up a bit so it's vertically centered at the y-coordinate y0. Presumably it doesn't make much visible difference, and this is computationally efficient.
I've annotated the drawing using the names of variables from the pseudocode on Wikipedia.
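The endpoint step can also be written out as a small Python sketch. It follows the variable names of the pseudocode quoted in the question and assumes the x-major case (slope less than 1). The function name `first_endpoint` and its return format are made up for illustration, and rounding is taken to be floor(x + 0.5).

```python
import math


def fpart(x):
    """Fractional part of x."""
    return x - math.floor(x)


def rfpart(x):
    """One minus the fractional part of x."""
    return 1.0 - fpart(x)


def first_endpoint(x0, y0, gradient):
    """Return [((x, y), intensity), ...] for the two pixels at the first endpoint."""
    xend = math.floor(x0 + 0.5)         # column whose centerline is nearest to x0
    yend = y0 + gradient * (xend - x0)  # height of the line on that centerline
    xgap = rfpart(x0 + 0.5)             # width of the column to the right of x0
    ypxl1 = math.floor(yend)            # lower of the two pixels being colored
    return [
        ((xend, ypxl1), rfpart(yend) * xgap),     # overlap with the lower pixel
        ((xend, ypxl1 + 1), fpart(yend) * xgap),  # overlap with the upper pixel
    ]


pixels = first_endpoint(1.3, 2.6, 0.5)
```

For the start point (1.3, 2.6) with gradient 0.5, the column centered at x = 1 spans 0.5 to 1.5, so only a strip of width 0.2 lies to the right of x0. The two intensities sum to that width, which is the cut-off square described above.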

The algorithm is approximate at endpoints. This is justified because exact computation would be fairly complex (and depend on the type of endpoint), for a result barely perceivable. What matters is aliasing along the segment.

Shading Area Between Two Line Charts and Axes

I am on day #2 of searching the web and, while I have found plenty of hits that seem like they should work, none of them seem to apply to my particular situation.
I have an Excel chart with two series displayed. One is a sort of exponential decay curve, and one is a constant that intersects with the exponential curve, but does not continue past it (the final x-value of the orange line is estimated to make it look like it intersects the blue curve):
The raw data for the blue curve is as follows (leaving off data labels for confidentiality reasons, but x-values are on the left and y-values are on the right):
The orange line is simply set at 24 all the way across until it intersects with the blue curve.
So here's the problem I need to solve: I need to fill in all of the area below the blue curve with one color, and I need to fill in the area below the orange line with another color. Everything above the blue curve needs to be blank (transparent). Here's an illustration of what I want:
I know in order to get the coloring/shading I need to use an area chart. However, when I try to change the chart type to Area the scales of the axes change for each series and they no longer match up, and I am unable to edit the axes (can't set min, max, etc) to make them match up again. Additionally, only the area directly beneath the constant line fills in (as expected), but I am looking for a way to fill in the area between the orange line, the blue curve, and the axes:
How might one go about doing what I need to do?
If there's any other information I could provide that would be of help, please let me know and I'll be sure to add it in.
EDIT:
I can extend the orange line to follow the blue line off to the right, which may help fill in the lower area. However, when I switch to an area chart I still get the issue with mismatched axes with scale I can't edit:
Notice how the "567" point (the x-value where the orange line should intersect the blue curve) is spaced evenly between "500" and "600", rather than scaling slightly to the right of center as I would have expected.
How do I keep the spacing of one tick every 100 units on the x-axis but keep the datapoint for 567?

You could find the intersection point's coordinates (graphically or analytically), then split your data into two separate series within the same graph as follows:
Edit, following the comment section:
For some reason the x-values are treated as text by default.
Right-click the x-axis > Format > select the date axis option.
Then adjust the major ("principal") unit and the base unit, in days or months, to get the intervals you want.
Good parameters for this data:
major unit: 100, in days
base unit: days
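For the "analytically" route mentioned at the start of this answer, the crossing can be estimated by linear interpolation between the two data points that bracket the constant level. The sketch below is in Python rather than Excel, and the data in it is invented for illustration, since the question's real values are not shown; `find_crossing` is a hypothetical helper name.

```python
def find_crossing(xs, ys, level):
    """Return the first x at which the piecewise-linear curve crosses `level`, or None."""
    points = list(zip(xs, ys))
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        # A crossing lies in this segment when the two ends are on opposite sides
        if y1 != y2 and (y1 - level) * (y2 - level) <= 0:
            return x1 + (level - y1) * (x2 - x1) / (y2 - y1)
    return None


# Hypothetical decaying curve sampled every 100 units (not the asker's data)
xs = [0, 100, 200, 300, 400, 500, 600]
ys = [200, 120, 75, 50, 38, 30, 20]

x_cross = find_crossing(xs, ys, 24)
```

The returned x-value can then be added as an extra row to both series, so that the constant series ends and the lower part of the curve begins at the same point.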

I would just have two identical charts : one does the blue and the other the orange then lay the orange chart on top of the blue and make it transparent ... worked a treat in the past...

Interpolated curves between existing curves do not look correct

I have a chart that has several existing curves on it that I have tried to interpolate new curves in between. I have used linear interpolation in the form of y = ((x - x1)(y2 - y1) / (x2 - x1)) + y1, however the new curves look out of place.
You can see in the picture that every second line (from the bottom) is an interpolated line. While the data points of the second line are exactly centered between those of the first and third lines on the y-axis, the data points of the third line are not centered between those of the second and fourth, which makes the graph look skewed.
So I am thinking linear interpolation may not be what I am after here. Can someone recommend another method that would create curves between the existing ones, but replicates the same form?

Sudden changes in gradient are hard to interpolate. When you're at the point where you want an interpolated line to suddenly change gradient, there is no information from the points in close proximity that give information as to where the sudden change in gradient should occur.
To replicate the pattern, you actually need to copy the gradient of the line below then smoothly transition to the gradient of the line above. Visually we can see that it should occur half way between the change in gradients for the lines above and below on either side, but detecting the locations of those changes is not trivial.
The points where the sudden changes in gradient occur are separated by a large change along the x-axis but only a small change along the y-axis. When calculating y-values for x-values that lie between the changes in gradient, you get the aberrations. I suggest trying to interpolate x-values based on y-values instead. For each curve, for each small arbitrary step in the y-axis, find/calculate the closest x-values from the curve on either side and take the average to plot your interpolation.
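The suggestion of interpolating x from y could be sketched as follows. This is an illustration under stated assumptions, not a tested solution for the chart in the question: each curve is given as a list of (x, y) points whose y-values increase along the curve, and the helper names `x_at_y` and `midcurve` are made up.

```python
def x_at_y(curve, y):
    """Linearly interpolate x at height y on a curve [(x, y), ...] with increasing y."""
    for (x1, y1), (x2, y2) in zip(curve, curve[1:]):
        if y1 <= y <= y2:
            if y1 == y2:
                return (x1 + x2) / 2
            return x1 + (y - y1) * (x2 - x1) / (y2 - y1)
    raise ValueError("y is outside the range covered by the curve")


def midcurve(curve_a, curve_b, n_steps=20):
    """Average the x-values of two neighboring curves at evenly spaced y-values."""
    y_min = max(curve_a[0][1], curve_b[0][1])    # use only the shared y-range
    y_max = min(curve_a[-1][1], curve_b[-1][1])
    points = []
    for i in range(n_steps + 1):
        y = min(y_min + (y_max - y_min) * i / n_steps, y_max)
        x = (x_at_y(curve_a, y) + x_at_y(curve_b, y)) / 2
        points.append((x, y))
    return points


# Two hypothetical neighboring curves: x = y and x = y + 2
left = [(0, 0), (5, 5), (10, 10)]
right = [(2, 0), (12, 10)]
middle = midcurve(left, right, n_steps=10)
```

Because the averaging happens along the x-axis at fixed heights, a kink in the interpolated curve ends up horizontally halfway between the kinks of its two neighbors.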

An unconventional approach may be a piecemeal style of interpolation. It may be possible to model the 3 regions of different gradients separately.
Start by identifying the 2 lines that would be drawn through the 2 sets of kinks, creating 3 regions of space. The vertical line would stop at the horizontal line near the bottom right corner of the graph.
For each region (and potentially for each value of x in each region) determine the gradient of the lines. When you're doing your interpolation of a new line, for each starting point (x1, y1), look at which region it falls in. Use the gradient of that region as a significant factor when determining the next point. Keep doing this until you reach a region boundary. When the interpolated point crosses into a different region, then use the gradient of that region as a significant factor to interpolate the next point.
It will be quite pointy if you did this strictly, so graph with some smoothing (or incorporate a smoothing factor using weighted averages of the gradients as you transition between regions, but this could be a whole lot of effort without necessarily closer results!)

Gnuplot_Set line style_frequency of the points

Hello, I have plotted data from three different files against time. I have used different line colours and line points. I have two questions regarding the line points.
In the plot below, the frequency of the line points differs in one of the outputs. I could not figure out the reason for it, as I used the same code for all three outputs:
set style line 1 lc rgb "#83b300" lw 4 pt 4 pi 500 ps 2
The input files for the dark green and light green outputs contain data in time steps of 0.01 seconds, whereas the input file for the orange output contains data in time steps of 0.02 seconds. Could this be the reason for the different frequency of the line points?
Is it possible to get the line points with a phase shift? I mean that the line points should not all be aligned on the same vertical line; there should be some phase shift, so that it is easy to distinguish the three outputs if they all fall on the same line.

I think you answered your first question already, since you have a different sampling in one of the three files.
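The effect of the sampling can be checked with a little arithmetic. The Python sketch below assumes that the point interval (pi 500) puts a marker on every 500th data point, starting with the first one; `marker_times` is a hypothetical helper and the sample counts are invented.

```python
def marker_times(dt, point_interval, n_samples):
    """Times at which a marker is drawn if every point_interval-th sample gets one."""
    return [i * dt for i in range(0, n_samples, point_interval)]


fine = marker_times(0.01, 500, 1501)   # files sampled every 0.01 s
coarse = marker_times(0.02, 500, 751)  # file sampled every 0.02 s
print(fine, coarse)
```

Under that assumption the markers are 5 seconds apart for the 0.01 s files and 10 seconds apart for the 0.02 s file, so halving the point interval for the coarser file would give the same spacing in time.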
For the second part, you could sample the three files with different time steps that are not divisible by one another; then the points would not align. You could also introduce a shift by doing
plot "file" using ($1+0.005):2 ...
but then the plot would not ultimately reflect the underlying data.
As a final comment, why is your y-range so large?

how to choose a range for filtering points by RGB color?

I have an image and I am picking colors by RGB (data sampling). I select N points from a specific region in the image which has the "same" color. By "same" I mean, that part of the image belongs to an object, (let's say a yellow object). Each picked point in the RGB case has three values [R,G,B]. For example: [120,150,225]. And the maximum and minimum for each field are 255 and 0 respectively.
Let's assume that I picked N points from the region of the object in the image. The points obviously have different RGB values but from the same family (a gradient of the specific color).
Question:
I want to find a range for each RGB field such that, when I apply a color filter to the image, the pixels belonging to that specific object remain (are considered inliers). Is it correct to take the maximum and minimum of the sampled points and use them as the filter range? For example, if the min and max of the R field are 120 and 170 respectively, can that be used as the range that should be kept?
In my opinion, this idea is not correct, because when choosing the max and min of a set of sampled data, some points on the object will still fall outside that range.
What is a better solution to include more points as inliers?
If anybody needs to see collected data samples, please let me know.

I am not sure I fully grasp what you are asking for, but in my opinion filtering in RGB is not the way to go. You should use a different color space than RGB if you want to compare pixels of similar color. RGB is good for representing colors on a screen, but you actually want to look at the hue, saturation and intensity (lightness, or luminance) for analysing visible similarities in colors.
For example, you should convert your pixels to HSI or HSL color space first, then compare the different parameters you get. At that point, it is more natural to compare the resulting hue in a hue range, saturation in a saturation range, and so on.
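As a rough illustration of this idea, the sketch below converts the sampled points to HLS with Python's standard `colorsys` module and keeps pixels whose hue falls inside the sampled hue range. The sample colors, the margin, the saturation threshold and both helper names are assumptions made for the example. It also ignores hue wrap-around, which matters for reds.

```python
import colorsys


def to_hls(rgb):
    """Convert an (R, G, B) triple in 0..255 to (hue, lightness, saturation) in 0..1."""
    r, g, b = (c / 255 for c in rgb)
    return colorsys.rgb_to_hls(r, g, b)


def hue_range(samples, margin=0.02):
    """Hue interval covered by the sampled points, widened by a small margin."""
    hues = [to_hls(rgb)[0] for rgb in samples]
    return min(hues) - margin, max(hues) + margin


def is_inlier(rgb, lo, hi, min_saturation=0.2):
    """Keep a pixel if its hue is in range and it is not close to grey."""
    hue, _, saturation = to_hls(rgb)
    return lo <= hue <= hi and saturation >= min_saturation


# Hypothetical samples picked from a yellow object
samples = [(255, 255, 0), (230, 210, 40), (200, 190, 30)]
lo, hi = hue_range(samples)
print(is_inlier((240, 230, 20), lo, hi), is_inlier((0, 0, 255), lo, hi))
```

Filtering on hue with a separate saturation threshold tolerates differences in brightness across the object, which a fixed box in RGB does not.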
Go here for further information on how to convert to and from RGB.

What happens here is that you are implicitly trying to reinvent either color indexing or histogram back-projection. You call it a color filter, but it is better to focus on probabilities than on colors and color spaces. Colors are of course not super reliable and change with lighting (though hue tends to stay the same under non-colored illumination), which is why some color spaces are better than others. You can handle this separately, but it seems that you are more interested in the principles of a "filtering operation" that will segment the foreground object from the background. Hopefully.
In short, a histogram back-projection works by first creating a histogram for R, G, B within object area and then back-projecting them into the image in the following way. For each pixel in the image find its bin in the histogram, calculate its relative weight (probability) given overall sum of the bins and put this probability into the image. In such a way each pixel would have probability that it belongs to the object. You can improve it by dividing with probability of background if you want to model background too.
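A bare-bones version of this procedure might look like the Python sketch below. It assumes a joint histogram over coarsely quantized (R, G, B) values with 8 bins per channel; the sample pixels and the helper names are made up, and no background model is included.

```python
from collections import Counter


def bin_of(rgb, bins_per_channel=8):
    """Quantize an (R, G, B) triple in 0..255 to a coarse histogram bin."""
    return tuple(c * bins_per_channel // 256 for c in rgb)


def build_histogram(object_pixels, bins_per_channel=8):
    """Normalized histogram: bin -> fraction of the object samples in that bin."""
    counts = Counter(bin_of(p, bins_per_channel) for p in object_pixels)
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()}


def back_project(image_pixels, histogram, bins_per_channel=8):
    """Replace each pixel by the probability stored in its histogram bin."""
    return [histogram.get(bin_of(p, bins_per_channel), 0.0) for p in image_pixels]


# Hypothetical samples taken from the object region (mostly yellow)
object_pixels = [(250, 250, 10), (245, 240, 20), (250, 250, 10), (10, 10, 250)]
hist = build_histogram(object_pixels)
probabilities = back_project([(255, 255, 0), (0, 0, 0)], hist)
```

Thresholding the resulting probability map gives a first rough mask of the object.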
The result will be messy but somewhat resemble an object segment plus some background noise. It has to be cleaned and then reconnected into object using separate methods such as connected components, grab cut, morphological operation, blur, etc.
