Normal distribution shapes

[Image: histogram of the data]
Hi everyone,
I'm having a hard time telling symmetric and skewed shapes apart.
Some graphs are clear, and you don't need to think twice.
But this histogram, for example, really confuses me.
Is it right-skewed? Is it symmetric? I'm totally lost.
I have tried several ways to get the right answer:
1- Comparing the mean (26.75) with the median (25.5).
2- Comparing the following distances:
a) Whether the distance from the min to the median is less than, equal to, or greater than the distance from the median to the max.
b) Whether the distance from the min to Q1 is less than, equal to, or greater than the distance from Q3 to the max.
c) Whether the distance from Q1 to the median is less than, equal to, or greater than the distance from the median to Q3.
None of this leads me to a conclusion:
Right skewed: every check follows the right-skewed rules except c).
Symmetric: the graph looks symmetric, but the calculations say it is not.
Help please.
Thank you in advance.
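
For reference, the checks listed in the question can be written down as a short Python sketch. The function name `shape_checks` and the sample values are made up for illustration; they are not the data behind the histogram, and the quartiles use the "inclusive" method from the standard library, which is only one of several conventions.

```python
import statistics

def shape_checks(values):
    """Apply the mean/median and five-number-summary comparisons to a sample."""
    data = sorted(values)
    # Quartiles by the "inclusive" method; other conventions give slightly different cut points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    lo, hi = data[0], data[-1]
    mean = statistics.fmean(data)

    def side(left, right):
        # A larger right-hand value hints at right skew, a larger left-hand one at left skew.
        if right > left:
            return "right"
        if right < left:
            return "left"
        return "symmetric"

    return {
        "mean_vs_median": side(median, mean),
        "a_min_median_max": side(median - lo, hi - median),
        "b_min_q1_q3_max": side(q1 - lo, hi - q3),
        "c_q1_median_q3": side(median - q1, q3 - median),
    }

# Hypothetical sample, chosen only to show checks that disagree with each other.
print(shape_checks([1, 2, 3, 4, 10]))
```

For this made-up sample every check reports "right" except c), which reports "symmetric", so mixed verdicts like the ones described in the question can happen when a few large values stretch the upper tail while the middle half of the data stays balanced.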

Related

Weighted Least Squares vs Monte Carlo comparison

I have an experimental dataset of the following values (y, x1, x2, w), where y is the measured quantity, x1 and x2 are the two independent variables and w is the error of each measurement.
The function I've chosen to describe my data is
These are my tasks:
1) Estimate values of bi
2) Estimate their standard errors
3) Calculate predicted values of f(x1, x2) on a mesh grid and estimate their confidence intervals
4) Calculate predicted values of
and definite integral
and their confidence intervals on a mesh grid
I have several questions:
1) Can all of my tasks be solved by weighted least squares? I've solved tasks 1-3 using WLS in matrix form by linearisation of the chosen function, but I have no idea how to solve task 4.
2) I've performed Monte Carlo simulations to estimate bi and their s.e. I've generated perturbed values y'i from a normal distribution with mean yi and standard deviation wi. I did this operation N=5000 times. For each perturbed dataset I estimated b'i, and from the 5000 values of b'i I calculated their mean values and standard deviations. In the end, the bi estimated from the Monte Carlo simulation coincide with those found by WLS. Am I correct that the standard deviations of b'i must be divided by the number of degrees of freedom to obtain the standard error?
3) How can I estimate confidence bands for predicted values of y using the Monte Carlo approach? I've generated a bunch of perturbed bi values from a normal distribution, using their BLUE as the means and their standard deviations. Then I calculated lots of predicted values of f(x1,x2) and found their means and standard deviations. The values of f(x1,x2) found by WLS and MC coincide, but the s.d. found from MC are 5-45 order higher than those from WLS. What is the scaling factor that I'm missing here?
4) It seems that some of the parameters b are not independent of each other, since there are only 2 independent variables. Should I take this into account in question 3, when I generate bi values? If yes, how can this be done? Should I use a Chi-squared test to decide whether generated values of bi are suitable for further calculations, or should they be rejected?
In fact, I not only want to solve the tasks I've mentioned earlier, but I also want to compare the two methods for regression analysis. I would appreciate any help and suggestions!
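
To make the comparison concrete, here is a minimal NumPy sketch of the two procedures side by side. The model from the post is not shown, so a plain linear stand-in f = b0 + b1*x1 + b2*x2 is assumed. The data, the parameter values and the helper name `wls` are all hypothetical, and w is treated as the known standard deviation of each measurement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: the real model and measurements are not given in the post.
n = 40
x1 = rng.uniform(0.0, 1.0, n)
x2 = rng.uniform(0.0, 1.0, n)
w = rng.uniform(0.05, 0.2, n)               # per-point measurement standard deviations
b_true = np.array([1.0, 2.0, -0.5])         # made-up parameters
X = np.column_stack([np.ones(n), x1, x2])   # design matrix of the stand-in model
y = X @ b_true + rng.normal(0.0, w)

def wls(X, y, w):
    """Weighted least squares with weights 1/w**2; returns estimates and covariance."""
    W = 1.0 / w**2
    cov = np.linalg.inv(X.T @ (W[:, None] * X))
    b = cov @ (X.T @ (W * y))
    return b, cov

b_wls, cov_wls = wls(X, y, w)
se_wls = np.sqrt(np.diag(cov_wls))

# Monte Carlo: perturb y with its stated errors and refit each replicate.
N = 5000
b_mc = np.array([wls(X, rng.normal(y, w), w)[0] for _ in range(N)])
se_mc = b_mc.std(axis=0, ddof=1)            # spread of the replicate estimates

# Prediction spread at one grid point, drawing the parameters jointly so that
# their correlations are kept (independent draws would ignore them).
x_new = np.array([1.0, 0.5, 0.5])
draws = rng.multivariate_normal(b_wls, cov_wls, size=N)
sd_pred_mc = (draws @ x_new).std(ddof=1)
sd_pred_wls = np.sqrt(x_new @ cov_wls @ x_new)

print("WLS s.e.:", se_wls)
print("MC  s.e.:", se_mc)
print("prediction s.d. (WLS, MC):", sd_pred_wls, sd_pred_mc)
```

Under these assumptions the spread of the replicate estimates is compared with the WLS standard errors as is, with no further rescaling. The parameter draws for the prediction use the full covariance matrix, not only its diagonal. Whether the same holds for the actual nonlinear model would have to be checked against the real data.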

Distance between straight lines

I work in the oil & gas industry and I'm seeking advice about how to calculate the minimum distance between a set of wells (the wells are drawn as straight lines on a map). My goal is for each individual well to have a unique "spacing" value (measured in feet) which is basically the straight-line horizontal distance to the closest wellbore on a map. Below is a simple example of what I'm trying to accomplish (assume the pipe | symbol is a wellbore and the dashes are the distance between the wells)
|--|---|-|
In the drawing above we have 4 wells. The 1st well (starting from the far left) would have a spacing value of 2 (since there are 2 dashes to the closest well), the 2nd well would also have a value of 2 (since the closest well is the one to the far left which is two spaces away), the 3rd well would have a value of 1, and the 4th well would have a value of 1.
Now imagine that I have hundreds of these wells (each with latitude/longitude points that describe the start & end points of each well) and I have them all mapped in TIBCO Spotfire (scattered across Texas). Do you guys know if it would even be possible to automate a calculation like the above? I would also like to build in a rule that says the max distance between wells is 2640 ft (half of a mile).
Any ideas are appreciated!
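
As a small illustration of the rule in the toy drawing, here is a Python sketch that reads the drawing and returns each well's spacing. The function name `spacing_from_drawing` is made up, and it only handles this one-dimensional picture, not real map coordinates.

```python
def spacing_from_drawing(drawing):
    """Return each well's distance (in dashes) to its nearest neighbour."""
    # Splitting on '|' leaves the runs of dashes between consecutive wells.
    gaps = [len(run) for run in drawing.split("|")[1:-1]]
    n_wells = len(gaps) + 1
    spacing = []
    for i in range(n_wells):
        # The end wells have a neighbour on one side only.
        left = gaps[i - 1] if i > 0 else float("inf")
        right = gaps[i] if i < len(gaps) else float("inf")
        spacing.append(min(left, right))
    return spacing

print(spacing_from_drawing("|--|---|-|"))  # [2, 2, 1, 1]
```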

I think you should be able to do this without any R or IronPython.
Within Spotfire, you can calculate the distance in miles between 2 points using the formula below (substitute 6371 for 3958.756 to get the answer in kilometres).
GreatCircleDistance([Lat 1],[Lon 1],[Lat 2],[Lon 2]) * 3958.756
For your use case, you could cross join your table of locations, so that you have a row for every possible pair of locations, then calculate the distance between them using the formula above. After that, it should be pretty straightforward to find each well's closest neighbour.
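
Outside Spotfire, the same cross-join idea can be sketched in plain Python. This is only an illustration under some assumptions: each well is reduced to a single lat/lon point (as the point-to-point formula above implies), the distance uses the haversine formula with the same Earth radius, the well names and coordinates are hypothetical, and the 2640 ft rule from the question is read as a cap on the reported spacing.

```python
import math
from itertools import permutations

EARTH_RADIUS_MILES = 3958.756   # same radius as in the Spotfire formula above
FEET_PER_MILE = 5280
MAX_SPACING_FT = 2640           # cap from the question (half a mile)

def great_circle_ft(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points, in feet."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * math.asin(math.sqrt(a)) * EARTH_RADIUS_MILES * FEET_PER_MILE

def nearest_spacing(wells, cap=MAX_SPACING_FT):
    """Map each well name to the distance to its closest neighbour, capped at `cap`."""
    spacing = {name: cap for name in wells}
    # permutations(wells, 2) plays the role of the cross join, without self-pairs.
    for a, b in permutations(wells, 2):
        d = great_circle_ft(*wells[a], *wells[b])
        spacing[a] = min(spacing[a], d)
    return spacing

# Hypothetical well locations (one lat/lon point per well), for illustration only.
wells = {
    "A": (31.0000, -102.0000),
    "B": (31.0000, -102.0050),
    "C": (31.0030, -102.0050),
    "D": (31.2000, -102.0000),
}
spacing = nearest_spacing(wells)
for name, feet in spacing.items():
    print(name, round(feet, 1))
```

The pairwise loop grows with the square of the number of wells, which should still be fine for a few hundred of them. If the wells need to be treated as line segments instead of points, the distance function would have to be replaced by a segment-to-segment distance.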