How to find the maximum and minimum values of a random normal or log-normal distribution? - python-3.x

This is my first question on Stack Overflow so forgive me if I'm not in conformity with some norms. That being said, this is my problem:
Edited:
I have a continuous variable that I can only measure at certain points, and I need to estimate the probability distribution of the maximum and minimum values between each pair of data points. I have the standard deviation, and the variable follows a lognormal distribution, meaning the average is a log-mean and the standard deviation is multiplicative.
Example:
Assuming a car's speed is normally distributed and there are no traffic laws: at 10 AM the car is travelling at 40 MPH, at 11 AM it is travelling at 60 MPH, and the standard deviation is a 10% change of its speed every hour. There is a one-hour blackout in between where you have no information, but you should be able to estimate: the most probable highest speed the car achieved in this time, the most probable lowest speed, and somehow a probability distribution of everything in between. You can even assume it is extremely unlikely that its speed at 10 AM was its lowest speed and its speed at 11 AM was its highest speed in the period (if the car's speed is truly random at every scale, the probability of that approaches the impossible). The outcome is a lognormal distribution which could be used to simulate scenarios regarding that car.
I'm not an expert in statistics and I understand only the basics and some theory. How should I address this problem?
I'm using Python 3.x, in case you know a way to address the problem there.
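One way to sketch this in Python (purely illustrative, not an established recipe from the question): treat log-speed as a Brownian bridge pinned at the two observations, with a volatility chosen to mimic the 10%-per-hour multiplicative standard deviation, and simulate the paths in between:

    import numpy as np

    # Numbers from the car example; the volatility choice is an assumption
    s0, s1 = 40.0, 60.0           # observed speeds at 10 AM and 11 AM (MPH)
    sigma = np.log(1.10)          # per-hour volatility of log-speed (~10% multiplicative)
    n_steps, n_paths = 60, 100_000

    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, n_steps + 1)

    # Simulate Brownian motion on the grid, then pin it at both ends (Brownian bridge)
    dW = rng.normal(0.0, np.sqrt(1.0 / n_steps), size=(n_paths, n_steps))
    W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)
    bridge = W - t * W[:, -1:]    # zero at both endpoints

    # Log-speed: straight line between the two observations plus the bridge noise
    log_s = np.log(s0) + t * (np.log(s1) - np.log(s0)) + sigma * bridge
    speeds = np.exp(log_s)

    path_max = speeds.max(axis=1)
    path_min = speeds.min(axis=1)
    print("median of per-path maxima:", np.median(path_max))
    print("median of per-path minima:", np.median(path_min))
    print("95% interval for the maximum:", np.percentile(path_max, [2.5, 97.5]))

The histograms of path_max and path_min then give the probability curves for the highest and lowest speeds between the two measurements.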

Related

Actuarial vs. predicted survival comparison

I have a set of patients and their actuarial 1- and 5-year survival. I have also run their data through a commonly used medical score that calculates survival probability at 1 and 5 years (for example 75% and 55%, respectively). I'd like to compare both survival rates.
I calculated the mean survival probability for all patients at 1 and 5 years as the mean of the predicted survival probabilities. I then calculated the mean actuarial survival, using 100% if alive at 1 year and 0% if dead at 5 years, and compared the means of both groups with a t-test.
I have a feeling that what I am doing is grossly incorrect and goes against all rules of statistics; however, I have not found a solution to my problem anywhere. Maybe someone can help me? R packages and code are welcome.
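For concreteness, the procedure described above looks roughly like this (all numbers are invented; the questioner mentions R, but this sketch is in Python to keep the examples in one language):

    import numpy as np
    from scipy import stats

    # Invented data: predicted 1-year survival probabilities from the score,
    # and observed 1-year status (1 = alive, 0 = dead)
    predicted_1y = np.array([0.75, 0.80, 0.60, 0.90, 0.70])
    observed_1y = np.array([1, 1, 0, 1, 1])

    print("mean predicted 1-year survival:", predicted_1y.mean())
    print("observed 1-year survival rate:", observed_1y.mean())

    # The comparison described in the question: a t-test between the two sets of values.
    # Treating 0/1 outcomes as if they were probabilities is exactly the step the
    # questioner suspects is statistically questionable.
    print(stats.ttest_ind(predicted_1y, observed_1y))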

Descriptive statistics, percentiles

I am stuck on a statistics assignment and would really appreciate some qualified help.
We have been given a data set and asked to find the 10% of firms with the lowest rate of profit, in order to decide what profit rate is the maximum in order to be considered for a program.
The data has:
Mean = 3,61
St. dev. = 8,38
I am thinking that I need to find the 10th percentile, and if I run the percentile function in Excel it returns -4,71.
However, I tried to run the numbers by hand using the z-score:
z = (x - μ) / σ
Solving for x:
x = μ + zσ
With z = -1,28 for the 10th percentile:
x = 3,61 + (-1,28 * 8,38) = -7,116
My question is: which of the two methods is the right one, if either?
I am thoroughly confused at this point, hope someone has the time to help.
Thank you
This is the assignment btw:
"The Danish government introduces a program for economic growth and will
help the 10 percent of the rms with the lowest rate of prot. What rate
of prot is the maximum in order to be considered for the program given
the mean and standard deviation found above and assuming that the data
is normally distributed?"
The Excel formula is giving the actual, empirical 10th percentile value of your sample.
If the data you have includes all possible instances of whatever you’re trying to measure, then go ahead and use that.
If you’re sampling from a population and your sample size is small, use a t distribution or increase your sample size. If your sample size is healthy and your data are normally distributed, use z scores.
Short story: the different outcomes suggest the data you've supplied are not normally distributed.
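As a rough sketch, here is how the two approaches can be compared in Python (the sample below is made up for illustration; only the mean and standard deviation come from the assignment):

    import numpy as np
    from scipy import stats

    # Hypothetical profit-rate sample; only the mean (3.61) and SD (8.38) are from the assignment
    rng = np.random.default_rng(1)
    profits = rng.normal(3.61, 8.38, size=200)

    # Empirical 10th percentile -- what Excel's PERCENTILE function returns for the sample
    empirical_p10 = np.percentile(profits, 10)

    # Theoretical 10th percentile assuming normality -- the z-score approach,
    # equivalent to 3.61 + (-1.2816 * 8.38)
    theoretical_p10 = stats.norm.ppf(0.10, loc=3.61, scale=8.38)

    print(empirical_p10, theoretical_p10)

If the real sample were close to normal, the two numbers would agree up to sampling noise; a large gap, like the -4,71 versus -7,116 above, points to a non-normal sample.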

How to design a score or signature function based on the time series data

I want to design a score or signature function based on a time series signal. Usually, the signal has ups and downs.
For a given time window, I want to design the score function based on the number of times the signal fluctuates, the duration of the fluctuations, and the magnitude of the fluctuations. I am wondering what kind of math I can use to design the function. I am not sure whether statistical features (mean, median, and so on) would be enough to design a function unique enough that two time windows would be distinguishable.
Thanks!
Summary statistics will not give you what you want... but they can still be useful.
Things you can try:
Zero crossings on the signal will give you the number of fluctuations. You'll have to subtract some central tendency value to center the signal about the 0-line in order to do this. Alternatively, you can use an FFT on the original signal to find the harmonic frequency as part of the score.
You could define the duration of a fluctuation as the difference between zero crossings divided by two (since one fluctuation will reach the 0-line twice).
Magnitude can be done by finding the local minima and maxima - check out some packages with peak finding functions. You might want to use the mean or median to rule out local minima and maxima that fall on the wrong side of the line. Alternatively, finding the zero crossings on the derivative signal and then mapping them back to the original will give you all the local minima and maxima as well.
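Putting those pieces together, here is a small illustrative sketch in Python (the function and feature names are made up, and choices such as centering on the median are just one option):

    import numpy as np
    from scipy.signal import find_peaks

    def fluctuation_features(x, fs=1.0):
        """Toy feature extractor for one time window (illustrative only)."""
        centered = x - np.median(x)                    # move the signal about the 0-line
        crossings = np.where(np.diff(np.sign(centered)) != 0)[0]

        n_fluctuations = len(crossings)
        # duration ~ spacing between zero crossings divided by two, converted with fs
        durations = np.diff(crossings) / (2.0 * fs) if len(crossings) > 1 else np.array([])

        peaks, _ = find_peaks(centered)                # local maxima
        troughs, _ = find_peaks(-centered)             # local minima
        magnitudes = np.concatenate([centered[peaks], -centered[troughs]])

        return {
            "n_fluctuations": n_fluctuations,
            "mean_duration": durations.mean() if durations.size else 0.0,
            "mean_magnitude": magnitudes.mean() if magnitudes.size else 0.0,
        }

    # Example: a noisy 0.5 Hz sine sampled at roughly 100 Hz
    t = np.linspace(0, 10, 1000)
    x = np.sin(2 * np.pi * 0.5 * t) + 0.1 * np.random.default_rng(0).normal(size=t.size)
    print(fluctuation_features(x, fs=100.0))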

calculating reliability of measurements

I have many measurements of age of the same person. Let's say:
[23 25 32 23 25]
I would like to output a single value and a reliability score for this value. The single value can be the average.
I don't know exactly how to calculate the reliability. The value should be between 0 and 1, where 1 means all ages are equal and a very unreliable measurement should be near 0.
The variance should probably be used here, but it's not clear to me how to normalize it between 0 and 1 in a meaningful way (1/(x+1) is not very meaningful :)).
Assume some probability distribution (or determine what probability distribution your data fits most accurately). A good choice is a normal distribution, which for discrete data requires a continuity correction. See example here: http://www.milefoot.com/math/stat/pdfc-normaldisc.htm
In your example, the reliability score for the average age of 26 (25.6 rounded to the nearest integer) is simply the probability that X falls in the range (25.5, 26.5).
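A minimal sketch of that calculation in Python, assuming a normal distribution fitted to the five measurements (just one way to read the suggestion above):

    import numpy as np
    from scipy import stats

    ages = np.array([23, 25, 32, 23, 25])
    mu, sigma = ages.mean(), ages.std(ddof=1)    # 25.6 and ~3.71
    rounded = round(mu)                          # 26

    # Continuity correction: reliability = P(rounded - 0.5 < X < rounded + 0.5)
    reliability = stats.norm.cdf(rounded + 0.5, mu, sigma) - stats.norm.cdf(rounded - 0.5, mu, sigma)
    print(reliability)    # approaches 1 only when the measurements agree tightly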
The easiest way to assess reliability (or internal consistency) is to use Cronbach's alpha. I guess most statistics software has this method built in.
https://en.wikipedia.org/wiki/Cronbach%27s_alpha
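For completeness, a hand-rolled sketch of Cronbach's alpha (the ratings matrix below is hypothetical and assumes the usual subjects-by-items layout, which is not quite the single-person setup in the question):

    import numpy as np

    def cronbach_alpha(ratings):
        """Cronbach's alpha for a (subjects x items) matrix of scores."""
        ratings = np.asarray(ratings, dtype=float)
        k = ratings.shape[1]                            # number of items/raters
        item_vars = ratings.var(axis=0, ddof=1).sum()   # sum of per-item variances
        total_var = ratings.sum(axis=1).var(ddof=1)     # variance of the row totals
        return (k / (k - 1)) * (1 - item_vars / total_var)

    # Hypothetical example: 4 subjects each measured by 3 raters
    scores = [[23, 24, 23],
              [31, 30, 32],
              [45, 44, 46],
              [52, 50, 51]]
    print(cronbach_alpha(scores))    # close to 1 when the raters agree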

Verify transmit power to be within certain limits of its expected value over 95% of test measurements

I have a requirement to verify that the transmit power out of a device, as measured at its connector, is within 2 dB of its expected value over 95% of test measurements.
I am using a signal analyzer to analyze the transmitted power. I only get the average power value, min, max and stdDev of the measurements and not the individual power measurements.
Now, the question is how I would verify the "95% thing" using the average power, min, max, and stdDev. It seems that I can use a normal distribution to find the 95% confidence level.
I would appreciate if someone can help me on this.
Thanks in anticipation
The way I'm reading this, it seems you are a statistical beginner, so if I'm wrong there, the rest of this answer will probably be insultingly basic, and I'm sorry.
Anyway, the idea is that if a dataset is normally distributed, and all the observations are independent of one another, then 95% of the data points will fall within 1.96 standard deviations of the mean.
Do you get identical estimates of average power every time you measure, or are there some slight random differences from reading to reading? My guess is that it's the second. If you were to measure the power a whole bunch of times, and each time you plotted your average power value on a histogram, then that histogram of sample means would have the shape of a bell curve. This bell curve of sample means would have its own mean and standard deviation, and if you have thousands or millions of data points going into the calculation of each average power reading, it's not horrible to assume that it is a normal distribution. The explanation for this phenomenon is known as the 'central limit theorem', and I recommend both Khan Academy's presentation of it and the Wikipedia page on it.
On the other hand, if your average power is the mean of some small number of data points, for instance n = 5 or n = 30, then the assumption of a normal distribution of sample means can be pretty bad. In this case, your 95% confidence interval around the average power goes from qt(0.975, n-1)*SD/sqrt(n) below the average to qt(0.975, n-1)*SD/sqrt(n) above the average, where qt(0.975, n-1) is the 97.5th percentile of the t distribution with n-1 degrees of freedom, and SD is your measured standard deviation.
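A rough Python sketch of both checks, with made-up summary numbers standing in for the analyzer readings (the expected power, mean, SD, and n below are assumptions):

    import numpy as np
    from scipy import stats

    # Made-up summary statistics standing in for the analyzer output
    expected_dbm = 20.0    # expected transmit power
    mean_dbm = 20.3        # reported average power
    sd_dbm = 0.8           # reported standard deviation
    n = 1000               # number of measurements behind the average, if known

    # If individual measurements are roughly normal, ~95% of them fall within
    # 1.96 standard deviations of the mean; check that band against the +/- 2 dB limit.
    lo, hi = mean_dbm - 1.96 * sd_dbm, mean_dbm + 1.96 * sd_dbm
    print("95% of measurements in:", (lo, hi))
    print("within spec:", expected_dbm - 2 <= lo and hi <= expected_dbm + 2)

    # Separate question: a t-based confidence interval for the mean power itself
    half_width = stats.t.ppf(0.975, n - 1) * sd_dbm / np.sqrt(n)
    print("95% CI for the mean:", (mean_dbm - half_width, mean_dbm + half_width))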
