What is the formula for calculating these probabilities? [closed] - statistics

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
The probability that you randomly choose a red marble from a bag is 0.6. A random sample of 6 marbles is drawn from the bag. (The sample has an appropriate binomial distribution.)
What is the probability that exactly four of these marbles are red?
What is the probability that two or fewer of these marbles are red?

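The probability follows the binomial distribution: P(X = k) = C(n, k) * p^k * (1 - p)^(n - k), where n is the number of draws, k the number of red marbles, and p the probability of drawing a red marble.
So to calculate the probability of 4 red marbles out of 6 from scratch using Python, it will be:
import scipy.special
n = 6    # number of marbles drawn
k = 4    # number of red marbles
p = 0.6  # probability that a single marble is red
# binomial coefficient C(n, k) times p^k times (1-p)^(n-k)
scipy.special.binom(n, k)*(p**k)*((1-p)**(n-k))
0.31104000000000004
The binomial distribution from scipy.stats gives the same result: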
from scipy.stats import binom
binom.pmf(4,6,0.6)
0.31104
For two or fewer red marbles, it is the sum of the probabilities of 0, 1, and 2 red marbles:
sum(binom.pmf([0,1,2],6,0.6))
0.17920000000000008
Or, equivalently, you can use the cumulative distribution function (cdf):
binom.cdf(2,6,0.6)
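If scipy is not available, the same two numbers can be reproduced with the standard library alone. This is only a minimal sketch; the helper names binom_pmf and binom_cdf are made up for illustration and are not scipy functions.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k): sum of the pmf over 0..k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

exactly_four = binom_pmf(4, 6, 0.6)   # exactly four red marbles
two_or_fewer = binom_cdf(2, 6, 0.6)   # zero, one, or two red marbles

print(round(exactly_four, 5))  # 0.31104
print(round(two_or_fewer, 4))  # 0.1792
```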

Related

How many ways to normalize data? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
I am curious about how many ways there are to normalize data in the preprocessing step, before using it to train a machine learning model, a deep learning model, and so on.
All I know is
Z-score normalization = (data - mean)/(standard deviation)
Min-Max normalization = (data - min)/(max - min)
Are there other ways besides these two?
There are many ways to normalize data prior to training a model. Which one fits depends on the task, the data type (tabular, image, signals), and the data distribution. You can find the most important ones in scikit-learn's preprocessing subpackage (sklearn.preprocessing).
To highlight a few that I use consistently: the Box-Cox and Yeo-Johnson transformations are used when a feature's distribution is skewed. They reduce the skewness by estimating the transformation parameter through maximum likelihood.
Another normalization technique is the Robust Scaler, which centers on the median and scales by the interquartile range. It can perform better than Z-score normalization if your dataset contains many outliers, since outliers can distort the sample mean and variance.
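To make the difference between the scalers concrete, here is a rough standard-library sketch of z-score, min-max, and robust scaling applied to a small made-up sample with one outlier. The function names and the sample are illustrative assumptions, and the quartiles come from statistics.quantiles, so the numbers may differ slightly from what scikit-learn's scalers produce.

```python
import statistics

def z_score(data):
    """Center on the mean, scale by the (population) standard deviation."""
    mean = statistics.mean(data)
    std = statistics.pstdev(data)
    return [(x - mean) / std for x in data]

def min_max(data):
    """Map the smallest value to 0 and the largest to 1."""
    lo, hi = min(data), max(data)
    return [(x - lo) / (hi - lo) for x in data]

def robust_scale(data):
    """Center on the median, scale by the interquartile range."""
    median = statistics.median(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    return [(x - median) / (q3 - q1) for x in data]

# Hypothetical sample: five ordinary values and one outlier.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]

zs = z_score(sample)
mm = min_max(sample)
rs = robust_scale(sample)

print([round(v, 3) for v in zs])
print([round(v, 3) for v in mm])
print([round(v, 3) for v in rs])
```

With min-max scaling the outlier squeezes the ordinary values into a narrow band near 0, while the robust version keeps them spread out because the median and quartiles barely move.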

Model weights means in Machine Learning [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I'm currently learning machine learning. I am confused by the term "model weights". Please explain what a model weight really means.
Weights are the numbers you use to turn your samples into a prediction. In many (most?) cases this is what you are learning with your system. For example, suppose you want to predict house price using only the house size (x). You might use a simple linear regression model that tries to fit a line to the data. The formula you will use is the formula for a line:
y = w * x + b
Here x is given (the house size) and you use w and b to predict y, the price. In this case w and b are your weights. The goal is to determine which w and b give the best fit to the data.
In more complex models like neural networks (or even more complicated linear regression) you may have dramatically more weights in your model, but the basic idea of finding the weights that best fit the data is the same.
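As a small illustration of "learning the weights", the line y = w * x + b can be fitted by ordinary least squares in a few lines. This is a sketch only: the function name fit_line and the house data are made up, and the prices are generated to lie exactly on a line so the recovered weights are easy to check.

```python
def fit_line(xs, ys):
    """Least-squares estimates of the weights w and b in y = w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x          # slope
    b = mean_y - w * mean_x     # intercept
    return w, b

# Hypothetical data: prices lie exactly on price = 2 * size + 10.
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [2.0 * s + 10.0 for s in sizes]

w, b = fit_line(sizes, prices)
print(round(w, 6), round(b, 6))

# Once the weights are learned, predicting is just applying the formula.
predicted_price = w * 90.0 + b
```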

Hamming distance [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
My work is in genetics and I'm using the Hamming distance (in Matlab) to calculate the genetic distance between genotypes of a virus.
For example: Type 1 has structure 01234 and Type 2 has structure 21304 etc. Obviously there are many genotypes present. Because the genotypes have the same length, I thought using the Hamming distance would be fine.
My question is this: how can I order the genotypes based on the Hamming distance? Put another way: how can I sort the genotypes into clusters based on the Hamming distance between them?
Thanks
You can use several methods to cluster such data.
Based on the distance matrix, you can use UPGMA or neighbor joining.
Single linkage and complete linkage are also distance-based clustering methods.
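To show what single linkage on Hamming distances looks like, here is a small sketch. It is written in Python rather than Matlab, the function names and the distance threshold are assumptions, and only the first two genotypes come from the question; the other two are made up so that two clusters appear.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def single_linkage(items, threshold):
    """Repeatedly merge the two closest clusters until the closest pair
    is farther apart than `threshold` (single linkage: cluster distance
    is the minimum distance between any two members)."""
    clusters = [[item] for item in items]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(hamming(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

# "01234" and "21304" are from the question; the other two are hypothetical.
genotypes = ["01234", "21304", "01235", "21300"]

print(hamming("01234", "21304"))  # 3
clusters = single_linkage(genotypes, threshold=1)
print(clusters)  # [['01234', '01235'], ['21304', '21300']]
```

Replacing min with max inside the loop would give complete linkage instead.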

what is the advantage to use Spline to represent curve? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
I often hear about curves being modeled using splines. What is the advantage of using a spline?
Spline data consists of control points and weights that are related to each other (a point on a spline depends on the coordinates and weights of several neighboring control points). Curve data would either be a large set of closely spaced points that approximate the curve (expensive to store, whereas spline data is sparse), or an equation that might take a lot of horsepower to solve for y from a given x. Splines can be cheaply computed and subdivided or interpolated to achieve the desired precision, but a curve of explicit points loses precision because it has no weight information. Splines are also really useful in vector art (think Flash or Adobe Illustrator) and 3D graphics, because you can intuitively drag a few control points around to get exactly the curve you want instead of having to move a ton of individual curve points.
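As a rough illustration of "few control points, any precision you want", here is a sketch that evaluates a Bezier curve (one common spline building block) with de Casteljau's algorithm. It assumes unweighted control points, so it does not cover weighted splines such as NURBS, and the control points are made up.

```python
def de_casteljau(control_points, t):
    """Point on the Bezier curve at parameter t in [0, 1], found by
    repeatedly interpolating between neighboring control points."""
    points = [tuple(p) for p in control_points]
    while len(points) > 1:
        points = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])
        ]
    return points[0]

# Four hypothetical control points describe the whole cubic curve.
control = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]

# Sample as coarsely or as finely as needed from the same four points.
curve = [de_casteljau(control, i / 10) for i in range(11)]
midpoint = de_casteljau(control, 0.5)
print(midpoint)  # (2.0, 1.5)
```

Storing the curve means storing four points; the eleven samples (or eleven thousand) are computed on demand.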

How to solve my simple geometric task? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 8 years ago.
It seems I completely forgot geometry :-( It looks simple. I need it for my Flash game.
I drew it in the Carmetal program:
I need coordinates of C(x,y). Please help me to find a solution.
You can stick with simple trig...
Here, the blue line length is (By - Ay). So the angle at B is acos((By - Ay) / AB). Subtracting that angle from the angle ABC, you find the angle at B in the larger triangle, whose hypotenuse is BC. Knowing the length BC and that angle, you can calculate the length of the brown line with
l1 = BC*sin(small_angle)
Similarly, the length of the blue and red lines together is
l2 = BC*cos(small_angle)
And C is (Bx + l1, By - l2).
