Simple Probability - statistics

Five U.S. Senators are selected at random from the 100 U.S. Senators. There are 50 states, each with two senators. What is the probability that at least two of the five selected are from the same state?

The probability that the second senator selected is not from the same state as the first is 98/99.
The probability that the third senator selected is not from the same state as either of the first two is 96/98.
The probability that the fourth senator selected is not from the same state as any of the first three is 94/97.
The probability that the fifth senator selected is not from the same state as any of the first four is 92/96.
Therefore the probability that all five senators are from different states is 98*96*94*92/(99*98*97*96) =~ 0.90. In all other cases at least two of the five are from the same state, which has probability ~0.10.
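The sequential argument above can be checked with exact fractions. This is a minimal sketch; the function names are made up for illustration, and the second function cross-checks the result by counting committees directly (choose 5 of the 50 states, then one of the two senators from each).

```python
from fractions import Fraction
from math import comb

def p_all_different_states(picks=5, states=50, per_state=2):
    """Sequential argument: multiply the chance that each new pick avoids earlier states."""
    total = states * per_state
    p = Fraction(1)
    for k in range(picks):
        # After k picks from k distinct states, per_state * k senators are ruled out
        # and total - k senators remain to choose from.
        p *= Fraction(total - per_state * k, total - k)
    return p

def p_all_different_by_counting(picks=5, states=50, per_state=2):
    """Counting argument: favourable committees divided by all committees."""
    total = states * per_state
    return Fraction(comb(states, picks) * per_state**picks, comb(total, picks))

p_distinct = p_all_different_states()
p_shared = 1 - p_distinct
print(float(p_distinct), float(p_shared))
```

Both routes give the same fraction, 8648/9603, so the probability that at least two of the five share a state is a little under 0.10.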

Related

How to choose between geometric and negative binomial distributions

A sample question from an actuarial science sample exam goes like this:
"Calculate the probability that there will be at least four months in which no accidents occur before the fourth month in which at least one accident occurs.
A company takes out an insurance policy to cover accidents that occur at its manufacturing plant. The probability that one or more accidents will occur during any given month is 3/5.
The number of accidents that occur in any given month is independent of the number of accidents that occur in all other months."
I interpreted this as what is the probability (P) of no accidents during any of at least 3 months before one or more accidents occur in the following month.
I assumed a geometric distribution and calculated two different ways, got the same answer both times:
Given: "event": "one or more accidents in a month"
p(event) = 3/5; q(non event) = 1-p = 2/5
One event occurs after 3 or more months of no events: $P = q^3 p \sum_{k=0}^{\infty} q^k = q^3 p \cdot \frac{1}{1-q} = q^3 = (2/5)^3 = 0.064$
P = 1 - Prob(one or more accidents occur in one or more of the first three months). Same answer: 0.064.
But 0.064 is not among the answer choices.
The exam offers its solution as using the negative binomial distribution as follows:
"Solution: D
If a month with one or more accidents is regarded as success and k = the number of failures before the fourth success, then k follows a negative binomial distribution and the requested probability is:
Alternatively the solution is
which can be derived directly or by regarding the problem as a negative binomial distribution with
success taken as a month with no accidents
k = the number of failures before the fourth success, and calculating"
So my question is: how do I infer that the correct probability distribution to consider is the negative binomial? In my reading of the question, it is the first "success", not the fourth "success", that occurs after three failures, hence a geometric distribution (or, equivalently, an NB(1, p) distribution).
What am I missing?
Thanks in advance.
I think they are asking for the probability of an event defined relative to the Rth success; here R = 4, because the question says "before the fourth month in which at least one accident occurs". The negative binomial distribution models the number of failures before the Rth success, whereas the geometric distribution only covers the case of the first success (R = 1).
I hope my explanation is understandable. I only just stumbled upon this myself.
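To make the difference between the two readings concrete, here is a small sketch that evaluates the negative binomial tail directly. The function name is hypothetical, and it assumes the parameterisation from the quoted solution: "success" is a month with at least one accident (p = 3/5), and K counts failures before the r-th success.

```python
from math import comb

def prob_at_least_failures_before_rth_success(min_failures, r, p_success):
    """P(K >= min_failures), where K = number of failures before the r-th success."""
    q = 1 - p_success
    # P(K = k) = C(r - 1 + k, k) * p^r * q^k for a negative binomial count of failures.
    below = sum(comb(r - 1 + k, k) * p_success**r * q**k for k in range(min_failures))
    return 1 - below

# Negative binomial reading: at least 4 accident-free months before the 4th accident month.
nb_reading = prob_at_least_failures_before_rth_success(4, 4, 3 / 5)

# Geometric reading from the question: at least 3 accident-free months before the 1st accident month.
geometric_reading = prob_at_least_failures_before_rth_success(3, 1, 3 / 5)

print(round(nb_reading, 4), round(geometric_reading, 4))
```

With r = 1 the same function reproduces the 0.064 from the question, while r = 4 with at least four failures gives about 0.29, so the two readings really are different events.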

Draw coins (without replacement) at random

I am running into a problem that asks me to calculate a probability and a standard deviation. The problem is as follows:
Coins with values 1 through N (inclusive) are placed into a bag. All the coins from the bag are iteratively drawn (without replacement) at random. For the first coin, you are paid the value of the coin. For subsequent coins, you are paid the absolute difference between the drawn coin and the previously drawn coin. For example, if you drew 5,3,2,4,1, your payments would be 5,2,1,2,3 for a total payment of 13.
What is the standard deviation of your total payment for N=20?
What is the probability that your total payment is greater than or equal to 160 for N=20?
Can I get your help with these two questions?
Thanks!!
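One way to get a feel for both quantities is a Monte Carlo estimate. The sketch below is only that: the function names, the number of trials and the seed are arbitrary choices for illustration, and the results are estimates rather than exact answers.

```python
import random
import statistics

def total_payment(draws):
    """First coin pays its value; each later coin pays |current - previous|."""
    total = draws[0]
    for prev, cur in zip(draws, draws[1:]):
        total += abs(cur - prev)
    return total

def simulate(n=20, threshold=160, trials=100_000, seed=0):
    """Estimate mean, standard deviation and P(total >= threshold) by random shuffles."""
    rng = random.Random(seed)
    coins = list(range(1, n + 1))
    totals = []
    for _ in range(trials):
        rng.shuffle(coins)  # a uniformly random draw order without replacement
        totals.append(total_payment(coins))
    mean = statistics.fmean(totals)
    sd = statistics.pstdev(totals)
    prob = sum(t >= threshold for t in totals) / trials
    return mean, sd, prob

if __name__ == "__main__":
    mean, sd, prob = simulate()
    print(f"mean ~ {mean:.2f}, sd ~ {sd:.2f}, P(total >= 160) ~ {prob:.4f}")
```

As a sanity check, the expected total can be worked out by hand: the first coin averages (N+1)/2 = 10.5, and each of the 19 later payments averages (N+1)/3 = 7, giving 143.5, so the simulated mean should land close to that.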

Elegant and functional solution to linear optimization constraint?

Assume you have a linear optimization problem where you are trying to determine which attendees of a corporate event qualify to go through to the VIP event. You are trying to maximize a pre-defined utility, where each attendee has a particular utility, subject to a number of constraints which include:
No more than 4 people from any company at the VIP event
At least 4 companies represented at the VIP event.
Assume you have 400 attendees and only 10 can attend the VIP event, and there are employees from at least 50 different companies to choose from.
I have set up my problem in Excel, where I have a row for each attendee, and a binary column for my linear optimization program’s ‘changeable cells’, where 1 is selected if the attendee is chosen for the VIP event and 0 otherwise.
What code can I then write to satisfy the above constraints?
What I have tried so far…
Currently my only solution for dealing with the first constraint listed above is to have 50 additional binary columns (one for each company) where if an attendee is going to the VIP event and they are from the company represented in that particular column, it will list a ‘1’, and ‘0’ otherwise. Then have another 50 cells that sum each column and then set a constraint that says those cells must be less than or equal to 4.
I feel there must be a more elegant and efficient way of doing this however.
I also cannot currently think of a way to write the code to satisfy the second constraint. I have tried having a separate column that displays company names when the changeable cell equals 1, then counting the number of unique values in that column, and then applying a constraint to that cell such that it must be greater than or equal to 4, but apparently that is a “non-linear” constraint.
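One standard way to keep the "at least 4 companies" requirement linear is to add one extra binary variable per company that is only allowed to be 1 when somebody from that company is selected. The formulation below is a sketch under assumed notation that is not in the question: x_i is the attendee's changeable cell, u_i the attendee's utility, A_c the set of attendees from company c, and y_c the new "company c is represented" indicator.

```latex
\begin{aligned}
\max_{x,\,y}\quad & \sum_{i=1}^{400} u_i x_i \\
\text{s.t.}\quad & \sum_{i=1}^{400} x_i \le 10, \\
& \sum_{i \in A_c} x_i \le 4 && \text{for every company } c, \\
& y_c \le \sum_{i \in A_c} x_i && \text{for every company } c, \\
& \sum_{c} y_c \ge 4, \\
& x_i \in \{0,1\}, \quad y_c \in \{0,1\}.
\end{aligned}
```

Because y_c can only be 1 when the company's total is at least 1, requiring the y_c to sum to 4 or more forces at least four distinct companies, and every constraint is a linear function of the changeable cells. In the spreadsheet, the per-company totals could probably be computed with one SUMIF per company over the company column and the binary column, instead of 50 separate indicator columns, though I have not tested that against your workbook.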

Simulation in Excel using probability

I am trying to create a spreadsheet that can find the most likely probability that a student scored a specific grade on a test.
Only one student can score a grade and only one grade can have a student.
I have limited information about each student.
There are 5 students (1,2,3,4,5)
and the grades possible are only (100,90,80,70,60)
In the spreadsheet a 1 denotes that the student DIDN'T score that grade.
Does anyone know how to make a simulation that I can find the most likely probability of what student scored what grade?
Link:
https://docs.google.com/spreadsheets/d/1a8uUIRzUKsY3DolTM1A0ISqMd-42WCUCiDsxmUT5TKI/edit?usp=sharing
Based on your response in comments, each student has an equal likelihood of getting each grade. No simulation is necessary.
If you want to simulate it anyway, don't use Excel*. Create a vector of students, and pair it with a shuffled vector of the grades. Lather, rinse, repeat as many times as you want to see that the student-to-grade matching is uniformly distributed.
* - To get an idea of how bad Excel can be for random variate generation, enable the Analysis Toolpak, go to "Data -> Data Analysis" on the ribbon, and select "Random Number Generation". Fill in the fields for 10 variables, 2000 random numbers, and a "Normal" distribution, leave the mean and std dev at 0 and 1, and enter a "Random Seed" value of 123. You will find that the resulting table contains 3 instances of the value "-9.35764". Values that extreme should occur about once per twenty thousand years if you generate a billion a second. Getting three of them is so extreme that it should happen once per 10^30 times the current estimated age of the universe. Conclude that a) it's your lucky day, or b) Excel sucks at random numbers, and despite being informed about this as far back as 1998 Microsoft hasn't bothered to fix it.
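The shuffle-and-pair idea described above might look like this in Python. It is a minimal sketch: the function name, trial count and seed are illustrative choices, and it assumes (as stated above) that every student is equally likely to get every grade.

```python
import random
from collections import Counter

STUDENTS = [1, 2, 3, 4, 5]
GRADES = [100, 90, 80, 70, 60]

def simulate_matchings(trials=100_000, seed=0):
    """Pair the students with a freshly shuffled grade list and tally each (student, grade) pair."""
    rng = random.Random(seed)
    grades = GRADES[:]
    counts = Counter()
    for _ in range(trials):
        rng.shuffle(grades)  # one random one-to-one assignment of grades to students
        counts.update(zip(STUDENTS, grades))
    # Convert tallies to relative frequencies.
    return {pair: n / trials for pair, n in counts.items()}

if __name__ == "__main__":
    freqs = simulate_matchings()
    for (student, grade), f in sorted(freqs.items()):
        print(student, grade, round(f, 3))
```

Every one of the 25 student/grade pairs should come out near 1/5, which is the "no simulation is necessary" result.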

How to obtain Incremental standard deviations from a set of standard deviations?

I have a data set containing three columns, first column represents number of trials, second column represents experimental values, and the third column represents corresponding standard deviation.
With each experiment there is an increment in my experimental values. To get the incremental values, I hold my first value as the reference value and subtract this reference value from each subsequent value and use them to create fourth column of these incremental values.
This is where my problem begins. How do I create a new set of incremental standard deviations for the incremental experimental values I got? My apologies if the problem is not well defined, but hopefully someone will be able to help me out. Many thanks!
Below is my data set,
Trial   Mean     SD      Incr Mean   Incr SD
1       45.311   4.668    0
2       56.682   2.234   11.371
3       62.197   2.266   16.886
4       70.550   4.751   25.239
5       80.528   4.412   35.217
6       87.453   4.542   42.142
7       89.979   2.185   44.668
8       96.859   3.476   51.548
To be clear, for other readers, your incremental mean is actually the difference between trial 1 and the other trials.
Variances add directly when you subtract (or add) independent random variables. So you first convert each standard deviation to a variance by squaring it, then add the variances, and then take the square root to turn the result back into a standard deviation. Note that when using this kind of Pythagorean combination, you are assuming that trial 1 is independent of the other trials, so, for example, you cannot have the same sample in both trials.
It makes sense that your so-called "incremental SD" will always be greater than either individual SD, since the uncertainty of both distributions contributes to the uncertainty of the difference.
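Applied to the SD column from the question, the square, add and square-root recipe looks like this. It is a sketch that assumes, as noted above, that trial 1 is independent of every other trial; the function name is made up for illustration.

```python
from math import sqrt

# SD column from the question, trials 1 through 8.
SDS = [4.668, 2.234, 2.266, 4.751, 4.412, 4.542, 2.185, 3.476]

def incremental_sds(sds):
    """SD of (trial k - trial 1) for k = 2..n, assuming the trials are independent."""
    ref_var = sds[0] ** 2  # variance of the reference trial
    return [sqrt(ref_var + sd**2) for sd in sds[1:]]

for trial, inc_sd in enumerate(incremental_sds(SDS), start=2):
    print(trial, round(inc_sd, 3))
```

Trial 1 is left out on purpose: its increment is the reference minus itself, which is exactly 0 and is not a difference of two independent quantities.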
