What is nbinom.rvs returning? - python-3.x

I'm trying to understand what scipy.stats.nbinom.rvs is returning. Here is a sample of code:
from scipy.stats import nbinom
for i in range(10):
    x = nbinom.rvs(n=20, p=0.5, size=1)
    print(str(i) + ": " + str(x[0]))
I thought this was basically saying: how many trials did it take to get 20 successes when flipping a coin (p=0.5)? But a sample of my output shows that some returned values are well below 20. Since it's impossible to get 20 successes in 8 flips, I clearly don't understand the return value. Help please.
Sample output:
0: 19
1: 25
2: 14
3: 24
4: 30
5: 8
6: 28
7: 21
8: 14
9: 30
I've looked at the docs online, but just seeing "random variates" isn't very helpful.

From the docstring of scipy.stats.nbinom:
The probability mass function of the number of failures for `nbinom` is:
.. math::
f(k) = \binom{k+n-1}{n-1} p^n (1-p)^k
for :math:`k \ge 0`.
`nbinom` takes :math:`n` and :math:`p` as shape parameters where n is the
number of successes, whereas p is the probability of a single success.
So the values that you see are the number of "failures" that occur before achieving n "successes".
There is a note on the wikipedia page for the negative binomial distribution that is worth repeating here:
Different texts adopt slightly different definitions for the negative binomial distribution. They can be distinguished by whether the support starts at k = 0 or at k = r, whether p denotes the probability of a success or of a failure, and whether r represents success or failure, so it is crucial to identify the specific parametrization used in any given text.
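To make the "failures before n successes" reading concrete, here is a small standard-library simulation. It is only a sketch of the definition, not how scipy draws its variates, and failures_before_n_successes is a hypothetical helper name introduced for illustration.

```python
import random

def failures_before_n_successes(n, p, rng):
    """Flip a p-coin until n successes occur; return the number of failures."""
    successes = failures = 0
    while successes < n:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

rng = random.Random(0)  # seeded only so the run is repeatable
n, p = 20, 0.5
samples = [failures_before_n_successes(n, p, rng) for _ in range(5000)]

# A failure count can be any value >= 0, so values below 20 are expected.
# The number of trials is failures + n, which can never be below n.
trials = [k + n for k in samples]
mean_failures = sum(samples) / len(samples)  # theory: n * (1 - p) / p = 20
print(min(trials) >= n, round(mean_failures, 1))
```

Under this parametrization, adding n to each value returned by nbinom.rvs gives the "number of trials" the question was expecting.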

Related

Program does not run faster as expected when checking much less numbers for finding primes

I made a program to find primes below a given number.
number = int(input("Enter number: "))
prime_numbers = [2]  # First prime is needed.
for number_to_be_checked in range(3, number + 1):
    square_root = number_to_be_checked ** 0.5
    for checker in prime_numbers:  # Checker will become
        # every prime number below the 'number_to_be_checked'
        # variable because we are adding all the prime numbers
        # in the 'prime_numbers' list.
        if checker > square_root:
            prime_numbers.append(number_to_be_checked)
            break
        elif number_to_be_checked % checker == 0:
            break
print(prime_numbers)
This program checks every number up to the number given as the input. But every prime greater than 3 is of the form 6k ± 1. Therefore, instead of checking all the numbers, I defined a generator that yields all the numbers of the form 6k ± 1 below the input. (I also added 3 to the prime_numbers list when initializing it, as 2 and 3 cannot be of the form 6k ± 1.)
def potential_primes(number: int) -> int:
    """Generate the numbers potential to be prime"""
    # Prime numbers are always of the form 6k ± 1.
    number_for_function = number // 6
    for k in range(1, number_for_function + 1):
        yield 6*k - 1
        yield 6*k + 1
Obviously, the program should have been much faster because I am checking far fewer numbers. But, counterintuitively, the program is slower than before. What could be the reason behind this?
In every six numbers, three are even and one is a multiple of 3. The other two are 6-coprime, so are potentially prime:
6k+0  6k+1  6k+2  6k+3  6k+4  6k+5
even        even        even
3x                3x
For the three evens your primality check uses only one division (by 2) and for the 4th number, two divisions. In all, five divisions that you seek to avoid.
But each call to a generator has its cost too. If you just replace the call to range with a call to your generator and leave the rest of the code as is, you do not realize the full savings potential.
Why? Because in that case, while you indeed test only 1/3 of the numbers now, you still test each of them by 2 and 3, needlessly. And apparently the cost of using the generator is too high.
The point to this technique known as wheel factorization is to not test the 6-coprime (in this case) numbers by the primes which are already known to not be their divisors, by construction.
Thus, you must start with e.g. prime_numbers = [5,7] and use it in your divisibility testing loop, not all primes, which start with 2 and 3, which you do not need.
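A minimal sketch of that idea follows. The names candidates and primes_up_to are hypothetical, and the generator is bounded by the limit itself rather than by number // 6, which is an assumption about what the original generator was meant to do. The divisor list holds only primes from 5 upward, so no candidate is ever tested by 2 or 3.

```python
def candidates(limit):
    """Yield the numbers of the form 6k - 1 and 6k + 1 that are <= limit."""
    k = 1
    while 6 * k - 1 <= limit:
        yield 6 * k - 1
        if 6 * k + 1 <= limit:
            yield 6 * k + 1
        k += 1

def primes_up_to(limit):
    wheel_primes = []  # primes >= 5 only; 2 and 3 are never used as divisors
    for n in candidates(limit):
        for p in wheel_primes:
            if p * p > n:
                wheel_primes.append(n)
                break
            if n % p == 0:
                break
        else:
            # No divisors available yet: only happens for n = 5 (empty list).
            wheel_primes.append(n)
    return [q for q in (2, 3) if q <= limit] + wheel_primes

print(primes_up_to(50))
```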
Using a nested for loop along with a square root will be heavy on computation. Rather, look at a prime sieve algorithm (such as the sieve of Eratosthenes), which is much faster but does take some memory.
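For reference, a basic sieve of Eratosthenes might look like the sketch below. The function name sieve is hypothetical; the memory cost mentioned above is the boolean list of length limit + 1.

```python
def sieve(limit):
    """Return all primes <= limit using the sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Smaller multiples of p were already crossed out by smaller primes.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]

print(sieve(30))
```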
One way to use the 6n±1 idea is to alternate step sizes in the main loop by stepping 2 then 4. My Python is not good, so this is pseudocode:
function listPrimes(n)
    // Deal with low numbers.
    if (n < 2) return []
    if (n = 2) return [2]
    if (n = 3) return [2, 3]
    // Main loop
    primeList ← [2, 3]
    index ← 5 // We have checked 2 and 3 already.
    step ← 2 // Starting step value: 5 + 2 = 7.
    while (index <= n) { // isPrime only needs trial divisors up to sqrt(index).
        if (isPrime(index)) {
            primeList.add(index)
        }
        index ← index + step
        step ← 6 - step // Alternate steps of 2 and 4
    }
    return primeList
end function
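One possible Python rendering of this pseudocode is sketched below. The pseudocode does not define isPrime, so the trial-division is_prime here is an assumption, and the main loop runs up to n so that every prime up to n is listed.

```python
def is_prime(n):
    """Trial division by 2, 3, then 6k - 1 and 6k + 1 up to sqrt(n)."""
    if n < 2:
        return False
    if n < 4:
        return True
    if n % 2 == 0 or n % 3 == 0:
        return False
    d, step = 5, 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += step
        step = 6 - step
    return True

def list_primes(n):
    primes = [q for q in (2, 3) if q <= n]
    index, step = 5, 2
    while index <= n:
        if is_prime(index):
            primes.append(index)
        index += step
        step = 6 - step  # alternate steps of 2 and 4: 5, 7, 11, 13, 17, ...
    return primes

print(list_primes(30))
```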

Given a String of Buckets (alphabets). Find the cost (possibly minimal) to bring all the buckets at the base

Bob is a construction worker who does mathematics for increasing his efficiency. He is working on a site and has n buckets of cement-lined up with different characters (a – z) marked upon them. He has a strict command from the senior that he cannot change the order of the buckets.
Before starting his work, he has been given a string s of length n in which the character at position i (1 <= i <= n) gives us the mark on the i'th bucket. He can only carry one bucket at a time and bring it back to the base site. In each round, he has a criterion on which bucket to pick up. He will take the bucket with the smallest character marked upon it (a<b<z) and if there are multiple buckets with the smallest character, then he will take the one closest to the base.
The cost of picking up a bucket B is the number of buckets he passes through while walking from the site to get B (the bucket which he picks is also included). In each round, the cost accumulates. Find the final cost incurred by Bob while completing his job.
Constraints
1 < t,m < 10^5
The sum of n over all test cases does not exceed 10^6
SAMPLE INPUT
2
badce
SAMPLE OUTPUT
7
Explanation
badce - Firstly Bob takes the second basket with mark 'a' and adds 2 to the cost.
bdce - Then he takes the first basket with the mark 'b' and adds 1 to the cost.
dce - Then he takes the second basket with the mark 'c' and adds 2 to the cost.
de - Again he takes the first basket with the mark 'd' and adds 1 to the cost.
e - Again he takes the first basket with the mark 'e' and adds 1 to the cost.
The total cost becomes 7 units.
I have tried to code this in Python, but it gives TLE (time limit exceeded) for some cases.
Here is my approach:
n = int(input())
s = input()
count_array = [0] * 26
for i in range(n):
    count_array[ord(s[i])-97] += 1
alphabets = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
ans = 0
for i in range(26):
    while count_array[i] > 0:
        idx = s.index(alphabets[i])
        ans += idx + 1
        if idx > -1: s = s[0:idx] + s[idx+1:]
        count_array[i] -= 1
print(ans)
I am looking for an optimized approach that takes O(nlogn) or O(n) time complexity. Thank You.
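This runs in O(n), with a constant factor of 26 for the alphabet. For every character, count how many earlier characters are strictly larger: those buckets are transported later, so Bob still has to pass them when he fetches this one.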
def get_cost(s):
    result = 0
    seen = [0] * 26
    for c in s:
        idx = ord(c) - ord('a')
        result += 1 + sum(seen[idx+1:])
        seen[idx] += 1
    return result
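As a sanity check, the counting approach can be compared against a literal simulation of the rounds. The sketch below repeats get_cost so that it runs on its own; brute_force_cost is a hypothetical helper that is quadratic and meant for checking only.

```python
def get_cost(s):
    result = 0
    seen = [0] * 26
    for c in s:
        idx = ord(c) - ord('a')
        result += 1 + sum(seen[idx + 1:])
        seen[idx] += 1
    return result

def brute_force_cost(s):
    """Simulate the rounds literally: O(n^2), for checking only."""
    buckets = list(s)
    cost = 0
    while buckets:
        pos = buckets.index(min(buckets))  # smallest mark, closest to the base
        cost += pos + 1
        del buckets[pos]
    return cost

print(get_cost("badce"), brute_force_cost("badce"))  # 7 7
```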

problem with rounding in calculating minimum amount of coins in change (python)

I have a homework assignment in which I have to write a program that outputs the change to be given by a vending machine using the lowest number of coins. E.g. £3.67 can be dispensed as 1x£2 + 1x£1 + 1x50p + 1x10p + 1x5p + 1x2p.
However, I'm not getting the right answers and suspect that this might be due to a rounding problem.
change=float(input("Input change"))
twocount=0
onecount=0
halfcount=0
pttwocount=0
ptonecount=0
while change!=0:
    if change-2>=0:
        change=change-2
        twocount+=1
    else:
        if change-1>=0:
            change=change-1
            onecount+=1
        else:
            if change-0.5>=0:
                change=change-0.5
                halfcount+=1
            else:
                if change-0.2>=0:
                    change=change-0.2
                    pttwocount+=1
                else:
                    if change-0.1>=0:
                        change=change-0.1
                        ptonecount+=1
                    else:
                        break
print(twocount,onecount,halfcount,pttwocount,ptonecount)
RESULTS:
Input: 2.3
Output: 10010
i.e. 2.2
Input: 3.4
Output: 11011
i.e. 3.3
Some actually work:
Input: 3.2
Output: 11010
i.e. 3.2
Input: 1.1
Output: 01001
i.e. 1.1
Floating point accuracy
Your approach is correct, but as you guessed, the rounding errors are causing trouble. This can be debugged by simply printing the change variable and information about which branch your code took on each iteration of the loop:
initial value: 3.4
taking a 2... new value: 1.4
taking a 1... new value: 0.3999999999999999 <-- uh oh
taking a 0.2... new value: 0.1999999999999999
taking a 0.1... new value: 0.0999999999999999
1 1 0 1 1
If you wish to keep floats for input and output, multiply by 100 on the way in (converting to an integer with int(round(change * 100))) and divide by 100 on the way out of your function, allowing you to operate on integers.
Additionally, without the 5p, 2p and 1p values, you'll be restricted in the precision you can handle, so don't forget to add those. Scaling every amount in your code by 100 gives:
initial value: 340
taking a 200... new value: 140
taking a 100... new value: 40
taking a 20... new value: 20
taking a 20... new value: 0
1 1 0 2 0
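The conversion to integer pence can be sketched as follows; to_pence is a hypothetical helper name. Rounding before the int() cast matters because a product such as 3.67 * 100 may land slightly below the exact value.

```python
def to_pence(pounds):
    """Convert a float amount in pounds to an integer number of pence."""
    return int(round(pounds * 100))

print(1.4 - 1 == 0.4)  # False: the subtraction is not exact in binary floating point
print(to_pence(3.4))   # 340
print(to_pence(3.67))  # 367
```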
Avoid deeply nested conditionals
Beyond the decimal issue, the nested conditionals make your logic very difficult to reason about. This is a common code smell; the more you can eliminate branching, the better. If you find yourself going beyond about 3 levels deep, stop and think about how to simplify.
Additionally, with a lot of branching and hand-typed code, it's very likely that a subtle bug or typo will go unnoticed or that a denomination will be left out.
Use data structures
Consider using dictionaries and lists in place of blocks like:
twocount=0
onecount=0
halfcount=0
pttwocount=0
ptonecount=0
which can be elegantly and extensibly represented as:
denominations = [200, 100, 50, 20, 10, 5, 2, 1]
used = {x: 0 for x in denominations}
In terms of efficiency, you can use math to handle amounts for each denomination in one fell swoop. Divide the remaining amount by each available denomination in descending order to determine how many of each coin will be chosen and subtract accordingly. For each denomination, we can now write a simple loop and eliminate branching completely:
for val in denominations:
    used[val] += amount // val
    amount -= val * used[val]
and print or show a final result of used like:
278 => {200: 1, 100: 0, 50: 1, 20: 1, 10: 0, 5: 1, 2: 1, 1: 1}
The end result of this is that we've reduced 27 lines down to 5 while improving efficiency, maintainability and dynamism.
By the way, if the denominations were a different currency, it's not guaranteed that this greedy approach will work. For example, if our available denominations are 25, 20 and 1 cents and we want to make change for 63 cents, the optimal solution is 6 coins (3x 20 and 3x 1). But the greedy algorithm produces 15 (2x 25 and 13x 1). Once you're comfortable with the greedy approach, research and try solving the problem using a non-greedy approach.
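To see the difference on that example, here is a sketch comparing the greedy count with a simple dynamic-programming count. Both function names are hypothetical, and dynamic programming is just one common non-greedy approach, chosen here for brevity.

```python
def greedy_coins(amount, denominations):
    """Number of coins the greedy strategy uses (amounts in integer units)."""
    count = 0
    for val in sorted(denominations, reverse=True):
        count += amount // val
        amount %= val
    return count

def min_coins(amount, denominations):
    """Fewest coins via dynamic programming; None if no exact change exists."""
    INF = float("inf")
    best = [0] + [INF] * amount  # best[t] = fewest coins that sum to t
    for total in range(1, amount + 1):
        for val in denominations:
            if val <= total and best[total - val] + 1 < best[total]:
                best[total] = best[total - val] + 1
    return None if best[amount] == INF else best[amount]

print(greedy_coins(63, [25, 20, 1]))  # 15
print(min_coins(63, [25, 20, 1]))     # 6
```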

Keep Getting ZeroDivisionError Whenever using modulo

I am working on a problem that needs me to get the factors of a certain number. As always, I am using the modulo operator % to see whether the number is divisible by a candidate, i.e. whether the remainder is zero. But whenever I try this, I keep getting a ZeroDivisionError. I tried adding a line like this so that Python starts counting from one instead of zero: for potenial in range(number + 1): But this does not seem to work. Below is the rest of my code; any help will be appreciated.
def Factors(number):
    factors = []
    for potenial in range(number + 1):
        if number % potenial == 0:
            factors.append(potenial)
    return factors
In your for loop you are iterating from 0 (range() assumes starting number to be 0 if only 1 argument is given) up to "number". There is a ZeroDivisionError since you are trying to calculate number modulo 0 (number % 0) at the start of the for loop. When calculating the modulo, Python tries to divide number by 0 causing the ZeroDivisionError. Here is the corrected code (fixed the indentation):
def get_factors(number):
    factors = []
    for potential in range(1, number + 1):
        if number % potential == 0:
            factors.append(potential)
    return factors
However, there are better ways of calculating factors. For example, you can iterate only up to sqrt(n), where n is the number, and then calculate "factor pairs": e.g. if 3 is a factor of 15, then 15/3, which is 5, is also a factor of 15.
I encourage you to try to implement a more efficient algorithm.
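One way the factor-pair idea could look is sketched below. The name get_factors_fast is hypothetical, the input is assumed to be a positive integer, and math.isqrt requires Python 3.8 or later.

```python
import math

def get_factors_fast(number):
    """Return the sorted factors of a positive integer using factor pairs."""
    small, large = [], []
    for candidate in range(1, math.isqrt(number) + 1):
        if number % candidate == 0:
            small.append(candidate)
            partner = number // candidate
            if partner != candidate:  # avoid listing a square root twice
                large.append(partner)
    return small + large[::-1]

print(get_factors_fast(15))  # [1, 3, 5, 15]
print(get_factors_fast(36))  # [1, 2, 3, 4, 6, 9, 12, 18, 36]
```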
Stylistic note: According to PEP 8, function names should be lowercase with words separated by underscores. Uppercase names generally indicate class definitions.

Statistical Analysis Error? python 3 proof read please

The code below generates two random integers within range specified by argv, tests if the integers match and starts again. At the end it prints some stats about the process.
I've noticed though that increasing the value of argv reduces the percentage of tested possibilities exponentially.
This seems counterintuitive to me, so my question is: is this an error in the code, or are the numbers real? And if they are real, what am I not thinking about?
#!/usr/bin/python3
import sys
import random
x = int(sys.argv[1])
a = random.randint(0,x)
b = random.randint(0,x)
steps = 1
combos = x**2
while a != b:
    a = random.randint(0,x)
    b = random.randint(0,x)
    steps += 1
percent = (steps / combos) * 100
print()
print()
print('[{} ! {}]'.format(a,b), end=' ')
print('equality!'.upper())
print('steps'.upper(), steps)
print('possble combinations = {}'.format(combos))
print('explored {}% possibilitys'.format(percent))
Thanks
EDIT
For example:
./runscrypt.py 100000
will return something like:
[65697 ! 65697] EQUALITY!
STEPS 115867
possble combinations = 10000000000
explored 0.00115867% possibilitys
"explored 0.00115867% possibilitys" <-- This number is too low?
This experiment is really a geometric distribution.
That is:
Let Y be the random variable counting the iterations up to and including the first match. Then Y is geometrically distributed with parameter p = 1/x, the probability of generating two matching integers (strictly 1/(x+1), since randint(0, x) includes both endpoints, but the difference is negligible for large x).
The expected value is E[Y] = 1/p (the proof of this can be found in the link above). So in your case the expected number of iterations is 1/(1/x) = x.
The number of combinations is x^2.
So the expected fraction of explored possibilities is really x/(x^2) = 1/x.
As x approaches infinity, this number approaches 0.
In the case of x=100000, the expected percentage of explored possibilities = 1/100000 = 0.001% which is very close to your numerical result.
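The expected step count can also be checked empirically with a short simulation. This is a sketch with a hypothetical helper name and a small x so it runs quickly; because randint(0, x) is inclusive, the match probability is exactly 1/(x + 1) and the theoretical mean is x + 1.

```python
import random

def steps_until_match(x, rng):
    """Draw pairs from randint(0, x) until both values match; return the draw count."""
    steps = 1
    while rng.randint(0, x) != rng.randint(0, x):
        steps += 1
    return steps

rng = random.Random(0)  # seeded only so the run is repeatable
x = 50
runs = 2000
mean_steps = sum(steps_until_match(x, rng) for _ in range(runs)) / runs
# Theory: E[Y] = 1 / p = x + 1 = 51 for this x.
print(mean_steps)
```

The average grows linearly in x while the number of combinations grows quadratically, which is why the explored share shrinks as x increases.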
