random numbers from geometric distribution such that their sum equals SUM - python-3.x

I want to draw k random numbers i_1,...,i_k with min <= i_j <= max from an exponentially shaped distribution of values, with m and std being the median and standard deviation of the population's values. The sum i_1 + ... + i_k should equal a given parameter SUM.
Example:
Given:
k = 9, SUM = 175, min = 8, max = 40, m = 14
Desired:
[9, 10, 11, 12, 14, 17, 23, 30, 39]
I don't know if this is actually possible without depending on luck to draw a combination satisfying the SUM rule. I'd appreciate any kind of help or comment. Thank you.
EDIT: In a former version I wrote about exponential distributions, where an exact solution is impossible; what I actually meant is an exponentially shaped distribution with discrete values, like a geometric distribution for instance.
EDIT2: Corrected the number k in the example.


What are handy Haskell concepts to generate numbers of the form 2^m*3^n*5^l [duplicate]

This question already has answers here:
New state of the art in unlimited generation of Hamming sequence
(3 answers)
Closed 10 months ago.
I am trying to generate numbers of the form 2^m*3^n*5^l where m, n, and l are natural numbers including 0.
The sequence follows: 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, 30, 32, .....
I am testing it by getting the one millionth number. I implemented it using list comprehension and sorting, but it takes too long. I want a faster solution. I have been spending days trying to do this to no avail.
I do not want a complete solution. I just want to know what Haskell concepts are necessary in accomplishing it.
Here's an approach that doesn't need any Haskell concepts, just some math and computer science.
Grab a library that offers priority queues.
Initialize a priority queue containing only the number 1.
Loop the following indefinitely: extract the minimum value from the queue. Put it next in the output list. Insert that number times 2, 3, and 5 as three individual entries in the queue. Make sure the queue insert function merges duplicates, because there will be a lot of them thanks to commutativity of multiplication.
If you have a maximum you're working up to, you can use it to prune insertions to the queue as a minor optimization. Alternatively, you could take advantage of actual Haskell properties and just return an infinite list using laziness.
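A minimal sketch of this priority-queue approach, written here in Python with heapq purely for illustration (the same structure carries over to a Haskell priority-queue library); the function name and the duplicate-merging via a set are my own choices:

import heapq

def hamming(limit):
    # Return the first `limit` numbers of the form 2^m * 3^n * 5^l, in order.
    heap, seen, out = [1], {1}, []
    while len(out) < limit:
        x = heapq.heappop(heap)          # extract the minimum value from the queue
        out.append(x)                    # put it next in the output list
        for factor in (2, 3, 5):         # insert x*2, x*3 and x*5 ...
            if x * factor not in seen:   # ... merging duplicates
                seen.add(x * factor)
                heapq.heappush(heap, x * factor)
    return out

print(hamming(19))
# [1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, 30, 32]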
First, write a function of type Int -> Bool that determines if a given integer is in the sequence you defined. It would divide the number by 2 as many times as possible (without creating a fraction), then divide it by 3 as many times as possible, and finally divide it by 5 as many times as possible. After all of this, if the number is larger than 1, then it cannot be expressed as a product of twos, threes, and fives, so the function would return false. Otherwise, the number is in your sequence, so the function returns true.
Then take the infinite sequence of integers greater than 0, and use the function above to filter out all numbers that are not in the sequence.
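A direct transcription of this filter idea, again sketched in Python for brevity (it is fine for walking through small members of the sequence, but far too slow to reach the millionth element):

from itertools import count, islice

def in_sequence(n):
    # Strip out all factors of 2, 3 and 5; what remains must be 1.
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1

smooth = (n for n in count(1) if in_sequence(n))   # all positive integers, filtered
print(list(islice(smooth, 19)))
# [1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, 30, 32]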
Carl's approach can be improved by inserting fewer elements when removing the minimal element x. Since 2 < 3 < 4 < 5 < 6, you can just:
append 3*x/2 if x is even but not divisible by 4
append 4*x/3 if x is divisible by 3
append 5*x/4 if x is divisible by 4
append 6*x/5 if x is divisible by 5
In code it looks like this:
g2 x | mod x 4 == 0 = [5 * div x 4]   -- x divisible by 4: append 5*x/4
     | even x       = [3 * div x 2]   -- x even but not divisible by 4: append 3*x/2
     | otherwise    = []
g3 x | mod x 3 == 0 = [4 * div x 3]   -- x divisible by 3: append 4*x/3
     | otherwise    = []
g5 x | mod x 5 == 0 = [6 * div x 5]   -- x divisible by 5: append 6*x/5
     | otherwise    = []
g x = concatMap ($ x) [g2, g3, g5]
So if you remove the minimal element x from the priority queue, you have to insert the elements of g x into the priority queue. On my laptop I get the millionth element after about 8 minutes, even if I use just a list instead of a proper priority queue, as the list only grows to a bit more than 10000 elements.

recurrence relation of the dynamic programming to get the maximum credits

My problem is that I have a fixed amount of time and I must get the highest profit.
How can I write the recurrence relation for a dynamic program?
An example for my problem:
The times are [3, 2, 4, 2, 1]
The profits are [20, 15, 20, 25, 20]
The requested hours are 6
The answer should be 65 by picking the times with indices 0,3,4 that have profit 20 + 25 + 20 = 65.
Let's define the function f(i,h) that gives the maximum profit achievable in at most h hours using only the first i profits; then the result for your case is f(size_of_the_array, number_of_hours) = f(5,6).
The main recurrence formula will be like this (time[i] and profit[i] refer to the i-th job, counting from 1):
f(i,h) = max(w1, w2)
w1 = f(i-1, h)                         // don't take the i-th profit in the sum
w2 = f(i-1, h-time[i]) + profit[i]     // take the i-th profit; only valid when h >= time[i]
This is similar to a standard problem in dynamic programming called 0-1 Knapsack.
You can study it first, and then your question will be easy to solve.
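For concreteness, here is a minimal bottom-up sketch of this recurrence in Python, run on the example data from the question (the function name and the tabulation layout are my own; the recurrence is the one above):

def max_profit(times, profits, hours):
    n = len(times)
    # f[i][h] = best profit using only the first i jobs and at most h hours
    f = [[0] * (hours + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for h in range(hours + 1):
            w1 = f[i - 1][h]                      # don't take the i-th job
            w2 = 0
            if h >= times[i - 1]:                 # take the i-th job (1-based in the recurrence)
                w2 = f[i - 1][h - times[i - 1]] + profits[i - 1]
            f[i][h] = max(w1, w2)
    return f[n][hours]

print(max_profit([3, 2, 4, 2, 1], [20, 15, 20, 25, 20], 6))   # 65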

Excel: Average Difference in a single formula

I have a dataset like this:
10, 23, 43, 45, 56;
12, 25, 21, 23, 40;
I want to know the average of the differences between the two rows, like
mean(10 - 12, 23 - 25, 43 - 21, ...)
Of course, this is only an example and the actual rows are hundreds of elements long. I would like to compute the average of the differences without having to compute the differences somewhere and then average them. (The sheet is already pretty big.)
Thanks a lot
Mathematically, what you are asking for is identical to:
=AVERAGE(A1:E1)-AVERAGE(A2:E2)
Regards
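A quick numeric check of this identity (the mean of the differences equals the difference of the means), sketched here in Python/numpy on the sample rows purely for illustration:

import numpy as np

row1 = np.array([10, 23, 43, 45, 56])
row2 = np.array([12, 25, 21, 23, 40])

print(np.mean(row1 - row2))            # 11.2
print(np.mean(row1) - np.mean(row2))   # 11.2, the same value, by linearity of the mean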
Try,
=AVERAGE(INDEX((A1:E1)-(A2:E2), , ))
If there were missing values in one range or the other, you would need something like
=AVERAGEIFS(A1:G1,A1:G1,"<>",A2:G2,"<>")-AVERAGEIFS(A2:G2,A1:G1,"<>",A2:G2,"<>")
(I have tested it with blanks in G1 and F2)

np.percentile not equal to quartiles

I'm trying to calculate the quartiles for an array of values in python using numpy.
X = [1, 1, 1, 3, 4, 5, 5, 7, 8, 9, 10, 1000]
I would do the following:
quartiles = np.percentile(X, range(0, 100, 25))
quartiles
# array([1. , 2.5 , 5. , 8.25])
But this is incorrect, as the 1st and 3rd quartiles should be 2 and 8.5, respectively.
This can be shown as the following:
Q1 = np.median(X[:len(X) // 2])
Q3 = np.median(X[len(X) // 2:])
Q1, Q3
# (2.0, 8.5)
I can't get my head round what np.percentile is doing to give a different answer. I'd be very grateful for any light shed on this.
There is no right or wrong, but simply different ways of calculating percentiles. The percentile is a well-defined concept in the continuous case, less so for discrete samples: different methods make no difference for a very large number of observations (compared to the number of duplicates), but they can actually matter for small samples, and you need to figure out what makes more sense case by case.
To obtain your desired output, you should specify interpolation = 'midpoint' in the percentile function:
quartiles = np.percentile(X, range(0, 100, 25), interpolation = 'midpoint')
quartiles
# array([1. , 2. , 5. , 8.5])
I'd suggest you have a look at the docs: http://docs.scipy.org/doc/numpy/reference/generated/numpy.percentile.html
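A short sketch putting the two calculations side by side, assuming a recent numpy where the keyword is called method (older versions use interpolation, as above); note that range(0, 100, 25) asks for the 0th/25th/50th/75th percentiles, so [25, 50, 75] is used here to get just the three quartiles:

import numpy as np

X = [1, 1, 1, 3, 4, 5, 5, 7, 8, 9, 10, 1000]

print(np.percentile(X, [25, 50, 75], method='midpoint'))        # [2.  5.  8.5]

# Median-of-halves check from the question, with the slices fixed:
print(np.median(X[:len(X) // 2]), np.median(X[len(X) // 2:]))   # 2.0 8.5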

Dynamic Programming : Why the 1?

The following pseudocode finds the smallest number of coins needed to sum up to S using DP. Vj is the value of coin j, and Min holds the numbers m described in the following quote.
For each coin j with Vj ≤ i, look at the minimum number of coins found for the sum i-Vj (we have already found it previously). Let this number be m. If m+1 is less than the minimum number of coins already found for the current sum i, then we write the new result for it.
1 Set Min[i] equal to Infinity for all of i
2 Min[0] = 0
3
4 For i = 1 to S
5     For j = 0 to N - 1
6         If (Vj <= i AND Min[i-Vj] + 1 < Min[i])
7             Then Min[i] = Min[i-Vj] + 1
8
9 Output Min[S]
Can someone explain the significance of the "+1" in line 6? Thanks.
The +1 is because you need one extra coin. So for example, if you have:
Vj = 5
Min[17] = 4
And you want to know the number of coins it will take to get 22, then the answer isn't 4, but 5. It takes 4 coins to get to 17 (according to the previously calculated result Min[17]=4), and an additional one coin (of value Vj = 5) to get to 22.
EDIT
As requested, an overview explanation of the algorithm.
To start, imagine that somebody told you you had access to coins of value 5, 7 and 23, and needed to find the size of the smallest combination of coins which added to 1000. You could probably work out an approach to doing this, but it's certainly not trivial.
So now let's say in addition to the above, you're also given a list of all the values below 1000, and the smallest number of coins it takes to get those values. What would your approach be now?
Well, you only have coins of value 5, 7, and 23. So go back one step: the only options you have are a combination which adds to 995 plus an extra 5-value coin, a combination which adds to 993 plus an extra 7-value coin, or a combination which adds to 977 plus an extra 23-value coin.
So let's say the list has this:
...
977: 53 coins
...
993: 50 coins
...
995: 54 coins
(Those examples were off the top of my head, I'm sure they're not right, and probably don't make sense, but assume they're correct for now).
So from there, you can see pretty easily that the lowest number of coins it will take to get 1000 is 51 coins, which you do by taking the same combination as the one in the list which got 993, then adding a single extra 7-coin.
This is, more or less, what your algorithm does, except instead of aiming just to calculate the number for 1000, its aim would be to calculate every number up to 1000. And instead of being passed the list for lower numbers from somewhere external, it would keep track of the values it had already calculated.
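For concreteness, here is a direct Python transcription of the pseudocode above, run with the coin values 5, 7 and 23 and the target 1000 used in this explanation (a sketch, not code from the question):

import math

def min_coins(values, S):
    Min = [math.inf] * (S + 1)   # Min[i] = smallest number of coins summing to exactly i
    Min[0] = 0
    for i in range(1, S + 1):
        for v in values:
            # If coin v fits and improves the best-known count for the sum i - v,
            # extend that combination by one extra coin: the "+1" in line 6.
            if v <= i and Min[i - v] + 1 < Min[i]:
                Min[i] = Min[i - v] + 1
    return Min[S]

print(min_coins([5, 7, 23], 1000))   # 48 (e.g. forty-two 23s, two 7s and four 5s)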
