non-linear programming problem using python - python-3.x

I have this (non?)linear programming problem which I'm not sure how to solve. I have the following variables x and y, with these bounds:
x_lower=[0,0,0,0,0,0]
x_upper=[100,20,50,200,10,50]
list_y=[1.41,1.42,5.60,5.70,8.60,8.80]
I want to pass these through the following terms:
back_true=(x*y)
back_false=(-x*y/y)
lay_true=(x+x*(y-1)**(-1))
lay_false=(-x*y/y)
where x is a random integer between 0 and x_upper[i], and is paired with the term y = list_y[i].
This is in order to get the combination of x's that minimizes the difference between the sum of the maxima of the three lists below and the sums actually achieved, while keeping the value of each sum non-negative.
res=[back_true[0],lay_false[1],back_false[2],lay_true[3],back_false[4],lay_true[5]]
res2=[back_false[0],lay_true[1],back_true[2],lay_false[3],back_false[4],lay_true[5]]
res3=[back_false[0],lay_true[1],back_false[2],lay_true[3],back_true[4],lay_false[5]]
The maximum of each is attained with one of the following x lists (paired elementwise with list_y). Each row below shows one res term evaluated at x1, x2 and x3:
x1 = [100,0,0,200,0,50] >>> res  = 439.9634 at x1 (max); -13.59 at x2; -159.362 at x3
x2 = [0,20,50,0,0,50]   >>> res2 = 404.0293 at x2 (max); -243.59 at x1; -182.381 at x3
x3 = [0,20,0,50,200,0]  >>> res3 = 1848.257 at x3 (max); 92.5531 at x1; -32.381 at x2
sum(res (max), res2 (max), res3 (max)) = 439.9634 + 404.0293 + 1848.257 = 2692.25
I want to get the combination which minimizes the sum of the max values for the three res terms. As you can see, whatever maximizes the term for one violates the non-negativity constraint in at least one of the others.
I not only want to keep all of these above zero but also get the highest possible sum of the three 'res' terms, that is:
find the combination of x's that minimizes [sum(res, res2, res3) (maxes) minus sum(res, res2, res3) using that x combination], while keeping each of res, res2, res3 >= 0.
Does anyone know how I could go about this?
I was playing around with linprog from scipy.optimize, but it doesn't seem to take more complex terms like the ones I want to use, so I'm not sure I can use it for this.
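One possible direction (a sketch under assumptions of my own, not a verified solution): since the sum of the three maxima is a constant, minimising [that constant minus the achieved sum] is the same as maximising the achieved sum subject to res, res2, res3 >= 0. A derivative-free global optimiser such as scipy.optimize.differential_evolution can search the box directly; the round-inside-the-objective trick and the penalty weight 1e6 below are heuristics. (Note also that for fixed y every term is linear in x, so an integer linear programming formulation would fit too.)

import numpy as np
from scipy.optimize import differential_evolution

y = np.array([1.41, 1.42, 5.60, 5.70, 8.60, 8.80])
x_upper = [100, 20, 50, 200, 10, 50]

def back_true(x, y):  return x * y
def back_false(x, y): return -x * y / y        # simplifies to -x
def lay_true(x, y):   return x + x / (y - 1)
def lay_false(x, y):  return -x * y / y        # simplifies to -x

def sums(x):
    """The three result sums from the question, as functions of the x vector."""
    res  = (back_true(x[0], y[0]) + lay_false(x[1], y[1]) + back_false(x[2], y[2])
            + lay_true(x[3], y[3]) + back_false(x[4], y[4]) + lay_true(x[5], y[5]))
    res2 = (back_false(x[0], y[0]) + lay_true(x[1], y[1]) + back_true(x[2], y[2])
            + lay_false(x[3], y[3]) + back_false(x[4], y[4]) + lay_true(x[5], y[5]))
    res3 = (back_false(x[0], y[0]) + lay_true(x[1], y[1]) + back_false(x[2], y[2])
            + lay_true(x[3], y[3]) + back_true(x[4], y[4]) + lay_false(x[5], y[5]))
    return res, res2, res3

def objective(x):
    x = np.round(x)                            # heuristic: force integer x
    r = sums(x)
    penalty = sum(1e6 * min(v, 0) ** 2 for v in r)  # punish any negative sum
    return -sum(r) + penalty                   # maximise total <=> minimise this

result = differential_evolution(objective, bounds=list(zip([0] * 6, x_upper)), seed=0)
best_x = np.round(result.x)
print(best_x, sums(best_x))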

Related

Calculating a custom probability distribution in python (numerically)

I have a custom (discrete) probability distribution defined somewhat in the form f(x) / sum(f(x') for x' in X), where X is a given discrete set. Also, 0 <= x <= 1.
So I have been trying to implement it in python 3.8.2, and the problem is that the numerator and denominator both come out to be really small and python's floating point representation just takes them as 0.0.
After calculating these probabilities, I need to sample a random element from an array, whose each index may be selected with the corresponding probability in the distribution. So if my distribution is [p1,p2,p3,p4], and my array is [a1,a2,a3,a4], then probability of selecting a2 is p2 and so on.
So how can I implement this in an elegant and efficient way?
Is there any way I could use the np.random.beta() in this case? Since the difference between the beta distribution and my actual distribution is only that the normalization constant differs and the domain is restricted to a few points.
Note: The Probability Mass function defined above is actually in the form given by the Bayes theorem and f(x)=x^s*(1-x)^f, where s and f are fixed numbers for a given iteration. So the exact problem is that, when s or f become really large, this thing goes to 0.
You could well compute things by working with logs. The point is that while both the numerator and denominator might underflow to 0, their logs won't unless your numbers are really astonishingly small.
You say
f(x) = x^s*(1-x)^t
so
logf(x) = s*log(x) + t*log(1-x)
and you want to compute, say
p = f(x) / sum{ y in X | f(y) }
so
p = exp( logf(x) - log sum{ y in X | f(y) } )
  = exp( logf(x) - log sum{ y in X | exp(logf(y)) } )
The only difficulty is in computing the second term, but this is a common problem, usually called logsumexp (scipy, for instance, provides it as scipy.special.logsumexp).
On the other hand, computing logsumexp is easy enough to do by hand.
We want
S = log( sum{ i | exp(l[i])})
if L is the maximum of the l[i] then
S = log( exp(L)*sum{ i | exp(l[i]-L)})
= L + log( sum{ i | exp( l[i]-L)})
The last sum can be computed as written, because each term is now between 0 and 1 so there is no danger of overflow, and one of the terms (the one for which l[i]==L) is 1, and so if other terms underflow, that is harmless.
This may however lose a little accuracy. A refinement would be to recognize the set A of indices where
l[i] >= L - eps (eps a user-set parameter, e.g. 1)
And then compute
N = sum{ i in A | exp(l[i]-L) }
B = log1p( sum{ i not in A | exp(l[i]-L) } / N )
S = L + log(N) + B
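Putting this together in Python, here is a minimal sketch, assuming f(x) = x^s * (1-x)^t on a discrete grid xs; the helper names below are illustrative, and scipy.special.logsumexp could replace the hand-rolled version:

import numpy as np

def logsumexp(l):
    # log(sum(exp(l))), computed as derived above by factoring out the max
    L = np.max(l)
    return L + np.log(np.sum(np.exp(l - L)))

def sample(xs, s, t, rng=None):
    rng = rng or np.random.default_rng()
    logf = s * np.log(xs) + t * np.log1p(-xs)   # log f(x); never underflows
    logp = logf - logsumexp(logf)               # normalised log-probabilities
    return rng.choice(xs, p=np.exp(logp))       # exp is safe now: logp <= 0

xs = np.linspace(0.01, 0.99, 99)                # the discrete set X
print(sample(xs, s=5000, t=7000))               # naive f(x) would underflow here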

What is this normalization curve? Constant ^ (Constant ^ Observation Indexed to 100)

My apologies, but I'm not quite sure how to even ask this question. I have some normalization curves I've been using at work, and I'd like to know more about them so I can speak about them intelligently. They have an S shape like a sigmoid function, but their general formula is the following:
Constant ^ (Constant ^ Observation Indexed to 100)
First, index a variable from 0 to 100 with the highest observation equal to 100, then insert into the equations below for curves with different slopes.
s1 = 0.0000000001 ^ (0.97 ^ Index)
s2 = 0.0000000002 ^ (0.962 ^ Index)
s3 = 0.0000000003 ^ (0.953 ^ Index)
And so on, up to s10. The resulting values are compressed between 0 and 1. s10 has the steepest slope with values that skew toward 1, and s1 has the shallowest slope with values that skew toward 0.
I think they're very clever, and they work well for our purposes, but I don't know what to even call them. Can anyone point me in the right direction? Again, apologies for the vagueness and if this is inappropriately tagged.
The functions you describe are special cases of the Gompertz functions; Gompertz functions have a sigmoidal shape and have many applications across different domains. For example in biology, Gompertz functions are used to model bacterial and tumour cell growth.
To see how your equations relate to the more general Gompertz functions, let's rewrite the equations for s:
s = a ^ (b ^ Index), with 0 < a < 1 and 0 < b < 1
On a side note, we can see that taking the double-log of s (i.e. log(-log s)) linearises the equation as a function of the index:
log(-log s) = log(-log a) + Index * log b
We can now compare this with the more general Gompertz function
G(Index) = alpha * exp(-beta * exp(-gamma * Index))
Taking the natural logarithm gives
log G = log(alpha) - beta * exp(-gamma * Index)
We then set alpha = 1 and take the natural logarithm again:
log(-log G) = log(beta) - gamma * Index
So the equations you give are algebraically identical to the Gompertz functions with parameters
alpha = 1, beta = -log(a), gamma = -log(b)
Let's plot the function for the three sets of parameters that you give in your post (I use R here but it's easy to do something similar in e.g. Python)
# Define a function f which takes the index and two parameters a and b
# We use a helper function scale01 to scale the values of f in the interval [0,1]
# using min-max scaling
scale01 <- function(x) (x - min(x)) / (max(x) - min(x))
f <- function(idx, a, b) scale01(a ^ (b ^ idx))
# Calculate s for the three different sets of parameters and
# using integer index values from 0 to 100
idx <- 0:100
lst <- lapply(list(
    s1 = list(a = 0.0000000001, b = 0.97),
    s2 = list(a = 0.0000000002, b = 0.962),
    s3 = list(a = 0.0000000003, b = 0.953)),
  function(pars) f(idx, a = pars$a, b = pars$b))
# Plot
library(ggplot2)
df <- cbind(idx = idx, stack(lst))
ggplot(df, aes(idx, values, colour = ind)) + geom_line()
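For readers who prefer Python, here is a rough equivalent of the R snippet (a sketch using numpy and matplotlib):

import numpy as np
import matplotlib.pyplot as plt

def scale01(x):
    # min-max scaling into [0, 1]
    return (x - x.min()) / (x.max() - x.min())

def f(idx, a, b):
    return scale01(a ** (b ** idx))

idx = np.arange(101)
params = {'s1': (1e-10, 0.97), 's2': (2e-10, 0.962), 's3': (3e-10, 0.953)}
for name, (a, b) in params.items():
    plt.plot(idx, f(idx, a, b), label=name)
plt.legend()
plt.show()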

math.sqrt function python gives same result for two different values [duplicate]

Why does the math module return the wrong result?
First test
from math import sqrt

A = 12345678917
print 'A =', A
B = sqrt(A**2)
print 'B =', int(B)
Result
A = 12345678917
B = 12345678917
Here, the result is correct.
Second test
A = 123456758365483459347856
print 'A =',A
B = sqrt(A**2)
print 'B =',int(B)
Result
A = 123456758365483459347856
B = 123456758365483467538432
Here the result is incorrect.
Why is that the case?
Because math.sqrt(..) first casts the number to a floating point, and floating points have a limited mantissa: they can only represent part of the number correctly. So float(A**2) is not equal to A**2. Next it calculates math.sqrt of that float, which is also only approximately correct.
Most functions working with floating points will never exactly match their integer counterparts. Floating-point calculations are almost inherently approximative.
If one calculates A**2 one gets:
>>> 12345678917**2
152415787921658292889L
Now if one converts it to a float(..), one gets:
>>> float(12345678917**2)
1.5241578792165828e+20
But if you now ask whether the two are equal:
>>> float(12345678917**2) == 12345678917**2
False
So information has been lost while converting it to a float.
You can read more about how floats work and why these are approximative in the Wikipedia article about IEEE-754, the formal definition on how floating points work.
The documentation for the math module states "It provides access to the mathematical functions defined by the C standard." It also states "Except when explicitly noted otherwise, all return values are floats."
Those together mean that the parameter to the square root function is a float value. In most systems that means a floating point value that fits into 8 bytes, which is called "double" in the C language. Your code converts your integer value into such a value before calculating the square root, then returns such a value.
However, the 8-byte floating point value can store at most 15 to 17 significant decimal digits. That is what you are getting in your results.
If you want better precision in your square roots, use a function that is guaranteed to give full precision for an integer argument. Just do a web search and you will find several. Those usually do a variation of the Newton-Raphson method, iterating until they reach the correct answer. Be aware that this is significantly slower than the math module's sqrt function.
Here is a routine that I modified from the internet. I can't cite the source right now. This version also works for non-integer arguments but just returns the integer part of the square root.
def isqrt(x):
    """Return the integer part of the square root of x, even for very
    large values."""
    if x < 0:
        raise ValueError('square root not defined for negative numbers')
    n = int(x)
    if n == 0:
        return 0
    a, b = divmod(n.bit_length(), 2)
    x = (1 << (a+b)) - 1
    while True:
        y = (x + n//x) // 2
        if y >= x:
            return x
        x = y
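If you are on Python 3.8 or newer, you no longer need to hand-roll this: the standard library's math.isqrt computes the exact integer square root of arbitrarily large integers:

import math

A = 123456758365483459347856
print(math.isqrt(A**2) == A)  # True: exact, no float conversion involved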
If you want to calculate sqrt of really large numbers and you need exact results, you can use sympy:
import sympy
num = sympy.Integer(123456758365483459347856)
print(int(num) == int(sympy.sqrt(num**2)))
The way floating-point numbers are stored in memory makes calculations with them prone to slight errors that can nevertheless be significant when exact results are needed. As mentioned in one of the comments, the decimal library can help you here:
>>> from decimal import Decimal
>>> A = Decimal(123456758365483459347856)
>>> A
Decimal('123456758365483459347856')
>>> B = A.sqrt()**2
>>> B
Decimal('123456758365483459347856.0000')
>>> A == B
True
>>> int(B)
123456758365483459347856
I use version 3.6, which has no hardcoded limit on the size of integers. (In 2.7, int(B) would simply be promoted to a long rather than overflow.) Either way, decimal is incredibly useful here.

Statistical Analysis Error? python 3 proof read please

The code below generates two random integers within the range specified by argv, tests whether the integers match, and starts again if they don't. At the end it prints some stats about the process.
I've noticed, though, that increasing the value of argv reduces the percentage of tested possibilities exponentially.
This seems counter-intuitive to me, so my question is: is this an error in the code, or are the numbers real, and if so, what am I not thinking about?
#!/usr/bin/python3
import sys
import random

x = int(sys.argv[1])
a = random.randint(0, x)
b = random.randint(0, x)
steps = 1
combos = x**2
while a != b:
    a = random.randint(0, x)
    b = random.randint(0, x)
    steps += 1
percent = (steps / combos) * 100
print()
print()
print('[{} ! {}]'.format(a, b), end=' ')
print('equality!'.upper())
print('steps'.upper(), steps)
print('possible combinations = {}'.format(combos))
print('explored {}% of possibilities'.format(percent))
Thanks
EDIT
For example:
./runscrypt.py 100000
will return something like:
[65697 ! 65697] EQUALITY!
STEPS 115867
possible combinations = 10000000000
explored 0.00115867% of possibilities
"explored 0.00115867% of possibilities" <-- This number is too low?
This experiment really describes a geometric distribution.
I.e.:
Let Y be the random variable counting the number of iterations before a match is seen. Then Y is geometrically distributed with parameter p = 1/x, the probability of generating two matching integers. (Strictly, randint(0, x) draws from x+1 values, so p = 1/(x+1), but for large x the difference is negligible.)
The expected value is E[Y] = 1/p, a standard property of the geometric distribution. So in your case the expected number of iterations is 1/(1/x) = x.
The number of combinations is x^2.
So the expected percentage of explored possibilities is really x/x^2 = 1/x.
As x approaches infinity, this number approaches 0.
In the case of x=100000, the expected percentage of explored possibilities = 1/100000 = 0.001% which is very close to your numerical result.
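A quick simulation (an illustrative sketch, not from the original post) backs this up: the average number of steps over many runs comes out near x, i.e. around 100/x percent of the x**2 combinations:

import random

x, runs = 1000, 2000
total_steps = 0
for _ in range(runs):
    steps = 1
    while random.randint(0, x) != random.randint(0, x):
        steps += 1
    total_steps += steps

avg = total_steps / runs
print(avg, 100 * avg / x**2)  # avg is roughly x; explored percentage roughly 100/x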

counting results from a defined matrix

So I am very new to programming, and Haskell is the first language I'm learning. The problem I'm having is probably a very simple one, but I simply cannot find an answer, no matter how much I search.
Basically, what I have is a 3x3 matrix in which each element has a value from 1 to 3. The matrix is predefined; now all I need to do is create a function which, when I input 1, 2 or 3, tells me how many elements in the matrix have that value.
I've been trying around with different things but none of them appear to be allowed, for example I've defined 3 variables for each of the possible numbers and tried to define them by
value w =
let a=0
b=0
c=0
in
if matrix 1 1==1 then a=a+1 else if matrix 1 1==2 then b=b+1
etc. etc. for every combination and field.
<- ignoring the wrong syntax, which I'm really struggling with, the fact that I can't use "=" inside "if ... then" is my biggest problem. Is there a way to bypass this, or maybe a way to use "stored data" from previously defined functions?
I hope I made my question somewhat clear, as I said I've only been at programming for 2 days now and I just can't seem to find a way to make this work!
By default, Haskell doesn't use updateable variables. Instead, you typically make a new value, and pass it somewhere else (e.g., return it from a function, add it into a list, etc).
I would approach this in two steps: get a list of the elements from your matrix, then count the elements with each value.
-- get list of elements using a list comprehension
elements = [matrix i j | i <- [1..3], j <- [1..3]]

-- define counting function
count (x,y,z) (1:rest) = count (x+1,y,z) rest
count (x,y,z) (2:rest) = count (x,y+1,z) rest
count (x,y,z) (3:rest) = count (x,y,z+1) rest
count scores  []       = scores

-- use counting function
(a,b,c) = count (0,0,0) elements
There are better ways of accumulating scores, but this seems closest to what your question is looking for.
Per comments below, an example of a more idiomatic counting method, using foldl and an accumulation function addscore instead of the count function above:
-- define accumulation function
addscore (x,y,z) 1 = (x+1,y,z)
addscore (x,y,z) 2 = (x,y+1,z)
addscore (x,y,z) 3 = (x,y,z+1)
-- use accumulation function
(a,b,c) = foldl addscore (0,0,0) elements
