Sympy - Limit with parameter constraint - python-3.x

I am trying to calculate the limit of a function with a constraint on one of its parameters, but I got stuck on the parameter constraint.
I used the following code, where 0 < alpha < 1 should be assumed:
import sympy
sympy.init_printing()
K,L,alpha = sympy.symbols("K L alpha")
Y = (K**alpha)*(L**(1-alpha))
sympy.limit(sympy.assumptions.refine(Y.subs(L,1),sympy.Q.positive(1-alpha) & sympy.Q.positive(alpha)),K,0,"-")
However, this doesn't work. Is there a way to handle assumptions as in Mathematica?
Best and thank you,
Fabian

To my knowledge, the assumptions made by the Assumptions module are not yet understood by the rest of SymPy. But limit can understand an assumption that is imposed at the time a symbol is created:
K, L = sympy.symbols("K L")
alpha = sympy.Symbol("alpha", positive=True)
Y = (K**alpha)*(L**(1-alpha))
sympy.limit(Y.subs(L, 1), K, 0, "-")
The limit now evaluates to 0.
There isn't a way to declare a symbol to be a number between 0 and 1, but one may be able to work around this by declaring a positive symbol, say t, and letting alpha = t/(1+t).
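As a quick sanity check of that workaround, assuming the substitution is meant for alpha (the parameter that should lie between 0 and 1), the plain-Python sketch below confirms numerically that t/(1+t) stays strictly between 0 and 1 for positive t, so both alpha and 1-alpha are positive. The helper name alpha_from_t is made up for illustration and is not part of SymPy.

```python
def alpha_from_t(t):
    """Map a positive number t to alpha = t/(1+t), which lies in (0, 1)."""
    if t <= 0:
        raise ValueError("t must be positive")
    return t / (1.0 + t)


# Small, medium and large t all land strictly inside (0, 1).
for t in (0.01, 1.0, 100.0):
    alpha = alpha_from_t(t)
    print(t, alpha, 1.0 - alpha)
```

The same substitution can then be applied to the symbolic expression, with t declared as a positive symbol.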

Related

Mixed integer nonlinear programming with gekko python

I want to solve the following optimization problem using Gekko in Python 3.7 on Windows.
Original Problem
Here, x_s are continuous variables; D and epsilon are deterministic parameters.
However, since a min function appears in the objective function, I removed it using binary variables (z1, z2), and the problem then becomes an MINLP as follows.
Modified problem
With Gekko,
(1) Can both original problem & modified problem be solved?
(2) How can I code summation in the objective function and also D & epsilon which are parameters in Gekko?
Thanks in advance.
Both problems should be feasible with Gekko but the original appears easier to solve. Here are a few suggestions for the original problem:
Use m.Maximize() for the objective
Use sum() for the inner summation and m.sum() for outer summation for the objective function. I switch to m.sum() when the summation would create an expression that is over 15,000 characters. Using sum() creates one long expression and m.sum() breaks the summation into pieces but takes longer to compile.
Use m.min3() for the min(Dt,xs) terms or slack variables s with x[i]+s[i]=D[i]. It appears that Dt (size 30) is an upper bound, but it has different dimensions than xs (size 100). Slack variables are much more efficient than using binary variables.
D = np.ones(100)  # placeholder data with 100 entries (np.array(100) would be a 0-d array)
x = m.Array(m.Var,100,lb=0,ub=2000000)
The modified problem has 6000 binary variables and 100 continuous variables. There are 2^6000 potential combinations of those variables so it may take a while to solve, even with the efficient branch and bound method of APOPT. Here are a few suggestions for the modified problem:
Use matrix multiplications when possible. Below is an example of matrix operations with Gekko.
from gekko import GEKKO
import numpy as np
m = GEKKO(remote=False)
ni = 3; nj = 2; nk = 4
# solve AX=B
A = m.Array(m.Var,(ni,nj),lb=0)
X = m.Array(m.Var,(nj,nk),lb=0)
AX = np.dot(A,X)
B = m.Array(m.Var,(ni,nk),lb=0)
# equality constraints
m.Equations([AX[i,j]==B[i,j] for i in range(ni)
             for j in range(nk)])
m.Equation(5==m.sum([m.sum([A[i][j] for i in range(ni)])
                     for j in range(nj)]))
m.Equation(2==m.sum([m.sum([X[i][j] for i in range(nj)])
                     for j in range(nk)]))
# objective function
m.Minimize(m.sum([m.sum([B[i][j] for i in range(ni)])
                  for j in range(nk)]))
m.solve()
print(A)
print(X)
print(B)
Declare z1 and z2 variables as integer type with integer=True. Here is more information on using the integer type.
Solve locally with m=GEKKO(remote=False). The processing time will be large and the public server resets connections and deletes jobs every day. Switch to local mode to avoid a potential disruption.
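To put the 2^6000 figure for the modified problem in perspective, here is a small arithmetic sketch in plain Python (no Gekko involved). It only counts the decimal digits of the number of binary combinations, which illustrates why enumeration is hopeless and a branch and bound solver is needed.

```python
n_binary = 6000                  # number of binary variables in the modified problem
combinations = 2 ** n_binary     # exact integer, Python handles big ints natively
digits = len(str(combinations))  # number of decimal digits
print(digits)
```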

Wrong result of sympy integration with symbol limits

from sympy import *
s = Symbol("s")
y = Symbol("y")
raw_function = 1/(150.0-0.5*y)
result = integrate(raw_function, (y, 0, s))
The snippet above gives what looks like a wrong result: -2.0*log(0.5*s - 150.0) + 10.0212705881925 + 2.0*I*pi,
whereas I expected -2.0*log(-0.5*s + 150.0) + 10.0212705881925. What's wrong?
Are you sure about the correct result? WolframAlpha gives the same result as SymPy here.
Edit:
This function diverges (and so does the integral) around y=300; see its plot here (it diverges the same way 1/x does, but shifted to y=300).
This means you are constrained to s < 300 to have a well-defined (and finite) integral. In that range, log(0.5*s - 150.0) = log(-0.5*s + 150.0) + I*pi on the principal branch, so the 2.0*I*pi term cancels and the value of the integral equals what SymPy gives you.
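A small numerical check of this claim, using only the standard library (cmath for the principal complex logarithm). The function names sympy_form and real_form are made up for this sketch; the constant 10.0212705881925 is taken to be 2*log(150), which matches to the printed digits.

```python
import cmath
import math

C = 2.0 * math.log(150.0)  # about 10.0212705881925


def sympy_form(s):
    """-2*log(0.5*s - 150) + C + 2*pi*I, using the principal complex log."""
    return -2.0 * cmath.log(0.5 * s - 150.0) + C + 2.0j * math.pi


def real_form(s):
    """-2*log(-0.5*s + 150) + C, only defined for s < 300."""
    return -2.0 * math.log(-0.5 * s + 150.0) + C


# For s < 300 the imaginary parts cancel and both forms agree.
for s in (0.0, 100.0, 299.0):
    print(s, sympy_form(s), real_form(s))
```

At s = 0 both expressions evaluate to 0, as expected for an integral over an empty range.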

Given an exponential probability density function, how to generate random values using the random generator in Excel?

Based on a set of experiments, a probability density function (PDF) for an exponentially distributed variable was generated. Now the goal is to use this function in a Monte Carlo simulation. I am vaguely familiar with PDFs and random generators, especially for normal and log-normal distributions. However, I am not quite able to figure this out. It would be great if someone could help.
Here's the function:
f = γ/(2R) * exp(-γl/(2R)) * (1 - exp(-γ))^(-1) * H(2R - l)
f is the probability density function,
1/γ is the mean of the distribution,
R is a known fixed variable,
H is the heaviside step function,
l is the variable that is exponentially distributed
Well, I don't know how to do it in Excel, but with the inverse method it is easy to get the answer (assuming there is a RANDOM() function that returns uniform numbers in the [0...1] range, and with LOG denoting the natural logarithm; in Excel these would be RAND() and LN()):
l = -(2R/γ)*LOG(1 - RANDOM()*(1-EXP(-γ)))
Easy to check boundary values
if RANDOM()=0, then l = 0
if RANDOM()=1, then l = 2R
UPDATE
So there is a PDF
PDF(l|R,γ) = γ/(2R) * exp(-lγ/(2R)) / (1 - exp(-γ)), l in the range [0...2R]
First, check that it is normalized
∫ PDF(l|R,γ) dl from 0 to 2R = 1
Ok, it is normalized
Then compute CDF(l|R,γ)
CDF(l|R,γ) = ∫ PDF(l|R,γ) dl from 0 to l =
(1 - exp(-lγ/(2R))) / (1 - exp(-γ))
Check again, CDF(l=2R|R,γ) = 1, good.
Now set CDF(l|R,γ) = RANDOM(), solve it for l, and you get your sampling expression. Check it for RANDOM() returning 0 and RANDOM() returning 1: you should get the end points of the l interval.
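The derivation above can be sketched in Python as follows (not Excel; the function names and the values of R and gamma are arbitrary choices for illustration). The sampler is the inverted CDF, and the optional u argument exists only so the boundary checks can be reproduced.

```python
import math
import random


def sample_length(R, gamma, u=None):
    """Draw l on [0, 2R] by inverting the CDF of the truncated exponential."""
    if u is None:
        u = random.random()  # uniform in [0, 1)
    return -(2.0 * R / gamma) * math.log(1.0 - u * (1.0 - math.exp(-gamma)))


def cdf(l, R, gamma):
    """CDF(l | R, gamma) as derived above."""
    return (1.0 - math.exp(-l * gamma / (2.0 * R))) / (1.0 - math.exp(-gamma))


R, gamma = 1.5, 2.0  # arbitrary illustrative values
samples = [sample_length(R, gamma) for _ in range(10000)]
print(min(samples), max(samples))  # all samples fall inside [0, 2R]
```

Feeding a sample back into cdf should return the uniform number it was generated from, which is a convenient way to check the inversion.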

Why is KL divergence giving nan? Is it a mathematical error or is my input data incorrect?

In the following code s is nan. Since each value in Q is less than 1, taking its log gives a negative value. Does this mean that I cannot calculate the KL divergence with these values of P and Q, or can I fix it?
import numpy as np

P = np.array([1.125, 3.314, 2.7414])
Q = np.array([0.42369288, 0.89152044, 0.60905852])
for i in range(len(P)):
    if P[i] != 0 and Q[i] != 0:
        s = P[i] * np.log(P[i] / Q[i])
        print("s: ", s)
First off, P and Q should describe probability mass functions, meaning that each element should be in the interval [0,1] and each array should sum to 1, which is not the case in your example.
The second np.log is wrong. Is there a reason you put it there or was it a typo? It should be P[i]*np.log(P[i]/Q[i]). You also want to perform the sum over all these terms for i.
Finally there is a technical issue of what to do if P[i] = 0. In that case np.log(0) would cause problems. The actual contribution of the term should be 0 in that case (because lim_{x->0} x*log(x) = 0). You can guarantee this, e.g. by handling this case specially with an if clause.
The case of Q[i] = 0 would cause similar issues; however, the KL divergence is undefined anyway when Q[i] = 0 but P[i] != 0.
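Putting these points together, a minimal sketch in plain Python could look like this. Two choices here are assumptions, not something stated above: the arrays from the question are rescaled to sum to 1 before use, and the undefined case (Q[i] = 0 with P[i] != 0) is reported as infinity.

```python
import math


def normalize(weights):
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue         # the term 0 * log(0 / q) contributes 0
        if qi == 0:
            return math.inf  # assumption: report the undefined case as infinity
        total += pi * math.log(pi / qi)
    return total


P = normalize([1.125, 3.314, 2.7414])
Q = normalize([0.42369288, 0.89152044, 0.60905852])
print(kl_divergence(P, Q))
```

Whether rescaling is appropriate depends on what the original numbers represent; if they are not unnormalized probabilities, the KL divergence may simply not be the right measure.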

Algorithm to solve Local Alignment

Local alignment between X and Y, with at least one column aligning a C
to a W.
Given two sequences X of length n and Y of length m, we
are looking for a highest-scoring local alignment (i.e., an alignment
between a substring X' of X and a substring Y' of Y) that has at least
one column in which a C from X' is aligned to a W from Y' (if such an
alignment exists). As scoring model, we use a substitution matrix s
and linear gap penalties with parameter d.
Write a code in order to solve the problem efficiently. If you use dynamic
programming, it suffices to give the equations for computing the
entries in the dynamic programming matrices, and to specify where
traceback starts and ends.
My Solution:
I took 2 sequences, namely "HCEA" and "HWEA", and tried to solve the question.
Here is my code. Have I fulfilled what is asked in the question? If I am wrong, kindly tell me where I've gone wrong so that I can modify my code.
Also, is there any other way to solve the question? If so, can anyone post pseudocode or an algorithm, so that I can code it?
public class Q1 {
    public static void main(String[] args) {
        // Input Protein Sequences
        String seq1 = "HCEA";
        String seq2 = "HWEA";
        // Array to store the score
        int[][] T = new int[seq1.length() + 1][seq2.length() + 1];
        // initialize seq1
        for (int i = 0; i <= seq1.length(); i++) {
            T[i][0] = i;
        }
        // Initialize seq2
        for (int i = 0; i <= seq2.length(); i++) {
            T[0][i] = i;
        }
        // Compute the matrix score
        for (int i = 1; i <= seq1.length(); i++) {
            for (int j = 1; j <= seq2.length(); j++) {
                if ((seq1.charAt(i - 1) == seq2.charAt(j - 1))
                        || (seq1.charAt(i - 1) == 'C') && (seq2.charAt(j - 1) == 'W')) {
                    T[i][j] = T[i - 1][j - 1];
                } else {
                    T[i][j] = Math.min(T[i - 1][j], T[i][j - 1]) + 1;
                }
            }
        }
        // Strings to store the aligned sequences
        StringBuilder alignedSeq1 = new StringBuilder();
        StringBuilder alignedSeq2 = new StringBuilder();
        // Build for sequences 1 & 2 from the matrix score
        for (int i = seq1.length(), j = seq2.length(); i > 0 || j > 0;) {
            if (i > 0 && T[i][j] == T[i - 1][j] + 1) {
                alignedSeq1.append(seq1.charAt(--i));
                alignedSeq2.append("-");
            } else if (j > 0 && T[i][j] == T[i][j - 1] + 1) {
                alignedSeq2.append(seq2.charAt(--j));
                alignedSeq1.append("-");
            } else if (i > 0 && j > 0 && T[i][j] == T[i - 1][j - 1]) {
                alignedSeq1.append(seq1.charAt(--i));
                alignedSeq2.append(seq2.charAt(--j));
            }
        }
        // Display the aligned sequence
        System.out.println(alignedSeq1.reverse().toString());
        System.out.println(alignedSeq2.reverse().toString());
    }
}
@Shole
The following are the two questions and answers provided in my solved worksheet.
Aligning a suffix of X to a prefix of Y
Given two sequences X and Y, we are looking for a highest-scoring alignment between any suffix of X and any prefix of Y. As a scoring model, we use a substitution matrix s and linear gap penalties with parameter d.
Give an efficient algorithm to solve this problem optimally in time O(nm), where n is the length of X and m is the length of Y. If you use a dynamic programming approach, it suffices to give the equations that are needed to compute the dynamic programming matrix, to explain what information is stored for the traceback, and to state where the traceback starts and ends.
Solution:
Let X_i be the prefix of X of length i, and let Y_j denote the prefix of Y of length j. We compute a matrix F such that F[i][j] is the best score of an alignment of any suffix of X_i and the string Y_j. We also compute a traceback matrix P. The computation of F and P can be done in O(nm) time using the following equations:
F[0][0]=0
for i = 1..n: F[i][0]=0
for j = 1..m: F[0][j]=-j*d, P[0][j]=L
for i = 1..n, j = 1..m:
F[i][j] = max{ F[i-1][j-1]+s(X[i-1],Y[j-1]), F[i-1][j]-d, F[i][j-1]-d }
P[i][j] = D, T or L according to which of the three expressions above is the maximum
Once we have computed F and P, we find the largest value in the bottom row of the matrix F. Let F[n][j0] be that largest value. We start traceback at F[n][j0] and continue traceback until we hit the first column of the matrix. The alignment constructed in this way is the solution.
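For readers who prefer code, the equations and traceback rule of this worksheet solution could be sketched in Python as below. The scoring function simple_score (match +2, mismatch -1), the gap penalty d = 2 and the test strings are made-up values for illustration; the worksheet leaves s and d abstract.

```python
def simple_score(a, b):
    """Hypothetical substitution matrix: +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1


def suffix_prefix_alignment(X, Y, s=simple_score, d=2):
    """Best alignment of any suffix of X with any prefix of Y (linear gaps)."""
    n, m = len(X), len(Y)
    F = [[0] * (m + 1) for _ in range(n + 1)]     # F[i][0] = 0 for all i
    P = [[None] * (m + 1) for _ in range(n + 1)]  # traceback pointers
    for j in range(1, m + 1):
        F[0][j] = -j * d
        P[0][j] = "L"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = [
                (F[i - 1][j - 1] + s(X[i - 1], Y[j - 1]), "D"),
                (F[i - 1][j] - d, "T"),
                (F[i][j - 1] - d, "L"),
            ]
            F[i][j], P[i][j] = max(candidates, key=lambda c: c[0])
    # The answer is the largest value in the bottom row.
    j0 = max(range(m + 1), key=lambda j: F[n][j])
    # Traceback from (n, j0) until the first column is reached.
    top, bottom = [], []
    i, j = n, j0
    while j > 0:
        if P[i][j] == "D":
            top.append(X[i - 1]); bottom.append(Y[j - 1]); i -= 1; j -= 1
        elif P[i][j] == "T":
            top.append(X[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(Y[j - 1]); j -= 1
    return F[n][j0], "".join(reversed(top)), "".join(reversed(bottom))


print(suffix_prefix_alignment("ABCD", "CDEF"))
```

With these illustrative scores, the suffix "CD" of the first string is aligned to the prefix "CD" of the second.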
Aligning Y to a substring of X, without gaps in Y
Given a string X of length n and a string Y of length m, we want to compute a highest-scoring alignment of Y to any substring of X, with the extra constraint that we are not allowed to insert any gaps into Y. In other words, the output is an alignment of a substring X' of X with the string Y, such that the score of the alignment is the largest possible (among all choices of X') and such that the alignment does not introduce any gaps into Y (but may introduce gaps into X'). As a scoring model, we use again a substitution matrix s and linear gap penalties with parameter d.
Give an efficient dynamic programming algorithm that solves this problem optimally in polynomial time. It suffices to give the equations that are needed to compute the dynamic programming matrix, to explain what information is stored for the traceback, and to state where the traceback starts and ends. What is the running-time of your algorithm?
Solution:
Let X_i be the prefix of X of length i, and let Y_j denote the prefix of Y of length j. We compute a matrix F such that F[i][j] is the best score of an alignment of any suffix of X_i and the string Y_j, such that the alignment does not insert gaps in Y. We also compute a traceback matrix P. The computation of F and P can be done in O(nm) time using the following equations:
F[0][0]=0
for i = 1..n: F[i][0]=0
for j = 1..m: F[0][j]=-j*d, P[0][j]=L
for i = 1..n, j = 1..m:
F[i][j] = max{ F[i-1][j-1]+s(X[i-1],Y[j-1]), F[i][j-1]-d }
P[i][j] = D or L according to which of the two expressions above is the maximum
Once we have computed F and P, we find the largest value in the rightmost column of the matrix F. Let F[i0][m] be that largest value. We start traceback at F[i0][m] and continue traceback until we hit the first column of the matrix. The alignment constructed in this way is the solution.
I hope this gives you some idea of what I really need.
I think it's quite easy to find resources, or even the answer, with Google: the first search result is already a thorough DP solution.
However, I appreciate that you would like to think the solution through yourself and are asking for some hints.
Before I give some hints, I would like to say something about designing a DP solution
(I assume you know this can be solved with DP).
A DP solution basically consists of four parts:
1. DP state: you have to define the meaning of one state yourself, e.g.:
a[i] := the money the i-th person has;
a[i][j] := the number of TV programmes between time i and time j; etc.
2. Transition equations
3. Initial state / base case
4. How to query the answer, e.g.: is the answer a[n], or is it max(a[i])?
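To make the four parts concrete, here is a short Python sketch on a standard textbook problem, the length of the longest common subsequence (LCS) of two strings. It is only an illustration of the structure, not a solution to the alignment question; each of the four parts is marked in the comments.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    n, m = len(a), len(b)
    # 1. State: s[i][j] = LCS length of the prefixes a[:i] and b[:j].
    # 3. Base case: an empty prefix has LCS length 0 (row 0 and column 0).
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # 2. Transition: extend on a match, otherwise drop one character.
            if a[i - 1] == b[j - 1]:
                s[i][j] = s[i - 1][j - 1] + 1
            else:
                s[i][j] = max(s[i - 1][j], s[i][j - 1])
    # 4. Query: the answer for the full strings sits in the last cell.
    return s[n][m]


print(lcs_length("HCEA", "HWEA"))
```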
That's just my two cents on DP solutions; let's go back to the question :)
Here are some hints I can think of:
What is the DP state? How many dimensions are enough to define such a state?
Given that you are solving a problem much like the common substring problem (on 2 strings),
1 dimension seems too few and 3 dimensions seem too many, right?
As mentioned in the first hint, this problem is very similar to the common substring problem, so it may help to look at these problems to get some ideas:
LCS, LIS, Edit Distance, etc.
Supplement part: not directly related to the OP
DP is easy to learn but hard to master. I know only a little about it and really cannot share much. I think "Introduction to Algorithms" is a fairly standard book to start with, and you can find many resources, especially PPT/PDF tutorials from colleges and universities, for learning some basic DP examples. (Learning these examples is useful, as I explain below.)
A problem can be solved by many different DP solutions; some of them are much better (lower time/space complexity) thanks to a well-defined DP state.
So how do you design a better DP state, or even sense that a problem can be solved by DP? I would say it's a matter of experience and knowledge. There is a set of "well-known" DP problems, and many other DP problems can be solved by modifying them slightly. Here is a post I just got accepted about another DP problem; as stated in that post, that problem is very similar to a "well-known" problem named "matrix chain multiplication". So you cannot do much about the "experience" part, as there is no shortcut, but you can work on the "knowledge" part by studying these standard DP problems first.
Lastly, let's go back to your original question to illustrate my point of view:
Since I already knew the LCS problem, I had a sense that a similar problem could be solved by designing a similar DP state and transition equation. The state is s(i,j) := the optimal cost for A(1..i) and B(1..j), given two strings A and B.
What "optimal" means depends on the question, and how to achieve this "optimal" value in each state is determined by the transition equation.
With this state defined, it's easy to see that the final answer to query is simply s(len(A), len(B)).
Base case? s(0,0) = 0! We can't really do much with two empty strings, right?
So with the knowledge I have, I have a rough idea of the 4 main components of a DP solution. I know it's a bit long, but I hope it helps. Cheers.
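For completeness, here is one possible way to turn these hints into code for the constrained local alignment, sketched in Python. This is an assumption on my part and not the worksheet's official solution: it keeps two matrices, F0 for ordinary local alignments and F1 for local alignments that already contain a C-to-W column. The scores (match +2, mismatch -1, d = 2) are made-up values.

```python
NEG_INF = float("-inf")


def simple_score(a, b):
    """Hypothetical substitution matrix: +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1


def best_local_score_with_cw(X, Y, s=simple_score, d=2):
    """Best local alignment score with at least one column aligning C (X) to W (Y).

    Returns None if no such alignment exists.
    """
    n, m = len(X), len(Y)
    # F0[i][j]: best local alignment ending at (i, j), no constraint (may be empty).
    F0 = [[0] * (m + 1) for _ in range(n + 1)]
    # F1[i][j]: best alignment ending at (i, j) that contains a C-W column.
    F1 = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best = NEG_INF
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = s(X[i - 1], Y[j - 1])
            F0[i][j] = max(0, F0[i - 1][j - 1] + sub,
                           F0[i - 1][j] - d, F0[i][j - 1] - d)
            F1[i][j] = max(F1[i - 1][j - 1] + sub,
                           F1[i - 1][j] - d, F1[i][j - 1] - d)
            if X[i - 1] == "C" and Y[j - 1] == "W":
                # This column itself satisfies the constraint, so the part
                # before it may be any unconstrained local alignment.
                F1[i][j] = max(F1[i][j], F0[i - 1][j - 1] + sub)
            best = max(best, F1[i][j])
    return None if best == NEG_INF else best


print(best_local_score_with_cw("HCEA", "HWEA"))
```

In this sketch the traceback would start at the cell holding the maximum of F1, switch to F0 at the C-W column that was taken from F0, and end when a 0 entry of F0 is reached. The running time is O(nm).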
