What will be the correct output? - python-3.x

I want to implement the following equation given in a paper “A Three-Layered Mutually Reinforced Model for
Personalized Citation Recommendation” (Page 4). According to the description in the paper, B should be a square matrix, whereas I am getting a vector.
I have tried the following code:
import numpy as np
from scipy import spatial
from nltk.tokenize import word_tokenize

querySplit = query.split(',')
queryText = querySplit[0]
qt_tag = word_tokenize(queryText.rstrip().lower().translate(translator))
qt_vector = model.infer_vector(qt_tag)

def eq_b(query):
    vecs = np.asarray(
        [spatial.distance.cosine(spatial.distance.cosine(query, model.docvecs[i]), model.docvecs[i])
         for i in range(Docs_len)])
    return vecs / vecs.sum()

b = eq_b(qt_vector)
print("B", b)

The formula you wrote for B is not correct. From the paper, B * Rt_p is equal to what you computed, but that is not B itself. The actual formula for the matrix is:
B = eq_b(qt_vector) * transpose(Rt_p) / norm(Rt_p)^2
You add that extra factor so that when you do the multiplication with Rt_p, all the terms involving Rt_p cancel and you are left with eq_b(qt_vector). The cancellation is due to the fact that
transpose(Rt_p) * Rt_p == norm(Rt_p)^2
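To see the cancellation concretely, here is a minimal numpy sketch (my own illustration, with random stand-ins for eq_b(qt_vector) and Rt_p):

import numpy as np

rng = np.random.default_rng(0)
b_vec = rng.random(5)  # stand-in for eq_b(qt_vector)
Rt_p = rng.random(5)   # stand-in for the paper's Rt_p vector

# Rank-1 square matrix: eq_b(qt_vector) * transpose(Rt_p) / norm(Rt_p)^2
B = np.outer(b_vec, Rt_p) / np.dot(Rt_p, Rt_p)

# Multiplying by Rt_p cancels the denominator, since Rt_p @ Rt_p == norm(Rt_p)**2
assert np.allclose(B @ Rt_p, b_vec)
print(B.shape)  # (5, 5): square, as the paper requires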

Related

How to understand this efficient implementation of PageRank calculation

For reference, I'm using this page. I understand the original pagerank equation
but I'm failing to understand why the sparse-matrix implementation is correct. Below is their code reproduced:
def compute_PageRank(G, beta=0.85, epsilon=10**-4):
    '''
    Efficient computation of the PageRank values using a sparse adjacency
    matrix and the iterative power method.

    Parameters
    ----------
    G : boolean adjacency matrix (np.bool8).
        If the element j,i is True, it means that there is a link from i to j.
    beta : 1 - teleportation probability.
    epsilon : stop condition. Minimum allowed amount of change in the PageRanks
        between iterations.

    Returns
    -------
    output : tuple
        PageRank array, normalized to one.
        Number of iterations.
    '''
    # Test that the adjacency matrix is OK
    n, _ = G.shape
    assert G.shape == (n, n)
    # Constants (speed-up)
    deg_out_beta = G.sum(axis=0).T / beta  # vector
    # Initialize
    ranks = np.ones((n, 1)) / n  # vector
    time = 0
    flag = True
    while flag:
        time += 1
        with np.errstate(divide='ignore'):  # Ignore division by 0 on ranks/deg_out_beta
            new_ranks = G.dot(ranks / deg_out_beta)  # vector
        # Leaked PageRank
        new_ranks += (1 - new_ranks.sum()) / n
        # Stop condition
        if np.linalg.norm(ranks - new_ranks, ord=1) <= epsilon:
            flag = False
        ranks = new_ranks
    return (ranks, time)
To start, I'm trying to trace the code and understand how it relates to the PageRank equation. For the line under the with statement (new_ranks = G.dot((ranks/deg_out_beta))), this looks like the first part of the equation (the beta times M) BUT it seems to be ignoring all divide by zeros. I'm confused by this because the PageRank algorithm requires us to replace zero columns with ones (except along the diagonal). I'm not sure how this is accounted for here.
The next line new_ranks += (1-new_ranks.sum())/n is what I presume to be the second part of the equation. I can understand what this does, but I can't see how this translates to the original equation. I would've thought we would do something like new_ranks += (1-beta)*ranks.sum()/n.
This happens because of the column-sum construction of M: every column of M sums to one, so
e.T * (M * r) = e.T * r,
i.e. multiplication by M preserves the total mass of r. The convex combination with coefficient beta then guarantees that the new r vector again sums to 1. What the algorithm does is take the first matrix-vector product b = beta*M*r and then find a constant c so that r_new = b + c*e has sum one. In exact arithmetic this is the same as what the formula says, but in floating-point practice this approach corrects for, and prevents accumulation of, rounding error in the sum of r.
Computing it this way also allows you to ignore zero columns: the compensation for the mass they leak is computed automatically by the same constant c.
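As a quick sanity check (my own toy example, not from the linked page), you can run the function on a small sparse graph with a dangling node and verify that the mass is restored:

import numpy as np
from scipy import sparse

# 3 nodes; element (j, i) True means a link from i to j, as in the docstring.
# Node 2 is dangling: its column sums to zero.
G = sparse.csr_matrix(np.array([[0, 0, 0],
                                [1, 0, 0],
                                [1, 1, 0]], dtype=bool))

ranks, iterations = compute_PageRank(G)
print(float(ranks.sum()))  # ~1.0: the "+= (1 - new_ranks.sum())/n" step restores the leaked mass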

how to convert a DFA to a regular expression?

I am reading the book Introduction to the Theory of Computation and got stuck on this example.
Convert a DFA to an equivalent regular expression by converting it first to a GNFA (generalized nondeterministic finite automaton) and then converting the GNFA to a regular expression.
Here is the example:
[figure: the example DFA from the book]
I should use this recursively to arrive at the fourth state:
[figure: the GNFA conversion stages]
Unfortunately, I cannot understand what is going on from b to c. I only understand that we are trying to get rid of state 2, but how do we arrive at c from b?
Thank you very much!
This can be quite tricky at first, but I suggest you check Definition 1.64 and the CONVERT(G) function in the book for more clarity. As a brief explanation, using the function for each possible pair of neighbouring states:
First, from a to b, add a new start state and a new accept state;
Afterwards you need to calculate each new path after qrip is removed, in this case state 1;
So, from start to q2, you get only the label a (from epsilon concatenated with a);
The same goes from start to q3, resulting only in b;
Now, from q2 back to q2 going through qrip, you have label a to reach qrip and label a to get back, which together with the existing self-loop b gives (aa U b);
The same goes for q3 to q3 through qrip, resulting in bb; notice that there is no direct loop on q3, so no union;
Now, from q2 to q3 through qrip, you only need to concatenate a and b, resulting in the label ab;
Lastly, the other way around: from q3 to q2 going through qrip, concatenate b and a, resulting in ba, but this time take the union with the previous path between q3 and q2;
Now choose a new qrip and apply the same algorithm again.
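If it helps, here is a rough Python sketch of a single rip step (my own illustration, not the book's code; labels are regex strings, None marks a missing arrow, and 'U' is union as in the book):

def rip(R, states, qrip):
    # R maps (qi, qj) -> regular-expression string, or None if there is no arrow.
    new_R = {}
    for qi in states:
        if qi == qrip:
            continue
        for qj in states:
            if qj == qrip:
                continue
            direct = R.get((qi, qj))
            into, loop, out = R.get((qi, qrip)), R.get((qrip, qrip)), R.get((qrip, qj))
            # Detour through qrip: R(qi,qrip) (R(qrip,qrip))* R(qrip,qj)
            detour = None
            if into is not None and out is not None:
                detour = into + ("(%s)*" % loop if loop else "") + out
            if detour is None:
                new_R[(qi, qj)] = direct
            elif direct is None:
                new_R[(qi, qj)] = detour
            else:
                new_R[(qi, qj)] = "(%s U %s)" % (detour, direct)
    return new_R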
Hope the explanation was clear enough, but as said before, refer to the algorithm in the book for a better and more detailed explanation.
The two popular methods for converting a given DFA to its regular expression are:
Arden's Method
State Elimination Method
Arden's Theorem states: Let P and Q be two regular expressions over ∑. If P does not contain the empty string ε, then the equation R = Q + RP has a unique solution, namely R = QP*.
To use Arden's Theorem, the following conditions must be satisfied:
The transition diagram must not have any ε-transitions.
There must be only a single initial state.
Step-01:
Form an equation for each state, considering the transitions that come into that state.
Add ε to the equation of the initial state.
Step-02:
Bring the final state's equation into the form R = Q + RP. Provided P does not contain the empty string ε, the theorem gives the unique solution R = QP*, which is the required regular expression.
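For a quick worked example (my own, not from the book): take a DFA over {a, b} with initial state q1, final state q2, a transition q1 --a--> q2 and a self-loop q2 --b--> q2. The state equations are:
q1 = ε
q2 = q1.a + q2.b
Substituting q1 gives q2 = a + q2.b, which is exactly the form R = Q + RP with Q = a and P = b, so the theorem yields q2 = a.b*, i.e. the regular expression ab*.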

Error in caret-svm - "NAs are not allowed in subscripted assignments"

I am a beginner to R. I am trying to use caret-SVM to perform classification. The kernel is svmPoly.
First, I used the default parameters to train the model with leave-one-out cross-validation.
The code is:
ctrl <- trainControl(method = "LOOCV",
                     classProbs = T,
                     savePredictions = T,
                     repeats = 1)
modelFit <- train(group ~ ., data = table_svm, method = "svmPoly",
                  preProc = c("center", "scale"),
                  trControl = ctrl)
The best accuracy was 80%, and the final values used for the model were degree = 1, scale = 0.1 and C = 1.
Second, I tried to tune the parameters.
The code is:
grid_svmpoly <- expand.grid(degree = c(1:11), scale = seq(0, 5, length.out = 25), C = 10^c(0:4))
modelFit_tune <- train(group ~ ., data = table_svm, method = "svmPoly",
                       preProc = c("center", "scale"),
                       tuneGrid = grid_svmpoly,
                       trControl = ctrl)
I got an error message: Error in { :
task 264 failed - "NAs are not allowed in subscripted assignments"
I checked the data and found no NA.
There must be some NA inside the data set. I am not new to this, but I am no expert either. To ensure there is no NA, first convert the data set into matrix format using:
x <- data.matrix(dataframe)
then use the which() function, which is very handy in this case:
which(is.na(x))
I hope this helps you find the answer. Note that the returned indices count through the matrix in column-major order (down each column), since that is how R stores matrices.

Can good type systems distinguish between matrices in different bases?

My program (Hartree-Fock/iterative SCF) has two matrices F and F' which are really the same matrix expressed in two different bases. I just lost three hours of debugging time because I accidentally used F' instead of F. In C++, the type-checker doesn't catch this kind of error because both variables are Eigen::Matrix<double, 2, 2> objects.
I was wondering, for the Haskell/ML/etc. people, whether if you were writing this program you would have constructed a type system where F and F' had different types? What would that look like? I'm basically trying to get an idea how I can outsource some logic errors onto the type checker.
Edit: The basis of a matrix is like the unit. You can say 1 litre or the equivalent number of gallons; they both mean the same thing. Or, to give a vector example, you can say (0,1) in Cartesian coordinates or (1,pi/2) in polar. But even though the meaning is the same, the numerical values are different.
Edit: Maybe units was the wrong analogy. I'm not looking for some kind of record type where I can specify that the first field will be litres and the second gallons, but rather a way to say that this matrix as a whole, is defined in terms of some other matrix (the basis), where the basis could be any matrix of the same dimensions. E.g., the constructor would look something like mkMatrix [[1, 2], [3, 4]] [[5, 6], [7, 8]] and then adding that object to another matrix would type-check only if both objects had the same matrix as their second parameters. Does that make sense?
Edit: definition on Wikipedia, worked examples
This is entirely possible in Haskell.
Statically checked dimensions
Haskell has arrays with statically checked dimensions, where the dimensions can be manipulated and checked statically, preventing indexing into the wrong dimension. Some examples:
This will only work on 2-D arrays:
multiplyMM :: Array DIM2 Double -> Array DIM2 Double -> Array DIM2 Double
An example from repa should give you a sense. Here, taking a diagonal requires a 2D array and returns a 1D array of the same type.
diagonal :: Array DIM2 e -> Array DIM1 e
or, from Matt Sottile's repa tutorial, statically checked dimensions on a 3D matrix transform:
f :: Array DIM3 Double -> Array DIM2 Double
f u =
    let slabX = (Z:.All:.All:.(0::Int))
        slabY = (Z:.All:.All:.(1::Int))
        u'    = (slice u slabX) * (slice u slabX) +
                (slice u slabY) * (slice u slabY)
    in
        R.map sqrt u'
Statically checked units
Another example from outside of matrix programming: statically checked units of dimension, making it a type error to confuse e.g. feet and meters, without doing the conversion.
Prelude> 3 *~ foot + 1 *~ metre
1.9144 m
or for a whole suite of SI units and quantities.
E.g. can't add things of different dimension, such as volumes and lengths:
> 1 *~ centi litre + 2 *~ inch
Error:
Expected type: Unit DVolume a1
Actual type: Unit DLength a0
So, following the repa-style array dimension types, I'd suggest adding a Base phantom type parameter to your array type, and using that to distinguish between bases. In Haskell, the Dim index type argument gives the rank of the array (i.e. its shape), and you could do something similar for the basis.
Or, if by base you mean some dimension on the units, using dimensional types.
So, yep, this is almost a commodity technique in Haskell now, and there's some examples of designing with types like this to help you get started.
This is a very good question. I don't think you can encode the notion of a basis in most type systems, because essentially anything the type checker does needs to terminate, and deciding whether two real-valued vectors are equal is too difficult: you could have (2 v_1) + (2 v_2) or 2 (v_1 + v_2), for example. There are some languages which use dependent types (see Wikipedia), but these are relatively academic.
I think most of your debugging pain would be alleviated if you simply encoded the bases in which your matrix works along with the matrix. For example,
newtype Matrix = Matrix { transform :: [[Double]],
                          srcbasis :: [Double], dstbasis :: [Double] }
and then, when you multiply M from basis a to b with N, check that N is from b to c, and return a matrix with basis a to c.
NOTE -- it seems most people here have programming instead of math background, so I'll provide short explanation here. Matrices are encodings of linear transformations between vector spaces. For example, if you're encoding a rotation by 45 degrees in R^2 (2-dimensional reals), then the standard way of encoding this in a matrix is saying that the standard basis vector e_1, written "[1, 0]", is sent to a combination of e_1 and e_2, namely [1/sqrt(2), 1/sqrt(2)]. The point is that you can encode the same rotation by saying where different vectors go, for example, you could say where you're sending [1,1] and [1,-1] instead of e_1=[1,0] and e_2=[0,1], and this would have a different matrix representation.
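To make the NOTE concrete, here is a small numpy sketch (my own, with made-up numbers) showing the same rotation expressed in two bases:

import numpy as np

theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # 45-degree rotation in the standard basis

P = np.array([[1.0,  1.0],
              [1.0, -1.0]])  # columns are the alternative basis vectors [1,1] and [1,-1]

R_alt = np.linalg.inv(P) @ R @ P  # the SAME rotation, expressed in the new basis

print(R)      # the familiar rotation matrix
print(R_alt)  # different entries, same linear map: easy to mix up without help from the types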
Edit 1
If you have a finite set of bases you are working with, you can do it...
{-# LANGUAGE EmptyDataDecls #-}

data BasisA
data BasisB
data BasisC

newtype Matrix a b = Matrix { coefficients :: [[Double]] }

multiply :: Matrix a b -> Matrix b c -> Matrix a c
multiply (Matrix a_coeff) (Matrix b_coeff) = Matrix multiplied
  where multiplied = undefined -- your algorithm here
Then, in ghci (the interactive Haskell interpreter),
*Matrix> let m = Matrix [[1, 2], [3, 4]] :: Matrix BasisA BasisB
*Matrix> m `multiply` m
<interactive>:1:13:
    Couldn't match expected type `BasisB'
           against inferred type `BasisA'
*Matrix> let m2 = Matrix [[1, 2], [3, 4]] :: Matrix BasisB BasisC
*Matrix> m `multiply` m2
-- works after you finish defining show and the multiplication algorithm
While I realize this does not strictly address the (clarified) question – my apologies – it seems relevant at least in relation to Don Stewart's popular answer...
I am the author of the Haskell dimensional library that Don referenced and provided examples from. I have also been writing – somewhat under the radar – an experimental rudimentary linear algebra library based on dimensional. This linear algebra library statically tracks the sizes of vectors and matrices as well as the physical dimensions ("units") of their elements on a per element basis.
This last point – tracking physical dimensions on a per-element basis – is rather challenging and perhaps overkill for most uses, and one could even argue that it makes little mathematical sense to have quantities of different physical dimensions as elements in any given vector/matrix. However, some linear algebra applications of interest to me, such as Kalman filtering and weighted least squares estimation, typically use heterogeneous state vectors and covariance matrices.
Using a Kalman filter as an example, consider a state vector x_ = [d, v] which has physical dimensions [L, LT^-1]. The next (future) state vector is predicted by multiplication by the state transition matrix F, i.e.: x' = F x_. Clearly, for this equation to make sense, F cannot be arbitrary but must have size and physical dimensions [[1, T], [T^-1, 1]]. The predict_x' function below statically ensures that this relationship holds:
predict_x' :: (Num a, MatrixVector f x x) => Mat f a -> Vec x a -> Vec x a
predict_x' f x_ = f |*< x_
(The unsightly operator |*< denotes multiplication of a matrix on the left with a vector on the right.)
More generally, for an a priori state vector x_ of arbitrary size and with elements of arbitrary physical dimensions, passing a state transition matrix f with "incompatible" size and/or physical dimensions to predict_x' will cause a compile time error.
In F# (which originally evolved from OCaml), you can use units of measure. Andrew Kennedy, who designed the feature (and also created a very interesting theory behind it), has a great series of articles that demonstrate it.
This can quite likely be used in your scenario - although I don't fully understand the question. For example, you can declare two unit types like this:
[<Measure>] type litre
[<Measure>] type gallon
Adding litres and gallons gives you a compile time error:
1.0<litre> + 1.0<gallon> // Error!
F# doesn't automatically insert conversion between different units, but you can write a conversion function:
let toLitres gal = gal * 3.78541178<litre/gallon>
1.0<litre> + (toLitres 1.0<gallon>)
The beautiful thing about units of measure in F# is that they are automatically inferred and functions are generic. If you multiply 1.0<gallon> * 1.0<gallon>, the result is 1.0<gallon^2>.
People have used this feature for various things - ranging from conversion of virtual meters to screen pixels (in solar system simulations) to converting currencies (dollars in financial systems). Although I'm no expert, it is quite likely that you could use it in some way for your problem domain too.
If it's expressed in a different base, you can just add a template parameter to act as the base. That will differentiate those types. A float is a float is a float; if you don't want two float values to be treated as the same thing even when they hold the same kind of value, then you need to tell the type system about it.
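For what it's worth, the same phantom-parameter trick can be sketched in Python and checked with mypy (a sketch under my own names; the basis tag is erased at runtime, so only the static checker enforces it):

from typing import Generic, TypeVar

B = TypeVar("B")  # phantom "basis" parameter

class BasisA: ...
class BasisB: ...

class Matrix(Generic[B]):
    def __init__(self, coefficients: list[list[float]]) -> None:
        self.coefficients = coefficients

    def __add__(self, other: "Matrix[B]") -> "Matrix[B]":
        # Element-wise addition; only legal when both operands share the basis B.
        return Matrix([[x + y for x, y in zip(r1, r2)]
                       for r1, r2 in zip(self.coefficients, other.coefficients)])

f: Matrix[BasisA] = Matrix([[1.0, 0.0], [0.0, 1.0]])
f_alt: Matrix[BasisB] = Matrix([[0.5, 0.5], [0.5, -0.5]])

f + f      # fine: both operands are in BasisA
f + f_alt  # mypy error: "Matrix[BasisB]" is incompatible with "Matrix[BasisA]"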

Understanding Bayes' Theorem

I'm working on an implementation of a Naive Bayes Classifier. Programming Collective Intelligence introduces this subject by describing Bayes' Theorem as:
Pr(A | B) = Pr(B | A) x Pr(A)/Pr(B)
As well as a specific example relevant to document classification:
Pr(Category | Document) = Pr(Document | Category) x Pr(Category) / Pr(Document)
I was hoping someone could explain to me the notation used here, what do Pr(A | B) and Pr(A) mean? It looks like some sort of function but then what does the pipe ("|") mean, etc?
Pr(A | B) = Probability of A happening given that B has already happened
Pr(A) = Probability of A happening
But the above is with respect to the calculation of conditional probability. What you want is a classifier, which uses this principle to decide whether something belongs to a category based on the prior probability.
See http://en.wikipedia.org/wiki/Naive_Bayes_classifier for a complete example
I think they've got you covered on the basics.
Pr(A | B) = Pr(B | A) x Pr(A)/Pr(B)
reads: the probability of A given B is the same as the probability of B given A times the probability of A divided by the probability of B. It's usually used when you can measure the probability of B and you are trying to figure out if B is leading us to believe in A. Or, in other words, we really care about A, but we can measure B more directly, so let's start with what we can measure.
Let me give you one derivation that makes this easier for writing code. It comes from Judea Pearl. I struggled with this a little, but after I realized how Pearl helps us turn theory into code, the light turned on for me.
Prior Odds:
O(H) = P(H) / (1 - P(H))
Likelihood Ratio:
L(e|H) = P(e|H) / P(e|¬H)
Posterior Odds:
O(H|e) = L(e|H)O(H)
In English, we are saying that the odds of something you're interested in (H for hypothesis) are simply the number of times you find something to be true divided by the times you find it not to be true. So, say one house is robbed every day out of 10,000. That means that you have a 1/10,000 chance of being robbed, without any other evidence being considered.
The next one measures the evidence you're looking at: what is the probability of seeing the evidence you're seeing when your hypothesis is true, divided by the probability of seeing that evidence when your hypothesis is not true? Say you hear your burglar alarm go off. How often does the alarm go off when it's supposed to (someone opens a window while the alarm is on) versus when it's not supposed to (the wind set the alarm off)? If you have a 95% chance of a burglar setting off the alarm and a 1% chance of something else setting off the alarm, then you have a likelihood ratio of 95.0.
Your overall belief is just the likelihood ratio times the prior odds. In this case it is:
((0.95/0.01) * ((10**-4)/(1 - (10**-4))))
# => 0.0095009500950095
I don't know if this makes it any more clear, but it tends to be easier to have some code that keeps track of prior odds, other code to look at likelihoods, and one more piece of code to combine this information.
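A minimal sketch of those three pieces in Python (my own helper names, reproducing the burglary numbers above):

def prior_odds(p_h):
    # O(H) = P(H) / (1 - P(H))
    return p_h / (1 - p_h)

def likelihood_ratio(p_e_given_h, p_e_given_not_h):
    # L(e|H) = P(e|H) / P(e|~H)
    return p_e_given_h / p_e_given_not_h

def posterior_odds(p_h, p_e_given_h, p_e_given_not_h):
    # O(H|e) = L(e|H) * O(H)
    return likelihood_ratio(p_e_given_h, p_e_given_not_h) * prior_odds(p_h)

print(posterior_odds(10**-4, 0.95, 0.01))  # => 0.0095009500950095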
I have implemented it in Python. It's very easy to understand because all the formulas for Bayes' theorem are in separate functions:
# Bayes' Theorem
def get_outcomes(sample_space, f_name='', e_name=''):
    outcomes = 0
    for e_k, e_v in sample_space.items():
        if f_name == '' or f_name == e_k:
            for se_k, se_v in e_v.items():
                if e_name != '' and se_k == e_name:
                    outcomes += se_v
                elif e_name == '':
                    outcomes += se_v
    return outcomes

def p(sample_space, f_name):
    return get_outcomes(sample_space, f_name) / get_outcomes(sample_space, '', '')

def p_inters(sample_space, f_name, e_name):
    return get_outcomes(sample_space, f_name, e_name) / get_outcomes(sample_space, '', '')

def p_conditional(sample_space, f_name, e_name):
    return p_inters(sample_space, f_name, e_name) / p(sample_space, f_name)

def bayes(sample_space, f, given_e):
    total = 0
    for e_k, e_v in sample_space.items():
        total += p(sample_space, e_k) * p_conditional(sample_space, e_k, given_e)
    return p(sample_space, f) * p_conditional(sample_space, f, given_e) / total

sample_space = {'UK': {'Boy': 10, 'Girl': 20},
                'FR': {'Boy': 10, 'Girl': 10},
                'CA': {'Boy': 10, 'Girl': 30}}

print('Probability of being from FR:', p(sample_space, 'FR'))
print('Probability to be French Boy:', p_inters(sample_space, 'FR', 'Boy'))
print('Probability of being a Boy given a person is from FR:', p_conditional(sample_space, 'FR', 'Boy'))
print('Probability to be from France given person is Boy:', bayes(sample_space, 'FR', 'Boy'))

sample_space = {'Grow':  {'Up': 160, 'Down': 40},
                'Slows': {'Up': 30, 'Down': 70}}

print('Probability economy is growing when stock is Up:', bayes(sample_space, 'Grow', 'Up'))
Pr(A | B): conditional probability of A, i.e. the probability of A given that all we know is B
Pr(A): prior probability of A
Pr is the probability, Pr(A|B) is the conditional probability.
Check Wikipedia for details.
The pipe (|) means "given".
The probability of A given B is equal to the probability of B given A, times Pr(A)/Pr(B).
Based on your question, I strongly advise that you first read an undergraduate book on probability theory. Without it you will not advance properly with your task on the Naive Bayes Classifier.
I would recommend this book: http://www.athenasc.com/probbook.html, or look at MIT OpenCourseWare.
The pipe is used to represent conditional probability.
Pr(A | B) = Probability of A given B
Example:
Let's say you are not feeling well, so you surf the web for your symptoms, and the internet tells you that if you have these symptoms then you have XYZ disease.
In this case:
Pr(A | B) is what you are trying to find out, which is:
The probability of you having XYZ GIVEN THAT you have certain symptoms.
Pr(A) is the probability of having the disease XYZ
Pr(B) is the probability of having those symptoms
Pr(B | A) is what you find out from the internet, which is:
The probability of having the symptoms GIVEN THAT you have the disease.
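Plugging in some made-up numbers to make that concrete:

p_b_given_a = 0.9  # Pr(symptoms | disease), what the internet tells you
p_a = 0.01         # Pr(disease), the prior
p_b = 0.05         # Pr(having those symptoms at all)

p_a_given_b = p_b_given_a * p_a / p_b  # Bayes' theorem
print(p_a_given_b)  # 0.18: the symptoms alone are far from a sure diagnosis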
