Fitting of GLM with statsmodels - python-3.x

Python's statsmodels module offers a set of methods to estimate GLM as illustrated in https://www.statsmodels.org/devel/examples/notebooks/generated/glm.html
e.g.
glm_binom = sm.GLM(data.endog, data.exog, family=sm.families.Binomial())
What is the link function in the above example? Is it the logit link? How can I use another link, such as loglog?
I tried the following without success:
glm_binom = sm.GLM(data.endog, data.exog, family=sm.families.Binomial(link = 'loglog'))
Any pointers would be very helpful.

In the latest statsmodels stable release (currently v0.13.2), only the following link functions are available for each sm.families.family:
Family         ident  log  logit  probit  cloglog  pow  opow  nbinom  loglog  logc
Gaussian       X      X    X      X       X        X    X     X       X
Inv Gaussian   X      X                            X
Binomial       X      X    X      X       X        X    X             X       X
Poisson        X      X                            X
Neg Binomial   X      X                            X          X
Gamma          X      X                            X
Tweedie        X      X                            X
Alternatively, the list of available link functions can be obtained by:
sm.families.family.<familyname>.links
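Lastly, to change the default link function of a GLM in statsmodels, pass a link instance (not a string) as the link argument of the family:
import statsmodels.api as sm
sm.GLM(y, X, family=sm.families.Binomial(link=sm.families.links.loglog()))
In later releases (0.14 onward, to my knowledge) the link classes are spelled in CamelCase, e.g. sm.families.links.LogLog(), and the lowercase names are deprecated.
P.S. The default link for the Binomial family is the logit link.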


distribution of a vector (U,V)

I'm struggling with the following statistics exercise, any help would be really appreciated
find the distribution of the vector (U,V) given that
U = X*Y and V = (1-X)*Y
X and Y are independent
X assumes beta distribution with parameters (a1,a2)
Y assumes chi-squared distribution with parameter 2*(a1+a2)

Unresolved top level overloading

Task is to find all two-valued numbers representable as the sum of the sqrt's of two natural numbers.
I try this:
func = [sqrt (x) + sqrt (y) | x <- [10..99], y <- [10..99], sqrt (x) `mod` 1 == 0, sqrt (y) `mod` 1 == 0]
Result:
Unresolved top-level overloading Binding : func
Outstanding context : (Integral b, Floating b)
How can I fix this?
This happens because of a conflict between these two types:
sqrt :: Floating a => a -> a
mod :: Integral a => a -> a -> a
Because you write mod (sqrt x) 1, and sqrt is constrained to return the same type as it takes, the compiler is left trying to find a type for x that simultaneously satisfies the Floating constraint of sqrt and the Integral constraint of mod. There are no types in the base library that satisfy both constraints.
A quick fix is to use mod' :: Real a => a -> a -> a:
import Data.Fixed
func = [sqrt (x) + sqrt (y) | x <- [10..99], y <- [10..99], sqrt (x) `mod'` 1 == 0, sqrt (y) `mod'` 1 == 0]
However, from the error you posted, it looks like you may not be using GHC, and mod' is probably a GHC-ism. In that case you could copy the definition (and the definition of the helper function div') from here.
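If copying mod' is not appealing, another option is to keep x and y integral and convert only at the point where sqrt is called. Below is a minimal sketch, assuming the candidates are Ints in [10..99]; the names intSqrt, isSquare and sums are made up for this illustration and do not come from the question.

```haskell
-- Integer square root via Double; adequate for small inputs such as [10..99].
intSqrt :: Int -> Int
intSqrt n = floor (sqrt (fromIntegral n :: Double))

-- A number is a perfect square if squaring its integer root gives it back.
isSquare :: Int -> Bool
isSquare n = r * r == n
  where
    r = intSqrt n

-- Sums of the square roots of all pairs of perfect squares in [10..99].
sums :: [Int]
sums = [intSqrt x + intSqrt y | x <- [10 .. 99], y <- [10 .. 99], isSquare x, isSquare y]

main :: IO ()
main = print sums
```

Here fromIntegral turns the Int into a Double before sqrt sees it, so no single type has to be both Integral and Floating.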
But I recommend a more involved fix. The key observation is that if x = sqrt y, then x*x = y, so we can avoid calling sqrt at all. Instead of iterating over numbers and checking if they have a clean sqrt, we can iterate over square roots; their squares will definitely have clean square roots. A straightforward application of this refactoring might look like this:
sqrts = takeWhile (\n -> n*n <= 99)
      . dropWhile (\n -> n*n < 10)
      $ [0..]
func = [x + y | x <- sqrts, y <- sqrts]
Of course, func is a terrible name (it's not even a function!), and sqrts is a constant we could compute ourselves, and is so short we should probably just inline it. So we might then simplify to:
numberSums = [x + y | x <- [4..9], y <- [4..9]]
At this point, I would be wondering whether I really wanted to write this at all, preferring just
numberSums = [8..18]
which, unlike the previous iteration, doesn't have any duplicates. It has lost all of the explanatory power of why this is an interesting constant, though, so you would definitely want a comment.
-- sums of pairs of numbers, each of whose squares lies in the range [10..99]
numberSums = [8..18]
This would be my final version.
Also, although the above definitions were not parameterized by the range to search for perfect squares in, all the proposed refactorings can be applied when that is a parameter; I leave this as a good exercise for the reader to check that they have understood each change.

How do curried functions work in Haskell?

I am learning Haskell. I got to know that any function in Haskell can take only one argument. So, if you see a function application max 2 4, it actually is (max 2) 4. What they say is that first 2 is applied (as a parameter) to max, which returns a function that takes 4 as its parameter. What I fail to understand is what happens when 2 is applied to max. What does it mean that it returns a function called (max 2)?
Let me give another example, to make my question more clear. Take this function: multiply x y z = x*y*z. They say it actually is evaluated this way: ((multiply x) y) z. Now I give this input: multiply 2 4 5
How is this evaluated?
multiply 2
returns (multiply 2) and 4 is applied as parameter:
(multiply 2) 4
Now what does this return -- ((multiply 2) 4) or multiply 8? If it multiplies 4 and 2 at this step, how does Haskell know that it has to do that (because the function can multiply only 3 parameters)?
Just think of it mathematically: suppose there is a function taking two variables, f(x, y). Fixing x = 2 gives you a new function of one variable: g(y) = f(2, y).
If f(x, y) = max(x, y) which gives the maximum of x and y, g(y) = f(2, y) = max(2, y) gives the maximum of 2 and y.
For f(x, y, z) = x * y * z, g(y, z) = f(2, y, z) = 2 * y * z, and h(z) = g(4, z) = f(2, 4, z) = 2 * 4 * z.
Also you can fix x=2 and z=4 to form p(y) = f(2, y, 4). In Haskell it is
\y -> multiply 2 y 4
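As for the implementation: multiply 2 4 does not multiply 2 and 4 at that point. It is a partial application, that is, a function still waiting for z, and because Haskell is lazily evaluated, no product is computed until the result is actually needed.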
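To make the "fix one argument, get a new function" idea concrete, here is a small sketch in Haskell itself. multiply matches the three-argument function from the question; multiply', timesTwo and timesEight are names invented for this illustration.

```haskell
-- The three-argument function from the question.
multiply :: Int -> Int -> Int -> Int
multiply x y z = x * y * z

-- The same function with the currying spelled out: each lambda takes
-- exactly one argument and returns another function (or the result).
multiply' :: Int -> (Int -> (Int -> Int))
multiply' = \x -> \y -> \z -> x * y * z

-- Supplying only the first argument yields a function of two arguments.
timesTwo :: Int -> Int -> Int
timesTwo = multiply 2

-- Supplying two arguments yields a function still waiting for z;
-- it behaves like \z -> 2 * 4 * z.
timesEight :: Int -> Int
timesEight = multiply 2 4

main :: IO ()
main = do
  print (timesTwo 4 5)           -- 40
  print (timesEight 5)           -- 40
  print (multiply' 2 4 5)        -- 40
  print (map (max 2) [1, 2, 3])  -- [2,2,3]
```

The last line shows why (max 2) is useful as a value in its own right: it is an ordinary one-argument function that can be passed to map.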

Why is this symmetry assertion wrong?

I am really confused over why there is always a counter example to my following assertion.
//assertions must NEVER be wrong
assert Symmetric{
all r: univ -> univ | some ~r iff (some x, y: univ | x not in y and y not in x and
(x->y in r) and (y->x in r))
}
check Symmetric
The counter-example always shows 1 element in univ set. However, this should not be the case since I specified that there will be some ~r iff x not in y and y not in x. The only element should not satisfy this statement.
Yet why does the model keep showing a counterexample to my assertion?
---INSTANCE---
integers={}
univ={Univ$0}
Int={}
seq/Int={}
String={}
none={}
this/Univ={Univ$0}
skolem $Symmetric_r={Univ$0->Univ$0}
Would really appreciate some guidance!
In Alloy, assertions are used to check the correctness of logic sentences (properties of your model), not to specify properties that should always hold in your model. So you didn't specify that
there will be some ~r iff x not in y and y not in x
you instead asked Alloy whether it is true that for all binary relations r, some ~r iff x not in y and y not in x [...], and Alloy answered that it is not true, and gave you a concrete example (counterexample) in which that property doesn't hold.
A couple of other points:
some ~r doesn't mean "r is symmetric"; it simply means that the transpose of r is non-empty, which is not the same. A binary relation is symmetric if it is equal to its transpose, so you can write r = ~r to express that;
instead of some x, y: univ | x not in y and y not in x and [...] you can equivalently write some disj x, y: univ | [...];
however, that some expression doesn't really express the symmetry property, because all it says is that "there are some x, y such that both x->y and y->x are in r"; instead, you want to say something like "for all x, y, if x->y is in r, then y->x is in r too".

Find triangle in the graph

I have a graph like this:
As part of a homework assignment I want to find the triangle (1->2->5). I have no idea how to find this.
In my case, I defined my graph:
type Graph = (Int, Int -> Int -> Bool)
g 2 3 = True
g 3 2 = True
g 1 2 = True
g 2 1 = True
g 1 1 = True
g n m = False
Answer to the second comment:
I did this and it works, I think.
triangles :: [(Int, Int, Int)]
triangles = [(x, y, z) | x <- [1..3], y <- [1..x], z <- [1..y], isTriangle (x, y, z)]
isTriangle :: (Int, Int, Int) -> Bool
isTriangle (x, y, z) = g x y && g y z && g x z
I removed (_,g) and (n,g) (I don't understand why we need them :)
I call triangles and it returns (1,1,1) (2,1,1) (in my case). Is it right?
I guess the first Int of Graph is a bound for your nodes (like, 6 if the nodes are in [1..6]).
Therefore, you would like a function that returns the triangles of a graph, so the type might be:
triangles :: Graph -> [(Int, Int, Int)]
Now, a triangle exists whenever, for 3 nodes, say x y and z, all the combinations return True through g.
So, you might want to consider generating all these combinations (possibly avoiding the ones that are equivalent via re-ordering), and filter out only those that validate the criterion:
isTriangle :: Graph -> (Int, Int, Int) -> Bool
isTriangle (_, g) (x, y, z) = g x y && g y z && g x z
For this, you could use a list comprehension, or the function filter which has type (a -> Bool) -> [a] -> [a]
Answer to your first comment:
First, you would need to implement the triangles function; its absence is the reason for the error. But, as you have done in test, you could simply generate these triangles on the fly.
Now, you wrote:
test = filter (isTriangle) [(x,y,z) | x <- [1..3], y <- [1..3], z <- [1..3]]
Two things about this:
First, the parentheses around isTriangle are unnecessary; more importantly, the call is incorrect, since isTriangle expects a graph as its first parameter.
Second, you are going to obtain a lot of duplicates, and if you want, you can prevent this by not generating them in the first place:
test = filter (isTriangle yourGraph) [(x,y,z) | x <- [1..3], y <- [1..x], z <- [1..y]]
Alternatively, you can dismiss the filter function by providing a guard in the list comprehension syntax, as this:
[(x, y, z) | x <- [1..3], y <- [1..x], z <- [1..y], isTriangle yourGraph (x, y, z)]
Now, I'll let you go on with the details. You will want to make this a function that takes a graph, and to replace this 3 by the number of nodes in the graph, and yourGraph by said graph.
Since you chose to use list comprehension, forget about the generating function that I wrote about earlier, its purpose was just to generate input for filter, but with the list comprehension approach you won't necessarily need it.
Answer to your second comment:
You want to write a function:
triangles :: Graph -> [(Int, Int, Int)]
triangles (n, g) = [(x, y, z) | ...]
The ... are to be replaced with the correct things, from earlier (ranges for x, y and z, as well as the predicate isTriangle).
Alternatively, you can cut this in two functions:
allTriangles :: Int -> [(Int, Int, Int)]
allTriangles n = [(x, y, z) | ...]

graphTriangles :: Graph -> [(Int, Int, Int)]
graphTriangles (n, g) = [t | t <- allTriangles n, isGraphTriangle t]
  where isGraphTriangle (x, y, z) = ...
This way, you could potentially reuse allTriangles for something else. If you don't feel the need, you can stay with the one-shot big comprehension triangles; since it's homework, you probably won't build on it.
I try not to fill all the ... so that you can do it yourself and hopefully understand :)
Correcting your solution:
First, my mistake on the ranges: they should be x <- [1..n], y <- [x+1..n], z <- [y+1..n], where n denotes the number of nodes in your graph. This way, you only capture triples where x < y < z, which ensures that you only see one occurrence of each set of three points.
Second, the reason why I put the graph as a parameter to the functions is that you might want to reuse the same function for another graph. By hardcoding g and 6 in your functions, you make them really specific to the particular graph you described, but if you want to compute triangles on a certain number of graphs, you do not want to write one function per graph!
I don’t really like your graph type but whatever. Here’s the algorithm we will use:
First find a Node x of the graph.
For every other node y see if it connects to x
If y does connect to x then for each node z, see if it connects to x and y
If so then return it.
To avoid duplicates, we require z < y < x:
nodes (n,_) = [1..n]
nodesBefore (n,_) k = [1..min n (k - 1)]
edge (_,e) x y = e x y
neighboursBefore g x = [ y | y <- nodesBefore g x, edge g x y]
triangles g = [(x,y,z) | x <- nodes g, y <- neighboursBefore g x, z <- neighboursBefore g y, edge g x z]
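To try the definitions above, here is a self-contained sketch with type signatures added. The graph exampleGraph is a hypothetical one invented for this illustration (five nodes, with a 1-2-5 triangle like the one mentioned in the question plus one extra edge); it is not the asker's g.

```haskell
type Graph = (Int, Int -> Int -> Bool)

-- Hypothetical example graph: 5 nodes, undirected edges listed once.
exampleGraph :: Graph
exampleGraph = (5, adjacent)
  where
    edges = [(1, 2), (2, 5), (1, 5), (2, 3)]
    -- Treat every listed edge as going both ways.
    adjacent a b = (a, b) `elem` edges || (b, a) `elem` edges

nodes :: Graph -> [Int]
nodes (n, _) = [1 .. n]

-- All nodes strictly smaller than k.
nodesBefore :: Graph -> Int -> [Int]
nodesBefore (n, _) k = [1 .. min n (k - 1)]

edge :: Graph -> Int -> Int -> Bool
edge (_, e) = e

-- Neighbours of x that are smaller than x.
neighboursBefore :: Graph -> Int -> [Int]
neighboursBefore g x = [y | y <- nodesBefore g x, edge g x y]

-- Each triangle is reported once, as (x, y, z) with z < y < x.
triangles :: Graph -> [(Int, Int, Int)]
triangles g =
  [ (x, y, z)
  | x <- nodes g
  , y <- neighboursBefore g x
  , z <- neighboursBefore g y
  , edge g x z
  ]

main :: IO ()
main = print (triangles exampleGraph)  -- [(5,2,1)]
```

Because the graph is passed in as an argument, the same triangles function works for any other (n, g) pair without changes.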
