I have been looking through the GraphX on Spark documentation and am trying to work out how to calculate all the two-step (and potentially longer) connections in a graph.
If I have the following structure
A -> b
b -> C
b -> D
Then A is connected to C and D via b: (A -> b -> C) and (A -> b -> D).
I had a look at the connected components functions, but I am not sure how to extend them to this. In reality b will be a different vertex type, but I am not sure whether that has any effect.
Any suggestions would be greatly appreciated; I am pretty new to GraphX.
It seems you just need to use the collectNeighborIds action and then join the result with a reversed copy of itself. I wrote some code:
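import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

val graph: Graph[Int, Int] = ...
// Out-neighbours of every vertex: (vertex, Array(neighbours))
val bros = graph.collectNeighborIds(EdgeDirection.Out)
// Reverse the relation: (neighbour, vertex)
val flat = bros.flatMap(x => x._2.map(y => (y, x._1)))
// Join on the intermediate vertex, then gather its out-neighbours per source vertex
val brosofbros: RDD[(VertexId, Array[VertexId])] = flat.join(bros)
  .map(x => (x._2._1, x._2._2))
  .reduceByKey(_ ++ _)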
Finally, 'brosofbros' contains each vertex id together with all of its second neighbors; in your example it would be [A, Array[C, D]] (the b vertex itself does not appear).
I am trying to add an edge to a graph in Haskell where the graph data type is defined as:
data Graph a = Graph [(a, [a])]
  deriving (Show)
Effectively, my data type is defined as: A Graph is a list of nodes stored as tuples, where the first element is the node value and the second element is a list of its edges (i.e., which other nodes it is connected to).
When inserting an edge between two nodes (u,v), you first have to check that both nodes exist and that the edge does not already exist. That is, for (u, [a,b...n]) and (v, [a,b...n]), the edge list of u must not contain v and the edge list of v must not contain u. I have two functions that perform this check. If the check passes then, because of how my data type is defined, I must insert u and v into the respective nodes' edge lists and return a new Graph.
addEdge :: Eq a => Graph a -> (a, a) -> Graph a
addEdge g (u, v)
  | (existsNode u g) && (existsNode v g) && (not (existsEdge (u, v) g)) = undefined
  | otherwise = g
-- Check if an edge between two nodes exists
existsEdge :: Eq a => (a, a) -> Graph a -> Bool
existsEdge (u, v) g = elem u (getEdges (u, v) g) && elem v (getEdges (u, v) g)
getEdges :: Eq a => (a, a) -> Graph a -> [a]
getEdges (u, v) g =
  let rightNeighbours = getNodeNeighbours (getNode u g)
      leftNeighbours = getNodeNeighbours (getNode v g)
  in rightNeighbours ++ leftNeighbours
-- Get a node given its node id
getNode :: Eq a => a -> Graph a -> (a, [a])
getNode y (Graph []) = (y, [])
getNode y (Graph (x : xs))
  | y == fst x = x
  | otherwise = getNode y (Graph xs)
I need a way to fill the part where undefined is now with something that takes each node u and v, and appends v to u's neighbors list (u,[v]) and vice versa (v, [u]), that results in returning a Graph.
For example, if we have that:
g = (Graph [(1,[2]),(2,[1]), (3,[2])])
And we want to add an edge between nodes 3 and 1. We first see that both nodes exist, and that an edge does not already exist -->
g = (Graph [(1,[2,3]),(2,[1]), (3,[2,1])])
Is there a way to do this in Haskell?
This is a nice example of a common rule of thumb:
Avoid boolean checks and then conditionally doing something. Instead write functions that attempt doing something, and fail if it's not possible.
Your task can be broken down to:
Pick out both of the nodes between which you want the edge (if they exist). You should do this in a way that also allows tweaking them, not just reading them out.
Modify the edges-list of one of the nodes, inserting the new edge (if it doesn't exist).
The failure cases should be associated with a Maybe type: if an operation isn't possible, it's typically a bad idea to have it fail silently. Instead, you should make it clear that the update wasn't applied and leave it to the caller to ignore that or do something else.
addEdge :: Eq a => Graph a -> (a, a) -> Maybe (Graph a)
Now, what do I mean by “do this in a way that allows also tweaking [the nodes]”? It means that you should return a view into the old version, as well as a function that, given a modified value of the node, reconstructs the corresponding modified version of the entire graph. Specifically, you want to modify the outgoing edges. (Modifying the node's key would require also updating all other nodes that may point to it; we don't need that.)
getNode :: Eq a => a -> Graph a -> Maybe ((a, [a]), [a] -> Graph a)
This signature looks a bit daunting; let's introduce some type synonyms to make it easier:
type Node a = (a, [a])
type NodeView a = (Node a, [a] -> Graph a)
getNode :: Eq a => a -> Graph a -> Maybe (NodeView a)
The main change in implementation is that you need to “remember” the nodes in the list that were skipped because they didn't match. I would do this with an auxiliary function
getNode' :: Eq a => a
         -> [Node a] -- ^ The nodes that need to be prepended in the reconstructed graph
         -> Graph a
         -> Maybe (NodeView a)
getNode' _ _ (Graph []) = Nothing -- Node was not found
getNode' y prep (Graph ((x,es):ns))
  | x==y -- Node was found, now create a view to it:
    = Just ( (x,es)
           , \es' -> Graph $ prep ++ (x,es') : ns )
  | otherwise -- Not found yet, but maybe later. Add currently tried node to the prep-list.
    = getNode' y ((x,es):prep) (Graph ns)
Note: the above makes a change to the reconstructed graph that may or may not matter for you. Can you spot what it is?
In practice, you shouldn't use getNode' directly, but always call it with an empty prep list. In fact, you should probably make it a local “loop body” function:
getNode :: Eq a => a -> Graph a -> Maybe (NodeView a)
getNode y = go y []
  where go _ _ (Graph []) = Nothing
        go y prep (Graph ((x,es):ns)) = ...
For the rest of the task, you can use the NodeView given by this function as a helper to create the whole updated graph.
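As a rough illustration of that last step, here is a minimal sketch of how addEdge could be assembled from such a view. It is one possible completion, not necessarily what the answer had in mind. The helper insertNeighbour is a hypothetical name, the Eq instance on Graph is added only so that results can be compared, and getNode is written here with break from the Prelude instead of the prep-list loop. The sketch updates both edge lists, as the question asks, and returns Nothing for a self-loop because the second insertion then finds the edge already present.

```haskell
data Graph a = Graph [(a, [a])]
  deriving (Show, Eq) -- Eq is an addition, used only for comparing results

type Node a = (a, [a])
type NodeView a = (Node a, [a] -> Graph a)

-- Find a node and pair it with a function that rebuilds the whole graph
-- from a modified edge list (written with 'break' rather than a prep list).
getNode :: Eq a => a -> Graph a -> Maybe (NodeView a)
getNode y (Graph nodes) = case break ((== y) . fst) nodes of
  (before, (x, es) : after) ->
    Just ((x, es), \es' -> Graph (before ++ (x, es') : after))
  _ -> Nothing

-- Hypothetical helper: append 'target' to the edge list of 'source'.
-- Fails if 'source' is missing or the edge is already there.
insertNeighbour :: Eq a => a -> a -> Graph a -> Maybe (Graph a)
insertNeighbour source target g = do
  ((_, es), rebuild) <- getNode source g
  if target `elem` es then Nothing else Just (rebuild (es ++ [target]))

addEdge :: Eq a => Graph a -> (a, a) -> Maybe (Graph a)
addEdge g (u, v) = do
  _  <- getNode v g            -- make sure v exists before changing anything
  g' <- insertNeighbour u v g  -- add v to u's edge list
  insertNeighbour v u g'       -- add u to v's edge list
```

With the graph from the question, addEdge (Graph [(1,[2]),(2,[1]),(3,[2])]) (3,1) evaluates to Just (Graph [(1,[2,3]),(2,[1]),(3,[2,1])]), while adding an edge to a missing node or repeating an existing edge gives Nothing.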
The Control.Arrow.Operations.ArrowCircuit class is for:
An arrow type that can be used to interpret synchronous circuits.
I want to know what synchronous means here. I looked it up on Wikipedia, where they are speaking of digital electronics. My electronics is quite rusty, so here is the question: what is wrong (if anything is) with such an instance for the so-called asynchronous stream processors:
data StreamProcessor a b = Get (a -> StreamProcessor a b) |
                           Put b (StreamProcessor a b) |
                           Halt
instance Category StreamProcessor where
  id = Get (\ x -> Put x id)
  Put c bc . ab = Put c (bc . ab)
  Get bbc . Put b ab = (bbc b) . ab
  Get bbc . Get aab = Get $ \ a -> (Get bbc) . (aab a)
  Get bbc . Halt = Halt
  Halt . ab = Halt
instance Arrow StreamProcessor where
...
getThroughBlocks :: [a] -> StreamProcessor a b -> StreamProcessor a b
getThroughBlocks ~(a : input) (Get f) = getThroughBlocks input (f a)
getThroughBlocks _input putOrHalt = putOrHalt
getThroughSameArgBlocks :: a -> StreamProcessor a b -> StreamProcessor a b
getThroughSameArgBlocks = getThroughBlocks . repeat
instance ArrowLoop StreamProcessor where
  loop Halt = Halt
  loop (Put (c, d) bdcd') = Put c (loop bdcd')
  loop (Get f) = Get $ \ b ->
    let
      Put (c, d) bdcd' = getThroughSameArgBlocks (b, d) (f (b, d))
    in Put c (loop bdcd')
instance ArrowCircuit StreamProcessor where
  delay b = Put b id
I reckon this solution works for us because we want someArrowCircuit >>> delay b to be someArrowCircuit delayed by one tick, with b coming before anything from it. It is easy to see that we get what we want:
someArrowCircuit >>> delay b
= someArrowCircuit >>> Put b id
= Put b id . someArrowCircuit
= Put b (id . someArrowCircuit)
= Put b someArrowCircuit
Are there any laws for such a class? If I made no mistake writing delay down, how does synchronous live alongside asynchronous?
The only law that I know of related to ArrowCircuit is actually for the similar ArrowInit class from Causal Commutative Arrows, which says that delay i *** delay j = delay (i,j). I'm pretty sure your version satisfies this (and it looks like a totally reasonable implementation), but it still feels a little strange considering that StreamProcessor isn't synchronous.
Particularly, synchronous circuits follow a pattern of a single input producing a single output. For example, if you have a Circuit a b and provide it a value of type a, then you will get one and only one output b. The "one-tick delay" that delay introduces is thus a delay of one output by one step.
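To make the one-input, one-output pattern concrete, here is a minimal sketch of what such a synchronous circuit type could look like. The names Circuit, delayC, arrC, composeC and runCircuit are assumptions made up for this illustration; they are not taken from the arrows package.

```haskell
-- A hypothetical synchronous circuit: every input yields exactly one output,
-- together with the circuit to use on the next tick.
newtype Circuit a b = Circuit { step :: a -> (b, Circuit a b) }

-- Emit the stored value now and remember the current input for the next tick.
delayC :: a -> Circuit a a
delayC x = Circuit $ \y -> (x, delayC y)

-- Lift a pure function to a stateless circuit.
arrC :: (a -> b) -> Circuit a b
arrC f = Circuit $ \a -> (f a, arrC f)

-- Sequential composition: the output of the first circuit feeds the second.
composeC :: Circuit a b -> Circuit b c -> Circuit a c
composeC (Circuit f) (Circuit g) = Circuit $ \a ->
  let (b, f') = f a
      (c, g') = g b
  in (c, composeC f' g')

-- Run a circuit over a list of inputs, one output per input.
runCircuit :: Circuit a b -> [a] -> [b]
runCircuit _ [] = []
runCircuit (Circuit f) (x:xs) =
  let (y, c') = f x in y : runCircuit c' xs
```

In this model runCircuit (delayC 100) [1,2,3] gives [100,1,2], and placing the delay before or after arrC (*2) changes which values come out ([200,2,4] versus [100,2,4]) but never how many: three inputs always produce three outputs.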
But things are a little funky for asynchronous circuits. Let's consider an example:
runStreamProcessor :: StreamProcessor a b -> [a] -> [b]
runStreamProcessor (Put x s) xs = x : runStreamProcessor s xs
runStreamProcessor _ [] = []
runStreamProcessor Halt _ = []
runStreamProcessor (Get f) (x:xs) = runStreamProcessor (f x) xs
multiplyOneThroughFive :: StreamProcessor Int Int
multiplyOneThroughFive = Get $ \x ->
  Put (x*1) $ Put (x*2) $ Put (x*3) $ Put (x*4) $ Put (x*5) multiplyOneThroughFive
Here, multiplyOneThroughFive produces 5 outputs for each input it receives. Now, consider the difference between multiplyOneThroughFive >>> delay 100 and delay 100 >>> multiplyOneThroughFive:
> runStreamProcessor (multiplyOneThroughFive >>> delay 100) [1,2]
[100,1,2,3,4,5,2,4,6,8,10]
> runStreamProcessor (delay 100 >>> multiplyOneThroughFive) [1,2]
[100,200,300,400,500,1,2,3,4,5,2,4,6,8,10]
Inserting the delay at a different point in the circuit actually caused us to produce a different number of results. Indeed, it seems as if the circuit as a whole underwent a 5-tick delay instead of just a 1-tick delay. This would definitely be unexpected behavior in a synchronous environment!
I'm still trying to grasp an intuition of pullbacks (from category theory), limits, and universal properties, and I'm not quite catching their usefulness, so maybe you could help shed some insight on that as well as verifying my trivial example?
The following is intentionally verbose. The pullback should be (p, p1, p2), and (q, q1, q2) is one example of a non-universal object to "test" the pullback against, to see whether things commute properly.
-- MY DIAGRAM, A -> B <- C
type A = Int
type C = Bool
type B = (A, C)
f :: A -> B
f x = (x, True)
g :: C -> B
g x = (1, x)
-- PULLBACK, (p, p1, p2)
type PL = Int
type PR = Bool
type P = (PL, PR)
p = (1, True) :: P
p1 = fst
p2 = snd
-- (g . p2) p == (f . p1) p
-- TEST CASE
type QL = Int
type QR = Bool
type Q = (QL, QR)
q = (152, False) :: Q
q1 :: Q -> A
q1 = ((+) 1) . fst
q2 :: Q -> C
q2 = ((||) True) . snd
u :: Q -> P
u (_, _) = (1, True)
-- (p2 . u == q2) && (p1 . u == q1)
I was just trying to come up with an example that fit the definition, but it doesn't seem particularly useful. When would I "look for" a pull back, or use one?
I'm not sure Haskell functions are the best context
in which to talk about pull-backs.
The pull-back of A -> B and C -> B can be identified with a subset of A x C,
and subset relationships are not directly expressible in Haskell's
type system. In your specific example the pull-back would be
the single element (1, True) because x = 1 and b = True are
the only values for which f(x) = g(b).
Some good "practical" examples of pull-backs may be found
starting on page 41 of Category Theory for Scientists
by David I. Spivak.
Relational joins are the archetypal example of pull-backs
which occur in computer science. The query:
SELECT ...
FROM A, B
WHERE A.x = B.y
selects pairs of rows (a,b) where a is a row from table A
and b is a row from table B and where some function of a
equals some other function of b. In this case the functions
being pulled back are f(a) = a.x and g(b) = b.y.
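Since subsets cannot be expressed directly in the type system, one way to experiment is to work at the value level with finite lists. The following is a minimal sketch under that assumption: pullback is a hypothetical helper that enumerates the pairs on which two functions agree, which is also what the join above computes. The functions f and g are the ones from the question.

```haskell
-- Hypothetical helper: the pairs (x, y) drawn from two finite lists
-- on which f and g agree. This mirrors "FROM A, B WHERE f(a) = g(b)".
pullback :: Eq b => (a -> b) -> (c -> b) -> [a] -> [c] -> [(a, c)]
pullback f g as cs = [ (x, y) | x <- as, y <- cs, f x == g y ]

-- The two functions from the question.
f :: Int -> (Int, Bool)
f x = (x, True)

g :: Bool -> (Int, Bool)
g x = (1, x)
```

Restricted to the integers 0 to 3, pullback f g [0..3] [False, True] evaluates to [(1,True)], matching the single element described above.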
Another interesting example of a pullback is type unification in type inference. You get type constraints from several places where a variable is used, and you want to find the tightest unifying constraint. I mention this example in my blog.
I am working with two other people on a Haskell project, and we are having quite a bit of trouble with the following code.
I will try to give you all the necessary information:
From the module Types.hs:
class Game g s | g -> s where
  findPossibleMoves :: Player -> g -> [(s,g)]
  identifyWinner :: g -> Player -> Maybe Player
data Player = Player1 | Player2 deriving (Eq,Ord)
and here is the code we want to implement:
generateGameTree :: Game g s => g -> Player -> Tree (g,Player) s
generateGameTree g p = ([ Node ((snd x),p) [] | x <- (findPossibleMoves p g )])
We are trying to get this to compile, but it won't work. It might be important to know what a tree looks like, so here is the definition:
data Tree v e = Node v [(e,Tree v e)] deriving (Show,Eq,Ord)
We already know that the return type of the function and the type of our expression don't match, but there must be another mistake in this.
We would appreciate any help. Thanks in advance.
You may want to split this operation into two smaller ones:
First, a generic unfold function which, given a seed of type s and a function computing one layer of the tree, generates a whole tree by repeatedly calling itself on the next generation of seeds. This definition would have the type:
unfold :: (s -> (v, [(e,s)])) -> s -> Tree v e
unfold next seed = _fillThisIn
You can then use this function to define your generateGameTree. The intuition behind using unfold is that the seed of type s represents the state of the game and the function next computes the possible new states after one move (together with the v and e "outputs").
For anyone interested:
This now compiles without errors:
generateGameTree g p = Node (g,p) [ ((fst x),generateGameTree (snd x) (nextPlayer p)) | x <- (findPossibleMoves p g) ]
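For comparison, here is a minimal sketch of how the unfold-based approach suggested earlier could be written. The body of unfold is one possible way to fill in the hole, and nextPlayer is assumed to simply swap the two players, since its definition is not shown in the post. Show is added to the derived classes of Player for convenience.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}

data Tree v e = Node v [(e, Tree v e)] deriving (Show, Eq, Ord)
data Player = Player1 | Player2 deriving (Show, Eq, Ord)

class Game g s | g -> s where
  findPossibleMoves :: Player -> g -> [(s, g)]
  identifyWinner :: g -> Player -> Maybe Player

-- Build a tree by repeatedly expanding seeds into a node value and child seeds.
unfold :: (s -> (v, [(e, s)])) -> s -> Tree v e
unfold next seed =
  let (v, children) = next seed
  in Node v [ (e, unfold next s') | (e, s') <- children ]

-- Assumed definition: the players simply alternate.
nextPlayer :: Player -> Player
nextPlayer Player1 = Player2
nextPlayer Player2 = Player1

-- The seed is the pair (game state, player to move).
generateGameTree :: Game g s => g -> Player -> Tree (g, Player) s
generateGameTree g p = unfold next (g, p)
  where
    next (g', p') =
      ( (g', p')
      , [ (m, (g'', nextPlayer p')) | (m, g'') <- findPossibleMoves p' g' ] )
```

This should build the same tree as the direct recursive definition; the only difference is that the recursion lives in unfold, which can be reused for other trees.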
I'm new to Haskell and want to collect the values stored in a self-defined tree into a record. I started with this:
data MyTree = A Int | B Int MyTree | C Double | D Double MyTree
test = B 1 ( B 1( D 0.02(A 2)))
data MyRecord = MyRecord {A, B :: Int, C :: Double, D :: (Int,Double)}
emptyRecord = MyRecord{a = 0, b = 0, c = 0, d =(0,0)}
Then I continued like this:
MyTree2MyRecord :: MyTree -> MyRecord
MyTree2MyRecord(A a1) = emptyRecord{a = a1}
MyTree2MyRecord(B b1 myTree) = emptyRecord {b = b1}
MyTree2MyRecord(C c1) = emptyRecord {c = c1}
MyTree2MyRecord(D d1 myTree) = emptyRecord {d = d1}
where mytree = MyTree2MyRecord -{dont know the recursive call to iterate through the tree and get the values of the leafs}
I understand simple examples such as summing up the leaves of a tree, but I can't figure out a solution for this problem. I would really appreciate a small hint. Thanks!
There are some problems with your code:
Function names and record field names must start with a lowercase letter. Hence the record data structure should look like this:
data MyRecord = MyRecord {a, b :: Int, c :: Double, d :: (Int,Double)}
And your function name should be like this:
myTree2MyRecord :: MyTree -> MyRecord
Now, going through your actual code, you seem to be almost on the right path. But in order to find the solution, you have to answer some questions:
What exactly does the Int denote in d :: (Int, Double) ?
When you are pattern matching for data constructors like A, B etc., you seem to be assigning the values of the MyTree ADT to the record. Think about the recursive case there. What output do you want for input like this:
myTree2MyRecord (B 1 (B 1 (A 1)))
Your current implementation just discards the value of MyTree in it:
myTree2MyRecord(B b1 myTree) = emptyRecord {b = b1}
Instead of discarding think what should happen to myTree in this case. This will lead to a recursive solution.
And finally, formulate what output you want for the input you have given in the question:
test = B 1 ( B 1( D 0.02(A 2)))
Once you answer these questions, I think you will be able to refactor the code and solve this problem yourself.
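For reference, here is a minimal sketch of the recursive shape this leads to, under one possible set of answers to the questions above. Those answers are assumptions, not something stated in the question: repeated B values are summed, and d holds the number of D nodes seen together with the sum of their values. Different answers would change only the two recursive cases.

```haskell
data MyTree = A Int | B Int MyTree | C Double | D Double MyTree

data MyRecord = MyRecord { a, b :: Int, c :: Double, d :: (Int, Double) }
  deriving (Show, Eq)

emptyRecord :: MyRecord
emptyRecord = MyRecord { a = 0, b = 0, c = 0, d = (0, 0) }

-- Assumed semantics (not from the question): B values are summed, and
-- d stores (number of D nodes, sum of their values).
myTree2MyRecord :: MyTree -> MyRecord
myTree2MyRecord (A a1) = emptyRecord { a = a1 }
myTree2MyRecord (C c1) = emptyRecord { c = c1 }
myTree2MyRecord (B b1 rest) =
  let r = myTree2MyRecord rest   -- first convert the rest of the tree
  in r { b = b r + b1 }          -- then fold this node's value into the result
myTree2MyRecord (D d1 rest) =
  let r = myTree2MyRecord rest
      (n, total) = d r
  in r { d = (n + 1, total + d1) }
```

Under these assumptions, myTree2MyRecord (B 1 (B 1 (D 0.02 (A 2)))) yields a record with a = 2, b = 2, c = 0 and d = (1, 0.02).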