I was asked this question in an interview. Can someone give me some insight into how to do this? I was stumped.
Greedy algorithms are often used as heuristics. An independent set in an
undirected graph G = (V, E) is a set of nodes I such that no edge has both endpoints
in I. In other words, if {u,v} is in E, then either u is not in I or v is not in I. The maximum independent set problem is: given G, find an independent set
of the largest possible size.
Implement a greedy algorithm for maximum independent set based on
including nodes of smallest degree.
A greedy strategy based on node degree can be the following:
I := resulting set, initially empty
V := set of unused vertices, initially all vertices
while V not empty:
    v := vertex in V with smallest degree
    I.add(v)
    V.remove(v)
    for each u adjacent to v:
        V.remove(u)
return I
The strategy is greedy, because a single decision depends only on the local situation.
The greedy algorithm builds an independent set S, which is hopefully close to maximum, step by step.
After each step, the set of vertices of the graph G is partitioned into three sets:
the set S of vertices selected so far
the set A of vertices that are adjacent to the vertices of S and therefore cannot be selected for the set S in future steps
the set R of remaining vertices that are neither in S nor in A.
In the next step one of the vertices v of R can be removed from R and added to the set S. All vertices adjacent to v must be removed from R, too, and added to A (if they are not already members of A). But which v from R should the greedy algorithm select? It should choose a vertex v from R such that the next R ( := current R \ ({v} union {vertices adjacent to v}) ) is as large as possible.
We define G[R] as the subgraph of G induced by R, that is, the subgraph of G with vertex set R and all edges of G that have both endpoints in R. Then v should be a vertex with minimal degree in G[R], which is not necessarily a vertex with minimal degree in G.
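The selection rule described above can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation: the function name `greedy_independent_set` and the adjacency-dictionary input format are assumptions made for the example, and ties are broken by taking the smallest vertex label, which requires the labels to be sortable.

```python
def greedy_independent_set(adj):
    """Greedy heuristic for a large independent set.

    adj maps each vertex to the set of its neighbours (undirected graph).
    Both the name and the input format are assumptions for this sketch.
    """
    remaining = set(adj)   # R: neither selected nor adjacent to a selected vertex
    selected = set()       # S: the independent set built so far
    while remaining:
        # Degree is measured in the induced subgraph G[R], not in G.
        # sorted() only makes the tie-breaking deterministic.
        v = min(sorted(remaining), key=lambda u: len(adj[u] & remaining))
        selected.add(v)
        # Remove v and all of its neighbours from R (the neighbours join A).
        remaining -= adj[v] | {v}
    return selected


# Path graph 0 - 1 - 2 - 3 - 4
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(greedy_independent_set(path))
```

Recomputing the degrees on every iteration keeps the sketch short at the cost of speed; a bucket or heap keyed by current degree would be the usual refinement for large graphs.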
Say I have a graph with several nodes. I need to design an algorithm which randomly creates directed edges between nodes while satisfying the following conditions:
each node has exactly one edge pointing to it
each node has exactly one edge pointing away from it
no node points to itself
For example, say my graph had three nodes, the following scenarios would be acceptable:
Node A points to B, B points to C, C points to A
Node A points to C, C points to B, B points to A
Does anyone know what the most efficient way of doing this would be? I'm using nodejs btw. For argument's sake, we can say that I am starting with an array containing the names of the nodes.
Thanks
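Let's say you have an array of vertices V with |V| = N. First, shuffle the array using any random shuffle algorithm (for example, Fisher-Yates):
V = [v_0, v_1, v_2, ..., v_(N-1)]
Now define N edges E, where e[i] = (v[i] to v[i + 1]) for i = 0, ..., N-2, and the last edge is (v[N-1] to v[0]). For N >= 2, every node then has exactly one incoming edge and one outgoing edge, and no node points to itself. Note that this always produces a single cycle through all the nodes.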
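The shuffle-then-link idea can be sketched as below. The question mentions nodejs; Python is used here only for illustration, and the function name `random_cycle_edges` is an assumption for the example. `random.shuffle` performs an in-place Fisher-Yates style shuffle.

```python
import random


def random_cycle_edges(nodes, rng=random):
    """Return directed edges forming one random cycle through all nodes.

    The name and signature are assumptions for this sketch. `rng` can be the
    `random` module or a `random.Random` instance (useful for seeding).
    """
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to avoid self-loops")
    order = list(nodes)      # copy so the caller's list is not reordered
    rng.shuffle(order)
    n = len(order)
    # Link each node to the next one; the modulo closes the cycle.
    return [(order[i], order[(i + 1) % n]) for i in range(n)]


print(random_cycle_edges(["A", "B", "C"]))
```

Each node appears exactly once as a source and once as a target, so the in-degree and out-degree conditions hold by construction.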
I'm developing an optimization problem that is a variant on Traveling Salesman. In this case, you don't have to visit all the cities, there's a required start and end point, there's a min and max bound on the tour length, you can traverse each arc multiple times if you want, and you have a nonlinear objective function that is associated with the arcs traversed (and number of times you traverse each arc). Decision variables are integers, how many times you traverse each arc.
I've developed a nonlinear integer program in Pyomo and am getting results from the NEOS server. However I didn't put in subtour constraints and my results are two disconnected subtours.
I can find integer programming formulations of TSP that say how to formulate subtour constraints, but this is a little different from the standard TSP and I'm trying to figure out how to start. Any help that can be provided would be greatly appreciated.
EDIT: problem formulation
There are 50 arcs, which do not cover every pair of nodes. The 50 decision variables N_ab are integers >= 0 and correspond to how many times you traverse from a to b. There is a length and a profit associated with each N_ab. There are two constraints requiring that the sum of length_ab * N_ab over all ab lies between a min and a max distance. I have a constraint that the sum of N_ab into each node is equal to the sum of N_ab out of the node; you can either not visit a node at all, or visit it multiple times. The objective function is nonlinear and related to the interaction between pairs of arcs (not relevant for subtours).
Subtours: looking at math.uwaterloo.ca/tsp/methods/opt/subtour.htm, the formulation isn't applicable since I am not required to visit all cities, and may not be able to. For example, let's say I have 20 nodes and 50 arcs (all arcs of length 10). The distance constraints require a tour of length exactly 30, which means I can visit at most three nodes (start at A -> B -> C -> A = length 30). So I will not visit the other nodes at all. TSP subtour elimination would require that I have edges from the node subgroup ABC to the subgroup of nonvisited nodes, which isn't needed for my problem.
Here is an approach that is adapted from the prize-collecting TSP (e.g., this paper). Let V be the set of all nodes. I am assuming V includes a depot node, call it node 1, that must be on the tour. (If not, you can probably add a dummy node that serves this role.)
Let x[i] be a decision variable that equals 1 if we visit node i at least once, and 0 otherwise. (You might already have such a decision variable in your model.)
Add these constraints, which define x[i]:
x[i] <= sum {j in V} N[i,j] for all i in V
M * x[i] >= N[i,j] for all i, j in V
In other words: x[i] cannot equal 1 if there are no edges coming out of node i, and x[i] must equal 1 if there are any edges coming out of node i.
(Here, N[i,j] is the number of times we go from i to j, and M is a sufficiently large number, for example an upper bound on the number of times you can traverse one arc.)
Here is the subtour-elimination constraint, defined for all subsets S of V such that S includes node 1, and for all nodes i in V \ S:
sum {j in S, k in V \ S} (N[j,k] + N[k,j]) >= 2 * x[i]
In other words, if we visit node i, which is not in S, then there must be at least two edges into or out of S. (A subtour would violate this constraint for S equal to the nodes that are on the subtour that contains 1.)
We also need a constraint requiring node 1 to be on the tour:
x[1] = 1
I might be playing a little fast and loose with the directional indices, i.e., I'm not sure if your model sets N[i,j] = N[j,i] or something like that, but hopefully the idea is clear enough and you can modify my approach as necessary.
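Since there is one such constraint for every subset S, a common way to use them is to solve without them, inspect the solution, and add only the violated ones. The sketch below shows the inspection step under stated assumptions: `arc_counts` is a hypothetical dictionary of solved N[i,j] values, and the function name is made up for the example. It finds the set S of nodes connected to the depot through used arcs; any visited node outside S identifies a violated constraint for that S.

```python
from collections import deque


def find_subtour_violation(arc_counts, depot):
    """Return (S, stranded) for a candidate solution.

    arc_counts maps (i, j) -> number of traversals (hypothetical format).
    S is the set of nodes connected to the depot via used arcs; stranded is
    the set of visited nodes outside S (empty if there is no subtour).
    """
    # Adjacency over arcs that are actually used, ignoring direction.
    neighbours = {}
    for (i, j), count in arc_counts.items():
        if count > 0:
            neighbours.setdefault(i, set()).add(j)
            neighbours.setdefault(j, set()).add(i)

    # Breadth-first search from the depot.
    reached = {depot}
    queue = deque([depot])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)

    stranded = set(neighbours) - reached
    return reached, stranded


# Two disconnected loops: 1 <-> 2 and 3 <-> 4
S, stranded = find_subtour_violation(
    {(1, 2): 1, (2, 1): 1, (3, 4): 1, (4, 3): 1}, depot=1
)
print(S, stranded)
```

For each node i in `stranded`, the constraint for this S would be added to the model before re-solving; how that is done depends on the solver interface and is not shown here.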
I am looking for a tool similar to graphviz that can render graphs, but that will allow me to constrain just the x coordinate of each node. Then, the tool will automatically choose y coordinates to make the graph look neat.
Basically, I want to make a timeline.
Language / platform / rendering medium are not very important.
If you want a neat-looking graph, a force-directed algorithm is going to be your best bet. One of the best ones is SFDP (developed by AT&T, included in graphviz), though I can't seem to find pseudocode or an easy implementation. I don't think there are any algorithms this specialized. Thankfully, it's easy to code your own. I'll present some pseudocode mostly lifted from Wikipedia, but with suitably one-dimensional modifications. I'll assume you have n vertices and that the vector of x-positions is x, with the i-th entry written x.i.
set all vertex velocities to (0,0)
set all vertex positions to (x.i, random)
KE = infinity
while (KE > epsilon)
    KE = 0
    for each vertex v
        force = (0,0)
        for each vertex u != v
            force = force + (0, coulomb(u, v).y)
            if u is adjacent to v
                force = force + (0, hooke(u, v).y)
        v.velocity = (v.velocity + timestep * force) * damping
        v.position = v.position + timestep * v.velocity
        KE = KE + |v.velocity| ^ 2
Here, the .y denotes taking the y-component of the force. This ensures that the x-components of the vertex positions never change from what you set them to be. The epsilon parameter is to be set by you, and should be small compared to what you expect KE (the kinetic energy) to be. Also, |v| denotes the magnitude of the vector v (all computations above are on 2-vectors, except for KE). Note that I set the mass of all the nodes to 1, but you can change that if you want.
The Hooke and Coulomb functions calculate the respective forces between nodes, each returning the force on v due to u as a 2-vector; the first is attractive and linear in the distance between vertices, the second is repulsive and inverse-quadratic, so there is a guaranteed equilibrium. These functions look something like
def hooke(u, v)
    return -k * (v.position - u.position)
def coulomb(u, v)
    return C * (v.position - u.position) / |v.position - u.position| ^ 3
where again most computations are in vector form. C and k are positive real constants; experiment with them to get the graph you want. This isn't usually necessary because the scaling factors will, in two dimensions, pretty much expand or contract the whole graph, but here the x-distances are fixed, so to get a good-looking graph you will have to change the values a bit.
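The pseudocode above could be turned into Python roughly as follows. This is a sketch under assumptions: the function name `layout_y`, all default parameter values, and the `max_iter` safety cap are illustrative and not part of the original description. It also computes all forces before moving any vertex, which differs slightly from the in-place update in the pseudocode.

```python
import math
import random


def layout_y(xs, edges, k=0.1, c=1.0, timestep=0.1, damping=0.8,
             epsilon=1e-6, max_iter=10_000, seed=0):
    """Choose y-coordinates for vertices with fixed x-coordinates xs.

    edges is a list of (u, v) index pairs. Names and defaults are assumptions.
    """
    rng = random.Random(seed)
    n = len(xs)
    ys = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    vel = [0.0] * n                      # only the y-velocity is tracked
    adjacent = {frozenset(e) for e in edges}

    for _ in range(max_iter):            # cap added so the loop always ends
        forces = []
        for v in range(n):
            fy = 0.0
            for u in range(n):
                if u == v:
                    continue
                dx, dy = xs[v] - xs[u], ys[v] - ys[u]
                dist = math.hypot(dx, dy) or 1e-9    # avoid division by zero
                fy += c * dy / dist ** 3             # Coulomb repulsion, y only
                if frozenset((u, v)) in adjacent:
                    fy += -k * dy                    # Hooke attraction, y only
            forces.append(fy)

        kinetic = 0.0
        for v in range(n):
            vel[v] = (vel[v] + timestep * forces[v]) * damping
            ys[v] += timestep * vel[v]
            kinetic += vel[v] ** 2
        if kinetic < epsilon:
            break

    return list(zip(xs, ys))


print(layout_y([0.0, 1.0, 2.0, 2.0], [(0, 1), (1, 2), (1, 3)]))
```

Because only the y-component of each force is applied, the x-coordinates in the result are exactly the ones passed in.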
I have some scattered 3D points (a 2D solution is sufficient). I want to find the different straight lines that pass through points lying near each other (say, within 10 units), where at least three points make a line. A single point could be part of different lines.
To determine whether 3 points (a,b,c) are in a line, use cross-products (2D or 3D):
V = (Vx, Vy, Vz)
Vab = b - a
Vac = c - a
CrossProd(V, W) = (VyWz - VzWy, VzWx - VxWz, VxWy - VyWx)
If CrossProd(Vab, Vac) is zero, then the points (a, b, c) are collinear. Actually, the magnitude of the cross product is twice the area of the triangle (a, b, c), so you can set a small non-zero tolerance if needed.
Re. tolerance.
The distance from b to the line through a and c is given by:
d = length(CrossProd(Vab, Vac))/ length(Vac)
You can probably compare this with an absolute tolerance given your problem description. Alternatively you might use:
sin(theta) = length(CrossProd(Vab, Vac))/ length(Vac)/ length(Vab)
Then theta is the angle between the two vectors and can be compared with a fixed tolerance.
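As a small illustration of the distance-based test, here is a Python sketch. The function names and the default tolerance are assumptions for the example; points are plain 3-tuples (use z = 0 for 2D), and a and c are assumed to be distinct so that length(Vac) is non-zero.

```python
import math


def sub(p, q):
    """Component-wise p - q for 3-tuples."""
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def cross(v, w):
    """Cross product of two 3-vectors."""
    return (v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0])


def length(v):
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)


def distance_to_line(a, b, c):
    """Distance from point b to the line through a and c (a != c assumed)."""
    vab, vac = sub(b, a), sub(c, a)
    return length(cross(vab, vac)) / length(vac)


def are_collinear(a, b, c, tol=1e-9):
    """True if b lies within tol of the line through a and c."""
    return distance_to_line(a, b, c) <= tol


print(distance_to_line((0, 0, 0), (0, 1, 0), (2, 0, 0)))
print(are_collinear((0, 0, 0), (1, 1, 1), (2, 2, 2)))
```

Testing every triple this way is cubic in the number of points, so for large point sets some pre-filtering (for example, only pairing points within the 10-unit neighbourhood) would likely be needed.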
I'm trying to make an algorithm that will find the most efficient ordering for eliminating nodes in a small Bayesian network (represented by a DAG). All of the nodes are boolean and can take two possible states, with the exception of nodes with no successors (these nodes must have a single observed value; otherwise marginalizing them out is the same as removing them).
My original plan was that I would recursively choose a remaining variable that has no remaining predecessors and, for each of its possible states, propagate the value through the graph. This would result in all possible topological orderings.
Given a topological ordering, I wanted to find the cost of marginalizing.
For instance, this graph:
U --> V --> W --> X --> Y --> Z
has only one such ordering (U,V,W,X,Y,Z).
We can factorize the joint density g(U,V,W,X,Y,Z) = f1(U) f2(V,U) f3(W,V) f4(X,W) f5(Y,X) f6(Z,Y)
So the marginalization corresponding to this ordering will be
∑(∑(∑(∑(∑(∑(g(U,V,W,X,Y,Z),Z),Y),X),W),V),U) =
∑(∑(∑(∑(∑(∑(f1(U) f2(V,U) f3(W,V) f4(X,W) f5(Y,X) f6(Z,Y),Z),Y),X),W),V),U) =
∑(f1(U)
  ∑(f2(V,U)
    ∑(f3(W,V)
      ∑(f4(X,W)
        ∑(f5(Y,X)
          ∑(f6(Z,Y),Z)
        ,Y)
      ,X)
    ,W)
  ,V)
,U)
For this graph, U --> V can be turned into a symbolic function of V in 4 steps (all U x all V). Given that, V --> W can likewise be turned into a symbolic function of W in 4 steps. So overall, it will take 18 steps (4+4+4+4+2, because Z has only one state).
Here is my question: how can I determine the minimum number of steps in which this sum can be computed for this ordering?
Thanks a lot for your help!
The number of steps to marginalize with a given elimination ordering will be roughly exponential in the size of the largest clique produced by that ordering (times the number of nodes); therefore, the fewest steps are obtained with the ordering that minimizes the size of the largest clique. That minimum clique size, minus one, is the treewidth of the graph.
The treewidth of the path graph in the question is 1.
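To make the "largest clique produced by an ordering" concrete, here is a Python sketch that simulates elimination on an undirected graph and reports the largest clique size encountered. The function name and input format are assumptions for the example; for a general Bayesian network the input would be the moralized graph (parents of each node connected, directions dropped), which for a chain is just the chain itself.

```python
def largest_elimination_clique(adj, order):
    """Size of the largest clique created when eliminating in the given order.

    adj maps each vertex to the set of its neighbours (undirected graph).
    The name and input format are assumptions for this sketch.
    """
    graph = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    largest = 0
    for v in order:
        neighbours = graph.pop(v)
        # Eliminating v forms a clique of v together with its current neighbours.
        largest = max(largest, len(neighbours) + 1)
        for a in neighbours:
            graph[a].discard(v)
            graph[a] |= neighbours - {a}            # add fill-in edges
    return largest


chain = {"U": {"V"}, "V": {"U", "W"}, "W": {"V", "X"},
         "X": {"W", "Y"}, "Y": {"X", "Z"}, "Z": {"Y"}}
print(largest_elimination_clique(chain, "UVWXYZ"))   # end-to-end order
print(largest_elimination_clique(chain, "WUVXYZ"))   # starts in the middle
```

For the chain, the end-to-end ordering never creates a clique larger than 2, which matches the treewidth of 1, while starting in the middle creates a clique of size 3.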
http://www.cs.berkeley.edu/~jordan/papers/statsci.ps