Finding number of different paths - combinatorics

I have a game in which player X wants to pass a ball to player Y, but X can be playing with more than one other player, and those other players can also pass the ball toward Y.
I want to know: how many different paths can the ball take from X to Y?
For example, if he is playing with 3 players there are 5 different paths; with 4 players, 16 paths; with 20 players there are 330665665962404000 paths; and with 40 players there are 55447192200369381342665835466328897344361743780 paths that the ball can take.
The maximum number of players that he can play with is 500.
I was thinking of using Catalan numbers. Do you think that is a correct approach to solve this?
Can you give me some tips?

At first sight, I would say that the number of possible paths can be calculated the following way (I assume a "path" is a sequence of players with no player occurring more than once).
Suppose you play with n+2 players, i.e. player X, player Y and n other players that could occur in the path.
Then the path can contain 0, 1, 2, 3, ..., n-1 or n "intermediate" players between player X (beginning) and player Y (end).
If you choose k (0 <= k <= n) players from n players in total, you can do this in (n choose k) ways.
For each of these subsets of intermediate players, there are k! possible arrangements of the players.
So this yields sum(i=0 to n: (n choose i) * i!).
For "better" reading:
---- n / n \ ---- n n! ---- n 1
\ | | \ -------- \ ------
/ | | * i! = / (n-i)! = n! / i!
---- i=0 \ i / ---- i=0 ---- i=0
But I think that these are not the Catalan numbers.
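As a quick sanity check, here is a minimal Python sketch of this sum (Python's arbitrary-precision integers keep the large values exact; the function name is illustrative):

from math import factorial

# sum over i of C(n, i) * i!  ==  sum over i of n! / (n - i)!
def paths(n):
    return sum(factorial(n) // factorial(n - i) for i in range(n + 1))

print(paths(2), paths(3))  # 5 16, matching the examples in the question
print(paths(19))           # 330665665962404000, the 20-player value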

This is really a question in combinatorics, not algorithms.
Denote the number of different paths from player X to player Y by F(n), where n is the number of players including Y but not X.
Now, how many different paths are there? Player X can either pass the ball straight to Y (1 option), or pass it to one of the other players (n-1 options). If X passes to another player, we can pretend that player is the new X, where there are n-1 players in the field (since the 'old' X is no longer in the game). That's why
F(n) = 1 + (n-1)F(n-1)
and
F(1) = 1
I'm pretty sure you can reach phimuemue's answer from this one. The question is if you prefer a recursive solution or one with summation.
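For example, here is the recurrence translated directly into Python (a minimal sketch; the function name is illustrative):

def F(n):
    # F(1) = 1; F(n) = 1 + (n - 1) * F(n - 1)
    total = 1
    for m in range(2, n + 1):
        total = 1 + (m - 1) * total
    return total

print(F(3), F(4))  # 5 16, as in the question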

I'm somewhat of a noob at this kind of searching, but a quick run through the numbers demonstrates that the more you can trim, cut out, and filter out, the faster you can do it. The numbers you cite are BIG.
First thing that comes to mind is "Is it practical to limit your search depth?" If you can limit your search depth to say 4 (an arbitrary number), your worst case number of possibilities comes out to ...
499 * 498 * 497 * 496 = 61,258,725,024 (assuming no one gets the ball twice)
This is still large, but an exhaustive search would be far faster (though still too slow for a game) than your original set of numbers.
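For reference, that product is quick to verify in Python:

from math import prod
print(prod(range(496, 500)))  # 61258725024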
I'm sure others with more experience in this area would have better suggestions. Still, I hope this helps.

If X needs to pass to Y, and there could be P1, P2, ..., Pn players in between, and you care about the order of passing, then indeed:
For 2 extra players you have the paths: X-Y, X-P1-Y, X-P2-Y, X-P1-P2-Y, X-P2-P1-Y
which gives a total of 5 different paths; similarly, for 3 extra players you have 16 different paths.
First try to reduce the problem to something known. To this end I would eliminate X and Y, since they are common to all of the above; this translates the question into: what is the sum of k-permutations of n, for k from 0 to n, where n is the number of P's?
This can be given as
f(n):=sum(n!/(n-i)!,i,0,n);
and I can confirm your findings for 19 and 39 (20 and 40 in your notation).
For f(499) I get
6633351524650661171514504385285373341733228850724648887634920376333901210587244906195903313708894273811624288449277006968181762616943058027258258920058014768423359811679381900054568501151839849768338994244697593758840394106353734267539926205845992860165295957099385939316593862710470512043836452624452665801937754479602741031832540175306674471495745716725509714798824661807396000105338256698426305553340786519843729411660457896089840381658295930455362209587765698327585913037665131195504013431486823990271059962837959407778393078276213331859189770016153265512805722812864376997337140529242894215031131618375899072989922780132488077015246576266246551484603286735418485007674249207286921801779414240854077425752351919182464902664206622037834736215298295580945851569079682952183639701057397376328170754187008425429164206646365285647875545882646729176997107332605851460212415526607757545366695048460341802079614840254694664267117469603856584752270653889630424848913719533359942725361985274851471687885265903663806182184272555073708882789845441094009797907518245726494471433964169680271980763830020431957658400573531564215436064984091520
Results obtained with wxMaxima

EDIT: After more clarification in the comments on the question, my answer is absolutely useless :) he definitely wants the number of possible routes, not the best one!
My first thought is: why do you want to know these numbers? You're certainly never going to iterate through all the paths available to 500 people (it would take far too long), and the count is too big to display in a UI in any meaningful way.
I'm assuming that you're going to try to find the best route that the ball can take in which case I would consider looking into algorithms that don't care about the number of nodes in a route.
I'd try looking at the A star algorithm and Dijkstra's algorithm.

Related

Reaching nth Stair

Find the total number of ways to reach the nth floor with the following types of moves:
Type 1: in a single move you can move from floor i to floor i+1 – you can use this move any number of times
Type 2: in a single move you can move from floor i to floor i+2 – you can use this move any number of times
Type 3: in a single move you can move from floor i to floor i+3 – but you can use this move at most k times
I know how to count the ways to reach the nth floor when Type 1, Type 2 and Type 3 moves can each be used any number of times, with DP like dp[i] = dp[i-1] + dp[i-2] + dp[i-3]. I am stuck on the condition that the Type 3 move can be used at most k times.
Can someone tell me the approach here?
While modeling any recursion or dynamic programming problem, it is important to identify the goal, constraints, states, state function, state transitions, possible state variables and initial condition aka base state. Using this information we should try to come up with a recurrence relation.
In our current problem:
Goal: Our goal here is to calculate the number of ways to reach floor n while beginning from floor 0.
Constraints: We can move from floor i to i+3 at most K times. We name it as a special move. So, one can perform this special move at most K times.
State: In this problem, our situation of being at a floor could be one way to model a state. The exact situation can be defined by the state variables.
State variables: State variables are properties of the state and are important to identify a state uniquely. Being at floor i alone is not enough in itself, as we also have the constraint K. So to identify a state uniquely we want 2 state variables: i, indicating the floor, ranging over 0..n, and k, indicating the number of special moves used so far, out of K (capital K).
State functions: In our current problem, we are concerned with finding the number of ways to reach a floor i from floor 0. We only need to define one function, number_of_ways, associated with the corresponding state to describe the problem. Depending on the problem, we may need to define more state functions.
State transitions: Here we identify how we can transition between states. We can come freely to floor i from floor i-1 and floor i-2 without consuming our special move. We can only come to floor i from floor i-3 while consuming a special move, if i >= 3 and the number of special moves used so far satisfies k < K.
In other words, possible state transitions are:
state[i,k] <== state[i-1,k] // doesn't consume special move k
state[i,k] <== state[i-2,k] // doesn't consume special move k
state[i,k+1] <== state[i-3, k] if only k < K and i >= 3
We should now be able to form the following recurrence relation using the above information. While coming up with a recurrence relation, we must ensure that all the previous states needed for the computation of the current state are computed first. We can ensure this order by computing our states in the topological order of the directed acyclic graph (DAG) formed by the defined states as its vertices and the possible transitions as directed edges. It is only possible to have such an ordering if the directed graph formed by the defined states is acyclic; otherwise, we need to rethink whether the states are correctly and uniquely defined by their state variables.
Recurrence Relation:
number_of_ways[i,k] = (number_of_ways[i-1,k] if i >= 1 else 0)
                    + (number_of_ways[i-2,k] if i >= 2 else 0)
                    + (number_of_ways[i-3,k-1] if i >= 3 and k >= 1 else 0)
Base cases:
Base cases or solutions to initial states kickstart our recurrence relation and are sufficient to compute solutions of remaining states. These are usually trivial cases or smallest subproblems that can be solved without recurrence relation.
We can have as many base conditions as we require and there is no specific limit; ideally we want a minimal set of base conditions, enough to compute the solutions of all remaining states. For the current problem, after initializing all solutions not yet computed to 0:
number_of_ways[0, 0] = 1
number_of_ways[0,k] = 0 where 0 < k <= K
Our required final answer will be sum(number_of_ways[n,k], for all 0<=k<=K).
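A direct top-down translation of this formulation into Python might look like the following sketch (memoized with lru_cache; function and variable names are illustrative):

from functools import lru_cache

def count_ways(n, K):
    @lru_cache(maxsize=None)
    def number_of_ways(i, k):
        if i == 0:
            return 1 if k == 0 else 0  # base cases
        total = number_of_ways(i - 1, k)               # Type 1 move
        if i >= 2:
            total += number_of_ways(i - 2, k)          # Type 2 move
        if i >= 3 and k >= 1:
            total += number_of_ways(i - 3, k - 1)      # Type 3 move, consumes one of K
        return total
    return sum(number_of_ways(n, k) for k in range(K + 1))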
You can use two-dimensional dynamic programming:
dp[i,j] is the solution value when exactly j Type-3 steps are used. Then
dp[i,j] = dp[i-1,j] + dp[i-2,j] + dp[i-3,j-1], and the initial value is dp[0,0] = 1, with all other entries starting at 0 (terms with negative indices are treated as 0). You can build up first the dp[i,0] values, then the dp[i,1] values, etc. Or you can use a different order, as long as all necessary values are already computed.
Following @LaszloLadanyi's approach, below is a code snippet in Python:
def solve(n, k):
    # dp[i][j] = number of ways to reach floor i using exactly j Type-3 moves
    dp = [[0 for _ in range(k + 1)] for _ in range(n + 1)]
    dp[0][0] = 1
    for j in range(k + 1):
        for i in range(1, n + 1):
            dp[i][j] += dp[i - 1][j]              # Type 1 move from floor i-1
            if i > 1:
                dp[i][j] += dp[i - 2][j]          # Type 2 move from floor i-2
            if i > 2 and j > 0:
                dp[i][j] += dp[i - 3][j - 1]      # Type 3 move from floor i-3
    return sum(dp[n])                             # total over all Type-3 counts
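A quick sanity check of the snippet (n = 3 floors with at most k = 1 Type-3 move should give 4 ways: 1+1+1, 1+2, 2+1, 3):

print(solve(3, 1))  # 4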

TSP / CPP variant - subtour constraint

I'm developing an optimization problem that is a variant on Traveling Salesman. In this case, you don't have to visit all the cities, there's a required start and end point, there's a min and max bound on the tour length, you can traverse each arc multiple times if you want, and you have a nonlinear objective function that is associated with the arcs traversed (and number of times you traverse each arc). Decision variables are integers, how many times you traverse each arc.
I've developed a nonlinear integer program in Pyomo and am getting results from the NEOS server. However I didn't put in subtour constraints and my results are two disconnected subtours.
I can find integer programming formulations of TSP that say how to formulate subtour constraints, but this is a little different from the standard TSP and I'm trying to figure out how to start. Any help that can be provided would be greatly appreciated.
EDIT: problem formulation
50 arcs, not exhaustive pairs between nodes. 50 decision variables N_ab, integer >= 0, corresponding to how many times you traverse from a to b. There is a length and a profit associated with each N_ab. There are two constraints requiring that the sum of length_ab * N_ab over all ab is between a min and a max distance. I have a constraint that the sum of N_ab into each node is equal to the sum of N_ab out of the node, so you can either not visit a node at all, or visit it multiple times. The objective function is nonlinear and related to the interaction between pairs of arcs (not relevant for subtours).
Subtours: looking at math.uwaterloo.ca/tsp/methods/opt/subtour.htm , the formulation isn't applicable since I am not required to visit all cities, and may not be able to. So for example, let's say I have 20 nodes and 50 arcs (all arcs length 10). Distance constraints are for a tour of exactly length 30, which means I can visit at most three nodes (start at A -> B -> C ->A = length 30). So I will not visit the other nodes at all. TSP subtour elimination would require that I have edges from node subgroup ABC to subgroup of nonvisited nodes - which isn't needed for my problem
Here is an approach that is adapted from the prize-collecting TSP (e.g., this paper). Let V be the set of all nodes. I am assuming V includes a depot node, call it node 1, that must be on the tour. (If not, you can probably add a dummy node that serves this role.)
Let x[i] be a decision variable that equals 1 if we visit node i at least once, and 0 otherwise. (You might already have such a decision variable in your model.)
Add these constraints, which define x[i]:
x[i] <= sum {j in V} N[i,j] for all i in V
M * x[i] >= N[i,j] for all i, j in V
In other words: x[i] cannot equal 1 if there are no edges coming out of node i, and x[i] must equal 1 if there are any edges coming out of node i.
(Here, N[i,j] is the number of times we traverse the arc from i to j, and M is a sufficiently large number, perhaps equal to the maximum number of times you can traverse one edge.)
Here is the subtour-elimination constraint, defined for all subsets S of V such that S includes node 1, and for all nodes i in V \ S:
sum {j in S} (N[i,j] + N[j,i]) >= 2 * x[i]
In other words, if we visit node i, which is not in S, then there must be at least two edges into or out of S. (A subtour would violate this constraint for S equal to the nodes that are on the subtour that contains 1.)
We also need a constraint requiring node 1 to be on the tour:
x[1] = 1
I might be playing a little fast and loose with the directional indices, i.e., I'm not sure if your model sets N[i,j] = N[j,i] or something like that, but hopefully the idea is clear enough and you can modify my approach as necessary.
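To make the linking constraints concrete, here is a hedged Pyomo sketch of just that piece (the node set, arc set, and big-M value are hypothetical placeholders; the exponential family of subtour cuts would typically be added lazily, e.g. via a ConstraintList, as violated subsets S are found):

import pyomo.environ as pyo

model = pyo.ConcreteModel()
V = list(range(1, 21))                                # hypothetical 20 nodes; node 1 is the depot
A = [(i, j) for i in V for j in V if i != j]          # hypothetical arc set
M = 10                                                # assumed max traversals of one arc

model.N = pyo.Var(A, within=pyo.NonNegativeIntegers)  # traversal counts
model.x = pyo.Var(V, within=pyo.Binary)               # node-visited indicators

# x[i] <= sum_j N[i,j]: x[i] cannot be 1 if no edges leave node i
def x_upper_rule(m, i):
    return m.x[i] <= sum(m.N[i, j] for j in V if j != i)
model.x_upper = pyo.Constraint(V, rule=x_upper_rule)

# M * x[i] >= N[i,j]: x[i] must be 1 if any edge leaves node i
def x_lower_rule(m, i, j):
    return M * m.x[i] >= m.N[i, j]
model.x_lower = pyo.Constraint(A, rule=x_lower_rule)

model.depot = pyo.Constraint(expr=model.x[1] == 1)    # node 1 must be on the tour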

Find the closest distance between every galaxy in the data and create pairs based on closest distance between them

My task is to pair up galaxies that are closest together from a large list of galaxies. I have the RA, DEC and Z of each, and a formula to work out the distance between each one from the data given. However, I can't work out an efficient method of iterating over the whole list to find the distance between EACH galaxy and EVERY other galaxy in the list, with the intention of then matching each galaxy with its nearest neighbour.
The data has been imported in the following way:
from astropy.io import fits  # assuming the standard astropy FITS reader

hdulist = fits.open("documents/RADECMASSmatch.fits")
data = hdulist[1].data  # assumed: the catalogue table lives in the first extension
CATAID = data['CATAID_1']
Xpos_DEIMOS_1 = data['Xpos_DEIMOS_1']
z = data['Z_1']
RA = data['RA']
DEC = data['DEC']
I have tried something like:
radiff = []
for i in range(0, n):
    for j in range(i + 1, n):
        radiff.append(abs(RA[i] - RA[j]))
to initially work out difference in RA and DEC between every galaxy, which does actually work but I feel like there must be a better way.
A friend suggested something along the lines of:
galaxy_coords = list(zip(data['RA'], data['DEC'], data['Z']))  # one (RA, DEC, z) tuple per galaxy
separation_matrix = np.zeros((len(galaxy_coords), len(galaxy_coords)))
done = []
for i, coords1 in enumerate(galaxy_coords):
    for j, coords2 in enumerate(galaxy_coords):
        if (j, i) in done:
            separation_matrix[i, j] += separation_matrix[j, i]  # reuse the symmetric entry
            continue
        separation = your_formula(coords1, coords2)
        separation_matrix[i, j] += separation
        done.append((i, j))
But I don't really understand this so can't readily apply it. I've tried but it yields nothing useful.
Any help with this would be much appreciated, thanks
Your friend's code seems to be generating a 2D array of the distances between each pair, and taking advantage of the symmetry (distance(x,y) = distance(y,x)). It would be slightly better if it used itertools to generate combinations, and assigned your_formula(coords1, coords2) to separation_matrix[i,j] and separation_matrix[j,i] within the same iteration, rather than having separate iterations for both i,j and j,i.
Even better would probably be this package that uses a tree-based algorithm: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.KDTree.html . It seems to be focused on rectilinear coordinates, but that should be addressable in linear time.
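A minimal sketch of that KDTree idea (it assumes you first convert RA/DEC/z to 3-D Cartesian points; the conversion below treats z as a stand-in radial coordinate, so substitute whatever geometry your distance formula implies):

import numpy as np
from scipy.spatial import cKDTree

ra, dec = np.radians(RA), np.radians(DEC)
r = z  # stand-in radius; use your cosmology's comoving distance here
points = np.column_stack([r * np.cos(dec) * np.cos(ra),
                          r * np.cos(dec) * np.sin(ra),
                          r * np.sin(dec)])

tree = cKDTree(points)
dists, idxs = tree.query(points, k=2)  # k=2: nearest neighbour besides the point itself
nearest = idxs[:, 1]                   # index of each galaxy's closest partner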

Coefficient of Variations?

I have a list of values increasing exponentially. I was asked to compute multiple coefficients of variation (CVs) from them. You might agree with me that the CV is meant for the whole set of numbers, and that dividing the set into subgroups and calculating a CV for each subgroup seems unreasonable. Is there any statistical idea behind multiple CVs, and if there is, how can a histogram be made from the CVs? I mean, what would the bins of the histogram be? I appreciate the answers in advance.
I agree with you - it does not make sense to me to calculate multiple CVs for one dataset unless there's some inferential reason for doing so.
That being said, there might actually be a reason for considering sub-groups of a dataset. In the field of statistics, context is everything. My first thought is to ask your colleague why they want you to proceed that way. Maybe there's a good reason, maybe they don't have as full a grasp of stats as you do; regardless, it should be an enlightening conversation to have.
If you do decide to go this route, here's some R code that might help (R is great - flexible, powerful, and free)
# first, simulating some fake data (100 values of measurement & group for 10 groups)
x <- rnorm(100, mean=10, sd=1)
group <- sample(LETTERS[1:10], 100, replace=T)
# first few values of each
head(data.frame(x, group))
x group
1 10.778480 F
2 9.274193 B
3 9.639143 G
4 9.080369 I
5 10.727895 D
6 10.850306 G
# this is the part you'd actually need...
# calculating the sd & avgs for each group
sds <- tapply(x, group, sd)
avgs <- tapply(x, group, mean)
# then the cv
cvs <- sds/avgs
cvs
A B C D E F G H I J
0.07859528 0.07570556 0.09370247 0.12552468 0.08897856 0.11044543 0.10947615 0.10323379 0.08908262 0.09729945
# and if you want a histogram, R makes it pretty easy
hist(cvs)

Finding the minimum number of swaps to convert one string to another, where the strings may have repeated characters

I was looking through a programming question, when the following question suddenly seemed related.
How do you convert one string to another using as few swaps as possible? The strings are guaranteed to be interconvertible (they have the same multiset of characters; this is given), but the characters can be repeated. I saw web results on the same question, though without the characters being repeated.
Any two characters in the string can be swapped.
For instance : "aabbccdd" can be converted to "ddbbccaa" in two swaps, and "abcc" can be converted to "accb" in one swap.
Thanks!
This is an expanded and corrected version of Subhasis's answer.
Formally, the problem is: given an n-letter alphabet V and two m-letter words, x and y, for which there exists a permutation p such that p(x) = y, determine the least number of swaps (permutations that fix all but two elements) whose composition q satisfies q(x) = y. Assuming that m-letter words are maps from the set {1, ..., m} to V and that p and q are permutations on {1, ..., m}, the action p(x) is defined as the composition p followed by x.
The least number of swaps whose composition is p can be expressed in terms of the cycle decomposition of p. When j1, ..., jk are pairwise distinct in {1, ..., m}, the cycle (j1 ... jk) is a permutation that maps ji to ji+1 for i in {1, ..., k - 1}, maps jk to j1, and maps every other element to itself. The permutation p is the composition of every distinct cycle (j p(j) p(p(j)) ... j'), where j is arbitrary and p(j') = j. The order of composition does not matter, since each element appears in exactly one of the composed cycles. A k-element cycle (j1 ... jk) can be written as the product (j1 jk) (j1 jk-1) ... (j1 j2) of k - 1 swaps. In general, every permutation can be written as a composition of swaps, where the number of swaps needed is m minus the number of cycles in its cycle decomposition. A straightforward induction proof shows that this is optimal.
Now we get to the heart of Subhasis's answer. Instances of the asker's problem correspond one-to-one with Eulerian (for every vertex, in-degree equals out-degree) digraphs G with vertices V and m arcs labeled 1, ..., m. For j in {1, ..., m}, the arc labeled j goes from y(j) to x(j). The problem in terms of G is to determine how many parts a partition of the arcs of G into directed cycles can have. (Since G is Eulerian, such a partition always exists.) This is because the permutations q such that q(x) = y are in one-to-one correspondence with the partitions, as follows: for each cycle (j1 ... jk) of q, there is a part whose directed cycle is comprised of the arcs labeled j1, ..., jk.
The problem with Subhasis's NP-hardness reduction is that arc-disjoint cycle packing on Eulerian digraphs is a special case of arc-disjoint cycle packing on general digraphs, so an NP-hardness result for the latter has no direct implications for the complexity status of the former. In very recent work (see the citation below), however, it has been shown that, indeed, even the Eulerian special case is NP-hard. Thus, by the correspondence above, the asker's problem is as well.
As Subhasis hints, this problem can be solved in polynomial time when n, the size of the alphabet, is fixed (it is fixed-parameter tractable). Since there are O(n!) distinguishable cycles when the arcs are unlabeled, we can use dynamic programming on a state space of size O(m^n), the number of distinguishable subgraphs. In practice, that might be sufficient for (let's say) a binary alphabet, but if I were to try to solve this problem exactly on instances with large alphabets, then I likely would try branch and bound, obtaining bounds by using linear programming with column generation to pack cycles fractionally.
@article{DBLP:journals/corr/GutinJSW14,
  author    = {Gregory Gutin and Mark Jones and Bin Sheng and Magnus Wahlstr{\"o}m},
  title     = {Parameterized Directed $k$-Chinese Postman Problem and $k$ Arc-Disjoint Cycles Problem on Euler Digraphs},
  journal   = {CoRR},
  volume    = {abs/1402.2137},
  year      = {2014},
  ee        = {http://arxiv.org/abs/1402.2137},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
You can construct the "difference" strings S and S', i.e. a string which contains the characters at the differing positions of the two strings, e.g. for acbacb and abcabc it will be cbcb and bcbc. Let us say this contains n characters.
You can now construct a "permutation graph" G which will have n nodes and an edge from i to j if S[i] == S'[j]. In the case of all unique characters, it is easy to see that the required number of swaps will be (n - number of cycles in G), which can be found out in O(n) time.
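For the all-unique case, here is a minimal Python sketch of that cycle-counting idea (names are illustrative; it assumes every character occurs exactly once in each string):

def min_swaps_unique(s, t):
    pos_in_t = {ch: j for j, ch in enumerate(t)}  # target position of each character
    perm = [pos_in_t[ch] for ch in s]             # where s[i] must end up

    seen = [False] * len(perm)
    cycles = 0
    for i in range(len(perm)):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:  # walk the whole cycle
                seen[j] = True
                j = perm[j]
    return len(perm) - cycles   # swaps needed = n - number of cycles

print(min_swaps_unique("abcd", "dbca"))  # 1: swap 'a' and 'd'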
However, in the case where there are any number of duplicate characters, this reduces to the problem of finding out the largest number of cycles in a directed graph, which, I think, is NP-hard, (e.g. check out: http://www.math.ucsd.edu/~jverstra/dcig.pdf ).
In that paper a few greedy algorithms are pointed out, one of which is particularly simple:
At each step, find the minimum-length cycle in the graph (e.g. "Find cycle of shortest length in a directed graph with positive weights").
Delete it.
Repeat until all vertices have been covered.
However, there may be efficient algorithms utilizing the properties of your case (the only one I can think of is that your graphs will be K-partite, where K is the number of unique characters in S). Good luck!
Edit:
Please refer to David's answer for a fuller and correct explanation of the problem.
Do an A* search (see http://en.wikipedia.org/wiki/A-star_search_algorithm for an explanation) for the shortest path through the graph of equivalent strings from one string to the other. Use the Levenshtein distance / 2 as your cost heuristic.
