How to express tree structure constraints - uml

How can the following constraints be expressed:
1) There is exactly one folder that is not a sub-directory to another directory.
(I couldn't fully understand the folder/subfolder theme, or how to describe the single allowed exception in the folder system.)
There are also some questions that follow from the first one:
2) The highest nesting of folders does not exceed the number n.
3) The total number of files on your system cannot exceed the number n.
4) The total number of files (subdirectory) in a given system cannot exceed the number n.

Your four constraints cannot be expressed using multiplicities alone.
In UML these constraints can be written using OCL, see formal/2014-02-03.
The constraints can of course be shown in a class diagram, for instance see figure 7.14 "Constraint in a note symbol" on page 37 of formal/2017-12-05.
1) There is exactly one folder that is not a sub-directory to another directory
One way to write that is:
Folder.allInstances()->select(f | f.upfolder->isEmpty())->size() = 1
where
Folder.allInstances() returns the instances of the class Folder
Folder.allInstances()->select(f | f.upfolder->isEmpty()) iterates over the instances and returns those having no upfolder
Folder.allInstances()->select(f | f.upfolder->isEmpty())->size() = 1 then checks that there is exactly one folder without upfolder
2) The highest nesting of folders does not exceed the number n
One way is to define a function computing the depth of a folder, then to check that all folders have a depth less than or equal to n:
context Folder
def: depth() : Integer =
  if upfolder->notEmpty() then
    upfolder->first().depth() + 1
  else
    0
  endif
Folder.allInstances()->forAll(f | f.depth() <= n)
where forAll is true if the condition depth() <= n is true for all the elements.
But it is only necessary to check the depth of the folders without sub-folders, so:
Folder.allInstances()
  ->select(f | f.subfolder->isEmpty())
  ->forAll(f | f.depth() <= n)
3) The total number of files on your system can not exceed the number n.
4) The total number of files (subdirectory) in a given system cannot exceed the number n.
I do not understand why (subdirectory) appears in 4, nor why 3 says "on your system" and 4 says "a given system" while nothing is said about a system in 1 and 2.
Supposing the goal is to check that the total number of files is less than or equal to n, and that the files of a folder are given by the attribute file:
Folder.allInstances()->collect(f | f.file->size())->sum() <= n
where
Folder.allInstances()->collect(f | f.file->size()) returns the collection of the number of files for all the folders
Folder.allInstances()->collect(f | f.file->size())->sum() returns the total number of files
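To see what the three OCL expressions above actually check, here is a minimal Python sketch of the same checks on an in-memory folder tree. The Folder class, the add_subfolder helper and check_constraints are hypothetical names made up for this illustration; only upfolder, subfolder, file and depth() come from the answer, and the list of folders stands in for Folder.allInstances().

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Folder:
    # Attribute names follow the OCL above; the class itself is an assumption.
    name: str
    upfolder: Optional["Folder"] = None
    subfolder: List["Folder"] = field(default_factory=list)
    file: List[str] = field(default_factory=list)

    def depth(self) -> int:
        # Same rule as the OCL helper: a folder without upfolder has depth 0.
        return 0 if self.upfolder is None else self.upfolder.depth() + 1


def add_subfolder(parent: Folder, name: str) -> Folder:
    # Hypothetical helper that keeps both ends of the association in sync.
    child = Folder(name, upfolder=parent)
    parent.subfolder.append(child)
    return child


def check_constraints(folders: List[Folder], n: int):
    # 1) exactly one folder has no upfolder
    one_root = sum(1 for f in folders if f.upfolder is None) == 1
    # 2) only folders without sub-folders need their depth checked
    depth_ok = all(f.depth() <= n for f in folders if not f.subfolder)
    # 3) total number of files over all folders
    files_ok = sum(len(f.file) for f in folders) <= n
    return one_root, depth_ok, files_ok


root = Folder("root")
a = add_subfolder(root, "a")
b = add_subfolder(a, "b")
b.file.extend(["x.txt", "y.txt"])
print(check_constraints([root, a, b], 2))
```

With n = 2 all three checks pass for this small tree; with n = 1 the depth and file-count checks fail while the single-root check still holds.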

The use of allInstances() is discouraged.
It is very desirable to have some FileSystem class that has exactly one root Folder, therefore guaranteeing the constraint in a simple multiplicity.
Is easily handled by a [0..3] multiplicity declaration.
A depth() helper helpfully cached by a derived property is a good solution.
Or just: context File inv: Folder->closure(upFolder).size() < n
context Folder inv: self->closure(subfolder).File->size() < n

Related

Algorithm, that finds the k-greatest number in O(n*log(k))

I was wondering: given an unsorted array of any length n >= k,
how would you find the k-th greatest number in O(n*log(k)) time? For example, the k = 2 greatest number of an array containing the numbers 1 to 9 would be 8.
I'm trying to code this in Python, if you have an idea how to do it in that time complexity :)
My answer is not Python-specific, but you should be able to implement the concepts in Python, or find libraries that already implement them.
The basic idea is to iterate over the list and store the current greatest, second greatest, ..., k-th greatest numbers in a separate data structure. Since you iterate over all n entries of the array, the complexity is O(n * insertion_step_complexity).
The insertion step therefore must not exceed O(log(k)). To achieve this you can use an AVL tree, which has a complexity of O(log(m)) for inserting and deleting items, where m is the number of items stored in the tree.
An algorithm would look like this:
def find_k_greatest_number(k, array):
    avl_tree = initialize AVL tree here
    avl_items = 0
    for number in array:
        if avl_items < k:
            # tree not full yet: always insert
            avl_tree.insert(number)
            avl_items += 1
        elif number > avl_tree.smallest_number():
            # tree full: the new number replaces the current smallest
            avl_tree.delete_smallest_number()
            avl_tree.insert(number)
    return avl_tree.smallest_number()
Finding the smallest number in a sorted tree depends on its height. Since the height of an AVL tree holding at most k items is O(log(k)), finding the smallest number is O(log(k)).
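Python's standard library has no AVL tree, so as a runnable sketch the same idea can be written with a binary min-heap from heapq instead. This swaps in a different data structure than the one described above: the heap holds the k greatest numbers seen so far, its root is the smallest of them, and both push and replace cost O(log(k)).

```python
import heapq


def find_k_greatest_number(k, array):
    """Return the k-th greatest number of array in O(n * log(k))."""
    heap = []  # min-heap holding the k greatest numbers seen so far
    for number in array:
        if len(heap) < k:
            # Heap not full yet: always insert.
            heapq.heappush(heap, number)
        elif number > heap[0]:
            # Heap full: drop the current smallest and insert the new number.
            heapq.heapreplace(heap, number)
    if len(heap) < k:
        raise ValueError("array has fewer than k elements")
    return heap[0]


print(find_k_greatest_number(2, [4, 9, 1, 7, 3, 8, 2, 6, 5]))
```

For the example from the question (numbers 1 to 9, k = 2) this prints 8.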

How to generate distinct solutions in Prolog for '8 out of 10 cats does countdown' numbers game solver?

I wrote a Prolog program to find all solutions to any '8 out of 10 cats does countdown' number sequence. I am happy with the result. However, the solutions are not unique. I tried distincts() and reduced() from the "solution sequences" library. They did not produce unique solutions.
The problem is simple. You have a given list of six numbers [n1,n2,n3,n4,n5,n6] and a target number (R). Calculate R from any arbitrary combination of n1 to n6 using only +,-,*,/. You do not have to use all numbers, but you can only use each number once. If two solutions are identical, only one must be generated and the other discarded.
Sometimes there are equivalent results with different arrangement. Such as:
(100+3)*6*75/50+25
(100+3)*75*6/50+25  
Does anyone have any suggestions to eliminate such redundancy?
Each solution is a nesting of operators and integers, for example +(2,*(4,-(10,5))). This solution is an unbalanced binary tree with arithmetic operators as root and inner nodes and numbers as leaf nodes. In order to have unique solutions, no two trees should be equivalent.
The Code:
:- use_module(library(lists)).
:- use_module(library(solution_sequences)).
solve(L,R,OP) :-
    findnsols(10,OP,solve_(L,R,OP),S),
    print_solutions(S).
solve_(L,R,OP) :-
    distinct(find_op(L,OP)),
    R =:= OP.
find_op(L,OP) :-
    select(N1,L,Ln),
    select(N2,Ln,[]),
    N1 > N2,
    member(OP,[+(N1,N2), -(N1,N2), *(N1,N2), /(N1,N2), N1, N2]).
find_op(L,OP) :-
    select(N,L,Ln),
    find_op(Ln,OP_),
    OP_ > N,
    member(OP,[+(OP_,N), -(OP_,N), *(OP_,N), /(OP_,N), OP_]).
print_solutions([]).
print_solutions([A|B]) :-
    format('~w~n',A),
    print_solutions(B).
Test:
solve([25,50,75,100,6,3],952,X)
Result
(100+3)*6*75/50+25 <- s1
((100+6)*3*75-50)/25 <- s2
(100+3)*75*6/50+25 <- s1
((100+6)*75*3-50)/25 <- s2
(100+3)*75/50*6+25 <- s1
true.
This code uses select/3 from the "lists" library.
UPDATE: Generate solutions using DCG
The following is an attempt to generate solutions using DCG. I was able to generate a more exhaustive solution set than with the code posted previously. In a way, using DCG resulted in more correct and elegant code. However, it is much more difficult to 'guess' what the code is doing.
The issue of redundant solutions still persists.
:- use_module(library(lists)).
:- use_module(library(solution_sequences)).
s(L) --> [L].
s(+(L,Ls)) --> [L],s(Ls).
s(*(L,Ls)) --> [L],s(Ls), {L =\= 1, Ls =\= 1, Ls =\= 0}.
s(-(L,Ls)) --> [L],s(Ls), {L =\= Ls, Ls =\= 0}.
s(/(L,Ls)) --> [L],s(Ls), {Ls =\= 1, Ls =\= 0}.
s(-(Ls,L)) --> [L],s(Ls), {L =\= Ls}.
s(/(Ls,L)) --> [L],s(Ls), {L =\= 1, Ls =\= 0}.
solution_list([N,H|[]],S) :-
    phrase(s(S),[N,H]).
solution_list([N,H|T],S) :-
    phrase(s(S),[N,H|T]);
    solution_list([H|T],S).
solve(L,R,S) :-
    permutation(L,X),
    solution_list(X,S),
    R =:= S.
Does anyone have any suggestions to eliminate such redundancy?
I suggest defining a sorting weight on each node (inner or leaf). The number resulting from reducing the child node could be used, although ties will appear. These can be broken by additionally looking at the topmost operations, sorting * before + for example. Ideally one would like a sorting operation for which a "tie" means "exactly the same subtree of arithmetic operations".
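One possible reading of this suggestion, sketched in Python rather than Prolog: bring every expression tree into a canonical form by flattening chains of the same commutative operator and sorting their operands, so that two equivalent trees compare equal. The tuple representation (op, left, right), the function name canonical and the use of repr as sorting key are assumptions made for this illustration; it only handles reordering inside + and * chains, not rearrangements across a division such as the third s1 variant in the question.

```python
COMMUTATIVE = {"+", "*"}


def canonical(expr):
    """Return a canonical form of an expression tree.

    expr is either an int or a tuple (op, left, right).
    Commutative chains become (op, (sorted operands...)).
    """
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    left, right = canonical(left), canonical(right)
    if op not in COMMUTATIVE:
        return (op, left, right)
    operands = []
    for side in (left, right):
        # A canonical commutative node is a 2-tuple (op, operands):
        # merge it into the current chain if it uses the same operator.
        if isinstance(side, tuple) and len(side) == 2 and side[0] == op:
            operands.extend(side[1])
        else:
            operands.append(side)
    # repr gives a total order over ints and nested tuples.
    return (op, tuple(sorted(operands, key=repr)))


# (100+3)*6*75/50+25 and (100+3)*75*6/50+25 from the question
a = ("+", ("/", ("*", ("*", ("+", 100, 3), 6), 75), 50), 25)
b = ("+", ("/", ("*", ("*", ("+", 100, 3), 75), 6), 50), 25)
print(canonical(a) == canonical(b))
```

Keeping only one solution per canonical form (for example by collecting the canonical forms in a set) would then discard the duplicates of this kind.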
Since the OP is only seeking hints to help solve the problem, here are a few.
Use DCG as a generator. (SWI-Prolog) (Prolog DCG Primer)
a. For a more refined version of using DCGs as a generator, look for examples that use length/2. When you understand why, you might see a beam of light shining down on you for a few moments (the light beam is a video gaming thing).
Use a constraint solver (SWI-Prolog) (CLP(FD) and CLP(ℤ): Prolog Integer Arithmetic) (Understanding CLP(FD) Prolog code of N-queens problem)
Since your solutions are constrained to the 6 numbers and the operators are always binary operators (+,-,*,/) then it is possible to enumerate the unique binary trees. If you know about OEIS then you can find related links that can help you solve this problem, but you need to give OEIS a sequence. To get a sequence for use with OEIS draw the trees for N from 2 to 5 and then enter that sequence into OEIS and see what you get. e.g.
N is the number of leaf (*) nodes.
N=2 ( 1 way to draw the tree )
  -
 / \
*   *
N=3 ( 2 ways to draw the tree )
    -          -
   / \        / \
  -   *      *   -
 / \            / \
*   *          *   *
So the sequence starts with 1,2 ...
Hint - This page (link died) shows the images of the trees to see if you have done it correctly. In the description I use N to count the number of leaves (*), but on this page they use N to count the number of internal nodes (-). If we call my N N1 and the page N N2, then the relation is N2 = N1 - 1
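If drawing the trees by hand gets tedious, the counts can also be computed. The following is a small Python sketch (the function name tree_shapes is made up here) that counts the shapes of binary trees with a given number of leaves by splitting the leaves between the left and right subtree of the root.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def tree_shapes(leaves):
    """Number of distinct binary tree shapes with the given number of leaves."""
    if leaves == 1:
        return 1  # a single leaf, no internal node
    # The root splits the leaves into a left part and a right part.
    return sum(
        tree_shapes(left) * tree_shapes(leaves - left)
        for left in range(1, leaves)
    )


print([tree_shapes(n) for n in range(2, 6)])
```

This prints [1, 2, 5, 14] for N = 2 to 5, which agrees with the two drawings above and matches the Catalan numbers shifted by one, in line with the relation N2 = N1 - 1 mentioned in the hint.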
This might be a Hamiltonian Cycle (Wolfram World) (Hamiltonianicity of the Tower of Hanoi Problem) Remember that there is a relation between Binary Trees and the Tower of Hanoi, but in your case there are added constraints. I don't know if the constraints eliminate a solution as a Hamiltonian Cycle.
Also don't think of building the final answer from a combination of any number and operator, but instead build subsets of operators and numbers, and then use those subsets to build the answer. You constrain at the start, not at the end.
Or put another way, don't think combinations at the start, but permutations of combinations (not sure if that is the correct pattern, but in the ball park) and then using that build the tree.

TSP / CPP variant - subtour constraint

I'm developing an optimization problem that is a variant on Traveling Salesman. In this case, you don't have to visit all the cities, there's a required start and end point, there's a min and max bound on the tour length, you can traverse each arc multiple times if you want, and you have a nonlinear objective function that is associated with the arcs traversed (and number of times you traverse each arc). Decision variables are integers, how many times you traverse each arc.
I've developed a nonlinear integer program in Pyomo and am getting results from the NEOS server. However I didn't put in subtour constraints and my results are two disconnected subtours.
I can find integer programming formulations of TSP that say how to formulate subtour constraints, but this is a little different from the standard TSP and I'm trying to figure out how to start. Any help that can be provided would be greatly appreciated.
EDIT: problem formulation
50 arcs, not exhaustive pairs between nodes. The 50 decision variables N_ab are integers >= 0 and correspond to how many times you traverse from a to b. There is a length and a profit associated with each N_ab. Two constraints require the sum of length_ab * N_ab over all ab to lie between a min and a max distance. Another constraint requires the sum of N_ab into each node to equal the sum of N_ab out of that node, so you can either not visit a node at all, or visit it multiple times. The objective function is nonlinear and related to the interaction between pairs of arcs (not relevant for subtours).
Subtours: looking at math.uwaterloo.ca/tsp/methods/opt/subtour.htm , the formulation isn't applicable since I am not required to visit all cities, and may not be able to. So for example, let's say I have 20 nodes and 50 arcs (all arcs length 10). Distance constraints are for a tour of exactly length 30, which means I can visit at most three nodes (start at A -> B -> C ->A = length 30). So I will not visit the other nodes at all. TSP subtour elimination would require that I have edges from node subgroup ABC to subgroup of nonvisited nodes - which isn't needed for my problem
Here is an approach that is adapted from the prize-collecting TSP (e.g., this paper). Let V be the set of all nodes. I am assuming V includes a depot node, call it node 1, that must be on the tour. (If not, you can probably add a dummy node that serves this role.)
Let x[i] be a decision variable that equals 1 if we visit node i at least once, and 0 otherwise. (You might already have such a decision variable in your model.)
Add these constraints, which define x[i]:
x[i] <= sum {j in V} N[i,j] for all i in V
M * x[i] >= N[i,j] for all i, j in V
In other words: x[i] cannot equal 1 if there are no edges coming out of node i, and x[i] must equal 1 if there are any edges coming out of node i.
(Here, N[i,j] is the number of times we go from i to j, and M is a sufficiently large number, perhaps equal to the maximum number of times you can traverse one edge.)
Here is the subtour-elimination constraint, defined for all subsets S of V such that S includes node 1, and for all nodes i in V \ S:
sum {j in S} (N[i,j] + N[j,i]) >= 2 * x[i]
In other words, if we visit node i, which is not in S, then there must be at least two edges into or out of S. (A subtour would violate this constraint for S equal to the nodes that are on the subtour that contains 1.)
We also need a constraint requiring node 1 to be on the tour:
x[1] = 1
I might be playing a little fast and loose with the directional indices, i.e., I'm not sure if your model sets N[i,j] = N[j,i] or something like that, but hopefully the idea is clear enough and you can modify my approach as necessary.
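Since there is one such constraint per subset S, a common way to use them is to add only the ones a solution actually violates. Here is a minimal Python sketch of that check, independent of Pyomo: given the arc counts of a solution, it takes S to be the set of nodes connected to node 1 through used arcs and reports every visited node outside S, for which the constraint above fails. The function names and the dict-based input format are assumptions for this illustration.

```python
def depot_component(arcs, depot):
    """Nodes connected to the depot through arcs with a positive count."""
    neighbours = {}
    for (i, j), count in arcs.items():
        if count > 0:
            # Direction is ignored, matching N[i,j] + N[j,i] in the constraint.
            neighbours.setdefault(i, set()).add(j)
            neighbours.setdefault(j, set()).add(i)
    component = {depot}
    stack = [depot]
    while stack:
        node = stack.pop()
        for other in neighbours.get(node, ()):
            if other not in component:
                component.add(other)
                stack.append(other)
    return component


def violated_nodes(arcs, depot):
    """Visited nodes i (x[i] = 1) that have no arc to or from S."""
    S = depot_component(arcs, depot)
    visited = {i for (i, _), count in arcs.items() if count > 0}
    return sorted(visited - S)


# Two disconnected subtours: 1 <-> 2 and 3 <-> 4 (the latter traversed twice).
solution = {(1, 2): 1, (2, 1): 1, (3, 4): 2, (4, 3): 2}
print(violated_nodes(solution, depot=1))
```

For this example the result is [3, 4]: with S = {1, 2}, both nodes are visited but have no arcs into or out of S, so the constraint for S and i = 3 (or i = 4) would be added and the model solved again.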

Longest repeated substring with at least k occurrences correctness

The algorithm for finding the longest repeated substring is formulated as follows:
1) build the suffix tree
2) find the deepest internal node with at least k leaf children
But I cannot understand why this works. What makes this algorithm correct? Also, the source where I found this algorithm says that it finds the repeated substring in O(n), where n is the length of the substring, which is also not clear to me. Consider the following tree: here the longest repeated substring is "ru", and if we apply DFS it will find it in 5 steps, not in 2.
Can you explain this stuff to me?
Thanks
image
I suppose you know that O(n) (Big O notation) refers to the order of growth of some quantity as a function of n, and not to the quantity being equal to n.
I mention this because reading the question left me in doubt...
I'm writing this as an answer and not a comment since it's a bit too long for a comment (I suppose...)
Given a string S of N characters, building the corresponding suffix tree is O(N) (using an algorithm such as Ukkonen's).
Now, such a suffix tree can have at most 2N - 1 nodes (root and leaves included).
If you traverse your tree and compute the number of leaves reachable from a given node along with its depth, you'll find the desired result. To do so, you start from the root and explore each of its children.
Some pseudo-code:
traverse(node, depth):
    nb_leaves <-- 0
    if empty(children(node)):
        nb_leaves <-- 1
    else:
        for child in children(node):
            nb_leaves <-- nb_leaves + traverse(child, depth+1)
    node.setdepth(depth)
    node.setoccurrences(nb_leaves)
    return nb_leaves
The initial call is traverse(root, 0). Since the structure is a tree, there is only one call to traverse for each node. This means the maximum number of calls to traverse is 2N - 1, therefore the overall traversal is only O(N). Now you just have to keep track of the node with the maximum depth that also satisfies depth > 0 && nb_leaves >= k by adding the relevant bookkeeping mechanism. This does not hinder the overall complexity.
In the end, the complexity of the algorithm to find such a substring is O(N) where N is the length of the input string (and not the length of the matching substring!).
Note: The traversal described above is basically a DFS on the suffix tree.
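To make the traversal concrete, here is a small runnable Python sketch. It is not the O(N) algorithm: instead of a real suffix tree built with Ukkonen's algorithm it builds a naive, uncompressed suffix trie in O(N^2), which is enough to show why counting leaves works. In such a trie every edge is one character, so the node depth equals the length of the substring; in a compressed suffix tree you would accumulate the edge label lengths instead. The function name and the "$" terminator (assumed not to occur in the text) are choices made for this illustration.

```python
def longest_repeated_substring(text, k):
    """Longest substring of text occurring at least k times (naive trie)."""
    s = text + "$"  # terminator: every suffix ends in its own leaf
    root = {}
    for start in range(len(s)):
        node = root
        for ch in s[start:]:
            node = node.setdefault(ch, {})

    best = ""

    def traverse(node, path):
        nonlocal best
        if not node:
            return 1  # leaf: exactly one suffix ends here
        nb_leaves = 0
        for ch, child in node.items():
            nb_leaves += traverse(child, path + ch)
        # nb_leaves is the number of suffixes starting with path,
        # i.e. the number of occurrences of path in text.
        if path and nb_leaves >= k and len(path) > len(best):
            best = path
        return nb_leaves

    traverse(root, "")
    return best


print(longest_repeated_substring("banana", 2))
```

For "banana" and k = 2 this prints ana: the node reached by "ana" has two leaves below it (the suffixes "anana" and "ana"), and no deeper node has at least two. This is also the reason the algorithm is correct: each leaf below a node is one suffix that starts with the node's string, so the leaf count is exactly the number of occurrences.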

Finding number of different paths

I have a game in which one player X wants to pass a ball to player Y, but he may be playing with more than one other player, and the other players can pass the ball to Y.
I want to know how many different paths the ball can take from X to Y.
For example, if he is playing with 3 players there are 5 different paths, with 4 players 16 paths, with 20 players 330665665962404000 paths, and with 40 players 55447192200369381342665835466328897344361743780 paths that the ball can take.
The maximum number of players that he can play with is 500.
I was thinking of using Catalan numbers. Do you think this is a correct approach?
Can you give me some tips?
At first sight, I would say that the number of possible paths can be calculated the following way (I assume a "path" is a sequence of players with no player occurring more than once).
Suppose you play with n+2 players, i.e. player X, player Y and n other players that could occur in the path.
Then the path can contain 0, 1, 2, 3, ... , n-1 or n "intermediate" players between player X (beginning) and player Y (end).
If you choose k (0 <= k <= n) players from n players in total, you can do this in (n choose k) ways.
For each of these subsets of intermediate players, there are k! possible arrangements of players.
So this yields sum(i=0 to n: (n choose i) * i!).
For "better" reading:
$$\sum_{i=0}^{n} \binom{n}{i} \, i! = \sum_{i=0}^{n} \frac{n!}{(n-i)!} = n! \sum_{i=0}^{n} \frac{1}{i!}$$
But I think that these are not the Catalan numbers.
This is really a question in combinatorics, not algorithms.
Mark the number of different paths from player X to player Y as F(n), where n is the number of players including Y but not X.
Now, how many different paths are there? Player X can either pass the ball straight to Y (1 option), or pass it to one of the other players (n-1 options). If X passes to another player, we can pretend that player is the new X, where there are n-1 players in the field (since the 'old' X is no longer in the game). That's why
F(n) = 1 + (n-1)F(n-1)
and
F(1) = 1
I'm pretty sure you can reach phimuemue's answer from this one. The question is if you prefer a recursive solution or one with summation.
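As a quick check that the recursion and the summation agree, here is a short Python sketch of both. The function names are made up for this illustration; n follows the convention of this answer (the number of players including Y but not X), so the summation runs over the n - 1 possible intermediate players. Python integers have arbitrary precision, so n = 500 is not a problem.

```python
from math import factorial


def paths_recursive(n):
    """F(n) = 1 + (n-1) * F(n-1) with F(1) = 1, computed iteratively."""
    result = 1  # F(1): X can only pass straight to Y
    for players in range(2, n + 1):
        result = 1 + (players - 1) * result
    return result


def paths_sum(n):
    """Sum of m! / (m-i)! for i = 0..m, with m = n - 1 intermediate players."""
    m = n - 1
    return sum(factorial(m) // factorial(m - i) for i in range(m + 1))


print(paths_recursive(3), paths_recursive(4))
print(all(paths_recursive(n) == paths_sum(n) for n in range(1, 60)))
```

The first line prints 5 16, the values from the question, and the second confirms that both formulas give the same numbers for the tested range.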
I'm somewhat of a noob at this kind of searching, but a quick run through the numbers demonstrates the more you can trim, cut out, filter out, the faster you can do it. The numbers you cite are BIG.
First thing that comes to mind is "Is it practical to limit your search depth?" If you can limit your search depth to say 4 (an arbitrary number), your worst case number of possibilities comes out to ...
499 * 498 * 497 * 496 = 61,258,725,024 (assuming no one gets the ball twice)
This is still large, but an exhaustive search would be far faster (though still too slow for a game) than your original set of numbers.
I'm sure others with more experience in this area would have better suggestions. Still, I hope this helps.
If X needs to pass to Y, and there could be P1, P2, ..., Pn players in between and you care about the order of passing then indeed
For 2 extra players you have paths: X-Y, X-P1-Y, X-P2-Y, X-P1-P2-Y, X-P2-P1-Y
Which gives a total of 5 different paths, similarly for 3 extra players you have 16 different paths
First try to reduce the problem to something known. For this I would eliminate X and Y, since they are common to all of the paths above. The question then becomes: what is the sum of k-permutations for k from 0 to n, where n is the number of P.
This can be given as
f(n):=sum(n!/(n-i)!,i,0,n);
and I can confirm your findings for 19 and 39 (20 and 40 in your notation).
For f(499) I get
6633351524650661171514504385285373341733228850724648887634920376333901210587244906195903313708894273811624288449277006968181762616943058027258258920058014768423359811679381900054568501151839849768338994244697593758840394106353734267539926205845992860165295957099385939316593862710470512043836452624452665801937754479602741031832540175306674471495745716725509714798824661807396000105338256698426305553340786519843729411660457896089840381658295930455362209587765698327585913037665131195504013431486823990271059962837959407778393078276213331859189770016153265512805722812864376997337140529242894215031131618375899072989922780132488077015246576266246551484603286735418485007674249207286921801779414240854077425752351919182464902664206622037834736215298295580945851569079682952183639701057397376328170754187008425429164206646365285647875545882646729176997107332605851460212415526607757545366695048460341802079614840254694664267117469603856584752270653889630424848913719533359942725361985274851471687885265903663806182184272555073708882789845441094009797907518245726494471433964169680271980763830020431957658400573531564215436064984091520
Results obtained with wxMaxima
EDIT: After more clarification from the comments of the question, my answer is absolutely useless :) he definitely wants the number of possible routes, not the best one!
My first thought is why do you want to know these numbers? You're certainly never going to iterate through all the paths available to 500 people (would take far too long) and it's too big to display on a ui in any meaningful way.
I'm assuming that you're going to try to find the best route that the ball can take in which case I would consider looking into algorithms that don't care about the number of nodes in a route.
I'd try looking at the A star algorithm and Dijkstra's algorithm.
