UML relationship calculation - uml

(a) If there are 6 authors, what's the minimum and the maximum number of books? What's the minimum and the maximum number of readers?
(b) If there are 6 readers, what's the minimum and the maximum number of books? What's the minimum and the maximum number of authors?

If you don't know the exact numbers, just create an instance diagram according to the class diagram. I'll show you the a) part of the questions. Minimum: You've got six authors, so draw six objects typed by an Author. Every author must have at least one book. So draw one object typed by a Book. Well, every book can have up to two authors. Well, draw two links between the book and two authors. You still have 4 authors without a book. Repeat the steps above. You will end up with 3 books and 6 links. As for the Readers: there is no need do have one connected with a book. So the minimum is zero. If you did it well you should have something like this:
The similar way you do with the maximum. You will ended up with this:
So now you are able to do the b) part of your question.

Related

uml exercise Texas'Hold'Em Poker

I'm really a beginner in dealing in Java, suprise, and we have have to make an UML class chart for a situation. I'm totally unsure about my UML-diagram and I wanted to ask, if someone with a little bit more expierience could take a look. Thank you.
This is the situation:
For poker in the Texas Hold'em variant following information is given:
For a pokergame you need between two and twelve players and a card deck, that has 52 cards. Each card has a label (e.g., "King"), a value (2 to 14) and a color (e.g., Cross). Further specified for a poker game are the sum of all inserts (the pot, e.g., 450), and five (common) cards from the deck.
For each player the name, the credit (e.g., 7592) and the current hand, two cards from the card deck and a value (e.g., "Full House?").
For each player it is also noted whether he has the dealer position or not.
Task: Design an UML class chart for the game without functionalities (the classes to be used are indicated by bold). For the classes, type the necessary instance variables, including their (Java-compliant) data types. Draw all the relationships between the classes. Pay attention to the information given above multiplicities.
This is my solution:
What is meant by functionalities? Did I do the relations right? Do I need methods?
I tried to get the information from the text into a diagram:
You could replace the enumerations by plain attributes of type 'String'. I didn't fill out all values of HandValue (the three dots are not valid UML).
It is not clear to me whether a hand always has a value. If it has, then replace multiplicity 0..1 by 1.
If I understand the text correctly, a hand has only two cards, but this seems odd to me, I think it should be five.
You could add a composition diamond on the association between CardDeck and Card, but it is not clear from the text whether this is appropriate.
You could add a constraint, that the commonCards of a PokerGame should be a subset of the cards of the cardDeck of the PokerGame; also, that the cards of the hand of each of the players of the PokerGame are a subset of the cards of the cardDeck of the PokerGame.
You could draw an open arrowhead at the end of each association (not a triangle arrowhead; this would mean generalization), i.e. at the side where I have mentioned the multiplicity and the instance variable name.
You could specify multiplicities at the other ends of the associations, but these multiplicities are not mentioned in the text.
I didn't specify any visibilities (public/private/protected), because these are not specified in the text either.
You should not so much care about aggregation. This does not add much value to a design (but only in rare cases). Firstly your arrows are wrong. They represent generalization in UML. You need simple associations. Just leave away the arrows (which need to be unfilled open triangles) except you want to express navigability to be just in one direction (which in most cases is also nothing you need absolutely). What you should do is to use roles at the ends of the associations rather then putting typed properties in the classes. Further you should follow conventions that say class names start with an upper case letter (though I'm not really familiar with Java).
The above is a partial model to show what I mean. Note that I have added a dealer association. This assures that there is exactly one dealer in the game. The {subsets players} constraint tells that the dealer must be one of the players (thanks to JimL. for the hint). Using just the flag could lead to multiple dealers with the flag set. There should be a constraint that tells isDealer is true only for the one that is linked with the dealer association.

Do you need to sort inputs for dynamic programming knapsack

In every single example I've found for a 1/0 Knapsack problem using dynamic programming where the items have weights(costs) and profits, it never explicitly says to sort the items list, but in all the examples they are sorted by increasing both weight and profit (higher weights have higher profits in the examples). So my question is when adding items in the matrix from the item array/list, can I add them in any order, or do I add the one with the smallest weight or profit? Because from multiple examples I found I'm not sure if its just a coincidence or you do in fact need to put the smallest weight/profit into the matrix each time
The dynamic programming solution is nothing but choosing all the possibilities (using brute force) in an efficient way (just by saving the values for future reference).
Note: We consider all the subsets. Even if the list is sorted or not the total number of subsets will be the same. So in the end all the subsets will get considered.
No, you don't need to sort the weights because every row gives the maximum possible value under the weight limit of that row. The maximum will come in the last column of that row.
Maybe you are looking at bottom-up Dynamic solutions. And there is one characteristic in Dynamic solution when you solve using the bottom-up method.
The second approach is the bottom-up method. This approach typically depends
on some natural notion of the “size” of a subproblem, such that solving any particular
subproblem depends only on solving “smaller” subproblems. We sort the
subproblems by size and solve them in size order, smallest first. When solving a
particular subproblem, we have already solved all of the smaller subproblems its
solution depends upon, and we have saved their solutions. We solve each subproblem
only once, and when we first see it, we have already solved all of its
prerequisite subproblems.
From: Introduction to Algorithm, CORMEN (3rd edition)
The "smaller problem" in this case is just smaller in terms of the number of available items to choose, not about the profits or weights of these items. If given a 3 item list, the sub-problems will be 2 items, and 1 item to choose from.
The smallest problem is first hardcoded (base case), then at each stage of moving from a smaller to bigger problem, the best profit is enumerated, and the max is chosen. At the end, all 2^n combinations would have been considered and the various stages of repeated max will bubble up the largest solution.
Changing the input order, or putting dominated items into the input (like higher weight and lower profit) may just change which argument of max() won at each stage, but the final max result will come from the same selection of items, albeit being selected at different stages in the algorithm for different sort orders or input characteristics.
The answer could be found by some random shuffle experiments.
which I found is: better in ascending order. correct me if i'm wrong.
gist: https://gist.github.com/whille/39cf7bf8cf5dcf6ac933063735ae54de
Problem described in "Algorithm Design", ISBN: 9780321295354, chapter 6.4.
Two methods could be used:
as the chapter used, a M cache to pre-calculated, in which not sub answer is needed.
recursive function, which is simple to understand and test. I found functools.cache of python3.5+ could be use to check how many sub caculation is need, as my gist show: ascending order of test_random() is the smallest in currsize, So it's most efficient, and could extend to float value.
results for 10 random weights(1~100) and 200 knapack:
[(13.527716157276256, 18.371888775465692), (16.18632175987168, 206.88043031085252), (20.14117982372607, 81.52793937986635), (33.28606671929836, 298.8676699147799), (49.12968642850187, 22.037638580809592), (55.279973594800225, 377.3715225559507), (56.56103181962746, 460.9161412820592), (60.38456825749498, 10.721915577913244), (67.98836121062645, 63.47478755362385), (86.49436333909377, 208.06767811169286)]: reverse: False
CacheInfo(hits=0, misses=832, maxsize=None, currsize=832)
[(86.49436333909377, 208.06767811169286), (67.98836121062645, 63.47478755362385), (60.38456825749498, 10.721915577913244), (56.56103181962746, 460.9161412820592), (55.279973594800225, 377.3715225559507), (49.12968642850187, 22.037638580809592), (33.28606671929836, 298.8676699147799), (20.14117982372607, 81.52793937986635), (16.18632175987168, 206.88043031085252), (13.527716157276256, 18.371888775465692)]: reverse: True
CacheInfo(hits=0, misses=1120, maxsize=None, currsize=1120)
Note for method2:
if capacity is much larger, all random orders have equal currsize.
callstack overflow should be avoided for large N, so recurse method should be transformed. Generally a two step method could be used, first map subproblem dependency, and calc. I'ill try later in gist.
I guess, sorting might be required in certain types of knapsack problem. For example consider the problem "Maximum Earnings From Taxi". Here, the input has to be sorted by the starting point of the riders, else we won't get the optimal result.
For example, consider the below input for the above problem:-
9
[[2,3,1],[2,9,2], [3,6,7],[2,3,6]]
If you apply a typical knapsack implementation in recursive approach, without sorting the input we won't get the optimal solution.

O(n^2) (or O(n^2lg(n)) ?)algorithm to calculate the longest common subsequence (LCS) of two 'ring' string

This is a problem appeared in today's Pacific NW Region Programming Contest during which no one solved it. It is problem B and the complete problem set is here: http://www.acmicpc-pacnw.org/icpc-statements-2011.zip. There is a well-known O(n^2) algorithm for LCS of two strings using Dynamic Programming. But when these strings are extended to rings I have no idea...
P.S. note that it is subsequence rather than substring, so the elements do not need to be adjacent to each other
P.S. It might not be O(n^2) but O(n^2lgn) or something that can give the result in 5 seconds on a common computer.
Searching the web, this appears to be covered by section 4.3 of the paper "Incremental String Comparison", by Landau, Myers, and Schmidt at cost O(ne) < O(n^2), where I think e is the edit distance. This paper also references a previous paper by Maes giving cost O(mn log m) with more general edit costs - "On a cyclic string to string correcting problem". Expecting a contestant to reproduce either of these papers seems pretty demanding to me - but as far as I can see the question does ask for the longest common subsequence on cyclic strings.
You can double the first and second string and then use the ordinary method, and later wrap the positions around.
It is a good idea to "double" the strings and apply the standard dynamic programing algorithm. The problem with it is that to get the optimal cyclic LCS one then has to "start the algorithm from multiple initial conditions". Just one initial condition (e.g. setting all Lij variables to 0 at the boundaries) will not do in general. In practice it turns out that the number of initial states that are needed are O(N) in number (they span a diagonal), so one gets back to an O(N^3) algorithm.
However, the approach does has some virtue as it can be used to design efficient O(N^2) heuristics (not exact but near exact) for CLCS.
I do not know if a true O(N^2) exist, and would be very interested if someone knows one.
The CLCS problem has quite interesting properties of "periodicity": the length of a CLCS of
p-times reapeated strings is p times the CLCS of the strings. This can be proved by adopting a geometric view off the problem.
Also, there are some additional benefits of the problem: it can be shown that if Lc(N) denotes the averaged value of the CLCS length of two random strings of length N, then
|Lc(N)-CN| is O(\sqrt{N}) where C is Chvatal-Sankoff's constant. For the averaged length L(N) of the standard LCS, the only rate result of which I know says that |L(N)-CN| is O(sqrt(Nlog N)). There could be a nice way to compare Lc(N) with L(N) but I don't know it.
Another question: it is clear that the CLCS length is not superadditive contrary to the LCS length. By this I mean it is not true that CLCS(X1X2,Y1Y2) is always greater than CLCS(X1,Y1)+CLCS(X2,Y2) (it is very easy to find counter examples with a computer).
But it seems possible that the averaged length Lc(N) is superadditive (Lc(N1+N2) greater than Lc(N1)+Lc(N2)) - though if there is a proof I don't know it.
One modest interest in this question is that the values Lc(N)/N for the first few values of N would then provide good bounds to the Chvatal-Sankoff constant (much better than L(N)/N).
As a followup to mcdowella's answer, I'd like to point out that the O(n^2 lg n) solution presented in Maes' paper is the intended solution to the contest problem (check http://www.acmicpc-pacnw.org/ProblemSet/2011/solutions.zip). The O(ne) solution in Landau et al's paper does NOT apply to this problem, as that paper is targeted at edit distance, not LCS. In particular, the solution to cyclic edit distance only applies if the edit operations (add, delete, replace) all have unit (1, 1, 1) cost. LCS, on the other hand, is equivalent to edit distances with (add, delete, replace) costs (1, 1, 2). These are not equivalent to each other; for example, consider the input strings "ABC" and "CXY" (for the acyclic case; you can construct cyclic counterexamples similarly). The LCS of the two strings is "C", but the minimum unit-cost edit is to replace each character in turn.
At 110 lines but no complex data structures, Maes' solution falls towards the upper end of what is reasonable to implement in a contest setting. Even if Landau et al's solution could be adapted to handle cyclic LCS, the complexity of the data structure makes it infeasible in a contest setting.
Last but not least, I'd like to point out that an O(n^2) solution DOES exist for CLCS, described here: http://arxiv.org/abs/1208.0396 At 60 lines, no complex data structures, and only 2 arrays, this solution is quite reasonable to implement in a contest setting. Arriving at the solution might be a different matter, though.

A math modelling interview question

What's the best way to (relative) grade a class (of 50 students) on a test (with 7 questions)?
They did not want the traditional percentile-intervals answer, but a more CS-ey one.
It's a pretty open ended question, they asked to assume the following framework:
Input
[m_1,...,m_50], where each m_i is a 7-vector for marks scored in the 7 questions for each of the 50 students.
[c_1,...,c_7], where each c_i is a vector of 'concepts' tested by each question. c_i's need not be disjoint. We can assume to have an importance ordering amongst elements of union(c_i).
Simplistic approach: Assuming that all concepts have the same value I would just sum it all up. One point for each concept everywhere.
Holistic approach: It could be that the question with more concepts is significantly harder than the question with fewer (and worth more than the sum of concepts). Concepts "interact" with each other. To alleviate this I would put a value of (N over C) to each question, where N is the size of the vector of concepts, and C is total number of concepts. And then I would sum it all up.
True holistic approach: If concepts are repeated in different questions then we should "tone down" their influence. However I'm not sure how to accomplish this. Maybe we should divide each (N over C) value with the number of repetitions of each concept involved.
I ignored the importance ordering of concepts, because I don't know how to put a value on that.

cluster short, homogeneous strings (DNA) according to common sub-patterns and extract consensus of classes

Task:
to cluster a large pool of short DNA fragments in classes that share common sub-sequence-patterns and find the consensus sequence of each class.
Pool: ca. 300 sequence fragments
8 - 20 letters per fragment
4 possible letters: a,g,t,c
each fragment is structured in three regions:
5 generic letters
8 or more positions of g's and c's
5 generic letters
(As regex that would be [gcta]{5}[gc]{8,}[gcta]{5})
Plan:
to perform a multiple alignment (i.e. withClustalW2) to find classes that share common sequences in region 2 and their consensus sequences.
Questions:
Are my fragments too short, and would it help to increase their size?
Is region 2 too homogeneous, with only two allowed letter types, for showing patterns in its sequence?
Which alternative methods or tools can you suggest for this task?
Best regards,
Simon
Yes, 300 is FAR TOO FEW considering that this is the human genome and you're essentially just looking for a particular 8-mer. There are 65,536 possible 8-mers and 3,000,000,000 unique bases in the genome (assuming you're looking at the entire genome and not just genic or coding regions). You'll find G/C containing sequences 3,000,000,000 / 65,536 * 2^8 =~ 12,000,000 times (and probably much more since the genome is full of CpG islands compared to other things). Why only choose 300?
You don't want to use regex's for this task. Just start at chromosome 1, look for the first CG or GC and extend until you get your first non-G-or-C. Then take that sequence, its context and save it (in a DB). Rinse and repeat.
For this project, Clustal may be overkill -- but I don't know your objectives so I can't be sure. If you're only interested in the GC region, then you can do some simple clustering like so:
Make a database entry for each G/C 8-mer (2^8 = 256 in all).
Take each GC-region and walk it to see which 8-mers it contains.
Tag each GC-region with the sequences it contains.
Now, for each 8-mer, you have thousands of sequences which contain it. I'll leave the analysis of the data up to your own objectives.
Your region two, with the 2 letters, may end up a bit too similar, increasing length or variability (e.g. more letters) could help.

Resources