A math modelling interview question

A math modelling interview question - modeling

What's the best way to (relative) grade a class (of 50 students) on a test (with 7 questions)?
They did not want the traditional percentile-intervals answer, but a more CS-ey one.
It's a pretty open ended question, they asked to assume the following framework:
Input
[m_1,...,m_50], where each m_i is a 7-vector for marks scored in the 7 questions for each of the 50 students.
[c_1,...,c_7], where each c_i is a vector of 'concepts' tested by each question. c_i's need not be disjoint. We can assume to have an importance ordering amongst elements of union(c_i).

Simplistic approach: Assuming that all concepts have the same value I would just sum it all up. One point for each concept everywhere.
Holistic approach: It could be that the question with more concepts is significantly harder than the question with fewer (and worth more than the sum of concepts). Concepts "interact" with each other. To alleviate this I would put a value of (N over C) to each question, where N is the size of the vector of concepts, and C is total number of concepts. And then I would sum it all up.
True holistic approach: If concepts are repeated in different questions then we should "tone down" their influence. However I'm not sure how to accomplish this. Maybe we should divide each (N over C) value with the number of repetitions of each concept involved.
I ignored the importance ordering of concepts, because I don't know how to put a value on that.

Related

How would I construct an integer optimization model corresponding to a graph

Suppose we're given some sort of graph where the feasible region of our optimization problem is given. For example: here is an image
How would I go on about constructing these constraints in an integer optimization problem? Anyone got any tips? Thanks!

Mate, I agree with the others that you should be a little more specific than that paint-ish picture ;). In particular you are neither specifying any objective/objective direction nor are you giving any context, what about this graph should be integer-variable related, except for the existence of disjunctive feasible sets, which may be modeled by MIP-techniques. It seems like your problem is formalization of what you conceptualized. However, in case you are just being lazy and are just interested in modelling disjunctive regions, you should be looking into disjunctive programming techniques, such as "big-M" (Note: big-M reformulations can be problematic). You should be aiming at some convex-hull reformulation if you can attain one (fairly easily).
Back to your picture, it is quite clear that you have a problem in two real dimensions (let's say in R^2), where the constraints bounding the feasible set are linear (the lines making up the feasible polygons).
So you know that you have two dimensions and need two real continuous variables, say x[1] and x[2], to formulate each of your linear constraints (a[i,1]*x[1]+a[i,2]<=rhs[i] for some index i corresponding to the number of lines in your graph). Additionally your variables seem to be constrained to the first orthant so x[1]>=0 and x[2]>=0 should hold. Now, to add disjunctions you want some constraints that only hold when a certain condition is true. Therefore, you can add two binary decision variables, say y[1],y[2] and an additional constraint y[1]+y[2]=1, to tell that only one set of constraints can be active at the same time. You should be able to implement this with the help of big-M by reformulating the constraints as follows:
If you bound things from above with your line:
a[i,1]*x[1]+a[i,2]-rhs[i]<=M*(1-y[1]) if i corresponds to the one polygon,
a[i,1]*x[1]+a[i,2]-rhs[i]<=M*(1-y[2]) if i corresponds to the other polygon,
and if your line bounds things from below:
-M*(1-y[1])<=-a[i,1]*x[1]-a[i,2]+rhs[i] if i corresponds to the one polygon,
-M*(1-y[1])<=-a[i,1]*x[1]-a[i,2]+rhs[i] if i corresponds to the other polygon.
It is important that M is sufficiently large, but not too large to cause numerical issues.
That being said, I am by no means an expert on these disjunctive programming techniques, so feel free to chime in, add corrections or make things clearer.
Also, a more elaborate question typically yields more elaborate and satisfying answers ;) If you had gone to the effort of making up a true small example problem you likely would have gotten a full formulation of your problem or even an executable piece of code in no time.

What is the difference between overlapping subproblems and optimal substructure?

I understand the target approach for both the methods where Optimal Substructure calculates the optimal solution based on an input n while Overlapping Subproblems targets all the solutions for the range of input say from 1 to n.
For a problem like the Rod Cutting Problem. In this case while finding the optimal cut, do we consider each cut hence it can be considered as Overlapping Subproblem and work bottom-up. Or do we consider the optimal cut for a given input n and work top-down.
Hence, while they do deal with the optimality in the end, what are the exact differences between the two approaches.
I tried referring to this Overlapping Subproblem, Optimal Substructure and this page as well.
On a side note as well, does this relate to the solving approaches of Tabulation(top-down) and Memoization(bottom-up)?
This thread makes a valid point but I'm hoping if it could be broken down easier.

To answer your main question: overlapping subproblems and optimal substructure are both different concepts/properties, a problem that has both these properties or conditions being met can be solved via Dynamic Programming. To understand the difference between them, you actually need to understand what each of these term means in regards to Dynamic Programming.
I understand the target approach for both the methods where Optimal Substructure calculates the optimal solution based on an input n while Overlapping Subproblems targets all the solutions for the range of input say from 1 to n.
This is a poorly worded statement. You need to familiarize yourself with the basics of Dynamic Programming. Hopefully following explanation will help you get started.
Let's start with defining what each of these terms, Optimal Substructure & Overlapping Subproblems, mean.
Optimal Substructure: If optimal solution to a problem, S, of size n can be calculated by JUST looking at optimal solution of a subproblem, s, with size < n and NOT ALL solutions to subproblem, AND it will also result in an optimal solution for problem S, then this problem S is considered to have optimal substructure.
Example (Shortest Path Problem): consider a undirected graph with vertices a,b,c,d,e and edges (a,b), (a,e), (b,c), (c,d), (d,a) & (e,b) then shortest path between a & c is a -- b -- c and this problem can be broken down into finding shortest path between a & b and then shortest path between b & c and this will give us a valid solution. Note that we have two ways of reaching b from a:
a -- b (Shortest path)
a -- e -- b
Longest Path Problem does not have optimal substructure. Longest path between a & d is a -- e -- b -- c -- d, but sum of longest paths between a & c (a -- e -- b -- c) and c & d (c -- b -- e -- a -- d) won't give us a valid (non-repeating vertices) longest path between a & d.
Overlapping Subproblems: If you look at this diagram from the link you shared:
You can see that subproblem fib(1) is 'overlapping' across multiple branches and thus fib(5) has overlapping subproblems (fib(1), fib(2), etc).
On a side note as well, does this relate to the solving approaches of Tabulation(top-down) and Memoization(bottom-up)?
This again is a poorly worded question. Top-down(recursive) and bottom-up(iterative) approaches are different ways of solving a DP problem using memoization. From the Wikipedia article of Memoization:
In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again.
For the given fibonacci example, if we store fib(1) in a table after it was encountered the first time, we don't need to recompute it again when we see it next time. We can reuse the stored result and hence saving us lot of computations.
When we implement an iterative solution, "table" is usually an array (or array of arrays) and when we implement a recursive solution, "table" is usually a dynamic data structure, a hashmap (dictionary).
You can further read this link for better understanding of these two approaches.

Why mergesort is not Dynamic programming

I have read these words:
There are two key attributes that a problem must have in order for dynamic programming to be applicable: optimal substructure and overlapping subproblems. If a problem can be solved by combining optimal solutions to non-overlapping subproblems, the strategy is called "divide and conquer". This is why mergesort and quicksort are not classified as dynamic programming problems.
I have the 3 questions:
Why mergesort and quicksort is not Dynamic programming？
I think mergesort also can be divided small problems and small problems then do the same thing and so on.
Is Dijkstra Algorithm using dynamic algorithm?
Are there applied examples of using Dynamic programming?

The key words here are "overlapping subproblems" and "optimal substructure". When you execute quicksort or mergesort, you are recursively breaking down your array into smaller pieces that do not overlap. You never operate over the same elements of the original array twice during any given level of the recursion. This means there is no opportunity to re-use previous calculations. On the other hand, many problems DO involve performing the same calculations over overlapping subsets, and have the useful characteristic that an optimal solution to a subproblem can be re-used when computing the optimal solution to a larger problem.
Dijkstra's algorithm is a classic example of dynamic programming, as it re-uses prior computations to discover the shortest path between two nodes A and Z. Say that A's immediate neighbors are B and C. We can find the shortest path from A to Z by summing the distance between A and B with our computed shortest path from B to Z; and do similarly for finding the shortest path from C to Z. Then the shortest path from A to Z will be the shorter of these two paths. The key insight here is that we can re-use the shortest path computations for paths of length 2 when computing the shortest paths of length 3, and so on. Doing so results in a much more efficient algorithm.
Dynamic programming can be used to solve many types of problems -- see http://en.wikipedia.org/wiki/Dynamic_programming#Examples:_Computer_algorithms for some examples.

For dynamic programming to be applicable to a problem, there should be
i. An optimal structure in the subproblems:
This means that when you break down your problem into smaller units, those smaller units also need to be broken down into yet smaller units for an optimal solution. For example, in merge sort, a array of numbers can get sorted if we divide it into two subarrays, get them sorted and combine them. While sorting these two subarrays, repeat the same process you followed in the previous sentence. So an optimal solution (a sorted array) is got when we find an optimal solution to its subproblems (we sort the subarrays and combine them). This requirement is fulfilled for merge sort. Also the subproblems must be independent for them to follow an optimal structure. This is also fulfilled by merge sort as the subproblems' solutions do not get affected by each others' solutions. For example, the solutions to the two parts of an array are not affected by each other's sortedness.
ii. Overlapping subproblems:
This means that while solving for the solution, the subproblems you formulate get repeated, and hence need only be solved once. In the case of merge sort, this requirement will be met only rarely in the normal case. An array of numbers like 2 1 3 4 9 4 2 1 3 1 9 4 may be a good candidate for overlapping subproblems for merge sort. In this case, the solution to the subproblem sort(2 1 3) can be stored in a table to be reused, because it will be called twice during the computation. But as you can see, there is a very slim chance that a random array of numbers will have this kind of a repeated contrivance. So it would only be inefficient if we used a dynamic programming technique like memoization for an algorithm like merge sort.
Yes. Dijkstra's algorithm uses dynamic programming as mentioned by #Alan in the comment. link
Yes. If I may quote Wikipedia here,
"Dynamic programming is widely used in bioinformatics for the tasks such as sequence alignment, protein folding, RNA structure prediction and protein-DNA binding." 1
1 https://en.wikipedia.org/wiki/Dynamic_programming

How Do Ranks Work?

The best way for me to understand J is emulating the interpreter. Since the language is compact and has little rules, it's been easy... with the exception of how ranks affect function evaluation.
I want to be able to see an expression and know what's J doing to get the result, step by step.
Is there a doc, or someone could give me an algorithm so I can calculate myself how a f " n m b is evaluated?
Thanks in advance.

For learning about Rank the most accessible text is probably chapter 6 of J for C Programmers. The section of Eric Iverson's Primer that begins with Atom and goes through Checkpoint E covers the topic more concisely. Chapter 7 of Learning J is another place Rank is covered. All are valuable.
The most in-depth examination of Rank is Roger Hui's essay Rank and Uniformity. Hui's paper will make better reading after you've studied the other texts on this topic. Should it come down to wanting the nitty-gritty of implementation, you could dive into the interpreter source code. Personally, I'd not do that last one. Were I wanting to look at implementation algorithms I'd build a little model, and check it against the results of a J interpreter to make sure that my understanding of Rank matches.
Rank, in my view, is the most important concept in J. It is quite abstract in that it applies across all the shapes that nouns can take. The associated concepts are important to learn. These include shape, frame, cell, and agreement. These are explained individually in the Primer, but they're explained in some manner every time the topic is dealt with in depth.
The better your understanding of the Rank conjunction, and the broader world of noun Rank and verb Rank in which it applies, the more useful you'll find the three sections of the Vocabulary that deal with this conjunction. (Those sections are m"n , u"n , and m"v u"v .)
If you do come to write any algorithms that help you examine things in a step-by-step fashion, other J programmers will enjoy seeing them, I'm sure. I don't know of anything along those lines other than the actual interpreter source code.

O(n^2) (or O(n^2lg(n)) ?)algorithm to calculate the longest common subsequence (LCS) of two 'ring' string

This is a problem appeared in today's Pacific NW Region Programming Contest during which no one solved it. It is problem B and the complete problem set is here: http://www.acmicpc-pacnw.org/icpc-statements-2011.zip. There is a well-known O(n^2) algorithm for LCS of two strings using Dynamic Programming. But when these strings are extended to rings I have no idea...
P.S. note that it is subsequence rather than substring, so the elements do not need to be adjacent to each other
P.S. It might not be O(n^2) but O(n^2lgn) or something that can give the result in 5 seconds on a common computer.

Searching the web, this appears to be covered by section 4.3 of the paper "Incremental String Comparison", by Landau, Myers, and Schmidt at cost O(ne) < O(n^2), where I think e is the edit distance. This paper also references a previous paper by Maes giving cost O(mn log m) with more general edit costs - "On a cyclic string to string correcting problem". Expecting a contestant to reproduce either of these papers seems pretty demanding to me - but as far as I can see the question does ask for the longest common subsequence on cyclic strings.

You can double the first and second string and then use the ordinary method, and later wrap the positions around.

It is a good idea to "double" the strings and apply the standard dynamic programing algorithm. The problem with it is that to get the optimal cyclic LCS one then has to "start the algorithm from multiple initial conditions". Just one initial condition (e.g. setting all Lij variables to 0 at the boundaries) will not do in general. In practice it turns out that the number of initial states that are needed are O(N) in number (they span a diagonal), so one gets back to an O(N^3) algorithm.
However, the approach does has some virtue as it can be used to design efficient O(N^2) heuristics (not exact but near exact) for CLCS.
I do not know if a true O(N^2) exist, and would be very interested if someone knows one.
The CLCS problem has quite interesting properties of "periodicity": the length of a CLCS of
p-times reapeated strings is p times the CLCS of the strings. This can be proved by adopting a geometric view off the problem.
Also, there are some additional benefits of the problem: it can be shown that if Lc(N) denotes the averaged value of the CLCS length of two random strings of length N, then
|Lc(N)-CN| is O(\sqrt{N}) where C is Chvatal-Sankoff's constant. For the averaged length L(N) of the standard LCS, the only rate result of which I know says that |L(N)-CN| is O(sqrt(Nlog N)). There could be a nice way to compare Lc(N) with L(N) but I don't know it.
Another question: it is clear that the CLCS length is not superadditive contrary to the LCS length. By this I mean it is not true that CLCS(X1X2,Y1Y2) is always greater than CLCS(X1,Y1)+CLCS(X2,Y2) (it is very easy to find counter examples with a computer).
But it seems possible that the averaged length Lc(N) is superadditive (Lc(N1+N2) greater than Lc(N1)+Lc(N2)) - though if there is a proof I don't know it.
One modest interest in this question is that the values Lc(N)/N for the first few values of N would then provide good bounds to the Chvatal-Sankoff constant (much better than L(N)/N).

As a followup to mcdowella's answer, I'd like to point out that the O(n^2 lg n) solution presented in Maes' paper is the intended solution to the contest problem (check http://www.acmicpc-pacnw.org/ProblemSet/2011/solutions.zip). The O(ne) solution in Landau et al's paper does NOT apply to this problem, as that paper is targeted at edit distance, not LCS. In particular, the solution to cyclic edit distance only applies if the edit operations (add, delete, replace) all have unit (1, 1, 1) cost. LCS, on the other hand, is equivalent to edit distances with (add, delete, replace) costs (1, 1, 2). These are not equivalent to each other; for example, consider the input strings "ABC" and "CXY" (for the acyclic case; you can construct cyclic counterexamples similarly). The LCS of the two strings is "C", but the minimum unit-cost edit is to replace each character in turn.
At 110 lines but no complex data structures, Maes' solution falls towards the upper end of what is reasonable to implement in a contest setting. Even if Landau et al's solution could be adapted to handle cyclic LCS, the complexity of the data structure makes it infeasible in a contest setting.
Last but not least, I'd like to point out that an O(n^2) solution DOES exist for CLCS, described here: http://arxiv.org/abs/1208.0396 At 60 lines, no complex data structures, and only 2 arrays, this solution is quite reasonable to implement in a contest setting. Arriving at the solution might be a different matter, though.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string