String manipulation with dynamic programming

String manipulation with dynamic programming - string

I have a problem where I have a string of length N, where (1 ≤ N ≤ 10^5). This string will only have lower case letters.
We have to rewrite the string so that it has a series of "streaks", where the same letter is included at least K (1 ≤ K ≤ N) times in a row.
It costs a_ij to change a single specific letter in the string from i to j. There are M different possible letters you can change each letter to.
Example: "abcde" is the input string. N = 5 (length of "abcde"), M = 5 (letters are A, B, C, D, E), and K = 2 (each letter must be repeated at least 2 times) Then we are given a M×M matrix of values a_ij, where a_ij is an integer in the range 0…1000 and a_ii = 0 for all i.
0 1 4 4 4
2 0 4 4 4
6 5 0 3 2
5 5 5 0 4
3 7 0 5 0
Here, it costs 0 to change from A to A, 1 to change from A to B, 4 to change from A to C, and so on. It costs 2 to change from B to A.
The optimal solution in this example is to change the a into b, change the d into e, and then change both e’s into c’s. This will take 1 + 4 + 0 + 0 = 5 moves, and the final combo string will be "bbccc".
It becomes complicated as it might take less time to switch from using button i to an intermediate button k and then from button k to button j rather than from i to j directly (or more generally, there may be a path of changes starting with i and ending with j that gives the best overall cost for switching from button i ultimately to button j).
To solve for this issue, I am treating the matrix as a graph, and then performing Floyd Warshall to find the fastest time to switch letters. This will take O(M^3) which is only 26^3.
My next step is to perform dynamic programming on each additional letter to find the answer. If someone could give me advice on how to do this, I would be thankful!

Here are some untested ideas. I'm not sure if this is efficient enough (or completely worked out) but it looks like 26 * 3 * 10^5. The recurrence could be converted to a table, although with higher Ks, memoisation might be more efficient because of reduced state possibilities.
Assume we've recorded 26 prefix arrays for conversion of the entire list to each of the characters using the best conversion schedule, using a path-finding method. This lets us calculate the cost of a conversion of a range in the string in O(1) time, using a function, cost.
A letter in the result can be one of three things: either it's the kth instance of character c, or it's before the kth, or it's after the kth. This leads to a general recurrence:
f(i, is_kth, c) ->
cost(i - k + 1, i, c) + A
where
A = min(
f(i - k, is_kth, c'),
f(i - k, is_after_kth, c')
) forall c'
A takes constant time since the alphabet is constant, assuming earlier calls to f have been tabled.
f(i, is_before_kth, c) ->
cost(i, i, c) + A
where
A = min(
f(i - 1, is_before_kth, c),
f(i - 1, is_kth, c'),
f(i - 1, is_after_kth, c')
) forall c'
Again A is constant time since the alphabet is constant.
f(i, is_after_kth, c) ->
cost(i, i, c) + A
where
A = min(
f(i - 1, is_after_kth, c),
f(i - 1, is_kth, c)
)
A is constant time in the latter. We would seek the best result of the recurrence applied to each character at the end of the string with either state is_kth or state is_after_kth.

Related

how can I find the missing integers when given a list of integers that can be repeated C times cannot add up to value V

So I am given 3 input values, C, D, and V. Where D is a list of integers and C is the number of times that any element in D can be repeated when summing to at most a value V.
Now what will always be the case is that not every combination of sums in D will equal 1 to V. Therefore, my function has to find the minimum amount of integers to include in D that will satisfy summing to at most V.
In other words, minimum number of integers to insert into D such that all integers from 1 <= x <= V can be represented by a summation of combination of numbers in D, where no number is repeated more than C times
For example,
D = [1, 5, 10, 25]
C = 2
V = 100
As you can see, the max value I can get from D and C is:
2(1+5+10+25) = 82 which is less than V = 100.
I solved it manually where I can satisfy every value from 1 to 100 when I include two new integers in D which is 12 and 2.
So my output would be 12 and 2.
Now the way I saw it was to basically cover all single digit values so since I already have 1 and 5 in D where I can repeat their sums 2 times given by C=2, the values I cannot account for are 3, 4, 8, and 9 which can be resolved by introducing 2 into D and likewise for double digits it would be 12 given C and D.
ie.
< 3=1+2, 4=2+2 or 2+1+1, 8=5+2+1, 9=5+2+2 or 5+2+1+1 >
What becomes obvious now is the fact that computationally it can be extremely expensive for large values of N, especially if I brute force it.
My strategy was this:
Find all possible combinations of sums given D and C then put into a list, and lets call it dSums.
Create a list ranging from 1 to V ie.
vList = [i for i in range(1, v+1, 1)]
iterate through dSums and remove any value in dSums from vList
so now I have a list of all integers that are missing but the problem is this.
If D = [1, 5], C=2 and V=10
vList = [3, 4, 8, 9]
I thought I can find the common integer that will satisfy all the values in vList using D and C but I am not sure how to traverse or map the lists in such a way that I ultimately arrive at my desired value which is 2.
Any thoughts?

How can I solve this classical dynamic programming problem?

There are N jewellery shop(s). Each jewellery shop has three kinds of coins - Gold, Platinum, and Diamond having worth value A, B, and C respectively. You decided to go to each of N jewellery shop and take coins from each of the shop. But to do so following conditions must satisfy -
You can take at most 1 coin from an individual shop.
You can take at most X coins of Gold type.
You can take at most Y coins of Platinum type.
You can take at most Z coins of Diamond type.
You want to collect coins from shops in such a way that worth value of coins collected is maximised.
Input Format :
The first line contains an integer N. Where N is the number of jewellery shops.
The second line contains three integers X, Y, Z. Where X, Y, Z denotes the maximum number of coins you can collect of type Gold, Platinum, and diamond respectively.
Then N lines contain three space-separated integers A, B, C. Where A, B, C is the worth value of the Gold, Platinum, and diamond coin respectively.
Output Format :
Print a single integer representing the maximum worth value you can get.
Constraints :
1
<=
N
<=
200
1
<=
X
,
Y
,
Z
<=
N
1
<=
A
,
B
,
C
<
10
9
Example : -
4
2 1 1
5 4 5
4 3 2
10 9 7
8 2 9
Answer:-
27(9+9+5+4)
I tried the obvious greedy approach but it failed :-)

Explanation of normalized edit distance formula

Based on this paper:
IEEE TRANSACTIONS ON PAITERN ANALYSIS : Computation of Normalized Edit Distance and Applications In this paper Normalized Edit Distance as followed:
Given two strings X and Y over a finite alphabet, the normalized edit
distance between X and Y, d( X , Y ) is defined as the minimum of W( P
) / L ( P )w, here P is an editing path between X and Y , W ( P ) is
the sum of the weights of the elementary edit operations of P, and
L(P) is the number of these operations (length of P).
Can i safely translate the normalized edit distance algorithm explained above as this:
normalized edit distance =
levenshtein(query 1, query 2)/max(length(query 1), length(query 2))

You are probably misunderstanding the metric. There are two issues:
The normalization step is to divide W(P) which is the weight of the edit procedure over L(P), which is the length of the edit procedure, not over the max length of the strings as you did;
Also, the paper showed that (Example 3.1) normalized edit distance cannot be simply computed with levenshtein distance. You probably need to implement their algorithm.
An explanation of Example 3.1 (c):
From aaab to abbb, the paper used the following transformations:
match a with a;
skip a in the first string;
skip a in the first string;
skip b in the second string;
skip b in the second string;
match the final bs.
These are 6 operations which is why L(P) is 6; from the matrix in (a), matching has cost 0, skipping has cost 2, thus we have total cost of 0 + 2 + 2 + 2 + 2 + 0 = 8, which is exactly W(P), and W(P) / L(P) = 1.33. Similar results can be obtained for (b), which I'll left to you as exercise :-)

The 3 in figure 2(a) refers to the cost of changing "a" to "b" or the cost of changing "b" to "a". The columns with lambdas in figure 2(a) mean that it costs 2 in order to insert or delete either an "a" or a "b".
In figure 2(b), W(P) = 6 because the algorithm does the following steps:
keep first a (cost 0)
convert first b to a (cost 3)
convert second b to a (cost 3)
keep last b (cost 0)
The sum of the costs of the steps is W(P). The number of steps is 4 which is L(P).
In figure 2(c), the steps are different:
keep first a (cost 0)
delete first b (cost 2)
delete second b (cost 2)
insert a (cost 2)
insert a (cost 2)
keep last b (cost 0)
In this path there are six steps so the L(P) is 6. The sum of the costs of the steps is 8 so W(P) is 8. Therefore the normalized edit distance is 8/6 = 4/3 which is about 1.33.

column and row join a second box data

I want to column join
┌─┬─┬─┐
│1│1│2│
│2│4│4│
│3│9│6│
└─┴─┴─┘
and I'd like to put a=.1 2 3 as the fourth row, and then put b=.1 1 1 1 as the first column to the new boxed data. How can I do this easily? Do I have to ravel the whole thing and compute the dimention on my own in order to box it again?
Also, if I want the data i.8 to be 2 rows, do I have to calculate the other dimension 4(=8/2) in order to form a matrix 2 4$i.8? And then box it ;/2 4$i.8? Can I just specify one dimension, either the number of row or columns and ask automatic boxing or forming the matrix?

The answer to your question will involve learning about &. , the 'Under' conjunction, which is tremendously useful in J.
m
┌─┬─┬─┐
│1│1│2│
│2│2│4│
│3│9│6│
└─┴─┴─┘
a=. 1 2 3
b=. 1 1 1 1
So we want to add each item of a to each boxed column of m . It would be perfect if we could unbox the column using unbox(>), append the item of a to the column using append (,) and then rebox the column using box (<). This undo, act, redo cycle is exactly what Under (&.) does. It undoes both its right and left arguments ( m and a ) using the verb to its right, then applies the verb to its left, then uses the reverse of the verb to its right on the result. In practice,
m , &. > a
┌─┬─┬─┐
│1│1│2│
│2│2│4│
│3│9│6│
│1│2│3│
└─┴─┴─┘
The fact that a is unboxed when it was never boxed to begin with means that it is not changed, while m is unboxed before (,) is applied to each a . In fact this is used so often in J that &. > is assigned the name 'each'.
m , each a
┌─┬─┬─┐
│1│1│2│
│2│2│4│
│3│9│6│
│1│2│3│
└─┴─┴─┘
Prepending a boxed version of b requires first giving it an extra dimension with laminate (,:) then transposing (|:) b and finally boxing (<) the result. The step of adding the extra dimension is required because transposing swaps the indices and b start as a one-dimensional list.
(<#|:#,:b)
┌─┐
│1│
│1│
│1│
│1│
└─┘
The rest is easy as we just use append (,) to join the boxed b with (m, each a)
(<#|:#,: b) , m , each a
┌─┬─┬─┬─┐
│1│1│1│2│
│1│2│2│4│
│1│3│9│6│
│1│1│2│3│
└─┴─┴─┴─┘
Brackets around (<#|:#,: b) are necessary to force the correct order of execution.
For the second question, you can use i. n m to create a n X m array, which may help.
i. 4 2
0 1
2 3
4 5
6 7
i. 2 4
0 1 2 3
4 5 6 7
but perhaps I am misunderstanding your intentions here.
Hope this helps, bob

append a (with rank): ,"x a
You can simply append (,) a to your unboxed (>) input but you have to be careful with the append rank. You want to append each "item" of a, so you have right rank of "0". You want to apend to a 2-cell so you have a left rank of "2". Therefore, the , you need has rank "2 0. After the append, you rebox your data to a 2-cell with <"2.
<"2(>in)(,"2 0) a
┌─┬─┬─┐
│1│1│2│
│2│4│4│
│3│9│6│
│1│2│3│
└─┴─┴─┘
prepend b: b,
If your b has the right shape you prepend it with b,. The shape you seem to use is (boxed) 4 1:
b =: < 4 1$ 1
┌─┐
│1│
│1│
│1│
│1│
└─┘
b,in
┌─┬─┬─┬─┐
│1│1│1│2│
│1│2│4│4│
│1│3│9│6│
│1│1│2│3│
└─┴─┴─┴─┘

Is there a way to optimise this program in Haskell?

I am doing project euler question 224. And whipped up this list comprehension in Haskell:
prob39 = length [ d | d <- [1..75000000], c <- [1..37500000], b <-[1..c], a <- [1..b], a+b+c == d, a^2 + b^2 == (c^2 -1)]
I compiled it with GHC and it has been running with above average kernel priority for over an hour without returning a result. What can I do to optimise this solution? It seems I am getting better at finding brute force solutions in a naive manner. Is there anything I can do about this?
EDIT: I am also unclear about the definition of 'integral length', does this just mean the side length has a magnitude which falls in the positive set of integers, i.e: 1,2,3,4,5... ?

My Haskell isn't amazing, but I think this is going to be n^5 as written.
It looks like you're saying for each n from 1 to 75 million, check every "barely obtuse" triangle with a perimiter less than or equal to 75 million to see if it has perimiter n.
Also I'm not certain if list comprehensions are smart enough to stop looking once the current value of c^2 -1 is greater than a^2 + b^2.
A simple refactor should be
prob39 = length [ (a, b, c) | c <- [1..37500000], b <-[1..c], a <- [1..b], a^2 + b^2 == (c^2 -1), (a + b + c) <= 75000000]
You can make it better, but that should literally be 75 million times faster.
Less certain about this refactoring, but it should also speed things up considerably:
prob39 = length [ (a, b, c) | a <- [1..25000000], b <-[a..(75000000 - 2*a)], c <- [b..(75000000 - a - b)], a^2 + b^2 == (c^2 -1)]
Syntax may not be 100% there. The idea is that a can only be 1 to 25 million (since a <= b <= c and a + b + c <= 75 million). b can only be between a and halfway from a to 75 million (since b <= c) and c can only be from b to 75 million - (a + b), otherwise the perimeter would be over 75 million.
Edit: updated code snippets, there were a couple of bugs in there.
Another quick suggestion, you can replace c <- [b..(75000000 - a - b)] with something along the lines of c <- [b..min((75000000 - a - b), sqrt(aa + bb) + 1)]. There's no need to bother checking any values of c greater than the ceiling of the square root of (a^2 + b^2). Can't remember if those are the correct min/sqrt function names in haskell though.
Getting OCD on this one, I have a couple more suggestions.
1) you can set the upper bound on b to be the min of the current upper bound and a^2 * 2 + 1. This is based on the principle that (x+1)^2 - x^2 = 2x + 1. b cannot be so much larger than a that we can guarantee that (a^2) + (b^2) < (b+1)^2.
2) set the lower bound of c to be max of b + 1 and floor(sqrt(a^2 + b^2) - 1). Just like the upper limit on C, no need to test values which couldn't possibly be correct.

Along with the suggestions given #patros.
I would like to share my observations on this problem.
If we print the values of a , b and c for some perimeter say 100000, then we can observe that a and b always take even values and c always take odd values. So if we optimize our code with these restrictions then almost half the checking can be skipped.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

String manipulation with dynamic programming - string

Related

how can I find the missing integers when given a list of integers that can be repeated C times cannot add up to value V

How can I solve this classical dynamic programming problem?

Explanation of normalized edit distance formula

column and row join a second box data

Is there a way to optimise this program in Haskell?

Categories

Resources