Time Complexity of String Subsequence Recursion - string

How to calculate the time complexity of the algorithm below? Can someone please explain to me briefly:
public static void print(String prefix, String remaining, int k) {
    if (k == 0) {
        StdOut.println(prefix);
        return;
    }
    if (remaining.length() == 0) return;
    print(prefix + remaining.charAt(0), remaining.substring(1), k-1);
    print(prefix, remaining.substring(1), k);
}

public static void main(String[] args) {
    String s = "abcdef";
    int k = 3;
    print("", s, k);
}

Suppose m is the length of prefix, and n is the length of remaining. Then the complexity is given by
T(m, n, k) = Θ(m + n) + T(m + 1, n - 1, k - 1) + T(m, n - 1, k).
The Θ(m + n) term stems from
prefix + remaining.charAt(0), remaining.substring(1)
which, in general will require creating two new strings of lengths about m and n, respectively (this might differ among various implementations).
Beyond that, the recurrence is pretty difficult to solve (at least for me), except for some very simple bounds. E.g., it's pretty clear that the complexity is at least exponential in the minimum of the length of remaining and k, since
T(m, n, k) ≥ 2 T(m, n - 1, k - 1) ⇒ T(m, n, k) = Ω(2^min(n, k)).

Introduction
As the body is O(1), or at least can be rewritten to be O(1), we only have to look at how many times the function is called. So the time complexity of this algorithm is the number of calls as a function of the length of the input word and the length of the output prefix.
n - length of input word
k - length of prefix being searched for
I have never done something like this before and common methods for finding time complexity for recursive methods that I know of don't work in this case. I started off by looking how many calls to the function were made depending on n and k to see if I can spot any patterns that might help me.
Gathering data
Using this code snippet (sorry for ugly code):
public static String word = "abcdefghij";
public static int wordLength = word.length();
public static int limit = 10;
public static int access = 0; // number of calls to print

public static void main(String[] args) {
    System.out.printf("Word length : %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d\n",0,1,2,3,4,5,6,7,8,9,10);
    System.out.printf("-----------------------------------------------------------------------------------------------\n");
    for (int k = 0; k <= limit; k++) {
        System.out.printf("k : %2d - results :", k);
        for (int i = 0; i <= limit; i++) {
            print("", word.substring(0, i), k);
            System.out.printf(", %5d", access);
            access = 0;
        }
        System.out.print("\n");
    }
}

public static void print(String prefix, String remaining, int k) {
    access++;
    // ... rest of code ...
}
From this I got :
Word length :     0     1     2     3     4     5     6     7     8     9    10
-------------------------------------------------------------------------------
k =  0      :     1     1     1     1     1     1     1     1     1     1     1
k =  1      :     1     3     5     7     9    11    13    15    17    19    21
k =  2      :     1     3     7    13    21    31    43    57    73    91   111
k =  3      :     1     3     7    15    29    51    83   127   185   259   351
k =  4      :     1     3     7    15    31    61   113   197   325   511   771
k =  5      :     1     3     7    15    31    63   125   239   437   763  1275
k =  6      :     1     3     7    15    31    63   127   253   493   931  1695
k =  7      :     1     3     7    15    31    63   127   255   509  1003  1935
k =  8      :     1     3     7    15    31    63   127   255   511  1021  2025
k =  9      :     1     3     7    15    31    63   127   255   511  1023  2045
k = 10      :     1     3     7    15    31    63   127   255   511  1023  2047
At least one call is made in every case, as it is the initial call. From here we see that everything on and below the main diagonal is called 2^(min(n, k) + 1) - 1 times, as mentioned by others. That is the number of nodes in a full binary tree.
Each call to the print method that does not hit a base case makes two further calls, which forms a binary tree.
Things get more confusing above the main diagonal, though, and I failed to see any common pattern.
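The table can also be reproduced without running the recursion on real strings. A minimal sketch in Python, assuming (as in the table) that n is the length of remaining and k is the number of characters still to pick; count_calls is a hypothetical helper name, and the recurrence is simply read off the two recursive calls in print:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_calls(n: int, k: int) -> int:
    """Number of print() invocations for a word of length n and target length k."""
    # Base cases: k == 0 prints and returns, n == 0 returns; each is a single call.
    if k == 0 or n == 0:
        return 1
    # One call for this node, plus the "take the character" and "skip it" branches.
    return 1 + count_calls(n - 1, k - 1) + count_calls(n - 1, k)


# Print the first few rows of the table (k = 0..3, word length 0..10).
for k in range(4):
    print(k, [count_calls(n, k) for n in range(11)])
```

Memoizing on (n, k) works here because the number of calls does not depend on the contents of prefix.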
Graph representation of algorithm
To make it more visual I used graphviz (online version).
Here is code snippet that generates code for given n and k for graphviz (green nodes are ones from where solution was found):
public static String word = "abcde";
public static int wordLength = word.length();
public static int limit = 3;

public static void main(String[] args) {
    String rootNode = "\"prefix|remaining|k\"";
    StringBuilder graph = new StringBuilder("digraph G { \n node [style=filled];");
    print("", word, limit, graph, rootNode);
    graph.append("\n\"prefix|remaining|k\" [shape=Mdiamond];\n}");
    System.out.println(graph);
}

public static void print(String prefix, String remaining, int k, StringBuilder sb, String parent) {
    String currentNode = "\"" + prefix + "|" + (remaining.isEmpty() ? 0 : remaining) + "|" + k + "\"";
    sb.append("\n " + parent + "->" + currentNode + ";");
    if (k == 0) {
        sb.append("\n " + currentNode + "[color=darkolivegreen3];");
        return;
    }
    if (remaining.length() == 0) return;
    print(prefix + remaining.charAt(0), remaining.substring(1), k - 1, sb, currentNode);
    print(prefix, remaining.substring(1), k, sb, currentNode);
}
Example of graph (n=5, k=3):
digraph G {
node [style=filled];
"prefix|remaining|k"->"|abcde|3";
"|abcde|3"->"a|bcde|2";
"a|bcde|2"->"ab|cde|1";
"ab|cde|1"->"abc|de|0";
"abc|de|0"[color=darkolivegreen3];
"ab|cde|1"->"ab|de|1";
"ab|de|1"->"abd|e|0";
"abd|e|0"[color=darkolivegreen3];
"ab|de|1"->"ab|e|1";
"ab|e|1"->"abe|0|0";
"abe|0|0"[color=darkolivegreen3];
"ab|e|1"->"ab|0|1";
"a|bcde|2"->"a|cde|2";
"a|cde|2"->"ac|de|1";
"ac|de|1"->"acd|e|0";
"acd|e|0"[color=darkolivegreen3];
"ac|de|1"->"ac|e|1";
"ac|e|1"->"ace|0|0";
"ace|0|0"[color=darkolivegreen3];
"ac|e|1"->"ac|0|1";
"a|cde|2"->"a|de|2";
"a|de|2"->"ad|e|1";
"ad|e|1"->"ade|0|0";
"ade|0|0"[color=darkolivegreen3];
"ad|e|1"->"ad|0|1";
"a|de|2"->"a|e|2";
"a|e|2"->"ae|0|1";
"a|e|2"->"a|0|2";
"|abcde|3"->"|bcde|3";
"|bcde|3"->"b|cde|2";
"b|cde|2"->"bc|de|1";
"bc|de|1"->"bcd|e|0";
"bcd|e|0"[color=darkolivegreen3];
"bc|de|1"->"bc|e|1";
"bc|e|1"->"bce|0|0";
"bce|0|0"[color=darkolivegreen3];
"bc|e|1"->"bc|0|1";
"b|cde|2"->"b|de|2";
"b|de|2"->"bd|e|1";
"bd|e|1"->"bde|0|0";
"bde|0|0"[color=darkolivegreen3];
"bd|e|1"->"bd|0|1";
"b|de|2"->"b|e|2";
"b|e|2"->"be|0|1";
"b|e|2"->"b|0|2";
"|bcde|3"->"|cde|3";
"|cde|3"->"c|de|2";
"c|de|2"->"cd|e|1";
"cd|e|1"->"cde|0|0";
"cde|0|0"[color=darkolivegreen3];
"cd|e|1"->"cd|0|1";
"c|de|2"->"c|e|2";
"c|e|2"->"ce|0|1";
"c|e|2"->"c|0|2";
"|cde|3"->"|de|3";
"|de|3"->"d|e|2";
"d|e|2"->"de|0|1";
"d|e|2"->"d|0|2";
"|de|3"->"|e|3";
"|e|3"->"e|0|2";
"|e|3"->"|0|3";
"prefix|remaining|k" [shape=Mdiamond];
}
Number of nodes cut from binary tree
From the example where n = 5 and k = 3 we can see that one tree of height 3 and three trees of height 2 were cut off. As we still visit the root node of each of these trees, the number of nodes cut from the complete binary tree is 1*(2^3 - 2) + 3*(2^2 - 2) = 12
If it were a full binary tree: 2^(5+1) - 1 = 63
The number of nodes (calls made to the function "print") then comes to 63 - 12 = 51
Result matches one we got by calculating number of calls made to function when n = 5 and k = 3.
Now we have to find how many and how big parts of tree are cut off for every n and k.
From here on I will refer to method call print(prefix + remaining.charAt(0), remaining.substring(1), k-1); as left path or left node (as it is in graphviz graphs) and to print(prefix, remaining.substring(1), k); as right path or right node.
We can see the first, and biggest tree is cut when we go left k times and the height of the tree will be n - k + 1. ( + 1 because we visit the root of the tree we cut).
We can see that every time we take the left path k times we get to a result, no matter how many right paths we took before (or in what order), unless the word runs out of letters before we get k left paths. So we can make a maximum of n - k right turns.
Let's take a closer look at the example where n = 5 and k = 3:
L - left path
R - right path
The first tree cut we took the paths :
LLL
The next highest trees that will be cut will be ones where we take only one right node, possible combinations are :
RLLL, LRLL, LLRL, LLLR -> three trees of height 2 cut
Here we must note that LLLR is already cut as LLL gave solution in previous step.
To get number of next trees (height 1 -> 0 nodes cut) we'll calculate the possible combinations of two rights and three lefts subtracting already visited paths.
Combinations(5,3) - Combinations(4,3) = 10 - 4 = 6 nodes of height 1
We can see the numbers match green nodes on the example graph.
C(n,k) - combinations of k from n
f(n,k) - number binary tree nodes not visited by algorithm
f(n,k) = (2^(n-k+1) - 2) + Σ_{i=1}^{n-k} (2^(n-k-i+1) - 2) * (C(k+i,k) - C(k+i-1,k))
Explanation:
(2^(n-k+1) - 2) - the highest tree cut; it has to be taken out of the summation, or we would have to take factorials of negative numbers
Σ_{i=1}^{n-k} - sum over all nodes cut, excluding the highest tree as it is already added. (We start by adding the larger trees first)
(2^(n-k-i+1) - 2) - number of nodes cut per tree. n-k+1 is the height of the largest tree, then working down from there to the tree with height "n-k-(n-k)+1 = 1"
(C(k+i,k) - C(k+i-1,k)) - how many trees of the given height there are. First find all possible paths (lefts and rights) and then subtract the ones already visited in previous steps.
It looks awful, but it can be simplified if we assume that k != 0 (without that assumption there would be factorials of negative numbers, which are undefined)
Simplified function:
f(n,k) = Σ_{i=0}^{n-k} (2^(n-k-i+1) - 2) * C(k+i-1, k-1)
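The simplified sum can be checked numerically against a direct count of the calls. A sketch in Python, assuming 1 <= k <= n; cut_nodes and count_calls are hypothetical helper names introduced here, and math.comb needs Python 3.8 or later:

```python
from functools import lru_cache
from math import comb


def cut_nodes(n: int, k: int) -> int:
    """f(n, k): nodes of the full binary tree that the algorithm never visits."""
    return sum((2 ** (n - k - i + 1) - 2) * comb(k + i - 1, k - 1)
               for i in range(n - k + 1))


@lru_cache(maxsize=None)
def count_calls(n: int, k: int) -> int:
    """Direct count of print() calls, read off the structure of the recursion."""
    if k == 0 or n == 0:
        return 1
    return 1 + count_calls(n - 1, k - 1) + count_calls(n - 1, k)


print(cut_nodes(5, 3))               # nodes cut for n = 5, k = 3
print(2 ** 6 - 1 - cut_nodes(5, 3))  # calls actually made for n = 5, k = 3
```

For n = 5 and k = 3 this gives 12 cut nodes and 63 - 12 = 51 calls, matching the worked example and the table.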
Evaluation of accurate time complexity
The time complexity of the function:
O(2^(n+1) - 1 - Σ_{i=0}^{n-k} (2^(n-k-i+1) - 2) * C(k+i-1, k-1))
This is the size of the full binary tree, 2^(n+1) - 1, minus the f(n,k) nodes that are cut off. It looks awful and doesn't give much information. I don't know how to simplify it any further. I've asked about it here. No answer so far though.
But is it even worth considering the f(n,k) part? That probably depends on the particular application. We can see from the data table that it can considerably affect the number of calls, depending on the choice of n and k.
To show more visually how much the extra part affects the complexity, I plotted the best-case complexity and the real complexity on a graph.
O(2^(n+1) - 1 - f(n,k)) is the colorful surface.
B(2^min(n,k)) is the green surface.
We can see that B(2^min(n,k)) is far too optimistic: it suggests the function performs much better than it actually does. It is usually more useful to look at an algorithm's worst-case complexity, which here is W(2^max(n,k)).
O(2^(n+1) - 1 - f(n,k)) is the colorful surface.
B(2^min(n,k)) is the green surface.
W(2^max(n,k)) is the yellow surface.
Conclusion
Best case complexity: B(2^min(n,k))
Accurate complexity: O(2^(n+1) - 1 - Σ_{i=0}^{n-k} (2^(n-k-i+1) - 2) * C(k+i-1, k-1))
Worst case complexity: W(2^max(n,k)) -> often written as O(2^max(n,k))
In my opinion the worst-case complexity should be used to evaluate the function, as the accurate one is too complex to interpret without analyzing it further. I wouldn't use the best-case complexity because it leaves too much to chance. Unfortunately I can't calculate the average complexity for this. Depending on the application of the algorithm, using the average might be better for evaluating it.

Suppose m is the length of prefix, and n is the length of remaining. Then the complexity is given by
T(m, n, k) = 1 + n + 1 + T(m + 1, n - 1, k - 1) + T(m, n - 1, k)
Obviously, the function stops when n = 0 or k = 0. So,
T(m, n, 0) = 1 + m
T(m, 0, k) = 1 + 1 + 1 = 3
Rearranging equation 1, we get
T(m, n, k) - T(m, n - 1, k) = 2 + n + T(m + 1, n - 1, k - 1)
Replace n by n-1 in equation 1
T(m, n - 1, k) - T(m, n - 2, k) = 2 + (n - 1) + T(m + 1, n - 2, k - 1)
... continue ...
T(m, 1, k) - T(m, 0, k) = 2 + (1) + T(m + 1, 0, k - 1)
Summing them up (the constant 2 appears n times, and 1 + 2 + ... + n = n(n+1)/2):
T(m, n, k) - T(m, 0, k) = 2n + n(n+1)/2 + {Summation of a from 0 to n - 1 on T(m + 1, a, k - 1)}
Rearranging, with T(m, 0, k) = 3:
T(m, n, k) = n^2/2 + 5n/2 + 3 + {Summation of a from 0 to n - 1 on T(m + 1, a, k - 1)}
I guess we can get the answer by solving the summation using the last equation, and the leading term of the result would be something like n^(k+1)

Related

Create a dictionary of subcubes from larger cube in Python

I am examining every contiguous 8 x 8 x 8 cube within a 50 x 50 x 50 cube. I am trying to create a collection (in this case a dictionary) of the subcubes that contain the same sum and a count of how many subcubes share that same sum. So in essence, the result would look something like this:
{key = sum, value = number of cubes that have the same sum}
{256 : 3, 119 : 2, ...}
So in this example, there are 3 cubes that sum to 256 and 2 cubes that sum to 119, etc. Here is the code I have thus far, but it only sums (at least I think it does):
an_array = np.array([i for i in range(500)])
cube = np.reshape(an_array, (8, 8, 8))
c_size = 8 # cube size
sum = 0
idx = None
for i in range(cube.shape[0] - cs + 2):
    for j in range(cube.shape[1] - cs + 2):
        for k in range(cube.shape[2] - cs + 2):
            cube_sum = np.sum(cube[i:i + cs, j:j + cs, k:k + cs])
            new_list = {cube_sum : ?}
What I am trying to make this do is iterate the cube within cubes, sum all cubes then count the cubes that share the same sum. Any ideas would be appreciated.
from collections import defaultdict

import numpy as np

# 50 x 50 x 50 cube as described in the question (the values are placeholders)
cube = np.arange(50 ** 3).reshape(50, 50, 50)
cs = 8  # subcube edge length

result = defaultdict(int)
# there are shape - cs + 1 valid start positions along each axis
for i in range(cube.shape[0] - cs + 1):
    for j in range(cube.shape[1] - cs + 1):
        for k in range(cube.shape[2] - cs + 1):
            cube_sum = int(np.sum(cube[i:i + cs, j:j + cs, k:k + cs]))
            result[cube_sum] += 1
Explanation
defaultdict(int) behaves like result.get(key, 0): if a key doesn't exist yet, it is initialized with 0. So after the line result[cube_sum] += 1, the entry is either 1 (for a sum seen for the first time) or one more than the previous count for cube_sum.
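To make the equivalence concrete, here is a small sketch comparing defaultdict(int) with the explicit dict.get form, plus collections.Counter as an alternative. The list of sums is made-up sample data, not output computed from the cube:

```python
from collections import Counter, defaultdict

sums = [256, 119, 256, 42, 119, 256]  # made-up subcube sums for illustration

with_default = defaultdict(int)
with_get = {}
for s in sums:
    with_default[s] += 1                  # missing keys start at 0
    with_get[s] = with_get.get(s, 0) + 1  # same effect, spelled out

print(dict(with_default))
print(dict(Counter(sums)) == with_get)
```

Counter does the same counting in one call when all the sums are already collected in a list.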

How to track down the mistake in this merge function?

I can't figure out what the problem with this merge sort implementation is. I've confirmed the problem is in the merge function rather than merge_sort by replacing merge with the same function from some examples found online and it works fine, however I can't find the mistake in my implementation.
Expected result: list sorted in order from smallest to largest.
Actual result: left side of list modified (not in order) and right side unmodified.
I've tried adding print statements at various points in the program and it looks like the problem is related to rightList not being created properly but I can't figure out why.
What can I do to track down the cause of this?
Code:
def merge_sort(toSort, left, right):
    # check if we have more than one remaining element
    if left >= right:
        return
    # get middle of array, note the result needs to be an int
    mid = (left + right) // 2
    # call merge sort on the left and right sides of the list
    merge_sort(toSort, left, mid)
    merge_sort(toSort, mid+1, right)
    # merge the results
    merge(toSort, left, right, mid)

# merge function taking a list along with the positions
# of the start, middle and end
def merge(toSort, left, right, mid):
    # split the list into two separate lists based on the mid position
    leftList = toSort[left:mid+1]
    rightList = toSort[mid+1:right+1]
    # variables to track position in left and right lists and the sorted list
    lIndex = 0
    rIndex = 0
    sIndex = lIndex
    # while there are remaining elements in both lists
    while lIndex < len(leftList) and rIndex < len(rightList):
        # if the left value is less than or equal to the right value add it to position sIndex in toSort
        # and move lIndex to next position
        if leftList[lIndex] <= rightList[rIndex]:
            toSort[sIndex] = leftList[lIndex]
            lIndex = lIndex + 1
        # otherwise set sIndex to right value and move rIndex to next position
        else:
            toSort[sIndex] = rightList[rIndex]
            rIndex = rIndex + 1
        sIndex = sIndex + 1
    # add the remaining elements from either leftList or rightList
    while lIndex < len(leftList):
        toSort[sIndex] = leftList[lIndex]
        lIndex = lIndex + 1
        sIndex = sIndex + 1
    while rIndex < len(rightList):
        toSort[sIndex] = rightList[rIndex]
        rIndex = rIndex + 1
        sIndex = sIndex + 1

unsorted = [33, 42, 9, 37, 8, 47, 5, 29, 49, 31, 4, 48, 16, 22, 26]
print(unsorted)
merge_sort(unsorted, 0, len(unsorted) - 1)
print(unsorted)
Output:
[33, 42, 9, 37, 8, 47, 5, 29, 49, 31, 4, 48, 16, 22, 26]
[16, 22, 26, 49, 31, 4, 48, 29, 49, 31, 4, 48, 16, 22, 26]
Edit
Link to example of code in colab: https://colab.research.google.com/drive/1z5ouu_aD1QM0unthkW_ZGkDlrnPNElxm?usp=sharing
The index variable into toSort, sIndex should be initialized to left instead of 0.
Also note that it would be more readable and consistent to pass right as 1 + the index of the last element of the slice. This matches the slice notation in Python and removes the +1/-1 adjustments here and there. The convention where right is included is taught in Java classes, but it is error-prone and does not allow for empty slices.
Using simpler index variable names would help readability, especially with more indent space.
Here is a modified version:
# sort the elements in toSort[left:right]
def merge_sort(toSort, left, right):
    # check if we have more than one remaining element
    if right - left < 2:
        return
    # get middle of array, note: the result needs to be an int
    mid = (left + right) // 2
    # call merge sort on the left and right sides of the list
    merge_sort(toSort, left, mid)
    merge_sort(toSort, mid, right)
    # merge the results
    merge(toSort, left, mid, right)

# merge function taking a list along with the positions
# of the start, middle and end, in this order.
def merge(toSort, left, mid, right):
    # split the list into two separate lists based on the mid position
    leftList = toSort[left : mid]
    rightList = toSort[mid : right]
    # variables to track position in left and right lists and the sorted list
    i = 0
    j = 0
    k = left
    # while there are remaining elements in both lists
    while i < len(leftList) and j < len(rightList):
        # if the left value is less than or equal to the right value add it to position k in toSort
        # and move i to next position
        if leftList[i] <= rightList[j]:
            toSort[k] = leftList[i]
            i += 1
        # otherwise set it to right value and move j to next position
        else:
            toSort[k] = rightList[j]
            j += 1
        k += 1
    # add the remaining elements from either leftList or rightList
    while i < len(leftList):
        toSort[k] = leftList[i]
        i += 1
        k += 1
    while j < len(rightList):
        toSort[k] = rightList[j]
        j += 1
        k += 1

unsorted = [33, 42, 9, 37, 8, 47, 5, 29, 49, 31, 4, 48, 16, 22, 26]
print(unsorted)
merge_sort(unsorted, 0, len(unsorted))
print(unsorted)
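One way to track down a bug like this is to call merge in isolation on a range that does not start at index 0, where a wrong write position shows up immediately. A minimal sketch, assuming the half-open convention of the modified version; this compact merge and its start_at_left flag are hypothetical, added here only to reproduce the original bug side by side with the fix:

```python
def merge(items, left, mid, right, start_at_left=True):
    """Merge the sorted runs items[left:mid] and items[mid:right] in place."""
    left_run, right_run = items[left:mid], items[mid:right]
    i = j = 0
    # Starting at 0 instead of left reproduces the bug from the question.
    k = left if start_at_left else 0
    while i < len(left_run) and j < len(right_run):
        if left_run[i] <= right_run[j]:
            items[k] = left_run[i]
            i += 1
        else:
            items[k] = right_run[j]
            j += 1
        k += 1
    # Copy whatever is left over from either run.
    for value in left_run[i:] + right_run[j:]:
        items[k] = value
        k += 1


data = [9, 9, 1, 3, 2, 4]   # sorted runs [1, 3] and [2, 4] start at index 2

fixed = data[:]
merge(fixed, 2, 4, 6)
print(fixed)

buggy = data[:]
merge(buggy, 2, 4, 6, start_at_left=False)
print(buggy)
```

With the wrong start position the merged values overwrite the front of the list while the range being merged stays untouched, which is the symptom described in the question.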

Optimization of CodeWars Python 3.6 code: Integers: Recreation One

I need help optimizing my python 3.6 code for the CodeWars Integers: Recreation One Kata.
We are given a range of numbers and we have to return the number and the sum of the divisors squared that is a square itself.
"Divisors of 42 are : 1, 2, 3, 6, 7, 14, 21, 42. These divisors squared are: 1, 4, 9, 36, 49, 196, 441, 1764. The sum of the squared divisors is 2500 which is 50 * 50, a square!
Given two integers m, n (1 <= m <= n) we want to find all integers between m and n whose sum of squared divisors is itself a square. 42 is such a number."
My code works for individual tests, but it times out when submitting:
def list_squared(m, n):
    sqsq = []
    for i in range(m, n):
        divisors = [j**2 for j in range(1, i+1) if i % j == 0]
        sq_divs = sum(divisors)
        sq = sq_divs ** (1/2)
        if int(sq) ** 2 == sq_divs:
            sqsq.append([i, sq_divs])
    return sqsq
You can reduce the complexity of the loop in the list comprehension from O(N) to O(sqrt(N)) by setting the end of the range to sqrt(num)+1 instead of num.
When looping from 1 to sqrt(num)+1, if i (the current item in the loop) is a factor of num, then num divided by i must be another one.
E.g.: 2 is a factor of 10, so is 5 (10/2)
The following code passes all the tests:
import math

def list_squared(m, n):
    result = []
    for num in range(m, n + 1):
        divisors = set()
        for i in range(1, int(math.sqrt(num)+1)):
            if num % i == 0:
                divisors.add(i**2)
                divisors.add(int(num/i)**2)
        total = sum(divisors)
        sr = math.sqrt(total)
        if sr - math.floor(sr) == 0:
            result.append([num, total])
    return result
It's more of a math issue. The largest divisor of i is i itself, and the next largest is at most i/2. So you can roughly double the speed just by using i // 2 + 1 as the range stop instead of i + 1. Just don't forget to add i ** 2 to sq_divs.
You may also get a tiny performance improvement by dropping the sq variable and sq_divs ** (1/2).
BTW you should use n+1 as the stop in the first range.
def list_squared(m, n):
    sqsq = []
    for i in range(m, n+1):
        # stopping at i // 2 speeds the loop up twice
        divisors = [j * j for j in range(1, i // 2 + 1) if i % j == 0]
        sq_divs = sum(divisors)
        sq_divs += i * i  # add i as divisor
        if ((sq_divs) ** 0.5) % 1 == 0:  # tiny speed up here
            sqsq.append([i, sq_divs])
    return sqsq
UPD: I've tried the Kata and it still times out. So we need even more math! If i is divisible by j, then it is also divisible by i/j, so we can use int(math.sqrt(i)) + 1 as the range stop. If i % j == 0, append j * j to the divisors array, and if i / j != j, also append (i / j) ** 2.
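The pairing idea described in this update can be sketched as follows. This is one possible reading of it, not the code that was submitted to the Kata; sum_squared_divisors is a hypothetical helper name, and math.isqrt (Python 3.8+) is swapped in for int(math.sqrt(...)) so that both the loop bound and the perfect-square test stay in exact integer arithmetic:

```python
import math


def sum_squared_divisors(num: int) -> int:
    """Sum of d*d over all divisors d of num, pairing each d with num // d."""
    total = 0
    for d in range(1, math.isqrt(num) + 1):
        if num % d == 0:
            total += d * d
            partner = num // d
            if partner != d:  # do not count the square root twice
                total += partner * partner
    return total


def list_squared(m: int, n: int) -> list:
    """All [num, total] with m <= num <= n whose total is a perfect square."""
    result = []
    for num in range(m, n + 1):
        total = sum_squared_divisors(num)
        root = math.isqrt(total)
        if root * root == total:  # exact integer perfect-square test
            result.append([num, total])
    return result


print(list_squared(42, 42))
```

Each number now costs about sqrt(num) trial divisions instead of num or num / 2.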

Dynamic programming table - Finding the minimal cost to break a string

A certain string-processing language offers a primitive operation
which splits a string into two pieces. Since this operation involves
copying the original string, it takes n units of time for a string of
length n, regardless of the location of the cut. Suppose, now, that
you want to break a string into many pieces.
The order in which the breaks are made can affect the total running
time. For example, suppose we wish to break a 20-character string (for
example "abcdefghijklmnopqrst") after characters at indices 3, 8, and
10 to obtain four substrings: "abcd", "efghi", "jk" and "lmnopqrst". If
the breaks are made in left-right order, then the first break costs 20
units of time, the second break costs 16 units of time and the third
break costs 11 units of time, for a total of 47 steps. If the breaks
are made in right-left order, the first break costs 20 units of time,
the second break costs 11 units of time, and the third break costs 9
units of time, for a total of only 40 steps. However, the optimal
solution is 38 (and the order of the cuts is 10, 3, 8).
The input is the length of the string and an ascending-sorted array with the cut indexes. I need to design a dynamic programming table to find the minimal cost to break the string and the order in which the cuts should be performed.
I can't figure out how the table structure should look (certain cells should be the answer to certain sub-problems and should be computable from other entries etc.). Instead, I've written a recursive function to find the minimum cost to break the string: b0, b1, ..., bK are the indexes for the cuts that have to be made to the (sub)string between i and j.
totalCost(i, j, {b0, b1, ..., bK}) = j - i + 1 + min {
    totalCost(b0 + 1, j, {b1, b2, ..., bK}),
    totalCost(i, b1, {b0}) + totalCost(b1 + 1, j, {b2, b3, ..., bK}),
    totalCost(i, b2, {b0, b1}) + totalCost(b2 + 1, j, {b3, b4, ..., bK}),
    ...
    totalCost(i, bK, {b0, b1, ..., b(K - 1)})
} if K + 1 (the number of cuts) > 1,
j - i + 1 otherwise.
Please help me figure out the structure of the table, thanks!
For example, say we have a string of length n = 20 and need to break it at positions cuts = [3, 8, 10]. First of all, let's add two fake cuts to our array: -1 and n - 1 (to avoid edge cases), so now we have cuts = [-1, 3, 8, 10, 19]. Let's fill a table M, where M[i, j] is the minimum number of time units needed to make all breaks between the i-th and j-th cuts. We can fill it by the rule M[i, j] = (cuts[j] - cuts[i]) + min(M[i, k] + M[k, j]) over all i < k < j, with M[i, j] = 0 when there is no cut between i and j. The minimum time to make all cuts will be in the cell M[0, len(cuts) - 1]. Full code in Python:
# input
n = 20
cuts = [3, 8, 10]
# add fake cuts
cuts = [-1] + cuts + [n - 1]
cuts_num = len(cuts)
# init table with zeros
table = []
for i in range(cuts_num):
    table += [[0] * cuts_num]
# fill table
for diff in range(2, cuts_num):
    for start in range(0, cuts_num - diff):
        end = start + diff
        table[start][end] = 1e9
        for mid in range(start + 1, end):
            table[start][end] = min(table[start][end],
                                    table[start][mid] + table[mid][end])
        table[start][end] += cuts[end] - cuts[start]
# print result: 38
print(table[0][cuts_num - 1])
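The table gives only the minimal cost, while the question also asks for the order of the cuts. A possible sketch, assuming the same table definition with sentinel cuts at both ends, that additionally remembers the best first cut for each interval; break_string, choice and collect are names introduced here for illustration:

```python
def break_string(n, cuts):
    """Return (minimal cost, order of cuts) for a string of length n."""
    pts = [-1] + sorted(cuts) + [n - 1]            # sentinel cuts at both ends
    size = len(pts)
    cost = [[0] * size for _ in range(size)]
    choice = [[None] * size for _ in range(size)]  # best first cut per interval

    for diff in range(2, size):
        for start in range(size - diff):
            end = start + diff
            # pick the first cut that minimizes the cost of the two halves
            best_mid = min(range(start + 1, end),
                           key=lambda mid: cost[start][mid] + cost[mid][end])
            cost[start][end] = (pts[end] - pts[start]
                                + cost[start][best_mid] + cost[best_mid][end])
            choice[start][end] = best_mid

    order = []

    def collect(start, end):
        mid = choice[start][end]
        if mid is None:            # no cut inside this interval
            return
        order.append(pts[mid])     # make this cut first, then recurse
        collect(start, mid)
        collect(mid, end)

    collect(0, size - 1)
    return cost[0][size - 1], order


print(break_string(20, [3, 8, 10]))
```

For the example from the problem statement this yields a cost of 38 with the cuts made in the order 10, 3, 8.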
In case you find it easier to follow when everything is 1-based (as in problem 6.9 of the DPV (Dasgupta) Algorithms book, and in the Udacity Graduate Algorithms course initiated by Georgia Tech), the following Python code does the equivalent of the previous Python code by Jemshit and Aleksei. It follows the chain-multiplication (binary tree) pattern as taught in the video lecture.
import numpy as np

# n is string len, P is of size m where P[i] is the split pos that split string into [1,i] and [i+1,n] (1-based)
def spliting_cost(P, n):
    P = [0,] + P + [n,]  # make sure pos list contains both ends of string
    m = len(P)
    P = [0,] + P  # both C and P are 1-base indexed for easy reading
    C = np.full((m+1,m+1), np.inf)
    for i in range(1, m+1): C[i, i:i+2] = 0  # any segment <= 2 does not need split so is zero cost
    for s in range(2, m):  # s is split string len
        for i in range(1, m-s+1):
            j = i + s
            for k in range(i, j+1):
                C[i,j] = min(C[i,j], P[j] - P[i] + C[i,k] + C[k,j])
    return C[1,m]

spliting_cost([3, 5, 10, 14, 16, 19], 20)
The output answer is 55, same as that with split points [2, 4, 9, 13, 15, 18] in the previous algorithm.

String concatenation queries

I have a list of characters, say x in number, denoted by b[1], b[2], b[3] ... b[x]. After x,
b[x+1] is the concatenation of b[1],b[2].... b[x] in that order. Similarly,
b[x+2] is the concatenation of b[2],b[3]....b[x],b[x+1].
So, basically, b[n] will be the concatenation of the last x terms of b[i], taken from left to right.
Given parameters as p and q as queries, how can I find out which character among b[1], b[2], b[3]..... b[x] does the qth character of b[p] corresponds to?
Note: x and b[1], b[2], b[3]..... b[x] is fixed for all queries.
I tried brute-forcing but the string length increases exponentially for large x.(x<=100).
Example:
When x=3,
b[] = a, b, c, a b c, b c abc, c abc bcabc, abc bcabc cabcbcabc, //....
//Spaces for clarity, only commas separate array elements
So for a query where p=7, q=5, answer returned would be 3(corresponding to character 'c').
I am just having difficulty figuring out the maths behind it. Language is no issue
I wrote this answer as I figured it out, so please bear with me.
As you mentioned, it is much easier to find out where the character at b[p][q] comes from among the original x characters than to generate b[p] for large p. To do so, we will use a loop to find where the current b[p][q] came from, thereby reducing p until it is between 1 and x, and q until it is 1.
Let's look at an example for x=3 to see if we can get a formula:
p  N(p)  b[p]
-  ----  ----
1    1   a
2    1   b
3    1   c
4    3   a b c
5    5   b c abc
6    9   c abc bcabc
7   17   abc bcabc cabcbcabc
8   31   bcabc cabcbcabc abcbcabccabcbcabc
9   57   cabcbcabc abcbcabccabcbcabc bcabccabcbcabcabcbcabccabcbcabc
The sequence is clear: N(p) = N(p-1) + N(p-2) + N(p-3), where N(p) is the number of characters in the pth element of b. Given p and x, you can just brute-force compute all the N for the range [1, p]. This will allow you to figure out which prior element of b b[p][q] came from.
To illustrate, say x=3, p=9 and q=45.
The chart above gives N(6)=9, N(7)=17 and N(8)=31. Since 45>9+17, you know that b[9][45] comes from b[8][45-(9+17)] = b[8][19].
Continuing iteratively/recursively, 19>9+5, so b[8][19] = b[7][19-(9+5)] = b[7][5].
Now 5>N(4) but 5<N(4)+N(5), so b[7][5] = b[5][5-3] = b[5][2].
b[5][2] = b[3][2-1] = b[3][1]
Since 3 <= x, we have our termination condition, and b[9][45] is c from b[3].
Something like this can very easily be computed either recursively or iteratively given starting p, q, x and b up to x. My method requires p array elements to compute N(p) for the entire sequence. This can be allocated in an array or on the stack if working recursively.
Here is a reference implementation in vanilla Python (no external imports, although numpy would probably help streamline this):
def so38509640(b, p, q):
    """
    p, q are integers. b is a char sequence of length x.
    list, string, or tuple are all valid choices for b.
    """
    x = len(b)

    # Trivial case
    if p <= x:
        if q != 1:
            raise ValueError('q={} out of bounds for p={}'.format(q, p))
        return p, b[p - 1]

    # Construct list of counts
    N = [1] * p
    for i in range(x, p):
        N[i] = sum(N[i - x:i])
    print('N =', N)

    # Error check
    if q > N[-1]:
        raise ValueError('q={} out of bounds for p={}'.format(q, p))
    print('b[{}][{}]'.format(p, q), end='')

    # Reduce p, q until p <= x
    while p > x:
        # Find which previous element character q comes from
        offset = 0
        for i in range(p - x - 1, p):
            if i == p - 1:
                raise ValueError('q={} out of bounds for p={}'.format(q, p))
            if offset + N[i] >= q:
                q -= offset
                p = i + 1
                print(' = b[{}][{}]'.format(p, q), end='')
                break
            offset += N[i]
    print()
    return p, b[p - 1]
Calling so38509640('abc', 9, 45) produces
N = [1, 1, 1, 3, 5, 9, 17, 31, 57]
b[9][45] = b[8][19] = b[7][5] = b[5][2] = b[3][1]
(3, 'c') # <-- Final answer
Similarly, for the example in the question, so38509640('abc', 7, 5) produces the expected result:
N = [1, 1, 1, 3, 5, 9, 17]
b[7][5] = b[5][2] = b[3][1]
(3, 'c') # <-- Final answer
Sorry I couldn't come up with a better function name :) This is simple enough code that it should work equally well in Py2 and 3, despite differences in the range function/class.
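For small p the result can be cross-checked by brute force. A sketch that builds each b[i] as a list of origin indices (1-based positions among the first x characters) instead of characters; origin_brute_force is a hypothetical helper, and it is only practical while the lengths stay small, since they grow exponentially:

```python
def origin_brute_force(x, p, q):
    """Return the 1-based index of the original character behind b[p][q]."""
    # terms[i] holds b[i + 1] as a list of origin indices
    terms = [[i] for i in range(1, x + 1)]
    while len(terms) < p:
        merged = []
        for term in terms[-x:]:  # concatenate the last x terms in order
            merged.extend(term)
        terms.append(merged)
    return terms[p - 1][q - 1]


print(origin_brute_force(3, 7, 5))
print(origin_brute_force(3, 9, 45))
```

Both calls should agree with the index reported by the length-based reduction for the same x, p and q.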
I would be very curious to see if there is a non-iterative solution for this problem. Perhaps there is a way of doing this using modular arithmetic or something...

Resources