What is the time complexity for repeatedly doubling a string?

Consider the following piece of C++ code:
string s = "a";
for (int i = 0; i < n; i++) {
    s = s + s; // Concatenate s with itself.
}
Usually, when analyzing the time complexity of a piece of code, we would determine how much work the loop body does, then multiply it by the number of times the loop runs. However, in this case, the amount of work done per iteration varies from iteration to iteration, since the string being built up gets longer and longer.
How would you analyze this code to get the big-O time complexity?

The time complexity of this function is Θ(2^n). To see why this is, let's look at what the function does, then see how to analyze it.
For starters, let's trace through the loop for n = 3. Before iteration 0, the string s is the string "a". Iteration 0 doubles the length of s to make s = "aa". Iteration 1 doubles the length of s to make s = "aaaa". Iteration 2 then doubles the length of s to make s = "aaaaaaaa".
If you'll notice, after k iterations of the loop, the length of the string s is 2^k. This means that each iteration of the loop will take longer and longer to complete, because it will take more and more work to concatenate the string s with itself. Specifically, the kth iteration of the loop will take time Θ(2^k) to complete, because the loop iteration constructs a string of size 2^(k+1).
One way that we could analyze this function would be to multiply the worst-case cost of a single loop iteration by the number of loop iterations. Since each loop iteration takes time O(2^n) to finish and there are n loop iterations, we would get that this code takes time O(n · 2^n) to finish.
However, it turns out that this analysis is not very good, and in fact will overestimate the time complexity of this code. It is indeed true that this code runs in time O(n · 2^n), but remember that big-O notation only gives an upper bound on the runtime of a piece of code. This means that the growth rate of this code's runtime is no greater than the growth rate of n · 2^n, but it doesn't mean that this is a tight bound. In fact, if we look at the code more carefully, we can get a better bound.
Let's begin by trying to do some better accounting for the work done. The work in this loop can be split apart into two smaller pieces:
The work done in the header of the loop, which increments i and tests whether the loop is done.
The work done in the body of the loop, which concatenates the string with itself.
Here, when accounting for the work in these two spots, we will account for the total amount of work done across all iterations, not just in one iteration.
Let's look at the first of these - the work done by the loop header. This will run exactly n times. Each time, this part of the code will do only O(1) work incrementing i, testing it against n, and deciding whether to continue with the loop. Therefore, the total work done here is Θ(n).
Now let's look at the loop body. As we saw before, iteration k creates a string of length 2^(k+1), which takes time roughly 2^(k+1). If we sum this up across all iterations, we get that the work done is (roughly speaking)
2^1 + 2^2 + 2^3 + ... + 2^n.
So what is this sum? Previously, we got a bound of O(n · 2^n) by noting that
2^1 + 2^2 + 2^3 + ... + 2^n
≤ 2^n + 2^n + 2^n + ... + 2^n
= n · 2^n = Θ(n · 2^n)
However, this is a very weak upper bound. If we're more observant, we can recognize the original sum as a geometric series with first term a = 2 and ratio r = 2. Given this, the sum of these terms works out to be exactly
2^(n+1) - 2 = 2(2^n) - 2 = Θ(2^n)
In other words, the total work done by the body of the loop, across all iterations, is Θ(2^n).
The total work done by the loop is given by the work done in the loop maintenance plus the work done in the body of the loop. This works out to Θ(2^n) + Θ(n) = Θ(2^n). This grows very quickly, but nowhere near as rapidly as the O(n · 2^n) bound that our original analysis gave us.
In short, when analyzing a loop, you can always get a conservative upper bound by multiplying the number of iterations of the loop by the maximum work done on any one iteration of that loop. However, doing a more precise analysis can often give you a much better bound.
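One quick way to sanity-check the geometric-series argument is a small Python sketch that counts the characters copied by the doubling loop instead of timing it, and compares the total against the closed form 2^(n+1) - 2 (doubling_work is just an illustrative helper, not from the original code):

def doubling_work(n):
    s = "a"
    copied = 0
    for _ in range(n):
        s = s + s           # each concatenation builds a brand-new string
        copied += len(s)    # cost of this iteration ~ length of the new string
    return copied

# e.g. doubling_work(3) == 2 + 4 + 8 == 14 == 2^4 - 2
for n in range(1, 10):
    assert doubling_work(n) == 2 ** (n + 1) - 2

The total grows like 2^n even though there are n iterations, which is exactly why summing the per-iteration costs beats multiplying the worst iteration by n.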
Hope this helps!

Related

How can I find Time complexity for "while loop" and "power statement"

I'm trying to find the time complexity of two python statements:
while loops: I understand how to find the complexity class of for loops, but when it comes to while loops the case is completely different: the condition controls the loop, so how do I work out how many times it runs?
power statement: is the time complexity affected by the pow function?
Here is an example of a program statement:
import random
from math import gcd

Upper = 100
n = random.randrange(0, Upper)
while gcd(n, Upper) != 1:
    n = random.randrange(0, Upper)  # question 1
pow(c, n - 1, n)  # (c^(n-1) mod n) question 2
# where n is a large prime number (c is defined elsewhere in the asker's code)
The time complexity of the while loop is the number of iterations times the complexity of the loop test plus the loop body. (More generally, in case these complexities vary from iteration to iteration, you must consider the sum of the complexities across iterations.)
For the pow function, there is no single answer. Using a fixed-length representation, you may assume O(1) complexity, though in some contexts it could be O(log e) where e is the exponent.
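For question 2, here is a rough Python sketch of the square-and-multiply idea that three-argument pow(base, exp, mod) is typically built on; it performs one loop iteration per bit of the exponent, which is where the O(log e) figure comes from (an illustrative sketch, not CPython's actual implementation):

def mod_pow(base, exp, mod):
    result = 1
    base %= mod
    while exp > 0:
        if exp & 1:                       # current bit of the exponent is set
            result = (result * base) % mod
        base = (base * base) % mod        # square for the next bit
        exp >>= 1
    return result

# mod_pow(3, 100, 7) == pow(3, 100, 7) == 4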

What is the time complexity of this algorithm (that solves leetcode question 650) (question 2)?

Hello I have been working on https://leetcode.com/problems/2-keys-keyboard/ and came upon this dynamic programming question.
You start with an 'A' on a blank page and are given a number n; when you are done you should have n 'A's on the page. The catch is you are allowed only 2 operations: copy (and you can only copy the total amount of 'A's currently on the page) and paste. Find the minimum number of operations to get n 'A's on the page.
I solved this problem, but then found a better solution in the discussion section of leetcode, and I can't figure out its time complexity.
def minSteps(self, n):
    factors = 0
    i = 2
    while i <= n:
        while n % i == 0:
            factors += i
            n //= i  # integer division; in Python 3, n /= i would turn n into a float
        i += 1
    return factors
The way this works is that i is never going to be bigger than the biggest prime factor p of n, so the outer loop is O(p), and the inner while loop is basically O(log n) since we are dividing n by i at each iteration.
But the way I look at it, we are doing O(log n) divisions in total for the inner loop while the outer loop is O(p), so using aggregate analysis this function is basically O(max(p, log n)). Is this correct?
Any help is welcome.
Your reasoning is correct: O(max(p, log n)) gives the time complexity, assuming that arithmetic operations take constant time. This assumption is not true for arbitrarily large n that would not fit in the machine's fixed-size number storage, where you would need big-integer operations that have non-constant time complexity. But I will ignore that.
It is still odd to express the complexity in terms of p when that is not the input (but derived from it). Your input is only n, so it makes sense to express the complexity in terms of n alone.
Worst Case
Clearly, when n is prime, the algorithm is O(n) -- the inner loop does nothing until i reaches n.
For a prime n, the algorithm will take more time than for n+1, as even the smallest factor of n+1 (i.e. 2) will halve the number of iterations of the outer loop, and yet only add 1 block of constant work in the inner loop.
So O(n) is the worst case.
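To see the worst case in numbers, here is a small Python sketch (outer_iterations is a hypothetical helper that just counts the outer-loop iterations of the solution above):

def outer_iterations(n):
    count = 0
    i = 2
    while i <= n:
        count += 1
        while n % i == 0:
            n //= i
        i += 1
    return count

# 997 is prime:    outer_iterations(997) == 996  (i has to crawl all the way up to n)
# 998 == 2 * 499:  outer_iterations(998) == 498  (the single factor 2 roughly halves the work)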
Average Case
For the average case, we note that the division of n happens just as many times as n has prime factors (counting duplicates). For example, for n = 12, we have 3 divisions, as n = 2·2·3.
The average number of prime factors for 1 < n < x approaches log log n + B, where B is some constant. So we could say the average time complexity for the total execution of the inner loop is O(log log n).
We need to add to that the execution of the outer loop. This corresponds to the average greatest prime factor. For 1 < n < x this average approaches C·n/log n, and so we have:
O(n/log n + log log n)
Now n/log n is the more important term here, so this simplifies to:
O(n/log n)

How does Duval's algorithm handle odd-length strings?

Finding the lexicographically minimal string rotation is a well known problem, for which a linear time algorithm was proposed by Jean Pierre Duval in 1983. This blog post is probably the only publicly available resource that talks about the algorithm in detail. However, Duval's algorithm is based on the idea of pairwise comparisons ("duels"), and the blog conveniently uses an even-length string as an example.
How does the algorithm work for odd-length strings, where the last character wouldn't have a competing one to duel with?
One character can get a "bye", where it wins without participating in a "duel". The correctness of the algorithm does not rely on the specific duels that you perform; given any two distinct indices i and j, you can always conclusively rule out that one of them is the start-index of the lexicographically-minimal rotation (unless both are start-indices of identical lexicographically-minimal rotations, in which case it doesn't matter which one you reject). The reason to perform the duels in a specific order is performance: to get asymptotically linear time by ensuring that half the duels only need to compare one character, half of the rest only need to compare two characters, and so on, until the last duel only needs to compare half the length of the string. But a single odd character here and there doesn't change the asymptotic complexity, it just makes the math (and implementation) a little bit more complicated. A string of length 2n+1 still requires fewer "duels" than one of length 2n+2.
OP here: I accepted ruakh's answer as it pertains to my question, but I wanted to provide my own explanation for others that might stumble across this post trying to understand Duval's algorithm.
Problem:
Lexicographically least circular substring is the problem of finding the rotation of a string possessing the lowest lexicographical order of all such rotations. For example, the lexicographically minimal rotation of "bbaaccaadd" would be "aaccaaddbb".
Solution:
An O(n) time algorithm was proposed by Jean Pierre Duval (1983).
Given two indices i and j, Duval's algorithm compares string segments of length j - i starting at i and j (called a "duel"). If an index plus j - i is greater than the length of the string, the segment is formed by wrapping around.
For example, consider s = "baabbaba", i = 5 and j = 7. Since j - i = 2, the first segment starting at i = 5 is "ab". The second segment starting at j = 7 is constructed by wrapping around, and is also "ab".
If the strings are lexicographically equal, like in the above example, we choose the one starting at i as the winner, which is i = 5.
The above process is repeated until we have a single winner. If the input string is of odd length, the last character wins without a comparison in the first iteration.
Time complexity:
The first iteration compares n strings each of length 1 (n/2 comparisons), the second iteration may compare n/2 strings of length 2 (n/2 comparisons), and so on, until the final iteration compares 2 strings of length n/2 (n/2 comparisons). Since the number of winners is halved each time, the height of the recursion tree is log(n), thus giving us an O(n log n) algorithm. For small n, this is approximately O(n).
Space complexity is O(n) too, since in the first iteration, we have to store n/2 winners, second iteration n/4 winners, and so on. (Wikipedia claims this algorithm uses constant space, I don't understand how).
Here's a Scala implementation; feel free to convert to your favorite programming language.
import scala.annotation.tailrec

def lexicographicallyMinRotation(s: String): String = {
  @tailrec
  def duel(winners: Seq[Int]): String = {
    if (winners.size == 1) s"${s.slice(winners.head, s.length)}${s.take(winners.head)}"
    else {
      val newWinners: Seq[Int] = winners
        .sliding(2, 2)
        .map {
          case Seq(x, y) =>
            val range = y - x
            Seq(x, y)
              .map { i =>
                // If the segment would run past the end of the string, wrap around to the front.
                val segment = if (s.isDefinedAt(i + range - 1)) s.slice(i, i + range)
                else s"${s.slice(i, s.length)}${s.take(i + range - s.length)}"
                (i, segment)
              }
              .reduce((a, b) => if (a._2 <= b._2) a else b)
              ._1
          case xs => xs.head
        }
        .toSeq
      duel(newWinners)
    }
  }
  duel(s.indices)
}

Fibonacci memoization order of execution

The following code computes Fibonacci numbers using memoization. But I do not understand the order of execution in the algorithm. If we do dynamic_fib(4), it will calculate dynamic_fib(3) + dynamic_fib(2). The left side is called first, and it in turn calculates dynamic_fib(2) + dynamic_fib(1). But how does the answer of dynamic_fib(2), cached while calculating dynamic_fib(3), propagate up to be reused, when we are not saving the result to a memory address of the dictionary like &dic[n] in C?
What I think should happen is that the answer for dynamic_fib(2) is gone because it only existed in that stack frame, so you have to calculate dynamic_fib(2) again when calculating dynamic_fib(4).
Am I missing something?
def dynamic_fib(n):
    return fibonacci(n, {})

def fibonacci(n, dic):
    if n == 0 or n == 1:
        return n
    if not dic.get(n, False):
        dic[n] = fibonacci(n-1, dic) + fibonacci(n-2, dic)
    return dic[n]
The function dynamic_fib (called once) just delegates the work to fibonacci, where the real work is done. In fibonacci you have the dictionary dic, which is used to save the values of the function once they are calculated. Crucially, it is the very same dictionary object that is passed down to every recursive call (Python passes a reference to the dict, not a copy), so a value stored deep inside one branch of the recursion is still there when another branch asks for it. So, for each of the values 2..n, when you call the function fibonacci for the first time it calculates the result, but it also stores it in the dictionary, so that the next time we ask for it we already have it, and we don't need to traverse the whole tree again. So the complexity is linear, O(n).
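To make the order of execution visible, here is a small traced variant (the prints and the name fibonacci_traced are only for illustration); it shows that the value cached while computing dynamic_fib(3) is reused when dynamic_fib(4) later needs dynamic_fib(2):

def fibonacci_traced(n, dic):
    if n == 0 or n == 1:
        return n
    if n not in dic:
        print(f"computing fib({n})")
        dic[n] = fibonacci_traced(n - 1, dic) + fibonacci_traced(n - 2, dic)
    else:
        print(f"reusing cached fib({n})")
    return dic[n]

fibonacci_traced(4, {})
# computing fib(4)
# computing fib(3)
# computing fib(2)
# reusing cached fib(2)   <- the entry written inside fib(3)'s subtree is still in dic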

What would be the big O notation for the function?

I know that big O notation is a measure of how efficient a function is, but I don't really get how to calculate it.
def method(n):
    sum = 0
    for i in range(85):
        sum += i * n
    return sum
Would the answer be O(f(85)) ?
The complexity of this function is O(1).
In the RAM model, basic mathematical operations take constant time. The dominant term in this function is
for i in range(85):
and since 85 is a constant, the complexity is represented by O(1).
You have a function with 4 "actions"; to calculate its big O we need to calculate the big O of each action and select the max:
1. sum = 0 - constant time, measured O(1)
2. for i in range(85) - constant time, 85 iterations, O(1 * complexity of #3)
3. sum += i * n - we can say constant time, but multiplication actually depends on the bit lengths of i and n, so we can either say O(1) or O(max(lenI, lenN))
4. return sum - constant time, measured O(1)
So the biggest big O is #2, which is 1 * O(#3); since lenI and lenN are constant (32 or 64 bits usually), max(lenI, lenN) -> 32/64, so the total complexity of your function is O(1 * 1) = O(1).
If we have big math, i.e. the bit length of n can be very, very large, then we can say O(bit length of n).
NOTE: the bit length of n is actually log2(n)
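As a quick illustration of that note, using Python's built-in int.bit_length (a small aside, assuming Python):

import math

n = 1_000_000
print(n.bit_length())                 # 20
print(math.floor(math.log2(n)) + 1)   # 20, i.e. the bit length of n is about log2(n)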
In theory, the complexity is O(log n). As n grows, reading the number and performing the multiplication takes longer.
However, in practice, the value of n is constrained (there's a maximum value) and thus it can be read and operations can be performed on it in O(1) time. Since we repeat an O(1) operation a fixed amount of times, the complexity is still O(1).
Note that O(1) means constant time - O(85) doesn't really mean anything different. If you perform multiple constant time operations in a sequence, the result is still O(1) unless the length of the sequence depends on the size of the input. Doing a O(1) operation 1000 times is still O(1), but doing it n times is O(n).
If you want to really play it safe, just say O(∞), that's definitely a correct answer. CS teachers tend to not really appreciate it in practice though.
When talking about complexity, it should always be stated which operations are considered constant-time (the initial agreement). Here the integer multiplication can be considered either constant or not. Anyway, the time complexity of the example is better than O(n). But it is the teacher's trick on the students -- kind of. :)
