Calculation of time and space complexity - python-3.x

I am working on this Leetcode problem - "Given a string containing digits from 2-9 inclusive, return all possible letter combinations that the number could represent. Return the answer in any order.
A mapping of digit to letters (just like on the telephone buttons) is given below.
Note that 1 does not map to any letters."
This is a recursive solution to the problem that I was able to understand, but I am not able to figure out the time and space complexity of the solution.
def letterCombinations(self, digits):
    if not len(digits):
        return []
    res = []
    my_dict = {
        '2':'abc',
        '3':'def',
        '4':'ghi',
        '5':'jkl',
        '6':'mno',
        '7':'pqrs',
        '8':'tuv',
        '9':'wxyz'
    }
    if len(digits) == 1:
        return list(my_dict[digits[0]])
    my_list = my_dict[digits[0]] #string - abc
    for i in range(len(my_list)): # i = 0,1,2
        for item in self.letterCombinations(digits[1:]):
            print(item)
            res.append(my_list[i] + item)
    return res
Any help or explanation regarding calculating time and space complexity for this solution would be helpful. Thank you.

With certain combinatorial problems, the time and space complexity can become dominated by the size of the output. Looking at the loops and function calls, the work being done in the function is one string concatenation and one append for each element of the output. There are also up to 4 repeated recursive calls to self.letterCombinations(digits[1:]), one per letter of the first digit: assuming these aren't cached, we need to add in the extra repeated work being done there.
We can write a formula for the number of operations needed to solve the problem when len(digits) == n. If T(n) is the number of steps, and A(n) is the length of the answer array, we get T(n) = 4*T(n-1) + n*A(n) + O(1). We get an extra multiplicative factor of n on A(n) because string concatenation is linear time; an implementation with lists and str.join() would avoid that.
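Since A(n) is upper-bounded by 4^n and T(1) is a constant, dividing the recurrence by 4^n gives T(n)/4^n = T(n-1)/4^(n-1) + n, which sums to O(n^2), so T(n) = O(n^2 * (4^n)) for the code as written. If the recursive call were made only once per level, the recurrence would become T(n) = T(n-1) + n*A(n) + O(1), which gives T(n) = O(n * (4^n)). The space complexity is O(n * (4^n)) in either case, given 4^n strings of length n.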
One possibly confusing part of complexity analysis is that it's usually a worst-case analysis unless specified otherwise. That's why we use 4 instead of 3 here: if any input could give 4^n results, we use that figure, even though many digit inputs would give closer to 3^n results.
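The repeated recursive calls mentioned above can be avoided by recursing once per level and reusing the result for every leading letter. The following is a minimal sketch, assuming a standalone function (the names letter_combinations and PHONE_LETTERS are hypothetical, not part of the original solution); with it the recurrence becomes T(n) = T(n-1) + n*A(n) + O(1).

```python
# Hypothetical module-level mapping, same contents as my_dict in the question.
PHONE_LETTERS = {
    '2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl',
    '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz',
}


def letter_combinations(digits):
    """Return all letter combinations for a string of digits 2-9."""
    if not digits:
        return []
    letters = PHONE_LETTERS[digits[0]]
    if len(digits) == 1:
        return list(letters)
    # Recurse once and reuse the suffixes for every leading letter,
    # instead of recomputing them inside the loop.
    suffixes = letter_combinations(digits[1:])
    return [letter + suffix for letter in letters for suffix in suffixes]
```

The output is the same as in the original code; only the number of recursive calls changes.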

Related

Performance question about string slicing in Python

I am learning some Python, and in the process I'm doing some simple katas from Codewars.
I ran into the https://www.codewars.com/kata/scramblies problem.
My solution went as follows:
def scramble(s1,s2):
    result = True
    for character in s2:
        if character not in s1:
            return False
        i = s1.index(character)
        s1 = s1[0:i] + s1[i+1:]
    return result
While it gave the correct result, it wasn't fast enough. My solution timed out after 12000 ms.
I looked at the solutions presented by others and one involved making a set.
def scramble(s1,s2):
    for letter in set(s2):
        if s1.count(letter) < s2.count(letter):
            return False
    return True
Why is my solution so much slower than the other one? It doesn't look like it should be, unless I am misunderstanding something about the efficiency of slicing strings. Is my approach to solving this problem flawed or not pythonic?
For this kind of online programming challenge with a limit on your program's running time, the test inputs will include some quite large examples, and the time limit is usually set so that you don't have to squeeze every last millisecond of performance out of your code, but you do have to write an algorithm of a low enough computational complexity. To answer why your algorithm times out, we can analyse it to find its computational complexity using big O notation.
First we can label each individual statement with its complexity, where n is the length of s1 and m is the length of s2:
def scramble(s1,s2):
    result = True                     # O(1)
    for character in s2:              # loop runs O(m) times
        if character not in s1:       # O(n) to search characters in s1
            return False              # O(1)
        i = s1.index(character)       # O(n) to search characters in s1
        s1 = s1[0:i] + s1[i+1:]       # O(n) to build a new string
    return result                     # O(1)
Then the total complexity is O(1 + m*(n + 1 + n + n) + 1) or more simply, O(m*n). This is not efficient for this problem.
The key to why the alternate algorithm is faster lies in the fact that set(s2) contains only the distinct characters from the string s2. This is important because the alphabet that these strings are formed from has a constant, limited size; presumably 26 for the lowercase letters. Given this, the outer loop of the alternate algorithm actually runs at most 26 times:
def scramble(s1,s2):
    for letter in set(s2):                       # O(m) to build a set
                                                 # loop runs O(1) times
        if s1.count(letter) < s2.count(letter):  # O(n) + O(m) to count
                                                 # chars from s1 and s2
            return False                         # O(1)
    return True                                  # O(1)
This means the alternate algorithm's complexity is O(m + 1*(n + m + 1) + 1) or more simply O(m + n), meaning it is asymptotically more efficient than your algorithm.
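The same O(m + n) bound can also be reached by counting each string once, without relying on the alphabet being small. This is a sketch of a swapped-in technique using collections.Counter from the standard library, not the solution quoted in the question:

```python
from collections import Counter


def scramble(s1, s2):
    """Return True if the letters of s1 can be rearranged to spell s2."""
    # Counter subtraction keeps only positive counts, so the difference is
    # empty exactly when s1 has at least as many of each letter as s2 needs.
    return not (Counter(s2) - Counter(s1))
```

Each string is scanned once to build its counter, and the subtraction touches each distinct letter once.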
First of all, set is fast and very good at its job. For things like in, set is faster than list.
Second of all, your solution is doing way more work than the other solution. Note how the second solution never modifies s1 or s2, whereas yours takes two slices of s1 and then reassigns s1 on every iteration, on top of calling .index(). Slicing isn't the fastest operation, mainly because memory has to be allocated and data has to be copied. If s1 were a list of characters, .remove() would probably be faster than the combination of .index() and slicing that you're doing.
The underlying message here is that if the task can be done in fewer operations, it's going to execute more quickly. Slicing is also more expensive than most other methods because allocating space and copying memory costs more than a read-only scan such as the .count() calls that the other solution uses.

time complexity for check if string has only unique chars

This is an algorithm to determine if a string has all unique characters. What is the time complexity?
def unique(s):
    d = []
    for c in s:
        if c not in d:
            d.append(c)
        else:
            return False
    return True
It looks like there is only one for loop here, so it should be O(n). However, this line
    if c not in d:
may also cost O(n) time. If so, is the time complexity of this algorithm O(n^2)?
Your intuition is correct, this algorithm is O(n^2). The documentation for list specifies that in is an O(n) operation. In the worst case scenario, when the target element is not present in the list, every element will need to be visited.
Using a set instead of a list would improve time complexity to O(n) because set lookups would be O(1).
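A minimal sketch of that change, keeping the loop and early return from the question and only swapping the list for a set (the variable name seen is an illustrative choice):

```python
def unique(s):
    """Return True if no character occurs more than once in s."""
    seen = set()
    for c in s:
        # Set membership is O(1) on average, so the whole loop is O(n).
        if c in seen:
            return False
        seen.add(c)
    return True
```

Unlike a one-shot conversion of the whole string, this version stops at the first repeated character.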
An easy way to take advantage of the O(n) cost of building a set to test if all characters in a string are unique is to simply convert the string to a set and see if its length is still the same:
def unique(s):
    return len(s) == len(set(s))

Shortest uncommon prefix from a set of strings

Given a string A and a set of strings S, I need an optimal method to find the shortest prefix of A which is not a prefix of any of the strings in S.
Example
A={apple}
S={april,apprehend,apprehension}
Output should be "appl" and not "app" since "app" is prefix of both "apple" and "apprehension" but "appl" is not.
I know the trie approach; by making a trie of set S and then traversing in the trie for string A.
But what I want to ask is can we do it without trie?
For example, can we compare every pair (A, Si), where Si is the ith string from set S, and take the longest common prefix among them? In this case that would be "app", so the required answer would be "appl".
This would take 2 loops (one for iterating through S and another for comparing Si and A).
Can we improve upon this?
Please suggest an optimal approach.
I'm not sure exactly what you had in mind, but here's one way to do it:
Keep a variable longest, initialised to 0.
Loop over all elements S[i] of S,
setting longest = max(longest, matchingPrefixLength(S[i], A)).
Return the prefix from A of length longest+1.
This uses O(1) space and takes O(length(S)*average length of S[i]) time.
This is optimal (at least for the worst case) since you can't get around needing to look at every character of every element in S.
Example:
A={apple}
S={april,apprehend,apprehension}
longest = 0
The longest prefix for S[0] and A is 2
So longest = max(0,2) = 2
The longest prefix for S[1] and A is 3
So longest = max(2,3) = 3
The longest prefix for S[2] and A is 3
So longest = max(3,3) = 3
Now we return the prefix of length longest+1 = 4, i.e. "appl"
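The steps above can be sketched in Python as follows. The function names are illustrative, and returning None when all of A is a prefix of some string in S is an assumption, since the answer does not cover that case:

```python
def matching_prefix_length(s, a):
    """Length of the longest common prefix of s and a."""
    n = 0
    for x, y in zip(s, a):
        if x != y:
            break
        n += 1
    return n


def shortest_uncommon_prefix(a, strings):
    """Shortest prefix of a that is not a prefix of any string in strings."""
    longest = 0
    for s in strings:
        longest = max(longest, matching_prefix_length(s, a))
    if longest >= len(a):
        # Assumption: every prefix of a is shared, so there is no answer.
        return None
    return a[:longest + 1]
```

Calling shortest_uncommon_prefix('apple', ['april', 'apprehend', 'apprehension']) walks through the same values of longest as the example.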
Note that there are actually 2 trie-based approaches:
Store only A in the trie. Iterate through the trie for each element from S to eliminate prefixes.
This uses much less memory than the second approach (but still more than the approach above), at least assuming A isn't much, much longer than S[i]; you can optimise by stopping at the longest element in S, or by constructing the trie as you go, to avoid this case.
Store all elements from S in the trie. Iterate through the trie with A to find the shortest non-matching prefix.
This approach is significantly faster if you have lots of A's that you want to query for a constant set S, since you only have to set up the trie once and do a single lookup for each A, whereas with the first approach you have to create a new trie and run through each S[i] for each A.
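A rough sketch of the second trie-based approach, using nested dictionaries as trie nodes. The representation and the names build_trie and shortest_uncommon_prefix are assumptions made for illustration:

```python
def build_trie(strings):
    """Build a trie of nested dicts, one level per character."""
    root = {}
    for s in strings:
        node = root
        for ch in s:
            node = node.setdefault(ch, {})
    return root


def shortest_uncommon_prefix(trie, a):
    """Walk a through the trie and return the first prefix that falls off it."""
    node = trie
    for i, ch in enumerate(a):
        if ch not in node:
            return a[:i + 1]
        node = node[ch]
    # Assumption: None signals that all of a is a prefix of some stored string.
    return None
```

The trie is built once from S, after which each query costs at most len(A) dictionary lookups.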
What is your input size?
Let's model your input as N+1 strings whose lengths are about M characters each. Your total input size is about M(N+1) characters, plus some proportional amount of apparatus to encode that data in a usable format (data structure overhead).
Your algorithm ...
maxlen = 0
for i = 1 to N
    for j = 1 to M
        if A[j] != S[i][j] then break
        if j > maxlen then maxlen = j
print A[1...maxlen+1]
... performs up to M x N iterations of the innermost loop, reading two characters each time, for a total of 2MN characters read.
Recall our input data size was about M(N+1) also. So our question now is whether we can solve this problem, in the worst case, looking at asymptotically less than the total input (you do a little less than looking at all the input twice, or linear in the input size). The answer is no. Consider this worst case:
length of A is M'
length of all strings in S is M'
A differs from N-1 strings in S by the last two characters
A differs from 1 string in S by only the last character
Any algorithm must look at M'-1 characters of N-1 strings, plus M' characters of 1 string, to correctly determine that the answer to this problem instance is A.
(M'-1)(N-1) + M' = M'N - M' - N + 1 + M' = M'N - N + 1
For N >= 2, the dominant term in both M'(N+1) and M'N - N + 1 is M'N, meaning that both the input size and the amount of that input any correct algorithm must read are O(MN). Your algorithm is O(MN), so no other algorithm can be asymptotically better.

Minimum Character that needed to be deleted

Original Problem:
A word was K-good if for every two letters in the word, if the first appears x times and the second appears y times, then |x - y| ≤ K.
Given some word w, how many letters does he have to remove to make it K-good?
Problem Link.
I have solved the above problem, and I am not asking for a solution to it.
I misread the statement the first time and wondered how to solve that version in linear time, which gave rise to a new problem.
Modification Problem
A word was K-good if for every two consecutive letters in the word, if the first appears x times and the second appears y times, then |x - y| ≤ K.
Given some word w, how many letters does he have to remove to make it K-good?
Is this problem solvable in linear time? I thought about it but could not find any valid solution.
Solution
My approach: I could not approach my crush, but here is my approach to this problem: try everything (from the movie Zootopia),
i.e.
for i in range(0, 1 << n): // n = length of string
    for j in range(0, n):
        if (i & (1 << j)) is not zero: delete the character at index j
    Now check if the string is K-good
For N in the range of 10^5, time complexity: time does not exist in that dimension.
Is there any linear solution to this problem, simple and sweet like the people of Stack Overflow?
For example:
String S = AABCBBCB and K=1
If we delete the 'B' at index 5, then S = AABCBCB, which is a good string:
F[A]-F[A]=0
F[B]-F[A]=1
F[C]-F[B]=1
and so on
I guess this is a simple example; there can be more complex examples, since deleting the element at index I makes the elements at (I-1) and (I+1) consecutive.
Is there any linear solution to this problem?
Consider the word DDDAAABBDC. This word is 3-good, because D and C are consecutive and card(D)-card(C)=3, and removing the last D makes it 1-good by making D and C non-consecutive.
Conversely, consider DABABABBDC, which is 2-good: removing the last D makes C and B consecutive and increases the K-value of the word to 3.
This means that in the modified problem, the K-value of a word is determined by both the cardinals of each letter and the cardinals of each pair of consecutive letters.
By removing a letter, I reduce the cardinal of that letter as well as the cardinals of the pairs to which it belongs, but I also increase the cardinal of another pair (potentially creating a new one).
It is also important to notice that in the original problem all occurrences of a letter are equivalent (I can remove any of them indifferently), while this is no longer the case in the modified problem.
In conclusion, I think we can safely assume that the "consecutive letters" constraint makes the problem not solvable in linear time for any alphabet/word.
Instead of finding the linear time solution, which I think doesn't exist (among other reasons because there seems to be a multitude of alternative solutions to each K request), I'd like to present the totally geeky solution.
Namely, take the parallel array processing language Dyalog APL and create these two tiny dynamic functions:
good←{1≥⍴⍵:¯1 ⋄ b←(⌈/a←(∪⍵)⍳⍵)⍴0 ⋄ b[a]+←1 ⋄ ⌈/|2-/b[a]}
make←{⍵,(good ⍵),a,⍺,(l-⍴a←⊃b),⍴b←(⍺=good¨b/¨⊂⍵)⌿(b←↓⍉~(l⍴2)⊤0,⍳2⊥(l←⍴⍵)⍴1)/¨⊂⍵}
good tells us the K-goodness of a string. A few examples below:
// fn" means the fn executes on each of the right args
good" 'AABCBBCB' 'DDDAAABBDC' 'DDDAAABBC' 'DABABABBDC' 'DABABABBC' 'STACKOVERFLOW'
2 3 1 2 3 1
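For readers who do not use APL, the measure reported by good can be sketched in Python. This is an illustrative reading of the modified problem (the largest count difference over all pairs of consecutive letters), not a translation of the APL code; the name k_value is hypothetical, and returning -1 for words shorter than two letters mirrors the ¯1 case in good.

```python
from collections import Counter


def k_value(word):
    """Largest |count(a) - count(b)| over consecutive letters a, b in word."""
    counts = Counter(word)
    # zip(word, word[1:]) yields every pair of adjacent letters; for words
    # with fewer than two letters there are no pairs and -1 is returned.
    return max(
        (abs(counts[a] - counts[b]) for a, b in zip(word, word[1:])),
        default=-1,
    )
```

Applied to the six example strings, this sketch gives the same values as the output line above.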
make takes as arguments
[desired K] make [any string]
and returns
- original string
- K for original string
- reduced string for desired K
- how many characters were removed to achieve desired K
- how many possible solutions there are to achieve desired K
For example:
3 make 'DABABABBDC'
┌──────────┬─┬─────────┬─┬─┬──┐
│DABABABBDC│2│DABABABBC│3│1│46│
└──────────┴─┴─────────┴─┴─┴──┘
A little longer string:
1 make 'ABCACDAAFABBC'
┌─────────────┬─┬────────┬─┬─┬────┐
│ABCACDAAFABBC│4│ABCACDFB│1│5│3031│
└─────────────┴─┴────────┴─┴─┴────┘
It is possible to both increase and decrease the K-goodness.
Unfortunately, this is brute force. We generate the base-2 representation of every integer between 2^[length of string] and 1, for example:
0 1 0 1 1
Then we test the goodness of the substring, for example of:
0 1 0 1 1 / 'STACK' // Substring is now 'TCK'
We pick only those results (substrings) that match the desired K-good. Finally, out of the multitude of possible results, we pick the first one, which is the one with most characters left.
At least this was fun to code :-).
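The same brute-force idea can be sketched in Python. Two assumptions differ from make: this version accepts any result whose K-value is at most K (as in the problem statement) rather than exactly K, and it enumerates kept positions with itertools.combinations instead of bit masks. The names k_value and min_removals are hypothetical.

```python
from collections import Counter
from itertools import combinations


def k_value(word):
    """Largest count difference over consecutive letters (-1 if no pairs)."""
    counts = Counter(word)
    return max(
        (abs(counts[a] - counts[b]) for a, b in zip(word, word[1:])),
        default=-1,
    )


def min_removals(word, k):
    """Fewest deletions that make word K-good; exponential time."""
    n = len(word)
    # Try removing 0 letters, then 1, then 2, ... and stop at the first hit.
    for removed in range(n + 1):
        for keep in combinations(range(n), n - removed):
            candidate = ''.join(word[i] for i in keep)
            if k_value(candidate) <= k:
                return removed
    return n
```

Like the APL version, this is only practical for short strings, since the number of candidates grows as 2^n.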

How can I write the following script in Python?

The program that I want to write adds two strings S1 and S2 that are made of digits.
Example: S1='129782004977', S2='754022234930', SUM='883804239907'
So far I've done this, but it still has a problem because it does not give me the whole SUM.
def addS1S2(S1,S2):
    N=abs(len(S2)-len(S1))
    if len(S1)<len(S2):
        S1=N*'0'+S1
    if len(S2)<len(S1):
        S2=N*'0'+S2
    #the first part was to make the two strings with the same len.
    S=''
    r=0
    for i in range(len(S1)-1,-1,-1):
        s=int(S1[i])+int(S2[i])+r
        if s>9:
            r=1
            S=str(10-s)+S
        if s<9:
            r=0
            S=str(s)+S
        print(S)
        if r==1:
            S=str(r)+S
    return S
This appears to be homework, so I will not give full code but just a few pointers.
There are three problems with your algorithm. If you fix those, then it should work.
10-s will give you negative numbers, hence all those - signs in the sum. Change it to s-10.
You are missing all the 9s. Change if s<9: to if s<=9:, or even better, just else:.
You should not add r to the string in every iteration, but just at the very end, after the loop.
Also, instead of using those convoluted if statements to check r and subtract 10 from s, you can just use integer division and modulo: r = s // 10 and s = s % 10, or just r, s = divmod(s, 10).
If this is not homework: Just use int(S1) + int(S2).
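For reference, one way the pointers above could fit together is sketched below. It is not the asker's code: the function name add_digit_strings is hypothetical, padding uses str.zfill, and the digits are collected in a list and joined at the end.

```python
def add_digit_strings(s1, s2):
    """Add two non-negative integers given as strings of decimal digits."""
    # Pad the shorter string with leading zeros so both have the same length.
    width = max(len(s1), len(s2))
    s1, s2 = s1.zfill(width), s2.zfill(width)

    digits = []
    carry = 0
    # Walk from the least significant digit to the most significant one.
    for a, b in zip(reversed(s1), reversed(s2)):
        carry, digit = divmod(int(a) + int(b) + carry, 10)
        digits.append(str(digit))
    # The final carry is added once, after the loop.
    if carry:
        digits.append(str(carry))
    return ''.join(reversed(digits))
```

With the strings from the question, the result can be checked against int(S1) + int(S2).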

Resources