Space complexity of storing a string: O(1) or O(N)?

I am a bit confused on space complexity. Is this O(1) space complexity or O(N) complexity?
Since I am creating a string of size n, my guess is the space complexity is O(N) is that correct?
# this function takes in a string and returns a copy of it
def test(stringval):
    stringval2 = ""
    for x in stringval:
        stringval2 = stringval2 + x
    return stringval2

test("hello")

Yes, that's correct. The space complexity of storing a new string of length n is Θ(n), because each individual character must be stored somewhere. In principle you could reduce the space usage by noticing that stringval2 ends up being a copy of stringval and applying copy-on-write or a similar optimization, but there's no reason to expect that to happen here.
Hope this helps!

Related

OCaml: What is the most efficient way to calculate hash values for all substrings in a string?

What is the most efficient way to obtain hash values for all substrings in a string? I tried to use:
let str1 = "AHTG...";;  (* 1000000 chars *)
let tam = 2;;
for i = 0 to String.length str1 - tam do
  let st = String.sub str1 i tam in
  Hashtbl.add hash_table (Hashtbl.hash st) i
done;;
to calculate all substrings of size 2 (AH, HT, TG, ...) of a string of size 1000000 and add the values to hash_table, but it takes a long time to finish. Is there a more efficient, faster approach than the one presented above?
First of all, there are a lot of substrings of a string, around n^2/2 of them I would say. This is a big number when n = 1e6. If your hash function is a black box with no known arithmetic properties, and your string also has no known extra properties, you basically have to do O(n^2) calls to your hash function, which will take a long time.
If your hash function has interesting arithmetic properties, like say hash(a ^ b) = hash(a) + hash(b) mod K, you might be able to do a little better. On the other hand, properties like this probably make a weaker hash.
As an immediate improvement, you might consider a hash function that works directly on a substring. That will save you a lot of calls to String.sub and the associated consing and GC. (Probably this won't help a lot as OCaml has a really good GC for short-lived values.)
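For the common special case of fixed-length substrings, a polynomial rolling hash (Rabin-Karp) hashes all of them in O(n) total time, updating each hash from the previous one instead of rehashing from scratch. Here is a sketch in Python rather than OCaml, with an arbitrarily chosen base and modulus:

```python
def rolling_hashes(s, k, base=257, mod=(1 << 61) - 1):
    """Hash every length-k substring of s in O(n) total time
    using a polynomial rolling hash (Rabin-Karp)."""
    n = len(s)
    if n < k:
        return []
    # highest power of the base, used to remove the outgoing character
    pw = pow(base, k - 1, mod)
    h = 0
    for c in s[:k]:
        h = (h * base + ord(c)) % mod
    hashes = [h]
    for i in range(k, n):
        # drop s[i-k], shift the window, append s[i]: O(1) per step
        h = ((h - ord(s[i - k]) * pw) * base + ord(s[i])) % mod
        hashes.append(h)
    return hashes
```

Each window's hash drops the outgoing character and appends the incoming one in constant time, so all n - k + 1 hashes cost O(n) overall; the same recurrence translates directly to OCaml.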

Performance question about string slicing in Python

I am learning some python and in the process of it I'm doing some simple katas from codewars.
I run into https://www.codewars.com/kata/scramblies problem.
My solution went as follows:
def scramble(s1,s2):
    result = True
    for character in s2:
        if character not in s1:
            return False
        i = s1.index(character)
        s1 = s1[0:i] + s1[i+1:]
    return result
While it produced the correct result, it wasn't fast enough: my solution timed out after 12000 ms.
I looked at the solutions presented by others and one involved making a set.
def scramble(s1,s2):
    for letter in set(s2):
        if s1.count(letter) < s2.count(letter):
            return False
    return True
Why is my solution so much slower than the other one? It doesn't look like it should be, unless I am misunderstanding the efficiency of slicing strings. Is my approach to solving this problem flawed, or just not pythonic?
For this kind of online programming challenge with a limit on your program's running time, the test inputs will include some quite large examples, and the time limit is usually set so that you don't have to squeeze every last millisecond of performance out of your code, but you do have to write an algorithm of a low enough computational complexity. To answer why your algorithm times out, we can analyse it to find its computational complexity using big O notation.
First we can label each individual statement with its complexity, where n is the length of s1 and m is the length of s2:
def scramble(s1,s2):
    result = True                # O(1)
    for character in s2:         # loop runs O(m) times
        if character not in s1:  # O(n) to search characters in s1
            return False         # O(1)
        i = s1.index(character)  # O(n) to search characters in s1
        s1 = s1[0:i] + s1[i+1:]  # O(n) to build a new string
    return result                # O(1)
Then the total complexity is O(1 + m*(n + 1 + n + n) + 1) or more simply, O(m*n). This is not efficient for this problem.
The key to why the alternate algorithm is faster lies in the fact that set(s2) contains only the distinct characters from the string s2. This is important because the alphabet that these strings are formed from has a constant, limited size; presumably 26 for the lowercase letters. Given this, the outer loop of the alternate algorithm actually runs at most 26 times:
def scramble(s1,s2):
    for letter in set(s2):       # O(m) to build a set;
                                 # loop runs O(1) times
        if s1.count(letter) < s2.count(letter):  # O(n) + O(m) to count
                                                 # chars from s1 and s2
            return False         # O(1)
    return True                  # O(1)
This means the alternate algorithm's complexity is O(m + 1*(n + m + 1) + 1) or more simply O(m + n), meaning it is asymptotically more efficient than your algorithm.
First of all, set is fast and very good at its job: for membership tests with in, a set is much faster than a list.
Second of all, your solution does far more work than the faster solution. Note how the second solution never modifies s1 or s2, whereas yours takes two slices of s1 and reassigns s1 on every iteration, on top of calling .index(). Slicing isn't a fast operation, mainly because memory has to be allocated and data has to be copied. (Strings have no .remove() method, but s1.replace(character, '', 1) would probably be faster than the .index()-plus-slicing combination.)
The underlying message is that if the task can be done in fewer operations, it will obviously execute more quickly. Slicing is also more expensive than most other methods, because allocating space and copying memory costs more than computational methods like the .count() calls the faster solution uses.
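For comparison, the counting idea can also be written with collections.Counter, which tallies each string in a single pass. This variant (hypothetical name scramble_counts) is not one of the original solutions, just a sketch of the same O(n + m) approach:

```python
from collections import Counter

def scramble_counts(s1, s2):
    # O(n + m): count every character of each string once,
    # then compare the counts for each distinct letter of s2
    counts1 = Counter(s1)
    counts2 = Counter(s2)
    return all(counts1[letter] >= needed
               for letter, needed in counts2.items())
```

For example, scramble_counts("rkqodlw", "world") is True, since every letter of "world" appears often enough in "rkqodlw", while scramble_counts("katas", "steak") is False because "katas" has no 'e'.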

time complexity for check if string has only unique chars

This is an algorithm to determine if a string has all unique characters. What is the time complexity?
def unique(s):
    d = []
    for c in s:
        if c not in d:
            d.append(c)
        else:
            return False
    return True
It looks like there is only one for loop here, so it should be O(n). However, this line:
if c not in d:
does this also cost O(n) time? If so, is the time complexity of this algorithm O(n^2)?
Your intuition is correct: this algorithm is O(n^2). The documentation for list specifies that in is an O(n) operation. In the worst case, when the target element is not present in the list, every element must be visited.
Using a set instead of a list would improve the time complexity to O(n), because set lookups are O(1) on average.
An easy way to take advantage of this when testing whether all characters in a string are unique is to convert the string to a set (an O(n) operation that discards duplicates) and check whether its length is unchanged:
def unique(s):
    return len(s) == len(set(s))
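The len(set(s)) version always scans the whole string. If you also want the early exit that the original loop had, a set-based sketch (hypothetical name unique_chars) keeps the O(n) worst case while stopping at the first duplicate:

```python
def unique_chars(s):
    seen = set()
    for c in s:
        if c in seen:   # O(1) average-case set lookup
            return False
        seen.add(c)
    return True
```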

Python string Concatenation complexity

For the following code
str = 'abc'
str += 'd'
When the string is this small, I believe it's safe to say both time and space complexity is O(1).
But what if the string grows super long?
Should the time and space complexity be O(N) where N is the size of the string?
Thanks!
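Yes: storing or copying a string of length N takes O(N) time and space, which is why repeated += inside a loop is O(N^2) in the worst case (each step may copy everything built so far), while "".join builds the result in O(N). A sketch contrasting the two (CPython sometimes optimizes the += case in place, so treat the quadratic bound as a worst case):

```python
def build_concat(parts):
    # worst case O(N^2): each += may reallocate and copy the prefix
    s = ""
    for p in parts:
        s += p
    return s

def build_join(parts):
    # O(N): total length is computed once, characters copied once
    return "".join(parts)

parts = ["abc", "d"] * 1000
assert build_concat(parts) == build_join(parts)
```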

Which is faster - Strings are Equal or Replacing Strings

I think this may be overkill, but I am just curious. Generally speaking (if there is a general answer), which is faster assuming the strings are equal to each other 50% of the time:
void UpdateString1(string str1, string str2)
{
    str1 = str2;
}

void UpdateString2(string str1, string str2)
{
    if (str1 != str2)
    {
        str1 = str2;
    }
}
Assuming in your hypothetical language != means "compare" and = means "copy"...
I'm going to say that UpdateString1 is always at least as fast.
Suppose the strings aren't equal. Then UpdateString2 performs the comparison as well as the assignment. So it does additional work.
Suppose the strings are equal. Then the comparison involves iterating through every single character in both strings and comparing them. So that's O(n). Similarly, copying would involve, at worst, visiting every character in one string and copying it to the second. Also O(n). So the same complexity. Also the same number of memory accesses.
However you've also got the partial comparison costs of the strings that aren't equal. Which I think tips it in favour of the copy.
Supposing != and = were just comparing or updating references by identity, not by value...
All operations are O(1) and similar in cost. The plain = is one operation 100% of the time; the != followed by = is an expected 1.5 operations if the strings are equal 50% of the time.
If you really want str1 to end up equal to str2, just use str1 = str2;.
It is shorter in the code and easier to follow.
Branches add more complexity than assignments. If assignment is considered 1 operation unit, the branch is probably 1.5 operation units on its own on average, and even more if your data passed in is random. See: Why is it faster to process a sorted array than an unsorted array?
It is overkill to optimize for this.
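A rough Python analogue of the two functions (hypothetical names; Python's = rebinds a reference, so this matches the "by identity" O(1) case above, while != on equal strings still compares every character):

```python
import timeit

s1 = "x" * 1_000_000
s2 = "x" * 1_000_000   # equal contents, distinct objects

def update_always(a, b):
    # unconditional rebinding: O(1), no characters touched
    return b

def update_checked(a, b):
    # compares up to n characters before (maybe) rebinding
    return b if a != b else a

# on long equal strings the comparison dominates the cost
t_always = timeit.timeit(lambda: update_always(s1, s2), number=100)
t_checked = timeit.timeit(lambda: update_checked(s1, s2), number=100)
```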
