Dealing with problems where memory isn't sufficient. Dynamic programming - python-3.x

I was solving a Problem using python, here i was storing a repetitive string "abc" in a string with everytime each character getting double like "abcaabbccaaaabbbbcccc.......... , and i had to find the nth character. The constraints were n<=10^9 , Now when i tried to store this their was memory error as the string was to too large (i tried to store all the charaters till the charater 2^30 times repeated). CAn somebody help me with the approach to tackle this situation.
t=' '
for i in range(0 , 30):
t = t +'a'*(2**i)
t = t +'b'*(2**i)
t = t +'c'*(2**i)

Obviously, you can't do this the straightforward, brute-force way. Instead, you need to count along a virtual string to find where your given index appears. I'll lay this out in too much detail so you can see the logic:
n = 314159265 # Pick a large value for illustration
rem = n
for i in range(0 , 30):
size = 2**i
# print(size, rem)
rem -= size
if rem <= 0:
char = 'a'
break
rem -= size
if rem <= 0:
char = 'b'
break
rem -= size
if rem <= 0:
char = 'c'
break
print("Character", n, "is", char)
Output:
Character 314159265 is b
You can shorten this with a better loop body; I'll leave that as a further exercise. If you get insightful with your arithmetic, you can simply compute the appropriate letter from the chunk sizes you generate.

Related

Convert S to T by performing K operations (HackerRank)

I was solving a problem on HackerRank. It required me to see if it is possible to convert string s to string t by performing k operations.
https://www.hackerrank.com/challenges/append-and-delete/problem
The operations we can perform are: appending a lowercase letter to the end of s or removing a lowercase letter from the end of s. For example Ash Ashley 2 would return No since we need 3 operations, not 2.
I tried solving the problem as follows:
def appendAndDelete(s, t, k):
if len(s) > len(t):
maxs = [s,t]
else:
maxs = [t,s]
maximum = maxs[0]
minimum = maxs[1]
k -= len(maximum) - len(minimum)
substr = maximum[len(minimum): len(maximum)]
maximum = maximum.replace(substr, '')
i = 0
while i < len(maximum):
if maximum[i] != minimum[i]:
k -= (len(maximum)-i)*2
break
i += 1
if k < 0:
return 'No'
else:
return 'Yes'
However, it fails at this weird test case. y yu 2. The expected answer is No but according to my code, it would return Yes since only one operation was required. Is there something I do not understand?
Since you don't explain your idea, it's difficult for us to understand
what you mean in your code and debug it to tell you where you went wrong.
However, I would like to share my idea(I solved this on the website too)-
len1 => Length of first string s.
len2 => Length of second/target string t.
Exactly K makes it a bit tricky. So, if len1 + len2 <= k, you can blindly assume it can be accomplished and return true since we can delete empty string many times to get an empty string(as it says) and we can delete characters of one string entirely and keep appending new letters to get the another.
When we start matching s with t from left to right, this looks more like longest common prefix but this is NOT the case. Let's take an example -
aaaaaaaaa (source)
aaaa (target)
7 (k)
Here, up till aaaa it's common and looks like there are additional 5 a's in the source. So, we can delete those 5 a's and get the target but 5 != 7, hence it appears to be a No. But this ain't the case since we can delete an a from the source just like that and append it again(2 operations) just to satisfy k. So, it need not be longest common prefix all the time, however it gets us closer to the solution.
So, let's match both strings from left to right and stop when there is a mismatch. Let's assume we got this index in a variable called first_unmatched. Initialize first_unmatched = min(len(s),len(t)) at the beginning of your method itself.
Let
rem1 = len1 - first_unmatched
rem2 = len2 - first_unmatched
where rem1 is remaining substring of s and rem2 is the remaining substring of t.
Now, comes the conditions.
if(rem1 + rem2 == k) return true-
This is because rem1 characters to delete and rem2 characters to add. If both sum up to k then it's possible.
if(rem1 + rem2 > k) return false-
This is because rem1 characters to delete and rem2 characters to add. If both sum greater than k then it's not possible.
if(rem1 + rem2 < k) return (k - (rem1 + rem2)) % 2 == 0-
This is because rem1 characters to delete and rem2 characters to add. If both sum less than k, then it depends.
Here, (k - (rem1 + rem2)) will give you the extra in k. This extra can or cannot depends upon whether it's divisible by 2 or not. Here, we do %2 because we have 2 operations in our question - delete and append. If the extra k falls short of any operation, then the answer is No, else it's a Yes.
You can cross check this with above example.

Returning most used Letters while Looping - Python

How can I return the letter that is used the most in the for loop ?
My Code:
import string
def intefer_shift(encrypted_textfile, language):
index = 0
file_connector = open(encrypted_textfile,'r')
data = file_connector.read()
box =[]
data = data.lower()
file_connector.close()
times = 0
#this for loop is designed to count the number of times items appear in file
# this for loop is designed print each letter in the alphabet and tells how many times they appear
for letter in string.ascii_lowercase:
num = data.count(letter)
print(letter, ':', num)
intefer_shift('homework1.txt','English')
Simply add some kind of counter to store max_value, updated (if necessary) at every iteration of the loop.
top_letter = None
max_count = 0
for letter in string.ascii_lowercase:
num = data.count(letter)
print(letter, ':', num)
if num > max_count:
max_count = num
top_letter = letter
return top_letter
Mind the fact, that counting letters in probably big data-string one after another might be not efficient, and probably better option would be to loop all of the letters in data once and increment counter for each of them (this solution has worse space complexity, but better time complexity).

I need help converting characters in a string to numerical values in Matlab [duplicate]

I'm looking for a quick way to convert a large character array of lowercase letters, spaces and periods into a set of integers and vice-versa in MATLAB.
Usually I would use the double and char functions, but I would like to use a special set of integers to represent each letter (so that 'a' matches with '1', 'b' matches with '2'.... 'z' matches with 26, ' ' matches with 27, and '.' matches with 28)
The current method that I have is:
text = 'quick brown fox jumps over dirty dog';
alphabet ='abcdefghijklmnopqrstuvwxyz .';
converted_text = double(text);
converted_alphabet = double(alphabet);
numbers = nan(28,1)
for i = 1:28
numbers(converted_text(i)==converted_alphabet(i)) = i;
end
newtext = nan(size(numbers))
for i = 1:size(numbers,1)
newtext(numbers==i) = alphabet(i)
end
Unfortunately this takes quite a bit of time for large arrays, and I'm wondering if there is a quicker way to do this in MATLAB?
An easy way is to use ismember():
[~,pos] = ismember(text,alphabet)
Or use the implicit conversion carried out by -:
out = text - 'a' + 1;
note that blanks will have -64 and full stops -50, which means that you will need:
out(out == -64) = 27;
out(out == -50) = 28;
Speed considerations:
For small sized arrays the latter solution is slightly faster IF you are happy to leave blanks and full stops with their negative index.
For big arrays, on my machine 1e4 times longer, the latter solution is twice faster than ismember().
Going back:
alphabet(out)

Efficiently counting the number of substrings of a digit string that are divisible by k?

We are given a string which consists of digits 0-9. We have to count number of sub-strings divisible by a number k. One way is to generate all the sub-strings and check if it is divisible by k but this will take O(n^2) time. I want to solve this problem in O(n*k) time.
1 <= n <= 100000 and 2 <= k <= 1000.
I saw a similar question here. But k was fixed as 4 in that question. So, I used the property of divisibility by 4 to solve the problem.
Here is my solution to that problem:
int main()
{
string s;
vector<int> v[5];
int i;
int x;
long long int cnt = 0;
cin>>s;
x = 0;
for(i = 0; i < s.size(); i++) {
if((s[i]-'0') % 4 == 0) {
cnt++;
}
}
for(i = 1; i < s.size(); i++) {
int f = s[i-1]-'0';
int s1 = s[i] - '0';
if((10*f+s1)%4 == 0) {
cnt = cnt + (long long)(i);
}
}
cout<<cnt;
}
But I wanted a general algorithm for any value of k.
This is a really interesting problem. Rather than jumping into the final overall algorithm, I thought I'd start with a reasonable algorithm that doesn't quite cut it, then make a series of modifications to it to end up with the final, O(nk)-time algorithm.
This approach combines together a number of different techniques. The major technique is the idea of computing a rolling remainder over the digits. For example, let's suppose we want to find all prefixes of the string that are multiples of k. We could do this by listing off all the prefixes and checking whether each one is a multiple of k, but that would take time at least Θ(n2) since there are Θ(n2) different prefixes. However, we can do this in time Θ(n) by being a bit more clever. Suppose we know that we've read the first h characters of the string and we know the remainder of the number formed that way. We can use this to say something about the remainder of the first h+1 characters of the string as well, since by appending that digit we're taking the existing number, multiplying it by ten, and then adding in the next digit. This means that if we had a remainder of r, then our new remainder is (10r + d) mod k, where d is the digit that we uncovered.
Here's quick pseudocode to count up the number of prefixes of a string that are multiples of k. It runs in time Θ(n):
remainder = 0
numMultiples = 0
for i = 1 to n: // n is the length of the string
remainder = (10 * remainder + str[i]) % k
if remainder == 0
numMultiples++
return numMultiples
We're going to use this initial approach as a building block for the overall algorithm.
So right now we have an algorithm that can find the number of prefixes of our string that are multiples of k. How might we convert this into an algorithm that finds the number of substrings that are multiples of k? Let's start with an approach that doesn't quite work. What if we count all the prefixes of the original string that are multiples of k, then drop off the first character of the string and count the prefixes of what's left, then drop off the second character and count the prefixes of what's left, etc? This will eventually find every substring, since each substring of the original string is a prefix of some suffix of the string.
Here's some rough pseudocode:
numMultiples = 0
for i = 1 to n:
remainder = 0
for j = i to n:
remainder = (10 * remainder + str[j]) % k
if remainder == 0
numMultiples++
return numMultiples
For example, running this approach on the string 14917 looking for multiples of 7 will turn up these strings:
String 14917: Finds 14, 1491, 14917
String 4917: Finds 49,
String 917: Finds 91, 917
String 17: Finds nothing
String 7: Finds 7
The good news about this approach is that it will find all the substrings that work. The bad news is that it runs in time Θ(n2).
But let's take a look at the strings we're seeing in this example. Look, for example, at the substrings found by searching for prefixes of the entire string. We found three of them: 14, 1491, and 14917. Now, look at the "differences" between those strings:
The difference between 14 and 14917 is 917.
The difference between 14 and 1491 is 91
The difference between 1491 and 14917 is 7.
Notice that the difference of each of these strings is itself a substring of 14917 that's a multiple of 7, and indeed if you look at the other strings that we've matched later on in the run of the algorithm we'll find these other strings as well.
This isn't a coincidence. If you have two numbers with a common prefix that are multiples of the same number k, then the "difference" between them will also be a multiple of k. (It's a good exercise to check the math on this.)
So this suggests another route we can take. Suppose that we find all prefixes of the original string that are multiples of k. If we can find all of them, we can then figure out how many pairwise differences there are among those prefixes and potentially avoid rescanning things multiple times. This won't find everything, necessarily, but it will find all substrings that can be formed by computing the difference of two prefixes. Repeating this over all suffixes - and being careful not to double-count things - could really speed things up.
First, let's imagine that we find r different prefixes of the string that are multiples of k. How many total substrings did we just find if we include differences? Well, we've found k strings, plus one extra string for each (unordered) pair of elements, which works out to k + k(k-1)/2 = k(k+1)/2 total substrings discovered. We still need to make sure we don't double-count things, though.
To see whether we're double-counting something, we can use the following technique. As we compute the rolling remainders along the string, we'll store the remainders we find after each entry. If in the course of computing a rolling remainder we rediscover a remainder we've already computed at some point, we know that the work we're doing is redundant; some previous scan over the string will have already computed this remainder and anything we've discovered from this point forward will have already been found.
Putting these ideas together gives us this pseudocode:
numMultiples = 0
seenRemainders = array of n sets, all initially empty
for i = 1 to n:
remainder = 0
prefixesFound = 0
for j = i to n:
remainder = (10 * remainder + str[j]) % k
if seenRemainders[j] contains remainder:
break
add remainder to seenRemainders[j]
if remainder == 0
prefixesFound++
numMultiples += prefixesFound * (prefixesFound + 1) / 2
return numMultiples
So how efficient is this? At first glance, this looks like it runs in time O(n2) because of the outer loops, but that's not a tight bound. Notice that each element can only be passed over in the inner loop at most k times, since after that there aren't any remainders that are still free. Therefore, since each element is visited at most O(k) times and there are n total elements, the runtime is O(nk), which meets your runtime requirements.

MATLAB: Quickest Way To Convert Characters to A Custom Set Numbers and Back

I'm looking for a quick way to convert a large character array of lowercase letters, spaces and periods into a set of integers and vice-versa in MATLAB.
Usually I would use the double and char functions, but I would like to use a special set of integers to represent each letter (so that 'a' matches with '1', 'b' matches with '2'.... 'z' matches with 26, ' ' matches with 27, and '.' matches with 28)
The current method that I have is:
text = 'quick brown fox jumps over dirty dog';
alphabet ='abcdefghijklmnopqrstuvwxyz .';
converted_text = double(text);
converted_alphabet = double(alphabet);
numbers = nan(28,1)
for i = 1:28
numbers(converted_text(i)==converted_alphabet(i)) = i;
end
newtext = nan(size(numbers))
for i = 1:size(numbers,1)
newtext(numbers==i) = alphabet(i)
end
Unfortunately this takes quite a bit of time for large arrays, and I'm wondering if there is a quicker way to do this in MATLAB?
An easy way is to use ismember():
[~,pos] = ismember(text,alphabet)
Or use the implicit conversion carried out by -:
out = text - 'a' + 1;
note that blanks will have -64 and full stops -50, which means that you will need:
out(out == -64) = 27;
out(out == -50) = 28;
Speed considerations:
For small sized arrays the latter solution is slightly faster IF you are happy to leave blanks and full stops with their negative index.
For big arrays, on my machine 1e4 times longer, the latter solution is twice faster than ismember().
Going back:
alphabet(out)

Resources