Input as many characters as possible in notepad with fewest keyboard typings [duplicate] - string

This is an interview question from google. I am not able to solve it by myself. Can somebody shed some light?
Write a program to print the sequence of keystrokes such that it generates the maximum number of character 'A's. You are allowed to use only 4 keys: A, Ctrl+A, Ctrl+C and Ctrl+V. Only N keystrokes are allowed. All Ctrl+ characters are considered as one keystroke, so Ctrl+A is one keystroke.
For example, the sequence A, Ctrl+A, Ctrl+C, Ctrl+V generates two A's in 4 keystrokes.
Ctrl+A is Select All
Ctrl+C is Copy
Ctrl+V is Paste
I did some mathematics. For any N, using x A's, one Ctrl+A, one Ctrl+C and y Ctrl+V's, we can generate at most ((N-1)/2)^2 A's. For some N > M, it is better to use as many Ctrl+A, Ctrl+C, Ctrl+V sequences as possible, since each one doubles the number of A's.
The sequence Ctrl+A, Ctrl+C, Ctrl+V will not overwrite the existing selection. It will append the copied selection to the selected one.

There's a dynamic programming solution. We start off knowing 0 keys can make us 0 A's. Then we iterate for i up to n, doing two things: pressing A once, and pressing select all + copy followed by one or more pastes (this is the factor j-i-1 below; note the trick here: the contents are still in the clipboard, so we can paste multiple times without copying each time). We only have to consider up to 4 consecutive pastes, since select, copy, paste x 5 gives the same count as select, copy, paste, paste, select, copy, paste (both multiply by 6 in 7 keystrokes), and the latter is better since it leaves us with more in the clipboard. Once we've reached n, we have the desired result.
The complexity might appear to be O(N), but since the numbers grow at an exponential rate it is actually O(N^2) due to the cost of multiplying the large numbers. Below is a Python implementation. It takes about 0.5 seconds to calculate for N=50,000.
def max_chars(n):
    dp = [0] * (n+1)
    for i in range(n):
        dp[i+1] = max(dp[i+1], dp[i]+1) # press a
        for j in range(i+3, min(i+7, n+1)):
            dp[j] = max(dp[j], dp[i]*(j-i-1)) # press select all, copy, then paste (j-i-2) times: multiplies by (j-i-1)
    return dp[n]
In the code, j represents the total number of keys pressed after our new sequence of keypresses. We already have i keypresses at this stage, and 2 new keypresses go to select-all and copy. Therefore we're hitting paste j-i-2 times. Since pasting adds to the existing sequence of dp[i] A's, we need to add 1 making it j-i-1. This explains the j-i-1 in the 2nd-last line.
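The question asks for the keystrokes themselves, not only the count. A minimal sketch of how the same recurrence could record them, assuming the append-on-paste model used above; the name `best_keystrokes` and the `prev` back-pointer array are illustrative and not part of the original answer:

```python
def best_keystrokes(n):
    """Return (count, keys) for n keystrokes, assuming a paste appends to the text."""
    dp = [0] * (n + 1)        # dp[k] = most A's reachable with k keystrokes
    prev = [None] * (n + 1)   # prev[k] = (earlier index, keys pressed to get from there to k)
    for i in range(n):
        # Option 1: press A once.
        if dp[i] + 1 > dp[i + 1]:
            dp[i + 1] = dp[i] + 1
            prev[i + 1] = (i, ["A"])
        # Option 2: select all, copy, then paste 1 to 4 times.
        for j in range(i + 3, min(i + 7, n + 1)):
            pastes = j - i - 2
            if dp[i] * (pastes + 1) > dp[j]:
                dp[j] = dp[i] * (pastes + 1)
                prev[j] = (i, ["Ctrl+A", "Ctrl+C"] + ["Ctrl+V"] * pastes)
    # Walk the back-pointers from n down to 0 to rebuild the sequence.
    keys = []
    k = n
    while k > 0:
        i, step = prev[k]
        keys = step + keys
        k = i
    return dp[n], keys


count, keys = best_keystrokes(7)
print(count, ", ".join(keys))
```

When several sequences tie, this keeps the first one found, so the printed keystrokes are one optimal sequence rather than the only one.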
Here are some results (n => number of A's):
7 => 9
8 => 12
9 => 16
10 => 20
100 => 1391569403904
1,000 => 3268160001953743683783272702066311903448533894049486008426303248121757146615064636953144900245174442911064952028008546304
50,000 => a very large number!
I agree with @SB that you should always state your assumptions: mine is that you don't need to paste twice to double the number of characters. This gets the answer for 7, so unless my solution is wrong the assumption must be right.
In case someone wonders why I'm not checking sequences of the form Ctrl+A, Ctrl+C, A, Ctrl+V: The end result will always be the same as A, Ctrl+A, Ctrl+C, Ctrl+V which I do consider.

By using marcog's solution I found a pattern that starts at n=16. To illustrate this here are the keystrokes for n=24 up to n=29, I replaced ^A with S (select), ^C with C (copy), and ^V with P (paste) for readability:
24: A,A,A,A,S,C,P,P,P,S,C,P,P,P,S,C,P,P,P,S,C,P,P,P
4 * 4 * 4 * 4 * 4 = 1024
25: A,A,A,A,S,C,P,P,P,S,C,P,P,S,C,P,P,S,C,P,P,S,C,P,P
4 * 4 * 3 * 3 * 3 * 3 = 1296
26: A,A,A,A,S,C,P,P,P,S,C,P,P,P,S,C,P,P,S,C,P,P,S,C,P,P
4 * 4 * 4 * 3 * 3 * 3 = 1728
27: A,A,A,A,S,C,P,P,P,S,C,P,P,P,S,C,P,P,P,S,C,P,P,S,C,P,P
4 * 4 * 4 * 4 * 3 * 3 = 2304
28: A,A,A,A,S,C,P,P,P,S,C,P,P,P,S,C,P,P,P,S,C,P,P,P,S,C,P,P
4 * 4 * 4 * 4 * 4 * 3 = 3072
29: A,A,A,A,S,C,P,P,P,S,C,P,P,P,S,C,P,P,P,S,C,P,P,P,S,C,P,P,P
4 * 4 * 4 * 4 * 4 * 4 = 4096
After an initial 4 As, the ideal pattern is to select, copy, paste, paste, paste and repeat. This will multiply the number of As by 4 every 5 keystrokes. If this 5 keystroke pattern cannot consume the remaining keystrokes on its own some number of 4 keystroke patterns (SCPP) consume the final keystrokes, replacing SCPPP (or removing one of the pastes) as necessary. The 4 keystroke patterns multiply the total by 3 every 4 keystrokes.
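The sequences listed above can be checked mechanically. The following is a small sketch, assuming the same letters (A, S, C, P) and that a paste appends rather than replaces; `simulate` and `pattern` are hypothetical helper names introduced here for illustration:

```python
def simulate(keys):
    """Count the A's produced by a key string: A = type, S = select all, C = copy, P = paste.

    Assumption: a paste appends the clipboard to the text and never replaces the selection.
    """
    text = 0        # number of A's on screen
    selected = 0    # size of the current selection
    clipboard = 0   # size of the copied text
    for key in keys:
        if key == "A":
            text += 1
        elif key == "S":
            selected = text
        elif key == "C":
            clipboard = selected
        elif key == "P":
            text += clipboard
        else:
            raise ValueError("unknown key: %r" % key)
    return text


def pattern(n):
    """Four A's, then SCPPP groups, then up to four SCPP groups, using exactly n keys."""
    rest = n - 4
    for threes in range(5):
        left = rest - 4 * threes
        if left >= 0 and left % 5 == 0:
            return "AAAA" + "SCPPP" * (left // 5) + "SCPP" * threes
    raise ValueError("n is too small for this pattern")


for n in range(24, 30):
    print(n, simulate(pattern(n)))
```

Running the loop reproduces the products listed for n=24 through n=29.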
Using this pattern, here is some Python code that gets the same results as marcog's solution, but is O(1). Edit: this is actually O(log n) due to exponentiation, thanks to IVlad for pointing that out.
def max_chars(n):
    if n <= 15:
        return (0, 1, 2, 3, 4, 5, 6, 9, 12, 16, 20, 27, 36, 48, 64, 81)[n]
    e3 = (4 - n) % 5
    e4 = n // 5 - e3
    return 4 * (4 ** e4) * (3 ** e3)
Calculating e3:
There are always between 0 and 4 SCPP patterns at the end of the keystroke list: for n % 5 == 0 there are 4, for n % 5 == 1 there are 3, for n % 5 == 2 there are 2, for n % 5 == 3 there is 1, and for n % 5 == 4 there are 0. This can be simplified to (4 - n) % 5.
Calculating e4:
The total number of patterns increases by 1 whenever n % 5 == 0; as it turns out, this total is exactly n / 5 rounded down. Using floor division we get the total number of patterns, and e4 is that total minus e3. For those unfamiliar with Python, // is the future-proof notation for floor division.

Here's how I would approach it:
assume CtrlA = select all
assume CtrlC = copy selection
assume CtrlV = paste copied selection
given some text, it takes 4 keystrokes to duplicate it:
CtrlA to select it all
CtrlC to copy it
CtrlV to paste (this will paste over the selection - STATE YOUR ASSUMPTIONS)
CtrlV to paste again which doubles it.
From there, you can consider doing 4 or 5 A's, then looping through the above. Note that doing ctrl + a, c, v, v will grow your text exponentially as you loop through. If remaining strokes < 4, just keep doing a CtrlV
The key to interviews at places like Google is to state your assumptions and communicate your thinking. They want to know how you solve problems.

It's solveable in O(1): Like with the Fibonacci numbers, there is a formula to calculate the number of printed As (and the sequence of keystrokes):
1) We can simplify the problem description:
Having only [A],[C-a]+[C-c],[C-v] and an empty copy-paste-buffer
equals
having only [C-a]+[C-c],[C-v] and "A" in the copy-paste-buffer.
2) We can describe the sequence of keystrokes as a string of N chars out of {'*','V','v'}, where 'v' means [C-v] and '*' means [C-a] and 'V' means [C-c]. Example: "vvvv*Vvvvv*Vvvv"
The length of that string still equals N.
The product of the lengths of the Vv-words in that string equals the number of produced As.
3) Given a fixed length N for that string and a fixed number K of words, the outcome will be maximal iff all words have nearly equal lengths. Their pair-wise difference is not more than ±1.
Now, what is the optimal number K, if N is given?
4) Suppose, we want to increase the number of words by appending one single word of length L, then we have to reduce L+1 times any previous word by one 'v'. Example: "…*Vvvv*Vvvv*Vvvv*Vvvv" -> "…*Vvv*Vvv*Vvv*Vvv*Vvv"
Now, what is the optimal word length L?
(5*5*5*5*5) < (4*4*4*4*4)*4 , (4*4*4*4) > (3*3*3*3)*3
=> Optimal is L=4.
5) Suppose, we have a sufficient large N to generate a string with many words of length 4, but a few keystrokes are left; how should we use them?
If there are 5 or more left: Append another word with length 4.
If there are 0 left: Done.
If there are 4 left: We could either
a) append one word with length 3: 4*4*4*4*3=768.
b) or increase 4 words to length 5: 5*5*5*5=625. => Appending one word is better.
If there are 3 left: We could either
a) append one word with length 3 by adjusting the previous word from length 4 to 3: 4*4*4*2=128 < 4*4*3*3=144.
b) or increase 3 words to length 5: 5*5*5=125. => Appending one word is better.
If there are 2 left: We could either
a) append one word with length 3 by adjusting the previous two words from length 4 to 3: 4*4*1=16 < 3*3*3=27.
b) or increase 2 words to length 5: 5*5=25. => Appending one word is better.
If there is 1 left: We could either
a) append one word with length 3 by adjusting the previous three words from length 4 to 3: 4*4*4*0=0 < 3*3*3*3=81.
b) or increase one word to length 5: 4*4*5=80. => Appending one word is better.
6) Now, what if we don't have a "sufficient large N" to use the rules in 5)? We have to stick with plan b), if possible!
The strings for small N are:
1:"v", 2:"vv", 3:"vvv", 4:"vvvv"
5:"vvvvv" → 5 (plan b)
6:"vvvvvv" → 6 (plan b)
7:"vvv*Vvv" → 9 (plan a)
8:"vvvv*Vvv" → 12 (plan a)
9:"vvvv*Vvvv" → 16
10:"vvvv*Vvvvv" → 20 (plan b)
11:"vvv*Vvv*Vvv" → 27 (plan a)
12:"vvvv*Vvv*Vvv" → 36 (plan a)
13:"vvvv*Vvvv*Vvv" → 48 (plan a)
14:"vvvv*Vvvv*Vvvv" → 64
15:"vvv*Vvv*Vvv*Vvv" → 81 (plan a)
…
7) Now, what is the optimal number K of words in a string of length N?
If N < 7 then K=1 else if 6 < N < 11 then K=2 ; otherwise: K=ceil((N+1)/5)
Written in C/C++/Java: int K = (N<7)?(1) : (N<11)?(2) : ((N+5)/5);
And if N > 10, then the number of words with length 3 will be: K*5-1-N. With this, we can calculate the number of printed As:
If N > 10, the number of As will be: 4^{N+1-4K}·3^{5K-N-1}
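As a rough sketch, the formula from step 7 can be written out in Python. The function name `max_as` is made up for illustration, the values for N ≤ 10 are read off the strings in step 6, and the entry for N = 0 is an added assumption (no keystrokes, no A's):

```python
def max_as(n):
    """Number of A's for n keystrokes using the word-count formula from step 7."""
    if n <= 10:
        # Products of the word lengths listed for N = 1..10; index 0 is an assumed base case.
        return (0, 1, 2, 3, 4, 5, 6, 9, 12, 16, 20)[n]
    k = (n + 5) // 5          # number of words, same as ceil((n + 1) / 5)
    threes = 5 * k - 1 - n    # words of length 3
    fours = k - threes        # remaining words have length 4, i.e. n + 1 - 4k of them
    return 4 ** fours * 3 ** threes


for n in (11, 12, 13, 14, 15):
    print(n, max_as(n))
```

For N = 11 through 15 this gives 27, 36, 48, 64 and 81, the products of the word lengths in the strings listed in step 6.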

Using CtrlA + CtrlC + CtrlV is an advantage only after 4 'A's.
So I would do something like this (in pseudo-BASIC-code, since you haven't specified any proper language):
// We should not use the clipboard for the first four A's:
FOR I IN 1 TO MIN(N, 4)
    PRINT 'CLICK A'
NEXT
LET N1 = N - 4
// Generates the maximum number of pastes allowed:
FOR I IN 1 TO (N1 DIV 3) DO
    PRINT 'CTRL-A'
    PRINT 'CTRL-C'
    PRINT 'CTRL-V'
    LET N1 = N1 - 3
NEXT
// If we still have some keystrokes left, let's use them with simple CTRL-Vs
FOR I IN 1 TO N1
    PRINT 'CTRL-V'
NEXT
Edit
Back to using a single CtrlV in the main loop.
Added some comments to explain what I'm trying to do here.
Fixed an issue with the "first four A's" block.

It takes 3 keystrokes to double your number of As. It only makes sense to start doubling when you have 3 or more As already printed. You want your last allowed keystroke to be a CtrlV to make sure you are doubling the biggest number you can, so in order to align it we will fill in any extra keystrokes after the first three As at the beginning with more As.
for (i = 3 + n%3; i>0 && n>0; n--, i--) {
    print("a");
}
for (; n>0; n = n-3) {
    print("ctrl-a");
    print("ctrl-c");
    print("ctrl-v");
}
Edit:
This is terrible, I completely got ahead of myself and didn't consider multiple pastes for each copy.
Edit 2:
I believe pasting 3 times is optimal, when you have enough keystrokes to do it. In 5 keystrokes you multiply your number of As by 4. This is better than multiplying by 3 using 4 keystrokes and better than multiplying by 5 using 6 keystrokes. I compared this by giving each method the same number of keystrokes, enough so they each would finish a cycle at the same time (60), letting the 3-multiplier do 15 cycles, the 4-multiplier do 12 cycles, and the 5-multiplier do 10 cycles. 3^15 = 14,348,907, 4^12=16,777,216, and 5^10=9,765,625. If there are only 4 keystrokes left, doing a 3-multiplier is better than pasting 4 more times, essentially making the previous 4 multiplier become an 8-multiplier. If there are only 3 keystrokes left, a 2-multiplier is best.

Assume you have x characters in the clipboard and x characters in the text area; let's call it "state x".
Let's press "Paste" a few times (i denote it by m-1 for convenience), then "Select-all" and "Copy"; after this sequence, we get to "state m*x".
Here, we wasted a total of m+1 keystrokes.
So the asymptotic growth is (at least) something like f^n, where f = m^(1/(m+1)).
I believe it's the maximum possible asymptotic growth, though i cannot prove it (yet).
Trying various values of m shows that the maximum for f is obtained for m=4.
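A quick way to try the values of m is to evaluate f directly. This is only a numerical check of the claim above; `growth_rate` is a name chosen here for illustration:

```python
def growth_rate(m):
    """Per-keystroke growth factor when m + 1 keystrokes multiply the text by m."""
    return m ** (1.0 / (m + 1))


for m in range(2, 8):
    print(m, round(growth_rate(m), 4))

best = max(range(1, 50), key=growth_rate)
print("best m:", best)
```

The values for m=3, m=4 and m=5 come out at roughly 1.316, 1.320 and 1.308, so the peak at m=4 is narrow but real.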
Let's use the following algorithm:
Press A a few times
Press Select-all
Press Copy
Repeat a few times:
    Press Paste
    Press Paste
    Press Paste
    Press Select-all
    Press Copy
While any keystrokes left:
    Press Paste
(not sure it's the optimal one).
The number of times to press A at the beginning is 3: if you press it 4 times, you miss the opportunity to double the number of A's in 3 more keystrokes.
The number of times to press Paste at the end is no more than 5: if you have 6 or more keystrokes left, you can use Paste, Paste, Paste, Select-all, Copy, Paste instead.
So, we get the following algorithm:
If (less than 6 keystrokes - special case)
    While (any keystrokes left)
        A
Else
    First 5 keystrokes: A, A, A, Select-all, Copy
    While (more than 5 keystrokes left)
        Paste, Paste, Paste, Select-all, Copy
    While (any keystrokes left)
        Paste
(not sure it's the optimal one). The number of characters after executing this is something like
3 * pow(4, floor((n - 6) / 5)) * (2 + (n - 1) % 5).
Sample values: 1,2,3,4,5,6,9,12,15,18,24,36,48,60,72,96,144,192,240,288,...
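The sample values can be regenerated from the formula. A short sketch, where `approx_count` is an illustrative name and the n < 6 branch follows the special case in the algorithm above:

```python
def approx_count(n):
    """Characters produced by the algorithm above for n keystrokes."""
    if n < 6:
        return n  # special case: just type A
    return 3 * 4 ** ((n - 6) // 5) * (2 + (n - 1) % 5)


print([approx_count(n) for n in range(1, 21)])
```

This prints the twenty sample values listed above. Note that for n = 9 it gives 15, whereas the dynamic programming answer earlier on this page lists 16, which fits the caveat that this algorithm may not be optimal.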

What follows uses the OP's second edit that pasting does not replace existing text.
Notice a few things:
^A and ^C can be considered a single action that takes two keystrokes, since it never makes sense to do them individually. In fact, we can replace all instances of ^A^C with ^K^V, where ^K is a one-key "cut" operation (let's abbreviate it X). We shall see that dealing with ^K is much nicer than the two-cost ^A^C.
Let's assume that an 'A' starts in the clipboard. Then ^V (let's abbreviate it Y) is strictly superior to A and we can drop the latter from all consideration. (In the actual problem, if the clipboard starts empty, in what follows we'll just replace Y with A instead of ^V up until the first X.)
Every reasonable keystroke sequence can thus be interpreted as a group of Ys separated by Xs, for example YYYXYXYYXY. Denote by V(s) the number of 'A's produced by the sequence s. Then V(nXm) = V(n)*V(m), because X essentially replaces every Y in m with V(n) 'A's.
The copy-paste problem is thus isomorphic to the following problem: "using m+1 numbers which sum to N-m, maximize their product." For example, when N=6, the answer is m=1 and the numbers (2,3). 6 = 2*3 = V(YYXYYY) = V(AA^A^C^V^V) (or V(YYYXYY) = V(AAA^A^C^V)).
We can make a few observations:
For a fixed value of m, the numbers to choose are ceil( (N-m)/(m+1) ) and floor( (N-m)/(m+1) ) (in whatever combination makes the sum work out; more specifically you will need (N-m) % (m+1) ceils and the rest floors). This is because, for a < b, (a+1)*(b-1) >= a*b.
Unfortunately I don't see an easy way to find the value of m. If this were my interview I would propose two solutions at this point:
Option 1. Loop over all possible m. An O(n log n) solution.
C++ code:
#include <algorithm>

// Integer power by repeated squaring.
long long ipow(int a, int b)
{
    long long val=1;
    long long mul=a;
    while(b>0)
    {
        if(b%2)
            val *= mul;
        mul *= mul;
        b/=2;
    }
    return val;
}
// Best product of m+1 numbers that sum to N-m.
long long trym(int N, int m)
{
    int floor = (N-m)/(m+1);
    int ceil = 1+floor;
    int numceils = (N-m)%(m+1);
    return ipow(floor, m+1-numceils) * ipow(ceil, numceils);
}
long long maxAs(int N)
{
    long long maxval=0;
    for(int m=0; m<N; m++)
    {
        maxval = std::max(maxval, trym(N,m));
    }
    return maxval;
}
Option 2. Allow m to attain non-integer values and find its optimal value by taking the derivative of [(N-m)/(m+1)]^(m+1) with respect to m and solving for its root. There is no analytic solution, but the root can be found using e.g. Newton's method. Then use the floor and ceiling of that root for the value of m, and choose whichever is best.
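For comparison, here is a sketch of Option 1 in Python, where integers do not overflow. The names `try_m` and `max_as_by_groups` are illustrative, and the early return for too few keystrokes is an added guard rather than something taken from the C++ code:

```python
def try_m(n, m):
    """Best product of m + 1 positive integers that sum to n - m."""
    total, parts = n - m, m + 1
    if total < parts:
        return 0  # not enough keystrokes left for m + 1 non-empty groups
    floor, num_ceils = divmod(total, parts)
    # num_ceils groups get floor + 1, the rest get floor.
    return floor ** (parts - num_ceils) * (floor + 1) ** num_ceils


def max_as_by_groups(n):
    """Try every possible number of cuts m and keep the best product."""
    return max((try_m(n, m) for m in range(n)), default=0)


for n in (6, 7, 9, 11):
    print(n, max_as_by_groups(n))
```

For N=6 this agrees with the worked example above (m=1, groups 2 and 3, product 6).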

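public int dp(int n)
{
    // arr[i] holds the best count for i + 1 keystrokes; start with typing A every time
    int arr[] = new int[n];
    for (int i = 0; i < n; i++)
        arr[i] = i + 1;
    for (int i = 2; i < n - 3; i++)
    {
        // select all + copy after i + 1 keystrokes; the first paste doubles the text
        int numchars = arr[i] * 2;
        int j = i + 3;
        arr[j] = Math.max(arr[j], numchars);
        // each further paste adds another arr[i] characters
        while (j < n - 1)
        {
            numchars = numchars + arr[i];
            arr[++j] = Math.max(arr[j], numchars);
        }
    }
    return arr[n - 1];
}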

Here is my approach and solution with code below.
Approach:
There are three distinct operations that can be performed.
Keystroke A - Outputs one character 'A'
Keystroke (Ctrl-A) + (Ctrl-C) - Outputs nothing essentially. These two keystrokes can be combined into one operation because each of these keystrokes individually make no sense. Also, this keystroke sets up the output for the next paste operation.
Keystroke (Ctrl-V) - Output for this keystroke really depends on the previous (second) operation and hence we would need to account for that in our code.
Now given the three distinct operations and their respective outputs, we have to run through all the permutations of these operations.
Assumption:
Now, some version of this problem states that the sequence of keystrokes, Ctrl+A -> Ctrl+C -> Ctrl+V, overwrite the highlighted selection. To factor in this assumption, only one line of code needs to be added to the solution below where the printed variable in case 2 is set to 0
case 2:
    //Ctrl-A and then Ctrl-C
    if((count+2) < maxKeys)
    {
        pOutput = printed;
        //comment the below statement to NOT factor
        //in the assumption described above
        printed = 0;
    }
For this solution
The code below will print a couple of sequences and the last sequence is the correct answer for any given N. e.g. for N=11 this will be the correct sequence
With the assumption
A, A, A, A, A, C, S, V, V, V, V, :20:
Without the assumption
A, A, A, C, S, V, V, C, S, V, V, :27:
I have decided to retain the assumption for this solution.
Keystroke Legend:
'A' - A
'C' - Ctrl+A
'S' - Ctrl+C
'V' - Ctrl+V
Code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void maxAprinted(int count, int maxKeys, int op, int printed, int pOutput, int *maxPrinted, char *seqArray)
{
    if(count > maxKeys)
        return;
    if(count == maxKeys)
    {
        if((*maxPrinted) < printed)
        {
            //new sequence found which is an improvement over last sequence
            (*maxPrinted) = printed;
            printf("\n");
            int i;
            for(i=0; i<maxKeys; i++)
                printf(" %c,",seqArray[i]);
        }
        return;
    }
    switch(op)
    {
        case 1:
            //A keystroke
            printed++;
            seqArray[count] = 'A';
            count++;
            break;
        case 2:
            //Ctrl-A and then Ctrl-C
            if((count+2) < maxKeys)
            {
                pOutput = printed;
                //comment the below statement to NOT factor
                //in the assumption described above
                printed = 0;
            }
            seqArray[count] = 'C';
            count++;
            seqArray[count] = 'S';
            count++;
            break;
        case 3:
            //Ctrl-V
            printed = printed + pOutput;
            seqArray[count] = 'V';
            count++;
            break;
    }
    maxAprinted(count, maxKeys, 1, printed, pOutput, maxPrinted, seqArray);
    maxAprinted(count, maxKeys, 2, printed, pOutput, maxPrinted, seqArray);
    maxAprinted(count, maxKeys, 3, printed, pOutput, maxPrinted, seqArray);
}
int main()
{
    const int keyStrokes = 11;
    //this array stores the sequence of keystrokes
    char *sequence;
    sequence = (char*)malloc(sizeof(char)*(keyStrokes + 1));
    //stores the max count for As printed for a sequence
    //updated in the recursive call.
    int printedAs = 0;
    maxAprinted(0, keyStrokes, 1, 0, 0, &printedAs, sequence);
    printf(" :%d:", printedAs);
    free(sequence);
    return 0;
}

Using the tricks mentioned in the answers above, the solution can be expressed mathematically in one equation:
4 + 4^[(N-4)/5] + ((N-4)%5)*4^[(N-4)/5]
where [] is the greatest integer (floor) function.

There is a trade-off between typing m A's manually and then using Ctrl+A, Ctrl+C, and N-m-2 Ctrl+V's. The best solution is in the middle. If max keystrokes = 10, the best solution is typing 5 A's or 4 A's.
Look at http://www.geeksforgeeks.org/how-to-print-maximum-number-of-a-using-given-four-keys/ and maybe optimize a bit by looking for the results around the midpoint.

Here is my solution with dynamic programming, without a nested loop, and which also prints the actual characters that you'd need to type:
N = 52
count = [0] * N
res = [[]] * N
clipboard = [0] * N
def maybe_update(i, new_count, new_res, new_clipboard):
    if new_count > count[i] or (
            new_count == count[i] and new_clipboard > clipboard[i]):
        count[i] = new_count
        res[i] = new_res
        clipboard[i] = new_clipboard
for i in range(1, N):
    # First option: type 'A'.
    # Using list concatenation for 'res' to avoid O(n^2) string concatenation.
    maybe_update(i, count[i - 1] + 1, res[i - 1] + ['A'], clipboard[i - 1])
    # Second option: type 'CTRL+V'.
    maybe_update(i, count[i - 1] + clipboard[i - 1], res[i - 1] + ['v'],
                 clipboard[i - 1])
    # Third option: type 'CTRL+A, CTRL+C, CTRL+V'.
    # Assumption: CTRL+V always appends.
    if i >= 3:
        maybe_update(i, 2 * count[i - 3], res[i - 3] + ['acv'], count[i - 3])
for i in range(N):
    print('%2d %7d %6d %-52s' % (i, count[i], clipboard[i], ''.join(res[i])))
This is the output ('a' means 'CTRL+A', etc.)
0 0 0
1 1 0 A
2 2 0 AA
3 3 0 AAA
4 4 0 AAAA
5 5 0 AAAAA
6 6 3 AAAacv
7 9 3 AAAacvv
8 12 3 AAAacvvv
9 15 3 AAAacvvvv
10 18 9 AAAacvvacv
11 27 9 AAAacvvacvv
12 36 9 AAAacvvacvvv
13 45 9 AAAacvvacvvvv
14 54 27 AAAacvvacvvacv
15 81 27 AAAacvvacvvacvv
16 108 27 AAAacvvacvvacvvv
17 135 27 AAAacvvacvvacvvvv
18 162 81 AAAacvvacvvacvvacv
19 243 81 AAAacvvacvvacvvacvv
20 324 81 AAAacvvacvvacvvacvvv
21 405 81 AAAacvvacvvacvvacvvvv
22 486 243 AAAacvvacvvacvvacvvacv
23 729 243 AAAacvvacvvacvvacvvacvv
24 972 243 AAAacvvacvvacvvacvvacvvv
25 1215 243 AAAacvvacvvacvvacvvacvvvv
26 1458 729 AAAacvvacvvacvvacvvacvvacv
27 2187 729 AAAacvvacvvacvvacvvacvvacvv
28 2916 729 AAAacvvacvvacvvacvvacvvacvvv
29 3645 729 AAAacvvacvvacvvacvvacvvacvvvv
30 4374 2187 AAAacvvacvvacvvacvvacvvacvvacv
31 6561 2187 AAAacvvacvvacvvacvvacvvacvvacvv
32 8748 2187 AAAacvvacvvacvvacvvacvvacvvacvvv
33 10935 2187 AAAacvvacvvacvvacvvacvvacvvacvvvv
34 13122 6561 AAAacvvacvvacvvacvvacvvacvvacvvacv
35 19683 6561 AAAacvvacvvacvvacvvacvvacvvacvvacvv
36 26244 6561 AAAacvvacvvacvvacvvacvvacvvacvvacvvv
37 32805 6561 AAAacvvacvvacvvacvvacvvacvvacvvacvvvv
38 39366 19683 AAAacvvacvvacvvacvvacvvacvvacvvacvvacv
39 59049 19683 AAAacvvacvvacvvacvvacvvacvvacvvacvvacvv
40 78732 19683 AAAacvvacvvacvvacvvacvvacvvacvvacvvacvvv
41 98415 19683 AAAacvvacvvacvvacvvacvvacvvacvvacvvacvvvv
42 118098 59049 AAAacvvacvvacvvacvvacvvacvvacvvacvvacvvacv
43 177147 59049 AAAacvvacvvacvvacvvacvvacvvacvvacvvacvvacvv
44 236196 59049 AAAacvvacvvacvvacvvacvvacvvacvvacvvacvvacvvv
45 295245 59049 AAAacvvacvvacvvacvvacvvacvvacvvacvvacvvacvvvv
46 354294 177147 AAAacvvacvvacvvacvvacvvacvvacvvacvvacvvacvvacv
47 531441 177147 AAAacvvacvvacvvacvvacvvacvvacvvacvvacvvacvvacvv
48 708588 177147 AAAacvvacvvacvvacvvacvvacvvacvvacvvacvvacvvacvvv
49 885735 177147 AAAacvvacvvacvvacvvacvvacvvacvvacvvacvvacvvacvvvv
50 1062882 531441 AAAacvvacvvacvvacvvacvvacvvacvvacvvacvvacvvacvvacv
51 1594323 531441 AAAacvvacvvacvvacvvacvvacvvacvvacvvacvvacvvacvvacvv

If N key Strokes are allowed, then the result is N-3.
A's -> N-3
CTRL+A -> Selecting those N-3 characters: +1
CTRL+C -> Copying those N-3 characters: +1
Ctrl+V -> Pasting the N-3 characters: +1
i.e., since we have selected all the characters using CTRL+A, the paste replaces the existing N-3 characters with the copied N-3 characters (overwriting the same characters), and the result is N-3.

Related

Xor using bitwise operations

Say I have a fn taking two ints A,B
Let binary rep of A be a0,a1,a2....an, and that of B be b0,b1,b2...bn .
I wish to return this ((a0 * b0) ^ (a1 * b1) ^ ..... ^ (an * bn)).
But the challenge is to achieve this without bit conversions i.e. using integers. How can I achieve this?
PS: I know A & B gives me a number. When this number is converted to binary and its elements are xorred amongst each other, I would get my answer. But I do not wish to convert the anded result to binary using bin() for faster computation.
from functools import reduce

def find(A, B):
    multiply = A & B
    bits = bin(multiply)[2:]  # (this step I wish to avoid cuz the product can be super large and the binary string is long)
    bits = [int(d) for d in bits]
    innerprod = reduce(lambda i, j: i ^ j, bits)
    return innerprod
First bitwise "and" (&) the numbers to get a number whose bits are the bits of the input numbers, multiplied respectively.
You can use Kernighan's algorithm to count the number of bits that are set to 1 (see link for description), in this number.
Then, mod 2 the result, because XOR is just flipping the result every time a bit set to 1 is encountered (so, an even number of 1's XOR'd together is 0, and an odd number will be 1).
Example:
7 is '111' and 5 is '101'
7 & 5 is '101' (bitwise "and" operation)
Two bits in '101' are set to 1 (so count_set_bits returns 2)
2 modulo 2 is 0.
(1 * 1) ^ (1 * 0) ^ (1 * 1) is 0
def count_set_bits(n):
    count = 0
    while n:
        n &= (n-1)
        count += 1
    return count
def compute_answer(a, b):
    return count_set_bits(a & b) % 2
print(compute_answer(7, 5)) # 0
print(compute_answer(37, 3)) # 1

Given a string, counting the number of permutations of the string with no repetitions(and forbidden characters)

I have been hitting my head against an algorithmic problem for a couple of hours now.
The (fancy) statement of the problem is as follows:
Our garden contains a single row of flowers. You are given the current contents of the row in the String garden. Each character in garden represents one flower. Different characters represent different colors. Flowers of the same color all look the same. You may rearrange the flowers in your garden into any order you like. (Formally, you may swap any two flowers in your garden, and you can do so arbitrarily many times.) You are also given a String forbid of the same length as garden. You want to rearrange garden into a new string G that will satisfy the following conditions:
No two adjacent flowers will have the same color. Formally, for each valid i, G[i] and G[i + 1] must differ.
For each valid i, G[i] must not be equal to forbid[i].
Let X be the number of different strings G that satisfy all conditions given above. Compute and return the number (X modulo 1,000,000,007).
Just to clarify with an example: X("aaabbb", "cccccc") = 2 ("ababab" and "bababa")
I have been trying by counting how many of each letter are in the string ('a'->3, 'b'->3 in the example) and then recursively counting the different possibilities (skipping if there is a repetition or a forbidden letter). Something along these lines:
using Map = std::map < char, size_t > ;
Map hist;
std::string forbid;
size_t countRecursive(std::string s, size_t len)
{
    if (len == 0)
        return 1;
    size_t curPos = s.size() ;
    size_t count(0);
    for (auto &p : hist) {
        auto key = p.first;
        if (hist[key] == 0) continue;
        if (forbid[curPos] == key) continue;
        if (curPos > 0 && s[curPos - 1] == key) continue;
        hist[key]--;
        count += countRecursive(s + key, len - 1);
        hist[key]++;
    }
    return count;
}
Where hist and forbid are previously initialized. However, this appears to be n! and since n can be <= 15, it really explodes in complexity.
I am not really looking for a complete solution. Only, if you had any kind of suggestion about the way I should approach the problem, I would be highly thankful!
I'd approach it as follows: your 'forbidden' string is as long as the 'garden'. This means that given an alphabet of N characters, each position G[i] can have at most N-1 possible characters (since one will be forbidden). This gives you an upper bound that's limited only by N. If that bound is less than the modulo it might lead to some interesting considerations, but let's move forward.
Now a very basic approach is counting the combinations: if the garden is K characters long, the first item G[0] will have N-1 possibilities; the second one G[1] will have N-2 possibilities if forbidden[1] is different than G[0], N-1 if forbidden[1] == G[0]. The third character G[2] will too have N-2 possibilities depending on forbidden[2] and G[1], and so on.
For clarity: N-2 comes from the fact that of the N-1 possibilities, another one must be removed that is the value of the character preceding it in the string, unless such character matches the forbidden character of the current position.
So if forbidden[i+1] is always different from G[i] you have (N-1) * (N-2) * (N-2) * ... * (N-2), with K factors in total. This is your lower bound.
Now between upper and lower bound there is a number of strings where e.g. forbidden[i+1] is equal to G[i] only for the second position; for the second and for the third; etc. So your number of strings is:
(N-1) * (N-2) * (N-2) * (N-2) * ... (K factors)
(N-1) * (N-1) * (N-2) * (N-2) * ... (K factors)
(N-1) * (N-1) * (N-1) * (N-2) * ... (K factors)
and so on until you have a string where each character can have N-1 possibilities.
In other words,
(N-1) * (N-2)^(K-1)
(N-1)^2 * (N-2)^(K-2)
(N-1)^3 * (N-2)^(K-3)
How many of those strings can you have? It depends how big is K, i.e. how large is your garden :)
That is, assuming I understood the problem correctly.
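A different route from the bounds above, offered only as a sketch: keep the asker's recursion but memoize it on the remaining letter counts and the last colour placed, since the position follows from how many flowers are left. This is a swapped-in technique (memoized search), not the method described in this answer, and `count_arrangements` is a made-up name:

```python
from functools import lru_cache

MOD = 10**9 + 7


def count_arrangements(garden, forbid):
    """Count distinct valid strings G, modulo 1,000,000,007."""
    colors = sorted(set(garden))
    start = tuple(garden.count(c) for c in colors)
    n = len(garden)

    @lru_cache(maxsize=None)
    def go(remaining, last):
        pos = n - sum(remaining)  # index of the next flower to place
        if pos == n:
            return 1
        total = 0
        for idx, left in enumerate(remaining):
            # Skip used-up colours, a repeat of the previous colour, and the forbidden colour.
            if left == 0 or idx == last or colors[idx] == forbid[pos]:
                continue
            nxt = remaining[:idx] + (left - 1,) + remaining[idx + 1:]
            total += go(nxt, idx)
        return total % MOD

    return go(start, -1)


print(count_arrangements("aaabbb", "cccccc"))
```

Because identical flowers are interchangeable, the number of distinct states is bounded by the product of (count + 1) over the colours times the number of colours, which is far smaller than n! for n up to 15.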

In Place Run Length Encoding Algorithm

I encountered an interview question:
Given a input String: aaaaabcddddee, convert it to a5b1c1d4e2.
One extra constraint is, this needs to be done in-place, means no extra space(array) should be used.
It is guaranteed that the encoded string will always fit in the original string. In other words, string like abcde will not occur, since it will be encoded to a1b1c1d1e1 which occupies more space than the original string.
One hint interviewer gave me was to traverse the string once and find the space that is saved.
Still, I am stuck: without using extra variables, some values in the input string may sometimes be overwritten.
Any suggestions will be appreciated.
This is a good interview question.
Key Points
There are 2 key points:
A single character must be encoded as c1;
The encoded length will never be longer than the original array.
From point 1, we know each character requires at least 2 places to be encoded. That is to say, only a single character requires more space once encoded.
Simple Approach
From the key points, we notice that single characters cause us a lot of problems during the encoding, because they might not have enough room to hold the encoded string. So how about we leave them alone at first, and compress the other characters first?
For example, we encode aaaaabcddddee from the back while leaving the single character first, we will get:
aaaaabcddddee
_____a5bcd4e2
Then we could safely start from the beginning and encoding the partly encoded sequence, given the key point 2 such that there will be enough spaces.
Analysis
Seems like we've got a solution, are we done? No. Consider this string:
aaa3dd11ee4ff666
The problem doesn't limit the range of characters, so we could use digit as well. In this case, if we still use the same approach, we will get this:
aaa3dd11ee4ff666
__a33d212e24f263
Ok, now tell me, how do you distinguish the run-length from those numbers in the original string?
Well, we need to try something else.
Let's define Encode Benefit (E) as the length difference between the original consecutive character sequence and its encoded sequence.
For example, aa has E = 0, since aa will be encoded to a2, and they have no length difference; aaa has E = 1, since it will be encoded as a3, and the length difference between the encoded and the original is 1. Let's look at the single character case, what's its E? Yes, it's -1. From the definition, we could deduce the formula for E: E = ori_len - encoded_len.
Now let's go back to the problem. From key point 2, we know the encoded string will always be shorter than the original one. How do we use E to rephrase this key point?
Very simple: sigma(E_i) >= 0, where E_i is the Encode Benefit of the ith consecutive character substring.
For example, the sample you gave in your problem: aaaaabcddddee, can be broken down into 5 parts:
E(0) = 5 - 2 = 3 // aaaaa -> a5
E(1) = 1 - 2 = -1 // b -> b1
E(2) = 1 - 2 = -1 // c -> c1
E(3) = 4 - 2 = 2 // dddd -> d4
E(4) = 2 - 2 = 0 // ee -> e2
And the sigma will be: 3 + (-1) + (-1) + 2 + 0 = 3 > 0. This means there will be 3 spaces left after encoding.
However, from this example, we could see a potential problem: since we are doing summing, even if the final answer is bigger than 0, it's possible to get some negatives in the middle!
Yes, this is a problem, and it's quite serious. If we get E falls below 0, this means we do not have enough space to encode the current character and will overwrite some characters after it.
But but but, why do we need to sum it from the first group? Why can't we start summing from somewhere in the middle to skip the negative part? Let's look at an example:
2 0 -1 -1 -1 1 3 -1
If we sum up from the beginning, we will fall below 0 after adding the third -1 at index 4 (0-based); if we sum up from index 5, loop back to index 0 when we reach the end, we have no problem.
Algorithm
The analysis gives us an insight on the algorithm:
Start from the beginning, calculate E of the current consecutive group, and add to the total E_total;
If E_total is still non-negative (>= 0), we are fine and we could safely proceed to the next group;
If E_total falls below 0, we need to start over from the next group, i.e. reset E_total to 0 and remember the next position as the new candidate starting point.
If we reach the end of the sequence and E_total is still non-negative, the last starting point is a good start! This step takes O(n) time. Usually we would need to loop back and check again, but because of key point 2 we will definitely have a valid answer, so we can safely stop here.
Then we can go back to the starting point and do traditional run-length encoding. After we reach the end, we need to go back to the beginning of the sequence to finish the first part. The tricky part is that we need to make use of the remaining spaces at the end of the string. After that, we need to do some shifting in case the parts are out of order, and remove any extra white spaces. Then we are finally done :)
Therefore, we have a solution (the code is just pseudocode and hasn't been verified):
// find the position first
i = j = E_total = pos = 0;
while (i < s.length) {
    // advance j to the end of the current group
    while (j < s.length && s[i] == s[j]) j++;
    E_total += calculate_encode_benefit(i, j);
    if (E_total < 0) {
        // not enough room so far: restart from the next group
        E_total = 0;
        pos = j;
    }
    i = j;
}
// do run length encoding as usual:
// start from pos, end with len(s) - 1, the first available place is pos
int last_available_pos = runlength(s, pos, len(s)-1, pos);
// a tricky part here is to make use of the remaining spaces from the end!!!
int fin_pos = runlength(s, 0, pos-1, last_available_pos);
// eliminate the white space
eliminate(s, fin_pos, pos);
// update last_available_pos because of elimination
last_available_pos -= pos - fin_pos < 0 ? 0 : pos - fin_pos;
// rotate back
rotate(s, last_available_pos);
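The starting-point search alone can be sketched in runnable Python as follows. `find_start` is a hypothetical name, and the benefit is computed inline under the assumption that a group is encoded as one character plus its decimal count.

```python
def find_start(s):
    """Return the index of the group from which in-place encoding can safely begin."""
    pos = 0      # current candidate starting position
    total = 0    # running sum of Encode Benefit since pos
    i = 0
    n = len(s)
    while i < n:
        j = i
        while j < n and s[j] == s[i]:
            j += 1
        run_len = j - i
        total += run_len - (1 + len(str(run_len)))
        if total < 0:
            # Not enough room so far: restart from the next group.
            total = 0
            pos = j
        i = j
    # If the encoded string fits in the original (key point 2), pos < n here.
    return pos


print(find_start("abccdddefggggghhhhh"))  # 9
```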
Complexity
We have 4 parts in the algorithm:
Find the starting place: O(n)
Run-Length-Encoding on the whole string: O(n)
White space elimination: O(n)
In place string rotation: O(n)
Therefore we have O(n) in total.
Visualization
Suppose we need to encode this string: abccdddefggggghhhhh
First step, we need to find the starting position:
Group 1: a -> E_total += -1 -> E_total = -1 < 0 -> E_total = 0, pos = 1;
Group 2: b -> E_total += -1 -> E_total = -1 < 0 -> E_total = 0, pos = 2;
Group 3: cc -> E_total += 0 -> E_total = 0 >= 0 -> proceed;
Group 4: ddd -> E_total += 1 -> E_total = 1 >= 0 -> proceed;
Group 5: e -> E_total += -1 -> E_total = 0 >= 0 -> proceed;
Group 6: f -> E_total += -1 -> E_total = -1 < 0 -> E_total = 0, pos = 9;
Group 7: ggggg -> E_total += 3 -> E_total = 3 >= 0 -> proceed;
Group 8: hhhhh -> E_total += 3 -> E_total = 6 >= 0 -> end;
So the start position will be 9:
         v this is the starting point
abccdddefggggghhhhh
abccdddefg5h5______
             ^ last_available_pos, we need to make use of these remaining spaces
abccdddefg5h5a1b1c2
d3e1f1___g5h5a1b1c2
      ^^^ remove the white space
d3e1f1g5h5a1b1c2
          ^ last_available_pos, rotate
a1b1c2d3e1f1g5h5
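The last two steps of the visualization can be reproduced with a short Python sketch. The answer does not say how `eliminate` and `rotate` are implemented. The helpers below (`close_gap`, `rotate_left`) are assumptions that use a plain left shift and the common three-reversal rotation, both of which run in O(n) time with O(1) extra space.

```python
def reverse(buf, lo, hi):
    """Reverse buf[lo..hi] in place."""
    while lo < hi:
        buf[lo], buf[hi] = buf[hi], buf[lo]
        lo += 1
        hi -= 1


def rotate_left(buf, k):
    """Rotate buf left by k positions in place using three reversals."""
    n = len(buf)
    if n == 0:
        return
    k %= n
    reverse(buf, 0, k - 1)
    reverse(buf, k, n - 1)
    reverse(buf, 0, n - 1)


def close_gap(buf, gap_start, gap_end):
    """Shift buf[gap_end:] left over the gap [gap_start, gap_end) and trim the tail."""
    width = gap_end - gap_start
    for i in range(gap_end, len(buf)):
        buf[i - width] = buf[i]
    del buf[len(buf) - width:]


buf = list("d3e1f1___g5h5a1b1c2")
close_gap(buf, 6, 9)       # remove the three unused cells
print("".join(buf))        # d3e1f1g5h5a1b1c2
rotate_left(buf, 10)       # last_available_pos is 10 after elimination
print("".join(buf))        # a1b1c2d3e1f1g5h5
```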
Last Words
This question is not trivial, and it naturally glues several traditional coding interview questions together. A suggested line of thought would be:
observe the pattern and figure out the key points;
realize that the shortage of space is caused by encoding single characters;
quantify the benefit/cost of encoding each consecutive character group (a.k.a. Encode Benefit);
use the quantization you proposed to explain the original statement;
figure out the algorithm to find a good starting point;
figure out how to do run-length-encoding with a good starting point;
realize you need to rotate the encoded string and eliminate the white spaces;
figure out the algorithm to do in place string rotation;
figure out the algorithm to do in place white space elimination.
To be honest, it's a bit challenging for an interviewee to come up with a solid algorithm in a short time, so your analysis really matters. Don't stay silent: explain your thinking as you go, as this helps the interviewer see how far you have got.
Maybe just encode it normally, but if you see that your output index overtakes the input index, just skip the "1". Then when you finish go backwards and insert 1 after all letters without a count, shifting the rest of the string back. It is O(N^2) in the worst case (no repeating letters), so I assume there might be better solutions.
EDIT: it appears I missed the part that the final string always fits into the source. With that restriction, yeah, this is not the optimal solution.
EDIT2: an O(N) version would, during the first pass, also compute the final compressed length (which in the general case might be more than the source length). Set a pointer p1 to the end of that final length and a pointer p2 to the end of the compressed string with 1s omitted (p2 is thus <= p1), then keep going backwards on both pointers, copying from p2 to p1 and adding 1s when necessary (each time this happens, the difference between p2 and p1 decreases).
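A minimal Python sketch of the two-pass idea from EDIT2 might look like this. The function name `encode_in_place` is invented here. It works on a list of characters, assumes the input contains no digit characters, and grows the list only when the final length exceeds the source length.

```python
def encode_in_place(buf):
    """Run-length encode a list of characters in place (input must not contain digits)."""
    n = len(buf)
    # Pass 1: write the compressed form with the 1s omitted; count the single runs.
    w = 0
    singles = 0
    i = 0
    while i < n:
        j = i
        while j < n and buf[j] == buf[i]:
            j += 1
        run_len = j - i
        buf[w] = buf[i]
        w += 1
        if run_len == 1:
            singles += 1
        else:
            for d in str(run_len):
                buf[w] = d
                w += 1
        i = j
    final_len = w + singles
    # Resize the buffer to the final length (may grow in the general case).
    if final_len > n:
        buf.extend([""] * (final_len - n))
    else:
        del buf[final_len:]
    # Pass 2: walk backwards, copying from p2 to p1 and adding the missing 1s.
    p1 = final_len - 1
    p2 = w - 1
    follower_is_digit = False
    while p2 >= 0:
        c = buf[p2]
        if c.isdigit():
            buf[p1] = c
            p1 -= 1
            follower_is_digit = True
        else:
            if not follower_is_digit:
                buf[p1] = "1"
                p1 -= 1
            buf[p1] = c
            p1 -= 1
            follower_is_digit = False
        p2 -= 1
    return buf


print("".join(encode_in_place(list("wwwwaaadexxxxxxywww"))))  # w4a3d1e1x6y1w3
```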
O(n) and in place
Set j = 0.
Loop i from 1 to the length of the string and find the first character that does not match s[j].
The count is the difference between the indices of the two characters.
Let's run through an example
s = "wwwwaaadexxxxxxywww"
add a dummy letter to s
s = s + '#'
now our string becomes
s = "wwwwaaadexxxxxxywww#"
we'll come back to this step later.
j gives the first character of the string.
j = 0 // s[j] = w
now loop through 1 - length. The first non-matching character is 'a'
print(s[j], i - j) // i = 4, j = 0
j = i // j = 4, s[j] = a
Output: w4
i becomes the next non-matching character which would be 'd'
print(s[j], i - j) // i = 7, j = 4 => a3
j = i // j = 7, s[j] = d
Output: w4a3
.
. (Skipping to the second last)
.
j = 15, s[j] = y, i = 16, s[i] = w
print(s[j], i - j) => y1
Output: w4a3d1e1x6y1
Okay, so now we have reached the last run. Assume that we didn't add any dummy letter:
j = 16, s[j] = w and we cannot print its count
because we have no 'mismatching' character.
That's why we need to add a dummy letter.
Here's a C++ implementation
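#include <iostream>
#include <string>
using namespace std;

void compress(string s){
    int j = 0;          // start index of the current run
    s = s + '#';        // dummy letter so the last run gets printed too
    for(int i=1; i < s.length(); i++){
        if(s[i] != s[j]){
            cout << s[j] << i - j;
            j = i;
        }
    }
}
int main(){
    string s = "wwwwaaadexxxxxxywww";
    compress(s);
    return 0;
}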
Output: w4a3d1e1x6y1w3
If the use of the string insert and erase functions is allowed, you can get a compact solution with this implementation. Note that each erase or insert call may itself shift the rest of the string.
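#include <iostream>
#include <string>
using namespace std;
// number of decimal digits in n (n > 0)
int dig(int n){
    int k=0;
    while(n){
        k++;
        n/=10;
    }
    return k;
}
void stringEncoding(string &n){
    for(size_t i=0;i<n.size();i++){
        int j=1;                        // length of the run starting at i
        while(i+j<n.size() && n[i]==n[i+j])j++;
        n.erase(i+1,j-1);               // drop the repeated characters
        n.insert(i+1,to_string(j));     // write the count after the character
        i+=dig(j);                      // skip the digits just inserted
    }
}
int main(){
    ios_base::sync_with_stdio(0), cin.tie(0);
    string n="kaaaabcddedddllllllllllllllllllllllp";
    stringEncoding(n);
    cout<<n;
}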
This will give the following output : k1a4b1c1d2e1d3l22p1

How do I convert a 4 digit number into individual digits?

I need to write logic to break down a 4 digit number into individual digits.
On a reply here at SO to a question regarding 3 digits, someone gave the math below:
int first = 321/100;
int second = (321/10)-first*10;
int third = (321/1)-first*100-second*10;
Can someone help me?
Thank you in advance!
Well, using the sample you found, we can quite easily infer a code for you.
The first line says int first = 321/100;, which returns 3 (integer division discards the remainder). 3 is indeed the first digit of 321, so that's a good thing. However, we have a 4 digit number, so let's try replacing 100 with 1000:
int first = 4321/1000;
This does return 4!
Let's try adapting the rest of your code (plus I put your four digit number in the variable entry).
int entry = 4321;
int first = entry/1000;
int second = entry/100 - first*10;
int third = entry/10 - first*100 - second*10;
int fourth = entry - first*1000 - second*100 - third*10;
second will be entry/100 (43) minus first*10 (40), so we're okay.
third is then 432 - 400 - 30, which gives 2. The same reasoning works for fourth: 4321 - 4000 - 300 - 20 = 1.
For numbers with more than four digits, you may want to use a for-loop and the modulo operator instead.
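For what it's worth, the loop-and-modulo idea could look like the following Python sketch. The function name `digits` is just illustrative, and it assumes a non-negative integer.

```python
def digits(n):
    """Return the decimal digits of a non-negative integer, most significant first."""
    if n == 0:
        return [0]
    result = []
    while n > 0:
        n, last = divmod(n, 10)   # peel off the last digit
        result.append(last)
    result.reverse()              # digits were collected least significant first
    return result


print(digits(4321))  # [4, 3, 2, 1]
```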
This snippet counts the number of digits in the value entered by the user,
then breaks the value down into its digits one by one:
PRINT "Enter value";
INPUT V#
X# = V#
DO
IF V# < 1 THEN
EXIT DO
END IF
D = D + 1
V# = INT(V#) / 10
LOOP
PRINT "Digits:"; D
FOR L = D - 1 TO 0 STEP -1
M = INT(X# / 10 ^ L)
PRINT M;
X# = X# - M * 10 ^ L
NEXT
END

Mapping unique combinations to numbers

I am trying to come up with a solution to a problem I thought of. I have the number of permutations of 26 characters with 6 possible spots as 26^6 = 308 915 776. I was trying to make a way so that I could map each number to a unique combination and be able to go back and forth from combination to number.
An example:
1 = aaaaaa
2 = aaaaab
27 = aaaaba
Is it possible to write a polynomial time algorithm that would convert between the two, and/or are there any efficient examples of what I am trying to do?
This is just base conversion my friend.
Since you didn't specify a language, the following is pseudo-code with array indexing and string indexing starting at 0 and assignment is :=.
If you let 'a' be 0 and 'z' be 25, then to convert from base 26 to base 10:
total:= 0
loop index from 0 to 5
    temp:= input[index] - 'a' // Left to right. Single base 26 digit to base 10
    total:= 26 * total + temp // Shift left and add the converted digit
    increment index and goto loop start
To go back to letters (base 26) is also easy:
result:= ''
loop index from 0 to 5
    temp:= 'a' + input mod 26 // Input modulus 26 is the base 26 digit to add next
    result:= temp + result // Prepend the new base 26 digit to the current result
    input:= input div 26 // Divide input by 26, throw away the remainder
    increment index and goto loop start
If you want all a's to be 1, then add one after converting from base 26 to base 10 and subtract 1 before converting from base 10 to base 26. Personally, I'd let all a's be 0.
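The same conversion can be written as a short Python sketch. The names `to_number` and `to_word` are chosen for illustration, the word is assumed to contain only lowercase 'a' to 'z', and "aaaaaa" maps to 0 as suggested above.

```python
def to_number(word):
    """Convert a lowercase word to its base 26 value ('a' = 0, 'z' = 25)."""
    total = 0
    for ch in word:
        total = 26 * total + (ord(ch) - ord("a"))
    return total


def to_word(number, length=6):
    """Convert a number back to a lowercase word of the given length."""
    letters = []
    for _ in range(length):
        number, digit = divmod(number, 26)
        letters.append(chr(ord("a") + digit))
    # Digits were produced least significant first, so reverse them.
    return "".join(reversed(letters))


print(to_number("aaaaba"))  # 26
print(to_word(26))          # aaaaba
```

To get the 1-based numbering from the question (1 = aaaaaa), use `to_number(word) + 1` and `to_word(number - 1)`.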
You could map it via pointers into a double:
char *example = "abcdef";
double d = 0;
char *p = (char *)&d;
for (int i=0; i<6; i++)
p[i] = example[i];
// d is your code
It's not so beautiful and not 100% allowed, but it works.

Resources