Computing the minimum password modifications required - string

I have spent over an hour on this question, and am struggling to come up with a valid strategy to solve it. I feel like everything I come up with overcomplicates the question.
The question is:
A password detection system detects a password as similar if the number of vowels is equal to the number of consonants in the password. Passwords consist of lowercase English characters only, and vowels are ('a', 'e', 'i', 'o', 'u').
To check the strength of a password and how easily it can be hacked, some modifications can be made to the password. In one operation, any character of the string can either be incremented or decremented. For example, 'f' can be incremented to 'g' or decremented to 'e'. Note that character 'a' cannot be decremented and 'z' cannot be incremented.
Find the minimum number of operations in which the password can be made similar.
Example
Consider the password = "hack". The "h" can be changed to "i" in one operation. The resultant string is "iack", which has 2 vowels ("i", "a") and 2 consonants ("c", "k"), and hence the string is similar. Return 1, the minimum number of operations required.
My strategy was to count the vowels and consonants in the given password string, subtract one count from the other, and use the sign of the result to decide whether to change vowels or consonants. However, upon implementation, I ran into other issues and got stuck. I am not even sure this approach would work in the first place.
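For what it's worth, the counting strategy described above can be made to work. Here is a minimal sketch in Python, not a reference solution. It relies on two observations: every vowel has a consonant right next to it, so turning a vowel into a consonant always costs 1, while turning a consonant into a vowel costs its distance to the nearest vowel. The function name min_ops_to_similar and the choice to return -1 for odd-length passwords are my own assumptions; the problem statement does not say what to do in that case.

```python
VOWELS = "aeiou"


def min_ops_to_similar(password: str) -> int:
    """Minimum increments/decrements so that vowels == consonants.

    Returns -1 for odd lengths (an assumption; the statement is silent on it).
    """
    n = len(password)
    if n % 2 == 1:
        return -1

    half = n // 2
    vowel_count = sum(ch in VOWELS for ch in password)

    # Too many vowels: each surplus vowel becomes a consonant in one step.
    if vowel_count >= half:
        return vowel_count - half

    # Too few vowels: convert the cheapest consonants, where the cost of a
    # consonant is its distance to the nearest vowel.
    costs = sorted(
        min(abs(ord(ch) - ord(v)) for v in VOWELS)
        for ch in password
        if ch not in VOWELS
    )
    return sum(costs[: half - vowel_count])


print(min_ops_to_similar("hack"))  # 1, matching the example in the question
```

Since each character is modified independently, picking the cheapest conversions is enough; no dynamic programming is needed under these assumptions.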

Related

Palindrome Validity Proof of Correctness

Leetcode Description
Given a string of length n, you have to decide whether the string is a palindrome or not, but you may delete at most one character.
For example: "aba" is a palindrome.
"abca" is also valid, since we may remove either "b" or "c" to get a palindrome.
I have seen many solutions that take the following approach.
With two pointers left and right initialized to the start and the end characters of the string, respectively, keep incrementing left and decrementing right synchronously as long as the two characters pointed to by left and right are equal.
The first time we run into a mismatch between the characters pointed by left and right, and say these are specifically indices i and j, we simply check whether string[i..j-1] or string[i+1..j] is a palindrome.
I clearly see why this works, but one thing that's bothering me is the direction of the approach that we take when we first see the mismatch.
Assuming we do not care about time efficiency and only focus on correctness, I cannot see what prevents us from deleting a character in string[0..i-1] or string[j+1..n-1] and checking whether the entire resulting string is a palindrome.
More specifically, if we take the first approach and see that neither string[i..j-1] nor string[i+1..j] is a palindrome, what prevents us from falling back to the second approach I described and checking whether deleting a character from string[0..i-1] or string[j+1..n-1] yields a palindrome instead?
Can we mathematically prove why this approach is useless or simply incorrect?
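Not a proof, but the two-pointer approach described above can be written down and checked against a brute force that tries every possible deletion, including deletions in string[0..i-1] and string[j+1..n-1]. The sketch below is in Python; the function names are my own, and exhaustive agreement on short strings is only empirical evidence, not a mathematical argument.

```python
def is_palindrome_range(s: str, i: int, j: int) -> bool:
    """Check whether s[i..j] (inclusive) is a palindrome."""
    while i < j:
        if s[i] != s[j]:
            return False
        i += 1
        j -= 1
    return True


def valid_palindrome(s: str) -> bool:
    """Two-pointer check allowing at most one deletion."""
    left, right = 0, len(s) - 1
    while left < right:
        if s[left] != s[right]:
            # First mismatch: only deleting s[right] or s[left] is tried.
            return (is_palindrome_range(s, left, right - 1)
                    or is_palindrome_range(s, left + 1, right))
        left += 1
        right -= 1
    return True


def valid_palindrome_brute(s: str) -> bool:
    """Try deleting nothing, then every single position in turn."""
    if s == s[::-1]:
        return True
    for k in range(len(s)):
        t = s[:k] + s[k + 1:]
        if t == t[::-1]:
            return True
    return False


print(valid_palindrome("aba"), valid_palindrome("abca"), valid_palindrome("abc"))
```

Comparing the two functions over all short strings on a small alphabet is a quick way to look for a counterexample before attempting a proof.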

Splitting a string into words with dynamic programming

In this problem we have to split a string into meaningful words. We're given a dictionary to check whether a word exists or not.
I've seen some other approaches here at How to split a string into words. Ex: "stringintowords" -> "String Into Words"?.
I thought of a different approach and was wondering if it would work or not.
Example- itlookslikeasentence
Algorithm
Each letter of the string corresponds to a node in a DAG.
Initialize a bool array to False.
At each node we have a choice- If the addition of the present letter to the previous subarray still produces a valid word then add it, if it does not then we will begin a new word from that letter and set bool[previous_node]=True indicating that a word ended there. In the above example bool[1] would be set to true.
This is something similar to the maximum subarray sum problem.
Would this algorithm work?
No, it wouldn't. Your solution takes the longest possible word at every step, which doesn't always work.
Here is a counterexample:
Let's assume that the given string is aturtle. Your algorithm will take a. Then it will take t, as at is a valid word. atu is not a word, so it'll split the input: at + urtle. However, there is no way to split urtle into a sequence of valid English words. The right answer would be a + turtle.
One of the possible correct solutions uses dynamic programming. We can define a function f such that f(i) = true iff it's possible to split the first i characters of the input into a valid sequence of words. Initially, f(0) = true and the rest of the values are false. For all valid l < r, there is a transition from f(l) to f(r) if s[l + 1, r] is a valid word; that is, f(r) becomes true as soon as some such l has f(l) = true.
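A minimal sketch of that dynamic programming idea in Python follows. The function name can_split and the tiny dictionary are made up for illustration. Note that Python slices are 0-based and half-open, so the 1-based substring s[l + 1, r] from the description is written s[l:r] here.

```python
def can_split(s: str, dictionary: set[str]) -> bool:
    """Return True iff s can be split into a sequence of dictionary words."""
    n = len(s)
    # f[i] is True iff the first i characters can be split into valid words.
    f = [False] * (n + 1)
    f[0] = True  # the empty prefix is trivially splittable

    for r in range(1, n + 1):
        for l in range(r):
            # Transition from f[l] to f[r] when s[l:r] is a valid word.
            if f[l] and s[l:r] in dictionary:
                f[r] = True
                break
    return f[n]


words = {"a", "at", "turtle", "night"}  # hypothetical dictionary
print(can_split("aturtle", words))  # True: a + turtle
print(can_split("atnight", words))  # True: at + night
print(can_split("urtle", words))    # False
```

With a hash set for the dictionary this does O(n^2) substring lookups, which is usually fine for sentence-sized inputs.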
P.S. Other types of greedy algorithms would not work here either. For instance, if you take the shortest word instead of the longest one, it fails on the input atnight: there is no way to split tnight after the a is stripped off, but at + night is clearly a valid answer.

Number of different valid passwords of length n with some constraints, at least one upper case letter

Consider the following password policy: a valid password is one where each character in the password is either a lower-case letter (a-z) or an upper case letter (A-Z), and there must be at least one upper case letter in the password. Given n ≥ 1, how many different valid passwords of length n are there?
I know for A-Z, there are 26 possibilities for each character,
and same goes for a-z.
So, if there must be at least 1 upper-case letter, will the answer be 26^1 + 52^2 + ... + 52^n?
I am not good at mathematics, and I tried Googling it but still could not solve this question. I am a beginner in a Computer Security module, so please assist me.
Thank you in advance.
This is a mathematics question not a programming/coding question
There are 52 possibilities for each character. For n characters, that means 52^n possible combinations without the constraint. Of these, 26^n are all lower case letters and thus violate your constraint. So your answer is 52^n - 26^n.
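If you want to convince yourself of the 52^n - 26^n formula, a brute-force count for small n is easy to write. This is just a sanity-check sketch in Python; the function names are made up, and the brute force is only practical for very small n.

```python
from itertools import product
import string


def count_valid_formula(n: int) -> int:
    """All letter strings of length n minus the all-lower-case ones."""
    return 52 ** n - 26 ** n


def count_valid_brute(n: int) -> int:
    """Enumerate every string of length n over a-z and A-Z and count those
    that contain at least one upper-case letter."""
    letters = string.ascii_letters  # 52 characters: a-z followed by A-Z
    return sum(
        1
        for candidate in product(letters, repeat=n)
        if any(ch.isupper() for ch in candidate)
    )


for n in (1, 2, 3):
    print(n, count_valid_formula(n), count_valid_brute(n))
```

For n = 1 both give 26, and for n = 2 both give 2028.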
But honestly, this question is of little practical value. 99.9% of people do not choose their passwords randomly amongst the set of possibilities. Instead, most will choose their password as having exactly one upper case letter (usually the first or the last) and the remaining lower case. So don't let the mathematics give you a false sense of confidence!

Security: longer keys versus more available characters

I apologize if this has been answered before, but I was not able to find anything. This question was inspired by a comment on another security-related question here on SO:
How to generate a random, long salt for use in hashing?
The specific comment is as follows (sixth comment of accepted answer):
...Second, and more importantly, this will only return hexadecimal
characters - i.e. 0-9 and A-F. It will never return a letter higher
than an F. You're reducing your output to just 16 possible characters
when there could be - and almost certainly are - many other valid
characters.
– AgentConundrum Oct 14 '12 at 17:19
This got me thinking. Say I had some arbitrary series of bytes, with each byte being randomly distributed over 2^(8). Let this key be A. Now suppose I transformed A into its hexadecimal string representation, key B (ex. 0xde 0xad 0xbe 0xef => "d e a d b e e f").
Some things are readily apparent:
len(B) = 2 len(A)
The symbols in B are limited to 2^(4) discrete values while the symbols in A range over 2^(8)
A and B represent the same 'quantities', just using different encoding.
My suspicion is that, in this example, the two keys will end up being equally secure (otherwise every password cracking tool would just convert one representation to another for quicker attacks). External to this contrived example, however, I suspect there is an important security moral to take away from this, especially when selecting a source of randomness.
So, in short, which is more desirable from a security stand point: longer keys or keys whose values cover more discrete symbols?
I am really interested in the theory behind this, so an extra bonus gold star (or at least my undying admiration) to anyone who can also provide the math / proof behind their conclusion.
If the number of different symbols usable in your password is x, and the length is y, then the number of different possible passwords (and therefore the strength against brute-force attacks) is x ** y. So you want to maximize x ** y. Adding to x and adding to y will both do that. Which one gives the greater total depends on the actual numbers involved and what your practical limits are.
But generally, increasing x gives only polynomial growth while adding to y gives exponential growth. So in the long run, length wins.
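To make the x ** y comparison concrete, here is a small Python sketch. The helper name keyspace and the sample numbers (26 lowercase letters, length 8) are only illustrative choices. Adding one character multiplies the count by x, while adding one symbol multiplies it by ((x + 1) / x) ** y.

```python
def keyspace(symbols: int, length: int) -> int:
    """Number of distinct strings of the given length over the alphabet."""
    return symbols ** length


base = keyspace(26, 8)    # 8 lowercase letters
longer = keyspace(26, 9)  # one more character
wider = keyspace(27, 8)   # one more symbol in the alphabet

print(longer // base)  # growth factor from the extra character: 26
print(wider / base)    # growth factor from the extra symbol: (27/26)**8
```

The second factor is roughly 1.35 here, far below 26, which is the sense in which length wins for typical alphabet sizes.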
Let's start with a binary string of length 8. The possible values are everything from 00000000 to 11111111. This gives us a keyspace of 2^8, or 256 possible keys. Now let's look at option A:
A: Adding one additional bit.
We now have a 9-bit string, so the possible values are between 000000000 and 111111111, which gives us a keyspace size of 2^9, or 512 keys. We also have option B, however.
B: Adding an additional value to the keyspace (NOT the keyspace size!):
Now let's pretend we have a trinary system, where the accepted numbers are 0, 1, and 2. Still assuming a string of length 8, we have 3^8, or 6561 keys...clearly much higher.
However! Trinary does not exist!
Let's look at your example. Please be aware I will be clarifying some of it, which you may have been confused about. Begin with a 4-BYTE (or 32-bit) bitstring:
11011110 10101101 10111110 11101111 (this is, btw, the bitstring equivalent to 0xDEADBEEF)
Since our possible values for each digit are 0 or 1, the base of our exponent is 2. Since there are 32 bits, we have 2^32 as the strength of this key. Now let's look at your second key, DEADBEEF. Each "digit" can be a value from 0-9, or A-F. This gives us 16 values. We have 8 "digits", so the keyspace is 16^8...which also equals 2^32! So those keys are equal in strength (also, because they are the same thing).
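The same point can be checked in a few lines of Python. This is only an illustrative sketch using the 0xDEADBEEF example from the question; the variable names are mine.

```python
# Key A: four raw bytes. Key B: the same bytes written as hex digits.
key_a = bytes([0xDE, 0xAD, 0xBE, 0xEF])
key_b = key_a.hex()  # "deadbeef"

# The hex string is twice as long but decodes back to the same bytes,
# so the encoding neither adds nor removes information.
print(len(key_a), len(key_b))
print(bytes.fromhex(key_b) == key_a)

# Keyspace sizes: 256 symbols over 4 positions vs 16 symbols over 8.
print(256 ** len(key_a) == 16 ** len(key_b) == 2 ** 32)
```

Both representations cover exactly 2^32 possibilities, which is why converting between them gives an attacker no advantage.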
But we're talking about REAL passwords, not just those silly little binary things. Consider an alphabetical password with only lowercase letters of length 8: we have 26 possible characters, and 8 of them, so the strength is 26^8, or 208.8 billion (takes about a minute to brute force). Adding one character to the length yields 26^9, or 5.4 trillion combinations: 20 minutes or so.
Let's go back to our 8-char string, but add a character to the alphabet: the space character. Now we have 27^8, which is 282 billion....FAR LESS than adding an additional character to the length!
The proper solution, of course, is to do both: for instance, 27^9 is 7.6 trillion combinations, or about half an hour of cracking. An 8-character password using upper case, lower case, numbers, special symbols, and the space character would take around 20 days to crack....still not nearly strong enough. Add another character, and it's 5 years.
As a reference, I usually make my passwords upwards of 16 characters, and they have at least one Cap, one space, one number, and one special character. Such a password at 16 characters would take several (hundred) trillion years to brute force.

Removing repeated characters in string without using recursion

You are given a string. Develop a function to remove duplicate characters from that string. The string could be of any length. Your algorithm must work in place. If you wish, you can use constant-size extra space that does not depend in any way on the string size. Your algorithm must have O(n) time complexity.
My idea was to define an integer array of size of 26 where 0th index would correspond to the letter a and the 25th index for the letter z and initialize all the elements to 0.
Thus we will traverse the entire string once and increment the value at the corresponding index whenever we encounter a letter.
Then we will traverse the string once again, and if the value at the corresponding index is 1 we print out the letter; otherwise we do not.
In this way the time complexity is O(n) and the space used is constant irrespective of the length of the string!!
If anyone can come up with ideas of better efficiency, it will be very helpful!!
Your solution definitely fits the criteria of O(n) time. Instead of an array, which would be very, very large if the allowed alphabet is large (Unicode has over a million code points), you could use a plain hash. Here is your algorithm in (unoptimized!) Ruby:
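def undup(s)
  # First pass: count how often each character occurs.
  seen = Hash.new(0)
  s.each_char { |c| seen[c] += 1 }
  # Second pass: keep only the characters that occur exactly once.
  result = ""
  s.each_char { |c| result << c if seen[c] == 1 }
  result
end

puts(undup "")
puts(undup "abc")
puts(undup "Olé")
puts(undup "asdasjhdfasjhdfasbfdasdfaghsfdahgsdfahgsdfhgt")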
It makes two passes through the string, and since a hash lookup takes constant time on average, the whole thing runs in O(n), so you're good.
You can say the Hashtable (like your array) uses constant space, albeit large, because it is bounded above by the size of the alphabet. Even if the size of the alphabet is larger than that of the string, it still counts as constant space.
There are many variations to this problem, many of which are fun. To do it truly in place, you can sort first; this gives O(n log n). There are variations on merge sort where you ignore dups during the merge. In fact, this "no external hashtable" restriction appears in Algorithm: efficient way to remove duplicate integers from an array (also tagged interview question).
Another common interview question starts with a simple string, then they say, okay now a million character string, okay now a string with 100 billion characters, and so on. Things get very interesting when you start considering Big Data.
Anyway, your idea is pretty good. It can generally be tweaked as follows: use a set, not a dictionary. Go through the string. For each character, if it is not in the set, add it. If it is, delete that character from the string. Sets take up less space, don't need counters, and can be implemented as bitsets if the alphabet is small, and this algorithm does not need two passes.
Python implementation: http://code.activestate.com/recipes/52560-remove-duplicates-from-a-sequence/
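For reference, the single-pass set idea could look roughly like this in Python. This is my own sketch, not the code from the linked recipe, and it reads "delete it" as dropping the current character. That means it keeps the first occurrence of every character, which is different from the two-pass counting version that drops every character occurring more than once.

```python
def remove_duplicates(s: str) -> str:
    """Keep only the first occurrence of each character, in one pass."""
    seen = set()
    kept = []
    for ch in s:
        if ch not in seen:
            seen.add(ch)     # first time we meet this character
            kept.append(ch)  # so it stays in the output
        # otherwise the character is a repeat and is skipped
    return "".join(kept)


print(remove_duplicates("banana"))  # ban
```

Python strings are immutable, so the sketch builds a new string instead of editing in place; the set itself is bounded by the alphabet size.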
You can also use a bitset instead of the additional array to keep track of found chars. Depending on which characters (a-z or more) are allowed you size the bitset accordingly. This requires less space than an integer array.
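A rough Python sketch of the bitset variant, assuming the input contains only the lowercase letters a-z (that restriction and the function name are my assumptions). A single integer serves as the bitset, with bit k set once the k-th letter has been seen.

```python
def remove_duplicates_bitset(s: str) -> str:
    """Keep the first occurrence of each letter; assumes s is a-z only."""
    seen = 0  # 26-bit bitset packed into one integer
    kept = []
    for ch in s:
        bit = 1 << (ord(ch) - ord("a"))
        if not seen & bit:
            seen |= bit      # mark this letter as seen
            kept.append(ch)
    return "".join(kept)


print(remove_duplicates_bitset("banana"))  # ban
```

For a larger alphabet the same idea works with a wider bitset, sized to the number of allowed characters.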

Resources