O(n) time and O(1) space way to check whether two strings are permutations of each other

Is there an algorithm that can check whether two strings are permutations of each other in O(n) time and O(1) space?

Yes, there is a very nice way: use counting, as in counting sort. There is no reason to generate prime numbers at all. Here is a C++ snippet that describes the algorithm:
#include <cstddef>
#include <string>
using std::string;

bool is_permutation(const string& s1, const string& s2) {
    if (s1.length() != s2.length()) return false;
    int count[256] = {0}; // assuming each character fits in one byte; the author's sample solution seems to have the same limit
    for (std::size_t i = 0; i < s1.length(); i++) { // count characters to see whether each occurs the same number of times in both strings
        // cast to unsigned char so that bytes >= 128 never produce a negative index
        count[(unsigned char)s1[i]]++;
        count[(unsigned char)s2[i]]--;
    }
    for (int i = 0; i < 256; i++) { // any nonzero counter means some character appeared with different frequencies
        if (count[i] != 0) return false;
    }
    return true;
}
EDIT: (I added this after some comments about the complexity of my program.)
Let's calculate the time complexity of the algorithm used in my program:
n = max length of the two strings
m = number of distinct allowed characters, assuming their codes are the consecutive values in the range [0, m-1]
Time complexity: O(max(n, m))
Memory complexity: O(m)
Now, treating m as a constant, this becomes
Time complexity: O(n)
Memory complexity: O(1)
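For comparison, here is a minimal Python sketch of the same counting idea. It assumes the inputs are byte strings. Every element is then an integer in 0..255, so a fixed 256-entry table is enough, which matches the one-byte assumption above. The function name is illustrative.

```python
def is_permutation(s1: bytes, s2: bytes) -> bool:
    """Counting check: O(n) time, fixed 256-entry table (O(1) extra space)."""
    if len(s1) != len(s2):
        return False
    count = [0] * 256  # one counter per possible byte value
    for x, y in zip(s1, s2):
        count[x] += 1  # iterating over bytes yields ints in 0..255
        count[y] -= 1
    # all counters are back to zero iff both inputs hold the same bytes equally often
    return all(c == 0 for c in count)


print(is_permutation(b"listen", b"silent"))  # True
print(is_permutation(b"aab", b"abb"))        # False
```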

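Here is a simple program I wrote in Java that gives the answer in O(n) time and O(1) space. It works by mapping every character to a prime number and then multiplying together the primes of all the characters in each string. If the two strings are permutations, they contain the same characters with the same number of occurrences, so by unique prime factorization their products are equal (and vice versa). One caveat: the product is kept in an int, which wraps around on overflow, so for longer strings two strings that are not permutations can end up with the same product. An exact version needs arbitrary-precision arithmetic, whose size grows with the length of the string.
Here is some sample code that accomplishes this: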
import java.util.HashMap;
import java.util.Map;

public class PrimePermutations { // wrapper class name is arbitrary
    // maps each character code (0-255) to a unique prime;
    // 256 covers ASCII/Latin-1, otherwise use the number of possible characters
    static Map<Integer, Integer> primes = generatePrimes(256);

    public static boolean permutations(String s1, String s2) {
        // both strings must be the same length
        if (s1.length() != s2.length())
            return false;
        // the corresponding primes for every char in both strings are multiplied together
        // (chars above 255 have no mapping; int products wrap around on overflow)
        int s1Product = 1;
        int s2Product = 1;
        for (char c : s1.toCharArray())
            s1Product *= primes.get((int) c);
        for (char c : s2.toCharArray())
            s2Product *= primes.get((int) c);
        return s1Product == s2Product;
    }

    // returns a map from k (0..n-1) to the k-th prime: 0 -> 2, 1 -> 3, 2 -> 5, ...
    private static Map<Integer, Integer> generatePrimes(int n) {
        Map<Integer, Integer> primes = new HashMap<Integer, Integer>();
        primes.put(0, 2);
        for (int i = 3; primes.size() < n; i++) {
            boolean divisible = false;
            // trial division by the primes found so far
            for (int v : primes.values()) {
                if (i % v == 0) {
                    divisible = true;
                    break;
                }
            }
            if (!divisible)
                primes.put(primes.size(), i);
        }
        return primes;
    }
}
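Here is a minimal Python sketch of the prime-product idea for cross-checking. The names are illustrative, and the inputs are assumed to be bytes. Python integers never overflow, so by unique prime factorization, equal products mean the two inputs hold the same characters with the same multiplicities. The trade-off is size: the product for an n-character string can need up to about n * log2(largest prime) bits, so this exact variant uses O(n) space rather than O(1).

```python
def first_primes(k):
    """Return the first k primes, using trial division by the primes found so far."""
    primes = []
    candidate = 2
    while len(primes) < k:
        if all(candidate % p != 0 for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


PRIMES = first_primes(256)  # one prime per byte value 0..255


def prime_product(s):
    """Multiply the primes assigned to each byte of s (exact: Python ints do not overflow)."""
    product = 1
    for b in s:
        product *= PRIMES[b]
    return product


def is_permutation_primes(s1, s2):
    # unique factorization: equal products <=> same bytes with the same multiplicities
    return len(s1) == len(s2) and prime_product(s1) == prime_product(s2)


print(is_permutation_primes(b"listen", b"silent"))  # True
```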

Related

Longest common prefix - comparing time complexity of two algorithms

Comparing these two solutions, the time complexity of the first is O(array-length * shortest-string-length), which you could shorten to O(n*m) or even O(n^2). The second seems to be O(n * log n), since it sorts and then compares only the first and last items; that comparison is O(n) and doesn't affect the overall bound.
But what about the cost of comparing the strings in the list? Sorting a list of integers is O(n * log n), but don't we need to compare the strings character by character in order to sort them? So am I wrong to say that the time complexity of the second solution is O(n * log n * longest-string-length)?
Also, since it doesn't check prefixes while sorting, it performs the full sort regardless (in the majority of cases), so isn't its best case far worse than the other option's? And if you take the point above into account, wouldn't its worst case still be worse than the first solution's?
public string longestCommonPrefix(List<string> input) {
    if (input.Count == 0) return "";
    if (input.Count == 1) return input[0];
    var sb = new System.Text.StringBuilder();
    // compare column by column against the first string
    for (var charIndex = 0; charIndex < input[0].Length; charIndex++)
    {
        for (var itemIndex = 1; itemIndex < input.Count; itemIndex++)
        {
            // stop when this string is too short to have a character at charIndex...
            if (input[itemIndex].Length <= charIndex)
                return sb.ToString();
            // ...or when its character differs
            if (input[0][charIndex] != input[itemIndex][charIndex])
                return sb.ToString();
        }
        sb.Append(input[0][charIndex]);
    }
    return sb.ToString();
}
static string longestCommonPrefix(String[] a)
{
    int size = a.Length;
    /* if size is 0, return empty string */
    if (size == 0)
        return "";
    if (size == 1)
        return a[0];
    /* sort the array of strings; ordinal (char-by-char) order keeps strings
       that share a prefix contiguous, which culture-aware sorting may not */
    Array.Sort(a, StringComparer.Ordinal);
    /* find the minimum length from first
       and last string */
    int end = Math.Min(a[0].Length, a[size - 1].Length);
    /* find the common prefix between the
       first and last string */
    int i = 0;
    while (i < end && a[0][i] == a[size - 1][i])
        i++;
    string pre = a[0].Substring(0, i);
    return pre;
}
First of all, unless I am missing something obvious, the first method runs in O(N * shortest-string-length); shortest, not longest.
Second, you may not reduce O(n*m) to O(n^2): the number of strings and their length are unrelated.
Finally, you are absolutely right: sorting indeed takes O(n*log(n)*m), so in no case would it improve performance.
As a side note, it may be beneficial to find the shortest string beforehand. This would make the per-item length check in the inner loop unnecessary.
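Here is a minimal Python sketch of that suggestion, assuming a plain list of strings (the function name is illustrative). The column-by-column scan runs only up to the length of the shortest string. That removes the per-item length check and keeps the cost at O(N * shortest-string-length).

```python
def longest_common_prefix(strings):
    """Vertical scan bounded by the shortest string."""
    if not strings:
        return ""
    shortest = min(strings, key=len)  # every other string is at least this long
    for i, ch in enumerate(shortest):
        for s in strings:
            if s[i] != ch:  # safe without a length check: len(s) >= len(shortest) > i
                return shortest[:i]
    return shortest


print(longest_common_prefix(["flower", "flow", "flight"]))  # fl
```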

Maximum repeating substring of size n

Find the substring of length n that repeats a maximum number of times in a given string.
Input: abbbabbbb# 2
Output: bb
My solution:
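// requires: import java.util.Arrays; assumes 1 <= m <= s.length()
public static String mrs(String s, int m) {
    int n = s.length();
    // collect all n-m+1 substrings of length m
    String[] suffixes = new String[n - m + 1];
    for (int i = 0; i < n - m + 1; i++) {
        suffixes[i] = s.substring(i, i + m);
    }
    // sorting places equal substrings next to each other
    Arrays.sort(suffixes);
    String ans = suffixes[0];
    int cnt = 1, max = 1;
    for (int i = 1; i < suffixes.length; i++) {
        if (suffixes[i].equals(suffixes[i - 1])) {
            cnt++;      // extend the current run of equal substrings
        } else {
            cnt = 1;    // a new run starts at suffixes[i]
        }
        if (cnt > max) { // checked on every step, so the final run is not missed
            max = cnt;
            ans = suffixes[i];
        }
    }
    return ans;
}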
Can it be done better than the above O(nm) time and O(n) space solution?
For a string of length L and a given length k (not to be confused with n and m, which the question uses interchangeably at times), we can compute polynomial hashes of all substrings of length k in O(L) total time (see Wikipedia for some elaboration on this subproblem).
Now, if we map each hash value to the number of times it occurs, we can find the most frequent value in expected O(L) time with a HashMap, or in O(L log L) with a TreeMap.
After that, just take the substring which got the most frequent hash as the answer.
This solution does not take hash collisions into account.
The idea is to just reduce the probability of collisions enough for the application (if it's too high, use multiple hashes, for example).
If the application demands that we absolutely never give a wrong answer, we can check the answer in O(L) with another algorithm (KMP, for example), and re-run the whole solution with a different hash function as long as the answer turns out to be wrong.
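Here is a minimal Python sketch of the rolling-hash approach. The modulus 2^61 - 1 and the base 131 are assumed choices, not taken from the answer. As in the answer, collisions are not handled: the function returns the first substring whose hash was the most frequent.

```python
from collections import defaultdict

MOD = (1 << 61) - 1  # large prime modulus (an assumed choice)
BASE = 131           # polynomial base (an assumed choice)


def most_frequent_substring(s: str, k: int) -> str:
    """Return a length-k substring of s that occurs most often, via rolling hashes."""
    if not 0 < k <= len(s):
        raise ValueError("need 0 < k <= len(s)")
    top = pow(BASE, k - 1, MOD)  # weight of the character that leaves the window
    counts = defaultdict(int)    # hash -> number of windows with that hash
    first_start = {}             # hash -> start index of its first window
    h = 0
    for i, ch in enumerate(s):
        if i >= k:
            h = (h - ord(s[i - k]) * top) % MOD  # drop the outgoing character
        h = (h * BASE + ord(ch)) % MOD           # append the incoming character
        if i >= k - 1:                           # window s[i-k+1 .. i] is complete
            counts[h] += 1
            first_start.setdefault(h, i - k + 1)
    best = max(counts, key=counts.get)  # most frequent hash
    start = first_start[best]
    return s[start:start + k]


print(most_frequent_substring("abbbabbbb#", 2))  # bb
```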

Dynamic character generator; Generate all possible strings from a character set

I want to make a dynamic string generator that will generate all possible unique strings from a character set with a dynamic length.
I can do this very easily using nested for loops, but then the length is fixed rather than dynamic.
// Prints all possible strings with the length of 3
for a in allowedCharacters {
    for b in allowedCharacters {
        for c in allowedCharacters {
            println(a+b+c)
        }
    }
}
But when I try to make the length dynamic, so that I can just call generate(length: 5), I get confused.
I found this Stack Overflow question, but the accepted answer generates strings of every length from 1 to maxLength, and I want every string to be exactly maxLength characters long.
As noted above, use recursion. Here is how it can be done with C#:
// requires: using System.Collections.Generic; using System.Linq;
static IEnumerable<string> Generate(int length, char[] allowed_chars)
{
    if (length == 1)
    {
        foreach (char c in allowed_chars)
            yield return c.ToString();
    }
    else
    {
        // all strings of length - 1; each allowed character is prefixed to every one of them
        var sub_strings = Generate(length - 1, allowed_chars);
        foreach (char c in allowed_chars)
        {
            foreach (string sub in sub_strings)
            {
                yield return c + sub;
            }
        }
    }
}
private static void Main(string[] args)
{
    string chars = "abc";
    List<string> result = Generate(3, chars.ToCharArray()).ToList();
}
Please note that both the run time of this algorithm and the amount of data it returns grow exponentially with the length (k^n strings for k allowed characters and length n). For large lengths, expect the code to take a long time and to return a huge amount of data.
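Here is a minimal Python sketch of the same recursion (the name generate is illustrative). It is written as a generator, so strings are produced lazily. Unlike the C# version, it uses length 0 as the base case, which yields the single empty string, so generate(0, ...) is also well defined.

```python
from typing import Iterator


def generate(length: int, allowed: str) -> Iterator[str]:
    """Yield all len(allowed) ** length strings of exactly `length` characters."""
    if length == 0:
        yield ""  # exactly one string of length 0
        return
    for c in allowed:
        for rest in generate(length - 1, allowed):
            yield c + rest


print(list(generate(2, "ab")))  # ['aa', 'ab', 'ba', 'bb']
```

Python's standard library also provides itertools.product(allowed, repeat=length). It yields the same combinations, as tuples, in the same order.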
Translation of @YacoubMassad's C# code to Swift:
func generate(length: Int, allowedChars: [String]) -> [String] {
    if length == 1 {
        return allowedChars
    } else {
        // build every string of length - 1, then prefix each allowed character
        let subStrings = generate(length: length - 1, allowedChars: allowedChars)
        var arr = [String]()
        for c in allowedChars {
            for sub in subStrings {
                arr.append(c + sub)
            }
        }
        return arr
    }
}
print(generate(length: 3, allowedChars: ["a", "b", "c"]))
Prints:
aaa, aab, aac, aba, abb, abc, aca, acb, acc, baa, bab, bac, bba, bbb, bbc, bca, bcb, bcc, caa, cab, cac, cba, cbb, cbc, cca, ccb, ccc
While you can (obviously enough) use recursion to solve this problem, it's quite an inefficient way to do the job.
What you're really doing is just counting. In your example, with "a", "b" and "c" as the allowed characters, you're counting in base 3, and since you're allowing three-character strings, they're three-digit numbers.
An N-digit number in base M can represent M^N different values, from 0 through M^N - 1. So, for your case, that's limit = pow(3, 3) - 1. To generate all those values, you just count from 0 through the limit and convert each number to base M, using the specified characters as the "digits". For example, in C++ the code can look like this:
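#include <cstddef>
#include <iostream>
#include <string>

int main() {
    std::string letters = "abc";
    std::size_t base = letters.length();
    std::size_t digits = 3;
    // limit = base^digits, computed with integers instead of floating-point pow
    std::size_t limit = 1;
    for (std::size_t d = 0; d < digits; d++)
        limit *= base;
    for (std::size_t i = 0; i < limit; i++) {
        std::size_t in = i;
        // print the base-`base` digits of i, least significant first
        for (std::size_t j = 0; j < digits; j++) {
            std::cout << letters[in % base];
            in /= base;
        }
        std::cout << "\t";
    }
}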
One minor note: as I've written it here, this produces the output in basically a little-endian format. That is, the "digit" that varies the fastest is on the left, and the one that changes the slowest is on the right.
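The counting idea translates directly to Python. This sketch (the function name is illustrative) reverses each digit list, so the most significant "digit" is on the left. That gives the same order as the recursive versions: aaa, aab, aac, and so on.

```python
def generate_by_counting(length: int, letters: str) -> list:
    """Enumerate all strings by counting from 0 to base**length - 1 in base len(letters)."""
    base = len(letters)
    result = []
    for i in range(base ** length):
        digits = []
        n = i
        for _ in range(length):
            digits.append(letters[n % base])  # least significant digit first
            n //= base
        result.append("".join(reversed(digits)))  # most significant digit on the left
    return result


print(generate_by_counting(2, "ab"))  # ['aa', 'ab', 'ba', 'bb']
```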

dynamic programming for minimum cost of breaking the string

A certain string-processing language offers a primitive operation which splits a string into two pieces. Since this operation involves copying the original string, it takes n units of time for a string of length n, regardless of the location of the cut. Suppose, now, that you want to break a string into many pieces. The order in which the breaks are made can affect the total running time. For example, if you want to cut a 20-character string at positions 3 and 10, then making the first cut at position 3 incurs a total cost of 20+17=37, while doing position 10 first has a better cost of 20+10=30.
Give a dynamic programming algorithm that, given the locations of m cuts in a string of length n, finds the minimum cost of breaking the string into m + 1 pieces.
This problem is exercise 6.9 in chapter 6 of "Algorithms".
Since no solution is given for this problem, here is what I came up with.
Define OPT(i,j,n) as the minimum cost of breaking the string, where i is the start index, j is the end index of the string, and n is the number of cuts remaining.
Here is what I get:
OPT(i,j,n) = min {OPT(i,k,w) + OPT(k+1,j,n-w) + j-i} for i<=k<j and 0<=w<=n
Is it right or not? Please help, thx!
I think your recurrence relation can be improved. Here's what I came up with: define cost(i,j) as the cost of making all the cuts that lie strictly between positions i and j. Then,
cost(i,j) = min {length of substring + cost(i,k) + cost(k,j)}, taken over the cut positions k with i < k < j, and cost(i,j) = 0 if no cut lies strictly between i and j.
// requires: import java.util.ArrayList; import java.util.Arrays; import java.util.Scanner;
// reads l (string length) and p (number of cuts), then the p cut positions;
// a cut at position c splits s[i..j] into s[i..c] and s[c+1..j] (1-based)
void s_cut()
{
    Scanner s = new Scanner(System.in);
    int l = s.nextInt();
    int p = s.nextInt();
    int[] al = new int[p];
    for (int i = 0; i < p; i++)
        al[i] = s.nextInt();
    Arrays.sort(al);

    // table[i][j] = minimum cost of making all cuts that fall inside s[i..j]
    int[][] table = new int[l + 1][l + 1];
    // values[i][j] = cuts for s[i..j] in the order they are made
    ArrayList<Integer>[][] values = new ArrayList[l + 1][l + 1];
    for (int i = 0; i <= l; i++)
        for (int j = 0; j <= l; j++)
            values[i][j] = new ArrayList<Integer>();

    for (int len = 2; len <= l; len++)
    {
        for (int i = 1, j = i + len - 1; j <= l; i++, j++)
        {
            // min_index/max_index are not shown in the original answer; judging by their use,
            // high must be the index of the last cut <= j-1 and low the index of the first cut >= i (-1 if none)
            int high = min_index(al, j - 1);
            int low = max_index(al, i);
            if (low <= high && low != -1 && high != -1)
            {
                int cost = Integer.MAX_VALUE;
                int min = low;
                for (int k = low; k <= high; k++)
                {
                    // try al[k] as the first cut: pieces s[i..al[k]] and s[al[k]+1..j]
                    int c = table[i][al[k]] + table[al[k] + 1][j];
                    if (c < cost)
                    {
                        cost = c;
                        min = k;
                    }
                }
                table[i][j] = len + cost; // the first cut costs the full length of s[i..j]
                values[i][j].add(al[min]);
                values[i][j].addAll(values[al[min] + 1][j]);
                values[i][j].addAll(values[i][al[min]]);
            }
            // otherwise no cut falls inside s[i..j] and table[i][j] stays 0
        }
    }
    System.out.println("The minimum cost is " + table[1][l]);
    for (int e : values[1][l])
        System.out.print(e + "-->");
}
The above solution has the complexity of O(n^3).
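For comparison, here is a minimal Python sketch of the cost(i,j) recurrence. It is indexed by cut positions instead of character positions. The boundaries are 0, the sorted cuts, and n, and cost(i, j) is the cheapest way to make every cut strictly between boundary i and boundary j. With m cuts this takes O(m^3) time and O(m^2) space. The function name is illustrative.

```python
from functools import lru_cache


def min_cut_cost(n: int, cuts: list) -> int:
    """Minimum total cost of making all cuts, where each split costs the length of the piece."""
    b = [0] + sorted(cuts) + [n]  # piece boundaries

    @lru_cache(maxsize=None)
    def cost(i: int, j: int) -> int:
        # cheapest way to make every cut strictly between b[i] and b[j]
        if j - i < 2:
            return 0  # no cut inside this piece
        length = b[j] - b[i]
        return length + min(cost(i, k) + cost(k, j) for k in range(i + 1, j))

    return cost(0, len(b) - 1)


print(min_cut_cost(20, [3, 10]))  # 30
```

On the example from the question (a 20-character string cut at positions 3 and 10), it chooses to cut at 10 first, for a total of 20 + 10 = 30.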

Is there a circular hash function?

Thinking about this question on testing string rotation, I wondered: is there such a thing as a circular/cyclic hash function? E.g.
h(abcdef) = h(bcdefa) = h(cdefab) etc
Uses for this include scalable algorithms that can check n strings against each other to see which are rotations of others.
I suppose the essence of the hash is to extract information which is order-specific but not position-specific. Maybe something that finds a deterministic 'first position', rotates to it and hashes the result?
It all seems plausible, but slightly beyond my grasp at the moment; it must be out there already...
I'd go along with your deterministic "first position" - find the "least" character; if it appears twice, use the next character as the tie breaker (etc). You can then rotate to a "canonical" position, and hash that in a normal way. If the tie breakers run for the entire course of the string, then you've got a string which is a rotation of itself (if you see what I mean) and it doesn't matter which you pick to be "first".
So:
"abcdef" => hash("abcdef")
"defabc" => hash("abcdef")
"abaac" => hash("aacab") (tie-break between aa, ac and ab)
"cabcab" => hash("abcabc") (it doesn't matter which "a" comes first!)
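Here is a minimal Python sketch of this canonicalize-then-hash idea. It finds the lexicographically smallest rotation by brute force over the windows of s+s, which is O(n^2), and then applies Python's built-in hash. The built-in hash is an assumption here; any ordinary string hash would do.

```python
def canonical_rotation(s: str) -> str:
    """Lexicographically smallest rotation of s (naive O(n^2) version)."""
    if not s:
        return s
    doubled = s + s  # every rotation is a length-len(s) window of s+s
    return min(doubled[i:i + len(s)] for i in range(len(s)))


def rotation_hash(s: str) -> int:
    # all rotations of s share one canonical form, hence one hash
    return hash(canonical_rotation(s))


print(canonical_rotation("abaac"))  # aacab
```

If the quadratic scan is too slow, Booth's algorithm finds the least rotation in O(n).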
Update: As Jon pointed out, the first approach doesn't handle strings with repetition very well. Problems arise when duplicate pairs of letters are encountered, because XORing a value with itself gives 0. Here is a modification that I believe fixes the original algorithm. It uses Euclid-Fermat sequences to generate pairwise coprime integers for each additional occurrence of a character in the string. The result is that the XOR for duplicate pairs is non-zero.
I've also cleaned up the algorithm slightly. Note that the array containing the EF sequences only supports characters in the range 0x00 to 0xFF. This was just a cheap way to demonstrate the algorithm. Also, the algorithm still has runtime O(n) where n is the length of the string.
static int Hash(string s)
{
    int H = 0;
    if (s.Length > 0)
    {
        //any arbitrary coprime numbers
        int a = s.Length, b = s.Length + 1;
        //an array of Euclid-Fermat sequences to generate additional coprimes for each duplicate character occurrence
        //(0x100 entries, so every character from 0x00 to 0xFF has a slot)
        int[] c = new int[0x100];
        for (int i = 1; i < c.Length; i++)
        {
            c[i] = i + 1;
        }
        Func<char, int> NextCoprime = (x) => c[x] = (c[x] - x) * c[x] + x;
        Func<char, char, int> NextPair = (x, y) => a * NextCoprime(x) * x.GetHashCode() + b * y.GetHashCode();
        //for i=0 we need to wrap around to the last character
        H = NextPair(s[s.Length - 1], s[0]);
        //for i=1...n we use the previous character
        for (int i = 1; i < s.Length; i++)
        {
            H ^= NextPair(s[i - 1], s[i]);
        }
    }
    return H;
}
static void Main(string[] args)
{
    Console.WriteLine("{0:X8}", Hash("abcdef"));
    Console.WriteLine("{0:X8}", Hash("bcdefa"));
    Console.WriteLine("{0:X8}", Hash("cdefab"));
    Console.WriteLine("{0:X8}", Hash("cdfeab"));
    Console.WriteLine("{0:X8}", Hash("a0a0"));
    Console.WriteLine("{0:X8}", Hash("1010"));
    Console.WriteLine("{0:X8}", Hash("0abc0def0ghi"));
    Console.WriteLine("{0:X8}", Hash("0def0abc0ghi"));
}
The output is now:
7F7D7F7F
7F7D7F7F
7F7D7F7F
7F417F4F
C796C7F0
E090E0F0
A909BB71
A959BB71
First Version (which isn't complete): Use XOR which is commutative (order doesn't matter) and another little trick involving coprimes to combine ordered hashes of pairs of letters in the string. Here is an example in C#:
static int Hash(char[] s)
{
    //any arbitrary coprime numbers
    const int a = 7, b = 13;
    int H = 0;
    if (s.Length > 0)
    {
        //for i=0 we need to wrap around to the last character
        H ^= (a * s[s.Length - 1].GetHashCode()) + (b * s[0].GetHashCode());
        //for i=1...n we use the previous character
        for (int i = 1; i < s.Length; i++)
        {
            H ^= (a * s[i - 1].GetHashCode()) + (b * s[i].GetHashCode());
        }
    }
    return H;
}
static void Main(string[] args)
{
    Console.WriteLine(Hash("abcdef".ToCharArray()));
    Console.WriteLine(Hash("bcdefa".ToCharArray()));
    Console.WriteLine(Hash("cdefab".ToCharArray()));
    Console.WriteLine(Hash("cdfeab".ToCharArray()));
}
The output is:
4587590
4587590
4587590
7077996
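The weakness mentioned in the update can be reproduced with a small Python port of this first version. The sketch uses ord() in place of C#'s char.GetHashCode(), so its numbers differ from the output above, but the structure is the same. In a string such as "abab", every neighbour pair occurs an even number of times, so the XOR collapses to 0, and such strings all collide.

```python
def pair_xor_hash(s: str, a: int = 7, b: int = 13) -> int:
    """First-version pair hash: XOR of a*prev + b*cur over all cyclic neighbour pairs."""
    h = 0
    for i in range(len(s)):
        prev, cur = s[i - 1], s[i]  # for i = 0, s[-1] wraps around to the last character
        h ^= a * ord(prev) + b * ord(cur)
    return h


print(pair_xor_hash("abcdef") == pair_xor_hash("cdefab"))  # True
print(pair_xor_hash("abab"), pair_xor_hash("cdcd"))        # 0 0
```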
You could find a deterministic first position by always starting at the position with the "lowest" (in terms of alphabetical ordering) substring. So in your case, you'd always start at "a". If there were multiple "a"s, you'd have to take two characters into account etc.
I am sure you could find a function that generates the same hash regardless of character position in the input. However, how will you ensure that h(abc) != h(efg) for every conceivable input? (Collisions will occur for all hash algorithms, so I mean: how do you minimize this risk?)
You'd need some additional checks even after generating the hash to ensure that the strings contain the same characters.
Here's an implementation using LINQ:
public string ToCanonicalOrder(string input)
{
    char first = input.OrderBy(x => x).First();
    string doubledForRotation = input + input;
    string canonicalOrder
        = (-1)
        .GenerateFrom(x => doubledForRotation.IndexOf(first, x + 1))
        .Skip(1) // the -1
        .TakeWhile(x => x < input.Length)
        .Select(x => doubledForRotation.Substring(x, input.Length))
        .OrderBy(x => x)
        .First();
    return canonicalOrder;
}
This assumes a generic generator extension method:
public static class TExtensions
{
    public static IEnumerable<T> GenerateFrom<T>(this T initial, Func<T, T> next)
    {
        var current = initial;
        while (true)
        {
            yield return current;
            current = next(current);
        }
    }
}
sample usage:
var sequences = new[]
{
"abcdef", "bcdefa", "cdefab",
"defabc", "efabcd", "fabcde",
"abaac", "cabcab"
};
foreach (string sequence in sequences)
{
Console.WriteLine(ToCanonicalOrder(sequence));
}
output:
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
aacab
abcabc
then call .GetHashCode() on the result if necessary.
sample usage if ToCanonicalOrder() is converted to an extension method:
sequence.ToCanonicalOrder().GetHashCode();
One possibility is to combine the hashes of all circular shifts of your input into one meta-hash that does not depend on the order in which they are combined.
More formally, consider
for(int i=0; i<string.length; i++) {
result^=string.rotatedBy(i).hashCode();
}
Where you could replace the ^= with any other commutative operation.
For example, consider the input
"abcd"
to get the hash we take
hash("abcd") ^ hash("dabc") ^ hash("cdab") ^ hash("bcda").
As we can see, hashing any of these rotations only changes the order in which the terms are XORed, which doesn't change the result.
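Here is a minimal Python sketch of this meta-hash. The answer says ^= can be replaced by any commutative operation, and this sketch uses addition modulo 2^64 instead of XOR. The reason is that with XOR, a string whose rotations repeat an even number of times always hashes to 0; "abab" is an example, since its rotations are abab, baba, abab, baba. Building all n rotations makes this O(n^2). Python's built-in hash stands in for the per-rotation hash.

```python
MASK = (1 << 64) - 1


def rotation_sum_hash(s: str) -> int:
    """Combine the hashes of all rotations of s with addition mod 2**64 (commutative)."""
    total = 0
    for i in range(len(s)):
        total = (total + hash(s[i:] + s[:i])) & MASK  # order of the rotations does not matter
    return total


print(rotation_sum_hash("abcd") == rotation_sum_hash("cdab"))  # True
```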
I did something like this for a project in college. There were 2 approaches I used to try to optimize a Travelling-Salesman problem. I think if the elements are NOT guaranteed to be unique, the second solution would take a bit more checking, but the first one should work.
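If you represent the string as a matrix of associations (which letter is followed by which, wrapping around from the last letter to the first), abcdef would look like
  a b c d e f
a   x
b     x
c       x
d         x
e           x
f x
But so would every rotation of abcdef, since rotating doesn't change which letter follows which. It would be trivial to compare those matrices.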
Another, quicker trick would be to rotate the string so that the "first" letter is first. Then rotations of the same string all start at the same point and become identical.
Here is some Ruby code:
def normalize_string(string)
  return string if string.empty?
  # try every rotation and keep the lexicographically smallest one;
  # this also breaks ties when the minimum character occurs more than once
  (0...string.length).map { |i| string[i..-1] + string[0...i] }.min
end
p normalize_string('abcdef').eql?(normalize_string('defabc')) # prints true
Maybe use a rolling hash for each offset (Rabin-Karp-like) and return the minimum hash value? There could be collisions, though.
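Here is a minimal Python sketch of this suggestion. A polynomial rolling hash slides over s+s, so all n rotation hashes are computed in O(n) total, and the minimum is returned. The base and modulus are assumed choices, and as noted, different strings can still collide.

```python
MOD = (1 << 61) - 1  # large prime modulus (an assumed choice)
BASE = 257           # polynomial base (an assumed choice)


def min_rotation_hash(s: str) -> int:
    """Polynomial hash of every rotation via a rolling window over s+s; return the minimum."""
    n = len(s)
    if n == 0:
        return 0
    doubled = s + s
    top = pow(BASE, n - 1, MOD)  # weight of the leading character in a window
    h = 0
    for ch in doubled[:n]:
        h = (h * BASE + ord(ch)) % MOD
    best = h  # hash of the rotation starting at 0
    for i in range(n, 2 * n - 1):
        # slide one step: drop doubled[i - n], append doubled[i] (rotation starting at i - n + 1)
        h = ((h - ord(doubled[i - n]) * top) * BASE + ord(doubled[i])) % MOD
        best = min(best, h)
    return best


print(min_rotation_hash("abcdef") == min_rotation_hash("defabc"))  # True
```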
