Valid binary tree from array of string - string

I am having an array of strings.Each character in a string can be r or l only.
I have to check if it is valid or not as
1. {rlr,l,r,lr, rl}
*
/ \
l r
\ /
r l
\
r
A valid tree as all nodes are present.
2. {ll, r, rl, rr}
*
/ \
- r
/ /\
l l r
Invalid tree as there is no l node.
From a give input I have to determine if it is creating a valid tree or not.
I have come up with two solutions.
1.Using trie to store input and marking each node as valid or not while insertion.
2.Sort the input array according to the length.
So for the first case it will be { l, r, lr, rl, rlr}
And I will create a set of strings to put all input.
If a string is having length more then 1(for rlr :: r, rl) I will consider all its prefix from index 0 and check in set.if any of the prefix in not present in set then I will return false.
I am wondering if there is a more optimal solution or any modification in the above methods.

A recursive approach with test cases,
public static void main(String[] args) {
System.out.println(new Main().isValid(new String[]{"LRL", "LRR", "LL", "LR"}));
System.out.println(new Main().isValid(new String[]{"LRL", "LRR", "LL", "LR", "L"}));
System.out.println(new Main().isValid(new String[]{"LR", "L"}));
System.out.println(new Main().isValid(new String[]{"L", "R", "LL", "LR"}));
}
public boolean isValid(String[] strs) {
Set<String> set = new HashSet<>();
int maxLength = 0;
for (String str : strs) {
set.add(str);
maxLength = Math.max(str.length(), maxLength);
}
helper(set, "L", 1, maxLength);
helper(set, "R", 1, maxLength);
return set.isEmpty();
}
private void helper(Set<String> set, String current, int len, int maxLength) {
if (!set.contains(current) || current.length() > maxLength) {
return;
}
if (set.contains(current))
set.remove(current);
helper(set, current + "L", len + 1, maxLength);
helper(set, current + "R", len + 1, maxLength);
}

Another possible solution is actually building the tree (or trie) and maintain a set of nodes that are incomplete yet.
If you finish iterating over the list and you still have incomplete nodes then the tree isn't valid.
If the set is empty then the tree is valid.
For example, in the second tree you gave, for node ll you will create also node l but you will add it to the incomplete set. If one of the later nodes is l then you will erase it from the set. If not, you will end the iteration with a non empty set that contains you missing nodes.

Related

Shortest string containing the most occurrence of strings in a set

Define the degree of a string M be the number of times it appears in another string S. For example M = "aba" and S="ababa", the degree of M is 2. Given a set of strings and an integer N, find the string of the minimum length so that the sum of degrees of all strings in the set is at least N.
For example a set {"ab", "bd", "abd" "babd", "abc"}, N = 4, the answer will be "babd". It contains "ab", "abd", "babd" and "bd" one time.
N <= 100, M <= 100, length of every string in the set <= 100. Strings in the set only consist of uppercase and lowercase letters.
How to solve this problem? This looks similar to the shortest superstring problems which has a dynamic programming solution that has exponential complexity. However, the constraint in this problem is much larger and the same idea also won't work here. Is there some string data structure that can be applied here?
I have a polynomial time algorithm, which I'm too lazy to code. But I'll describe it for you.
First, make each string in the set plus the empty string be the nodes of a graph. The empty string is connected to each other string, and vice versa. If the end of one string overlaps with the start of another, they also connect. If two can overlap by different amounts, they get multiple edges. (So it is not exactly a graph...)
Each edge gets a cost and a value. The cost is how many characters you have to extend the string you are building by to move from the old end to the new end. (In other words the length of the second string minus the length of the overlap.) to having this one. The value is how many new strings you completed that cross the barrier between the former and the latter string.
Your example was {"ab", "bd", "abd" "babd", "abc"}. Here are the (cost, value) pairs for each transition.
from -> to : (value, cost)
"" -> "ab": ( 1, 2)
"" -> "bd": ( 1, 2)
"" -> "abd": ( 3, 3) # we added "ab", "bd" and "abd"
"" -> "babd": ( 4, 4) # we get "ab", "bd", "abd" and "babd"
"" -> "abc": ( 2, 3) # we get "ab" and "abc"
"ab" -> "": ( 0, 0)
"ab" -> "bd": ( 2, 1) # we added "abd" and "bd" for 1 character
"ab" -> "abd": ( 2, 1) # ditto
"ab" -> "abc": ( 1, 1) # we only added "abc"
"bd" -> "": ( 0, 0) # only empty, nothing else starts "bd"
"abd" -> "": ( 0, 0)
"babd" -> "": ( 0, 0)
"babd" -> "abd": ( 0, 0) # overlapped, but added nothing.
"abc" -> "": ( 0, 0)
OK, all of that is setup. Why did we want this graph?
Well note that if we start at "" with a cost of 0 and a value of 0, then take a path through the graph, that constructs a string. It correctly states the cost, and provides a lower bound on the value. The value can be higher. For example if your set were {"ab", "bc", "cd", "abcd"} then the path "" -> "ab" -> "bc" -> "cd" would lead to the string "abcd" with a cost of 4 and a predicted value of 3. But that value estimate missed the fact that we matched "abcd".
However for any given string made up only of substrings from the set, there is a path through the graph that has the correct cost and the correct value. (At each choice you want to pick the earliest starting matching string that you have not yet counted, and of those pick the longest of them. Then you never miss any matches.)
So we've turned our problem from constructing strings to constructing paths through a graph. What we want to do is build up the following data structure:
for each (value, node) combination:
(best cost, previous node, previous value)
Filling in that data structure is a dynamic programming problem. Once filled in we can just trace back through it to find what path in the graph got us to that value with that cost. Given that path, we can figure out the string that did it.
How fast is it? If our set has K strings then we only need to fill in K * N values, each of which we can give a maximum of K candidates for new values. Which makes the path finding a O(K^2 * N) problem.
So here is my approach. At first iteration we construct a pool out of the initial strings.
After that:
We select out of the pool a string having minimal length and sum of degrees=N. If we found such a string we just return it.
We filter out of the pool all strings with degree less than maximal. We work only with the best possible string combinations.
We construct all variants out of the current pool and the initial strings. Here we need to take into consideration that strings can overlap. Say a string "aba" and "ab"(from initial strings) could produce: ababa, abab, abaab (we do not include "aba" because we already had it in our pool and we need to move further).
We filter out duplicates and this is our next pool.
Repeat everything from the point 1.
The FindTarget() method accepts the target sum as a parameter. FindTarget(4) will solve the sample task.
public class Solution
{
/// <summary>
/// The initial strings.
/// </summary>
string[] stringsSet;
Tuple<string, string>[][] splits;
public Solution(string[] strings)
{
stringsSet = strings;
splits = stringsSet.Select(s => ProduceItemSplits(s)).ToArray();
}
/// <summary>
/// Find the optimal string.
/// </summary>
/// <param name="N">Target degree.</param>
/// <returns></returns>
public string FindTarget(int N)
{
var pool = stringsSet;
while (true)
{
var poolWithDegree = pool.Select(s => new { str = s, degree = GetN(s) })
.ToArray();
var maxDegree = poolWithDegree.Max(m => m.degree);
var optimalString = poolWithDegree
.Where(w => w.degree >= N)
.OrderBy(od => od.str.Length)
.FirstOrDefault();
if (optimalString != null) return optimalString.str; // We found it
var nextPool = poolWithDegree.Where(w => w.degree == maxDegree)
.SelectMany(sm => ExpandString(sm.str))
.Distinct()
.ToArray();
pool = nextPool;
}
}
/// <summary>
/// Get degree.
/// </summary>
/// <param name="candidate"></param>
/// <returns></returns>
public int GetN(string candidate)
{
var N = stringsSet.Select(s =>
{
var c = Regex.Matches(candidate, s).Count();
return c;
}).Sum();
return N;
}
public Tuple<string, string>[] ProduceItemSplits(string item)
{
var substings = Enumerable.Range(0, item.Length + 1)
.Select((i) => new Tuple<string, string>(item.Substring(0, i), item.Substring(i, item.Length - i))).ToArray();
return substings;
}
private IEnumerable<string> ExpandStringWithOneItem(string str, int index)
{
var item = stringsSet[index];
var itemSplits = splits[index];
var startAttachments = itemSplits.Where(w => str.StartsWith(w.Item2) && w.Item1.Length > 0)
.Select(s => s.Item1 + str);
var endAttachments = itemSplits.Where(w => str.EndsWith(w.Item1) && w.Item2.Length > 0)
.Select(s => str + s.Item2);
return startAttachments.Union(endAttachments);
}
public IEnumerable<string> ExpandString(string str)
{
var r = Enumerable.Range(0, splits.Length - 1)
.Select(s => ExpandStringWithOneItem(str, s))
.SelectMany(s => s);
return r;
}
}
static void Main(string[] args)
{
var solution = new Solution(new string[] { "ab", "bd", "abd", "babd", "abc" });
var s = solution.FindTarget(150);
Console.WriteLine(s);
}

Generate string permutations recursively; each character appears n times

I'm trying to write an algorithm that will generate all strings of length nm, with exactly n of each number 1, 2, ... m,
For instance all strings of length 6, with exactly two 1's, two 2's and two 3's e.g. 112233, 121233,
I managed to do this with just 1's and 2's using a recursive method, but can't seem to get something that works when I introduce 3's.
When m = 2, the algorithm I have is:
generateAllStrings(int len, int K, String str)
{
if(len == 0)
{
output(str);
}
if(K > 0)
{
generateAllStrings(len - 1, K - 1, str + '2');
}
if(len > K)
{
generateAllStrings(len - 1, K, str + '1');
}
}
I've tried inserting similar conditions for the third number but the algorithm doesn't give a correct output. After that I wouldn't even know how to generalise for 4 numbers and above.
Is recursion the right thing to do? Any help would be appreciated.
One option would be to list off all distinct permutations of the string 111...1222...2...nnn....n. There are nice algorithms for enumerating all distinct permutations of a string in time proportional to the length of the string, and they'd probably be a good way to go about solving this problem.
To use a simple recursive algorithm, give each recursion the permutation so far (variable perm), and the number of occurances of each digit that is still available (array count).
Run the code snippet to generate all unique permutations for n=2 and m=4 (set: 11223344).
function permutations(n, m) {
var perm = "", count = []; // start with empty permutation
for (var i = 0; i < m; i++) count[i] = n; // set available number for each digit = n
permute(perm, count); // start recursion with "" and [n,n,n...]
function permute(perm, count) {
var done = true;
for (var i = 0; i < count.length; i++) { // iterate over all digits
if (count[i] > 0) { // more instances of digit i available
var c = count.slice(); // create hard copy of count array
--c[i]; // decrement count of digit i
permute(perm + (i + 1), c); // add digit to permutation and recurse
done = false; // digits left over: not the last step
}
}
if (done) document.write(perm + "<BR>"); // no digits left: complete permutation
}
}
permutations(2, 4);
You can easily do this using DFS (or BFS alternatively). We can define an graph such that each node contains one string and a node is connected to any node that holds a string with a pair of int swaped in comparison to the original string. This graph is connected, thus we can easily generate a set of all nodes; which will contain all strings that are searched:
set generated_strings
list nodes
nodes.add(generateInitialString(N , M))
generated_strings.add(generateInitialString(N , M))
while(!nodes.empty())
string tmp = nodes.remove(0)
for (int i in [0 , N * M))
for (int j in distinct([0 , N * M) , i))
string new = swap(tmp , i , j)
if (!generated_strings.contains(new))
nodes.add(new)
generated_strings.add(new)
//generated_strings now contains all strings that can possibly be generated.

Reduce a string using grammar-like rules

I'm trying to find a suitable DP algorithm for simplifying a string. For example I have a string a b a b and a list of rules
a b -> b
a b -> c
b a -> a
c c -> b
The purpose is to get all single chars that can be received from the given string using these rules. For this example it will be b, c. The length of the given string can be up to 200 symbols. Could you please prompt an effective algorithm?
Rules always are 2 -> 1. I've got an idea of creating a tree, root is given string and each child is a string after one transform, but I'm not sure if it's the best way.
If you read those rules from right to left, they look exactly like the rules of a context free grammar, and have basically the same meaning. You could apply a bottom-up parsing algorithm like the Earley algorithm to your data, along with a suitable starting rule; something like
start <- start a
| start b
| start c
and then just examine the parse forest for the shortest chain of starts. The worst case remains O(n^3) of course, but Earley is fairly effective, these days.
You can also produce parse forests when parsing with derivatives. You might be able to efficiently check them for short chains of starts.
For a DP problem, you always need to understand how you can construct the answer for a big problem in terms of smaller sub-problems. Assume you have your function simplify which is called with an input of length n. There are n-1 ways to split the input in a first and a last part. For each of these splits, you should recursively call your simplify function on both the first part and the last part. The final answer for the input of length n is the set of all possible combinations of answers for the first and for the last part, which are allowed by the rules.
In Python, this can be implemented like so:
rules = {'ab': set('bc'), 'ba': set('a'), 'cc': set('b')}
all_chars = set(c for cc in rules.values() for c in cc)
# memoize
def simplify(s):
if len(s) == 1: # base case to end recursion
return set(s)
possible_chars = set()
# iterate over all the possible splits of s
for i in range(1, len(s)):
head = s[:i]
tail = s[i:]
# check all possible combinations of answers of sub-problems
for c1 in simplify(head):
for c2 in simplify(tail):
possible_chars.update(rules.get(c1+c2, set()))
# speed hack
if possible_chars == all_chars: # won't get any bigger
return all_chars
return possible_chars
Quick check:
In [53]: simplify('abab')
Out[53]: {'b', 'c'}
To make this fast enough for large strings (to avoiding exponential behavior), you should use a memoize decorator. This is a critical step in solving DP problems, otherwise you are just doing a brute-force calculation. A further tiny speedup can be obtained by returning from the function as soon as possible_chars == set('abc'), since at that point, you are already sure that you can generate all possible outcomes.
Analysis of running time: for an input of length n, there are 2 substrings of length n-1, 3 substrings of length n-2, ... n substrings of length 1, for a total of O(n^2) subproblems. Due to the memoization, the function is called at most once for every subproblem. Maximum running time for a single sub-problem is O(n) due to the for i in range(len(s)), so the overall running time is at most O(n^3).
Let N - length of given string and R - number of rules.
Expanding a tree in a top down manner yields computational complexity O(NR^N) in the worst case (input string of type aaa... and rules aa -> a).
Proof:
Root of the tree has (N-1)R children, which have (N-1)R^2 children, ..., which have (N-1)R^N children (leafs). So, the total complexity is O((N-1)R + (N-1)R^2 + ... (N-1)R^N) = O(N(1 + R^2 + ... + R^N)) = (using binomial theorem) = O(N(R+1)^N) = O(NR^N).
Recursive Java implementation of this naive approach:
public static void main(String[] args) {
Map<String, Character[]> rules = new HashMap<String, Character[]>() {{
put("ab", new Character[]{'b', 'c'});
put("ba", new Character[]{'a'});
put("cc", new Character[]{'b'});
}};
System.out.println(simplify("abab", rules));
}
public static Set<String> simplify(String in, Map<String, Character[]> rules) {
Set<String> result = new HashSet<String>();
simplify(in, rules, result);
return result;
}
private static void simplify(String in, Map<String, Character[]> rules, Set<String> result) {
if (in.length() == 1) {
result.add(in);
}
for (int i = 0; i < in.length() - 1; i++) {
String two = in.substring(i, i + 2);
Character[] rep = rules.get(two);
if (rep != null) {
for (Character c : rep) {
simplify(in.substring(0, i) + c + in.substring(i + 2, in.length()), rules, result);
}
}
}
}
Bas Swinckels's O(RN^3) Java implementation (with HashMap as a memoization cache):
public static Set<String> simplify2(final String in, Map<String, Character[]> rules) {
Map<String, Set<String>> cache = new HashMap<String, Set<String>>();
return simplify2(in, rules, cache);
}
private static Set<String> simplify2(final String in, Map<String, Character[]> rules, Map<String, Set<String>> cache) {
final Set<String> cached = cache.get(in);
if (cached != null) {
return cached;
}
Set<String> ret = new HashSet<String>();
if (in.length() == 1) {
ret.add(in);
return ret;
}
for (int i = 1; i < in.length(); i++) {
String head = in.substring(0, i);
String tail = in.substring(i, in.length());
for (String c1 : simplify2(head, rules)) {
for (String c2 : simplify2(tail, rules, cache)) {
Character[] rep = rules.get(c1 + c2);
if (rep != null) {
for (Character c : rep) {
ret.add(c.toString());
}
}
}
}
}
cache.put(in, ret);
return ret;
}
Output in both approaches:
[b, c]

Determine number of char movement to get word

Suppose you are given a word
"sunflower"
You can perform only one operation type on it, pick a character and move it to the front.
So for instance if you picked 'f', the word would be "fsunlower".
You can have a series of these operations.
fsunlower (moved f to front)
wfsunloer (moved w to front)
fwsunloer (moved f to front again)
The problem is to get the minimum number of operations required, given the derived word and the original word. So if input strings are "fwsunloer", "sunflower", the output would be 3.
This problem is equivalent to : given String A and B, find the longest suffix of string A that is a sub-sequence of String B. Because, if we know which n - characters need to be moved, we will only need n steps. So what we need to find is the maximum number of character that don't need to be moved, which is equivalent to the longest suffix in A.
So for the given example, the longest suffix is sunlor
Java code:
public static void main(String[] args) {
System.out.println(minOp("ewfsunlor", "sunflower"));
}
public static int minOp(String A, String B) {
int n = A.length() - 1;//Start from the end of String A;
int pos = B.length();
int result = 0;
while (n >= 0) {
int nxt = -1;
for (int i = pos - 1; i >= 0; i--) {
if (B.charAt(i) == A.charAt(n)) {
nxt = i;
break;
}
}
if (nxt == -1) {
break;
}
result++;
pos = nxt;
n--;
}
return B.length() - result;
}
Result:
3
Time complexity O(n) with n is length of String A.
Note: this algorithm is based on an assumption that A and B contains same set of character. Otherwise, you need to check for that before using the function

Is there a circular hash function?

Thinking about this question on testing string rotation, I wondered: Is there was such thing as a circular/cyclic hash function? E.g.
h(abcdef) = h(bcdefa) = h(cdefab) etc
Uses for this include scalable algorithms which can check n strings against each other to see where some are rotations of others.
I suppose the essence of the hash is to extract information which is order-specific but not position-specific. Maybe something that finds a deterministic 'first position', rotates to it and hashes the result?
It all seems plausible, but slightly beyond my grasp at the moment; it must be out there already...
I'd go along with your deterministic "first position" - find the "least" character; if it appears twice, use the next character as the tie breaker (etc). You can then rotate to a "canonical" position, and hash that in a normal way. If the tie breakers run for the entire course of the string, then you've got a string which is a rotation of itself (if you see what I mean) and it doesn't matter which you pick to be "first".
So:
"abcdef" => hash("abcdef")
"defabc" => hash("abcdef")
"abaac" => hash("aacab") (tie-break between aa, ac and ab)
"cabcab" => hash("abcabc") (it doesn't matter which "a" comes first!)
Update: As Jon pointed out, the first approach doesn't handle strings with repetition very well. Problems arise as duplicate pairs of letters are encountered and the resulting XOR is 0. Here is a modification that I believe fixes the the original algorithm. It uses Euclid-Fermat sequences to generate pairwise coprime integers for each additional occurrence of a character in the string. The result is that the XOR for duplicate pairs is non-zero.
I've also cleaned up the algorithm slightly. Note that the array containing the EF sequences only supports characters in the range 0x00 to 0xFF. This was just a cheap way to demonstrate the algorithm. Also, the algorithm still has runtime O(n) where n is the length of the string.
static int Hash(string s)
{
int H = 0;
if (s.Length > 0)
{
//any arbitrary coprime numbers
int a = s.Length, b = s.Length + 1;
//an array of Euclid-Fermat sequences to generate additional coprimes for each duplicate character occurrence
int[] c = new int[0xFF];
for (int i = 1; i < c.Length; i++)
{
c[i] = i + 1;
}
Func<char, int> NextCoprime = (x) => c[x] = (c[x] - x) * c[x] + x;
Func<char, char, int> NextPair = (x, y) => a * NextCoprime(x) * x.GetHashCode() + b * y.GetHashCode();
//for i=0 we need to wrap around to the last character
H = NextPair(s[s.Length - 1], s[0]);
//for i=1...n we use the previous character
for (int i = 1; i < s.Length; i++)
{
H ^= NextPair(s[i - 1], s[i]);
}
}
return H;
}
static void Main(string[] args)
{
Console.WriteLine("{0:X8}", Hash("abcdef"));
Console.WriteLine("{0:X8}", Hash("bcdefa"));
Console.WriteLine("{0:X8}", Hash("cdefab"));
Console.WriteLine("{0:X8}", Hash("cdfeab"));
Console.WriteLine("{0:X8}", Hash("a0a0"));
Console.WriteLine("{0:X8}", Hash("1010"));
Console.WriteLine("{0:X8}", Hash("0abc0def0ghi"));
Console.WriteLine("{0:X8}", Hash("0def0abc0ghi"));
}
The output is now:
7F7D7F7F
7F7D7F7F
7F7D7F7F
7F417F4F
C796C7F0
E090E0F0
A909BB71
A959BB71
First Version (which isn't complete): Use XOR which is commutative (order doesn't matter) and another little trick involving coprimes to combine ordered hashes of pairs of letters in the string. Here is an example in C#:
static int Hash(char[] s)
{
//any arbitrary coprime numbers
const int a = 7, b = 13;
int H = 0;
if (s.Length > 0)
{
//for i=0 we need to wrap around to the last character
H ^= (a * s[s.Length - 1].GetHashCode()) + (b * s[0].GetHashCode());
//for i=1...n we use the previous character
for (int i = 1; i < s.Length; i++)
{
H ^= (a * s[i - 1].GetHashCode()) + (b * s[i].GetHashCode());
}
}
return H;
}
static void Main(string[] args)
{
Console.WriteLine(Hash("abcdef".ToCharArray()));
Console.WriteLine(Hash("bcdefa".ToCharArray()));
Console.WriteLine(Hash("cdefab".ToCharArray()));
Console.WriteLine(Hash("cdfeab".ToCharArray()));
}
The output is:
4587590
4587590
4587590
7077996
You could find a deterministic first position by always starting at the position with the "lowest" (in terms of alphabetical ordering) substring. So in your case, you'd always start at "a". If there were multiple "a"s, you'd have to take two characters into account etc.
I am sure that you could find a function that can generate the same hash regardless of character position in the input, however, how will you ensure that h(abc) != h(efg) for every conceivable input? (Collisions will occur for all hash algorithms, so I mean, how do you minimize this risk.)
You'd need some additional checks even after generating the hash to ensure that the strings contain the same characters.
Here's an implementation using Linq
public string ToCanonicalOrder(string input)
{
char first = input.OrderBy(x => x).First();
string doubledForRotation = input + input;
string canonicalOrder
= (-1)
.GenerateFrom(x => doubledForRotation.IndexOf(first, x + 1))
.Skip(1) // the -1
.TakeWhile(x => x < input.Length)
.Select(x => doubledForRotation.Substring(x, input.Length))
.OrderBy(x => x)
.First();
return canonicalOrder;
}
assuming generic generator extension method:
public static class TExtensions
{
public static IEnumerable<T> GenerateFrom<T>(this T initial, Func<T, T> next)
{
var current = initial;
while (true)
{
yield return current;
current = next(current);
}
}
}
sample usage:
var sequences = new[]
{
"abcdef", "bcdefa", "cdefab",
"defabc", "efabcd", "fabcde",
"abaac", "cabcab"
};
foreach (string sequence in sequences)
{
Console.WriteLine(ToCanonicalOrder(sequence));
}
output:
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
aacab
abcabc
then call .GetHashCode() on the result if necessary.
sample usage if ToCanonicalOrder() is converted to an extension method:
sequence.ToCanonicalOrder().GetHashCode();
One possibility is to combine the hash functions of all circular shifts of your input into one meta-hash which does not depend on the order of the inputs.
More formally, consider
for(int i=0; i<string.length; i++) {
result^=string.rotatedBy(i).hashCode();
}
Where you could replace the ^= with any other commutative operation.
More examply, consider the input
"abcd"
to get the hash we take
hash("abcd") ^ hash("dabc") ^ hash("cdab") ^ hash("bcda").
As we can see, taking the hash of any of these permutations will only change the order that you are evaluating the XOR, which won't change its value.
I did something like this for a project in college. There were 2 approaches I used to try to optimize a Travelling-Salesman problem. I think if the elements are NOT guaranteed to be unique, the second solution would take a bit more checking, but the first one should work.
If you can represent the string as a matrix of associations so abcdef would look like
a b c d e f
a x
b x
c x
d x
e x
f x
But so would any combination of those associations. It would be trivial to compare those matrices.
Another quicker trick would be to rotate the string so that the "first" letter is first. Then if you have the same starting point, the same strings will be identical.
Here is some Ruby code:
def normalize_string(string)
myarray = string.split(//) # split into an array
index = myarray.index(myarray.min) # find the index of the minimum element
index.times do
myarray.push(myarray.shift) # move stuff from the front to the back
end
return myarray.join
end
p normalize_string('abcdef').eql?normalize_string('defabc') # should return true
Maybe use a rolling hash for each offset (RabinKarp like) and return the minimum hash value? There could be collisions though.

Resources