Finding the binary composition of a binary number - c#-4.0

Very new to C#, so this could be a silly question.
I am working with a lot of UInt64s. These are expressed as hex, right? If we look at the binary representation, can we return an array whose elements, combined with the 'or' operation, give back the original UInt64?
For example, let's say
x = 1011
Then, I am looking for an efficient way to arrive at,
f(x) = {1000, 0010, 0001}
Where these numbers are in hex, rather than binary. Sorry, I am new to hex too.
I have a method already, but it feels inefficient. I first convert to a binary string, and loop over that string to find each '1'. I then add the corresponding binary number to an array.
Any thoughts?
Here is a better example. I have a hexadecimal number x, in the form of,
UInt64 x = 0x00000000000000FF
Where the binary representation of x is
0000000000000000000000000000000000000000000000000000000011111111
I wish to find an array consisting of hexadecimal numbers (UInt64??) such that the or operation applied to all members of that array would result in x again. For example,
f(x) = {0x0000000000000080, // 00000....10000000
0x0000000000000040, // 00000....01000000
0x0000000000000020, // 00000....00100000
0x0000000000000010, // 00000....00010000
0x0000000000000008, // 00000....00001000
0x0000000000000004, // 00000....00000100
0x0000000000000002, // 00000....00000010
0x0000000000000001 // 00000....00000001
}
I think the question comes down to finding an efficient way to find the index of the '1's in the binary expansion...
public static UInt64[] findOccupiedSquares(UInt64 pieces){
UInt64[] toReturn = new UInt64[BitOperations.PopCount(pieces)];
if (BitOperations.PopCount(pieces) == 1){
toReturn[0] = pieces;
}
else{
int i = 0;     // position of the bit currently in the lowest place
int index = 0; // next free slot in toReturn
while (pieces != 0){
if ((pieces & 1) != 0){ // test bit i before shifting, so bit 0 is not skipped
int rank = i / 8;
int file = i - (rank * 8);
toReturn[index] = LUTable.MaskRank[rank] & LUTable.MaskFile[file];
index += 1;
}
pieces = pieces >> 1;
i += 1;
}
}
return toReturn;
}
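The loop above shifts once per bit position. A C++ sketch (C++ is used for all added examples here) that jumps straight from one set bit to the next, computing rank and file the same way (rank = index / 8, file = index % 8). Square and occupied_squares are hypothetical names, and __builtin_ctzll is a GCC/Clang builtin; C++20's std::countr_zero is the portable equivalent.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical (rank, file) pair describing one set bit of a 64-bit board.
struct Square {
    int rank;
    int file;
};

// Visit only the set bits: __builtin_ctzll gives the index of the lowest
// set bit, then pieces &= pieces - 1 clears that bit.
std::vector<Square> occupied_squares(std::uint64_t pieces) {
    std::vector<Square> squares;
    while (pieces != 0) {
        int index = __builtin_ctzll(pieces);  // undefined for 0, but pieces != 0 here
        squares.push_back(Square{index / 8, index % 8});
        pieces &= pieces - 1;  // clear the lowest set bit
    }
    return squares;
}
```

The loop runs once per set bit rather than once per bit position, which matters for sparse boards.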

Your question still confuses me, as you seem to be mixing up numbers and number representations: there is an integer, and then there is a hexadecimal representation of that integer.
You can very simply break any integer into its base-2 components.
ulong input = 16094009876; // example input
ulong x = 1;
var bits = new List<ulong>();
do
{
if ((input & x) == x)
{
bits.Add(x);
}
x <<= 1;
} while (x != 0);
bits is now a list of integers, each representing one of the 1 bits in the input. This can be verified by adding (or ORing; the two give the same result here because every element is a distinct power of two) all the values. So this expression is true:
bits.Aggregate((a, b) => a | b) == input
If you want hexadecimal representations of those integers in the list, you can simply use ToString():
var hexBits = bits.Select(b => b.ToString("X16"));
If you want the binary representations of the integers, you can use Convert:
var binaryBits = bits.Select(b => Convert.ToString((long)b, 2).PadLeft(64, '0'));
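For comparison, a C++ sketch of a variant that touches only the set bits instead of testing all 64 positions (split_bits is a hypothetical name): x & (~x + 1) isolates the lowest set bit (the unsigned spelling of x & -x), and x & (x - 1) clears it.

```cpp
#include <cstdint>
#include <vector>

// Split a 64-bit value into its single-bit components (powers of two),
// in ascending order. The loop runs once per set bit.
std::vector<std::uint64_t> split_bits(std::uint64_t x) {
    std::vector<std::uint64_t> parts;
    while (x != 0) {
        std::uint64_t lowest = x & (~x + 1);  // isolate the lowest set bit
        parts.push_back(lowest);
        x &= x - 1;                           // clear that bit
    }
    return parts;
}
```

ORing (or adding) the returned elements gives back the input, just as with the list built above.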


Return two numbers in Q Sharp (Q#) (Quantum Development Kit)

Basically, I did the tutorial for creating a random number generator on the Microsoft Azure website, and now I am trying to add some features, including their suggestion to add a minimum number.
The initial code, which generates a single number between 0 and max, is:
operation SampleRandomNumberInRange(max : Int) : Int {
// mutable means variables that can change during computation
mutable output = 0;
// repeat loop to generate random numbers until it generates one that is less than or equal to max
repeat {
mutable bits = new Result[0];
for idxBit in 1..BitSizeI(max) {
set bits += [GenerateRandomBit()];
}
// ResultArrayAsInt is from the Microsoft.Quantum.Convert library; it converts an array of Results (little-endian) to a non-negative integer
set output = ResultArrayAsInt(bits);
} until (output <= max);
return output;
}
@EntryPoint()
operation SampleRandomNumber() : Int {
// let declares variables that don't change during computation
let max = 50;
Message($"Sampling a random number between 0 and {max}: ");
return SampleRandomNumberInRange(max);
}
Everything works well. Now I want to generate two numbers, so I would like to create a function TwoSampleRandomNumbersInRange, but I can't figure out how to make it return a result such as "Int, Int". I tried a few things, including the following:
operation TwoSampleRandomNumbersInRange(min: Int, max : Int) : Int {
// mutable means variables that can change during computation
mutable output = 0;
// repeat loop to generate random numbers until it generates one between min and max
repeat {
mutable bits = new Result[0];
for idxBit in 1..BitSizeI(max) {
set bits += [GenerateRandomBit()];
}
for idxBit in 1..BitSizeI(min) {
set bits += [GenerateRandomBit()];
}
// ResultArrayAsInt is from the Microsoft.Quantum.Convert library; it converts an array of Results (little-endian) to a non-negative integer
set output = ResultArrayAsInt(bits);
} until (output >= min and output <= max);
return output;
}
To generate two numbers, I tried this:
operation TwoSampleRandomNumbersInRange(min: Int, max : Int) : Int, Int {
//code here
}
...but the syntax for the output isn't right.
I also need the output:
set output = ResultArrayAsInt(bits);
to have two numbers but ResultArrayAsInt, as the name says, just returns an Int. I need to return two integers.
Any help appreciated, thanks!
The return type of an operation has to be a single data type; to represent a pair of integers you need a tuple of integers: (Int, Int).
So the signature of your operation and the return statement will be:
operation TwoSampleRandomNumbersInRange(min: Int, max : Int) : (Int, Int) {
// code here
return (integer1, integer2);
}
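For comparison only, the same shape in C++ (a classical sketch, not Q#): the function's single return value is a pair, std::pair<int, int>, playing the role of (Int, Int). std::mt19937 stands in for GenerateRandomBit, which is an assumption of this sketch; sample_in_range and two_samples_in_range are hypothetical names, and 0 <= min <= max is assumed.

```cpp
#include <random>
#include <utility>

// Draw one value in [min, max] by rejection sampling: generate as many random
// bits as max needs and retry until the result is in range (like repeat/until).
int sample_in_range(int min, int max, std::mt19937& rng) {
    int bits = 0;
    while ((max >> bits) != 0) ++bits;  // bit width of max, like BitSizeI(max)
    std::uniform_int_distribution<int> bit(0, 1);
    int output;
    do {
        output = 0;
        for (int i = 0; i < bits; ++i) {
            output |= bit(rng) << i;  // little-endian, like ResultArrayAsInt
        }
    } while (output < min || output > max);
    return output;
}

// Returning two numbers: the return type is a pair, the counterpart of (Int, Int).
std::pair<int, int> two_samples_in_range(int min, int max, std::mt19937& rng) {
    int first = sample_in_range(min, max, rng);
    int second = sample_in_range(min, max, rng);
    return {first, second};
}
```

Each number gets its own round of random bits, so there is no need for one conversion call to produce two integers.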
I found the answer to my own question, all I had to do was:
operation SampleRandomNumberInRange(min: Int, max : Int) : Int {
// mutable means variables that can change during computation
mutable output = 0;
// repeat loop to generate random numbers until it generates one between min and max
repeat {
mutable bits = new Result[0];
for idxBit in 1..BitSizeI(max) {
set bits += [GenerateRandomBit()];
}
// ResultArrayAsInt is from the Microsoft.Quantum.Convert library; it converts an array of Results (little-endian) to a non-negative integer
set output = ResultArrayAsInt(bits);
} until (output >= min and output <= max);
return output;
}
@EntryPoint()
operation SampleRandomNumber() : Int {
// let declares variables that don't change during computation
let max = 50;
let min = 10;
Message($"Sampling a random number between {min} and {max}: ");
return SampleRandomNumberInRange(min, max);
}
}

Find the number of ways you can form a string of size N, given an unlimited number of 0s and 1s

The question below was asked in an Atlassian online test. I don't have test cases; this is the question as I took it from this link:
Find the number of ways you can form a string of size N, given an unlimited number of 0s and 1s. However,
the string cannot contain D consecutive 0s or T consecutive 1s. N, D, and T are given as inputs.
Please help me with this problem; any approach on how to proceed is welcome.
My approach was simply to use recursion to try all possibilities, and then I memoized it using a hash map.
But it seems to me there must be some combinatorial approach that solves this in less time and space. For debugging purposes I am also printing the strings generated during the recursion; if there is a flaw in my approach, please tell me.
#include <bits/stdc++.h>
using namespace std;
unordered_map<string,int>dp;
int recurse(int d,int t,int n,int oldd,int oldt,string s)
{
if(d<=0)
return 0;
if(t<=0)
return 0;
cout<<s<<"\n";
if(n==0&&d>0&&t>0)
return 1;
string h=to_string(d)+" "+to_string(t)+" "+to_string(n);
if(dp.find(h)!=dp.end())
return dp[h];
int ans=0;
ans+=recurse(d-1,oldt,n-1,oldd,oldt,s+'0')+recurse(oldd,t-1,n-1,oldd,oldt,s+'1');
return dp[h]=ans;
}
int main()
{
int n,d,t;
cin>>n>>d>>t;
dp.clear();
cout<<recurse(d,t,n,d,t,"")<<"\n";
return 0;
}
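Since there are no test cases, a brute-force reference for small N can be used to check any faster method. A C++ sketch (brute_force_count is a hypothetical name) that enumerates all 2^N strings as bit masks and rejects any string containing a run of D zeros or T ones; it is only practical for N up to about 20.

```cpp
#include <cstdint>

// Bit i of mask is character i of the string. A string is rejected as soon as
// its current run reaches d zeros or t ones.
std::uint64_t brute_force_count(int n, int d, int t) {
    std::uint64_t good = 0;
    for (std::uint64_t mask = 0; mask < (std::uint64_t{1} << n); ++mask) {
        int run = 0;
        int prev = -1;
        bool ok = true;
        for (int i = 0; i < n && ok; ++i) {
            int bit = static_cast<int>((mask >> i) & 1);
            run = (bit == prev) ? run + 1 : 1;  // length of the current run
            prev = bit;
            if ((bit == 0 && run >= d) || (bit == 1 && run >= t)) ok = false;
        }
        if (ok) ++good;
    }
    return good;
}
```

For example, with N = 3 and D = T = 2 only 010 and 101 survive.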
You are right: instead of generating strings, it is worth considering a combinatorial approach using (a kind of) dynamic programming.
A "good" sequence of length K ends with either 1..D-1 zeros or 1..T-1 ones.
To make a good sequence of length K+1, you can append a zero to every sequence except those ending with D-1 zeros; this gives 2..D-1 trailing zeros from precursors of the first kind and exactly 1 trailing zero from precursors of the second kind.
Similarly, you can append a one to every sequence of the first kind, and to every sequence of the second kind except those ending with T-1 ones; this gives exactly 1 trailing one from precursors of the first kind and 2..T-1 trailing ones from precursors of the second kind.
Make two tables,
Zeros[N][D] and Ones[N][T],
where Zeros[K][C] counts the good sequences of length K that end with exactly C zeros, and Ones[K][C] those that end with exactly C ones.
Fill the first row with zeros, except for Zeros[1][1] = 1 and Ones[1][1] = 1.
Fill the remaining rows one by one using the rules above:
Zeros[K][1] = Sum(Ones[K-1][C=1..T-1])
for C in 2..D-1:
    Zeros[K][C] = Zeros[K-1][C-1]
Ones[K][1] = Sum(Zeros[K-1][C=1..D-1])
for C in 2..T-1:
    Ones[K][C] = Ones[K-1][C-1]
The result is the sum of the last row of both tables.
Also note that you really need only two active rows of each table, so after debugging you can reduce the size to Zeros[2][D] (and Ones[2][T]).
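A C++ sketch of the two-table recurrence, keeping only the previous row as suggested (count_good_strings is a hypothetical name). Counts are stored as std::uint64_t, which wraps silently for large N; a modulus would be needed in that case.

```cpp
#include <cstdint>
#include <vector>

// zeros[c]: good strings of the current length ending in exactly c zeros (1 <= c <= d-1).
// ones[c]:  good strings of the current length ending in exactly c ones  (1 <= c <= t-1).
// Index 0 of each vector is unused.
std::uint64_t count_good_strings(int n, int d, int t) {
    if (n < 1 || d < 1 || t < 1) return 0;
    std::vector<std::uint64_t> zeros(d, 0), ones(t, 0);
    if (d > 1) zeros[1] = 1;  // the string "0"
    if (t > 1) ones[1] = 1;   // the string "1"
    for (int k = 2; k <= n; ++k) {
        std::uint64_t sum_zeros = 0, sum_ones = 0;
        for (int c = 1; c < d; ++c) sum_zeros += zeros[c];
        for (int c = 1; c < t; ++c) sum_ones += ones[c];
        std::vector<std::uint64_t> next_zeros(d, 0), next_ones(t, 0);
        if (d > 1) next_zeros[1] = sum_ones;                       // '0' after a run of ones
        for (int c = 2; c < d; ++c) next_zeros[c] = zeros[c - 1];  // extend a run of zeros
        if (t > 1) next_ones[1] = sum_zeros;                       // '1' after a run of zeros
        for (int c = 2; c < t; ++c) next_ones[c] = ones[c - 1];    // extend a run of ones
        zeros.swap(next_zeros);
        ones.swap(next_ones);
    }
    std::uint64_t total = 0;  // sum of the last row of both tables
    for (int c = 1; c < d; ++c) total += zeros[c];
    for (int c = 1; c < t; ++c) total += ones[c];
    return total;
}
```

Each row costs O(D + T), so the whole computation is O(N * (D + T)) time and O(D + T) memory.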
This can be solved using dynamic programming. I'll give a recursive (memoized) solution; it is similar to generating a binary string.
The states are:
i: the index of the next character to insert into the string.
cnt: the number of consecutive equal characters just before i.
bit: the character that was repeated cnt times before i; it is either 0 or 1.
Base case: return 1 when i reaches n, since we start at 0 and end at n-1.
Define the size of the dp array accordingly. The time complexity is O(2 x N x max(D, T)).
#include<bits/stdc++.h>
using namespace std;
int dp[1000][1000][2];
int n, d, t;
int count(int i, int cnt, int bit) {
if (i == n) {
return 1;
}
int &ans = dp[i][cnt][bit];
if (ans != -1) return ans;
ans = 0;
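if (bit == 0) {
if (t > 1) ans += count(i+1, 1, 1); // start a run of ones (not allowed at all if t == 1)
if (cnt != d - 1) {
ans += count(i+1, cnt + 1, 0);
}
} else {
// bit == 1
if (d > 1) ans += count(i+1, 1, 0); // start a run of zeros (not allowed at all if d == 1)
if (cnt != t-1) {
ans += count(i+1, cnt + 1, 1);
}
}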
return ans;
}
signed main() {
ios_base::sync_with_stdio(false), cin.tie(nullptr);
cin >> n >> d >> t;
memset(dp, -1, sizeof dp);
cout << count(0, 0, 0);
return 0;
}

Convert binary (integer and fraction) from VHDL to decimal, negative value in C code

I have 14-bit data that is fed from an FPGA design written in VHDL. The Nios II processor reads the 14-bit data from the FPGA and does some processing on it; the Nios II system is programmed in C.
The 14-bit data can be positive, zero, or negative. In the Altera compiler, I can only define data as 8, 16, or 32 bits wide, so I define it as 16-bit data.
First, I need to check whether the data is negative. If it is, I need to set the two extra MSBs to '1' so that the system treats it as a negative value instead of a positive one.
Second, I need to convert this binary representation into a decimal value that includes BOTH the integer and the fractional part.
I learned from this link (Correct algorithm to convert binary floating point "1101.11" into decimal (13.75)?) that I can convert a binary number with both integer and fractional parts to a decimal value.
Specifically, I am able to use this code from that link (Correct algorithm to convert binary floating point "1101.11" into decimal (13.75)?), reproduced below:
#include <stdio.h>
#include <math.h>
double convert(const char binary[]){
int bi,i;
int len = 0;
int dot = -1;
double result = 0;
for(bi = 0; binary[bi] != '\0'; bi++){
if(binary[bi] == '.'){
dot = bi;
}
len++;
}
if(dot == -1)
dot=len;
for(i = dot; i >= 0 ; i--){
if (binary[i] == '1'){
result += (double) pow(2,(dot-i-1));
}
}
for(i=dot; binary[i] != '\0'; i++){
if (binary[i] == '1'){
result += 1.0/(double) pow(2.0,(double)(i-dot));
}
}
return result;
}
int main()
{
char bin[] = "1101.11";
char bin1[] = "1101";
char bin2[] = "1101.";
char bin3[] = ".11";
printf("%s -> %f\n",bin, convert(bin));
printf("%s -> %f\n",bin1, convert(bin1));
printf("%s -> %f\n",bin2, convert(bin2));
printf("%s -> %f\n",bin3, convert(bin3));
return 0;
}
I am wondering whether this code can handle negative values. I tried the binary string 11111101.11 and it gives 253.75.
I have two questions:
What modifications do I need to make in order to read a negative value?
I know that I can use a bit mask (as below) to check whether the MSB is 1; if it is, the value is negative.
if (data_14bit & 0x2000) // if true, the value is negative
The issue is that, since the value has a fractional part as well as an integer part, I am not sure whether this method still works.
If the binary number is not originally in string format, is there any way I could convert it to a string? The binary number comes from an FPGA block written in VHDL: say, 14 bits, with the MSB as the sign bit, the following 6 bits as the magnitude of the integer part, and the last 6 bits as the magnitude of the fractional part. I need the decimal value in C code for the Altera Nios II processor.
OK, I'm focusing on the fact that you want to reuse the algorithm you mention at the beginning of your question, and I assume that your signed number is in two's complement. From your comments, though, I'm not really sure that your input has the same format as the one the algorithm uses.
First, pad the two MSBs to get a 16-bit representation:
data_16bit = (data_14bit & 0x2000) ? (data_14bit | 0xC000) : data_14bit;
If the value is positive, it remains unchanged; if it is negative, this gives the correct two's complement representation on 16 bits.
For the fractional part, everything is the same as in the algorithm you mentioned in your question.
For the integer part, everything is the same except for the treatment of the MSB.
For an unsigned number, the MSB (i.e. bit[15]) represents pow(2, 15-6) (6 being the width of the fractional part), whereas for a signed number in two's complement it represents -pow(2, 15-6). This means the algorithm becomes:
/* integer part operation */
while(p >= 1)
{
rem = (int)fmod(p, 10);
p = (int)(p / 10);
dec = dec + rem * pow(2, t) * (9 != t ? 1 : -1);
++t;
}
or, equivalently, if you don't want the * operator:
/* integer part operation */
while(p >= 1)
{
rem = (int)fmod(p, 10);
p = (int)(p / 10);
if( 9 != t)
{
dec = dec + rem * pow(2, t);
}
else
{
dec = dec - rem * pow(2, t);
}
++t;
}
For the second algorithm you mention, given your format, dot == 10 and i == 0 means we are at the MSB (10 integer bits followed by the dot), so the code becomes:
for(i = dot - 1; i >= 0 ; i--)
{
if (binary[i] == '1')
{
if(10 != dot || i)
{
result += (double) pow(2,(dot-i-1));
}
else
{
// In two's complement the MSB has negative weight:
// result -= (double) pow(2,(dot-i-1));
// With your format i == 0 and dot == 10, so that weight is 2^9:
result -= 512;
}
}
}
WARNING: in Brice's algorithm the input is a character string like "11011.101", whereas according to your description your input is an integer, so I'm not sure this algorithm suits your case.
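If the string-based algorithm is kept anyway, the integer first has to be turned into such a string. A C++ sketch, assuming the value has already been sign-extended to 16 bits and uses the 10 integer bits / 6 fractional bits split from this answer (to_fixed_binary_string is a hypothetical name):

```cpp
#include <cstdint>
#include <string>

// Render a 16-bit word as "IIIIIIIIII.FFFFFF": bits 15..6 (integer part),
// a dot, then bits 5..0 (fractional part), most significant bit first.
std::string to_fixed_binary_string(std::uint16_t word) {
    std::string s;
    for (int bit = 15; bit >= 0; --bit) {
        s += ((word >> bit) & 1) ? '1' : '0';
        if (bit == 6) s += '.';  // the fractional bits 5..0 follow the dot
    }
    return s;
}
```

With this layout the dot always sits at index 10, which is what the MSB check in the modified loop relies on.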
I think this should work:
#include <stdint.h>
float convert14BitsToFloat(int16_t in)
{
/* Sign-extend in, since it is 14 bits */
if (in & 0x2000) in |= 0xC000;
/* 6 fractional bits, so divide by 64 = 2^6 */
return (float)in / 64.0f;
}
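The same conversion can also be written without OR-ing into a signed type: subtracting 2^14 when the sign bit is set is the arithmetic form of sign extension. A C++ sketch, assuming 6 fractional bits as above and a raw unsigned 16-bit input (fixed14_to_double is a hypothetical name):

```cpp
#include <cstdint>

// Interpret the low 14 bits of raw as two's complement with 6 fractional bits.
double fixed14_to_double(std::uint16_t raw) {
    std::int32_t v = raw & 0x3FFF;  // keep the 14 data bits
    if (v & 0x2000) v -= 0x4000;    // sign bit set: value is v - 2^14
    return v / 64.0;                // 6 fractional bits
}
```

For example, 0x3FC0 gives -1.0, 0x0041 gives 1.015625, and 0x2000 (the most negative value) gives -128.0.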
To convert any number to a string, I would use sprintf. Be aware that it may significantly increase the size of your application. If you don't need the float and want to keep the application small, you should write your own conversion function.
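A sketch of such a hand-written conversion in C++, using only integer arithmetic and assuming the 14-bit, 6-fractional-bit format (fixed14_to_string is a hypothetical name). Since 1/64 = 0.015625, six decimal places are always exact: the fractional bits times 15625 give millionths. std::string is used here for brevity; on a small target a fixed char buffer would replace it.

```cpp
#include <cstdint>
#include <string>

std::string fixed14_to_string(std::uint16_t raw) {
    std::int32_t v = raw & 0x3FFF;
    if (v & 0x2000) v -= 0x4000;  // sign-extend from 14 bits
    std::string out;
    if (v < 0) {
        out += '-';
        v = -v;                   // at most 8192, no overflow
    }
    std::uint32_t int_part = static_cast<std::uint32_t>(v) >> 6;
    std::uint32_t millionths = (static_cast<std::uint32_t>(v) & 0x3F) * 15625;
    char digits[10];
    int count = 0;
    do {                          // integer digits, least significant first
        digits[count++] = static_cast<char>('0' + int_part % 10);
        int_part /= 10;
    } while (int_part != 0);
    while (count > 0) out += digits[--count];
    out += '.';
    for (std::uint32_t div = 100000; div != 0; div /= 10) {  // six zero-padded digits
        out += static_cast<char>('0' + (millionths / div) % 10);
    }
    return out;
}
```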

Google Interview : Find Crazy Distance Between Strings

This question was asked in my Google interview. I could do it in O(n*n)... Can it be done in better time?
A string can consist only of 1s and 0s.
Definition:
X and Y are strings formed of 0s and 1s.
D(X,Y) = remove the common prefix from both X and Y, then add the remaining lengths of the two strings.
For example:
D(1111, 1000): only the first character is common, so the remaining strings are 111 and 000. The result is length("111") + length("000") = 3 + 3 = 6.
D(101, 1100): only the first character is common, so the remaining strings are 01 and 100. The result is length("01") + length("100") = 2 + 3 = 5.
It is fairly obvious that computing one such crazy distance takes linear time, O(m).
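The single-pair distance from the definition, as a C++ sketch (crazy_distance is a hypothetical name):

```cpp
#include <cstddef>
#include <string>

// D(X, Y): strip the common prefix, then add the lengths of the two remainders.
// Stops at the first mismatch, so it costs O(m) for strings of length at most m.
std::size_t crazy_distance(const std::string& x, const std::string& y) {
    std::size_t prefix = 0;
    while (prefix < x.size() && prefix < y.size() && x[prefix] == y[prefix]) {
        ++prefix;
    }
    return (x.size() - prefix) + (y.size() - prefix);
}
```

Equivalently, D(X, Y) = |X| + |Y| - 2 * (length of the common prefix).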
Now the question is
given n input, say like
1111
1000
101
1100
Find out the maximum crazy distance possible.
n is the number of input strings.
m is the max length of any input string.
The O(n^2 * m) solution is pretty simple. Can it be done in a better way?
Let's assume that m is fixed. Can we do this in better than O(n^2) ?
Put the strings into a tree, where 0 means go left and 1 means go right. So for example
1111
1000
101
1100
would result in a tree like
            Root
             1
       0            1
    0     1*     0     1
    0*           0*    1*
where the * means that an element ends there. Constructing this tree clearly takes O(n m).
Now we have to find the diameter of the tree (the longest path between two nodes where strings end, which is the same thing as the maximum "crazy distance"). The optimized algorithm presented there hits each node in the tree once. There are at most min(n m, 2^m) such nodes.
So if n m < 2^m, then the algorithm is O(n m).
If n m > 2^m (and we necessarily have repeated inputs), then the algorithm is still O(n m) from the first step.
This also works for strings with a general alphabet; for an alphabet with k letters build a k-ary tree, in which case the runtime is still O(n m) by the same reasoning, though it takes k times as much memory.
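A C++ sketch of the trie approach (TrieNode, deepest_end and max_crazy_distance are hypothetical names). The path has to connect two nodes where strings end, so the DFS only counts end-marked nodes; since D(X, Y) = |X| + |Y| - 2 * lcp(X, Y), that path length is exactly the crazy distance.

```cpp
#include <algorithm>
#include <array>
#include <string>
#include <vector>

// child[0] for '0' (left), child[1] for '1' (right), -1 if absent.
struct TrieNode {
    std::array<int, 2> child{{-1, -1}};
    bool ends = false;  // true if some input string ends here
};

// Returns the distance from v down to the deepest end-marked node in its subtree
// (v itself counts, at distance 0), or -1 if there is none. Also records in best
// the longest path between two end-marked nodes whose highest point is v.
int deepest_end(const std::vector<TrieNode>& trie, int v, int& best) {
    int longest = trie[v].ends ? 0 : -1;
    for (int c : trie[v].child) {
        if (c < 0) continue;
        int d = deepest_end(trie, c, best);
        if (d < 0) continue;
        d += 1;  // the edge v -> c
        if (longest >= 0) best = std::max(best, longest + d);
        longest = std::max(longest, d);
    }
    return longest;
}

// Build the trie in O(n * m), then one DFS over at most n * m + 1 nodes.
int max_crazy_distance(const std::vector<std::string>& strings) {
    std::vector<TrieNode> trie(1);  // node 0 is the root
    for (const std::string& s : strings) {
        int cur = 0;
        for (char ch : s) {
            int b = (ch == '1') ? 1 : 0;
            if (trie[cur].child[b] < 0) {
                trie[cur].child[b] = static_cast<int>(trie.size());
                trie.emplace_back();
            }
            cur = trie[cur].child[b];
        }
        trie[cur].ends = true;
    }
    int best = 0;
    deepest_end(trie, 0, best);
    return best;
}
```

For the four example strings this returns 6, the distance between 1111 and 1000 (and between 1000 and 1100).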
I think this is possible in O(nm) time by creating a binary tree where each bit of a string encodes the path (0 left, 1 right), then finding the maximum distance between nodes of the tree, which takes time linear in the size of the tree.
This is my solution; I think it works:
Create a binary tree from all the strings, constructed as follows:
in each round, select a string and add it to the tree. For your example, the tree will be:
<root>
  <1>                     <empty>
    <1>         <0>
    <1>   <0>   <1>   <0>
    <1>   <0>         <0>
So each path from root to a leaf will represent a string.
Now the distance between any two leaves is the distance between the corresponding strings. To find the crazy distance, you must find the diameter of this graph, which you can do easily with DFS or BFS.
The total complexity of this algorithm is:
O(n*m) + O(n*m) = O(n*m).
I think this problem is something like "find the common prefix of two strings"; you can use a trie (http://en.wikipedia.org/wiki/Trie) to accelerate the search.
I had a Google phone interview 3 days ago, but I may have failed...
Best of luck to you.
To get an answer in O(nm), just iterate across the characters of all the strings (an O(n) operation per character position). We compare at most m character positions, so this is repeated at most m times, giving a total of O(nm). Here's a C++ solution:
#include <cstring>
#include <iostream>
int max_distance(const char* const* strings, int numstrings, int &distance) {
distance = 0;
// loop O(n) for initialization
for (int i=0; i<numstrings; i++)
distance += std::strlen(strings[i]);
int max_prefix = 0;
bool done = false;
// loop max O(m)
while (!done) {
int c = -1;
// loop O(n)
for (int i=0; i<numstrings; i++) {
if (strings[i][max_prefix] == 0) {
done = true; // it is enough to reach the end of one string to be done
break;
}
int new_element = strings[i][max_prefix] - '0';
if (-1 == c)
c = new_element;
else {
if (c != new_element) {
done = true; // mismatch
break;
}
}
}
if (!done) {
max_prefix++;
distance -= numstrings;
}
}
return max_prefix;
}
void test_misc() {
const char* strings[] = {
"10100",
"10101110",
"101011",
"101"
};
std::cout << std::endl;
int distance = 0;
std::cout << "max_prefix = " << max_distance(strings, sizeof(strings)/sizeof(strings[0]), distance) << std::endl;
}
Not sure why you would use trees when iteration gives the same big-O complexity without the code complexity. Anyway, here is my version in JavaScript, O(mn):
var len = process.argv.length -2; // in node first 2 arguments are node and program file
var input = process.argv.splice(2);
var current;
var currentCount = 0;
var currentCharLoc = 0;
var totalCount = 0;
var totalComplete = 0;
var same = true;
while ( totalComplete < len ) {
current = null;
currentCount = 0;
for ( var loc = 0 ; loc < len ; loc++) {
if ( input[loc].length === currentCharLoc) {
totalComplete++;
same = false;
} else if (input[loc].length > currentCharLoc) {
currentCount++;
if (same) {
if ( current === null ) {
current = input[loc][currentCharLoc];
} else {
if (current !== input[loc][currentCharLoc]) {
same = false;
}
}
}
}
}
if (!same) {
totalCount += currentCount;
}
currentCharLoc++;
}
console.log(totalCount);

Is there a circular hash function?

Thinking about this question on testing string rotation, I wondered: is there such a thing as a circular/cyclic hash function? E.g.
h(abcdef) = h(bcdefa) = h(cdefab) etc
Uses for this include scalable algorithms that check n strings against each other to see whether some are rotations of others.
I suppose the essence of the hash is to extract information which is order-specific but not position-specific. Maybe something that finds a deterministic 'first position', rotates to it and hashes the result?
It all seems plausible, but slightly beyond my grasp at the moment; it must be out there already...
I'd go along with your deterministic "first position" - find the "least" character; if it appears twice, use the next character as the tie breaker (etc). You can then rotate to a "canonical" position, and hash that in a normal way. If the tie breakers run for the entire course of the string, then you've got a string which is a rotation of itself (if you see what I mean) and it doesn't matter which you pick to be "first".
So:
"abcdef" => hash("abcdef")
"defabc" => hash("abcdef")
"abaac" => hash("aacab") (tie-break between aa, ac and ab)
"cabcab" => hash("abcabc") (it doesn't matter which "a" comes first!)
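A C++ sketch of this canonical-rotation idea (least_rotation and rotation_hash are hypothetical names). The "least" rotation is the lexicographically smallest one, which settles all tie-breaks at once. This simple version compares every rotation, O(n^2); Booth's algorithm finds the same rotation in O(n).

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Lexicographically smallest rotation of s, found by comparing each
// length-n window of s + s against the best one so far.
std::string least_rotation(const std::string& s) {
    const std::string doubled = s + s;
    std::size_t best = 0;
    for (std::size_t i = 1; i < s.size(); ++i) {
        if (doubled.compare(i, s.size(), doubled, best, s.size()) < 0) best = i;
    }
    return doubled.substr(best, s.size());
}

// Rotation-invariant hash: hash the canonical rotation in the normal way.
std::size_t rotation_hash(const std::string& s) {
    return std::hash<std::string>{}(least_rotation(s));
}
```

least_rotation("abaac") returns "aacab" and least_rotation("cabcab") returns "abcabc", matching the examples above.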
Update: As Jon pointed out, the first approach doesn't handle strings with repetition very well. Problems arise when duplicate pairs of letters are encountered and the resulting XOR is 0. Here is a modification that I believe fixes the original algorithm. It uses Euclid-Fermat sequences to generate pairwise coprime integers for each additional occurrence of a character in the string. The result is that the XOR for duplicate pairs is non-zero.
I've also cleaned up the algorithm slightly. Note that the array containing the EF sequences only supports characters in the range 0x00 to 0xFF. This was just a cheap way to demonstrate the algorithm. Also, the algorithm still has runtime O(n) where n is the length of the string.
static int Hash(string s)
{
int H = 0;
if (s.Length > 0)
{
//any arbitrary coprime numbers
int a = s.Length, b = s.Length + 1;
//an array of Euclid-Fermat sequences to generate additional coprimes for each duplicate character occurrence
int[] c = new int[0x100];
for (int i = 1; i < c.Length; i++)
{
c[i] = i + 1;
}
Func<char, int> NextCoprime = (x) => c[x] = (c[x] - x) * c[x] + x;
Func<char, char, int> NextPair = (x, y) => a * NextCoprime(x) * x.GetHashCode() + b * y.GetHashCode();
//for i=0 we need to wrap around to the last character
H = NextPair(s[s.Length - 1], s[0]);
//for i=1...n we use the previous character
for (int i = 1; i < s.Length; i++)
{
H ^= NextPair(s[i - 1], s[i]);
}
}
return H;
}
static void Main(string[] args)
{
Console.WriteLine("{0:X8}", Hash("abcdef"));
Console.WriteLine("{0:X8}", Hash("bcdefa"));
Console.WriteLine("{0:X8}", Hash("cdefab"));
Console.WriteLine("{0:X8}", Hash("cdfeab"));
Console.WriteLine("{0:X8}", Hash("a0a0"));
Console.WriteLine("{0:X8}", Hash("1010"));
Console.WriteLine("{0:X8}", Hash("0abc0def0ghi"));
Console.WriteLine("{0:X8}", Hash("0def0abc0ghi"));
}
The output is now:
7F7D7F7F
7F7D7F7F
7F7D7F7F
7F417F4F
C796C7F0
E090E0F0
A909BB71
A959BB71
First version (incomplete): use XOR, which is commutative (order doesn't matter), together with another little trick involving coprimes to combine ordered hashes of pairs of letters in the string. Here is an example in C#:
static int Hash(char[] s)
{
//any arbitrary coprime numbers
const int a = 7, b = 13;
int H = 0;
if (s.Length > 0)
{
//for i=0 we need to wrap around to the last character
H ^= (a * s[s.Length - 1].GetHashCode()) + (b * s[0].GetHashCode());
//for i=1...n we use the previous character
for (int i = 1; i < s.Length; i++)
{
H ^= (a * s[i - 1].GetHashCode()) + (b * s[i].GetHashCode());
}
}
return H;
}
static void Main(string[] args)
{
Console.WriteLine(Hash("abcdef".ToCharArray()));
Console.WriteLine(Hash("bcdefa".ToCharArray()));
Console.WriteLine(Hash("cdefab".ToCharArray()));
Console.WriteLine(Hash("cdfeab".ToCharArray()));
}
The output is:
4587590
4587590
4587590
7077996
You could find a deterministic first position by always starting at the position with the "lowest" (in terms of alphabetical ordering) substring. So in your case, you'd always start at "a". If there were multiple "a"s, you'd have to take two characters into account etc.
I am sure you could find a function that generates the same hash regardless of character position in the input. However, how will you ensure that h(abc) != h(efg) for every conceivable input? (Collisions occur with every hash algorithm, so what I mean is: how do you minimize that risk?)
You'd need some additional checks even after generating the hash to ensure that the strings contain the same characters.
Here's an implementation using LINQ:
public string ToCanonicalOrder(string input)
{
char first = input.OrderBy(x => x).First();
string doubledForRotation = input + input;
string canonicalOrder
= (-1)
.GenerateFrom(x => doubledForRotation.IndexOf(first, x + 1))
.Skip(1) // the -1
.TakeWhile(x => x < input.Length)
.Select(x => doubledForRotation.Substring(x, input.Length))
.OrderBy(x => x)
.First();
return canonicalOrder;
}
This assumes a generic generator extension method:
public static class TExtensions
{
public static IEnumerable<T> GenerateFrom<T>(this T initial, Func<T, T> next)
{
var current = initial;
while (true)
{
yield return current;
current = next(current);
}
}
}
sample usage:
var sequences = new[]
{
"abcdef", "bcdefa", "cdefab",
"defabc", "efabcd", "fabcde",
"abaac", "cabcab"
};
foreach (string sequence in sequences)
{
Console.WriteLine(ToCanonicalOrder(sequence));
}
output:
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
aacab
abcabc
then call .GetHashCode() on the result if necessary.
sample usage if ToCanonicalOrder() is converted to an extension method:
sequence.ToCanonicalOrder().GetHashCode();
One possibility is to combine the hash functions of all circular shifts of your input into one meta-hash which does not depend on the order of the inputs.
More formally, consider
for(int i=0; i<string.length; i++) {
result^=string.rotatedBy(i).hashCode();
}
Where you could replace the ^= with any other commutative operation.
For example, consider the input
"abcd"
to get the hash we take
hash("abcd") ^ hash("dabc") ^ hash("cdab") ^ hash("bcda").
As we can see, hashing any of these rotations only changes the order in which the XOR is evaluated, which does not change its value.
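A C++ sketch of this meta-hash (all_rotations_hash is a hypothetical name). It uses addition, one of the other commutative operations, instead of XOR: for a periodic string such as "abab", each distinct rotation appears twice (abab, baba, abab, baba), so the XOR terms cancel and the hash collapses to 0. The cost is O(n^2) for a string of length n.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>

// Sum of std::hash over every rotation of s (wrapping modulo 2^N for size_t).
std::size_t all_rotations_hash(std::string s) {
    std::hash<std::string> h;
    std::size_t result = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        result += h(s);                                  // hash the current rotation
        std::rotate(s.begin(), s.begin() + 1, s.end());  // shift left by one
    }
    return result;
}
```

Any rotation of the input produces the same multiset of rotations, so the sum is rotation-invariant.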
I did something like this for a project in college. There were 2 approaches I used to try to optimize a Travelling-Salesman problem. I think if the elements are NOT guaranteed to be unique, the second solution would take a bit more checking, but the first one should work.
If you represent the string as a matrix of associations (which letter follows which), abcdef would look like:
  a b c d e f
a   x
b     x
c       x
d         x
e           x
f x
But so would any rotation of the string, since rotating does not change which letter follows which. It would be trivial to compare those matrices.
Another quicker trick would be to rotate the string so that the "first" letter is first. Then if you have the same starting point, the same strings will be identical.
Here is some Ruby code:
def normalize_string(string)
myarray = string.split(//) # split into an array
index = myarray.index(myarray.min) # find the index of the minimum element
index.times do
myarray.push(myarray.shift) # move stuff from the front to the back
end
return myarray.join
end
p normalize_string('abcdef').eql?(normalize_string('defabc')) # should print true
Maybe use a rolling hash for each offset (Rabin-Karp style) and return the minimum hash value? There could be collisions, though.
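A C++ sketch of that idea (min_rolling_hash is a hypothetical name): a polynomial rolling hash over every length-n window of s + s, one window per rotation, keeping the minimum. Arithmetic wraps modulo 2^64, and BASE is an arbitrary odd constant chosen for this sketch. It runs in O(n).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>

std::uint64_t min_rolling_hash(const std::string& s) {
    const std::uint64_t BASE = 1000003;
    const std::size_t n = s.size();
    if (n == 0) return 0;
    const std::string d = s + s;
    std::uint64_t top = 1;  // BASE^(n-1), weight of the character leaving the window
    for (std::size_t i = 1; i < n; ++i) top *= BASE;
    std::uint64_t h = 0;    // hash of the first window d[0, n)
    for (std::size_t i = 0; i < n; ++i) h = h * BASE + static_cast<unsigned char>(d[i]);
    std::uint64_t best = h;
    for (std::size_t i = n; i < 2 * n - 1; ++i) {
        h -= top * static_cast<unsigned char>(d[i - n]);  // drop the leftmost character
        h = h * BASE + static_cast<unsigned char>(d[i]);  // append the next character
        best = std::min(best, h);
    }
    return best;
}
```

All rotations of a string yield the same set of window hashes, so the minimum is rotation-invariant; as noted, equal values still need a direct comparison to rule out collisions.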
