Dynamic character generator; Generate all possible strings from a character set - string

I want to make a dynamic string generator that will generate all possible unique strings from a character set with a dynamic length.
I can make this very easily using for loops, but then the length is static rather than dynamic.
// Prints all possible strings with the length of 3
for a in allowedCharacters {
for b in allowedCharacters {
for c in allowedCharacters {
println(a+b+c)
}
}
}
But when I want to make the length dynamic, so that I can just call generate(length: 5), I get confused.
I found this Stack Overflow question, but the accepted answer generates strings of lengths 1 through maxLength, and I want every string to have length maxLength.

As noted above, use recursion. Here is how it can be done with C#:
static IEnumerable<string> Generate(int length, char[] allowed_chars)
{
if (length == 1)
{
foreach (char c in allowed_chars)
yield return c.ToString();
}
else
{
var sub_strings = Generate(length - 1, allowed_chars);
foreach (char c in allowed_chars)
{
foreach (string sub in sub_strings)
{
yield return c + sub;
}
}
}
}
private static void Main(string[] args)
{
string chars = "abc";
List<string> result = Generate(3, chars.ToCharArray()).ToList();
}
Please note that both the run time of this algorithm and the amount of data it returns grow exponentially with the length: for k allowed characters there are k^length strings. For large lengths, expect the code to take a long time and to return a huge amount of data.

Translation of @YacoubMassad's C# code to Swift:
func generate(length: Int, allowedChars: [String]) -> [String] {
if length == 1 {
return allowedChars
}
else {
let subStrings = generate(length - 1, allowedChars: allowedChars)
var arr = [String]()
for c in allowedChars {
for sub in subStrings {
arr.append(c + sub)
}
}
return arr
}
}
println(generate(3, allowedChars: ["a", "b", "c"]))
Prints:
aaa, aab, aac, aba, abb, abc, aca, acb, acc, baa, bab, bac, bba, bbb, bbc, bca, bcb, bcc, caa, cab, cac, cba, cbb, cbc, cca, ccb, ccc

While you can (obviously enough) use recursion to solve this problem, it is quite an inefficient way to do the job.
What you're really doing is just counting. In your example, with "a", "b" and "c" as the allowed characters, you're counting in base 3, and since you're allowing three character strings, they're three digit numbers.
An N-digit number in base M can represent M^N different possible values, going from 0 through M^N-1. So, for your case, that's limit=pow(3, 3)-1;. To generate all those values, you just count from 0 through the limit, and convert each number to base M, using the specified characters as the "digits". For example, in C++ the code can look like this:
#include <cmath> // for pow
#include <string>
#include <iostream>
int main() {
std::string letters = "abc";
std::size_t base = letters.length();
std::size_t digits = 3;
int limit = pow(base, digits);
for (int i = 0; i < limit; i++) {
int in = i;
for (int j = 0; j < digits; j++) {
std::cout << letters[in%base];
in /= base;
}
std::cout << "\t";
}
}
One minor note: as I've written it here, this produces the output in basically a little-endian format. That is, the "digit" that varies the fastest is on the left, and the one that changes the slowest is on the right.
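The counting idea above can also be sketched in Python. This is a minimal illustration, not the answer author's code: the function name `generate` and its parameters are assumptions chosen to match the question's `generate(length: 5)` call, and unlike the C++ version it emits the slowest-changing "digit" on the left.

```python
def generate(length, alphabet):
    """Yield every string of exactly `length` characters drawn from `alphabet`."""
    base = len(alphabet)
    for value in range(base ** length):
        digits = []
        # Peel off `length` base-`base` digits, least significant first.
        for _ in range(length):
            value, remainder = divmod(value, base)
            digits.append(alphabet[remainder])
        # Reverse so the most significant digit comes first.
        yield "".join(reversed(digits))


if __name__ == "__main__":
    print(", ".join(generate(3, "abc")))
```

Because it is a generator, the strings are produced one at a time rather than held in memory together.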

Related

Determine number of char movement to get word

Suppose you are given a word
"sunflower"
You can perform only one operation type on it, pick a character and move it to the front.
So for instance if you picked 'f', the word would be "fsunlower".
You can have a series of these operations.
fsunlower (moved f to front)
wfsunloer (moved w to front)
fwsunloer (moved f to front again)
The problem is to get the minimum number of operations required, given the derived word and the original word. So if input strings are "fwsunloer", "sunflower", the output would be 3.
This problem is equivalent to the following: given strings A and B, find the longest suffix of A that is a subsequence of B. If we know which n characters need to be moved, we need only n steps. So what we need to find is the maximum number of characters that don't need to be moved, which is the length of that longest suffix of A.
So for the example used in the code below (A = "ewfsunlor"), the longest such suffix is sunlor.
Java code:
public static void main(String[] args) {
System.out.println(minOp("ewfsunlor", "sunflower"));
}
public static int minOp(String A, String B) {
int n = A.length() - 1;//Start from the end of String A;
int pos = B.length();
int result = 0;
while (n >= 0) {
int nxt = -1;
for (int i = pos - 1; i >= 0; i--) {
if (B.charAt(i) == A.charAt(n)) {
nxt = i;
break;
}
}
if (nxt == -1) {
break;
}
result++;
pos = nxt;
n--;
}
return B.length() - result;
}
Result:
3
Time complexity is O(n), where n is the length of the strings (the scan position in B only ever moves backwards).
Note: this algorithm assumes that A and B contain the same characters. Otherwise, you need to check for that before calling the function.
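For comparison, here is a rough Python sketch of the same suffix-matching idea. The function name `min_moves_to_front` is made up for illustration, and it makes the same assumption as the Java code, namely that both strings contain the same characters.

```python
def min_moves_to_front(derived, original):
    """Count the characters of `derived` that must have been moved to the front."""
    pos = len(original)  # only search original[0:pos]
    kept = 0             # length of the matched suffix of `derived`
    for ch in reversed(derived):
        nxt = original.rfind(ch, 0, pos)
        if nxt == -1:
            break        # this character cannot stay in place
        kept += 1
        pos = nxt
    return len(original) - kept


if __name__ == "__main__":
    print(min_moves_to_front("ewfsunlor", "sunflower"))
```

Run on the input used in the Java `main`, it reports the same count.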

Square Subsequence

A string is called a square string if it can be obtained by concatenating two copies of the same string. For example, "abab", "aa" are square strings, while "aaa", "abba" are not. Given a string, how many subsequences of the string are square strings? A subsequence of a string can be obtained by deleting zero or more characters from it, and maintaining the relative order of the remaining characters. The subsequences need not be unique.
E.g. the string 'aaa' has 3 square subsequences.
Observation 1: The length of a square string is always even.
Observation 2: Every square subsequence of length 2n (n>1) is a combination of two shorter subsequences: one of length 2(n-1) and one of length 2.
First, find the subsequences of length two, i.e. the characters that occur twice or more in the string. We'll call these pairs. For each subsequence of length 2 (1 pair), remember the position of the first and last character in the sequence.
Now, suppose we have all subsequences of length 2(n-1), and we know for each where in the string the first and second part begins and ends. We can find sequences of length 2n by using observation 2:
Go through all the subsequences of length 2(n-1), and find all pairs where the first item in the pair lies between the last position of the first part and the first position of the second part, and the second item lies after the last position of the second part. Every time such a pair is found, combine it with the current subsequence of length 2(n-1) into a new subsequence of length 2n.
Repeat the last step until no more new square subsequences are found.
Pseudocode:
total_square_substrings <- 0
# Find every substring
for i in 1:length_of_string {
# Odd strings are not square, continue
if((length_of_string-i) % 2 == 1)
continue;
for j in 1:length_of_string {
# Remove i characters from the string, starting at character j
substring <- substr(string,0,j) + substr(string,j+1,length_of_string);
# Test all ways of splitting the substring into even, whole parts (e.g. if string is of length 15, this splits by 3 and 5)
SubstringTest: for(k in 2:(length_of_substring/2))
{
if(length_of_substring % k > 0)
continue;
first_partition <- substring[1:partition_size];
# Test every partition against the first for equality, if all pass, we have a square substring
for(m in 2:k)
{
if(first_partition != substring[(k-1)*partition_size:k*partition_size])
continue SubstringTest;
}
# We have a square substring, move on to next substring
total_square_substrings++;
break SubstringTest;
}
}
}
Here's a solution using LINQ:
IEnumerable<string> input = new[] {"a","a","a"};
// The next line assumes the existence of a "PowerSet" method for IEnumerable<T>.
// I'll provide my implementation of the method later.
IEnumerable<IEnumerable<string>> powerSet = input.PowerSet();
// Once you have the power set of all subsequences, select only those that are "square".
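// x.Any() excludes the empty subsequence, which would otherwise pass the test.
IEnumerable<IEnumerable<string>> squares = powerSet.Where(x => x.Any() && x.Take(x.Count()/2).SequenceEqual(x.Skip(x.Count()/2)));
// Print the number of square subsequences rather than the sequence's type name.
Console.WriteLine(squares.Count());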
And here is my PowerSet extension method, along with a "Choose" extension method that is required by PowerSet:
public static class CombinatorialExtensionMethods
{
public static IEnumerable<IEnumerable<T>> Choose<T>(this IEnumerable<T> seq, int k)
{
// Use "Select With Index" to create IEnumerable<anonymous type containing sequence values with indexes>
var indexedSeq = seq.Select((Value, Index) => new {Value, Index});
// Create k copies of the sequence to join
var sequences = Enumerable.Repeat(indexedSeq,k);
// Create IEnumerable<TypeOf(indexedSeq)> containing one empty sequence
/// To create an empty sequence of the same anonymous type as indexedSeq, allow the compiler to infer the type from a query expression
var emptySequence =
from item in indexedSeq
where false
select item;
var emptyProduct = Enumerable.Repeat(emptySequence,1);
// Select "Choose" permutations, using Index to order the items
var indexChoose = sequences.Aggregate(
emptyProduct,
(accumulator, sequence) =>
from accseq in accumulator
from item in sequence
where accseq.All(accitem => accitem.Index < item.Index)
select accseq.Concat(new[] { item }));
// Select just the Value from each permutation
IEnumerable<IEnumerable<T>> result =
from item in indexChoose
select item.Select((x) => x.Value);
return result;
}
public static IEnumerable<IEnumerable<T>> PowerSet<T>(this IEnumerable<T> seq)
{
IEnumerable<IEnumerable<T>> result = new[] { Enumerable.Empty<T>() };
for (int i=1; i<=seq.Count(); i++)
{
result = result.Concat(seq.Choose<T>(i));
}
return result;
}
}
I initially derive all possible sub-sequences and then i will check if the derived sub-sequence is a square sub-sequence or not
import java.io.*;
import java.util.*;
public class Subsequence {
static int count;
public static void print(String prefix, String remaining, int k) {
if (k == 0) {
//System.out.println(prefix);
if(prefix.length() %2 == 0 && check(prefix) != 0 && prefix.length() != 0)
{
count++;
//System.out.println(prefix);
}
return;
}
if (remaining.length() == 0)
return;
print(prefix + remaining.charAt(0), remaining.substring(1), k-1);
print(prefix, remaining.substring(1), k);
}
public static void main(String[] args)
{
//String s = "aaa";
Scanner sc = new Scanner(System.in);
int t=Integer.parseInt(sc.nextLine());
while((t--)>0)
{
count = 0;
String s = sc.nextLine();
for(int i=0;i<=s.length();i++)
{
print("",s,i);
}
System.out.println(count);
}
}
public static int check(String s)
{
int i=0,j=(s.length())/2;
for(;i<(s.length())/2 && j < (s.length());i++,j++)
{
if(s.charAt(i)==s.charAt(j))
{
continue;
}
else
return 0;
}
return 1;
}
}
import java.io.*;
import java.util.*;
public class Solution {
/*
Sample Input:
3
aaa
abab
baaba
Sample Output:
3
3
6
*/
public static void main(String[] args) {
//Creating an object of SquareString class
SquareString squareStringObject=new SquareString();
Scanner in = new Scanner(System.in);
//Number of Test Cases
int T = in.nextInt();
in.nextLine();
String[] inputString=new String[T];
for(int i=0;i<T;i++){
// Taking input and storing in String Array
inputString[i]=in.nextLine();
}
for(int i=0;i<T;i++){
//Calculating and printing the number of Square Strings
squareStringObject.numberOfSquareStrings(inputString[i]);
}
}
}
class SquareString{
//The counter maintained for keeping a count of Square Strings
private int squareStringCounter;
//Default Constructor initialising the counter as 0
public SquareString(){
squareStringCounter=0;
}
//Function calculates and prints the number of square strings
public void numberOfSquareStrings(String inputString){
squareStringCounter=0;
//Initialising the string part1 as a single character iterated over the length
for(int iterStr1=0;iterStr1<inputString.length()-1;iterStr1++){
String str1=""+inputString.charAt(iterStr1);
String str2=inputString.substring(iterStr1+1);
//Calling a recursive method to generate substring
generateSubstringAndCountSquareStrings(str1,str2);
}
System.out.println(squareStringCounter);
}
//Recursive method to generate sub strings
private void generateSubstringAndCountSquareStrings(String str1,String str2){
for(int iterStr2=0;iterStr2<str2.length();iterStr2++){
String newStr1=str1+str2.charAt(iterStr2);
if(isSquareString(newStr1)){
squareStringCounter++;
}
String newStr2=str2.substring(iterStr2+1);
generateSubstringAndCountSquareStrings(newStr1,newStr2);
}
}
private boolean isSquareString(String str){
if(str.length()%2!=0)
return false;
String strPart1=str.substring(0,str.length()/2);
String strPart2=str.substring(str.length()/2);
return strPart1.equals(strPart2);
}
}
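As a cross-check for the brute-force answers above, here is a small Python sketch that enumerates index subsets directly. The function name `count_square_subsequences` is hypothetical, and the approach is exponential in the string length, so it is only suitable for short inputs such as the sample cases.

```python
from itertools import combinations


def count_square_subsequences(s):
    """Count subsequences of `s` (by index choice) that are square strings."""
    total = 0
    # Square strings have even length, so only even subset sizes matter.
    for size in range(2, len(s) + 1, 2):
        half = size // 2
        for idx in combinations(range(len(s)), size):
            chosen = [s[i] for i in idx]
            if chosen[:half] == chosen[half:]:
                total += 1
    return total


if __name__ == "__main__":
    for sample in ("aaa", "abab", "baaba"):
        print(sample, count_square_subsequences(sample))
```

Subsequences are counted by the positions chosen, not by their text, which matches the question's remark that subsequences need not be unique.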

Sorting a string using another sorting order string [closed]

I saw this in an interview question:
Given a sorting order string, you are asked to sort the input string based on the given sorting order string.
For example, if the sorting order string is dfbcae
and the input string is abcdeeabc,
the output should be dbbccaaee.
Any ideas on how to do this in an efficient way?
The Counting Sort option is pretty cool, and fast when the string to be sorted is long compared to the sort order string.
create an array where each index corresponds to a letter in the alphabet, this is the count array
for each letter in the sort target, increment the index in the count array which corresponds to that letter
for each letter in the sort order string
add that letter to the end of the output string a number of times equal to its count in the count array
Algorithmic complexity is O(n) where n is the length of the string to be sorted. As the Wikipedia article explains we're able to beat the lower bound on standard comparison based sorting because this isn't a comparison based sort.
Here's some pseudocode.
int[26] countArray; // one counter per lowercase letter, all starting at zero
foreach(char c in sortTarget)
{
countArray[c - 'a']++;
}
int head = 0;
foreach(char c in sortOrder)
{
while(countArray[c - 'a'] > 0)
{
sortTarget[head] = c;
head++;
countArray[c - 'a']--;
}
}
Note: this implementation requires that both strings contain only lowercase characters.
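The counting approach can be sketched compactly in Python. This is an illustrative version, not the answer author's code: the name `sort_by_order` is an assumption, and it uses `collections.Counter` instead of a fixed 26-element array, so it is not limited to lowercase letters.

```python
from collections import Counter


def sort_by_order(target, order):
    """Rearrange `target` so its characters follow the sequence given in `order`."""
    counts = Counter(target)
    # Emit each character of `order` as many times as it occurs in `target`.
    # Characters of `target` that are missing from `order` are dropped.
    return "".join(ch * counts[ch] for ch in order)


if __name__ == "__main__":
    print(sort_by_order("abcdeeabc", "dfbcae"))
```

Like the pseudocode, it assumes each character appears at most once in the sort order string.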
Here's a nice easy to understand algorithm that has decent algorithmic complexity.
For each character in the sort order string
scan string to be sorted, starting at first non-ordered character (you can keep track of this character with an index or pointer)
when you find an occurrence of the specified character, swap it with the first non-ordered character
increment the index for the first non-ordered character
This is O(n*m), where n is the length of the string to be sorted and m is the length of the sort order string. We're able to beat the lower bound on comparison based sorting because this algorithm doesn't really use comparisons. Like Counting Sort it relies on the fact that you have a predefined finite external ordering set.
Here's some pseudocode:
int head = 0;
foreach(char c in sortOrder)
{
for(int i = head; i < sortTarget.length; i++)
{
if(sortTarget[i] == c)
{
// swap i with head
char temp = sortTarget[head];
sortTarget[head] = sortTarget[i];
sortTarget[i] = temp;
head++;
}
}
}
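The same swap-based pass can be written as runnable Python. This is a sketch under the assumption that the string is first copied into a list, since Python strings are immutable; the name `sort_by_swapping` is made up here.

```python
def sort_by_swapping(target, order):
    """Sort `target` by repeatedly swapping matches to the front of the unordered part."""
    chars = list(target)
    head = 0  # index of the first character not yet in its final place
    for c in order:
        for i in range(head, len(chars)):
            if chars[i] == c:
                # Move this occurrence to the end of the ordered prefix.
                chars[head], chars[i] = chars[i], chars[head]
                head += 1
    return "".join(chars)


if __name__ == "__main__":
    print(sort_by_swapping("abcdeeabc", "dfbcae"))
```

Characters that do not appear in the sort order string are left at the end in whatever order the swaps leave them.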
In Python, you can just create an index and use it as the sort key:
order = 'dfbcae'
input = 'abcdeeabc'
index = {ch: pos for pos, ch in enumerate(order)}
output = sorted(input, key=lambda ch: index[ch])
print('input=', ''.join(input))
print('output=', ''.join(output))
gives this output:
input= abcdeeabc
output= dbbccaaee
Use binary search to find all the "split points" between different letters, then use the length of each segment directly. This will be asymptotically faster than naive counting sort, but will be harder to implement:
Use an array of size 26*2 to store the begin and end of each letter;
Inspect the middle element, see if it is different from the element left to it. If so, then this is the begin for the middle element and end for the element before it;
Throw away the segment with identical begin and end (if there are any), recursively apply this algorithm.
Since there are at most 25 "split"s, you won't have to do the search for more than 25 segments, and for each segment it is O(log n). Since this is constant * O(log n), the algorithm is O(log n).
And of course, just using counting sort will be easier to implement:
Use an array of size 26 to record the number of different letters;
Scan the input string;
Output the string in the given sorting order.
This is O(n), n being the length of the string.
Interview questions are generally about thought process and don't usually care too much about language features, but I couldn't resist posting a VB.Net 4.0 version anyway.
"Efficient" can mean two different things. The first is "what's the fastest way to make a computer execute a task" and the second is "what's the fastest that we can get a task done". They might sound the same but the first can mean micro-optimizations like int vs short, running timers to compare execution times and spending a week tweaking every millisecond out of an algorithm. The second definition is about how much human time would it take to create the code that does the task (hopefully in a reasonable amount of time). If code A runs 20 times faster than code B but code B took 1/20th of the time to write, depending on the granularity of the timer (1ms vs 20ms, 1 week vs 20 weeks), each version could be considered "efficient".
Dim input = "abcdeeabc"
Dim sort = "dfbcae"
Dim SortChars = sort.ToList()
Dim output = New String((From c In input.ToList() Select c Order By SortChars.IndexOf(c)).ToArray())
Trace.WriteLine(output)
Here is my solution to the question
import java.util.*;
import java.io.*;
class SortString
{
public static void main(String arg[])throws IOException
{
BufferedReader br=new BufferedReader(new InputStreamReader(System.in));
// System.out.println("Enter 1st String :");
// System.out.println("Enter 1st String :");
// String s1=br.readLine();
// System.out.println("Enter 2nd String :");
// String s2=br.readLine();
String s1="tracctor";
String s2="car";
String com="";
String uncom="";
for(int i=0;i<s2.length();i++)
{
if(s1.contains(""+s2.charAt(i)))
{
com=com+s2.charAt(i);
}
}
System.out.println("Com :"+com);
for(int i=0;i<s1.length();i++)
if(!com.contains(""+s1.charAt(i)))
uncom=uncom+s1.charAt(i);
System.out.println("Uncom "+uncom);
System.out.println("Combined "+(com+uncom));
HashMap<String,Integer> h1=new HashMap<String,Integer>();
for(int i=0;i<s1.length();i++)
{
String m=""+s1.charAt(i);
if(h1.containsKey(m))
{
int val=(int)h1.get(m);
val=val+1;
h1.put(m,val);
}
else
{
h1.put(m,new Integer(1));
}
}
StringBuilder x=new StringBuilder();
for(int i=0;i<com.length();i++)
{
if(h1.containsKey(""+com.charAt(i)))
{
int count=(int)h1.get(""+com.charAt(i));
while(count!=0)
{x.append(""+com.charAt(i));count--;}
}
}
x.append(uncom);
System.out.println("Sort "+x);
}
}
Here is my version, which is O(n) in time. Instead of unordered_map, I could have just used a char array of constant size, i.e. char char_count[256] (and done ++char_count[ch - 'a']), assuming the input strings contain only lowercase ASCII characters.
string SortOrder(const string& input, const string& sort_order) {
unordered_map<char, int> char_count;
for (auto ch : input) {
++char_count[ch];
}
string res = "";
for (auto ch : sort_order) {
unordered_map<char, int>::iterator it = char_count.find(ch);
if (it != char_count.end()) {
string s(it->second, it->first);
res += s;
}
}
return res;
}
private static String sort(String target, String reference) {
final Map<Character, Integer> referencesMap = new HashMap<Character, Integer>();
for (int i = 0; i < reference.length(); i++) {
char key = reference.charAt(i);
if (!referencesMap.containsKey(key)) {
referencesMap.put(key, i);
}
}
List<Character> chars = new ArrayList<Character>(target.length());
for (int i = 0; i < target.length(); i++) {
chars.add(target.charAt(i));
}
Collections.sort(chars, new Comparator<Character>() {
@Override
public int compare(Character o1, Character o2) {
return referencesMap.get(o1).compareTo(referencesMap.get(o2));
}
});
StringBuilder sb = new StringBuilder();
for (Character c : chars) {
sb.append(c);
}
return sb.toString();
}
In C# I would just use the IComparer Interface and leave it to Array.Sort
void Main()
{
// we define the IComparer class to define the sort order
var sortOrder = new SortOrder("dfbcae");
var testOrder = "abcdeeabc".ToCharArray();
// sort the array using Array.Sort
Array.Sort(testOrder, sortOrder);
// char[].ToString() would print the type name, so build a string instead
Console.WriteLine(new string(testOrder));
}
public class SortOrder : IComparer
{
string sortOrder;
public SortOrder(string sortOrder)
{
this.sortOrder = sortOrder;
}
public int Compare(object obj1, object obj2)
{
var obj1Index = sortOrder.IndexOf((char)obj1);
var obj2Index = sortOrder.IndexOf((char)obj2);
if(obj1Index == -1 || obj2Index == -1)
{
throw new Exception("character not found");
}
if(obj1Index > obj2Index)
{
return 1;
}
else if (obj1Index == obj2Index)
{
return 0;
}
else
{
return -1;
}
}
}

Is there a circular hash function?

Thinking about this question on testing string rotation, I wondered: is there such a thing as a circular/cyclic hash function? E.g.
h(abcdef) = h(bcdefa) = h(cdefab) etc
Uses for this include scalable algorithms which can check n strings against each other to see where some are rotations of others.
I suppose the essence of the hash is to extract information which is order-specific but not position-specific. Maybe something that finds a deterministic 'first position', rotates to it and hashes the result?
It all seems plausible, but slightly beyond my grasp at the moment; it must be out there already...
I'd go along with your deterministic "first position" - find the "least" character; if it appears twice, use the next character as the tie breaker (etc). You can then rotate to a "canonical" position, and hash that in a normal way. If the tie breakers run for the entire course of the string, then you've got a string which is a rotation of itself (if you see what I mean) and it doesn't matter which you pick to be "first".
So:
"abcdef" => hash("abcdef")
"defabc" => hash("abcdef")
"abaac" => hash("aacab") (tie-break between aa, ac and ab)
"cabcab" => hash("abcabc") (it doesn't matter which "a" comes first!)
Update: As Jon pointed out, the first approach doesn't handle strings with repetition very well. Problems arise as duplicate pairs of letters are encountered and the resulting XOR is 0. Here is a modification that I believe fixes the original algorithm. It uses Euclid-Fermat sequences to generate pairwise coprime integers for each additional occurrence of a character in the string. The result is that the XOR for duplicate pairs is non-zero.
I've also cleaned up the algorithm slightly. Note that the array containing the EF sequences only supports characters in the range 0x00 to 0xFF. This was just a cheap way to demonstrate the algorithm. Also, the algorithm still has runtime O(n) where n is the length of the string.
static int Hash(string s)
{
int H = 0;
if (s.Length > 0)
{
//any arbitrary coprime numbers
int a = s.Length, b = s.Length + 1;
//an array of Euclid-Fermat sequences to generate additional coprimes for each duplicate character occurrence
int[] c = new int[0x100]; // indices 0x00 through 0xFF
for (int i = 1; i < c.Length; i++)
{
c[i] = i + 1;
}
Func<char, int> NextCoprime = (x) => c[x] = (c[x] - x) * c[x] + x;
Func<char, char, int> NextPair = (x, y) => a * NextCoprime(x) * x.GetHashCode() + b * y.GetHashCode();
//for i=0 we need to wrap around to the last character
H = NextPair(s[s.Length - 1], s[0]);
//for i=1...n we use the previous character
for (int i = 1; i < s.Length; i++)
{
H ^= NextPair(s[i - 1], s[i]);
}
}
return H;
}
static void Main(string[] args)
{
Console.WriteLine("{0:X8}", Hash("abcdef"));
Console.WriteLine("{0:X8}", Hash("bcdefa"));
Console.WriteLine("{0:X8}", Hash("cdefab"));
Console.WriteLine("{0:X8}", Hash("cdfeab"));
Console.WriteLine("{0:X8}", Hash("a0a0"));
Console.WriteLine("{0:X8}", Hash("1010"));
Console.WriteLine("{0:X8}", Hash("0abc0def0ghi"));
Console.WriteLine("{0:X8}", Hash("0def0abc0ghi"));
}
The output is now:
7F7D7F7F
7F7D7F7F
7F7D7F7F
7F417F4F
C796C7F0
E090E0F0
A909BB71
A959BB71
First Version (which isn't complete): Use XOR which is commutative (order doesn't matter) and another little trick involving coprimes to combine ordered hashes of pairs of letters in the string. Here is an example in C#:
static int Hash(char[] s)
{
//any arbitrary coprime numbers
const int a = 7, b = 13;
int H = 0;
if (s.Length > 0)
{
//for i=0 we need to wrap around to the last character
H ^= (a * s[s.Length - 1].GetHashCode()) + (b * s[0].GetHashCode());
//for i=1...n we use the previous character
for (int i = 1; i < s.Length; i++)
{
H ^= (a * s[i - 1].GetHashCode()) + (b * s[i].GetHashCode());
}
}
return H;
}
static void Main(string[] args)
{
Console.WriteLine(Hash("abcdef".ToCharArray()));
Console.WriteLine(Hash("bcdefa".ToCharArray()));
Console.WriteLine(Hash("cdefab".ToCharArray()));
Console.WriteLine(Hash("cdfeab".ToCharArray()));
}
The output is:
4587590
4587590
4587590
7077996
You could find a deterministic first position by always starting at the position with the "lowest" (in terms of alphabetical ordering) substring. So in your case, you'd always start at "a". If there were multiple "a"s, you'd have to take two characters into account etc.
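The "lowest rotation" idea can be sketched in Python as follows. This is a deliberately naive illustration: the name `canonical_rotation` is an assumption, and it builds every rotation, so it costs O(n^2) time and memory rather than using a dedicated minimal-rotation algorithm.

```python
def canonical_rotation(s):
    """Return the lexicographically smallest rotation of `s`."""
    if not s:
        return s
    # Ties between equal starting characters are resolved by the
    # ordinary string comparison of the full rotations.
    return min(s[i:] + s[:i] for i in range(len(s)))


if __name__ == "__main__":
    for text in ("abcdef", "defabc", "abaac", "cabcab"):
        print(text, "->", canonical_rotation(text))
```

Hashing the returned string with any ordinary string hash then gives the same value for every rotation of the input.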
I am sure that you could find a function that can generate the same hash regardless of character position in the input, however, how will you ensure that h(abc) != h(efg) for every conceivable input? (Collisions will occur for all hash algorithms, so I mean, how do you minimize this risk.)
You'd need some additional checks even after generating the hash to ensure that the strings contain the same characters.
Here's an implementation using Linq
public string ToCanonicalOrder(string input)
{
char first = input.OrderBy(x => x).First();
string doubledForRotation = input + input;
string canonicalOrder
= (-1)
.GenerateFrom(x => doubledForRotation.IndexOf(first, x + 1))
.Skip(1) // the -1
.TakeWhile(x => x < input.Length)
.Select(x => doubledForRotation.Substring(x, input.Length))
.OrderBy(x => x)
.First();
return canonicalOrder;
}
assuming generic generator extension method:
public static class TExtensions
{
public static IEnumerable<T> GenerateFrom<T>(this T initial, Func<T, T> next)
{
var current = initial;
while (true)
{
yield return current;
current = next(current);
}
}
}
sample usage:
var sequences = new[]
{
"abcdef", "bcdefa", "cdefab",
"defabc", "efabcd", "fabcde",
"abaac", "cabcab"
};
foreach (string sequence in sequences)
{
Console.WriteLine(ToCanonicalOrder(sequence));
}
output:
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
aacab
abcabc
then call .GetHashCode() on the result if necessary.
sample usage if ToCanonicalOrder() is converted to an extension method:
sequence.ToCanonicalOrder().GetHashCode();
One possibility is to combine the hash functions of all circular shifts of your input into one meta-hash which does not depend on the order of the inputs.
More formally, consider
for(int i=0; i<string.length; i++) {
result^=string.rotatedBy(i).hashCode();
}
Where you could replace the ^= with any other commutative operation.
For example, consider the input
"abcd"
to get the hash we take
hash("abcd") ^ hash("dabc") ^ hash("cdab") ^ hash("bcda").
As we can see, taking the hash of any of these rotations will only change the order in which the XOR is evaluated, which won't change its value.
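A minimal Python sketch of this meta-hash is shown below. The choice of `zlib.crc32` as the underlying hash and the name `rotation_hash` are assumptions made for illustration; any deterministic string hash would do.

```python
import zlib


def rotation_hash(s):
    """XOR together the CRC32 of every rotation of `s`."""
    result = 0
    for i in range(len(s)):
        rotated = s[i:] + s[:i]
        result ^= zlib.crc32(rotated.encode("utf-8"))
    return result


if __name__ == "__main__":
    for text in ("abcd", "dabc", "cdab", "bcda", "abab"):
        print(text, format(rotation_hash(text), "08X"))
```

One caveat with XOR specifically: when a string is made of an even number of repeats of a shorter block (for example "abab"), each distinct rotation appears an even number of times and the result cancels to 0. Using addition as the commutative operation avoids that cancellation.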
I did something like this for a project in college. There were 2 approaches I used to try to optimize a Travelling-Salesman problem. I think if the elements are NOT guaranteed to be unique, the second solution would take a bit more checking, but the first one should work.
If you can represent the string as a matrix of associations so abcdef would look like
  a b c d e f
a   x
b     x
c       x
d         x
e           x
f x
But so would any combination of those associations. It would be trivial to compare those matrices.
Another quicker trick would be to rotate the string so that the "first" letter is first. Then if you have the same starting point, the same strings will be identical.
Here is some Ruby code:
def normalize_string(string)
myarray = string.split(//) # split into an array
index = myarray.index(myarray.min) # find the index of the minimum element
index.times do
myarray.push(myarray.shift) # move stuff from the front to the back
end
return myarray.join
end
p normalize_string('abcdef').eql?normalize_string('defabc') # should return true
Maybe use a rolling hash for each offset (RabinKarp like) and return the minimum hash value? There could be collisions though.

Generate list of all possible permutations of a string

How would I go about generating a list of all possible permutations of a string between x and y characters in length, containing a variable list of characters.
Any language would work, but it should be portable.
There are several ways to do this. Common methods use recursion, memoization, or dynamic programming. The basic idea is that you produce a list of all strings of length 1, then in each iteration, for all strings produced in the last iteration, add that string concatenated with each character in the string individually. (the variable index in the code below keeps track of the start of the last and the next iteration)
Some pseudocode:
list = originalString.split('')
index = (0,0)
list = [""]
for iteration n in 1 to y:
index = (index[1], len(list))
for string s in list.subset(index[0] to end):
for character c in originalString:
list.add(s + c)
You'd then need to remove all strings shorter than x. With k = len(originalString), they are the first 1 + k + k^2 + ... + k^(x-1) entries in the list (the empty string, then all strings of length 1, and so on).
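If building the short strings and then deleting them is undesirable, the lengths can be generated directly. The following Python sketch uses `itertools.product` from the standard library; the function name `strings_between` is hypothetical.

```python
from itertools import product


def strings_between(x, y, chars):
    """Yield every string over `chars` whose length is between x and y inclusive."""
    for length in range(x, y + 1):
        # product(..., repeat=length) walks all length-tuples of characters.
        for combo in product(chars, repeat=length):
            yield "".join(combo)


if __name__ == "__main__":
    print(list(strings_between(1, 2, "ab")))
```

Since this is a generator, results are produced on demand, which matters because the number of strings grows exponentially with y.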
It's better to use backtracking
#include <stdio.h>
#include <string.h>
void swap(char *a, char *b) {
char temp;
temp = *a;
*a = *b;
*b = temp;
}
void print(char *a, int i, int n) {
int j;
if(i == n) {
printf("%s\n", a);
} else {
for(j = i; j <= n; j++) {
swap(a + i, a + j);
print(a, i + 1, n);
swap(a + i, a + j);
}
}
}
int main(void) {
char a[100];
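/* gets() is unsafe and was removed in C11; read with fgets instead */
if (fgets(a, sizeof a, stdin) == NULL) return 1;
a[strcspn(a, "\n")] = '\0'; /* drop the trailing newline, if any */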
print(a, 0, strlen(a) - 1);
return 0;
}
You are going to get a lot of strings, that's for sure...
Where x and y is how you define them and r is the number of characters we are selecting from --if I am understanding you correctly. You should definitely generate these as needed and not get sloppy and say, generate a powerset and then filter the length of strings.
The following definitely isn't the best way to generate these, but it's an interesting aside, none-the-less.
Knuth (volume 4, fascicle 2, 7.2.1.3) tells us that (s,t)-combination is equivalent to s+1 things taken t at a time with repetition -- an (s,t)-combination is notation used by Knuth that is equal to . We can figure this out by first generating each (s,t)-combination in binary form (so, of length (s+t)) and counting the number of 0's to the left of each 1.
10001000011101 --> becomes the permutation: {0, 3, 4, 4, 4, 1}
Non recursive solution according to Knuth, Python example:
def nextPermutation(perm):
k0 = None
for i in range(len(perm)-1):
if perm[i]<perm[i+1]:
k0=i
if k0 is None:
return None
l0 = k0+1
for i in range(k0+1, len(perm)):
if perm[k0] < perm[i]:
l0 = i
perm[k0], perm[l0] = perm[l0], perm[k0]
perm[k0+1:] = reversed(perm[k0+1:])
return perm
perm=list("12345")
while perm:
print(perm)
perm = nextPermutation(perm)
You might look at "Efficiently Enumerating the Subsets of a Set", which describes an algorithm to do part of what you want - quickly generate all subsets of N characters from length x to y. It contains an implementation in C.
For each subset, you'd still have to generate all the permutations. For instance if you wanted 3 characters from "abcde", this algorithm would give you "abc","abd", "abe"...
but you'd have to permute each one to get "acb", "bac", "bca", etc.
Some working Java code based on Sarp's answer:
public class permute {
static void permute(int level, String permuted,
boolean used[], String original) {
int length = original.length();
if (level == length) {
System.out.println(permuted);
} else {
for (int i = 0; i < length; i++) {
if (!used[i]) {
used[i] = true;
permute(level + 1, permuted + original.charAt(i),
used, original);
used[i] = false;
}
}
}
}
public static void main(String[] args) {
String s = "hello";
boolean used[] = {false, false, false, false, false};
permute(0, "", used, s);
}
}
Here is a simple solution in C#.
It generates only the distinct permutations of a given string.
static public IEnumerable<string> permute(string word)
{
if (word.Length > 1)
{
char character = word[0];
foreach (string subPermute in permute(word.Substring(1)))
{
for (int index = 0; index <= subPermute.Length; index++)
{
string pre = subPermute.Substring(0, index);
string post = subPermute.Substring(index);
if (post.Contains(character))
continue;
yield return pre + character + post;
}
}
}
else
{
yield return word;
}
}
There are a lot of good answers here. I also suggest a very simple recursive solution in C++.
#include <string>
#include <iostream>
#include <utility> // std::swap
template<typename Consume>
void permutations(std::string s, Consume consume, std::size_t start = 0) {
if (start == s.length()) consume(s);
for (std::size_t i = start; i < s.length(); i++) {
std::swap(s[start], s[i]);
permutations(s, consume, start + 1);
}
}
int main(void) {
std::string s = "abcd";
permutations(s, [](std::string s) {
std::cout << s << std::endl;
});
}
Note: strings with repeated characters will not produce unique permutations.
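One common way around the repeated-character problem, sketched here in Python rather than C++, is to sort the characters first and skip a character whenever an identical one to its left has not been used yet at the current depth. The name unique_permutations is hypothetical, and this is a different technique from the swap-based code above, not a fix to it.

```python
def unique_permutations(s):
    """Return the distinct permutations of s in lexicographic order."""
    chars = sorted(s)
    used = [False] * len(chars)
    out = []

    def extend(prefix):
        if len(prefix) == len(chars):
            out.append("".join(prefix))
            return
        for i, c in enumerate(chars):
            if used[i]:
                continue
            # Skip c if an identical, still-unused character sits just before it:
            # choosing it here would repeat a branch already explored.
            if i > 0 and chars[i - 1] == c and not used[i - 1]:
                continue
            used[i] = True
            prefix.append(c)
            extend(prefix)
            prefix.pop()
            used[i] = False

    extend([])
    return out

print(unique_permutations("aab"))  # ['aab', 'aba', 'baa']
```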
I just whipped this up quick in Ruby:
def perms(x, y, possible_characters)
all = [""]
current_array = all.clone
1.upto(y) { |iteration|
next_array = []
current_array.each { |string|
possible_characters.each { |c|
value = string + c
next_array.insert next_array.length, value
all.insert all.length, value
}
}
current_array = next_array
}
all.delete_if { |string| string.length < x }
end
You might look into language API for built in permutation type functions, and you might be able to write more optimized code, but if the numbers are all that high, I'm not sure there is much of a way around having a lot of results.
Anyways, the idea behind the code is start with string of length 0, then keep track of all the strings of length Z where Z is the current size in the iteration. Then, go through each string and append each character onto each string. Finally at the end, remove any that were below the x threshold and return the result.
I didn't test it with potentially meaningless input (null character list, weird values of x and y, etc).
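For readers who don't use Ruby, here is roughly the same idea as a Python sketch. It assumes x >= 1 and filters by length while building instead of deleting short strings at the end; the name strings_between is made up for this example.

```python
def strings_between(x, y, chars):
    """Return all strings over chars whose length is between x and y inclusive."""
    current = [""]   # every string of the length reached so far
    kept = []
    for length in range(1, y + 1):
        # Append each character to each string from the previous pass.
        current = [s + c for s in current for c in chars]
        if length >= x:
            kept.extend(current)
    return kept

print(strings_between(1, 2, "ab"))  # ['a', 'b', 'aa', 'ab', 'ba', 'bb']
```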
This is a translation of Mike's Ruby version, into Common Lisp:
(defun perms (x y original-string)
(loop with all = (list "")
with current-array = (list "")
for iteration from 1 to y
do (loop with next-array = nil
for string in current-array
do (loop for c across original-string
for value = (concatenate 'string string (string c))
do (push value next-array)
(push value all))
(setf current-array (reverse next-array)))
finally (return (nreverse (delete-if #'(lambda (el) (< (length el) x)) all)))))
And another version, slightly shorter and using more loop facility features:
(defun perms (x y original-string)
(loop repeat y
collect (loop for string in (or (car (last sets)) (list ""))
append (loop for c across original-string
collect (concatenate 'string string (string c)))) into sets
finally (return (loop for set in sets
append (loop for el in set when (>= (length el) x) collect el)))))
Here is a simple word C# recursive solution:
Method:
public ArrayList CalculateWordPermutations(string[] letters, ArrayList words, int index)
{
bool finished = true;
ArrayList newWords = new ArrayList();
if (words.Count == 0)
{
foreach (string letter in letters)
{
words.Add(letter);
}
}
for(int j=index; j<words.Count; j++)
{
string word = (string)words[j];
for(int i =0; i<letters.Length; i++)
{
if(!word.Contains(letters[i]))
{
finished = false;
string newWord = (string)word.Clone();
newWord += letters[i];
newWords.Add(newWord);
}
}
}
foreach (string newWord in newWords)
{
words.Add(newWord);
}
if(finished == false)
{
CalculateWordPermutations(letters, words, words.Count - newWords.Count);
}
return words;
}
Calling:
string[] letters = new string[]{"a","b","c"};
ArrayList words = CalculateWordPermutations(letters, new ArrayList(), 0);
... and here is the C version:
void permute(const char *s, char *out, int *used, int len, int lev)
{
if (len == lev) {
out[lev] = '\0';
puts(out);
return;
}
int i;
for (i = 0; i < len; ++i) {
if (used[i]) /* skip characters that are already placed in out */
continue;
used[i] = 1;
out[lev] = s[i];
permute(s, out, used, len, lev + 1);
used[i] = 0;
}
return;
}
permute (ABC) -> A.perm(BC) -> A.perm[B.perm(C)] -> A.perm[(*BC), (CB*)] -> [(*ABC), (BAC), (BCA*), (*ACB), (CAB), (CBA*)]
To remove duplicates, when inserting each letter check whether the part of the string before the insertion point ends with the same letter (why? Left as an exercise).
public static void main(String[] args) {
for (String str : permStr("ABBB")){
System.out.println(str);
}
}
static Vector<String> permStr(String str){
if (str.length() == 1){
Vector<String> ret = new Vector<String>();
ret.add(str);
return ret;
}
char start = str.charAt(0);
Vector<String> endStrs = permStr(str.substring(1));
Vector<String> newEndStrs = new Vector<String>();
for (String endStr : endStrs){
for (int j = 0; j <= endStr.length(); j++){
if (endStr.substring(0, j).endsWith(String.valueOf(start)))
break;
newEndStrs.add(endStr.substring(0, j) + String.valueOf(start) + endStr.substring(j));
}
}
return newEndStrs;
}
Prints all permutations sans duplicates
Recursive solution in C++
#include <iostream>
#include <string>
using namespace std;
// Forward declaration so main can call permute before its definition.
void permute(int level, string permuted, bool used[], string &original);
int main (int argc, char * const argv[]) {
    string s = "sarp";
    bool used[4] = {false}; // one flag per character, all initially unused
    permute(0, "", used, s);
}
void permute(int level, string permuted, bool used [], string &original) {
    int length = original.length();
    if(level == length) { // permutation complete, display
        cout << permuted << endl;
    } else {
        for(int i=0; i<length; i++) { // try to add an unused character
            if(!used[i]) {
                used[i] = true;
                permute(level+1, original[i] + permuted, used, original); // find the permutations that extend this string
                used[i] = false;
            }
        }
    }
}
In Perl, if you want to restrict yourself to the lowercase alphabet, you can do this:
my @result = ("a" .. "zzzz");
This gives all possible strings between 1 and 4 characters using lowercase characters. For uppercase, change "a" to "A" and "zzzz" to "ZZZZ".
For mixed-case it gets much harder, and probably not doable with one of Perl's builtin operators like that.
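For what it's worth, the mixed-case variant is not hard if you step outside Perl's range operator. A Python sketch, assuming the standard itertools.product and a made-up helper name all_strings, generates the strings lazily for any alphabet:

```python
from itertools import product
import string

def all_strings(alphabet, max_len):
    """Lazily yield every string over alphabet with length 1..max_len."""
    for length in range(1, max_len + 1):
        for combo in product(alphabet, repeat=length):
            yield "".join(combo)

mixed = string.ascii_lowercase + string.ascii_uppercase
gen = all_strings(mixed, 4)
print([next(gen) for _ in range(3)])  # ['a', 'b', 'c']
```

Because it is a generator, nothing is held in memory beyond the string currently being produced.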
Ruby answer that works:
class String
def each_char_with_index
0.upto(size - 1) do |index|
yield(self[index..index], index)
end
end
def remove_char_at(index)
return self[1..-1] if index == 0
self[0..(index-1)] + self[(index+1)..-1]
end
end
def permute(str, prefix = '')
if str.size == 0
puts prefix
return
end
str.each_char_with_index do |char, index|
permute(str.remove_char_at(index), prefix + char)
end
end
# example
# permute("abc")
The following Java recursion prints all permutations of a given string:
//call it as permut("",str);
public void permut(String str1,String str2){
if(str2.length() != 0){
char ch = str2.charAt(0);
for(int i = 0; i <= str1.length();i++)
permut(str1.substring(0,i) + ch + str1.substring(i,str1.length()),
str2.substring(1,str2.length()));
}else{
System.out.println(str1);
}
}
Following is an updated version of the above "permut" method, which makes n! (n factorial) fewer recursive calls than the method above:
//call it as permut("",str);
public void permut(String str1,String str2){
if(str2.length() > 1){
char ch = str2.charAt(0);
for(int i = 0; i <= str1.length();i++)
permut(str1.substring(0,i) + ch + str1.substring(i,str1.length()),
str2.substring(1,str2.length()));
}else{
char ch = str2.charAt(0);
for(int i = 0; i <= str1.length();i++)
System.out.println(str1.substring(0,i) + ch + str1.substring(i,str1.length()));
}
}
import java.util.*;
public class all_subsets {
public static void main(String[] args) {
String a = "abcd";
for(String s: all_perm(a)) {
System.out.println(s);
}
}
public static Set<String> concat(String c, Set<String> lst) {
HashSet<String> ret_set = new HashSet<String>();
for(String s: lst) {
ret_set.add(c+s);
}
return ret_set;
}
public static HashSet<String> all_perm(String a) {
HashSet<String> set = new HashSet<String>();
if(a.length() == 1) {
set.add(a);
} else {
for(int i=0; i<a.length(); i++) {
set.addAll(concat(a.charAt(i)+"", all_perm(a.substring(0, i)+a.substring(i+1, a.length()))));
}
}
return set;
}
}
I'm not sure why you would want to do this in the first place. The resulting set for any moderately large values of x and y will be huge, and will grow exponentially as x and/or y get bigger.
Let's say your set of possible characters is the 26 lowercase letters of the alphabet, and you ask your application to generate all permutations where length = 5. Assuming you don't run out of memory you'll get 11,881,376 (i.e. 26 to the power of 5) strings back. Bump that length up to 6, and you'll get 308,915,776 strings back. These numbers get painfully large, very quickly.
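The growth is easy to check before generating anything. A small Python sketch (count_strings is a name invented for this example) adds up the alphabet size raised to each length in the requested range:

```python
def count_strings(alphabet_size, lower, upper):
    """Number of strings over the alphabet with length lower..upper inclusive."""
    return sum(alphabet_size ** length for length in range(lower, upper + 1))

print(count_strings(26, 5, 5))  # 11881376
print(count_strings(26, 6, 6))  # 308915776
print(count_strings(26, 1, 6))  # 321272406
```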
Here's a solution I put together in Java. You'll need to provide two runtime arguments (corresponding to x and y). Have fun.
public class GeneratePermutations {
public static void main(String[] args) {
int lower = Integer.parseInt(args[0]);
int upper = Integer.parseInt(args[1]);
if (upper < lower || upper == 0 || lower == 0) {
System.exit(0);
}
for (int length = lower; length <= upper; length++) {
generate(length, "");
}
}
private static void generate(int length, String partial) {
if (length <= 0) {
System.out.println(partial);
} else {
for (char c = 'a'; c <= 'z'; c++) {
generate(length - 1, partial + c);
}
}
}
}
Here's a non-recursive version I came up with, in javascript.
It's not based on Knuth's non-recursive one above, although it has some similarities in element swapping.
I've verified its correctness for input arrays of up to 8 elements.
A quick optimization would be pre-flighting the out array and avoiding push().
The basic idea is:
1. Given a single source array, generate a first new set of arrays which swap the first element with each subsequent element in turn, each time leaving the other elements unperturbed. E.g. given 1234, generate 1234, 2134, 3214, 4231.
2. Use each array from the previous pass as the seed for a new pass, but instead of swapping the first element, swap the second element with each subsequent element. Also, this time, don't include the original array in the output.
3. Repeat step 2 until done.
Here is the code sample:
function oxe_perm(src, depth, index)
{
var perm = src.slice(); // duplicates src.
perm = perm.split("");
perm[depth] = src[index];
perm[index] = src[depth];
perm = perm.join("");
return perm;
}
function oxe_permutations(src)
{
var out = []; // declared with var to avoid creating implicit globals
out.push(src);
for (var depth = 0; depth < src.length; depth++) {
var numInPreviousPass = out.length;
for (var m = 0; m < numInPreviousPass; ++m) {
for (var n = depth + 1; n < src.length; ++n) {
out.push(oxe_perm(out[m], depth, n));
}
}
}
return out;
}
In ruby:
str = "a"
100_000_000.times {puts str.next!}
It is quite fast, but it is going to take some time =). Of course, you can start at "aaaaaaaa" if the short strings aren't interesting to you.
I might have misinterpreted the actual question though: in one of the posts it sounded as if you just needed a brute-force library of strings, but in the main question it sounds like you need to permute a particular string.
Your problem is somewhat similar to this one: http://beust.com/weblog/archives/000491.html (list all integers in which none of the digits repeat themselves, which resulted in a whole lot of languages solving it, with the ocaml guy using permutations, and some java guy using yet another solution).
I needed this today, and although the answers already given pointed me in the right direction, they weren't quite what I wanted.
Here's an implementation using Heap's method. The length of the array must be at least 3 and for practical considerations not be bigger than 10 or so, depending on what you want to do, patience and clock speed.
Before you enter your loop, initialise Perm(1 To N) with the first permutation, Stack(3 To N) with zeroes*, and Level with 2**. At the end of the loop call NextPerm, which will return false when we're done.
* VB will do that for you.
** You can change NextPerm a little to make this unnecessary, but it's clearer like this.
Option Explicit
Function NextPerm(Perm() As Long, Stack() As Long, Level As Long) As Boolean
Dim N As Long
If Level = 2 Then
Swap Perm(1), Perm(2)
Level = 3
Else
While Stack(Level) = Level - 1
Stack(Level) = 0
If Level = UBound(Stack) Then Exit Function
Level = Level + 1
Wend
Stack(Level) = Stack(Level) + 1
If Level And 1 Then N = 1 Else N = Stack(Level)
Swap Perm(N), Perm(Level)
Level = 2
End If
NextPerm = True
End Function
Sub Swap(A As Long, B As Long)
A = A Xor B
B = A Xor B
A = A Xor B
End Sub
'This is just for testing.
Private Sub Form_Paint()
Const Max = 8
Dim A(1 To Max) As Long, I As Long
Dim S(3 To Max) As Long, J As Long
Dim Test As New Collection, T As String
For I = 1 To UBound(A)
A(I) = I
Next
Cls
ScaleLeft = 0
J = 2
Do
If CurrentY + TextHeight("0") > ScaleHeight Then
ScaleLeft = ScaleLeft - TextWidth(" 0 ") * (UBound(A) + 1)
CurrentY = 0
CurrentX = 0
End If
T = vbNullString
For I = 1 To UBound(A)
Print A(I);
T = T & Hex(A(I))
Next
Print
Test.Add Null, T
Loop While NextPerm(A, S, J)
J = 1
For I = 2 To UBound(A)
J = J * I
Next
If J <> Test.Count Then Stop
End Sub
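For comparison, here is a minimal Python sketch of Heap's method in its commonly published iterative form. It is not a line-by-line translation of the VB code above (it has no minimum length, for one), and the name heap_permutations is made up for this example.

```python
def heap_permutations(items):
    """Yield all permutations of items using the iterative form of Heap's algorithm."""
    a = list(items)
    n = len(a)
    c = [0] * n          # c[i] counts the swaps already done at level i
    yield tuple(a)
    i = 1
    while i < n:
        if c[i] < i:
            if i % 2 == 0:
                a[0], a[i] = a[i], a[0]        # even level: swap with the first element
            else:
                a[c[i]], a[i] = a[i], a[c[i]]  # odd level: swap with element c[i]
            yield tuple(a)
            c[i] += 1
            i = 1                              # restart from the lowest level
        else:
            c[i] = 0
            i += 1

for p in heap_permutations("abc"):
    print("".join(p))
```

Each new permutation differs from the previous one by a single swap, which is what makes the method cheap per step.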
Other methods are described by various authors. Knuth describes two, one gives lexical order, but is complex and slow, the other is known as the method of plain changes. Jie Gao and Dianjun Wang also wrote an interesting paper.
Here is a link that describes how to print permutations of a string.
http://nipun-linuxtips.blogspot.in/2012/11/print-all-permutations-of-characters-in.html
This Python code, with allowed_chars set to ['0', '1'] and a length of 4 characters, generates 2^4 = 16 results:
['0000', '0001', '0010', '0011', '0100', '0101', '0110', '0111', '1000', '1001', '1010', '1011', '1100', '1101', '1110', '1111']
def generate_permutations(chars = 4):
    # modify if in need!
    allowed_chars = [
        '0',
        '1',
    ]
    # status[i] is the index into allowed_chars for position i (it works like an odometer)
    status = []
    for tmp in range(chars):
        status.append(0)
    last_char = len(allowed_chars)
    rows = []
    for x in range(last_char ** chars):
        rows.append("")
        for y in range(chars - 1, -1, -1):
            key = status[y]
            rows[x] = allowed_chars[key] + rows[x]
        # advance the odometer by one, carrying to the left when a position wraps
        for pos in range(chars - 1, -1, -1):
            if status[pos] == last_char - 1:
                status[pos] = 0
            else:
                status[pos] += 1
                break
    return rows
print(generate_permutations())
Hope this is of use to you. Works with any character, not only numbers
Many of the previous answers used backtracking. This is the asymptotically optimal way O(n*n!) of generating permutations after initial sorting
import java.util.Arrays;
class Permutation {
/* runtime -O(n) for generating nextPermutaion
* and O(n*n!) for generating all n! permutations with increasing sorted array as start
* return true, if there exists next lexicographical sequence
* e.g [a,b,c],3-> true, modifies array to [a,c,b]
* e.g [c,b,a],3-> false, as it is largest lexicographic possible */
public static boolean nextPermutation(char[] seq, int len) {
// 1
if (len <= 1)
return false;// no more perm
// 2: Find last j such that seq[j] < seq[j+1]. Terminate if no such j exists
int j = len - 2;
while (j >= 0 && seq[j] >= seq[j + 1]) {
--j;
}
if (j == -1)
return false;// no more perm
// 3: Find last l such that seq[j] < seq[l], then exchange elements j and l
int l = len - 1;
while (seq[j] >= seq[l]) {
--l;
}
swap(seq, j, l);
// 4: Reverse elements j+1 ... count-1:
reverseSubArray(seq, j + 1, len - 1);
// return seq, add store next perm
return true;
}
private static void swap(char[] a, int i, int j) {
char temp = a[i];
a[i] = a[j];
a[j] = temp;
}
private static void reverseSubArray(char[] a, int lo, int hi) {
while (lo < hi) {
swap(a, lo, hi);
++lo;
--hi;
}
}
public static void main(String[] args) {
String str = "abcdefg";
char[] array = str.toCharArray();
Arrays.sort(array);
int cnt=0;
do {
System.out.println(new String(array));
cnt++;
}while(nextPermutation(array, array.length));
System.out.println(cnt);//5040=7!
}
//if we use "bab"-> "abb", "bab", "bba", 3(#permutations)
}
Recursive Approach
func StringPermutations(inputStr string) (permutations []string) {
for i := 0; i < len(inputStr); i++ {
inputStr = inputStr[1:] + inputStr[0:1]
if len(inputStr) <= 2 {
permutations = append(permutations, inputStr)
continue
}
leftPermutations := StringPermutations(inputStr[0 : len(inputStr)-1])
for _, leftPermutation := range leftPermutations {
permutations = append(permutations, leftPermutation+inputStr[len(inputStr)-1:])
}
}
return
}
Though this doesn't answer your question exactly, here's one way to generate every permutation of the letters from a number of strings of the same length: eg, if your words were "coffee", "joomla" and "moodle", you can expect output like "coodle", "joodee", "joffle", etc.
Basically, the number of combinations is the (number of words) to the power of (number of letters per word). So, choose a random number between 0 and the number of combinations - 1, convert that number to base (number of words), then use each digit of that number as the indicator for which word to take the next letter from.
eg: in the above example. 3 words, 6 letters = 729 combinations. Choose a random number: 465. Convert to base 3: 122020. Take the first letter from word 1, 2nd from word 2, 3rd from word 2, 4th from word 0... and you get... "joofle".
If you wanted all the permutations, just loop from 0 to 728. Of course, if you're just choosing one random value, a much simpler, less confusing way would be to loop over the letters. This method lets you avoid recursion, should you want all the permutations, plus it makes you look like you know Maths(tm)!
If the number of combinations is excessive, you can break it up into a series of smaller words and concatenate them at the end.
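The base-conversion trick described above can be sketched in Python as follows. The function name mix_words is made up for this example, and it assumes all words have the same length.

```python
def mix_words(words, index):
    """Build one mixed word: digit i of index, written in base len(words),
    says which word supplies letter i."""
    base = len(words)
    length = len(words[0])   # assumes every word has this length
    digits = []
    for _ in range(length):
        index, digit = divmod(index, base)
        digits.append(digit)
    digits.reverse()         # most significant digit first
    return "".join(words[d][pos] for pos, d in enumerate(digits))

words = ["coffee", "joomla", "moodle"]
print(mix_words(words, 465))  # joofle  (465 is 122020 in base 3)
```

Looping index from 0 to len(words) ** len(words[0]) - 1 gives every combination. Note that the results are not necessarily distinct: in this example all three words have "o" as their second letter.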
c# iterative:
public List<string> Permutations(char[] chars)
{
List<string> words = new List<string>();
words.Add(chars[0].ToString());
for (int i = 1; i < chars.Length; ++i)
{
int currLen = words.Count;
for (int j = 0; j < currLen; ++j)
{
var w = words[j];
for (int k = 0; k <= w.Length; ++k)
{
var nstr = w.Insert(k, chars[i].ToString());
if (k == 0)
words[j] = nstr;
else
words.Add(nstr);
}
}
}
return words;
}
def gen( x,y,list): #to generate all strings inserting y at different positions
    list = []
    list.append( y+x )
    for i in range( len(x) ):
        list.append( func(x,0,i) + y + func(x,i+1,len(x)-1) )
    return list
def func( x,i,j ): #returns x[i..j]
    z = ''
    for i in range(i,j+1):
        z = z+x[i]
    return z
def perm( x , length , list ): #perm function
    if length == 1 : # base case
        list.append( x[len(x)-1] )
        return list
    else:
        lists = perm( x , length-1 ,list )
        lists_temp = lists #temporarily storing the list
        lists = []
        for i in range( len(lists_temp) ) :
            list_temp = gen(lists_temp[i],x[length-2],lists)
            lists += list_temp
        return lists
def permutation(str)
posibilities = []
str.split('').each do |char|
if posibilities.size == 0
posibilities[0] = char.downcase
posibilities[1] = char.upcase
else
posibilities_count = posibilities.length
posibilities = posibilities + posibilities
posibilities_count.times do |i|
posibilities[i] += char.downcase
posibilities[i+posibilities_count] += char.upcase
end
end
end
posibilities
end
Here is my take on a non recursive version

Resources