What is the optimal algorithm to generate all possible combinations of a string?

I find the other similar questions too complicated.
I think it means that if we are given pot, then the combinations will be
pot
opt
top
pot
pto
pot
So I wrote the following code:
#include<iostream>
#include<string.h>
using namespace std;
int main(){
char s[10];
char temp;
cin>>s;
char t[10];
for(int i=0;i<3;i++)
{
for(int j=i;j<3;j++)
{
strcpy(t,s);
temp=s[i];
s[i]=s[j];
s[j]=temp;
cout<<s<<"\n";
strcpy(s,t);
}
}
}
Is there a better way?

This problem inherently has O(N!) (factorial) complexity. The reason is that, for each position of each potential word, the number of characters that can fill that position decreases by one. Here is an example with 4 letters: a, b, c, and d.
-----------------
Positions: | 0 | 1 | 2 | 3 |
-----------------
In position 0, there are 4 possibilities, a, b, c, or d
Lets fill with a
-----------------
String: | a | | | |
-----------------
Now Position 1 has 3 possibilities of fill letters b, c, or d
Lets fill with b
-----------------
String: | a | b | | |
-----------------
Now Position 2 has 2 possibilities of fill letters c, or d
Lets fill with c
-----------------
String: | a | b | c | |
-----------------
Now Position 3 has only 1 possibility for a fill letter: d
-----------------
String: | a | b | c | d |
-----------------
This is only 1 string. The complexity comes from the number of possibilities that can fill each character location of a given output word, thus:
4 * 3 * 2 * 1 = 4!
This can be extended to any number of input letters and is exactly N! if there are no repeated letters. This is also the NUMBER OF WORDS you should end up with.
Code to perform something like this could be (TESTED AND WORKING IN C):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void printPermutations(int level, const char * inString, char * outString){
    size_t len = strlen(inString);
    char * unusedLetters;
    size_t startLetter, i, j;
    if( 0 == len ) return; // nothing to permute
    if( 1 == len ){
        // base case: outString holds `level` chosen letters, inString the last one
        printf("%.*s%s\n", level, outString, inString);
    }else{
        // room for the len - 1 remaining letters plus the terminating '\0'
        unusedLetters = (char *)malloc(len);
        if( unusedLetters == NULL ) return;
        unusedLetters[len - 1] = '\0';
        for(startLetter = 0; startLetter < len; startLetter++){
            outString[level] = inString[startLetter];
            // setup the "rest of string" string
            j = 0;
            for(i = 0; i < len; i++){
                if( i != startLetter ){
                    unusedLetters[j] = inString[i];
                    j++;
                }
            }
            // recursive call to THIS routine
            printPermutations(level+1, unusedLetters, outString);
        }
        free(unusedLetters);
    }
}

int main(int argc, char * argv[]){
    size_t len;
    char * outString;
    if(argc != 2) return 0;
    len = strlen(argv[1]);
    outString = (char *)malloc(len + 1);
    if(outString == NULL) return 1;
    outString[len] = '\0';
    printPermutations(0, argv[1], outString);
    free(outString);
    return 0;
}
From outside, call this as follows:
projectName abc
sample output from using "abc"
abc
acb
bac
bca
cab
cba
If there are repeated letters, let's say a, a, b, c,
then there will ALWAYS be repeated words.
In these cases, the number of UNIQUE result words is N! divided by the factorial of each letter's repeat count, so for the above case it would be 4!/2! = 12, not 4! = 24.
The reason is that it does not matter WHICH of the a's fills a given spot, so each unique word is generated 2! times. This is also a hard problem, and in some ways I would say you should generate ALL N! words first, then run a second algorithm to find and delete the repeated words. There may be smarter ways of generating the unique words on the fly.
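One way to generate only the unique words on the fly is to let the standard library step through the permutations in sorted order. The following is a minimal sketch, not part of the answer above; the helper name uniquePermutations is made up for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns every distinct permutation of `letters`
// in lexicographic order.
std::vector<std::string> uniquePermutations(std::string letters) {
    // std::next_permutation visits each distinct arrangement exactly once
    // when it starts from the sorted (smallest) arrangement.
    std::sort(letters.begin(), letters.end());
    std::vector<std::string> result;
    do {
        result.push_back(letters);
    } while (std::next_permutation(letters.begin(), letters.end()));
    return result;
}
```

Calling uniquePermutations("aabc") yields 12 words, matching 4!/2!, while uniquePermutations("abc") yields the 6 words listed earlier.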

The following solution is O(N!). It takes repetitions into account too:
#include<stdio.h>
void permute(char s[10],char *p);
int count=0;
int main(){
char s[10];
int i;
scanf("%9s",s);
permute(s,s);
}
//takes repetition into account
void permute(char s[10],char *p){
char *swap,temp;
if(*(p+1)==0) {
count++;
printf("%4d] %s\n",count,s);
}
else{
for(swap=p;*swap;++swap){
char *same;
for(same=p;*same!=*swap;++same){};
if(same==swap){
temp=*swap;
*swap=*p;
*p=temp;
permute(s,p+1);
*p=*swap;/*restoring the original string*/
*swap=temp;
}
}
}
}

Related

How to calculate the number of neighbors of a string with exact and at most d mismatches?

Given a string and an alphabet of four letters (A, B, C, D) for generating strings of length n, I need a general mathematical formula for the number of neighbors of any string of length n with at most d mismatches, and the number of neighbors with exactly d mismatches.
For example: given the string "AAA" and d=3
We have 9 strings with exactly d=1
BAA
CAA
DAA
ABA
ACA
ADA
AAB
AAC
AAD
We have 27 Strings with exactly d=2
BBA BCA BDA
BAB BAC BAD
CBA CCA CDA
CAB CAC CAD
DBA DCA DDA
DAB DAC DAD
ABB ABC ABD
ACB ACC ACD
ADB ADC ADD
We have 27 Strings with exactly d=3
BBB CBB DBB
BCB CCB DCB
BDB CDB DDB
BBC CBC DBC
BCC CCC DCC
BDC CDC DDC
BBD CBD DBD
BCD CCD DCD
BDD CDD DDD
The number of strings with at most d=3 mismatches is 9+27+27=63.
Let's consider a string of size n.
We want to know how many 'neighbors' this string has at distance d. The first thing to note is that, with your definition of 'distance', we must choose d characters among the n of the string and modify them. So there are n choose d possible sets of characters to modify.
Each of these characters can be modified in 3 different ways (since the size of the alphabet is 4).
So ultimately, we have:
n choose d possible sets of characters that will be modified
d characters will be modified, and each of them can be modified in 3 different ways.
So the formula is ultimately (s - 1) ^ d * (n choose d), where s is the size of the alphabet (here 4). I'll let you verify that it matches the examples you provided.
If you want to try it out:
#include <iostream>
#include <string>
using namespace std;
int n = 3; int d = 2;
string s = "AAA";
int counter(string curr, int index, int currd){
if(currd == 0 || index == n){
cout<<curr<<s.substr(index, n - index)<<endl;
return 1;
}
int ans = 0;
for(char c = 'A'; c < 'E'; c++){
if(c != s[index]){
ans += counter(curr + c, index + 1, currd - 1);
}
else{
ans += counter(curr + c, index + 1, currd);
}
}
return ans;
}
int main(){
cout<<"answer = "<<counter("", 0, d) - 1;
}
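The closed form (s - 1) ^ d * (n choose d) can also be evaluated directly instead of enumerating strings. This is a small sketch under the question's convention that "at most d" excludes the string itself (d = 0); the function names choose, exactlyD and atMostD are made up for illustration.

```cpp
#include <cstdint>

// "n choose k", built up so that every intermediate value is itself a
// binomial coefficient and the division is exact.
std::uint64_t choose(unsigned n, unsigned k) {
    if (k > n) return 0;
    std::uint64_t result = 1;
    for (unsigned i = 1; i <= k; ++i) result = result * (n - k + i) / i;
    return result;
}

// Number of strings with exactly d mismatches: (s - 1)^d * (n choose d).
std::uint64_t exactlyD(unsigned n, unsigned d, unsigned s) {
    std::uint64_t ways = choose(n, d);
    for (unsigned i = 0; i < d; ++i) ways *= (s - 1);
    return ways;
}

// Number of strings with between 1 and d mismatches (assumption: the
// original string, at distance 0, is not counted, as in the question).
std::uint64_t atMostD(unsigned n, unsigned d, unsigned s) {
    std::uint64_t total = 0;
    for (unsigned k = 1; k <= d && k <= n; ++k) total += exactlyD(n, k, s);
    return total;
}
```

For n = 3 and s = 4 this gives 9, 27 and 27 for d = 1, 2 and 3, and 63 for at most 3 mismatches, in line with the lists in the question.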

Replacing and deleting a character from a string in c++?

This program gives the wrong output. Basically, I want to remove the specified character and replace it with 'g'. For example, given All that glitters is not gold, if the user enters o then the output should be All that glitters is ngt ggld, but the program deletes all the characters from n onwards.
#include <iostream>
using namespace std;
int main()
{
string input(" ALL GLItters are not gold");
char a;
cin>>a;
for(int i=0;i<input.size();i++)
{
if(input.at(i)==a)
{
input.erase(i,i+1);
input.insert(i,"g");
}
}
cout<<"\n";
cout<<input;
}
string& erase (size_t pos = 0, size_t len = npos);
The second parameter (len) is the number of characters to erase.
You have to pass 1, not i+1:
input.erase(i,1);
http://www.cplusplus.com/reference/string/string/erase/
Why not replace it directly? Replace your for loop with this:
for (char& c : input)
{
if (c == a)
c = 'g';
}
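The same in-place replacement can be written with std::replace from the standard library. A minimal sketch; the wrapper name replaceChar is made up for illustration.

```cpp
#include <algorithm>
#include <string>

// Hypothetical wrapper: returns a copy of `input` in which every
// occurrence of `from` has been replaced by `to`.
std::string replaceChar(std::string input, char from, char to) {
    std::replace(input.begin(), input.end(), from, to);
    return input;
}
```

For instance, replaceChar("All that glitters is not gold", 'o', 'g') returns "All that glitters is ngt ggld". No erase or insert is needed because the string length never changes.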

Remove occurrences of substring recursively

Here's a problem:
Given a string A and a substring B, remove the first occurrence of B in A repeatedly until no occurrence remains. Note that removing a substring can create a new occurrence of the same substring. For example, removing 'hell' from 'hehelllloworld' once yields 'helloworld', which after one more removal becomes 'oworld', the desired string.
Write a program for the above, for input constraints of length 10^6 for A and length 100 for B.
I was asked this question in an interview. I gave them a simple algorithm that did exactly what the statement says, removing occurrences iteratively (to reduce function call overhead). I later learned that there is a much faster solution. What would it be? I've thought of a few optimizations, but it's still not as fast as the fastest solution to the problem (according to the company). Can anyone tell me a faster way to solve it?
P.S. I know the Stack Overflow rules and that including code is better, but for this problem I don't think code would be beneficial in any way...
Your approach has a pretty bad complexity. In a very bad case the string a will be aaaaaaaaabbbbbbbbb and the string b will be ab, in which case you will need O(|a|) searches, each taking O(|a| + |b|) (assuming a sophisticated search algorithm), for a total complexity of O(|a|^2 + |a| * |b|). With their constraints that is on the order of 10^12 operations, which is far too slow.
For their constraints a good complexity to aim for would be O(|a| * |b|), which is around 100 million operations and will finish in under a second. Here's one way to approach it. For each position i in the string a, let's compute the largest length n_i such that a[i - n_i : i] = b[0 : n_i] (in other words, the longest suffix of a ending at that position which is a prefix of b). We can compute it in O(|a| + |b|) using the Knuth-Morris-Pratt algorithm.
After we have n_i computed, finding the first occurrence of b in a is just a matter of finding the first n_i that is equal to |b|. This will be the right end of one of the occurrences of b in a.
Finally, we will need to modify Knuth-Morris-Pratt slightly. We will be logically removing occurrences of b as soon as we compute an n_i that is equal to |b|. To account for the fact that some letters were removed from a we will rely on the fact that Knuth-Morris-Pratt only relies on the last value of n_i (and those computed for b), and the current letter of a, so we just need a fast way of retrieving the last value of n_i after we logically remove an occurrence of b. That can be done with a deque, that stores all the valid values of n_i. Each value will be pushed into the deque once, and popped from it once, so that complexity of maintaining it is O(|a|), while the complexity of the Knuth-Morris-Pratt is O(|a| + |b|), resulting in O(|a| + |b|) total complexity.
Here's a C++ implementation. It could have some off-by-one errors, but it works on your sample, and it flies for the worst case that I described at the beginning.
#include <deque>
#include <string>
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
int main() {
string a, b;
cin >> a >> b;
size_t blen = b.size();
// make a = b$a
a = b + "$" + a;
vector<size_t> n(a.size()); // array for knuth-morris-pratt
vector<bool> removals(a.size()); // positions of right ends at which we remove `b`s
deque<size_t> lastN;
n[0] = 0;
// For the first blen + 1 iterations just do vanilla knuth-morris-pratt
for (size_t i = 1; i < blen + 1; ++ i) {
size_t z = n[i - 1];
while (z && a[i] != a[z]) {
z = n[z - 1];
}
if (a[i] != a[z]) n[i] = 0;
else n[i] = z + 1;
lastN.push_back(n[i]);
}
// For the remaining iterations some characters could have been logically
// removed from `a`, so use lastN to get the last value of n instead
// of actually getting it from `n[i - 1]`
for (size_t i = blen + 1; i < a.size(); ++ i) {
size_t z = lastN.back();
while (z && a[i] != a[z]) {
z = n[z - 1];
}
if (a[i] != a[z]) n[i] = 0;
else n[i] = z + 1;
if (n[i] == blen) // found a match
{
removals[i] = true;
// kill last |b| - 1 `n_i`s
for (size_t j = 0; j < blen - 1; ++ j) {
lastN.pop_back();
}
}
else {
lastN.push_back(n[i]);
}
}
string ret;
size_t toRemove = 0;
for (size_t pos = a.size() - 1; a[pos] != '$'; -- pos) {
if (removals[pos]) toRemove += blen;
if (toRemove) -- toRemove;
else ret.push_back(a[pos]);
}
reverse(ret.begin(), ret.end());
cout << ret << endl;
return 0;
}
[in] hehelllloworld
[in] hell
[out] oworld
[in] abababc
[in] ababc
[out] ab
[in] caaaaa ... aaaaaabbbbbb ... bbbbc
[in] ab
[out] cc
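The same idea can be sketched more compactly by keeping the surviving characters in a string used as a stack, together with the Knuth-Morris-Pratt state reached after each of them. This is an illustrative variant, not the code above; the function name removeAll is made up, and it assumes that taking the leftmost match each time is what the problem asks for.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: repeatedly removes the first occurrence of `b`
// from `a` in O(|a| + |b|) time.
std::string removeAll(const std::string& a, const std::string& b) {
    if (b.empty()) return a;
    // Standard KMP failure function for b.
    std::vector<std::size_t> fail(b.size(), 0);
    for (std::size_t i = 1; i < b.size(); ++i) {
        std::size_t z = fail[i - 1];
        while (z && b[i] != b[z]) z = fail[z - 1];
        fail[i] = (b[i] == b[z]) ? z + 1 : 0;
    }
    std::string out;                 // surviving characters, used as a stack
    std::vector<std::size_t> state;  // state[k]: matched prefix length after out[k]
    for (char c : a) {
        // Resume from the state of the last surviving character.
        std::size_t z = state.empty() ? 0 : state.back();
        while (z && c != b[z]) z = fail[z - 1];
        if (c == b[z]) ++z;
        out.push_back(c);
        state.push_back(z);
        if (z == b.size()) {
            // A full occurrence of b ends here: pop it from both stacks.
            out.resize(out.size() - b.size());
            state.resize(state.size() - b.size());
        }
    }
    return out;
}
```

Each character is pushed once and popped at most once, so the stack maintenance costs O(|a|) on top of the usual Knuth-Morris-Pratt work.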

CodeJam 2014: How to solve task "New Lottery Game"?

I want to know efficient approach for the New Lottery Game problem.
The Lottery is changing! The Lottery used to have a machine to generate a random winning number. But due to cheating problems, the Lottery has decided to add another machine. The new winning number will be the result of the bitwise-AND operation between the two random numbers generated by the two machines.
To find the bitwise-AND of X and Y, write them both in binary; then a bit in the result in binary has a 1 if the corresponding bits of X and Y were both 1, and a 0 otherwise. In most programming languages, the bitwise-AND of X and Y is written X&Y.
For example:
The old machine generates the number 7 = 0111.
The new machine generates the number 11 = 1011.
The winning number will be (7 AND 11) = (0111 AND 1011) = 0011 = 3.
With this measure, the Lottery expects to reduce the cases of fraudulent claims, but unfortunately an employee from the Lottery company has leaked the following information: the old machine will always generate a non-negative integer less than A and the new one will always generate a non-negative integer less than B.
Catalina wants to win this lottery and to give it a try she decided to buy all non-negative integers less than K.
Given A, B and K, Catalina would like to know in how many different ways the machines can generate a pair of numbers that will make her a winner.
For the small input we can check all possible pairs, but how do we handle the large inputs? I guess we first represent the numbers as binary strings and then check the permutations that give a result less than K. But I can't figure out how to calculate the possible permutations of 2 binary strings.
I used a general DP technique that I described in a lot of detail in another answer.
We want to count the pairs (a, b) such that a < A, b < B and a & b < K.
The first step is to convert the numbers to binary and to pad them to the same size by adding leading zeroes. I just padded them to a fixed size of 40. The idea is to build up the valid a and b bit by bit.
Let f(i, loA, loB, loK) be the number of valid suffix pairs of a and b of size 40 - i. If loA is true, it means that the prefix up to i is already strictly smaller than the corresponding prefix of A. In that case there is no restriction on the next possible bit for a. If loA is false, A[i] is an upper bound on the next bit we can place at the end of the current prefix. loB and loK have an analogous meaning.
Now we have the following transition:
long long f(int i, bool loA, bool loB, bool loK) {
// TODO add memoization
if (i == 40)
return loA && loB && loK;
int hiA = loA ? 1: A[i]-'0'; // upper bound on the next bit in a
int hiB = loB ? 1: B[i]-'0'; // upper bound on the next bit in b
int hiK = loK ? 1: K[i]-'0'; // upper bound on the next bit in a & b
long long res = 0;
for (int a = 0; a <= hiA; ++a)
for (int b = 0; b <= hiB; ++b) {
int k = a & b;
if (k > hiK) continue;
res += f(i+1, loA || a < A[i]-'0',
loB || b < B[i]-'0',
loK || k < K[i]-'0');
}
return res;
}
The result is f(0, false, false, false).
The runtime is O(max(log A, log B)) if memoization is added to ensure that every subproblem is only solved once.
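To make the memoization step concrete, here is a self-contained sketch of the same transition. It differs from the snippet above in two assumed details: it reads the bits directly from integers rather than from padded binary strings, and it uses 31 bits instead of 40. The names LotteryCounter and countPairs are made up for illustration.

```cpp
#include <cstring>

// Hypothetical helper type: counts pairs (a, b) with a < A, b < B and
// (a & b) < K using the digit DP described above.
struct LotteryCounter {
    static const int kBits = 31;     // assumption: A, B and K are below 2^31
    long long A, B, K;
    long long memo[kBits][2][2][2];  // -1 means "not computed yet"

    LotteryCounter(long long a, long long b, long long k) : A(a), B(b), K(k) {
        std::memset(memo, -1, sizeof(memo));
    }

    long long f(int i, bool loA, bool loB, bool loK) {
        if (i == kBits) return (loA && loB && loK) ? 1 : 0;
        long long& cached = memo[i][loA][loB][loK];
        if (cached != -1) return cached;
        int shift = kBits - 1 - i;   // position i counts from the top bit
        int bitA = (A >> shift) & 1;
        int bitB = (B >> shift) & 1;
        int bitK = (K >> shift) & 1;
        int hiA = loA ? 1 : bitA;    // upper bound on the next bit of a
        int hiB = loB ? 1 : bitB;    // upper bound on the next bit of b
        int hiK = loK ? 1 : bitK;    // upper bound on the next bit of a & b
        long long res = 0;
        for (int a = 0; a <= hiA; ++a)
            for (int b = 0; b <= hiB; ++b) {
                int k = a & b;
                if (k > hiK) continue;
                res += f(i + 1, loA || a < bitA, loB || b < bitB, loK || k < bitK);
            }
        cached = res;
        return res;
    }
};

long long countPairs(long long A, long long B, long long K) {
    LotteryCounter counter(A, B, K);
    return counter.f(0, false, false, false);
}
```

With the table there are only 31 * 8 states, each doing at most four recursive calls, so the work per test case is a small constant multiple of the bit length.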
What I did was just identify when the answer is A * B.
Otherwise, I brute forced the rest. This code passed the large input.
// for each test cases
long count = 0;
if ((K > A) || (K > B)) {
count = A * B;
continue; // print count and go to the next test case
}
count = A * B - (A-K) * (B-K);
for (int i = K; i < A; i++) {
for (int j = K; j < B; j++) {
if ((i&j) < K) count++;
}
}
I hope this helps!
Just as Niklas B. said, here is the whole answer:
#include <algorithm>
#include <cstring>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <map>
#include <sstream>
#include <string>
#include <vector>
using namespace std;
#define MAX_SIZE 32
int A, B, K;
int arr_a[MAX_SIZE];
int arr_b[MAX_SIZE];
int arr_k[MAX_SIZE];
bool flag [MAX_SIZE][2][2][2];
long long matrix[MAX_SIZE][2][2][2];
long long
get_result();
int main(int argc, char *argv[])
{
int case_amount = 0;
cin >> case_amount;
for (int i = 0; i < case_amount; ++i)
{
const long long result = get_result();
cout << "Case #" << 1 + i << ": " << result << endl;
}
return 0;
}
long long
dp(const int h,
const bool can_A_choose_1,
const bool can_B_choose_1,
const bool can_K_choose_1)
{
if (MAX_SIZE == h)
return can_A_choose_1 && can_B_choose_1 && can_K_choose_1;
if (flag[h][can_A_choose_1][can_B_choose_1][can_K_choose_1])
return matrix[h][can_A_choose_1][can_B_choose_1][can_K_choose_1];
int cnt_A_max = arr_a[h];
int cnt_B_max = arr_b[h];
int cnt_K_max = arr_k[h];
if (can_A_choose_1)
cnt_A_max = 1;
if (can_B_choose_1)
cnt_B_max = 1;
if (can_K_choose_1)
cnt_K_max = 1;
long long res = 0;
for (int i = 0; i <= cnt_A_max; ++i)
{
for (int j = 0; j <= cnt_B_max; ++j)
{
int k = i & j;
if (k > cnt_K_max)
continue;
res += dp(h + 1,
can_A_choose_1 || (i < cnt_A_max),
can_B_choose_1 || (j < cnt_B_max),
can_K_choose_1 || (k < cnt_K_max));
}
}
flag[h][can_A_choose_1][can_B_choose_1][can_K_choose_1] = true;
matrix[h][can_A_choose_1][can_B_choose_1][can_K_choose_1] = res;
return res;
}
long long
get_result()
{
cin >> A >> B >> K;
memset(arr_a, 0, sizeof(arr_a));
memset(arr_b, 0, sizeof(arr_b));
memset(arr_k, 0, sizeof(arr_k));
memset(flag, 0, sizeof(flag));
memset(matrix, 0, sizeof(matrix));
int i = 31;
while (i >= 1)
{
arr_a[i] = A % 2;
A /= 2;
arr_b[i] = B % 2;
B /= 2;
arr_k[i] = K % 2;
K /= 2;
i--;
}
return dp(1, 0, 0, 0);
}

Scrabble tile checking

For tile checking in scrabble, you make four 5x5 grids of letters totalling 100 tiles. I would like to make one where all 40 horizontal and vertical words are valid. The set of available tiles contains:
12 x E
9 x A, I
8 x O
6 x N, R, T
4 x D, L, S, U
3 x G
2 x B, C, F, H, M, P, V, W, Y, blank tile (wildcard)
1 x K, J, Q, X, Z
The dictionary of valid words is available here (700KB). There are about 12,000 valid 5 letter words.
Here's an example where all 20 horizontal words are valid:
Z O W I E|P I N O T
Y O G I N|O C t A D <= blank being used as 't'
X E B E C|N A L E D
W A I T E|M E R L E
V I N E R|L U T E A
---------+---------
U S N E A|K N O S P
T A V E R|J O L E D
S O F T A|I A M B I
R I D G Y|H A I T h <= blank being used as 'h'
Q U R S H|G R O U F
I'd like to create one where all the vertical ones are also valid. Can you help me solve this? It is not homework. It is a question a friend asked me for help with.
Final Edit: Solved! Here is a solution.
GNAWN|jOULE
RACHE|EUROS
IDIOT|STEAN
PINOT|TRAvE
TRIPY|SOLES
-----+-----
HOWFF|ZEBRA
AGILE|EQUID
CIVIL|BUXOM
EVENT|RIOJA
KEDGY|ADMAN
Here's a photo of it constructed with my scrabble set. http://twitpic.com/3wn7iu
This one was easy to find once I had the right approach, so I bet you could find many more this way. See below for methodology.
Construct a prefix tree from the dictionary of 5 letter words for each row and column. Recursively, a given tile placement is valid if it forms valid prefixes for its column and row, and if the tile is available, and if the next tile placement is valid. The base case is that it is valid if there is no tile left to place.
It probably makes sense to just find all valid 5x5 boards, like Glenn said, and see if any four of them can be combined. Recursing to a depth of 100 doesn't sound like fun.
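The recursive prefix check described above can be sketched for a single n x n block as follows. This is a simplified illustration, not the code in this answer: it ignores tile availability entirely, assumes uppercase A-Z words, and the names TrieNode and WordSquareCounter are made up.

```cpp
#include <array>
#include <string>
#include <vector>

// One prefix-tree node; child[c] is the index of the next node, or -1.
struct TrieNode {
    std::array<int, 26> child;
    TrieNode() { child.fill(-1); }
};

// Hypothetical helper: counts n x n grids whose rows and columns are all
// dictionary words (tile limits are NOT checked in this sketch).
struct WordSquareCounter {
    std::vector<TrieNode> trie;
    int n;
    std::vector<int> rowNode, colNode;  // current trie position per row / column

    WordSquareCounter(const std::vector<std::string>& words, int size)
        : trie(1), n(size), rowNode(size, 0), colNode(size, 0) {
        for (const std::string& w : words) {
            if (static_cast<int>(w.size()) != n) continue;
            int node = 0;
            for (char ch : w) {
                int c = ch - 'A';
                if (trie[node].child[c] == -1) {
                    trie[node].child[c] = static_cast<int>(trie.size());
                    trie.emplace_back();
                }
                node = trie[node].child[c];
            }
        }
    }

    // Fills cells left to right, top to bottom, starting at column x, row y.
    long long count(int x, int y) {
        if (y == n) return 1;  // no cell left to place: the grid is valid
        int nextX = (x + 1 == n) ? 0 : x + 1;
        int nextY = (x + 1 == n) ? y + 1 : y;
        int oldRow = rowNode[y], oldCol = colNode[x];
        long long total = 0;
        for (int c = 0; c < 26; ++c) {
            int r = trie[oldRow].child[c];
            int v = trie[oldCol].child[c];
            if (r == -1 || v == -1) continue;  // not a valid prefix both ways
            rowNode[y] = r;
            colNode[x] = v;
            total += count(nextX, nextY);
        }
        rowNode[y] = oldRow;
        colNode[x] = oldCol;
        return total;
    }
};
```

With the two words AB and BA and n = 2, count(0, 0) finds the two grids AB/BA and BA/AB. Adding the tile check amounts to one more condition in the loop plus a decrement and restore of the bag around the recursive call.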
Edit: Here is version 2 of my code for this.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
typedef union node node;
union node {
node* child[26];
char string[6];
};
typedef struct snap snap;
struct snap {
node* rows[5];
node* cols[5];
char tiles[27];
snap* next;
};
node* root;
node* vtrie[5];
node* htrie[5];
snap* head;
char bag[27] = {9,2,2,4,12,2,3,2,9,1,1,4,2,6,8,2,1,6,4,6,4,2,2,1,2,1,2};
const char full_bag[27] = {9,2,2,4,12,2,3,2,9,1,1,4,2,6,8,2,1,6,4,6,4,2,2,1,2,1,2};
const char order[26] = {16,23,9,25,21,22,5,10,1,6,7,12,15,2,24,3,20,13,19,11,8,17,14,0,18,4};
void insert(char* string){
node* place = root;
int i;
for(i=0;i<5;i++){
if(place->child[string[i] - 'A'] == NULL){
int j;
place->child[string[i] - 'A'] = malloc(sizeof(node));
for(j=0;j<26;j++){
place->child[string[i] - 'A']->child[j] = NULL;
}
}
place = place->child[string[i] - 'A'];
}
memcpy(place->string, string, 6);
}
void check_four(){
snap *a, *b, *c, *d;
char two_total[27];
char three_total[27];
int i;
bool match;
a = head;
for(b = a->next; b != NULL; b = b->next){
for(i=0;i<27; i++)
two_total[i] = a->tiles[i] + b->tiles[i];
for(c = b->next; c != NULL; c = c->next){
for(i=0;i<27; i++)
three_total[i] = two_total[i] + c->tiles[i];
for(d = c->next; d != NULL; d = d->next){
match = true;
for(i=0; i<27; i++){
if(three_total[i] + d->tiles[i] != full_bag[i]){
match = false;
break;
}
}
if(match){
printf("\nBoard Found!\n\n");
for(i=0;i<5;i++){
printf("%s\n", a->rows[i]->string);
}
printf("\n");
for(i=0;i<5;i++){
printf("%s\n", b->rows[i]->string);
}
printf("\n");
for(i=0;i<5;i++){
printf("%s\n", c->rows[i]->string);
}
printf("\n");
for(i=0;i<5;i++){
printf("%s\n", d->rows[i]->string);
}
exit(0);
}
}
}
}
}
void snapshot(){
snap* shot = malloc(sizeof(snap));
int i;
for(i=0;i<5;i++){
printf("%s\n", htrie[i]->string);
shot->rows[i] = htrie[i];
shot->cols[i] = vtrie[i];
}
printf("\n");
for(i=0;i<27;i++){
shot->tiles[i] = full_bag[i] - bag[i];
}
bool transpose = false;
snap* place = head;
while(place != NULL && !transpose){
transpose = true;
for(i=0;i<5;i++){
if(shot->rows[i] != place->cols[i]){
transpose = false;
break;
}
}
place = place->next;
}
if(transpose){
free(shot);
}
else {
shot->next = head;
head = shot;
check_four();
}
}
void pick(int x, int y){
if(y==5){
snapshot();
return;
}
int i, tile,nextx, nexty, nextz;
node* oldv = vtrie[x];
node* oldh = htrie[y];
if(x+1==5){
nexty = y+1;
nextx = 0;
} else {
nextx = x+1;
nexty = y;
}
for(i=0;i<26;i++){
if(vtrie[x]->child[order[i]]!=NULL &&
htrie[y]->child[order[i]]!=NULL &&
(tile = bag[i] ? i : bag[26] ? 26 : -1) + 1) {
vtrie[x] = vtrie[x]->child[order[i]];
htrie[y] = htrie[y]->child[order[i]];
bag[tile]--;
pick(nextx, nexty);
vtrie[x] = oldv;
htrie[y] = oldh;
bag[tile]++;
}
}
}
int main(int argc, char** argv){
root = malloc(sizeof(node));
FILE* wordlist = fopen("sowpods5letters.txt", "r");
head = NULL;
int i;
for(i=0;i<26;i++){
root->child[i] = NULL;
}
for(i=0;i<5;i++){
vtrie[i] = root;
htrie[i] = root;
}
char* string = malloc(sizeof(char)*6);
while(fscanf(wordlist, "%s", string) != EOF){
insert(string);
}
free(string);
fclose(wordlist);
pick(0,0);
return 0;
}
This tries the infrequent letters first, which I'm no longer sure is a good idea. It starts to get bogged down before it makes it out of the boards starting with x. After seeing how many 5x5 blocks there were I altered the code to just list out all the valid 5x5 blocks. I now have a 150 MB text file with all 4,430,974 5x5 solutions.
I also tried it with just recursing through the full 100 tiles, and that is still running.
Edit 2: Here is the list of all the valid 5x5 blocks I generated. http://web.cs.sunyit.edu/~levyt/solutions.rar
Edit 3: Hmm, seems there was a bug in my tile usage tracking, because I just found a block in my output file that uses 5 Zs.
COSTE
ORCIN
SCUZZ
TIZZY
ENZYM
Edit 4: Here is the final product.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
typedef union node node;
union node {
node* child[26];
char string[6];
};
node* root;
node* vtrie[5];
node* htrie[5];
int score;
int max_score;
char block_1[27] = {4,2,0,2, 2,0,0,0,2,1,0,0,2,1,2,0,1,2,0,0,2,0,0,1,0,1,0};//ZEBRA EQUID BUXOM RIOJA ADMAN
char block_2[27] = {1,0,1,1, 4,2,2,1,3,0,1,2,0,1,1,0,0,0,0,1,0,2,1,0,1,0,0};//HOWFF AGILE CIVIL EVENT KEDGY
char block_3[27] = {2,0,1,1, 1,0,1,1,4,0,0,0,0,3,2,2,0,2,0,3,0,0,1,0,1,0,0};//GNAWN RACHE IDIOT PINOT TRIPY
//JOULE EUROS STEAN TRAVE SOLES
char bag[27] = {9,2,2,4,12,2,3,2,9,1,1,4,2,6,8,2,1,6,4,6,4,2,2,1,2,1,2};
const char full_bag[27] = {9,2,2,4,12,2,3,2,9,1,1,4,2,6,8,2,1,6,4,6,4,2,2,1,2,1,2};
const char order[26] = {16,23,9,25,21,22,5,10,1,6,7,12,15,2,24,3,20,13,19,11,8,17,14,0,18,4};
const int value[27] = {244,862,678,564,226,1309,844,765,363,4656,909,414,691,463,333,687,11998,329,218,423,536,1944,1244,4673,639,3363,0};
void insert(char* string){
node* place = root;
int i;
for(i=0;i<5;i++){
if(place->child[string[i] - 'A'] == NULL){
int j;
place->child[string[i] - 'A'] = malloc(sizeof(node));
for(j=0;j<26;j++){
place->child[string[i] - 'A']->child[j] = NULL;
}
}
place = place->child[string[i] - 'A'];
}
memcpy(place->string, string, 6);
}
void snapshot(){
static int count = 0;
int i;
for(i=0;i<5;i++){
printf("%s\n", htrie[i]->string);
}
for(i=0;i<27;i++){
printf("%c%d ", 'A'+i, bag[i]);
}
printf("\n");
if(++count>=1000){
exit(0);
}
}
void pick(int x, int y){
if(y==5){
if(score>max_score){
snapshot();
max_score = score;
}
return;
}
int i, tile,nextx, nexty;
node* oldv = vtrie[x];
node* oldh = htrie[y];
if(x+1==5){
nextx = 0;
nexty = y+1;
} else {
nextx = x+1;
nexty = y;
}
for(i=0;i<26;i++){
if(vtrie[x]->child[order[i]]!=NULL &&
htrie[y]->child[order[i]]!=NULL &&
(tile = bag[order[i]] ? order[i] : bag[26] ? 26 : -1) + 1) {
vtrie[x] = vtrie[x]->child[order[i]];
htrie[y] = htrie[y]->child[order[i]];
bag[tile]--;
score+=value[tile];
pick(nextx, nexty);
vtrie[x] = oldv;
htrie[y] = oldh;
bag[tile]++;
score-=value[tile];
}
}
}
int main(int argc, char** argv){
root = malloc(sizeof(node));
FILE* wordlist = fopen("sowpods5letters.txt", "r");
score = 0;
max_score = 0;
int i;
for(i=0;i<26;i++){
root->child[i] = NULL;
}
for(i=0;i<5;i++){
vtrie[i] = root;
htrie[i] = root;
}
for(i=0;i<27;i++){
bag[i] = bag[i] - block_1[i];
bag[i] = bag[i] - block_2[i];
bag[i] = bag[i] - block_3[i];
printf("%c%d ", 'A'+i, bag[i]);
}
char* string = malloc(sizeof(char)*6);
while(fscanf(wordlist, "%s", string) != EOF){
insert(string);
}
free(string);
fclose(wordlist);
pick(0,0);
return 0;
}
After finding out how many blocks there were (nearly 2 billion and still counting), I switched to trying to find certain types of blocks, in particular the difficult to construct ones using uncommon letters. My hope was that if I ended up with a benign enough set of letters going in to the last block, the vast space of valid blocks would probably have one for that set of letters.
I assigned each tile a value inversely proportional to the number of 5 letter words it appears in. Then, when I found a valid block I would sum up the tile values, and if the score was the best I had yet seen, I would print out the block.
For the first block I removed the blank tiles, figuring that the last block would need that flexibility the most. After letting it run until I had not seen a better block appear for some time, I selected the best block, and removed the tiles in it from the bag, and ran the program again, getting the second block. I repeated this for the 3rd block. Then for the last block I added the blanks back in and used the first valid block it found.
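The tile values described above could be computed along these lines. This is only a sketch of the idea: the exact scaling behind the value table in the code is not stated, so the scale parameter is an assumption, as is counting each word once per letter it contains. The function name tileValues is made up.

```cpp
#include <array>
#include <string>
#include <vector>

// Hypothetical helper: for each letter A..Z, returns scale divided by the
// number of words that contain the letter (0 if no word contains it).
std::array<double, 26> tileValues(const std::vector<std::string>& words, double scale) {
    std::array<int, 26> wordsWithLetter{};
    for (const std::string& w : words) {
        std::array<bool, 26> seen{};
        for (char ch : w) seen[ch - 'A'] = true;  // assumes uppercase A-Z
        for (int c = 0; c < 26; ++c)
            if (seen[c]) ++wordsWithLetter[c];
    }
    std::array<double, 26> values{};
    for (int c = 0; c < 26; ++c)
        values[c] = wordsWithLetter[c] ? scale / wordsWithLetter[c] : 0.0;
    return values;
}
```

Rare letters such as Q end up with large values, so a block that uses them scores highly and is printed early by the search.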
Here's how I would try this. First construct a prefix tree.
Pick a word and place it horizontally on top. Pick a word and place it vertically. Alternate them until exhausted options. By alternating you start to fix the first letters and eliminating lots of mismatching words. If you really do find such square, then do a check whether they can be made with those pieces.
For 5x5 squares: after some thought, it can't be worse than O(12000!/11990!) for random text words. But thinking about it a little more: every time you fix a letter (in normal text) you eliminate about 90% (an optimistic guess) of your words. This means that after three iterations you've got 12 words. So the actual speed would be
O(n * n/10 * n/10 * n/100 * n/100 * n/1000 * n/1000 ...)
which for 12000 elements acts something like an n^4 algorithm,
which isn't that bad.
Someone can probably do a better analysis of the problem, but the search for words should still converge quite quickly.
More elimination can be done by exploiting the infrequent letters. Essentially, find all words that contain infrequent letters. Try to find matching positions for each letter. Construct a set of valid letters for each position.
For example, let's say we have four words with the letter Q in them.
AQFED, ZQABE, EDQDE, ELQUO
this means there are two valid positionings of those:
xZxxx
AQFED
xAxxx ---> this limits our search for words that contain [ABDEFZ] as the second letter
xBxxx
xExxx
same for the other
EDQDE ---> this limits our search for words that contain [EDLU] as the third letter
ELQUO
all appropriate words are in union of those two conditions
So basically, if a word S contains the infrequent letter X at position N, then the other words in that matrix must have, at position N, a letter that also occurs in S.
Formula:
Find all words that contain infrequent letter X at position 1 (next iteration 2, 3... )
Make a set A out of the letters in those words
Keep only those words from the dictionary that have letter from set A in position 1
Try to fit those into the matrix (with the first method)
Repeat with position 2
I would approach the problem (naively, to be sure) by taking a pessimistic view. I'd try to prove there was no 5x5 solution, and therefore certainly not four 5x5 solutions. To prove there was no 5x5 solution I'd try to construct one from all possibilities. If my conjecture failed and I was able to construct a 5x5 solution, well, then, I'd have a way to construct 5x5 solutions and I would try to construct all of the (independent) 5x5 solutions. If there were at least 4, then I would determine if some combination satisfied the letter count restrictions.
[Edit] Null Set has determined that there are "4,430,974 5x5 solutions". Are these valid?
I mean that we have a limitation on the number of letters we can use. This limitation can be expressed as a boundary vector BV = [9, 2, 2, 4, ...] corresponding to the limits on A, B, C, etc. (You see this vector in Null Set's code). A 5x5 solution is valid if each term of its letter count vector is less than or equal to the corresponding term in BV. It would be easy to check if a 5x5 solution is valid as it was created. Perhaps the 4,430,974 number can be reduced, say to N.
Regardless, we can state the problem as: find four letter count vectors among the N whose sum is equal to BV. There are (N, 4) possible sums ("N choose 4"). With N equal to 4 million this is still on the order of 10^25---not an encouraging number. Perhaps you could search for four whose first terms sum to 9, and if so checking that their second terms sum to 2, etc.
I'd remark that after choosing 4 from N the computations are independent, so if you have a multi-core machine you can make this go faster with a parallel solution.
[Edit2] Parallelizing probably wouldn't make much difference, though. At this point I might take an optimistic view: there are certainly more 5x5 solutions than I expected, so there may be more final solutions than expected, too. Perhaps you might not have to get far into the 10^25 to hit one.
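The validity test against the boundary vector can be sketched as below. The limits are copied from the bag array in Null Set's code; the two blank tiles are left out, which is a simplifying assumption, and the names LetterCounts, kLimits, countLetters and withinLimits are made up for illustration.

```cpp
#include <array>
#include <string>
#include <vector>

using LetterCounts = std::array<int, 26>;

// Tile limits for A..Z (blank tiles are ignored in this sketch).
const LetterCounts kLimits = {9, 2, 2, 4, 12, 2, 3, 2, 9, 1, 1, 4, 2,
                              6, 8, 2, 1, 6, 4, 6, 4, 2, 2, 1, 2, 1};

// Builds the letter count vector of a block given as uppercase rows.
LetterCounts countLetters(const std::vector<std::string>& rows) {
    LetterCounts counts{};
    for (const std::string& row : rows)
        for (char ch : row) ++counts[ch - 'A'];
    return counts;
}

// True if no letter is used more often than its limit allows.
bool withinLimits(const LetterCounts& used, const LetterCounts& limits) {
    for (int c = 0; c < 26; ++c)
        if (used[c] > limits[c]) return false;
    return true;
}
```

The block ZEBRA / EQUID / BUXOM / RIOJA / ADMAN passes this test, while COSTE / ORCIN / SCUZZ / TIZZY / ENZYM fails because it needs five Zs. Four blocks form a full solution when their count vectors, plus any blanks used, add up to the whole bag.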
I'm starting with something simpler.
Here are some results so far:
3736 2x2 solutions
8812672 3x3 solutions
The 1000th 4x4 solution is
A A H S
A C A I
L A I R
S I R E
The 1000th 5x5 solution is
A A H E D
A B U N A
H U R S T
E N S U E
D A T E D
The 1000th 2x4x4 solution is
A A H S | A A H S
A B A C | A B A C
H A I R | L E K U
S C R Y | S T E D
--------+--------
D E E D | D E E M
E I N E | I N T I
E N O L | O V E R
T E L T | L Y N E
Note that transposing an 'A' and a blank that is being used as an 'A' should be considered the same solution. But transposing the rows with the columns should be considered a different solution. I hope that makes sense.
Here are a lot of precomputed 5x5's. Left as an exercise to the reader to find 4 compatible ones :-)
http://www.gtoal.com/wordgames/wordsquare/all5

Resources