Difference between values and how the code reads the values in CS50

Here's my main code:
// Update vote totals given a new vote
bool vote(string name, int candidatecount, candidate candidates1[MAX])
{
    // TODO
    for (int a = 0; a < candidatecount; a++)
    {
        if (candidatecount == a)
        {
            printf("a\n");
            return false;
        }
        if (name == candidates1[a].names)
        {
            printf("b\n");
            candidates[a].votes = candidates[a].votes + 1;
        }
        printf("%s, %s\n", candidates1[a].names, name);
    }
    printf("1\n");
    return(name);
}
This was my command line:
./plurality R D
The output (with my responses) was this:
Number of voters: 3
Vote: R
R, R
D, R
1
Vote: D
R, D
D, D
1
Vote: R
R, R
D, R
1
vote candidate 0 go around 0
b 0 e (null)
vote candidate 0 go around 1
b 0 e (null)
(null) won with 0 votes.
It looks like this condition:
if (name == candidates1[a].names)
should be true when the names match, but it never is, because the printf("b\n") inside it never runs.
Does anyone know why?

Review the Week 3 video, starting around the 40-minute mark. The professor says:
"It turns out that in C, you can't use equals equals to compare two strings."
He introduces strcmp at that point and explains the reason in more detail in the next lesson. The short explanation is that in CS50 a string is a char * (a pointer to the first character), so == compares the addresses stored in the two operands, not the characters they point to. Two strings with the same contents usually live at different addresses, so the comparison is false.
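Here is a minimal sketch of the fix. It assumes a stripped-down candidate struct: the field name `names` is copied from the question, while `MAX` and the struct itself are stand-ins, not the actual CS50 distribution code. It compares with strcmp and stops at the first match:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX 9  // stand-in for the distribution code's MAX

// Hypothetical stand-in for the CS50 candidate struct (field names from the question).
typedef struct
{
    const char *names;
    int votes;
} candidate;

// Adds one vote to the matching candidate; returns false if no candidate matches.
bool vote(const char *name, int candidate_count, candidate candidates[MAX])
{
    for (int i = 0; i < candidate_count; i++)
    {
        // strcmp compares the characters; == would only compare the two pointers.
        if (strcmp(name, candidates[i].names) == 0)
        {
            candidates[i].votes++;
            return true;
        }
    }
    return false;  // only reached after every candidate has been checked
}
```

In this sketch, `return false` sits after the loop. In the question's code, `candidatecount == a` can never be true inside the loop, and `return(name)` converts a pointer to bool, which is true for any non-NULL name.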

Related

string split into all possible combination

For a given string "ABC", I want to get every possible way of splitting it into pieces, keeping the characters in their original order and without skipping any. The result should be: ["A","B","C"], ["AB","C"], ["ABC"], ["A","BC"]
Any idea how I can achieve this? I was thinking of using a nested for loop to get all the components:
string input="ABCD";
List<string> component=new List<string>();
for(int i=0;i<=input.Length;i++){
    for(int j=1;j<=(input.Length-i);j++){
        component.Add(input.Substring(i,j));
    }
}
But I have no idea how to group them into the result above. Any advice is appreciated.
You can go about this in several ways.
One way is recursion. Keep a current list of substrings and an overall results list. At the top level, iterate over all the possible gaps. Split the string into a substring and the rest. This should include the "gap" at the end, where you split the string into itself and the empty string as rest. Add the (non-empty) substring to the current list and recurse on the rest of the string. When the rest of the string is empty, add the current list to the overall results list. This will give you all 2ⁿ possibilities for a string with n + 1 letters.
Pseudocode:
// recursive function
function splits_r(str, current, res)
{
    if (str.length == 0) {
        res += [current]
    } else {
        for (i = 0; i < str.length; i++) {
            splits_r(str.substr(i + 1, end),
                     current + [str.substr(0, i + 1)], res)
        }
    }
}
// wrapper to get the recursion going
function splits(str)
{
    res = [];
    splits_r(str, [], res);
    return res;
}
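Here is a C version of the recursive pseudocode, as a sketch. Instead of lists of strings, it stores each split as a single string with the pieces joined by '|'. `MAX_LEN`, `results` and the '|' encoding are assumptions of this example:

```c
#include <assert.h>
#include <string.h>

#define MAX_LEN 10                       // assumed maximum input length
#define MAX_SPLITS (1 << (MAX_LEN - 1))  // 2^(n-1) splits for n letters

// Each split is stored as one string with the pieces joined by '|', e.g. "A|BC".
static char results[MAX_SPLITS][2 * MAX_LEN];
static int result_count = 0;

// rest: part of the string not split yet; current/cur_len: pieces chosen so far.
static void splits_r(const char *rest, char *current, size_t cur_len)
{
    if (*rest == '\0') {
        // Nothing left: the current list of pieces is one complete split.
        assert(result_count < MAX_SPLITS);
        strcpy(results[result_count++], current);
        return;
    }
    size_t n = strlen(rest);
    for (size_t i = 1; i <= n; i++) {
        // Take the first i characters of rest as the next piece...
        size_t len = cur_len;
        if (len > 0)
            current[len++] = '|';
        memcpy(current + len, rest, i);
        current[len + i] = '\0';
        // ...and recurse on whatever remains.
        splits_r(rest + i, current, len + i);
    }
    current[cur_len] = '\0';  // drop this level's pieces before returning
}

// Wrapper that starts the recursion; returns the number of splits found.
int splits(const char *str)
{
    char current[2 * MAX_LEN] = "";
    assert(strlen(str) >= 1 && strlen(str) <= MAX_LEN);
    result_count = 0;
    splits_r(str, current, 0);
    return result_count;
}
```

For "ABC" this fills `results` with "A|B|C", "A|BC", "AB|C" and "ABC", in that order.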
Another way is to enumerate all possibilities. There are 2ⁿ of them for a string with n + 1 letters, because each one is a combination of splits and non-splits at the n gaps. For example:
enum    splits           result
0 0 0   A B C D          "ABCD"
0 0 1   A B C | D        "ABC", "D"
0 1 0   A B | C D        "AB", "CD"
0 1 1   A B | C | D      "AB", "C", "D"
1 0 0   A | B C D        "A", "BCD"
1 0 1   A | B C | D      "A", "BC", "D"
1 1 0   A | B | C D      "A", "B", "CD"
1 1 1   A | B | C | D    "A", "B", "C", "D"
The enumeration uses 0 for no split and 1 for a split, so it can be read as a binary number. If you are familiar with bitwise operations, you can enumerate all values from 0 to 2ⁿ - 1 and read off where the splits are.
Pseudocode:
function splits(str)
{
    let m = str.length - 1;  // number of possible gap positions
    let n = (1 << m);        // == pow(2, m)
    let res = []
    for (i = 0; i < n; i++) {
        let last = 0
        let current = []
        for (j = 0; j < m; j++) {   // loop over all gaps
            if (i & (1 << j)) {     // bit j set: split after character j
                current.append(str.substr(last, j + 1));   // characters last .. j
                last = j + 1;
            }
        }
        current.append(str.substr(last, end))   // the remaining tail
        res.append(current);
    }
    return res;
}
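The bitmask enumeration can be sketched in C as follows. As in the pseudocode, bit j of the mask stands for the gap after character j. That means the lowest bit is the first gap, so the splits come out in a different order than the table rows. `MAX_LEN` and the function names are illustrative only:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LEN 31  // assumed maximum input length, so 1u << (MAX_LEN - 1) fits

// Writes the split encoded by `mask` into out, with pieces joined by '|'.
// Bit j of mask set means "cut between str[j] and str[j + 1]".
void split_by_mask(const char *str, unsigned mask, char *out)
{
    size_t n = strlen(str);
    size_t k = 0;
    for (size_t j = 0; j < n; j++) {
        out[k++] = str[j];
        if (j + 1 < n && (mask & (1u << j)))
            out[k++] = '|';
    }
    out[k] = '\0';
}

// Prints all 2^(n-1) splits of str and returns how many there are.
unsigned print_all_splits(const char *str)
{
    size_t n = strlen(str);
    assert(n >= 1 && n <= MAX_LEN);
    unsigned count = 1u << (n - 1);   // masks 0 .. count - 1
    char buf[2 * MAX_LEN];            // n letters + (n - 1) bars + '\0'
    for (unsigned mask = 0; mask < count; mask++) {
        split_by_mask(str, mask, buf);
        puts(buf);
    }
    return count;
}
```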

Maximum element in array which is equal to product of two elements in array

We need to find the maximum element in an array that is also equal to the product of two other elements of the same array. For example, in [2,3,6,8], 6 = 2*3, so the answer is 6.
My approach was to sort the array and then use a two-pointer method to check, for each element, whether the product exists. This is an O(n log n) + O(n^2) = O(n^2) approach. Is there a faster way to do this?
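For reference, here is one way the sort-plus-two-pointer idea from the question could look in C. This is a reconstruction, since the question does not show its code. It assumes all values are positive, which is what makes the two-pointer scan valid:

```c
#include <assert.h>
#include <stdlib.h>

// qsort comparator for ints (avoids the overflow of returning x - y).
static int cmp_int(const void *p, const void *q)
{
    int x = *(const int *)p, y = *(const int *)q;
    return (x > y) - (x < y);
}

// Two-pointer scan of the sorted array a[0..n-1]: is there a pair of
// distinct indices, neither equal to `skip`, whose product equals target?
static int has_pair_product(const int *a, int n, int skip, long long target)
{
    int lo = 0, hi = n - 1;
    while (lo < hi) {
        if (lo == skip) { lo++; continue; }
        if (hi == skip) { hi--; continue; }
        long long p = (long long)a[lo] * a[hi];
        if (p == target)
            return 1;
        if (p < target)
            lo++;   // need a bigger product
        else
            hi--;   // need a smaller product
    }
    return 0;
}

// Largest element equal to the product of two other elements, or -1.
// Assumes all values are positive, so products grow with either pointer.
int max_product_element(int *a, int n)
{
    qsort(a, (size_t)n, sizeof *a, cmp_int);
    for (int k = n - 1; k >= 0; k--)        // try candidates from largest down
        if (has_pair_product(a, n, k, a[k]))
            return a[k];
    return -1;
}
```

Each candidate costs one O(n) scan, so the whole search is O(n^2) after sorting. Returning at the first hit from the top gives the maximum directly.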
There is a slightly better solution in O(n * sqrt(M)) if you are allowed to use O(M) memory, where M is the maximum value in A.
Use an array of size M to mark every number as you traverse them from smallest to largest.
For each number, try all of its factors and check whether they were already marked in the array map.
Here is code for that (C++, assuming all values are in 1..M):
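#include <algorithm>
#include <vector>

int maxProductElement(std::vector<int> A, int M)
{
    std::vector<int> array_map(M + 1, 0);  // array_map[v] = times v has been seen so far
    int ans = -1;
    std::sort(A.begin(), A.end());
    for (int x : A) {
        for (int j = 1; j * j <= x; j++) {  // each factor pair (j, x / j)
            if (x % j == 0) {
                int num1 = j;
                int num2 = x / j;
                if (array_map[num1] && array_map[num2]) {
                    // a square needs two earlier copies of its root
                    if (num1 != num2 || array_map[num1] >= 2)
                        ans = x;
                }
            }
        }
        array_map[x]++;  // mark x after testing it, so only earlier elements count as factors
    }
    return ans;  // values are sorted, so the last hit is the largest
}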
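There is an even better approach if you can enumerate the factors of each number faster than by trial division. With a sieve you can factorize any number up to M in O(log M), and backtracking over its prime factorization then lists all of its divisors. The total cost becomes roughly O(n * (log M + d)), where d is the number of divisors of an element, instead of O(n * sqrt(M)).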
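Here is a C sketch of the sieve-based factor enumeration. `build_spf` and `divisors` are hypothetical names. Instead of explicit backtracking, it multiplies out each prime power iteratively, which produces the same set of divisors. `out` must have room for every divisor of x (240 suffice for x ≤ 10^6):

```c
#include <assert.h>
#include <stdlib.h>

// Sieve: spf[v] = smallest prime factor of v, for 2 <= v <= limit.
int *build_spf(int limit)
{
    int *spf = calloc((size_t)limit + 1, sizeof *spf);
    assert(spf != NULL);
    for (int i = 2; i <= limit; i++) {
        if (spf[i] != 0)
            continue;                   // i is composite, already marked
        for (int k = i; k <= limit; k += i)
            if (spf[k] == 0)
                spf[k] = i;             // i is the smallest prime dividing k
    }
    return spf;
}

// Writes every divisor of x (1 <= x <= limit) into out and returns how many.
// Factorizing takes O(log x) steps because each step divides x by a prime >= 2;
// each prime power p^k then multiplies the divisors collected so far.
int divisors(int x, const int *spf, int *out)
{
    int count = 1;
    out[0] = 1;
    while (x > 1) {
        int p = spf[x], e = 0;
        while (x % p == 0) {
            x /= p;
            e++;
        }
        int prev = count;               // divisors built from the earlier primes
        int pk = 1;
        for (int k = 1; k <= e; k++) {
            pk *= p;
            for (int i = 0; i < prev; i++)
                out[count++] = out[i] * pk;
        }
    }
    return count;
}
```

To plug this into the array_map code above, loop over the divisors num1 of A[i] (with num2 = A[i] / num1) instead of over j = 1..sqrt(A[i]).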
@JerryGoyal's solution is correct. However, I think it can be optimized further: instead of scanning with the B pointer, use binary search to look for the other factor whenever arr[c] is divisible by arr[a]. Here is the modification to his code (Java):
for(c=n-1;(c>1)&&(max==-1);c--){ // loop through C
    for(a=0;(a<c-1)&&(max==-1);a++){ // loop through A
        if(arr[a]!=0 && arr[c]%arr[a]==0){ // if arr[c] is divisible by arr[a]
            // search arr[a+1..c-1] for the other factor x with arr[c] = arr[a] * x
            // (Arrays.binarySearch's upper bound is exclusive, hence c)
            if(Arrays.binarySearch(arr, a+1, c, arr[c]/arr[a]) >= 0){
                max = arr[c];
                break;
            }
        }
    }
}
I would have commented this on his solution, unfortunately I lack the reputation to do so.
Try this (written in C++):
#include <vector>
#include <algorithm>
using namespace std;

// Largest element that equals the product of two other elements
// (at different positions), or -1 if there is none. Assumes positive values.
int MaxElement(vector<int> Input)
{
    sort(Input.begin(), Input.end());
    int n = static_cast<int>(Input.size());
    for (int k = n - 1; k >= 0; k--)               // candidate product, largest first
    {
        int LargestElementOfInput = Input[k];
        if (k + 1 < n && Input[k + 1] == LargestElementOfInput)
            continue;                               // this value was already tested
        for (int i = 0; i < n; i++)                 // first factor
        {
            if (i == k || LargestElementOfInput % Input[i] != 0)
                continue;
            int AllowedValue = LargestElementOfInput / Input[i];  // the factor we need
            // sorted input: stop once Input[j] exceeds the allowed value
            for (int j = i + 1; j < n && Input[j] <= AllowedValue; j++)
            {
                if (j != k && Input[i] * Input[j] == LargestElementOfInput)
                    return LargestElementOfInput;
            }
        }
    }
    return -1;
}
Once you have sorted the array, you can use that to your advantage. Since you want the maximum element that meets the criterion, one improvement I can see is:
1. Start from the rightmost element of the array (8).
2. Divide it by the first element of the array (8/2 = 4).
3. Continue with the two-pointer approach while the element at the second pointer is no greater than the value from step 2, stopping early if a match is found (i.e., stop once the second pointer's value exceeds 4 or a match is found).
4. If a match is found, you have the maximum element.
5. Otherwise, continue the loop with the next highest element of the array (6).
Efficient solution (example input: 2 3 8 6):
1. Sort the array.
2. Keep three pointers C, B and A: C at the last index, A at index 0 and B at index 1.
3. Traverse the array with pointers A and B up to C and check whether A*B = C.
4. If it does, C is your answer.
5. Otherwise, move C one position back and traverse again, with A at index 0 and B at index 1.
6. Repeat until you find the product or C reaches index 1.
Here's the complete solution (Java):
int arr[] = new int[]{2, 3, 8, 6};
Arrays.sort(arr);
int n=arr.length;
int a,b,c,prod,max=-1;
for(c=n-1;(c>1)&&(max==-1);c--){ // loop through C
    for(a=0;(a<c-1)&&(max==-1);a++){ // loop through A
        for(b=a+1;b<c;b++){ // loop through B
            prod=arr[a]*arr[b];
            if(prod==arr[c]){
                System.out.println("A: "+arr[a]+" B: "+arr[b]);
                max=arr[c];
                break;
            }
            if(prod>arr[c]){ // no need to go further
                break;
            }
        }
    }
}
System.out.println(max);
I came up with the solution below, which uses one array list and the following formula:
divisor(a or b) X quotient(b or a) = dividend(c)
1. Sort the array.
2. Put the array into a collection Col (e.g. one that has fast lookup and maintains insertion order).
3. Have two pointers, a and c.
4. Keep c at the last index and a at 0.
5. Try to satisfy divisor(a or b) X quotient(b or a) = dividend(c).
6. Check whether a is a divisor of c; if it is, look up b = c / a in Col.
7. If a is a divisor and Col contains b, then c is the answer.
8. Otherwise, increase a by 1 and follow steps 5 and 6 until a reaches c-1.
9. If the maximum is not found, decrease c and follow steps 4 and 5 again.
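C has no built-in hash set, so this sketch of the divisor/quotient lookup swaps in `bsearch` on the sorted array as the "collection with fast lookup". Lookups therefore cost O(log n) rather than O(1). Function names are hypothetical, and all values are assumed positive:

```c
#include <assert.h>
#include <stdlib.h>

static int cmp_int(const void *p, const void *q)
{
    int x = *(const int *)p, y = *(const int *)q;
    return (x > y) - (x < y);
}

// Plays the role of the collection Col: binary search in a[lo..hi).
static int contains(const int *a, int lo, int hi, int value)
{
    if (lo >= hi)
        return 0;
    return bsearch(&value, a + lo, (size_t)(hi - lo), sizeof *a, cmp_int) != NULL;
}

// Largest a[c] such that a[c] = divisor * quotient for two other elements.
// Assumes positive values; returns -1 if there is no such element.
int max_product_element_lookup(int *a, int n)
{
    qsort(a, (size_t)n, sizeof *a, cmp_int);           // step 1
    for (int c = n - 1; c >= 2; c--) {                  // dividend, largest first
        for (int i = 0; i < c - 1; i++) {               // divisor candidate
            if (a[c] % a[i] != 0)
                continue;                               // a[i] is not a divisor
            int quotient = a[c] / a[i];
            if (contains(a, i + 1, c, quotient))        // is the quotient present?
                return a[c];
        }
    }
    return -1;
}
```

Searching only a[i+1..c-1] is enough for positive values: the quotient can be no larger than a[c], and because the pair is found from its smaller factor, the quotient sits after index i.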
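Check this C# solution:
- Loop through each element,
- multiply it with every later element,
- check whether the product exists in the array and is the largest found so far.
private static int GetGreatest(int[] input)
{
    int max = 0;
    int p = 0; // product of the current pair
    // HashSet gives O(1) average lookups; Array.IndexOf would be a linear scan
    var values = new System.Collections.Generic.HashSet<int>(input);
    // loop through all pairs of the input array
    for (int i = 0; i < input.Length; i++)
    {
        for (int j = i + 1; j < input.Length; j++)
        {
            p = input[i] * input[j];
            if (p > max && values.Contains(p))
            {
                max = p;
            }
        }
    }
    return max; // 0 means no such element (assumes positive inputs)
}
Time complexity: O(n^2), since each HashSet lookup is O(1) on average.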

Finding maximum substring that is cyclic equivalent

This is a problem from a programming contest that was held recently.
Two strings a[0..n-1] and b[0..n-1] are called cyclic equivalent if and only if there exists an offset d, such that for all 0 <= i < n, a[i] = b[(i + d) mod n].
Given two strings s[0..L-1] and t[0..L-1] of the same length L, find the maximum p such that s[0..p-1] and t[0..p-1] are cyclic equivalent. Print 0 if no such valid p exists.
Input
The first line contains an integer T indicating the number of test cases.
For each test case, there are two lines in total. The first line contains s. The second line contains t.
All strings contain only lower case alphabets.
Output
Output T lines in total. Each line should start with "Case #: " followed by the maximum p. Here "#" is the number of the test case, starting from 1.
Constraints
1 ≤ T ≤ 10
1 ≤ L ≤ 1000000
Example
Input:
2
abab
baba
abab
baac
Output:
Case 1: 4
Case 2: 3
Explanation
Case 1, d can be 1.
Case 2, d can be 2.
My approach :
For each length i, take the prefixes S[0...i-1] and T[0...i-1], concatenate S[0...i-1] with itself, and check whether T[0...i-1] is a substring of the result. If it is, p = i is valid; the answer is the largest such i.
#include <algorithm>
#include <cstdio>
#include <iostream>
#include <string>
using namespace std;

bool isCyclic( string s, string t ){
    string str = s;
    str.append(s);
    // t is a rotation of s iff it occurs inside s + s
    if( str.find(t) != string::npos )
        return true;
    return false;
}
int main(){
    string s, t;
    int t1,l, o=1;
    scanf("%d", &t1);
    while( t1-- ){
        cin>>s>>t;
        l = min( s.length(), t.length());
        int i, maxP = 0;
        for( i=1; i<=l; i++ ){
            if( isCyclic(s.substr(0,i), t.substr(0,i)) ){
                maxP = i;
            }
        }
        printf("Case %d: %d\n", o++, maxP);
    }
    return 0;
}
I knew this is not the most efficient approach, since I got Time Limit Exceeded. I learned that the prefix function can be used to get an O(n) algorithm, but I don't know the prefix function. Could someone explain the O(n) approach?
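The prefix function mentioned in the question is the KMP failure function. As a starting point, here is a C sketch of it, used to replace `str.find` in `isCyclic` with a linear-time scan of s + s. On its own, this only makes each individual check linear. The full O(n) solution needs a further idea, which is not shown here. `is_rotation` is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

// pi[i] = length of the longest proper prefix of s[0..i] that is also a suffix of s[0..i].
void prefix_function(const char *s, int n, int *pi)
{
    if (n == 0)
        return;
    pi[0] = 0;
    for (int i = 1; i < n; i++) {
        int k = pi[i - 1];
        while (k > 0 && s[i] != s[k])
            k = pi[k - 1];          // fall back to the next shorter border
        if (s[i] == s[k])
            k++;
        pi[i] = k;
    }
}

// KMP version of the question's isCyclic(): is t a rotation of s?
// Scans s + s for t in linear time without building the doubled string.
bool is_rotation(const char *s, const char *t)
{
    int n = (int)strlen(s);
    if ((int)strlen(t) != n)
        return false;
    if (n == 0)
        return true;
    int *pi = malloc(sizeof *pi * (size_t)n);
    assert(pi != NULL);
    prefix_function(t, n, pi);
    int k = 0;                      // length of the current match of t
    bool found = false;
    for (int i = 0; i < 2 * n - 1 && !found; i++) {
        char c = s[i % n];          // i-th character of s + s
        while (k > 0 && c != t[k])
            k = pi[k - 1];
        if (c == t[k])
            k++;
        if (k == n)
            found = true;
    }
    free(pi);
    return found;
}
```

With the example input, is_rotation("aba", "baa") is true, which is the p = 3 of Case 2.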
Contest link http://www.codechef.com/ACMKGP14/problems/ACM14KP3

Minimum no. of comparisons to find median of 3 numbers

I was implementing quicksort and wanted to set the pivot to the median of three numbers: the first element, the middle element, and the last element.
Could I find the median with fewer comparisons?
int median(int a[], int p, int r)
{
    int m = (p+r)/2;
    if(a[p] < a[m])
    {
        if(a[p] >= a[r])
            return a[p];
        else if(a[m] < a[r])
            return a[m];
    }
    else
    {
        if(a[p] < a[r])
            return a[p];
        else if(a[m] >= a[r])
            return a[m];
    }
    return a[r];
}
If the concern is only comparisons, then this should be used.
int getMedian(int a, int b , int c) {
int x = a-b;
int y = b-c;
int z = a-c;
if(x*y > 0) return b;
if(x*z > 0) return c;
return a;
}
int32_t FindMedian(const int n1, const int n2, const int n3) {
auto _min = min(n1, min(n2, n3));
auto _max = max(n1, max(n2, n3));
return (n1 + n2 + n3) - _min - _max;
}
You can't do it in one, and you're only using two or three, so I'd say you've got the minimum number of comparisons already.
Rather than just computing the median, you might as well put them in place. Then you can get away with just 3 comparisons all the time, and you've got your pivot closer to being in place.
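static <T> void swap(T[] a, int i, int j)
{
    T tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
}

// Sorts a[low], a[middle], a[high] in place and returns the median (now at middle).
static <T extends Comparable<T>> T median(T[] a, int low, int high)
{
    int middle = ( low + high ) / 2;
    if( a[ middle ].compareTo( a[ low ] ) < 0 )
        swap( a, low, middle );
    if( a[ high ].compareTo( a[ low ] ) < 0 )
        swap( a, low, high );
    if( a[ high ].compareTo( a[ middle ] ) < 0 )
        swap( a, middle, high );
    return a[middle];
}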
I know that this is an old thread, but I had to solve exactly this problem on a microcontroller that has very little RAM and no hardware multiplication unit :). In the end I found that the following works well:
static char medianIndex[] = { 1, 1, 2, 0, 0, 2, 1, 1 };
signed short getMedian(const signed short num[])
{
return num[medianIndex[(num[0] > num[1]) << 2 | (num[1] > num[2]) << 1 | (num[0] > num[2])]];
}
If you're not afraid to get your hands a little dirty with compiler intrinsics you can do it with exactly 0 branches.
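The intrinsics route is not shown here. A portable sketch of the same goal uses min/max alone. Optimizing compilers typically turn these conditional expressions into conditional moves or min/max instructions rather than jumps, but that depends on the compiler and target, so check the generated assembly. The helper names are illustrative:

```c
#include <assert.h>

// min/max as conditional expressions; usually compiled without branches,
// but this is up to the compiler and target, not guaranteed by C.
static inline int min_int(int a, int b) { return a < b ? a : b; }
static inline int max_int(int a, int b) { return a > b ? a : b; }

// median(a, b, c) = max(min(a, b), min(max(a, b), c))
int median3(int a, int b, int c)
{
    return max_int(min_int(a, b), min_int(max_int(a, b), c));
}
```

The formula works because min(a, b) is never above the median, and min(max(a, b), c) is exactly the median whenever it exceeds min(a, b).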
The same question was discussed before on:
Fastest way of finding the middle value of a triple?
I have to add, though, that in the context of a naive quicksort implementation with a lot of elements, reducing the number of branches when finding the median is not that important, because the branch predictor will choke anyway once you start tossing elements around the pivot. More sophisticated implementations (which don't branch in the partition operation and avoid WAW hazards) will benefit from this greatly.
Remove the max and min values from the total sum:
int med3(int a, int b, int c)
{
    int tot_v = a + b + c;   // note: can overflow for values near INT_MAX
    int max_v = max(a, max(b, c));
    int min_v = min(a, min(b, c));
    return tot_v - max_v - min_v;
}
There is actually a clever way to isolate the median element from three using a careful analysis of the 6 possible permutations (of low, median, high). In Python:
def med(a, start, mid, last):
    # put the median of a[start], a[mid], a[last] in the a[start] position
    SM = a[start] < a[mid]
    SL = a[start] < a[last]
    if SM != SL:
        return
    ML = a[mid] < a[last]
    m = mid if SM == ML else last
    a[start], a[m] = a[m], a[start]
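A third of the time (when a[start] is already the median) you need only two comparisons; otherwise you need three, for an average of about 2.67 with distinct values in random order. And you swap the median element only when needed (2/3 of the time).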
Full python quicksort using this at:
https://github.com/mckoss/labs/blob/master/qs.py
You can write up all the permutations:
1 0 2
1 2 0
0 1 2
2 1 0
0 2 1
2 0 1
Then we want to find the position of the 1. We could do this with two comparisons, if our first comparison could split out a group of equal positions, such as the first two lines.
The issue seems to be that the first two lines are different on any comparison we have available: a<b, a<c, b<c. Hence we have to fully identify the permutation, which requires 3 comparisons in the worst case.
Using the bitwise XOR operator, the median of three numbers can be found:
def median(a, b, c):
    m = max(a, b, c)
    n = min(a, b, c)
    # x ^ x == 0, so the max and min cancel out of a ^ b ^ c, leaving the median
    ans = m ^ n ^ a ^ b ^ c
    return ans

Scrabble tile checking

For tile checking in Scrabble, you make four 5x5 grids of letters totalling 100 tiles. I would like to make one where all 40 horizontal and vertical words are valid. The set of available tiles contains:
12 x E
9 x A, I
8 x O
6 x N, R, T
4 x D, L, S, U
3 x G
2 x B, C, F, H, M, P, V, W, Y, blank tile (wildcard)
1 x K, J, Q, X, Z
The dictionary of valid words is available here (700KB). There are about 12,000 valid 5 letter words.
Here's an example where all 20 horizontal words are valid:
Z O W I E|P I N O T
Y O G I N|O C t A D <= blank being used as 't'
X E B E C|N A L E D
W A I T E|M E R L E
V I N E R|L U T E A
---------+---------
U S N E A|K N O S P
T A V E R|J O L E D
S O F T A|I A M B I
R I D G Y|H A I T h <= blank being used as 'h'
Q U R S H|G R O U F
I'd like to create one where all the vertical ones are also valid. Can you help me solve this? It is not homework. It is a question a friend asked me for help with.
Final Edit: Solved! Here is a solution.
GNAWN|jOULE
RACHE|EUROS
IDIOT|STEAN
PINOT|TRAvE
TRIPY|SOLES
-----+-----
HOWFF|ZEBRA
AGILE|EQUID
CIVIL|BUXOM
EVENT|RIOJA
KEDGY|ADMAN
Here's a photo of it constructed with my scrabble set. http://twitpic.com/3wn7iu
This one was easy to find once I had the right approach, so I bet you could find many more this way. See below for methodology.
Construct a prefix tree from the dictionary of 5 letter words for each row and column. Recursively, a given tile placement is valid if it forms valid prefixes for its column and row, and if the tile is available, and if the next tile placement is valid. The base case is that it is valid if there is no tile left to place.
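Here is a scaled-down C sketch of that recursion, under simplifying assumptions: K x K squares (K = 2 for the demo), no blank tiles, and a tiny hypothetical word list instead of the real dictionary. Cells are filled row by row. A letter is tried only if it extends both the row prefix and the column prefix in the trie and a tile of that letter is left:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define K 2   // square size for this small demo (the question uses 5)

// Trie node: child[c] is the node for letter 'A' + c.
typedef struct Trie {
    struct Trie *child[26];
} Trie;

static Trie *trie_new(void)
{
    Trie *t = calloc(1, sizeof *t);
    assert(t != NULL);
    return t;
}

static void trie_insert(Trie *root, const char *word)
{
    for (; *word; word++) {
        int c = *word - 'A';
        if (!root->child[c])
            root->child[c] = trie_new();
        root = root->child[c];
    }
}

static Trie *row_node[K], *col_node[K];  // current prefix node of each row/column
static int bag[26];                      // remaining tiles per letter (no blanks)
static long solutions;

// Tries every letter at cell (r, c), then moves on to the next cell.
static void place(int r, int c)
{
    if (r == K) {                        // base case: no tile left to place
        solutions++;
        return;
    }
    int nr = (c + 1 == K) ? r + 1 : r;
    int nc = (c + 1 == K) ? 0 : c + 1;
    Trie *old_row = row_node[r], *old_col = col_node[c];
    for (int l = 0; l < 26; l++) {
        // The letter must extend both prefixes and be available in the bag.
        if (!bag[l] || !old_row->child[l] || !old_col->child[l])
            continue;
        row_node[r] = old_row->child[l];
        col_node[c] = old_col->child[l];
        bag[l]--;
        place(nr, nc);
        bag[l]++;                        // undo before trying the next letter
        row_node[r] = old_row;
        col_node[c] = old_col;
    }
}

// Counts K x K squares whose rows and columns are all in `words` (all of length K).
long count_squares(const char *const words[], int n_words, const int tiles[26])
{
    Trie *root = trie_new();             // leaked in this sketch for brevity
    for (int i = 0; i < n_words; i++)
        trie_insert(root, words[i]);
    for (int i = 0; i < K; i++)
        row_node[i] = col_node[i] = root;
    memcpy(bag, tiles, sizeof bag);
    solutions = 0;
    place(0, 0);
    return solutions;
}
```

Because every word in the trie has length K, a row or column prefix of length K is always a complete word, so no separate end-of-word check is needed.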
It probably makes sense to just find all valid 5x5 boards, like Glenn said, and see if any four of them can be combined. Recursing to a depth of 100 doesn't sound like fun.
Edit: Here is version 2 of my code for this.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
typedef union node node;
union node {
node* child[26];
char string[6];
};
typedef struct snap snap;
struct snap {
node* rows[5];
node* cols[5];
char tiles[27];
snap* next;
};
node* root;
node* vtrie[5];
node* htrie[5];
snap* head;
char bag[27] = {9,2,2,4,12,2,3,2,9,1,1,4,2,6,8,2,1,6,4,6,4,2,2,1,2,1,2};
const char full_bag[27] = {9,2,2,4,12,2,3,2,9,1,1,4,2,6,8,2,1,6,4,6,4,2,2,1,2,1,2};
const char order[26] = {16,23,9,25,21,22,5,10,1,6,7,12,15,2,24,3,20,13,19,11,8,17,14,0,18,4};
void insert(char* string){
node* place = root;
int i;
for(i=0;i<5;i++){
if(place->child[string[i] - 'A'] == NULL){
int j;
place->child[string[i] - 'A'] = malloc(sizeof(node));
for(j=0;j<26;j++){
place->child[string[i] - 'A']->child[j] = NULL;
}
}
place = place->child[string[i] - 'A'];
}
memcpy(place->string, string, 6);
}
void check_four(){
snap *a, *b, *c, *d;
char two_total[27];
char three_total[27];
int i;
bool match;
a = head;
for(b = a->next; b != NULL; b = b->next){
for(i=0;i<27; i++)
two_total[i] = a->tiles[i] + b->tiles[i];
for(c = b->next; c != NULL; c = c->next){
for(i=0;i<27; i++)
three_total[i] = two_total[i] + c->tiles[i];
for(d = c->next; d != NULL; d = d->next){
match = true;
for(i=0; i<27; i++){
if(three_total[i] + d->tiles[i] != full_bag[i]){
match = false;
break;
}
}
if(match){
printf("\nBoard Found!\n\n");
for(i=0;i<5;i++){
printf("%s\n", a->rows[i]->string);
}
printf("\n");
for(i=0;i<5;i++){
printf("%s\n", b->rows[i]->string);
}
printf("\n");
for(i=0;i<5;i++){
printf("%s\n", c->rows[i]->string);
}
printf("\n");
for(i=0;i<5;i++){
printf("%s\n", d->rows[i]->string);
}
exit(0);
}
}
}
}
}
void snapshot(){
snap* shot = malloc(sizeof(snap));
int i;
for(i=0;i<5;i++){
printf("%s\n", htrie[i]->string);
shot->rows[i] = htrie[i];
shot->cols[i] = vtrie[i];
}
printf("\n");
for(i=0;i<27;i++){
shot->tiles[i] = full_bag[i] - bag[i];
}
bool transpose = false;
snap* place = head;
while(place != NULL && !transpose){
transpose = true;
for(i=0;i<5;i++){
if(shot->rows[i] != place->cols[i]){
transpose = false;
break;
}
}
place = place->next;
}
if(transpose){
free(shot);
}
else {
shot->next = head;
head = shot;
check_four();
}
}
void pick(int x, int y){
if(y==5){
snapshot();
return;
}
int i, tile,nextx, nexty, nextz;
node* oldv = vtrie[x];
node* oldh = htrie[y];
if(x+1==5){
nexty = y+1;
nextx = 0;
} else {
nextx = x+1;
nexty = y;
}
for(i=0;i<26;i++){
if(vtrie[x]->child[order[i]]!=NULL &&
htrie[y]->child[order[i]]!=NULL &&
(tile = bag[i] ? i : bag[26] ? 26 : -1) + 1) {
vtrie[x] = vtrie[x]->child[order[i]];
htrie[y] = htrie[y]->child[order[i]];
bag[tile]--;
pick(nextx, nexty);
vtrie[x] = oldv;
htrie[y] = oldh;
bag[tile]++;
}
}
}
int main(int argc, char** argv){
root = malloc(sizeof(node));
FILE* wordlist = fopen("sowpods5letters.txt", "r");
head = NULL;
int i;
for(i=0;i<26;i++){
root->child[i] = NULL;
}
for(i=0;i<5;i++){
vtrie[i] = root;
htrie[i] = root;
}
char* string = malloc(sizeof(char)*6);
while(fscanf(wordlist, "%s", string) != EOF){
insert(string);
}
free(string);
fclose(wordlist);
pick(0,0);
return 0;
}
This tries the infrequent letters first, which I'm no longer sure is a good idea. It starts to get bogged down before it makes it out of the boards starting with x. After seeing how many 5x5 blocks there were I altered the code to just list out all the valid 5x5 blocks. I now have a 150 MB text file with all 4,430,974 5x5 solutions.
I also tried it with just recursing through the full 100 tiles, and that is still running.
Edit 2: Here is the list of all the valid 5x5 blocks I generated. http://web.cs.sunyit.edu/~levyt/solutions.rar
Edit 3: Hmm, seems there was a bug in my tile usage tracking, because I just found a block in my output file that uses 5 Zs.
COSTE
ORCIN
SCUZZ
TIZZY
ENZYM
Edit 4: Here is the final product.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
typedef union node node;
union node {
node* child[26];
char string[6];
};
node* root;
node* vtrie[5];
node* htrie[5];
int score;
int max_score;
char block_1[27] = {4,2,0,2, 2,0,0,0,2,1,0,0,2,1,2,0,1,2,0,0,2,0,0,1,0,1,0};//ZEBRA EQUID BUXOM RIOJA ADMAN
char block_2[27] = {1,0,1,1, 4,2,2,1,3,0,1,2,0,1,1,0,0,0,0,1,0,2,1,0,1,0,0};//HOWFF AGILE CIVIL EVENT KEDGY
char block_3[27] = {2,0,1,1, 1,0,1,1,4,0,0,0,0,3,2,2,0,2,0,3,0,0,1,0,1,0,0};//GNAWN RACHE IDIOT PINOT TRIPY
//JOULE EUROS STEAN TRAVE SOLES
char bag[27] = {9,2,2,4,12,2,3,2,9,1,1,4,2,6,8,2,1,6,4,6,4,2,2,1,2,1,2};
const char full_bag[27] = {9,2,2,4,12,2,3,2,9,1,1,4,2,6,8,2,1,6,4,6,4,2,2,1,2,1,2};
const char order[26] = {16,23,9,25,21,22,5,10,1,6,7,12,15,2,24,3,20,13,19,11,8,17,14,0,18,4};
const int value[27] = {244,862,678,564,226,1309,844,765,363,4656,909,414,691,463,333,687,11998,329,218,423,536,1944,1244,4673,639,3363,0};
void insert(char* string){
node* place = root;
int i;
for(i=0;i<5;i++){
if(place->child[string[i] - 'A'] == NULL){
int j;
place->child[string[i] - 'A'] = malloc(sizeof(node));
for(j=0;j<26;j++){
place->child[string[i] - 'A']->child[j] = NULL;
}
}
place = place->child[string[i] - 'A'];
}
memcpy(place->string, string, 6);
}
void snapshot(){
static int count = 0;
int i;
for(i=0;i<5;i++){
printf("%s\n", htrie[i]->string);
}
for(i=0;i<27;i++){
printf("%c%d ", 'A'+i, bag[i]);
}
printf("\n");
if(++count>=1000){
exit(0);
}
}
void pick(int x, int y){
if(y==5){
if(score>max_score){
snapshot();
max_score = score;
}
return;
}
int i, tile,nextx, nexty;
node* oldv = vtrie[x];
node* oldh = htrie[y];
if(x+1==5){
nextx = 0;
nexty = y+1;
} else {
nextx = x+1;
nexty = y;
}
for(i=0;i<26;i++){
if(vtrie[x]->child[order[i]]!=NULL &&
htrie[y]->child[order[i]]!=NULL &&
(tile = bag[order[i]] ? order[i] : bag[26] ? 26 : -1) + 1) {
vtrie[x] = vtrie[x]->child[order[i]];
htrie[y] = htrie[y]->child[order[i]];
bag[tile]--;
score+=value[tile];
pick(nextx, nexty);
vtrie[x] = oldv;
htrie[y] = oldh;
bag[tile]++;
score-=value[tile];
}
}
}
int main(int argc, char** argv){
root = malloc(sizeof(node));
FILE* wordlist = fopen("sowpods5letters.txt", "r");
score = 0;
max_score = 0;
int i;
for(i=0;i<26;i++){
root->child[i] = NULL;
}
for(i=0;i<5;i++){
vtrie[i] = root;
htrie[i] = root;
}
for(i=0;i<27;i++){
bag[i] = bag[i] - block_1[i];
bag[i] = bag[i] - block_2[i];
bag[i] = bag[i] - block_3[i];
printf("%c%d ", 'A'+i, bag[i]);
}
char* string = malloc(sizeof(char)*6);
while(fscanf(wordlist, "%s", string) != EOF){
insert(string);
}
free(string);
fclose(wordlist);
pick(0,0);
return 0;
}
After finding out how many blocks there were (nearly 2 billion and still counting), I switched to trying to find certain types of blocks, in particular the difficult to construct ones using uncommon letters. My hope was that if I ended up with a benign enough set of letters going in to the last block, the vast space of valid blocks would probably have one for that set of letters.
I assigned each tile a value inversely proportional to the number of 5 letter words it appears in. Then, when I found a valid block I would sum up the tile values, and if the score was the best I had yet seen, I would print out the block.
For the first block I removed the blank tiles, figuring that the last block would need that flexibility the most. After letting it run until I had not seen a better block appear for some time, I selected the best block, and removed the tiles in it from the bag, and ran the program again, getting the second block. I repeated this for the 3rd block. Then for the last block I added the blanks back in and used the first valid block it found.
Here's how I would try this. First construct a prefix tree.
Pick a word and place it horizontally on top. Pick a word and place it vertically. Keep alternating until you run out of options. Alternating fixes the first letters early and eliminates many mismatching words. If you do find such a square, then check whether it can be made with the available tiles.
For 5x5 squares: after some thought, it can't be worse than O(12000!/11990!) for random words. But we can estimate better: every time you fix a letter (in normal text) you eliminate about 90% (an optimistic guess) of the candidate words. This means that after three iterations you are down to 12 words. So the actual cost would be roughly
O(n * n/10 * n/10 * n/100 * n/100 * n/1000 * n/1000 ...
which for 12000 elements behaves something like an n^4 algorithm,
which isn't that bad.
Probably someone can do a better analysis of the problem. But the search for words should still converge quite quickly.
More words can be eliminated by exploiting the infrequent letters. Find all words that contain an infrequent letter, try to find matching positions for each of those letters, and construct a set of valid letters for each position.
For example, let's say we have four words with letter Q in it.
AQFED, ZQABE, EDQDE, ELQUO
this means there are two valid positionings of those:
xZxxx
AQFED
xAxxx ---> this limits our search for words that contain [ABDEFZ] as the second letter
xBxxx
xExxx
same for the other
EDQDE ---> this limits our search for words that contain [EDLU] as the third letter
ELQUO
all appropriate words are in union of those two conditions
So if several words S contain the infrequent letter X at position N, then the other words in the square must have, at position N, a letter that also appears in one of those words S.
Formula:
1. Find all words that contain the infrequent letter X at position 1 (in later iterations, 2, 3, ...).
2. Make a set A out of the letters in those words.
3. Keep only those dictionary words that have a letter from set A at position 1.
4. Try to fit those into the matrix (with the first method).
5. Repeat with position 2.
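Here is one literal reading of steps 1 to 3, as a C sketch. Letter sets are 26-bit masks, and positions are 0-based, so the Q in the example words is at index 1. The function names are hypothetical:

```c
#include <assert.h>
#include <string.h>

// Steps 1-2: bit (c - 'A') is set for every letter c that appears in some word
// having `rare` at index `pos`, excluding `rare` itself.
unsigned letter_set(const char *const words[], int n, char rare, int pos)
{
    unsigned set = 0;
    for (int i = 0; i < n; i++) {
        if (words[i][pos] != rare)
            continue;
        for (const char *p = words[i]; *p; p++)
            if (*p != rare)
                set |= 1u << (*p - 'A');
    }
    return set;
}

// Step 3: copies into out[] the words whose letter at `pos` is in `set`;
// returns how many were kept.
int filter_by_position(const char *const words[], int n, unsigned set, int pos,
                       const char *out[])
{
    int k = 0;
    for (int i = 0; i < n; i++)
        if (set & (1u << (words[i][pos] - 'A')))
            out[k++] = words[i];
    return k;
}
```

For AQFED and ZQABE this gives the set [ABDEFZ] from the example above.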
I would approach the problem (naively, to be sure) by taking a pessimistic view. I'd try to prove there was no 5x5 solution, and therefore certainly not four 5x5 solutions. To prove there was no 5x5 solution I'd try to construct one from all possibilities. If my conjecture failed and I was able to construct a 5x5 solution, well, then, I'd have a way to construct 5x5 solutions and I would try to construct all of the (independent) 5x5 solutions. If there were at least 4, then I would determine if some combination satisfied the letter count restrictions.
[Edit] Null Set has determined that there are "4,430,974 5x5 solutions". Are these valid?
I mean that we have a limit on the number of letters we can use. This limit can be expressed as a boundary vector BV = [9, 2, 2, 4, ...] holding the limits for A, B, C, etc. (you can see this vector in Null Set's code). A 5x5 solution is valid if each term of its letter count vector is less than or equal to the corresponding term in BV. It would be easy to check whether a 5x5 solution is valid as it is created. Perhaps the 4,430,974 number can be reduced, say to N.
Regardless, we can state the problem as: find four letter count vectors among the N whose sum is equal to BV. There are (N, 4) possible sums ("N choose 4"). With N equal to 4 million this is still on the order of 10^25, not an encouraging number. Perhaps you could search for four whose first terms sum to 9, and only then check that their second terms sum to 2, and so on.
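Here is a C sketch of the bookkeeping described above. It builds a block's letter count vector, checks it against BV, and tests whether four vectors sum exactly to BV, with the letter-by-letter early exit suggested above. Blank tiles are ignored, so BV here has 26 entries rather than the 27 in Null Set's code; that is an assumption of this sketch, and the function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#define LETTERS 26

// Letter count vector of a 5x5 block given as five 5-letter rows.
void count_letters(const char *const rows[5], int counts[LETTERS])
{
    for (int i = 0; i < LETTERS; i++)
        counts[i] = 0;
    for (int r = 0; r < 5; r++)
        for (int c = 0; c < 5; c++)
            counts[rows[r][c] - 'A']++;
}

// A block is usable on its own if no letter count exceeds BV.
bool within_bound(const int counts[LETTERS], const int bv[LETTERS])
{
    for (int i = 0; i < LETTERS; i++)
        if (counts[i] > bv[i])
            return false;
    return true;
}

// Do four blocks' count vectors sum exactly to BV? Checks letter by letter
// and stops at the first mismatch.
bool sums_to_bound(const int *const blocks[4], const int bv[LETTERS])
{
    for (int i = 0; i < LETTERS; i++) {
        int total = blocks[0][i] + blocks[1][i] + blocks[2][i] + blocks[3][i];
        if (total != bv[i])
            return false;
    }
    return true;
}
```

The five-Z block from Null Set's Edit 3 (COSTE/ORCIN/SCUZZ/TIZZY/ENZYM) fails within_bound, which is the kind of check that would have caught that bug.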
I'd remark that after choosing 4 from N the computations are independent, so if you have a multi-core machine you can make this go faster with a parallel solution.
[Edit2] Parallelizing probably wouldn't make much difference, though. At this point I might take an optimistic view: there are certainly more 5x5 solutions than I expected, so there may be more final solutions than expected, too. Perhaps you might not have to get far into the 10^25 to hit one.
I'm starting with something simpler.
Here are some results so far:
3736 2x2 solutions
8812672 3x3 solutions
The 1000th 4x4 solution is
A A H S
A C A I
L A I R
S I R E
The 1000th 5x5 solution is
A A H E D
A B U N A
H U R S T
E N S U E
D A T E D
The 1000th 2x4x4 solution is
A A H S | A A H S
A B A C | A B A C
H A I R | L E K U
S C R Y | S T E D
--------+--------
D E E D | D E E M
E I N E | I N T I
E N O L | O V E R
T E L T | L Y N E
Note that transposing an 'A' and a blank that is being used as an 'A' should be considered the same solution. But transposing the rows with the columns should be considered a different solution. I hope that makes sense.
Here are a lot of precomputed 5x5's. Left as an exercise to the reader to find 4 compatible ones :-)
http://www.gtoal.com/wordgames/wordsquare/all5

Resources