haskell optimized fill of max value - haskell

I've started building some small real-world things in Haskell to get some practice. I need a piece of code that takes a maximum value A and a list of values (x:xs) and returns a list of values whose sum is equal to or lower than A. What I have so far is below, but it simply keeps fitting numbers in list order until nothing more fits, so it does not return the optimal selection.
fill :: Int -> [Int] -> Int
fill max (x:xs) = if x < max then fill (max - x) xs else fill max xs
fill max [] = max
I think I need a function that takes the first value of the list, traverses the rest of the list for the best addition, adds that to another list, and keeps doing so until it reaches the smallest remainder or an exact match with max. This might be simple for you, but it has proven a tough nut to crack for me.
Edit: if I sort the value list first, it seems a little simpler, since I can discard values that exceed max.
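To make the difference between "fit values in order" and "find the best fit" concrete, here is a minimal sketch in Python rather than Haskell. It is not the asker's code: the name best_fill is hypothetical, and the sketch assumes all values are non-negative integers. It records every sum that can be reached without exceeding the limit, together with one selection that produces it, and returns the selection with the largest sum.

```python
def best_fill(limit, values):
    """Return a sub-selection of values whose sum is as large as possible
    without exceeding limit. Assumes non-negative integers (an assumption
    of this sketch, not stated in the question)."""
    # reachable maps each achievable sum to one list of values producing it.
    reachable = {0: []}
    for v in values:
        # Iterate over a snapshot so each value is used at most once.
        for total, chosen in list(reachable.items()):
            new_total = total + v
            if new_total <= limit and new_total not in reachable:
                reachable[new_total] = chosen + [v]
    return reachable[max(reachable)]


# Taking values in list order would pick 6 and then find no further fit,
# while the exhaustive search finds an exact match.
print(best_fill(10, [6, 5, 4]))  # [6, 4]
```

The work grows with the number of distinct reachable sums, so this is practical only while the limit stays reasonably small.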

Related

Optimization of Python comprehension expression

I was trying to get the frequency of max value in an integer list (intlist)
intlist.count(max(intlist))
This works and is fast as well.
I wanted to implement the max method with a comprehension:
[x if x>y else y for x in intlist for y in intlist if x!=y][-1]
The latter turns out to be very slow.
Can anyone point out what the issue is here?
testing with
intlist=np.array([1, 2, 3,3,-1])
in this case the value expected is 2 as 3 is the max value and it occurs 2 times.
The list comprehension does not calculate the maximum value in the first place. Because of the [-1], only the last generated element counts: it is the larger of the last item in intlist and the last item that differs from it. So the result equals the true maximum only when the maximum happens to be one of those two values, as it does in your test data.
Furthermore, it is not very efficient, since it runs in O(n^2) time and uses O(n^2) memory. For huge lists, this would require gigantic amounts of memory.
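A short experiment shows both points. This is only an illustrative sketch: the helper name comprehension_result and the input [5, 1, 2] are made up for the demonstration, and a plain list is used instead of a NumPy array.

```python
def comprehension_result(values):
    # The expression from the question, applied to an arbitrary list.
    return [x if x > y else y for x in values for y in values if x != y][-1]


values = [5, 1, 2]
pairs = [(x, y) for x in values for y in values if x != y]

print(len(pairs))                    # 6 intermediate pairs for only 3 items
print(pairs[-1])                     # (2, 1): the only pair that matters
print(comprehension_result(values))  # 2
print(max(values))                   # 5
```

With the test data from the question, [1, 2, 3, 3, -1], the last pair is (-1, 3), which is why the expression appears to work there.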
Usually it is not a good idea to use list comprehension if you do not need a list in the first place. You can calculate a maximum with a for loop, where you each time compare an item with the thus far obtained maximum:
def other_max(listlike):
    mmax = listlike[0]
    for x in listlike:
        if x > mmax:
            mmax = x
    return mmax
or with numpy we can sum up the array of booleans:
>>> (intlist == intlist.max()).sum()
2

Check if a particular list in a "list of lists" is full or not in Haskell

I'm playing Tic Tac Toe and the columns are represented by lists,
so a classic 3x3 board full of alternating X and O, from bottom to top, for three columns, would be [X,O,X][X,O,X][X,O,X]. An empty square would be represented by Empty, I guess (is that a good or a bad idea?).
How would I check whether a selected column is full?
I want to have a function called Checker :: board -> Int -> Bool
I'm not really sure where to begin defining the function Checker.
Edit: Clarifications
1) The board (like any real life game of Tic Tac Toe) will start off obviously as
[empty,empty,empty][empty,empty,empty][empty,empty,empty]
or it will start off as the empty list and a function needs to transform it to
[empty,empty,empty][empty,empty,empty][empty,empty,empty]
2) I want to check if the column is full, so to error check. I do not want players to add X's or O's to full columns. Columns could be filled up with any combination of X's and O's, just like mid way in a real life game of Tic-Tac-Toe.
3) The board is a list of lists. Columns by human interpretation are merely lists
So in a tic tac toe board that is ALL X's EXCEPT the middle being an O, is
[X,X,X][X,O,X][X,X,X]
You can check if a given (1-dimensional) list is all X with
all (== X) list
(As long as your data type has an Eq instance, which you can give it with e.g.
data Square = X | O | Empty
  deriving (Eq)
).
Similarly, you can check if every element is non-empty with
all (/= Empty) list
or by defining your own function isFull :: Square -> Bool and using all isFull list.
If the board is stored as a list of rows, you can extract a column by mapping a list index operator over it.
column n xss = map (!! n) xss
Another way, which is arguably more elegant, is to transpose the board (transpose from Data.List) and then look at the rows.
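The same two ideas can be sketched in Python for comparison. Everything here is illustrative: the string constants stand in for the Square constructors, and column_is_full and columns_of are hypothetical names. The first function assumes the board is a list of columns, as in the question; the second covers the case where the board is a list of rows and has to be transposed first.

```python
EMPTY, X, O = "Empty", "X", "O"


def column_is_full(board, n):
    """board is a list of columns; column n is full if no square is EMPTY."""
    return all(square != EMPTY for square in board[n])


def columns_of(rows):
    """Transpose a list of rows into a list of columns."""
    return [list(col) for col in zip(*rows)]


board = [[X, O, X], [X, EMPTY, O], [O, O, X]]
print(column_is_full(board, 0))  # True
print(column_is_full(board, 1))  # False
```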

Binary search - worst/avg case

I'm finding it difficult to understand why/how the worst and average case for searching for a key in an array/list using binary search is O(log(n)).
log(1,000,000) is only 6. log(1,000,000,000) is only 9 - I get that, but I don't understand the explanation. If one did not test it, how do we know that the avg/worst case is actually log(n)?
I hope you guys understand what I'm trying to say. If not, please let me know and I'll try to explain it differently.
Worst case
Every time the binary search code makes a decision, it eliminates half of the remaining elements from consideration. So you're dividing the number of elements by 2 with each decision.
How many times can you divide by 2 before you are down to only a single element? If n is the starting number of elements and x is the number of times you divide by 2, we can write this as:
n / (2 * 2 * 2 * ... * 2) = 1 [the '2' is repeated x times]
or, equivalently,
n / 2^x = 1
or, equivalently,
n = 2^x
So log base 2 of n gives you x, which is the number of decisions being made.
Finally, you might ask, if I used log base 2, why is it also OK to write it as log base 10, as you have done? The base does not matter because the difference is only a constant factor which is "ignored" by Big O notation.
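The halving argument can be checked numerically. The sketch below (the function name halvings is made up for this example) counts how often n can be halved before a single element remains and compares the count with the base-2 logarithm. The base-10 figures from the question, 6 and 9, differ from these counts only by the constant factor 1/log10(2), roughly 3.32.

```python
import math


def halvings(n):
    """Count how many times n can be halved (integer division) before reaching 1."""
    count = 0
    while n > 1:
        n //= 2
        count += 1
    return count


for n in (16, 1_000_000, 1_000_000_000):
    print(n, halvings(n), math.floor(math.log2(n)))
```

For one million elements the loop runs 19 times, and for one billion it runs 29 times.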
Average case
I see that you also asked about the average case. Consider:
There is only one element in the array that can be found on the first try.
There are only two elements that can be found on the second try. (Because after the first try, we chose either the right half or the left half.)
There are only four elements that can be found on the third try.
You can see the pattern: 1, 2, 4, 8, ... , n/2. To express the same pattern going in the other direction:
Half the elements take the maximum number of decisions to find.
A quarter of the elements take one fewer decision to find.
etc.
Since half of the elements take the maximum amount of time, it doesn't matter how much less time the other elements take. We could assume that all elements take the maximum amount of time, and even if half of them actually take 0 time, our assumption would not be more than double whatever the true average is. We can ignore "double" since it is a constant factor. So the average case is the same as the worst case, as far as Big O notation is concerned.
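The claim that the average is within a constant factor of the worst case can also be measured directly. The following sketch uses a hypothetical helper, probes, that counts how many elements a standard binary search inspects, and runs it for every key of a 1023-element list (a size chosen only because it halves evenly).

```python
def probes(sorted_list, key):
    """Return how many elements binary search inspects while looking for key."""
    lo, hi = 0, len(sorted_list) - 1
    count = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        count += 1
        if sorted_list[mid] == key:
            return count
        if sorted_list[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return count


data = list(range(1023))
counts = [probes(data, key) for key in data]
worst = max(counts)
average = sum(counts) / len(counts)
print(worst, round(average, 2))
```

Here the worst case is 10 probes and the average is just over 9, so the two differ by far less than a factor of two.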
For binary search, the array should be arranged in ascending or descending order.
In each step, the algorithm compares the search key value with the key value of the middle element of the array.
If the keys match, then a matching element has been found and its index, or position, is returned.
Otherwise, if the search key is less than the middle element's key, then the algorithm repeats its action on the sub-array to the left of the middle element.
Or, if the search key is greater,then the algorithm repeats its action on the sub-array to the right.
If the remaining array to be searched is empty, then the key cannot be found in the array and a special "not found" indication is returned.
So binary search is a dichotomic divide-and-conquer search algorithm. It takes logarithmic time because the number of remaining elements is halved in each iteration.
For sorted lists, on which we can do a binary search, each "decision" compares your key to the middle element: if the key is greater, the search continues in the right half of the list; if it is less, in the left half; and if it matches, the element's position is returned. You effectively halve the list with every decision, which yields O(log n).
Binary search, however, only works for sorted lists. For unsorted lists you can do a linear search starting with the first element, which has a complexity of O(n).
O(log n) < O(n)
Which approach is best, though, depends entirely on how many searches you will be doing, your inputs, and so on.
For Binary search the prerequisite is a sorted array as input.
• As the list is sorted:
• Certainly we don't have to check every word in the dictionary to look up a word.
• A basic strategy is to repeatedly halve our search range until we find the value.
• For example, look for 5 in the list of 9 numbers below: v = 1 1 3 5 8 10 18 33 42
• We would first start in the middle: 8
• Since 5<8, we know we can look at just the first half: 1 1 3 5
• Looking at the middle # again, narrow down to 3 5
• Then we stop when we're down to one #: 5
How many comparisons are needed: 4 = floor(log2(9)) + 1, which is O(log2 n).
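#include <vector>

// Returns the index of val in the sorted vector v, or -1 if it is absent.
int binary_search(const std::vector<int>& v, int val) {
    int from = 0;
    int to = static_cast<int>(v.size()) - 1;
    while (from <= to) {
        int mid = from + (to - from) / 2;  // avoids overflow of from + to
        if (val == v[mid])
            return mid;
        else if (val > v[mid])
            from = mid + 1;  // continue in the right half
        else
            to = mid - 1;    // continue in the left half
    }
    return -1;
}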

What is the fastest way to sort n strings of length n each?

I have n strings, each of length n. I wish to sort them in ascending order.
The best algorithm I can think of is n^2 log n, which is quick sort. (Comparing two strings takes O(n) time). The challenge is to do it in O(n^2) time. How can I do it?
Also, radix sort methods are not permitted, as you do not know the number of letters in the alphabet beforehand.
Assume any letter is a to z.
Since there is no requirement for in-place sorting, create an array of 26 linked lists:
List[] sorted = new List[26]; // each element is a list you can append to
For each letter x in the string, its bucket index is its offset from 'a' in ASCII: x - 'a'.
For example, the index for 'c' is 2, so it is stored with
sorted[2].add('c')
That way, sorting the letters of one string takes only O(n) time,
so doing this for all n strings takes O(n^2).
For example, if you have "zdcbacdca":
z goes to sorted['z'-'a'].add('z'),
d goes to sorted['d'-'a'].add('d'),
....
After sorting, one possible result looks like
0 1 2 3 ... 25
a b c d ... z
a   c d
    c
Note: the assumed alphabet determines the length of the sorted array.
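The bucket scheme described above can be written out as a short Python sketch. The function name sort_letters is hypothetical, and the code assumes, as the answer does, that every character is a lowercase letter from a to z.

```python
def sort_letters(s):
    """Sort the letters of s in O(len(s)) time using 26 buckets.
    Assumes every character is in 'a'..'z'."""
    buckets = [[] for _ in range(26)]
    for ch in s:
        # The bucket index is the letter's offset from 'a'.
        buckets[ord(ch) - ord("a")].append(ch)
    # Reading the buckets from index 0 to 25 yields the letters in order.
    return "".join("".join(bucket) for bucket in buckets)


print(sort_letters("zdcbacdca"))  # aabcccddz
```

Note that this orders the letters inside a single string; it does not by itself order the n strings relative to one another.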
For small numbers of strings a regular comparison sort will probably be faster than a radix sort here, since radix sort takes time proportional to the number of bits required to store each character. For a 2-byte Unicode encoding, and making some (admittedly dubious) assumptions about equal constant factors, radix sort will only be faster if log2(n) > 16, i.e. when sorting more than about 65,000 strings.
One thing I haven't seen mentioned yet is the fact that a comparison sort of strings can be enhanced by exploiting known common prefixes.
Suppose our strings are S[0], S[1], ..., S[n-1]. Let's consider augmenting mergesort with a Longest Common Prefix (LCP) table. First, instead of moving entire strings around in memory, we will just manipulate lists of indices into a fixed table of strings.
Whenever we merge two sorted lists of string indices X[0], ..., X[k-1] and Y[0], ..., Y[k-1] to produce Z[0], ..., Z[2k-1], we will also be given 2 LCP tables (LCPX[0], ..., LCPX[k-1] for X and LCPY[0], ..., LCPY[k-1] for Y), and we need to produce LCPZ[0], ..., LCPZ[2k-1] too. LCPX[i] gives the length of the longest prefix of S[X[i]] that is also a prefix of S[X[i-1]], and similarly for LCPY and LCPZ.
The first comparison, between S[X[0]] and S[Y[0]], cannot use LCP information and we need a full O(n) character comparisons to determine the outcome. But after that, things speed up.
During this first comparison, between S[X[0]] and S[Y[0]], we can also compute the length of their LCP -- call that L. Set Z[0] to whichever of S[X[0]] and S[Y[0]] compared smaller, and set LCPZ[0] = 0. We will maintain in L the length of the LCP of the most recent comparison. We will also record in M the length of the LCP that the last "comparison loser" shares with the next string from its block: that is, if the most recent comparison, between two strings S[X[i]] and S[Y[j]], determined that S[X[i]] was smaller, then M = LCPX[i+1], otherwise M = LCPY[j+1].
The basic idea is: After the first string comparison in any merge step, every remaining string comparison between S[X[i]] and S[Y[j]] can start at the minimum of L and M, instead of at 0. That's because we know that S[X[i]] and S[Y[j]] must agree on at least this many characters at the start, so we don't need to bother comparing them. As larger and larger blocks of sorted strings are formed, adjacent strings in a block will tend to begin with longer common prefixes, and so these LCP values will become larger, eliminating more and more pointless character comparisons.
After each comparison between S[X[i]] and S[Y[j]], the string index of the "loser" is appended to Z as usual. Calculating the corresponding LCPZ value is easy: if the last 2 losers both came from X, take LCPX[i]; if they both came from Y, take LCPY[j]; and if they came from different blocks, take the previous value of L.
In fact, we can do even better. Suppose the last comparison found that S[X[i]] < S[Y[j]], so that X[i] was the string index most recently appended to Z. If M ( = LCPX[i+1]) > L, then we already know that S[X[i+1]] < S[Y[j]] without even doing any comparisons! That's because to get to our current state, we know that S[X[i]] and S[Y[j]] must have first differed at character position L, and it must have been that the character x in this position in S[X[i]] was less than the character y in this position in S[Y[j]], since we concluded that S[X[i]] < S[Y[j]] -- so if S[X[i+1]] shares at least the first L+1 characters with S[X[i]], it must also contain x at position L, and so it must also compare less than S[Y[j]]. (And of course the situation is symmetrical: if the last comparison found that S[Y[j]] < S[X[i]], just swap the names around.)
I don't know whether this will improve the complexity from O(n^2 log n) to something better, but it ought to help.
You can build a Trie, which will cost O(s*n).
Details:
https://stackoverflow.com/a/13109908
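As a rough illustration of the trie idea (not the code from the linked answer, which is not reproduced here), the Python sketch below inserts every string into a trie of nested dictionaries and then walks it depth-first in alphabetical order. The name trie_sort and the node layout are assumptions made for this example. Sorting the child keys at each node is a constant-size step when the alphabet is fixed, such as a to z.

```python
def trie_sort(strings):
    """Return the strings in ascending order by inserting them into a trie
    and reading it back in a depth-first, alphabetical walk."""
    root = {"children": {}, "words": []}
    for s in strings:
        node = root
        for ch in s:
            node = node["children"].setdefault(ch, {"children": {}, "words": []})
        node["words"].append(s)  # duplicates are kept

    result = []
    stack = [root]
    while stack:
        node = stack.pop()
        # A string ending at this node sorts before all of its extensions.
        result.extend(node["words"])
        # Push children in reverse so the smallest letter is popped first.
        for ch in sorted(node["children"], reverse=True):
            stack.append(node["children"][ch])
    return result


print(trie_sort(["banana", "apple", "app", "apple"]))
# ['app', 'apple', 'apple', 'banana']
```

Insertion touches each character once, so building the trie for n strings of length n takes O(n^2) character steps.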
Solving it for all cases should not be possible in better than O(N^2 Log N).
However, if there are constraints that can relax the string comparison, it can be optimised.
- If the strings have a high repetition rate and come from a finite ordered set, you can use ideas from counting sort and use a map to store their counts. Afterwards, sorting just the map keys should suffice: O(NM Log M), where M is the number of unique strings. You can even use a TreeMap directly for this purpose.
- If the strings are not random but are the suffixes of some super string, this can be done in O(N Log^2 N). http://discuss.codechef.com/questions/21385/a-tutorial-on-suffix-arrays
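The first case, many repeated strings, might look like the following Python sketch. It uses collections.Counter in place of the TreeMap mentioned above, and the function name sort_with_counts is made up for the example.

```python
from collections import Counter


def sort_with_counts(strings):
    """Sort strings by counting duplicates first, so that only the
    M unique strings take part in comparisons."""
    counts = Counter(strings)
    result = []
    for key in sorted(counts):          # compares unique strings only
        result.extend([key] * counts[key])
    return result


print(sort_with_counts(["b", "a", "b", "a", "c"]))
# ['a', 'a', 'b', 'b', 'c']
```

The saving depends on M being much smaller than N; with all strings distinct this is just an ordinary comparison sort with extra bookkeeping.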

Project Euler #2 For Large Limits

I have a Haskell solution to Project Euler Problem 2 which works fine for the four million limit, as well as for limits up to 10^100000, taking only a few seconds on my machine.
But for anything bigger, e.g. 10^1000000, the computation does not return in good time, if at all (have tried leaving it for a couple of minutes). What is the limiting factor here?
import Data.List (foldl')

evenFibonacciSum :: Integer -> Integer
evenFibonacciSum limit =
  foldl' (\t (_,b) -> t + b) 0 . takeWhile ((<=limit) . snd) . iterate doIteration $ (1,2)
  where
    doIteration (a, b) = (twoAB - a, twoAB + b)
      where
        twoAB = 2*(a + b)
The problem is that you are summing the (even) Fibonacci numbers. That means you have to calculate them all. But
F(n) ≈ φ^n / √5, with φ = (1 + √5)/2
So you are adding a lot of numbers of large size, Θ(n) bits for F(n). For a limit of 10^1000000, you need about 800000×2 additions of numbers larger than 10^500000. In general, you need Θ(n) additions of numbers with Θ(n) bits.
Adding numbers of d digits [in whatever base] is an O(d) operation. So your algorithm is quadratic in the exponent.
To avoid that, find a closed formula for the sum S(k) of the first k even Fibonacci numbers (hint: it's a relatively easy formula involving one Fibonacci number), find the largest k so that F(3*k) <= limit, and compute the sum using the formula and the algorithm to compute F(n) in O(log n) steps e.g. here.
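The algorithm the answer points to is not preserved in this text. One well-known way to compute F(n) in O(log n) arithmetic steps is fast doubling, sketched here in Python under that assumption; the names fib_pair and fib are hypothetical. It relies on the identities F(2k) = F(k) * (2*F(k+1) - F(k)) and F(2k+1) = F(k)^2 + F(k+1)^2, with F(0) = 0 and F(1) = 1. The closed formula for the sum is deliberately left out, as in the answer.

```python
def fib_pair(n):
    """Return (F(n), F(n+1)) using fast doubling: O(log n) recursion steps."""
    if n == 0:
        return (0, 1)
    a, b = fib_pair(n // 2)      # a = F(k), b = F(k+1), with k = n // 2
    c = a * (2 * b - a)          # F(2k)
    d = a * a + b * b            # F(2k+1)
    if n % 2 == 0:
        return (c, d)
    return (d, c + d)            # F(2k+1), F(2k+2)


def fib(n):
    return fib_pair(n)[0]


print([fib(i) for i in range(11)])
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

Each step still multiplies very large integers, so the total cost is dominated by a few big multiplications rather than by millions of big additions.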
The problem here seems to be that you're using a formula for the even Fibonacci numbers that takes linear time to compute. If you double your limit, your computation time also doubles. There should be an algorithm that takes only logarithmic time (if you double the limit, the time changes by a constant value), but it's your job to find out. I'm not spoiling Euler answers here.
