Ranges in Haskell (GHCi)

I'm reading Learn You a Haskell for Great Good. Its examples [2,4..20] and [3,6..20] work fine, but I got three weird results:
Count by 17's from one to 171: [17, 1..171] yields the empty list.
Count by 17's from seventeen to 1711111: [17, 17..1711111] repeats the number 17 until I interrupt GHCi.
There is a weird difference between take 54 [171, 234..] and take 54 [171, 244..]:
ghci> take 54 [171, 234..]
[171,234,297,360,423,486,549,612,675,738,801,864,927,990,1053,1116,1179,1242,1305,1368,1431,1494,1557,1620,1683,1746,1809,1872,1935,1998,2061,2124,2187,2250,2313,2376,2439,2502,2565,2628,2691,2754,2817,2880,2943,3006,3069,3132,3195,3258,3321,3384,3447,3510]
ghci> take 54 [171, 244..]
[171,244,317,390,463,536,609,682,755,828,901,974,1047,1120,1193,1266,1339,1412,1485,1558,1631,1704,1777,1850,1923,1996,2069,2142,2215,2288,2361,2434,2507,2580,2653,2726,2799,2872,2945,3018,3091,3164,3237,3310,3383,3456,3529,3602,3675,3748,3821,3894,3967,4040]
Why?

You have the meaning of ranges slightly off. The Haskell range syntax is one of four things: [first..], [first,second..], [first..last], [first,second..last]. The examples from Learn You A Haskell are
ghci> [2,4..20]
[2,4,6,8,10,12,14,16,18,20]
ghci> [3,6..20]
[3,6,9,12,15,18]
Note that in the first case, the list counts by twos, and in the second case, the list counts by threes. That's because the difference between the first and second items is two and three, respectively. In your syntax, you're trying to write [first,step..last] to get the list [first,first+step,first+2*step,...,last]; however, the step size of a range like that is actually the difference between the first two numbers. Without a second element, the step size is always one; and without a final element, the list goes on forever (or until the maximum/minimum element of the type is reached).
Thus, let's look at your three examples:
[17,1..171] == []. Since you specify 17,1, Haskell sees that the first two elements of your list ought to be seventeen and one, so you must be counting by -16. In that case, Haskell wants to stop as soon as the elements are smaller than the last element; but they start out that way, and so no elements are produced. To count up by one, you want [17,18..171] (the first two elements of your list are 17 and 18), or simply [17..171].
[17, 17..1711111] == repeat 17. This one's fun. Since the first two elements of your list are both 17, Haskell determines that you must be counting up by zero, and it will happily keep counting up until the result exceeds 1711111. Of course, when counting by zero, this will never happen, and so you get an infinite list of seventeens. To count up by seventeen, you want [17,34..1711111], or [17,17+17..1711111] if you think that's clearer.
take 54 [171,234..] vs. take 54 [171,244..]. I'm not sure what behavior you were expecting here, but what they're each doing is the same as above: the first returns a list of fifty-four integers, starting at 171 and counting by 234 - 171 = 63; the second returns a list of fifty-four integers, starting at 171 and counting by 244 - 171 = 73. Each list goes on infinitely far (or at least until maxBound, if the lists are of finite Ints and not arbitrarily-large Integers), and so you just request the first fifty-four elements.
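For instance, the corrected "count by seventeens" ranges behave like this in GHCi (easy to re-check yourself):
ghci> [1,18..171]
[1,18,35,52,69,86,103,120,137,154,171]
ghci> take 5 [17,34..1711111]
[17,34,51,68,85]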
For some of the more nitty-gritty details on what range syntax means (it's translated into functions in the Enum type class), including slightly surprising behavior on ranges of floating-point numbers, hammar has a good answer to another question.

Well, the semantics of those operations are a little different than you might expect. The construct [a,b..c] is actually just syntactic sugar for enumFromThenTo a b c, which behaves roughly like this:
Calculate d = b - a. The output of [a,b..c] is [a, a+d, a+d+d, a+d+d+d, ...]. Elements are produced until they pass c: for d >= 0 the list stops before the first element greater than c, and for d < 0 it stops before the first element smaller than c. (So if d and c - a have different signs, the very first element has already passed c, and the output is empty.) For bounded types, the enumeration also ends when maxBound or minBound is reached. (Of course, this is implemented differently, since we are dealing with arbitrary instances of Enum here.)
So [1,3..10] becomes [1,3,5,7,9], and since 17 - 17 = 0, [17, 17..1711111] yields [17, 17+0, 17+0+0, ...]. And by the sign rule above, [17, 1..171] yields the empty list.
As for your addition: [x,y..] is implemented using the function enumFromThen x y, which behaves just like enumFromThenTo except that there is no boundary condition, so if your Enum type is unbounded, the resulting list is infinite.
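As a rough model of the Int case (not GHC's actual definition, which goes through the Enum class and also respects maxBound/minBound), the behaviour sketched above can be written as:
enumFromThenTo' :: Int -> Int -> Int -> [Int]
enumFromThenTo' a b c
  | d >= 0    = takeWhile (<= c) (iterate (+ d) a)
  | otherwise = takeWhile (>= c) (iterate (+ d) a)
  where d = b - a

-- enumFromThenTo' 1 3 10   == [1,3,5,7,9]
-- enumFromThenTo' 17 1 171 == []
-- enumFromThenTo' 17 17 1711111 never terminates: an endless stream of 17s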

I, too, was a bit surprised by this behavior, so I wrote a range function that feels more natural to me (and maybe to you as well):
range step start end = takeWhile (<=end) $ iterate (+step) start
Citing your examples:
Count by 17's from one to 171
is done by range 17 1 171, producing [1,18,35,52,69,86,103,120,137,154,171]
Count by 17's from seventeen to 1711111
is done by range 17 17 1711111, producing [17,34,51,68,85, ...

I was also confused by this tutorial: it uses the word step, which isn't explained and isn't what I would think of as a step. It then shows an example that can easily be misinterpreted, since [2,4..20] looks like it means a step of 2 starting at 4.
The clue is in the output though:
ghci> [2,4..20]
[2,4,6,8,10,12,14,16,18,20]
if you look carefully (which I didn't): start at 2, the next element being 4, with an implicit step thenceforth of 4 - 2 = 2; carry on outputting numbers in steps of 2 up to at most 20.
ghci> [1,6..20]
[1,6,11,16]
Note that 20 is not output, since 16 + 5 = 21 is greater than 20.

What is wrong with my beginner Brainfuck addition program?

I've been working on this programming challenge: http://www.codeabbey.com/index/task_view/summing-up
Which basically states:
Input data has two values A and B in the single line.
Output should have the sum A+B printed into it.
Additionally after the stop the program should have values A, B, A+B in the cells 0, 1 and 2 respectively.
So for example input would look like this:
9 26
Now, I think I must be misunderstanding either the problem or the solution, because I believe the output is supposed to be 9 26 35, where 9, 26, and 35 are each in their own cells.
My solution returns 9 26 35, and I believe in the correct cells (0, 1, and 2 respectively), but I am getting the answer wrong. Can anyone please look at the problem and my code and tell me where I am going wrong?
Code:
;:>;:><[-<+>]<:
I tried plugging this into a couple of online brainfuck interpreters. There is one here:
http://copy.sh/brainfuck/
and another here:
http://esoteric.sange.fi/brainfuck/impl/interp/i.html
In both cases, I needed to change your character set slightly: ':' becomes '.' and ';' becomes ','.
The output from both was
9 Y
Notice what actually happened: the program reads the characters '9' (ASCII 57) and ' ' (ASCII 32), echoes them, and then adds the two character codes, printing the byte 57 + 32 = 89, which displays as the letter Y. In other words, you are outputting a raw cell value and having it interpreted as a character.
I would try changing the program so that your output is literally the single digits of the answer: output a '3', then a '5', instead of outputting the raw value 35 (but leave the numerical value in cell 2 at the end). In other words, your text output should be a formatted version of the values in memory, rather than the numerical values dumped directly.
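To illustrate the distinction (sketched in Haskell rather than Brainfuck, purely for brevity; chr and ord are from Data.Char):
import Data.Char (chr, ord)

-- printing a raw cell value emits the character with that code...
cellAsChar :: Char
cellAsChar = chr 89                                  -- 'Y', not the text "89"

-- ...so to print the number 35 as text, emit one character per digit
thirtyFive :: String
thirtyFive = [chr (3 + ord '0'), chr (5 + ord '0')]  -- "35"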
It sounds like the output should only have A+B printed, not A, B, and A+B, as you're doing with :.
And your result seems like it'll have A+B in cell 0, and 0 in cell 1 (essentially the same as the example code).
>< is just cancelling itself out.

Binary search - worst/avg case

I'm finding it difficult to understand why/how the worst and average case for searching for a key in an array/list using binary search is O(log(n)).
log(1,000,000) is only 6. log(1,000,000,000) is only 9 - I get that, but I don't understand the explanation. If one did not test it, how do we know that the avg/worst case is actually log(n)?
I hope you guys understand what I'm trying to say. If not, please let me know and I'll try to explain it differently.
Worst case
Every time the binary search code makes a decision, it eliminates half of the remaining elements from consideration. So you're dividing the number of elements by 2 with each decision.
How many times can you divide by 2 before you are down to only a single element? If n is the starting number of elements and x is the number of times you divide by 2, we can write this as:
n / (2 * 2 * 2 * ... * 2) = 1 [the '2' is repeated x times]
or, equivalently,
n / 2^x = 1
or, equivalently,
n = 2^x
So log base 2 of n gives you x, which is the number of decisions being made.
Finally, you might ask, if I used log base 2, why is it also OK to write it as log base 10, as you have done? The base does not matter because the difference is only a constant factor which is "ignored" by Big O notation.
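If you'd rather convince yourself numerically than algebraically, a quick Haskell sketch (halvings is just a made-up helper) counts the divisions directly:
-- how many times n can be halved before reaching a single element
halvings :: Int -> Int
halvings n = length (takeWhile (> 1) (iterate (`div` 2) n))

-- halvings 1000000    == 19   (log2 of 1e6 is about 19.9)
-- halvings 1000000000 == 29   (log2 of 1e9 is about 29.9)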
Average case
I see that you also asked about the average case. Consider:
There is only one element in the array that can be found on the first try.
There are only two elements that can be found on the second try. (Because after the first try, we chose either the right half or the left half.)
There are only four elements that can be found on the third try.
You can see the pattern: 1, 2, 4, 8, ... , n/2. To express the same pattern going in the other direction:
Half the elements take the maximum number of decisions to find.
A quarter of the elements take one fewer decision to find.
etc.
Since half of the elements take the maximum amount of time, it doesn't matter how much less time the other elements take. We could assume that all elements take the maximum amount of time, and even if half of them actually take 0 time, our assumption would not be more than double whatever the true average is. We can ignore "double" since it is a constant factor. So the average case is the same as the worst case, as far as Big O notation is concerned.
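To make that concrete: for a full search tree of n = 2^k - 1 elements, assuming every element is equally likely to be the target, the exact average for a successful search can be computed. A small Haskell sketch (avgComparisons is my own name):
-- d comparisons find 2^(d-1) of the elements, for d = 1..k
avgComparisons :: Int -> Double
avgComparisons k =
  sum [fromIntegral (2 ^ (d - 1) * d) | d <- [1 .. k]] / fromIntegral (2 ^ k - 1)

-- avgComparisons 20 is about 19.0: within one comparison of the worst case, 20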
For binary search, the array should be arranged in ascending or descending order.
In each step, the algorithm compares the search key value with the key value of the middle element of the array.
If the keys match, then a matching element has been found and its index, or position, is returned.
Otherwise, if the search key is less than the middle element's key, then the algorithm repeats its action on the sub-array to the left of the middle element.
Or, if the search key is greater, then the algorithm repeats its action on the sub-array to the right.
If the remaining array to be searched is empty, then the key cannot be found in the array and a special "not found" indication is returned.
So a binary search is a dichotomic divide-and-conquer search algorithm: it takes logarithmic time to perform the search because the number of candidate elements is halved in each iteration.
For sorted lists on which we can do a binary search, each "decision" compares your key to the middle element: if greater, it takes the right half of the list; if less, it takes the left half; and if it's a match, it returns the element at that position. You effectively halve your list with every decision, yielding O(log n).
Binary search, however, only works for sorted lists. For unsorted lists you can do a straight search starting with the first element, yielding a complexity of O(n).
O(log n) < O(n)
That said, the best approach depends entirely on how many searches you'll be doing, your inputs, and so on.
For binary search the prerequisite is a sorted array as input. Since the list is sorted, we certainly don't have to check every word in the dictionary to look up a word. The basic strategy is to repeatedly halve our search range until we find the value.
For example, look for 5 in the sorted list of 9 numbers below:
v = 1 1 3 5 8 10 18 33 42
We first look in the middle: 8. Since 5 < 8, we know we can look at just the first half: 1 1 3 5. Looking at the middle number again, we narrow down to 3 5. Then we stop when we're down to one number: 5.
How many comparisons were needed? Four, which is ceil(log2 9), i.e. O(log2 n).
#include <vector>
using std::vector;

// Iterative binary search: returns the index of val in the sorted vector v, or -1.
int binary_search(const vector<int>& v, int val) {
    int from = 0;
    int to = v.size() - 1;
    while (from <= to) {
        int mid = (from + to) / 2;  // middle of the current range
        if (val == v[mid])
            return mid;             // found
        else if (val > v[mid])
            from = mid + 1;         // continue in the right half
        else
            to = mid - 1;           // continue in the left half
    }
    return -1;                      // not found
}

What is the fastest way to sort n strings of length n each?

I have n strings, each of length n. I wish to sort them in ascending order.
The best algorithm I can think of is O(n^2 log n), using quicksort (comparing two strings takes O(n) time). The challenge is to do it in O(n^2) time. How can I do it?
Also, radix sort methods are not permitted, as you do not know the number of letters in the alphabet beforehand.
Assume every letter is in 'a' to 'z'.
Since there is no requirement for in-place sorting, create an array of linked lists with length 26:
List[] sorted = new List[26]; // each element is a list, which you can append to
For a letter in a string, its sorted position is the ASCII difference x - 'a'.
For example, the position for 'c' is 2, so it is placed with
sorted[2].add('c')
That way, sorting one string takes only O(n), so sorting all the strings takes O(n^2).
For example, if you have "zdcbacdca":
z goes to sorted['z'-'a'].add('z'),
d goes to sorted['d'-'a'].add('d'),
...
After the sort, one possible result looks like
0 1 2 3 ... 25
a b c d ... z
a b c
c
Note: the assumed letter collection determines the length of the sorted array.
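A minimal Haskell sketch of that per-string bucket idea, assuming the alphabet is exactly 'a'..'z':
import Data.Array (accumArray, assocs)

-- counting sort of one string's characters: O(n) per string
countingSort :: String -> String
countingSort s = concat [replicate k c | (c, k) <- assocs counts]
  where counts = accumArray (+) 0 ('a', 'z') [(c, 1) | c <- s]

-- countingSort "zdcbacdca" == "aabcccddz"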
For small numbers of strings a regular comparison sort will probably be faster than a radix sort here, since radix sort takes time proportional to the number of bits required to store each character. For a 2-byte Unicode encoding, and making some (admittedly dubious) assumptions about equal constant factors, radix sort will only be faster if log2(n) > 16, i.e. when sorting more than about 65,000 strings.
One thing I haven't seen mentioned yet is the fact that a comparison sort of strings can be enhanced by exploiting known common prefixes.
Suppose our strings are S[0], S[1], ..., S[n-1]. Let's consider augmenting mergesort with a Longest Common Prefix (LCP) table. First, instead of moving entire strings around in memory, we will just manipulate lists of indices into a fixed table of strings.
Whenever we merge two sorted lists of string indices X[0], ..., X[k-1] and Y[0], ..., Y[k-1] to produce Z[0], ..., Z[2k-1], we will also be given 2 LCP tables (LCPX[0], ..., LCPX[k-1] for X and LCPY[0], ..., LCPY[k-1] for Y), and we need to produce LCPZ[0], ..., LCPZ[2k-1] too. LCPX[i] gives the length of the longest prefix of X[i] that is also a prefix of X[i-1], and similarly for LCPY and LCPZ.
The first comparison, between S[X[0]] and S[Y[0]], cannot use LCP information and we need a full O(n) character comparisons to determine the outcome. But after that, things speed up.
During this first comparison, between S[X[0]] and S[Y[0]], we can also compute the length of their LCP -- call that L. Set Z[0] to whichever of S[X[0]] and S[Y[0]] compared smaller, and set LCPZ[0] = 0. We will maintain in L the length of the LCP of the most recent comparison. We will also record in M the length of the LCP that the last "comparison loser" shares with the next string from its block: that is, if the most recent comparison, between two strings S[X[i]] and S[Y[j]], determined that S[X[i]] was smaller, then M = LCPX[i+1], otherwise M = LCPY[j+1].
The basic idea is: After the first string comparison in any merge step, every remaining string comparison between S[X[i]] and S[Y[j]] can start at the minimum of L and M, instead of at 0. That's because we know that S[X[i]] and S[Y[j]] must agree on at least this many characters at the start, so we don't need to bother comparing them. As larger and larger blocks of sorted strings are formed, adjacent strings in a block will tend to begin with longer common prefixes, and so these LCP values will become larger, eliminating more and more pointless character comparisons.
After each comparison between S[X[i]] and S[Y[j]], the string index of the "loser" is appended to Z as usual. Calculating the corresponding LCPZ value is easy: if the last 2 losers both came from X, take LCPX[i]; if they both came from Y, take LCPY[j]; and if they came from different blocks, take the previous value of L.
In fact, we can do even better. Suppose the last comparison found that S[X[i]] < S[Y[j]], so that X[i] was the string index most recently appended to Z. If M ( = LCPX[i+1]) > L, then we already know that S[X[i+1]] < S[Y[j]] without even doing any comparisons! That's because to get to our current state, we know that S[X[i]] and S[Y[j]] must have first differed at character position L, and it must have been that the character x in this position in S[X[i]] was less than the character y in this position in S[Y[j]], since we concluded that S[X[i]] < S[Y[j]] -- so if S[X[i+1]] shares at least the first L+1 characters with S[X[i]], it must also contain x at position L, and so it must also compare less than S[Y[j]]. (And of course the situation is symmetrical: if the last comparison found that S[Y[j]] < S[X[i]], just swap the names around.)
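The workhorse in that scheme is a comparison that starts at a known common-prefix offset and also reports the new LCP length. A rough Haskell sketch of just that primitive (compareFrom is my own name, not a library function):
-- compare two strings known to agree on their first k characters,
-- returning the ordering and the length of their longest common prefix
compareFrom :: Int -> String -> String -> (Ordering, Int)
compareFrom k s t = go k (drop k s) (drop k t)
  where
    go n (x:xs) (y:ys)
      | x == y    = go (n + 1) xs ys
      | otherwise = (compare x y, n)
    go n xs ys    = (compare (length xs) (length ys), n)  -- one string ran out

-- compareFrom 2 "abcd" "abce" == (LT, 3)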
I don't know whether this will improve the complexity from O(n^2 log n) to something better, but it ought to help.
You can build a trie, which will cost O(s*n). Details:
https://stackoverflow.com/a/13109908
Solving it for all cases should not be possible in better than O(N^2 log N).
However, if there are constraints that relax the string comparison, it can be optimised:
- If the strings have a high repetition rate and are drawn from a finite ordered set, you can use ideas from counting sort and use a map to store their counts; afterwards, sorting just the map keys suffices: O(N M log M), where M is the number of unique strings. You can even use a TreeMap directly for this purpose.
- If the strings are not random but are suffixes of some superstring, this can be done in O(N log^2 N): http://discuss.codechef.com/questions/21385/a-tutorial-on-suffix-arrays

CodeJam 2014: Solution for The Repeater

I participated in Code Jam and successfully solved the small input of The Repeater challenge, but I can't seem to figure out an approach for multiple strings.
Can anyone give the algorithm used for multiple strings? For 2 strings (the small input) I am comparing the strings character by character and doing operations to make them equal; however, this approach would time out for the large input.
Can someone explain the algorithm they used? I can see other users' solutions but can't figure out what they have done.
I can tell you my solution which worked fine for both small and large inputs.
First, we have to see if there is a solution; you do that by bringing all strings to their "simplest" form. If any of them does not match, then there is no solution.
e.g.
aaabbbc => abc
abbbbbcc => abc
abbcca => abca
If only the first two were given, then a solution would be possible. As soon as the third is thrown into the mix, it becomes impossible. The algorithm for the "simplification" is to scan the string and collapse every run of repeated characters into a single one. As soon as a string does not equal the simplified form of the batch, bail out.
As for the actual solution to the problem, I simply converted the strings to a [letter, repeat] format. So for example
qwerty => 1q,1w,1e,1r,1t,1y
qqqwweeertttyy => 3q,2w,3e,1r,3t,2y
(mind you, the outputs are internal structures, not actual strings)
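Both of those steps are one-liners in Haskell with Data.List.group, as a sketch (squeeze plays the role of the "simplification" above):
import Data.List (group)

-- the "simplest" form: collapse every run of repeated characters
squeeze :: String -> String
squeeze = map head . group   -- squeeze "aaabbbc" == "abc"

-- the [letter, repeat] representation
runLengths :: String -> [(Char, Int)]
runLengths s = [(head g, length g) | g <- group s]
-- runLengths "qqqwweeertttyy" == [('q',3),('w',2),('e',3),('r',1),('t',3),('y',2)]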
Imagine now you have 100 strings, you have already passed the test that there is a solution, and you have all strings in the [letter, repeat] representation. Now go through every letter and find the least "difference" of repetitions you have to apply to reach the same number. So for example
1a, 1a, 1a => 0 diff
1a, 2a, 2a => 1 diff
1a, 3a, 10a => 9 diff (to bring everything to 3)
The way to do this (I'm pretty sure there is a more efficient way) is to go from the min number to the max number and calculate the sum of all diffs; you are not guaranteed that the best target will be one of the numbers in the set. For the last example, you would calculate the diff to bring everything to 1 (0+2+9 = 11), then to 2 (1+1+8 = 10), then to 3 (2+0+7 = 9), and so on up to 10, and choose the min again. Strings are limited to 1000 characters so this is an easy calculation. On my moderate laptop, the results were instant.
Repeat the same for every letter of the strings and sum everything up and that is your solution.
This answer gives an example to explain why finding the median number of repeats produces the lowest cost.
Suppose we have values:
1 20 30 40 100
And we are trying to find the value which has shortest total distance to all these values.
We might guess the best answer is 50, with cost |50-1|+|50-20|+|50-30|+|50-40|+|50-100| = 159.
Split this into two sums, left and right, where left is the cost of all numbers to the left of our target, and right is the cost of all numbers to the right.
left = |50-1|+|50-20|+|50-30|+|50-40| = 50-1+50-20+50-30+50-40 = 109
right = |50-100| = 100-50 = 50
cost = left + right = 159
Now consider changing the value by x. Providing x is small enough such that the same numbers are on the left, then the values will change to:
left(x) = |50+x-1|+|50+x-20|+|50+x-30|+|50+x-40| = 109 + 4x
right(x) = |50+x-100| = 50 - x
cost(x) = left(x)+right(x) = 159+3x
So if we set x=-1 we will decrease our cost by 3, therefore the best answer is not 50.
The amount our cost changes when we move is given by the difference between the count of numbers to our left (4) and the count to our right (1).
Therefore, as long as these are different we can always decrease our cost by moving towards the median.
Therefore the median gives the lowest cost.
If there are an even number of points, such as 1,100 then all numbers between the two middle points will give identical costs, so any of these values can be chosen.
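Putting that observation into code: the minimum total adjustment for one letter's run lengths is just the summed distance to their median. A small Haskell sketch (minMoves is my own name):
import Data.List (sort)

-- minimum total cost to make all run lengths equal: distance to the median
minMoves :: [Int] -> Int
minMoves rs = sum [abs (r - m) | r <- rs]
  where m = sort rs !! (length rs `div` 2)

-- minMoves [1,3,10] == 9   (bring everything to 3)
-- minMoves [1,2,2]  == 1

For an even number of values this picks the upper middle element, which by the argument above is just as good as any other point between the two middles.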
Since Thanasis already explained the solution, I'm providing my source code in Ruby here. It's really short (only about 400 bytes) and follows his algorithm exactly.
def solve(strs)
  form = strs.first.squeeze
  strs.map { |str|
    return 'Fegla Won' if form != str.squeeze
    str.chars.chunk { |c| c }.map { |arr|
      arr.last.size
    }
  }.transpose.map { |row|
    Range.new(*row.minmax).map { |n|
      row.map { |r|
        (r - n).abs
      }.reduce :+
    }.min
  }.reduce :+
end

gets.to_i.times { |i|
  result = solve gets.to_i.times.map { gets.chomp }
  puts "Case ##{i+1}: #{result}"
}
It uses the squeeze method on strings, which collapses runs of duplicate characters. This way, you just compare every squeezed line to the reference (variable form). If there's an inconsistency, you just return 'Fegla Won'.
Next it uses the chunk method on the character array, which groups runs of consecutive equal characters. This way you can count them easily.

Convert text to numbers while preserving ordering?

I've got a strange requirement which I can't seem to get my head around. I need to come up with a function that takes a text string and returns a number corresponding to that string, in such a way that, when sorted, these numbers go in the same order as the original strings. For example, if the function produces this mapping:
"abcd" -> x
"abdef" -> y
"xyz" -> z
then the numbers must be such that x < y < z. The strings can be arbitrary length, but always non-empty and the string comparison should be case-insensitive (i.e. "ABC" and "abc" should result in the same numerical value).
My first thought was to map each letter to a corresponding number 1 through 26 and then just read off the resulting number: with a = 1, b = 2, c = 3, ..., z = 26, "abc" would become 1*26^2 + 2*26 + 3. However, I then realised that the text string can contain any text in any language (i.e. full Unicode), so this isn't going to work. At this point I'm stuck. Any other ideas before I tell the client to sod off?
P.S. This strange requirement is due to a limitation in a proprietary system that can only do sorting by a numeric field. If the sorting is required by any other field type, it must be converted to some numerical representation - and then sorted. Don't ask.
You can make this work if you allow for arbitrary-precision real numbers, though that kinda feels like cheating. Unicode strings are sequences of characters drawn from 1,114,112 options. You can therefore read a string as the fractional part of a base-1,114,113 number: write "0.", then write out your Unicode string, and you have a real number in base 1,114,113 (shift each character's numeric value up by one so that a missing character has the value 0). Comparing two of these numbers compares the strings lexicographically: scanning the digits from left to right, the first digit on which they disagree breaks the tie. This approach is completely infeasible unless you have an arbitrary-precision real number library.
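For concreteness, here is that encoding sketched with Haskell's arbitrary-precision Rational (toNumber is my own name; the case-insensitivity requirement would additionally need a lowercasing pass, ignoring full Unicode case-folding subtleties):
import Data.Char (ord)

-- map a string to a Rational in [0, 1) whose numeric order matches the
-- lexicographic order of the strings; each code point is shifted up by
-- one so that "no character" is the smallest digit
toNumber :: String -> Rational
toNumber = foldr (\c acc -> (fromIntegral (ord c + 1) + acc) / base) 0
  where base = 1114113

-- toNumber "abcd" < toNumber "abdef" && toNumber "abdef" < toNumber "xyz"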
If you just have IEEE-754 doubles, this approach won't work. One way to see this is that there are at most 2^64 possible doubles (or 2^80 if you allow for long doubles), because there are only 64 (80) bits in a double, but there are infinitely many different strings. That eliminates the possibility simply because there are too many strings to go around.
Unfortunately, you can't make this work if you have arbitrary-precision integers. The natural ordering on strings has the fun property that you can find pairs of strings that have infinitely many strings lexicographically between them. For example, notice that
a < ab < aab < aaab < aaaab < ... < b
Now imagine that you have a function that maps each string to an integer that obeys the rules you'd like. That would mean that
f(a) < f(ab) < f(aab) < f(aaab) < f(aaaab) < ... < f(b)
But that's not possible in the integers - you can't have two integers f(a) and f(b) with infinitely many integers between them. (The number of integers between f(a) and f(b) is at most f(b) - f(a) - 1).
So it seems like the answer is "this is possible if you have arbitrary-precision real numbers, it's not possible with doubles, and it's not possible with arbitrary-precision integers." I'd basically label that "not going to happen in practice" even though it's theoretically possible. :-)
