Why is [a..b] an empty list when a > b? - haskell

If I enter [5..1] into the Haskell console it returns [], whereas I expected [5, 4, 3, 2, 1].
In general, [a..b] = [] if a > b. Why?

The Report covers the details. In Section 3.10:
Arithmetic sequences satisfy these identities:
[ e1..e3 ] = enumFromTo e1 e3
In Section 6.3.4:
For the types Int and Integer, the enumeration functions have the following meaning:
The sequence enumFromTo e1 e3 is the list [e1,e1 + 1,e1 + 2,…e3]. The list is empty if e1 > e3.
For Float and Double, the semantics of the enumFrom family is given by the rules for Int above, except that the list terminates when the elements become greater than e3 + i∕2 for positive increment i, or when they become less than e3 + i∕2 for negative i.
Then the next question is, "Why was the Report specified that way?". There I think the answer is that this choice is quite natural for a mathematician, which most of the original committee were to some extent. It also has a number of nice properties:
If [x..y] has n values, then [x..y-1] and [x+1..y] have n-1 values (where in n-1, the subtraction saturates at 0, an ahem natural choice).
Checking whether a particular element is in the range [x..y] only requires checking that it is at least x and at most y -- you need not first determine which of x or y is bigger.
It prevents a certain class of surprising off-by-one errors: if you want to take the next n>=0 elements after x, you can write [x..x+n-1]. If you choose the other rule, where [x..y] might mean [y,y+1,...,x] if y is smaller, there is no way to create an empty list with [_.._] syntax, so no uniform way to take the next n elements. One would have to write the more cumbersome if n>0 then [x..x+n-1] else []; and it would be very easy to forget to write this check.
If you would like the list [5,4,3,2,1], that may be achieved by specifying an explicit second step, as in [5,4..1].
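The behaviour described above can be checked with a few definitions. This is only a small sketch; the names descendingAttempt, descending and nextFrom are made up for illustration and do not come from the Report.

```haskell
-- Counting upwards from 5 never reaches 1, so the list is empty.
descendingAttempt :: [Int]
descendingAttempt = [5 .. 1]

-- Giving the second element explicitly fixes the step at -1.
descending :: [Int]
descending = [5, 4 .. 1]

-- The n elements starting at x. For n == 0 this is [x .. x - 1],
-- which is empty, so no special case is needed.
nextFrom :: Int -> Int -> [Int]
nextFrom x n = [x .. x + n - 1]
```

In GHCi, descendingAttempt evaluates to [] and descending to [5,4,3,2,1], while nextFrom 7 0 is empty and nextFrom 7 3 is [7,8,9].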

Related

Periodicity (Fibonacci mod sequence) in infinites list Haskell

I need to create a function in Haskell, which works as follows
periodicity :: [Integer] -> [Integer]
periodicity [1,2,3,3,4,1,2,3,3,4...] = [1,2,3,4]
periodicity [0,1,2,2,5,4,3,3,0,1,2,5,4...] = [0,1,2,5,4,3]
That is to say, that from a list you get the part that is always repeated, what in Mathematical Sciences would be called period of a function.
I've tried this, but it doesn't work the way I want, because I need it to work with infinite lists:
periodicity :: Eq a => [a] -> [a]
periodicity xs = take n xs
  where l = length xs
        n = head [m | m <- divisors l,
                      concat (replicate (l `div` m) (take m xs)) == xs]
I have found this function, which gives me the length of the period. With it I could solve the problem, but I don't understand the code after where:
periodo 1 = 1
periodo n = f 1 ps 0
  where
    f 0 (1 : xs) pi = pi
    f _ (x : xs) pi = f x xs (pi + 1)
    ps = 1 : 1 : zipWith (\u v -> (u + v) `mod` n) (tail ps) ps
The function you want, as you have stated it, is impossible [1].
But since you said that what you are really after is the Pisano period, it's enough to notice that two successive numbers are enough to determine the remainder of a Fibonacci sequence (mod n or otherwise). So you are really looking for the first recurrence of an adjacent pair, e.g.
0, 1, 1, 2, 0, 2, 2, 1, 0, 1, 1, 2, 0, 2, 2, 1, 0, 1, 1, 2, 0, 2, 2, 1, 0
^^^^                    ^^^^
[--------- 8 -----------)
I am not much for coding people's problems for them, but I can sketch the way I would solve this. One thing to keep in mind is that the periodicity might have a prefix that does not repeat -- I don't know whether this actually occurs in Fibonacci sequences mod n, but it occurs in general. So we need to be prepared to throw away a prefix.
First, zip the list with its tail to get a list of adjacent pairs
[ 0, 1, 1, 2, 0, 2, 2, 1 ...]
-> [(0,1), (1,1), (1,2), (2,0), (0,2), (2,2), (2,1), ... ]
From this, fold through the list building a Data.Map keyed on this pair, where the value is the index it first occurred. You could do this with foldr but I'd probably just use a recursive function with an accumulator. For the above example the map at each step would look like:
{(0,1): 0}
{(0,1): 0, (1,1): 1}
{(0,1): 0, (1,1): 1, (1,2): 2}
{(0,1): 0, (1,1): 1, (1,2): 2, (2,0): 3}
...
When you reach a point in the list where the key is already present, subtract the index stored in the map from the current index, and there's your period.
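The sketch described above could look roughly like this. It is one possible reading of the idea, not the answerer's own code: the names fibsMod, periodLength and pisano are invented here, and it assumes the containers package (which ships with GHC) for Data.Map. It is only meaningful for sequences in which an adjacent pair determines everything that follows.

```haskell
import qualified Data.Map.Strict as Map

-- Fibonacci numbers modulo n, as an infinite list (hypothetical helper).
fibsMod :: Integer -> [Integer]
fibsMod n = ps
  where
    ps = 0 : (1 `mod` n) : zipWith (\u v -> (u + v) `mod` n) (drop 1 ps) ps

-- Distance between the first two occurrences of the same adjacent pair.
-- Does not terminate on an infinite list in which no pair ever repeats.
periodLength :: Ord a => [a] -> Int
periodLength xs = go Map.empty (zip [0 ..] (zip xs (drop 1 xs)))
  where
    go seen ((i, pair) : rest) =
      case Map.lookup pair seen of
        Just j  -> i - j                              -- pair seen before at index j
        Nothing -> go (Map.insert pair i seen) rest   -- remember first occurrence
    go _ [] = error "periodLength: no adjacent pair repeated"

-- Period of the Fibonacci sequence mod n.
pisano :: Integer -> Int
pisano = periodLength . fibsMod
```

For the example above, pisano 3 gives 8. Because the map stores the index of the first occurrence, a non-repeating prefix is skipped automatically.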
[1] Here's a proof. Let's say you have the specification for a Turing machine, and you make a list steps of the steps of its execution. This list will be finite if it halts, infinite otherwise. Now construct this list:
bad = zipWith const (cycle [1,2,3]) steps ++ cycle [1,2,3,4]
This list cycles with period 3 as long as the machine runs, and with period 4 afterward. So if the Turing machine halts, periodicity bad = 4, otherwise periodicity bad = 3. That is, periodicity can decide the halting problem, which is impossible.
What you are asking for is impossible for an arbitrary infinite list. We can only examine a finite sublist in finite time, and the next element of the list might, for all we know, break the pattern.
In your comments, you clarify that you really are looking for a periodic part of the Fibonacci sequence, modulo m. In that special case, it is possible, if I understand you correctly.
The Fibonacci sequence (mod m) is periodic after a certain point if the same value repeats three times in a row: the previous two values are then both equal to their predecessors, so the sequence becomes periodic with a period of 1. It is also periodic after a certain point if any sequence of two or more numbers repeats even once, as then we know that the current value and its predecessor are repeats of the values k terms before each of them, and the recurrence will generate the same subsequence again with period k. There is no shorter period, or we would have detected it, going left to right.
Furthermore, any sequence that repeats infinitely will repeat once first, so this detects all such sequences.
Therefore, a better way to calculate this than I originally wrote would be to search for the current number and its predecessor earlier in the list. (You can use luqui’s strategy of building a list of consecutive pairs, or search the same data structure recursively instead of building a new one.) If a match exists, the sequence is guaranteed to repeat with a period equal to the distance between the two appearances of the same pair.
That takes time quadratic in the length of the non-periodic initial subsequence, since you search each initial subsequence from the beginning. It can be done in linear time, with an upper bound of m²+2 steps. There are only m possible values, meaning only m² possible pairs of values, and a sequence of k numbers contains k-1 consecutive pairs of numbers. By the pigeonhole principle, the first m²+2 elements of the sequence must therefore contain some pair of consecutive values in two different places, and the sequence becomes periodic from the first instance of that pair onward. So searching that fixed-length initial subsequence suffices, and we can build a table of the index (if any) of each of the m² potential pairs in the list until we encounter the first duplicate. (That said, we would need to use a mutable array, so we sacrifice either speed or functional purity.)
This is similar to luqui’s algorithm, but with a faster lookup.
Conjecture
The sequence is periodic iff 0:1 appears more than once. If every Fibonacci sequence (mod m) is periodic, then the period is simply the position of the second occurrence of [0,1].
0:1 would be generated only by a preceding -1:1, which would be generated by a preceding -3:2, which would be generated by a preceding -8:5, and so on. [...,-8,5,-3,2,-1,1,0] is exactly the fibonacci sequence, backwards, with alternating sign, mod m, and if any two consecutive numbers appear in the original sequence, it is periodic. Thus, iff [0,1,1] would ever be generated by this pattern, it will eventually generate 0:1 in the Fibonacci sequence mod m. This occurs iff m-1 and 1 occur consecutively in Fibo mod m, in either order.
Two Special Cases
If Fibo mod m contains m-1:1 at position i, the sequence has period i+2, and if it contains 1:m-1, the sequence has period 2i+4. (If the sequence contains 1:-1, the next position is i+2 and the next i+2 steps are: {0,-1,-1,-2,-3,-5,...,-1,1}). So this lets us shortcut a bit; when we see 1,4 at position 8 of Fibo mod 5, we know the sequence has a period of 20. In this special case, the scan needs fewer than half the elements on average, has an upper bound of m²/2+1 elements to scan in order to rule the case out, and uses constant memory.

Selecting parameters for string hashing

I was recently reading an article on string hashing. We can hash a string by converting a string into a polynomial.
H(s1s2s3 ...sn) = (s1 + s2*p + s3*(p^2) + ··· + sn*(p^(n−1))) mod M.
What are the constraints on p and M so that the probability of collision decreases?
A good requirement for a hash function on strings is that it should be difficult to find a
pair of different strings, preferably of the same length n, that have equal fingerprints. This
excludes the choice of M < n. Indeed, in this case at some point the powers of p corresponding
to respective symbols of the string start to repeat.
Similarly, if gcd(M, p) > 1 then powers of p modulo M may repeat for
exponents smaller than n. The safest choice is to set p as one of
the generators of the group U(Z_M) – the group of all integers
relatively prime to M under multiplication modulo M.
I am not able to understand the above constraints. How selecting M < n and gcd(M,p) > 1 increases collision? Can somebody explain these two with some examples? I just need a basic understanding of these.
In addition, if anyone can focus on upper and lower bounds of M, it will be more than enough.
The above facts have been taken from the following article: string hashing (MIT).
The "correct" answers to these questions involve some amount of number theory, but it can often be instructive to look at some extreme cases to see why the constraints might be useful.
For example, let's look at why we want M ≥ n. As an extreme case, let's pick M = 2 and n = 4. Then look at the numbers p^0 mod 2, p^1 mod 2, p^2 mod 2, and p^3 mod 2. Because there are four numbers here and only two possible remainders, by the pigeonhole principle we know that at least two of these numbers must be equal. Let's assume, for simplicity, that p^0 and p^1 are the same. This means that the hash function will return the same hash code for any two strings whose first two characters have been swapped, since those characters are multiplied by the same amount, which isn't a desirable property of a hash function. More generally, the reason why we want M ≥ n is so that the values p^0, p^1, ..., p^(n-1) at least have the possibility of being distinct. If M < n, there will just be too many powers of p for them to all be unique.
Now, let's think about why we want gcd(M, p) = 1. As an extreme case, suppose we pick p such that gcd(M, p) = M (that is, we pick p = M). Then
s_0*p^0 + s_1*p^1 + s_2*p^2 + ... + s_{n-1}*p^(n-1) (mod M)
= s_0*M^0 + s_1*M^1 + s_2*M^2 + ... + s_{n-1}*M^(n-1) (mod M)
= s_0 (mod M)
Oops, that's no good - that makes our hash code exactly equal to the first character of the string. This means that if p isn't coprime with M (that is, if gcd(M, p) ≠ 1), you run the risk of certain characters being "modded out" of the hash code, increasing the collision probability.
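Both extreme cases are easy to reproduce. The following is a minimal sketch of the polynomial hash from the question; the name polyHash and the choice of character codes via ord are assumptions made for the example, not part of the quoted article.

```haskell
import Data.Char (ord)

-- H(s) = (s_0 + s_1*p + ... + s_{n-1}*p^(n-1)) mod m,
-- taking each character's code point as its value.
polyHash :: Integer -> Integer -> String -> Integer
polyHash p m s = sum (zipWith (*) codes powers) `mod` m
  where
    codes  = map (toInteger . ord) s
    powers = iterate (\x -> (x * p) `mod` m) 1   -- p^0, p^1, p^2, ... (mod m)
```

With p = m = 101, every power of p after the first is 0 mod m, so polyHash 101 101 "abc" and polyHash 101 101 "axyz" are both just the code of 'a' reduced mod 101. With m = 2 and an odd p, every power of p is 1 mod 2, so polyHash 3 2 "ab" equals polyHash 3 2 "ba".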
How selecting M < n and gcd(M,p) > 1 increases collision?
In your hash function formula, M might reasonably be used to restrict the hash result to a specific bit-width: e.g. M=2^16 for a 16-bit hash, M=2^32 for a 32-bit hash, M=2^64 for a 64-bit hash. Usually, a mod/% operation is not actually needed in an implementation, as using the desired size of unsigned integer for the hash calculation inherently performs that function.
I don't recommend it, but sometimes you do see people describing hash functions that are so exclusively coupled to the size of a specific hash table that they mod the results directly to the table size.
The text you quote from says:
A good requirement for a hash function on strings is that it should be difficult to find a pair of different strings, preferably of the same length n, that have equal fingerprints. This excludes the choice of M < n.
This seems a little silly in three separate regards. Firstly, it implies that hashing a long passage of text requires a massively long hash value, when practically it's the number of distinct passages of text you need to hash that's best considered when selecting M.
More specifically, if you have V distinct values to hash with a good general purpose hash function, you'll get dramatically fewer collisions of the hash values if your hash function produces at least V^2 distinct hash values. For example, if you are hashing 1000 values (~2^10), you want M to be at least 1 million (i.e. at least 2*10 = 20-bit hash values, which is fine to round up to 32-bit but ideally don't settle for 16-bit). Read up on the Birthday Problem for related insights.
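To put rough numbers on this, here is a small sketch. It assumes an idealised hash that spreads V distinct inputs uniformly and independently over M values, in which case the expected number of colliding pairs is V(V-1)/(2M). The function name is made up for the example.

```haskell
-- Expected number of colliding pairs among v distinct inputs,
-- assuming an idealised hash that is uniform over m values.
expectedCollidingPairs :: Integer -> Integer -> Double
expectedCollidingPairs v m = fromInteger (v * (v - 1)) / (2 * fromInteger m)
```

For 1000 inputs this gives between 7 and 8 colliding pairs with a 16-bit hash, but fewer than one with a 20-bit hash, which is the V^2 rule of thumb in action.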
Secondly, given n is the number of characters, the number of potential values (i.e. distinct inputs) is the number of distinct values any specific character can take, raised to the power n. The former is likely somewhere from 26 to 256 values, depending on whether the hash supports only letters, or say alphanumeric input, or standard- vs. extended-ASCII and control characters etc., or even more for Unicode. The way "excludes the choice of M < n" implies any relevant linear relationship between M and n is bogus; if anything, it's as M drops below the number of distinct potential input values that it increasingly promotes collisions, but again it's the actual number of distinct inputs that tends to matter much, much more.
Thirdly, "preferably of the same length n" - why's that important? As far as I can see, it's not.
I've nothing to add to templatetypedef's discussion on gcd.

Taxicab Numbers in Haskell

A taxicab number is defined as a positive integer that can be expressed as a sum of two cubes in at least two different ways.
1729=1^3+12^3=9^3+10^3
I wrote this code to produce a taxicab number which on running would give the nth smallest taxicab number:
taxicab :: Int -> Int
taxicab n = [(cube a + cube b)
            | a <- [1..100],
              b <- [(a+1)..100],
              c <- [(a+1)..100],
              d <- [(c+1)..100],
              (cube a + cube b) == (cube c + cube d)]!!(n-1)
cube x = x * x * x
But the output I get is not what I expected. For the numbers one to three the code produces correct output, but taxicab 4 produces 39312 instead of 20683. Another strange thing is that 39312 is actually the 6th smallest taxicab number, not the fourth!
So why is this happening? Where is the flaw in my code?
I think you mistakenly believe that your list contains the taxicab numbers in an increasing order. This is the actual content of your list:
[1729,4104,13832,39312,704977,46683,216027,32832,110656,314496,
216125,439101,110808,373464,593047,149389,262656,885248,40033,
195841,20683,513000,805688,65728,134379,886464,515375,64232,171288,
443889,320264,165464,920673,842751,525824,955016,994688,327763,
558441,513856,984067,402597,1016496,1009736,684019]
Recall that a list comprehension such as [(a,b) | a<-[1..100],b<-[1..100]] will generate its pairs as follows:
[(1,1),...,(1,100),(2,1),...,(2,100),...,...,(100,100)]
Note that when a gets to its next value, b is restarted from 1. In your code, suppose you just found a taxicab number of the form a^3+b^3, and then no larger b gives you a taxicab. In that case the next value of a is tried. We might find a taxicab of the form (a+1)^3+b'^3, but there is no guarantee that this number will be larger, since b' is any number in [a+2..100], and can be smaller than b. This can also happen with larger values of a: when a increases, there's no guarantee its related taxicabs are larger than what we found before.
Also note that, for the same reason, a hypothetical taxicab of the form 101^3+b^3 could be smaller than the taxicabs you have on your list, but it does not occur there.
Finally, note that your function is quite inefficient, since every time you call taxicab n you recompute all the first n taxicab values.
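One way to address both the ordering and the completeness problem is to sort all the sums and keep those that occur at least twice. This is a sketch of a different technique, not a repair of the comprehension above; taxicabsUpTo is a hypothetical name. Only sums no larger than limit^3 are reported, because for those every representation has both cubes within the search range.

```haskell
import Data.List (group, sort)

cube :: Int -> Int
cube x = x * x * x

-- Taxicab numbers not exceeding limit^3, in increasing order.
taxicabsUpTo :: Int -> [Int]
taxicabsUpTo limit =
  [ s | (s : _ : _) <- group (sort sums)   -- keep sums that occur at least twice
      , s <= cube limit ]                  -- beyond this, representations may be missed
  where
    sums = [cube a + cube b | a <- [1 .. limit], b <- [a + 1 .. limit]]
```

The list is computed once and can be indexed as needed, for example taxicabsUpTo 100 !! 3 for the fourth taxicab number, 20683.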

Return smallest even number from 3 arguments or largest uneven number if there are no even numbers

I have a semi-voluntary Haskell homework here and need some help on how to solve it.
The task:
Write a Haskell function
evenmin a b c
that returns the smallest even number from the arguments or the largest uneven one if there is no even number in the arguments.
I know that I can do that with many guards, but I am sure that there is a much nicer way! Please don't write out the solution, but nudge me in the right direction if you can. Thanks!
Hint: Instead of 3 arguments, suppose your input is a non-empty list of integers, i.e.
evenmin' :: [Int] -> Int
Suppose further you have a function phi that partitions the input like so:
phi [1, 2, 3, 4, 5, 6] == ([1,3,5],[2,4,6])
What would the definition of evenmin' be? Afterwards, define evenmin a b c = evenmin' [a, b, c].
Order integers in this way:
even integers are ordered by <=
odd integers are ordered by >=
even integers are always smaller than odd ones.
Define myCompare :: Int -> Int -> Ordering.
Realize you want the minimum according to the above ordering.
How to compute the minimum of two objects w.r.t. a generic ordering?
How to extend that to three objects?
Bonus: how to extend that to lists?
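The last three questions do not depend on the particular ordering, so they can be illustrated without giving away myCompare. The sketch below uses an unrelated ordering (distance from zero) purely as a stand-in; minBy, min3By and closestToZero are names invented for this example.

```haskell
import Data.Ord (comparing)

-- Minimum of two values with respect to an arbitrary comparison function.
minBy :: (a -> a -> Ordering) -> a -> a -> a
minBy cmp x y = case cmp x y of
  GT -> y
  _  -> x

-- Three values: take the minimum of the first and the minimum of the other two.
min3By :: (a -> a -> Ordering) -> a -> a -> a -> a
min3By cmp x y z = minBy cmp x (minBy cmp y z)

-- Non-empty lists: fold the two-argument version over the list.
closestToZero :: [Int] -> Int
closestToZero = foldr1 (minBy (comparing abs))
```

For instance, min3By (comparing abs) (-7) 3 (-2) picks -2. Data.List also provides minimumBy, which plays the role of the list version for any comparison function.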

Generating triangular number using iteration in haskell

I am trying to write a function in Haskell to generate triangular numbers. I am not allowed to use recursion; I am supposed to use iteration.
Here is my code:
triSeries 0 = [0]
triSeries n = take n $ iterate (\x -> (0+x)) 1
I know that the function I pass to iterate is wrong,
but I have spent hours looking for the right one. Any hints, please?
Start by writing out some triangular numbers
T(1) = 1
T(2) = 1 + 2
T(3) = 1 + 2 + 3
An iterative process to generate T(n) is to start from [1..n], take the first element of the list, and add it to a running total. In a language with mutable state, you might write:
def tri(n):
    total = 0
    for x in range(1, n + 1):
        total += x
    return total
In Haskell, you can iteratively consume a list of numbers and accumulate state via a fold function (foldl, foldr, or some variant). Hopefully that's enough to get started with.
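As a minimal sketch of that idea, the loop above becomes a left fold whose accumulator plays the role of the running total. The name tri mirrors the loop version; using the strict foldl' from Data.List is a choice made here, not something the answer prescribes.

```haskell
import Data.List (foldl')

-- Sum 1..n by folding (+) over the list, starting from 0.
tri :: Int -> Int
tri n = foldl' (+) 0 [1 .. n]
```

Here tri 4 is 1 + 2 + 3 + 4 = 10, and tri 0 folds over an empty list and returns the initial value 0.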
Wikipedia may be a hint here: it gives a closed form for triangular numbers, from which something like
triangular :: Int -> Int
triangular x = x * (x + 1) `div` 2
can be derived.
triSeries could be something like
triSeries :: Int -> [Int]
triSeries x = map triangular [1..x]
and works like this
> triSeries 10
[1,3,6,10,15,21,28,36,45,55]
As for iterate: maybe there is some way to use it here, but as John said, foldl would be sufficient. Take a look at this page; what you are looking for is at the very beginning.
It is not clear what is meant by "recursion is not allowed, use iteration". All functions that appear to be "iterative" are recursive inside.
iterate, as you have used it, can only modify its input by a constant, and iterate (+1) 1 is the same as [1..]. Consider using a Data.List function that can combine a number from the infinite range [1..] with the previously computed sum to produce an infinite list of such sums:
T_i = i + T_{i-1}
This is definitely cheaper than computing x*(x+1) `div` 2 for every element.
Consider using a Data.List function that can produce an infinite list of finite lists of sums from an infinite list of sums. This is going to be cheaper than computing a list of 10, then a list of 11 that repeats the computation done for the list of 10, and so on.
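One possible reading of these two hints, offered as a sketch rather than as the answerer's intended solution, uses scanl1 for the running sums and inits for the growing prefixes. The names triangulars and triPrefixes are invented here.

```haskell
import Data.List (inits)

-- Running sums of 1, 2, 3, ...: each element is T_i = i + T_{i-1}.
triangulars :: [Int]
triangulars = scanl1 (+) [1 ..]

-- The first n triangular numbers.
triSeries :: Int -> [Int]
triSeries n = take n triangulars

-- Every non-empty finite prefix of the infinite list, shortest first.
triPrefixes :: [[Int]]
triPrefixes = drop 1 (inits triangulars)
```

triSeries 10 reproduces the list [1,3,6,10,15,21,28,36,45,55] shown earlier, and all the prefixes are built from the same lazily evaluated list of sums.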

Resources