Haskell List Comprehension, delete integers from List of numbers - haskell

I want to implement a function using a list comprehension.
It should delete the integers from a list of numbers.
I have a question about it.
delete xs = [ys|ys<-xs, ys /=fromInteger (round ys) ]
xxx.hs> delete [1,2,3.0,4.5,6.7]
[4.5,6.7]
Does that mean 3.0 is counted as an integer instead of a float?
And another question:
delete xs = [ys|ys<-xs, ys ==fromInteger (round ys) ]
This time I want it to return the integers from a list of numbers.
xxx.hs> delete [1,2,3.0,4.5,6.7]
[1.0,2.0,3.0]
Since I did not give the numbers 1 and 2 in decimal form, why does it return the numbers in decimal?
Thanks for helping me.

I want to implement a function in list comprehension. It should delete the integers in a list of numbers.
In a list, all elements have the same type. So in a list [1.2, 3, 4.5], the 3 is also a value of a type that is a member of the Fractional typeclass.
Since I did not give the numbers 1 and 2 in decimal form, why does it return the numbers in decimal?
Because all the elements are of the same type. GHC will default to Double by the type defaulting rules.
Your filter does not specify that elements should be of an Integral type. That would also be nonsensical, since types are resolved at compile time, not at runtime. It simply checks that ys is unchanged by fromInteger (round ys). Since round 3.0 == 3, and fromInteger 3 == 3.0 in this case, the second version filters out the elements with a fractional part. 1.0 and 2.0 have no fractional part, so they are kept.
The filtering is, however, not safe: for larger numbers the mantissa cannot represent every value exactly, so a number that was written with a fractional part may be stored as a whole number and then be retained or filtered out as if it were integral. For more information, see "Is floating point math broken?".
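To see the precision limit concretely, here is a small sketch. It is written in Python rather than Haskell, on the assumption that Python's float is the same IEEE 754 double that GHC uses for Double. The helper name looks_integral is made up for illustration; it mirrors the ys == fromInteger (round ys) test.

```python
def looks_integral(y: float) -> bool:
    """Mirror of the Haskell test `ys == fromInteger (round ys)`."""
    return y == float(round(y))


# Small values behave as expected: only 4.5 and 6.7 have a fractional part.
small = [y for y in [1, 2, 3.0, 4.5, 6.7] if not looks_integral(y)]

# Above 2**53 neighbouring doubles are 2 apart, so the literal below cannot
# keep its ".5"; it is stored as the whole number 9007199254740994.0.
big = 9007199254740993.5
big_is_treated_as_integer = looks_integral(big)
```

So a value that was written with a fractional part can pass the "is an integer" test once it is large enough.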

Related

Optimization of Python comprehension expression

I was trying to get the frequency of the max value in an integer list (intlist):
intlist.count(max(intlist))
This works and is fast as well.
I wanted to implement the max method with a comprehension:
[x if x>y else y for x in intlist for y in intlist if x!=y][-1]
The latter turns out to be very slow.
Can anyone point out what the issue is here?
Testing with
intlist=np.array([1, 2, 3,3,-1])
In this case the expected value is 2, as 3 is the max value and it occurs 2 times.
The list comprehension does not calculate the maximum value in the first place. Its last element is only the maximum of two values near the end of intlist: the last item and the last item that differs from it. So unless the last two items in the list are the same, it will calculate the maximum of the last two values.
Furthermore it is not very efficient, since it runs in O(n^2) time and uses O(n^2) memory. For huge lists, this would thus require gigantic amounts of memory.
Usually it is not a good idea to use a list comprehension if you do not need a list in the first place. You can calculate a maximum with a for loop, where you each time compare an item with the maximum obtained thus far:
def other_max(listlike):
    mmax = listlike[0]
    for x in listlike:
        if x > mmax:
            mmax = x
    return mmax
or with numpy we can sum up the array of booleans:
>>> (intlist == intlist.max()).sum()
2
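If only the frequency of the maximum is needed, the maximum and its count can be tracked together in one pass, without building any intermediate list. The following is a sketch; the name count_of_max is made up for illustration, and it assumes a non-empty input.

```python
def count_of_max(listlike):
    """Return how many times the largest item occurs, using a single pass."""
    items = iter(listlike)
    best = next(items)          # assumes at least one item
    count = 1
    for x in items:
        if x > best:            # new maximum: restart the count
            best, count = x, 1
        elif x == best:         # another occurrence of the current maximum
            count += 1
    return count
```

This runs in O(n) time and O(1) extra memory, like intlist.count(max(intlist)), but it reads the data once instead of twice.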

What is the fastest way to sort n strings of length n each?

I have n strings, each of length n. I wish to sort them in ascending order.
The best algorithm I can think of is n^2 log n, which is quick sort. (Comparing two strings takes O(n) time). The challenge is to do it in O(n^2) time. How can I do it?
Also, radix sort methods are not permitted as you do not know the number of letters in the alphabet beforehand.
Assume any letter is a to z.
Since there is no requirement for in-place sorting, create an array of linked lists with length 26:
List[] sorted= new List[26]; // here each element is a list, where you can append
For a letter x in the string, its sorted position is the difference of ASCII codes: x-'a'.
For example, the position for 'c' is 2, so it is put into the array as
sorted[2].add('c')
That way, sorting the letters of one string takes only O(n) time.
So doing this for all n strings takes O(n^2).
For example, if you have "zdcbacdca".
z goes to sorted['z'-'a'].add('z'),
d goes to sorted['d'-'a'].add('d'),
....
After sorting, one possible result looks like
0 1 2 3 ... 25
a b c d ... z
a b c
    c
Note: the assumption of letter collection decides the length of sorted array.
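The bucket idea above can be sketched as follows. This version keeps a count per letter instead of a linked list per letter, which is an assumption of the sketch and not part of the answer; the name sort_letters is made up. Note that it orders the letters inside one string; it does not by itself order the n strings relative to each other.

```python
def sort_letters(s: str) -> str:
    """Sort the letters of s (assumed to be lowercase a-z) using 26 buckets."""
    buckets = [0] * 26
    for ch in s:
        buckets[ord(ch) - ord("a")] += 1      # position is x - 'a'
    # Emit each letter as many times as it was seen, in alphabet order.
    return "".join(chr(ord("a") + i) * n for i, n in enumerate(buckets))
```

For the example string, sort_letters("zdcbacdca") gives "aabcccddz".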
For small numbers of strings a regular comparison sort will probably be faster than a radix sort here, since radix sort takes time proportional to the number of bits required to store each character. For a 2-byte Unicode encoding, and making some (admittedly dubious) assumptions about equal constant factors, radix sort will only be faster if log2(n) > 16, i.e. when sorting more than about 65,000 strings.
One thing I haven't seen mentioned yet is the fact that a comparison sort of strings can be enhanced by exploiting known common prefixes.
Suppose our strings are S[0], S[1], ..., S[n-1]. Let's consider augmenting mergesort with a Longest Common Prefix (LCP) table. First, instead of moving entire strings around in memory, we will just manipulate lists of indices into a fixed table of strings.
Whenever we merge two sorted lists of string indices X[0], ..., X[k-1] and Y[0], ..., Y[k-1] to produce Z[0], ..., Z[2k-1], we will also be given 2 LCP tables (LCPX[0], ..., LCPX[k-1] for X and LCPY[0], ..., LCPY[k-1] for Y), and we need to produce LCPZ[0], ..., LCPZ[2k-1] too. LCPX[i] gives the length of the longest prefix of X[i] that is also a prefix of X[i-1], and similarly for LCPY and LCPZ.
The first comparison, between S[X[0]] and S[Y[0]], cannot use LCP information and we need a full O(n) character comparisons to determine the outcome. But after that, things speed up.
During this first comparison, between S[X[0]] and S[Y[0]], we can also compute the length of their LCP -- call that L. Set Z[0] to whichever of S[X[0]] and S[Y[0]] compared smaller, and set LCPZ[0] = 0. We will maintain in L the length of the LCP of the most recent comparison. We will also record in M the length of the LCP that the last "comparison loser" shares with the next string from its block: that is, if the most recent comparison, between two strings S[X[i]] and S[Y[j]], determined that S[X[i]] was smaller, then M = LCPX[i+1], otherwise M = LCPY[j+1].
The basic idea is: After the first string comparison in any merge step, every remaining string comparison between S[X[i]] and S[Y[j]] can start at the minimum of L and M, instead of at 0. That's because we know that S[X[i]] and S[Y[j]] must agree on at least this many characters at the start, so we don't need to bother comparing them. As larger and larger blocks of sorted strings are formed, adjacent strings in a block will tend to begin with longer common prefixes, and so these LCP values will become larger, eliminating more and more pointless character comparisons.
After each comparison between S[X[i]] and S[Y[j]], the string index of the "loser" is appended to Z as usual. Calculating the corresponding LCPZ value is easy: if the last 2 losers both came from X, take LCPX[i]; if they both came from Y, take LCPY[j]; and if they came from different blocks, take the previous value of L.
In fact, we can do even better. Suppose the last comparison found that S[X[i]] < S[Y[j]], so that X[i] was the string index most recently appended to Z. If M ( = LCPX[i+1]) > L, then we already know that S[X[i+1]] < S[Y[j]] without even doing any comparisons! That's because to get to our current state, we know that S[X[i]] and S[Y[j]] must have first differed at character position L, and it must have been that the character x in this position in S[X[i]] was less than the character y in this position in S[Y[j]], since we concluded that S[X[i]] < S[Y[j]] -- so if S[X[i+1]] shares at least the first L+1 characters with S[X[i]], it must also contain x at position L, and so it must also compare less than S[Y[j]]. (And of course the situation is symmetrical: if the last comparison found that S[Y[j]] < S[X[i]], just swap the names around.)
I don't know whether this will improve the complexity from O(n^2 log n) to something better, but it ought to help.
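A minimal sketch of an LCP-augmented merge sort is given below. It is a variant of the idea described above, not a line-by-line transcription: instead of L and M it tracks, for the head of each block, the LCP with the string most recently appended to the output, and it moves (string, lcp) pairs around rather than indices. All function names are made up for illustration.

```python
def compare_from(a, b, start):
    """Compare a and b, which are known to agree on their first `start` characters.

    Returns (a_goes_first, lcp), where lcp is the length of their common prefix.
    """
    i = start
    limit = min(len(a), len(b))
    while i < limit and a[i] == b[i]:
        i += 1
    if i == limit:                       # one string is a prefix of the other
        return len(a) <= len(b), i
    return a[i] < b[i], i


def lcp_merge(xs, ys):
    """Merge two sorted lists of (string, lcp_with_previous_string) pairs."""
    out = []
    i = j = 0
    hx = hy = 0    # LCP of each block's current head with the last string output
    while i < len(xs) and j < len(ys):
        if hx == hy:
            # Both heads share the same prefix length with the last output,
            # so character comparison can start at that position.
            x_first, common = compare_from(xs[i][0], ys[j][0], hx)
        else:
            # The head sharing the longer prefix with the last output is the
            # smaller one; no character comparisons are needed.
            x_first, common = hx > hy, min(hx, hy)
        if x_first:
            out.append((xs[i][0], hx))
            i += 1
            hx = xs[i][1] if i < len(xs) else 0
            hy = common
        else:
            out.append((ys[j][0], hy))
            j += 1
            hy = ys[j][1] if j < len(ys) else 0
            hx = common
    # One block is exhausted. The first leftover string gets its LCP with the
    # last output; the rest keep the LCP values they already had.
    if i < len(xs):
        out.append((xs[i][0], hx))
        out.extend(xs[i + 1:])
    elif j < len(ys):
        out.append((ys[j][0], hy))
        out.extend(ys[j + 1:])
    return out


def lcp_mergesort(strings):
    """Return the strings sorted, each paired with its LCP with the previous one."""
    if len(strings) <= 1:
        return [(s, 0) for s in strings]
    mid = len(strings) // 2
    return lcp_merge(lcp_mergesort(strings[:mid]), lcp_mergesort(strings[mid:]))
```

For example, lcp_mergesort(["banana", "band", "ban", "apple", "bandana", "app"]) yields the strings in sorted order together with the LCP values 0, 3, 0, 3, 3, 4.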
You can build a trie, which will cost O(s*n).
Details:
https://stackoverflow.com/a/13109908
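As a rough illustration of the trie approach (a sketch under assumptions, not the code from the linked answer): each string is inserted character by character into nested dictionaries, and a pre-order walk that visits children in alphabet order emits the strings sorted. The name trie_sort and the sentinel key are made up for this sketch. Sorting the child keys at each node is cheap only when the alphabet is small, such as a to z.

```python
def trie_sort(strings):
    """Sort strings by inserting them into a trie and walking it in order."""
    END = ""        # sentinel key; real characters are never the empty string
    root = {}
    for s in strings:
        node = root
        for ch in s:
            node = node.setdefault(ch, {})
        node.setdefault(END, []).append(s)   # keeps duplicates

    out = []
    stack = [root]                           # explicit stack avoids deep recursion
    while stack:
        node = stack.pop()
        out.extend(node.get(END, []))        # strings ending here precede longer ones
        # Push children in reverse order so the smallest character is popped first.
        for ch in sorted((k for k in node if k != END), reverse=True):
            stack.append(node[ch])
    return out
```

Insertion touches each character once, which is where the O(s*n) cost for n strings of length s comes from.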
Solving it for all cases should not be possible in better than O(N^2 log N).
However, if there are constraints that relax the string comparison, it can be optimised.
-If the strings have a high repetition rate and come from a finite ordered set, you can use ideas from counting sort and use a map to store their counts. Later, sorting just the map keys should suffice: O(NM log M), where M is the number of unique strings. You can even directly use a TreeMap for this purpose.
-If the strings are not random but are the suffixes of some super string, this can be done in O(N log^2 N). http://discuss.codechef.com/questions/21385/a-tutorial-on-suffix-arrays

Create list of strings from list of doubles, non Scientific notation

listOfLongDeci = [showFFloat Nothing (1/a) | a<-[2..1000], length (show (1/a)) > 7]
listOfLongDeci2 = [show (1/a) | a<-[2..1000], length (show (1/a)) > 7]
listOfLongDeci3 = [(1/a) | a<-[2..1000], length (show (1/a)) > 7]
The 1st gives a list of ShowS; how can I make a String from ShowS?
The 2nd gives a list of strings in scientific notation.
The 3rd only gives a list of Doubles.
How can I use any of these to create a list of strings with non scientific notation? (Euler 26)
As requested:
the 1st gives a list of ShowS, how can I make a String from ShowS?
Since ShowS is a type synonym for String -> String, you obtain a String by applying the function to a String. Since the showXFloat functions produce a function that prepends some String to the final String argument (basically a difference list; many show-related functions produce such - shows, showChar, showString, to name a few - for reasons of efficiency), the natural choice for the final argument is the empty String, so
listOfLongDeci = [showFFloat Nothing (1/a) "" | a<-[2..1000], length (show (1/a)) > 7]
produces a list of Strings, correctly rounded approximations to the decimal representation of the numbers 1/a in non scientific notation.
how can I use any of these to create a list of strings with non scientific notation? (euler 26)
The first part has been answered, but these representations won't help you solve Problem 26 of Project Euler:
Find the value of d < 1000 for which 1/d contains the longest recurring cycle in its decimal fraction part.
A Double has 53 bits of precision (52 explicit bits for the significand plus one hidden bit for normalized numbers, no hidden bit, thus 52 or fewer bits of precision for subnormal numbers), and the number 1/d cannot be exactly represented as a Double unless d is a power of 2. The 53 bits of precision give you roughly
Prelude> 53 * log 2 / log 10
15.954589770191001
significant decimal digits of precision, so from the first nonzero digit on, you have 15 or 16 digits that you can expect to be correct for the exact [terminating or recurring] decimal expansion of the fraction 1/d; beyond that, the expansions differ.
For example, 1/71 has a recurring cycle 01408450704225352112676056338028169 of length 35 (by far not the longest in the range to be considered). The closest representable Double to 1/71 is
0.01408450704225352144438598855913369334302842617034912109375 = 8119165525400331 / (2^59)
of which the first 17 significant digits are correct (and 0.014084507042253521 is also what showFFloat Nothing (1/71) "" gives you).
To find the longest recurring cycle in the decimal expansion of 1/d, you can use an exact (or sufficiently accurate finite) string representation of the Rational number 1 % d, or, better, use pure integer arithmetic (compute the decimal expansion using long division) without involving a Rational.
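The long-division idea can be sketched like this, shown in Python for illustration. The function name cycle_length is made up. It relies on the fact that the digits of 1/d start repeating as soon as a remainder repeats, so only integers are involved.

```python
def cycle_length(d: int) -> int:
    """Length of the recurring cycle in the decimal expansion of 1/d (0 if it terminates)."""
    seen = {}                 # remainder -> digit position at which it first appeared
    remainder = 1 % d
    position = 0
    while remainder != 0 and remainder not in seen:
        seen[remainder] = position
        remainder = (remainder * 10) % d      # one step of long division
        position += 1
    if remainder == 0:
        return 0                              # the expansion terminates
    return position - seen[remainder]
```

With this, cycle_length(7) is 6 and cycle_length(71) is 35, matching the cycle quoted above, and the search over d < 1000 becomes max(range(2, 1000), key=cycle_length).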

Convert text to numbers while preserving ordering?

I've got a strange requirement, which I can't seem to get my head around. I need to come up with a function that would take a text string and return a number corresponding to that string, in such a way that, when sorted, these numbers would go in the same order as the original strings. For example, if the function produces this mapping:
"abcd" -> x
"abdef" -> y
"xyz" -> z
then the numbers must be such that x < y < z. The strings can be arbitrary length, but always non-empty and the string comparison should be case-insensitive (i.e. "ABC" and "abc" should result in the same numerical value).
My first thought was to map each letter to a corresponding number 1 through 26 and then just get the resulting number, e.g. a = 1, b = 2, c = 3, ..., z = 26, then "abc" would become 1*26^2 + 2*26 + 3. However, I then realised that the text string can contain any text in any language (i.e. full Unicode), so this isn't going to work. At this point I'm stuck. Any other ideas before I tell the client to sod off?
P.S. This strange requirement is due to a limitation in a proprietary system that can only do sorting by a numeric field. If the sorting is required by any other field type, it must be converted to some numerical representation - and then sorted. Don't ask.
You can make this work if you allow for arbitrary-precision real numbers, though that kinda feels like cheating. Unicode strings are sequences of characters drawn from 1,114,112 options. You can therefore think of them as fractional numbers in base 1,114,113: write 0., then write out your Unicode string, and you have a real number in base 1,114,113 (shift each character's numeric value up by one so that missing characters have the value 0). Comparing two of these numbers compares the strings lexicographically: if you scan the digits from left to right, the first digit on which they disagree breaks the tie between the two. This approach is completely infeasible unless you have an arbitrary-precision real number library.
If you just have IEEE 754 doubles, this approach won't work. One way to see this is that there are at most 2^64 possible doubles (or 2^80 of them if you allow for long doubles) because there are only 64 (80) bits in a double, but there are infinitely many different strings. That eliminates the possibility simply because there are too many strings to go around.
Unfortunately, you can't make this work with arbitrary-precision integers either. The natural ordering on strings has the fun property that you can find pairs of strings that have infinitely many strings lexicographically between them. For example, notice that
a < aa < aaa < aaaa < aaaaa < ... < b
Now imagine that you have a function that maps each string to an integer that obeys the rules you'd like. That would mean that
f(a) < f(aa) < f(aaa) < f(aaaa) < f(aaaaa) < ... < f(b)
But that's not possible in the integers - you can't have two integers f(a) and f(b) with infinitely many integers between them. (The number of integers between f(a) and f(b) is at most f(b) - f(a) - 1).
So it seems like the answer is "this is possible if you have arbitrary-precision real numbers, it's not possible with doubles, and it's not possible with arbitrary-precision integers." I'd basically label that "not going to happen in practice" even though it's theoretically possible. :-)
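For completeness, the "arbitrary-precision real number" variant can be sketched with exact rationals. This is only an illustration of the base-1,114,113 idea under stated assumptions: the name order_key is made up, Python's fractions.Fraction stands in for the arbitrary-precision library, and str.casefold() is assumed to be an acceptable notion of case-insensitivity.

```python
from fractions import Fraction

# 1,114,112 possible code points, plus one so that digit 0 means "no character".
BASE = 0x110000 + 1


def order_key(text: str) -> Fraction:
    """Map text to an exact fraction 0.d1 d2 d3 ... written in base BASE."""
    key = Fraction(0)
    for position, ch in enumerate(text.casefold(), start=1):
        digit = ord(ch) + 1                   # shift up by one, so digits are 1..BASE-1
        key += Fraction(digit, BASE ** position)
    return key
```

The keys compare in the same order as the case-folded strings, for example order_key("abcd") < order_key("abdef") < order_key("xyz"), but their size grows with the length of the string, which is why this does not fit into a fixed-width numeric field.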

Explanations about the mechanics of a simple factorial function

I'm new to Haskell, so I'm both naive and curious.
There is a definition of a factorial function:
factorial n = product [1..n]
I naively understand this as: make the product of every number between 1 and n. So, why does
factorial 0
return 1 (which is the correct result, as far as my maths is not too rusty)?
Thank you
That's because of how product is defined, something like:
product [] = 1
product (n:ns) = n * product ns
or equivalently
product = foldr (*) 1
via the important function foldr:
foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
Read up on folding here. But basically, any recursion must have a base case, and product's base case (on an empty list) clearly has to be 1.
The story of the empty product is long and interesting.
It makes a lot of sense to define it as 1.
Despite that, there is still some debate about whether we are justified in defining 0^0 as 1, although 0^0 can also be thought of as an empty product in most contexts. See the 0^0 debate here and also here.
Now I will show an example where the empty product convention yields a surprising, unintuitive outcome.
How to define the concept of a prime, without the necessity to exclude 1 explicitly? It seems so unaesthetic, to say that "a prime is such and such, except for this and that". Can the concept of prime be defined with some handy definition which can exclude 1 in a "natural", "automatic" way, without mentioning the exclusion explicitly?
Let us try this approach:
Let us call a natural number c composite iff c can be written as a product a1 ⋅ ... ⋅ an of natural numbers, all of which must be different from c.
Let us call a natural number p prime iff p cannot be written as a product a1 ⋅ ... ⋅ an of natural numbers each differing from p.
Let us test whether this approach is any good:
6 = 6 ⋅ 1 = 3 ⋅ 2
6 is composite; this fact is witnessed by the following factorisation: 6 can be written as the product 3 ⋅ 2, or in other words, the product of the sequence ⟨3, 2⟩, notated as Π ⟨3, 2⟩.
So far, our new approach is O.K.
5 = 5 ⋅ 1 = 1 ⋅ 5
5 is prime: there is no sequence ⟨a1, ... an⟩ such that
all its members a1, ... an differ from 5
but the product itself, Π ⟨a1, ... an⟩, equals 5.
So far, our new approach is O.K.
Now let us investigate 1:
1 = Π ⟨⟩,
The empty product is a good witness: with it, 1 satisfies the definition of a composite(!!!). What is the witness? Where is the witnessing factorization? It is none other than the empty product Π ⟨⟩, the product of the empty sequence ⟨⟩.
Π ⟨⟩ equals 1.
All factors of the empty product Π ⟨⟩, i.e. the members of the empty sequence ⟨⟩, differ from 1, simply because the empty sequence ⟨⟩ has no members at all, so none of its members can equal 1. (This argument is simply a vacuous truth about the members of the empty set.)
Thus 1 is composite (with the trivial factorization given by the empty product Π ⟨⟩).
Thus 1 is excluded from being a prime, naturally and automatically, by definition. We have reached our goal. For this, we have exploited the convention that the empty product is 1.
Some drawbacks: although we succeeded in excluding 1 from being a prime, at the same time 0 "slipped in": 0 became a prime (at least in zero-divisor-free rings, like the natural numbers). Although this strange thing makes some theorems formally more concise (Goldbach conjecture, fundamental theorem of arithmetic), I cannot deny that it is a drawback.
A bigger drawback is that some concepts of arithmetic seem to become untenable with this new approach.
In any case, I only wanted to demonstrate that defining the empty product as 1 can lead to formalizing unintuitive things (which is not necessarily a problem; set theory abounds with unintuitive things, see how to produce gold for free), but at the same time, it can provide useful strength in some contexts.
It's traditional to define the product of all the elements of the empty list to be 1, just as it's traditional to define the sum of all the elements of the empty list to be 0. That way
(product list1) * (product list2) == product (list1 ++ list2)
among other convenient properties.
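The convention and the property above can be checked with a small sketch, written in Python for illustration. The names product and factorial are defined here and simply mirror the Haskell ones; functools.reduce is a left fold rather than foldr, which makes no difference for multiplication.

```python
from functools import reduce
import operator


def product(xs):
    """Fold the list with (*), starting from 1, so the empty list gives 1."""
    return reduce(operator.mul, xs, 1)


def factorial(n):
    """product [1..n]; for n = 0 the range is empty, so the result is 1."""
    return product(range(1, n + 1))


list1, list2 = [2, 3], [4, 5]
# Splitting a list anywhere, even into an empty part, keeps the products consistent.
splits_agree = product(list1) * product(list2) == product(list1 + list2)
empty_split_agrees = product([]) * product(list2) == product([] + list2)
```

If the empty product were anything other than 1, the second equality would fail.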
Also, your memory is correct, and 0! is defined to be 1. This also has many convenient properties, including being consistent with the definition of factorials in terms of the gamma function.
Not sure I understand your question, are you asking how to write such a function?
Just as an exercise, you could use pattern matching to approach it like this:
factorial :: Int->Int
factorial 0 = 1
factorial n = product [1..n]
The first line is the function declaration/type signature. The next two lines are equations defining the function; Haskell's pattern matching selects whichever equation matches the actual runtime argument.
Of course as others have pointed out, the product function handles the zero case correctly for you.

Resources