Haskell, sort a list of Int [duplicate] - haskell

This question already has an answer here:
How should the forever function be used?
(1 answer)
Closed 4 years ago.
I need to sort a list of Ints, [Int]. However when I use sort it gives me error message:Variable not in scope: sort :: [Int] -> [a0] how do I get around this problem?
my code simplified:
data Point= P Shape Int
getInt (P _ i)=i
sorted::[Point]->[Int]
sorted ps= sort(map getInt ps)

If you're searching for a function of which you know (or guess) the name, but don't know where it is defined, Hayoo is your best† friend. Asked for sort, it'll give
sort :: Ord a => [a] -> [a]
base -Data.List
> The sort function implements a stable sorting algorithm. It is a special case of sortBy, which allows the programmer to supply their own comparison function.
sort :: Ord a => [a] -> [a]
base -GHC.OldList
> The sort function implements a stable sorting algorithm. It is a special case of sortBy, which allows the programmer to supply their own comparison function.
sort :: ByteString -> ByteString
bytestring -Data.ByteString.Char8 Data.ByteString
> O(n) Sort a ByteString efficiently, using counting sort.
...
Well, you don't need anything about byte strings (or Seqs or other more advanced types), nor do you want to touch a module called GHC.OldList (this is some legacy stuff that may be used to quickly make old code compatible with new GHC versions); the Data.List version is apparently fine. So import that:
import Data.List (sort)
main :: IO ()
main = print $ sort [3,1,2]
†Hayoo is one of two popular Haskell search engines and works IMO better for searching by name. The alternative, Hoogle, is a bit more fiddly but can additionally search also by type signature. I recommend the Stackage version.

Related

How to make a custom Attoparsec parser combinator that returns a Vector instead of a list?

{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text
import Control.Applicative(many)
import Data.Word
parseManyNumbers :: Parser [Int] -- I'd like many to return a Vector instead
parseManyNumbers = many (decimal <* skipSpace)
main :: IO ()
main = print $ parseOnly parseManyNumbers "131 45 68 214"
The above is just an example, but I need to parse a large amount of primitive values in Haskell and need to use arrays instead of lists. This is something that possible in the F#'s Fparsec, so I've went as far as looking at Attoparsec's source, but I can't figure out a way to do it. In fact, I can't figure out where many from Control.Applicative is defined in the base Haskell library. I thought it would be there as that is where documentation on Hackage points to, but no such luck.
Also, I am having trouble deciding what data structure to use here as I can't find something as convenient as a resizable array in Haskell, but I would rather not use inefficient tree based structures.
An option to me would be to skip Attoparsec and implement an entire parser inside the ST monad, but I would rather avoid it except as a very last resort.
There is a growable vector implementation in Haskell, which is based on the great AMT algorithm: "persistent-vector". Unfortunately, the library isn't that much known in the community so far. However to give you a clue about the performance of the algorithm, I'll say that it is the algorithm that drives the standard vector implementations in Scala and Clojure.
I suggest you implement your parser around that data-structure under the influence of the list-specialized implementations. Here the functions are, btw:
-- | One or more.
some :: f a -> f [a]
some v = some_v
where
many_v = some_v <|> pure []
some_v = (fmap (:) v) <*> many_v
-- | Zero or more.
many :: f a -> f [a]
many v = many_v
where
many_v = some_v <|> pure []
some_v = (fmap (:) v) <*> many_v
Some ideas:
Data Structures
I think the most practical data structure to use for the list of Ints is something like [Vector Int]. If each component Vector is sufficiently long (i.e. has length 1k) you'll get good space economy. You'll have
to write your own "list operations" to traverse it, but you'll avoid re-copying data that you would have to perform to return the data in a single Vector Int.
Also consider using a Dequeue instead of a list.
Stateful Parsing
Unlike Parsec, Attoparsec does not provide for user state. However, you
might be able to make use of the runScanner function (link):
runScanner :: s -> (s -> Word8 -> Maybe s) -> Parser (ByteString, s)
(It also returns the parsed ByteString which in your case may be problematic since it will be very large. Perhaps you can write an alternate version which doesn't do this.)
Using unsafeFreeze and unsafeThaw you can incrementally fill in a Vector. Your s data structure might look
something like:
data MyState = MyState
{ inNumber :: Bool -- True if seen a digit
, val :: Int -- value of int being parsed
, vecs :: [ Vector Int ] -- past parsed vectors
, v :: Vector Int -- current vector we are filling
, vsize :: Int -- number of items filled in current vector
}
Maybe instead of a [Vector Int] you use a Dequeue (Vector Int).
I imagine, however, that this approach will be slow since your parsing function will get called for every single character.
Represent the list as a single token
Parsec can be used to parse a stream of tokens, so how about writing
your own tokenizer and letting Parsec create the AST.
The key idea is to represent these large sequences of Ints as a single token. This gives you a lot more latitude in how you parse them.
Defer Conversion
Instead of converting the numbers to Ints at parse time, just have parseManyNumbers return a ByteString and defer the conversion until
you actually need the values. This much enable you to avoid reifying
the values as an actual list.
Vectors are arrays, under the hood. The tricky thing about arrays is that they are fixed-length. You pre-allocate an array of a certain length, and the only way of extending it is to copy the elements into a larger array.
This makes linked lists simply better at representing variable-length sequences. (It's also why list implementations in imperative languages amortise the cost of copying by allocating arrays with extra space and copying only when the space runs out.) If you don't know in advance how many elements there are going to be, your best bet is to use a list (and perhaps copy the list into a Vector afterwards using fromList, if you need to). That's why many returns a list: it runs the parser as many times as it can with no prior knowledge of how many that'll be.
On the other hand, if you happen to know how many numbers you're parsing, then a Vector could be more efficient. Perhaps you know a priori that there are always n numbers, or perhaps the protocol specifies before the start of the sequence how many numbers there'll be. Then you can use replicateM to allocate and populate the vector efficiently.

most frequently occurring string (element) in a list?

I have made a function which prints every possible subsequence of a string. Now I need to make a function which prints the most common. Any ideas on where I can start. Not asking for fully coded functions just a place to start. Also, only using prelude functions (including base).
for example, if I enter "jonjo" my functions will return ["jonjo","jonj","jon","jo","j","onjo","onj"...] etc. The most common substring would be "jo".
In the case where there would be two or more most occurring substrings, only the longest would be printed. If still equal, any one of the substrings will suffice.
The problem as it is stated can be reduced to finding the most frequent character, since it is obvious that, for example, the first character in any "most frequent substring" will be AT LEAST as frequent as the substring itself.
I suggest you take a look at
sort :: Ord a => [a] -> [a]
from base Data.List
group :: Eq a => [a] -> [[a]]
from base Data.List
length :: [a] -> Int
from base Prelude, base Data.List
and
maximum :: Ord a => [a] -> a
from base Prelude, base Data.List
If you can really ony use prelude functions, then I suggest you implement these yourself, or design a datastructure to make this efficient, such as a trie.

Haskell function to cast as a type

If I want a version of reads that returns a list of (Int, String), the example I see is
f x = reads x :: [(Int,String)]
I'd like to know if there's a way to do this in point-free style (f = g . reads?), or if this is something you just don't/can't do in Haskell.
I haven't seen any examples of using a type as an argument, so it may not be doable.
I think what you're looking for is f = reads :: String -> [(Int, String)].
The idiomatic way to do what you're describing is
f :: String -> [(Int, String)]
f = reads
The benefit to doing it this way is if you have a polymorphic type the result is what you'd expect (because of the Dreaded Monomorphism Restriction).
There is currently no way, but I have a suggested language extension, SignatureSections, which will allow you to write f = (:: [(Int, String)]) . reads.
Yes, there is actually.
fixIS :: [(Int,String)] -> [(Int,String)]
fixIS = id
I usually have a couple of functions lying around for when I want to resolve polymorphis (i.e. when I want to fix the inhibition monoid of a Netwire wire to (), or if I want to fix a Num a to Int). Obviously there's no point in doing this if you only need that once. The reason you might wanna do it this way is that it occasionally flows better in function definitions. Depending on how smart your haskell compiler is it might be less efficient though (in theory, the id could be dropped in most situations... not sure if GHC actively looks for that if id is given another name though).

haskell load module in list

Hey haskellers and haskellettes,
is it possible to load a module functions in a list.
in my concrete case i have a list of functions all checked with or
checkRules :: [Nucleotide] -> Bool
checkRules nucs = or $ map ($ nucs) [checkRule1, checkRule2]
i do import checkRule1 and checkRule2 from a seperate module - i don't know if i will need more of them in the future.
i'd like to have the same functionality look something like
-- import all functions from Rules as rules where
-- :t rules ~~> [([Nucleotide] -> Bool)]
checkRules :: [Nucleotide] -> Bool
checkRules nucs = or $ map ($ nucs) rules
the program sorts Pseudo Nucleotide Sequences in viable and nonviable squences according to given rules.
thanks in advance ε/2
Addendum:
So do i think right - i need:
genList :: File -> TypeSignature -> [TypeSignature]
chckfun :: (a->b) -> TypeSignature -> Bool
at compile time.
but i can't generate a list of all functions in the module - as they most probably will have not the same type signature and hence not all fit in one list. so i cannot filter given list with chckfun.
In order to do this i either want to check the written type signatures in the source file (?) or the inferenced types given by the compiler(?).
another problem that comes to my mind is: not every function written in the source file might get exported ?
Is this a problem a haskell beginner should try to solve after 5 months of learning - my brain is shaped like a klein's bottle after all this "compile time thinking".
There is a nice package on Hackage just for this: language-haskell-extract. In particular, the Template Haskell function functionExtractor takes a regular expression and returns a list of the matching top level bindings as (name, value) pairs. As long as they all have matching types, you're good to go.
{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.Extract
myFoo = "Hello"
myBar = "World"
allMyStuff = $(functionExtractor "^my")
main = print allMyStuff
Output:
[("myFoo", "Hello"), ("myBar", "World")]

Random-Pivot Quicksort in Haskell

Is it possible to implement a quicksort in Haskell (with RANDOM-PIVOT) that still has a simple Ord a => [a]->[a] signature?
I'm starting to understand Monads, and, for now, I'm kind of interpreting monads as somethink like a 'command pattern', which works great for IO.
So, I understand that a function that returns a random number should actually return a monadic value like IO, because, otherwise, it would break referential transparency. I also understand that there should be no way to 'extract' the random integer from the returned monadic value, because, otherwise, it would, again, break referential transparency.
But yet, I still think that it should be possible to implement a 'pure' [a]->[a] quicksort function, even if it uses random pivot, because, it IS referential transparent. From my point of view, the random pivot is just a implementation detail, and shouldn't change the function's signature
OBS: I'm not actually interested in the specific quicksort problem (so, I don't want to sound rude but I'm not looking for "use mergesort" or "random pivot doesn't increase performance in practice" kind of answers) I'm actually interested in how to implement a 'pure' function that uses 'impure' functions inside it, in cases like quicksort, where I can assure that the function actually is a pure one.
Quicksort is just a good example.
You are making a false assumption that picking the pivot point is just an implementation detail. Consider a partial ordering on a set. Like a quicksort on cards where
card a < card b if the face value is less but if you were to evaluate booleans:
4 spades < 4 hearts (false)
4 hearts < 4 spades (false)
4 hearts = 4 spades (false)
In that case the choice of pivots would determine the final ordering of the cards. In precisely the same way
for a function like
a = get random integer
b = a + 3
print b
is determined by a. If you are randomly choosing something then your computation is or could be non deterministic.
OK, check this out.
Select portions copied form the hashable package, and voodoo magic language pragmas
{-# LANGUAGE FlexibleInstances, UndecidableInstances, NoMonomorphismRestriction, OverlappingInstances #-}
import System.Random (mkStdGen, next, split)
import Data.List (foldl')
import Data.Bits (shiftL, xor)
class Hashable a where
hash :: a -> Int
instance (Integral a) => Hashable a where
hash = fromIntegral
instance Hashable Char where
hash = fromEnum
instance (Hashable a) => Hashable [a] where
hash = foldl' combine 0 . map hash
-- ask the authors of the hashable package about this if interested
combine h1 h2 = (h1 + h1 `shiftL` 5) `xor` h2
OK, so now we can take a list of anything Hashable and turn it into an Int. I've provided Char and Integral a instances here, more and better instances are in the hashable packge, which also allows salting and stuff.
This is all just so we can make a number generator.
genFromHashable = mkStdGen . hash
So now the fun part. Let's write a function that takes a random number generator, a comparator function, and a list. Then we'll sort the list by consulting the generator to select a pivot, and the comparator to partition the list.
qSortByGen _ _ [] = []
qSortByGen g f xs = qSortByGen g'' f l ++ mid ++ qSortByGen g''' f r
where (l, mid, r) = partition (`f` pivot) xs
pivot = xs !! (pivotLoc `mod` length xs)
(pivotLoc, g') = next g
(g'', g''') = split g'
partition f = foldl' step ([],[],[])
where step (l,mid,r) x = case f x of
LT -> (x:l,mid,r)
EQ -> (l,x:mid,r)
GT -> (l,mid,x:r)
Library functions: next grabs an Int from the generator, and produces a new generator. split forks the generator into two distinct generators.
My functions: partition uses f :: a -> Ordering to partition the list into three lists. If you know folds, it should be quite clear. (Note that it does not preserve the initial ordering of the elements in the sublists; it reverses them. Using a foldr could remedy this were it an issue.) qSortByGen works just like I said before: consult the generator for the pivot, partition the list, fork the generator for use in the two recursive calls, recursively sort the left and right sides, and concatenate it all together.
Convenience functions are easy to compose from here
qSortBy f xs = qSortByGen (genFromHashable xs) f xs
qSort = qSortBy compare
Notice the final function's signature.
ghci> :t qSort
qSort :: (Ord a, Hashable a) => [a] -> [a]
The type inside the list must implement both Hashable and Ord. There's the "pure" function you were asking for, with one logical added requirement. The more general functions are less restrictive in their requirements.
ghci> :t qSortBy
qSortBy :: (Hashable a) => (a -> a -> Ordering) -> [a] -> [a]
ghci> :t qSortByGen
qSortByGen
:: (System.Random.RandomGen t) =>
t -> (a -> a -> Ordering) -> [a] -> [a]
Final notes
qSort will behave exactly the same way for all inputs. The "random" pivot selection is. in fact, deterministic. But it is obscured by hashing the list and then seeding a random number generator, making it "random" enough for me. ;)
qSort also only works for lists with length less than maxBound :: Int, which ghci tells me is 9,223,372,036,854,775,807. I thought there would be an issue with negative indexes, but in my ad-hoc testing I haven't run into it yet.
Or, you can just live with the IO monad for "truer" randomness.
qSortIO xs = do g <- getStdGen -- add getStdGen to your imports
return $ qSortByGen g compare xs
ghci> :t qSortIO
qSortIO :: (Ord a) => [a] -> IO [a]
ghci> qSortIO "Hello world"
" Hdellloorw"
ghci> qSort "Hello world"
" Hdellloorw"
In such cases, where you know that the function is referentially transparent, but you can't proof it to the compiler, you may use the function unsafePerformIO :: IO a -> a from the module Data.Unsafe.
For instance, you may use unsafePerformIO to get an initial random state and then do anything using just this state.
But please notice: Don't use it if it's not really needed. And even then, think twice about it. unsafePerformIO is somewhat the root of all evil, since it's consequences can be dramatical - anything is possible from coercing different types to crashing the RTS using this function.
Haskell provides the ST monad to perform non-referentially-transparent actions with a referentially transparent result.
Note that it doesn't enforce referential transparency; it just insures that potentially non-referentially-transparent temporary state can't leak out. Nothing can prevent you from returning manipulated pure input data that was rearranged in a non-reproducible way. Best is to implement the same thing in both ST and pure ways and use QuickCheck to compare them on random inputs.

Resources