Haskell: Find object in list by property

Is there a way of finding a given "object" by one of its properties?
I tried pattern matching like I would have done in logic programming, but I can't figure it out:
data Object = Object {
_prop1 :: type,
_prop2 :: color,
_prop3 :: pos
} deriving Eq
type Square = Maybe Object
type Board = [[Square]]
objectlist::Board
objectlist = [[ Just (Object type color pos), Just (Object type color pos)]
...
[ Just (Object type color pos), Just (Object type color pos)]
index_of :: (Int, Int)->Int
index_of (x,y) = fromJust $ elemIndex piece objectlist
where
piece = Piece _ _ (x,y)
Also, I don't think my approach to finding the index is a good one. It worked with a simple list, but I can't work out how to do it with a two-dimensional list.

As you stated as a comment in another answer you are looking for the index in a 2D list. Therefore I think the type of index_of should be (Int, Int) -> (Int, Int).
The function findIndex was suggested in the other answer to help you make index_of. What you need is a generic 2D version of it. Here is how you could implement findIndex2D:
import Data.List
import Data.Maybe
findIndex2D :: (a -> Bool) -> [[a]] -> Maybe (Int, Int)
findIndex2D pred xs = do
  let maybeIndices = map (findIndex pred) xs
  y <- findIndex isJust maybeIndices
  x <- maybeIndices !! y
  return (x, y)
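To connect findIndex2D back to the board from the question, here is a minimal sketch. The field names (objKind, objColor, objPos), the String field types, the indexOf wrapper and sampleBoard are all assumptions made for illustration, since the question leaves the concrete types open. Note that the result is (column, row), because findIndex2D returns (x, y).

```haskell
import Data.List (findIndex)
import Data.Maybe (isJust)

-- Hypothetical stand-in for the question's Object; the field types are assumed.
data Object = Object
  { objKind  :: String
  , objColor :: String
  , objPos   :: (Int, Int)
  } deriving (Eq, Show)

type Square = Maybe Object
type Board  = [[Square]]

-- Same definition as in the answer, repeated so the sketch is self-contained.
findIndex2D :: (a -> Bool) -> [[a]] -> Maybe (Int, Int)
findIndex2D p xs = do
  let maybeIndices = map (findIndex p) xs
  y <- findIndex isJust maybeIndices
  x <- maybeIndices !! y
  return (x, y)

-- Find the (column, row) of the square holding an object with the given position.
indexOf :: (Int, Int) -> Board -> Maybe (Int, Int)
indexOf pos = findIndex2D holdsPos
  where
    holdsPos (Just obj) = objPos obj == pos
    holdsPos Nothing    = False

-- Made-up sample data.
sampleBoard :: Board
sampleBoard =
  [ [Just (Object "pawn" "white" (0, 0)), Nothing, Nothing]
  , [Nothing, Nothing, Just (Object "rook" "black" (5, 7))]
  ]
```

Returning Maybe (Int, Int) instead of using fromJust keeps the "not found" case visible to the caller.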

You can use findIndex to get something of that effect.
index_of :: (Int, Int)->Int
index_of (x,y) = fromJust $ findIndex piece objectlist
  where
    piece (Piece _ _ pos) = pos == (x,y)

Related

Get all string splits

Say I have a string:
"abc7de7f77ghij7"
I want to split it by a substring, 7 in this case, and get all the left-right splits:
[ ("abc", "de7f77ghij7")
, ("abc7de", "f77ghij7")
, ("abc7de7f", "7ghij7")
, ("abc7de7f7", "ghij7")
, ("abc7de7f77ghij", "")
]
Sample implementation:
{-# LANGUAGE OverloadedStrings #-}
module StrSplits where
import qualified Data.Text as T
splits :: T.Text -> T.Text -> [(T.Text, T.Text)]
splits d s =
  let run a l r =
        case T.breakOn d r of
          (x, "") -> reverse a
          (x, y) ->
            let
              rn = T.drop (T.length d) y
              an = (T.append l x, rn) : a
              ln = l `T.append` x `T.append` d
            in run an ln rn
  in run [] "" s
main = do
  print $ splits "7" "abc7de7f77ghij7"
  print $ splits "8" "abc7de7f77ghij7"
with expected result:
[("abc","de7f77ghij7"),("abc7de","f77ghij7"),("abc7de7f","7ghij7"),("abc7de7f7","ghij7"),("abc7de7f77ghij","")]
[]
I'm not too happy about the manual recursion and let/case/let nesting. If my feeling that it doesn't look too good is right, is there a better way to write it?
Is there a generalized approach to solving these kinds of problems in Haskell similar to how recursion can be replaced with fmap and folds?
How about this?
import Data.Bifunctor (bimap)
splits' :: T.Text -> T.Text -> [(T.Text, T.Text)]
splits' delimiter string = mkSplit <$> [1..numSplits]
  where
    sections = T.splitOn delimiter string
    numSplits = length sections - 1
    mkSplit n = bimap (T.intercalate delimiter) (T.intercalate delimiter) $ splitAt n sections
I like to believe there's a way that doesn't involve indices, but you get the general idea. First split the string by the delimiter. Then split that list of strings in two at every possible position, rejoining each side with the delimiter.
Not the most efficient, though. You can probably do something similar with indices from Data.Text.Internal.Search if you want it to be fast. In this case, you wouldn't need to do the additional rejoining. I didn't experiment with it since I didn't understand what the function was returning.
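For what it's worth, the same "split into sections, then rejoin each side" idea can be written without indices by zipping inits with tails. The sketch below is an assumption-laden variant rather than the answer's own code: it works on plain lists (so String instead of Text) to stay within base, and splitOnList and splitsNoIndex are hypothetical names. The delimiter is assumed to be non-empty.

```haskell
import Data.List (inits, intercalate, isPrefixOf, tails)

-- Hypothetical helper: split a list on a non-empty delimiter, keeping empty
-- sections (mirrors what T.splitOn does for Text).
splitOnList :: Eq a => [a] -> [a] -> [[a]]
splitOnList d = go []
  where
    go acc [] = [reverse acc]
    go acc rest@(c : cs)
      | d `isPrefixOf` rest = reverse acc : go [] (drop (length d) rest)
      | otherwise           = go (c : acc) cs

-- Every way to cut the section list in two, excluding the two trivial cuts
-- (nothing on the left, or nothing on the right), rejoined with the delimiter.
splitsNoIndex :: Eq a => [a] -> [a] -> [([a], [a])]
splitsNoIndex d s =
  [ (intercalate d l, intercalate d r)
  | (l, r) <- init (tail (zip (inits sections) (tails sections)))
  ]
  where
    sections = splitOnList d s
```

The section list always has at least one element, so the zipped list has at least two entries and init (tail ...) is safe.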
Here's an indexless one.
import Data.List (isPrefixOf, unfoldr)
type ListZipper a = ([a],[a])
moveRight :: ListZipper a -> Maybe (ListZipper a)
moveRight (_, []) = Nothing
moveRight (ls, r:rs) = Just (r:ls, rs)
-- As Data.List.iterate, but generates a finite list ended by Nothing.
unfoldr' :: (a -> Maybe a) -> a -> [a]
unfoldr' f = unfoldr (\x -> (,) x <$> f x)
-- Get all ways to split a list with nonempty suffix
-- Prefix is reversed for efficiency
-- [1,2,3] -> [([],[1,2,3]), ([1],[2,3]), ([2,1],[3])]
splits :: [a] -> [([a],[a])]
splits xs = unfoldr' moveRight ([], xs)
-- This is the function you want.
splitsOn :: (Eq a) => [a] -> [a] -> [([a],[a])]
splitsOn sub xs = [(reverse l, drop (length sub) r) | (l, r) <- splits xs, sub `isPrefixOf` r]
Basically, traverse a list zipper to come up with a list of candidates for the split. Keep only those that are indeed splits on the desired item, then (un)reverse the prefix portion of each passing candidate.

Haskell: Convert String to [(String,Double)]

I parse an XML document and get a String like this:
"resourceA,3-resourceB,1-,...,resourceN,x"
I want to map that String into a list of tuples (String,Double), like this:
[(resourceA,3),(resourceB,1),...,(resourceN,x)]
How is it possible to do this? I've looked into the map function and also the split one. I am able to split the string by "-", but nothing beyond that.
This is the code I have so far:
split :: Eq a => a -> [a] -> [[a]]
split d [] = []
split d s = x : split d (drop 1 y) where (x,y) = span (/= d) s
It is just a function to split my string into a list of Strings, but I don't know how to continue from there.
What I want to do now is loop over the new list that I created with the split function and create a tuple for each element. I have tried the map function, but I can't even get it to compile.
In Haskell you don't really mutate any value; instead, you create a new list of pairs from the string you've described, so the solution would look something like the following:
import Data.List.Split (splitOn) -- from the split package
xmlList :: [String]
xmlList = splitOn "-" "resourceA,3-resourceB,4-resourceC,6"
commaSplit :: String -> [String]
commaSplit = splitOn ","
-- might be more efficient to use Text instead of String
-- head, last and read are partial: malformed entries will crash at runtime
xmlPair :: [String] -> [(String, Double)]
xmlPair = map (\x -> let parts = commaSplit x in (head parts, read (last parts)))
main :: IO ()
main = mapM_ (\(a, b) -> putStrLn (show a ++ " = " ++ show b)) (xmlPair xmlList)
This is my quick and dirty way of showing things but I'm sure someone can always add a more detailed answer.
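As one possible more defensive variant, here is a sketch that uses only base and reports malformed input with Nothing instead of crashing. The names splitOnChar, parsePair and parseResources are made up for this example, and it assumes the values never contain '-' themselves (so no negative numbers).

```haskell
import Text.Read (readMaybe)

-- Hypothetical helper: split on a single separator character, keeping empty chunks.
splitOnChar :: Char -> String -> [String]
splitOnChar sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnChar sep rest

-- Parse one "name,value" entry; Nothing if the shape or the number is wrong.
parsePair :: String -> Maybe (String, Double)
parsePair entry = case splitOnChar ',' entry of
  [name, value] -> (,) name <$> readMaybe value
  _             -> Nothing

-- Parse the whole "a,1-b,2-..." string; a single bad entry makes the result Nothing.
parseResources :: String -> Maybe [(String, Double)]
parseResources = traverse parsePair . splitOnChar '-'
```

traverse plays the role of the loop here: it applies parsePair to every entry and collects the results, stopping at the first Nothing.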

How to define a lambda function that filters list based on subtype of a sum type?

The example is taken from "Haskell Programming from First Principles".
The goal of the filter function is to get rid of all the objects except those of the 'DbDate' type.
On someone's GitHub I found a way to filter sum types with a list comprehension and pattern matching (1). Now I am trying to find a way to redefine this filter with a lambda function (2) or a normal "case of" or "if then" function. I do not know how to properly check the type of a function's arguments when I deal with a custom data type.
The book doesn't introduce the reader to any particularly specialised library functions, just standard maps, folds, filters and other things you'd find in Prelude.
import Data.Time
data DatabaseItem = DbString String
                  | DbNumber Integer
                  | DbDate UTCTime
                  deriving (Eq, Ord, Show)
-- List that needs to be filtered
theDatabase :: [DatabaseItem]
theDatabase =
  [ DbDate (UTCTime (fromGregorian 1911 5 1)
                    (secondsToDiffTime 34123))
  , DbNumber 9001
  , DbString "Hello, world!"
  , DbDate (UTCTime (fromGregorian 1921 5 1)
                    (secondsToDiffTime 34123))
  ]
-- 1: works fine, found on someone's GitHub
filterDbDate :: [DatabaseItem] -> [UTCTime]
filterDbDate dbes = [x | (DbDate x) <- dbes]
-- 2: looking for the equivalent with a lambda, "case" or "if then"
-- this pattern is not satisfactory
filterDbDate :: [DatabaseItem] -> [UTCTime]
filterDbDate dbes = filter (\(DbDate x) -> True) theDatabase
filter has the type (a -> Bool) -> [a] -> [a] so it is not able to change the type of your list.
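To make that concrete, here is a small sketch of what filter with a total lambda does give you: the non-date items are dropped, but the result is still a [DatabaseItem], not a [UTCTime]. The names onlyDates and sampleDb are made up for this illustration.

```haskell
import Data.Time (UTCTime (..), fromGregorian, secondsToDiffTime)

data DatabaseItem
  = DbString String
  | DbNumber Integer
  | DbDate UTCTime
  deriving (Eq, Ord, Show)

-- The lambda covers every constructor, so it cannot fail at runtime.
-- The element type is unchanged: filter only decides which items to keep.
onlyDates :: [DatabaseItem] -> [DatabaseItem]
onlyDates = filter (\item -> case item of
                               DbDate _ -> True
                               _        -> False)

-- Made-up sample data in the shape of the question's database.
sampleDb :: [DatabaseItem]
sampleDb =
  [ DbDate (UTCTime (fromGregorian 1911 5 1) (secondsToDiffTime 34123))
  , DbNumber 9001
  , DbString "Hello, world!"
  , DbDate (UTCTime (fromGregorian 1921 5 1) (secondsToDiffTime 34123))
  ]
```

Getting from here to [UTCTime] needs a second step that unwraps the constructor, which is why the approaches that do both at once tend to read better.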
According to The Haskell 98 Report (section 3.11) the list comprehension used in the code you found on github desugars to:
filterDbDate2 :: [DatabaseItem] -> [UTCTime]
filterDbDate2 dbes =
  let extractTime (DbDate time) = [time]
      extractTime _ = []
  in concatMap extractTime dbes
You can rewrite extractTime to use case ... of:
filterDbDate3 :: [DatabaseItem] -> [UTCTime]
filterDbDate3 dbes =
  let extractTime item = case item of
        DbDate time -> [time]
        _ -> []
  in concatMap extractTime dbes
And replace it by a lambda:
filterDbDate4 :: [DatabaseItem] -> [UTCTime]
filterDbDate4 dbes = concatMap (\item ->
    case item of
      DbDate time -> [time]
      _ -> [])
  dbes
But imho your original solution using list comprehension looks the best:
filterDbDate dbes = [x | (DbDate x) <- dbes]
As @Niko has already said in his answer, filter cannot change the type. However, there is a variant of filter which can: Data.Maybe.mapMaybe :: (a -> Maybe b) -> [a] -> [b]. The idea is that if you want to keep an element, you return Just newvalue from the lambda; otherwise you return Nothing. In that case, you could rewrite filterDbDate as:
import Data.Maybe
filterDbDate dbes = mapMaybe (\x -> case x of { DbDate d -> Just d; _ -> Nothing }) dbes
Personally, I would say that this is the second-clearest way to write this function (after the list comprehension method).
You were indeed on the right track, as pattern matching is an easy way of solving this. However, you will get an error because your pattern match is not exhaustive. Also, note that if you use filter, you will still get a list of type [DatabaseItem], as filter never changes the type. You can, however, use map to do the conversion. So:
Case Of
You can have a case .. of inside your lambda function:
filterDbDate' :: [DatabaseItem] -> [UTCTime]
filterDbDate' = map (\(DbDate x) -> x) . filter (\x ->
  case x of
    DbDate _ -> True
    _ -> False)
Recursion + Pattern Matching
However, I think it's clearer using recursion:
filterDbDate'' :: [DatabaseItem] -> [UTCTime]
filterDbDate'' [] = []
filterDbDate'' ((DbDate d):ds) = d : filterDbDate'' ds
filterDbDate'' (_:ds) = filterDbDate'' ds
Best Way
To be honest, when you have to mix up filter and map, and your lambdas are easy like this one, list comprehensions like yours are the cleanest way:
filterDbDate ds = [d | (DbDate d) <- ds]

Monadic exercise in Haskell. I can't deal with that

I am trying to write a function which extracts numbers from a string, for example:
"321 43 123 213" -> [321, 43, 123, 213]
"dsa" -> Error
"123 da" -> Error
And I would like to do it using readEither and in a monadic way (I am trying to understand monads). My attempt:
import Text.Read
unit :: Either String [Int]
unit = Right []
extractInt :: String -> Either String [Int]
extractInt s = helper (words s) where
  helper (h:t) = (bind readEither h) . (helper t)
  helper [] = Right []
bind :: (String -> Either String Int) -> String -> (Either String [Int] -> Either String [Int])
bind f x z = bind' (f x) z where
  bind' (Left s) _ = Left s
  bind' (Right i) (Right l) = Right (l ++ [i])
  bind' (Left s) _ = Left s
Please help me solve my problem.
Please comment on my solution.
Please tell me how to do it correctly. ;)
Error:
Couldn't match expected type `a0 -> Either String [Int]'
with actual type `Either a1 [t0]'
In the return type of a call of `Right'
Probable cause: `Right' is applied to too many arguments
In the expression: Right [1]
In an equation for `helper': helper [] = Right [1]
Failed, modules loaded: none.
If you want "something with >>=" your helper function should look like:
helper [] = Right []
helper (w:ws) = readEither w >>= \i -> fmap (i:) (helper ws)
Explanation: Clearly, for an empty list of words, we want an empty list of integers. For a nonempty list, we do readEither on the first word, which gives us an Either String Int. The bind (>>=) will pass the resulting integer to the function on the right hand side, but only if the result was Right. If it was Left, this is the overall result of the helper.
Now, the function on the right hand side of (>>=) applies the helper to the remaining words. As we know, this will result in Either String [Int]. Then it prepends the integer that resulted from conversion of the first word to the list in the Right result, if there is one. If, however, helper returned a Left value, the fmap won't change anything, and so this will be the overall result.
So the second line with the (>>=) expands approximately to the following code:
case readEither w of
  Left err -> Left err
  Right int -> case helper ws of
    Left err -> Left err
    Right ints -> Right (int:ints)
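If the nested case reads more naturally to you than (>>=), do-notation gives the same short-circuiting behaviour with less noise. This is just a sketch of the same helper written that way, wrapped in extractInt so it can be loaded on its own:

```haskell
import Text.Read (readEither)

extractInt :: String -> Either String [Int]
extractInt = helper . words
  where
    helper []       = Right []
    helper (w : ws) = do
      i    <- readEither w   -- stops here with Left if w is not a number
      rest <- helper ws      -- stops here with Left if any later word fails
      return (i : rest)
```

Each `<-` line corresponds to one level of the case expression above: a Left on either line becomes the result of the whole block.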
You could use the mapM function to monadically map over the words:
extractInt :: String -> Either String [Int]
extractInt s = mapM readEither (words s)
If any one call to readEither happens to return Left, then the function will do so too. Is that what you are looking for?

Optimising Haskell data reading from file

I am trying to implement Kosaraju's graph algorithm, on a 3.5m line file where each row is two (space separated) Ints representing a graph edge. To start I need to create a summary data structure that has the node and lists of its incoming and outgoing edges. The code below achieves that, but takes over a minute, whereas I can see from posts on the MOOC forum that people using other languages are completing in <<10s. (getLines is taking 10s compared to under 1s in benchmarks I read about.)
I'm new to Haskell and have implemented an accumulation method using foldl' (the ' was a breakthrough in making it terminate at all), but it feels rather imperative in style, and I'm hoping that that's the reason why it is running slow. Moreover, I'm currently planning to use a similar pattern to conduct the depth-first-search, and I fear it will all just become too slow.
I have found this presentation and blog that talk about these sort of issues but at too expert a level.
import System.IO
import Control.Monad
import Data.Map.Strict as Map
import Data.List as L
type NodeName = Int
type Edges = [NodeName]
type Explored = Bool
data Node = Node Explored (Edges, Edges) deriving (Show)
type Graph1 = Map NodeName Node
getLines :: FilePath -> IO [[Int]]
getLines = liftM (fmap (fmap read . words) . lines) . readFile
getLines' :: FilePath -> IO [(Int,Int)]
getLines' = liftM (fmap (tuplify2 . fmap read . words) . lines) . readFile
tuplify2 :: [a] -> (a,a)
tuplify2 [x,y] = (x,y)
main = do
  list <- getLines "testdata.txt" -- [String]
  --list <- getLines "SCC.txt" -- [String]
  let
    list' = createGraph list
  return list'
createGraph :: [[Int]] -> Graph1
createGraph xs = L.foldl' build Map.empty xs
  where
    build :: Graph1-> [Int] -> Graph1
    build = \acc (x:y:_) ->
      let tmpAcc = case Map.lookup x acc of
            Nothing -> Map.insert x (Node False ([y],[])) acc
            Just a -> Map.adjust (\(Node _ (fwd, bck)) -> (Node False ((y:fwd), bck))) x acc
      in case Map.lookup y tmpAcc of
            Nothing -> Map.insert y (Node False ([],[x])) tmpAcc
            Just a -> Map.adjust (\(Node _ (fwd, bck)) -> (Node False (fwd, (x:bck)))) y tmpAcc
Using maps:
Use IntMap or HashMap when possible. Both are significantly faster for Int keys than Map. HashMap is usually faster than IntMap but uses more RAM and has a less rich library.
Don't do unnecessary lookups. The containers package has a large number of specialized functions. With alter the number of lookups can be halved compared to the createGraph implementation in the question.
Example for createGraph:
import Data.List (foldl')
import qualified Data.IntMap.Strict as IM
type NodeName = Int
type Edges = [NodeName]
type Explored = Bool
data Node = Node Explored Edges Edges deriving (Eq, Show)
type Graph1 = IM.IntMap Node
createGraph :: [(Int, Int)] -> Graph1
createGraph xs = foldl' build IM.empty xs
  where
    addFwd y (Just (Node _ f b)) = Just (Node False (y:f) b)
    addFwd y _ = Just (Node False [y] [])
    addBwd x (Just (Node _ f b)) = Just (Node False f (x:b))
    addBwd x _ = Just (Node False [] [x])
    build :: Graph1 -> (Int, Int) -> Graph1
    build acc (x, y) = IM.alter (addBwd x) y $ IM.alter (addFwd y) x acc
Using vectors:
Consider the efficient construction functions (the accumulators, unfolds, generate, iterate, constructN, etc.). These may use mutation behind the scenes but are considerably more convenient to use than actual mutable vectors.
In the more general case, use the laziness of boxed vectors to enable self-reference when constructing a vector.
Use unboxed vectors when possible.
Use unsafe functions when you're absolutely sure about the bounds.
Only use mutable vectors when there aren't pure alternatives. In that case, prefer the ST monad to IO. Also, avoid creating many mutable heap objects (i.e. prefer mutable vectors to immutable vectors of mutable references).
Example for createGraph:
import qualified Data.Vector as V
type NodeName = Int
type Edges = [NodeName]
type Explored = Bool
data Node = Node Explored Edges Edges deriving (Eq, Show)
type Graph1 = V.Vector Node
createGraph :: Int -> [(Int, Int)] -> Graph1
createGraph maxIndex edges = graph'' where
  -- maxIndex + 1 slots, so that every index in 0..maxIndex is in bounds
  graph = V.replicate (maxIndex + 1) (Node False [] [])
  graph' = V.accum (\(Node e f b) x -> Node e (x:f) b) graph edges
  graph'' = V.accum (\(Node e f b) x -> Node e f (x:b)) graph' (map (\(a, b) -> (b, a)) edges)
Note that if there are gaps in the range of the node indices, then it'd be wise to either
Contiguously relabel the indices before doing anything else.
Introduce an empty constructor to Node to signify a missing index.
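The first option (relabelling) could look something like the sketch below. It is only one way to do it: the relabel function is a made-up name, and it uses IntSet and IntMap from containers to assign each distinct node id its rank in ascending order. The returned table lets you map results back to the original ids afterwards.

```haskell
import qualified Data.IntMap.Strict as IM
import qualified Data.IntSet as IS

-- Replace every node id by its rank among the ids that actually occur,
-- so the new ids are exactly 0 .. n-1 with no gaps.
-- Also returns the old-id -> new-id table.
relabel :: [(Int, Int)] -> ([(Int, Int)], IM.IntMap Int)
relabel edges = (map (\(a, b) -> (new a, new b)) edges, table)
  where
    ids   = IS.toAscList (IS.fromList (concatMap (\(a, b) -> [a, b]) edges))
    table = IM.fromList (zip ids [0 ..])
    new i = table IM.! i   -- safe: every id in edges is a key of table
```

After relabelling, the largest index is IM.size table - 1, which is the bound a vector-based graph would need.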
Faster I/O:
Use the IO functions from Data.Text or Data.ByteString. In both cases there are also efficient functions for breaking input into lines or words.
Example:
import qualified Data.ByteString.Char8 as BS
import System.IO
getLines :: FilePath -> IO [(Int, Int)]
getLines path = do
  lines <- (map BS.words . BS.lines) `fmap` BS.readFile path
  let pairs = (map . map) (maybe (error "can't read Int") fst . BS.readInt) lines
  return [(a, b) | [a, b] <- pairs]
Benchmarking:
Always do it, unlike me in this answer. Use criterion.
Based pretty much on András' suggestions, I've reduced a 113-second task down to 24 seconds (measured by stopwatch, as I can't quite get Criterion to do anything yet), and then down to 10 by compiling with -O2! I've attended some courses this last year that talked about the challenge of optimising for large datasets, but this was the first time I faced a question that actually involved one, and it was as non-trivial as my instructors suggested. This is what I have now:
import System.IO
import Control.Monad
import Criterion.Main -- for defaultMain, bgroup, bench, whnfIO
import Data.List (foldl')
import qualified Data.IntMap.Strict as IM
import qualified Data.ByteString.Char8 as BS
type NodeName = Int
type Edges = [NodeName]
type Explored = Bool
data Node = Node Explored Edges Edges deriving (Eq, Show)
type Graph1 = IM.IntMap Node
-- DFS uses a stack to store next points to explore, a list can do this
type Stack = [(NodeName, NodeName)]
getBytes :: FilePath -> IO [(Int, Int)]
getBytes path = do
  lines <- (map BS.words . BS.lines) `fmap` BS.readFile path
  let
    pairs = (map . map) (maybe (error "Can't read integers") fst . BS.readInt) lines
  return [(a,b) | [a,b] <- pairs]
main = do
  --list <- getLines' "testdata.txt" -- [(Int, Int)]
  list <- getBytes "SCC.txt" -- [(Int, Int)]
  let list' = createGraph' list
  putStrLn $ show $ list' IM.! 66
  -- return list'
-- Criterion harness; whnfIO is used because bmark' is an IO action
bmark = defaultMain [
  bgroup "1" [
    bench "Sim test" $ whnfIO (bmark' "SCC.txt")
    ]
  ]
bmark' :: FilePath -> IO ()
bmark' path = do
  list <- getBytes path
  let
    list' = createGraph' list
  putStrLn $ show $ list' IM.! 2
createGraph' :: [(Int, Int)] -> Graph1
createGraph' xs = foldl' build IM.empty xs
  where
    addFwd y (Just (Node _ f b)) = Just (Node False (y:f) b)
    addFwd y _ = Just (Node False [y] [])
    addBwd x (Just (Node _ f b)) = Just (Node False f (x:b))
    addBwd x _ = Just (Node False [] [x])
    build :: Graph1 -> (Int, Int) -> Graph1
    build acc (x, y) = IM.alter (addBwd x) y $ IM.alter (addFwd y) x acc
And now on with the rest of the exercise....
This is not really an answer; I would rather comment on András Kovács's post, if I had those 50 points...
I have implemented the loading of the graph with both IntMap and MVector, in an attempt to benchmark mutability vs. immutability.
Both programs use Attoparsec for the parsing. There are surely more economical ways to do it, but Attoparsec is relatively fast given its high level of abstraction (the parser fits on one line). The guideline is to avoid String and read: read is partial and slow, and [Char] is slow and not memory efficient unless properly fused.
As András Kovács noted, IntMap is better than Map for Int keys. My code provides another example of alter usage. If the node identifier mapping is dense, you may also want to use Vector or Array. They allow O(1) indexing by the identifier.
The mutable version grows the MVector exponentially on demand. This avoids having to specify an upper bound on node identifiers, but introduces more complexity (the reference to the vector may change).
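To illustrate the grow-on-demand idea without reproducing the Gist, here is a rough sketch. It is not the code from the Gist: it swaps MVector for STUArray from the array package, keeps the current array in an STRef (which is where "the reference may change" shows up), and only counts out-degrees rather than building a full graph. The names Counts, ensureIndex and outDegrees are made up, and node ids are assumed to be non-negative.

```haskell
import Control.Monad (forM_)
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, getBounds, getElems, newArray, readArray, writeArray)
import Data.STRef (STRef, newSTRef, readSTRef, writeSTRef)

type Counts s = STUArray s Int Int

-- Make sure index i exists. If it does not, allocate an array of at least
-- double the capacity, copy the old contents, and swap the reference.
ensureIndex :: STRef s (Counts s) -> Int -> ST s (Counts s)
ensureIndex ref i = do
  arr <- readSTRef ref
  (_, hi) <- getBounds arr
  if i <= hi
    then return arr
    else do
      let newHi = max i (2 * hi + 1)
      bigger <- newArray (0, newHi) 0
      forM_ [0 .. hi] $ \j -> readArray arr j >>= writeArray bigger j
      writeSTRef ref bigger
      return bigger

-- Out-degree of every node id seen so far; trailing slots are unused capacity.
outDegrees :: [(Int, Int)] -> [Int]
outDegrees edges = runST $ do
  start <- newArray (0, 0) 0
  ref <- newSTRef start
  forM_ edges $ \(from, _) -> do
    arr <- ensureIndex ref from
    n <- readArray arr from
    writeArray arr from (n + 1)
  readSTRef ref >>= getElems
```

Because the capacity at least doubles each time, the total copying work stays linear in the final size, which is what makes this competitive with knowing the upper bound in advance.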
I benchmarked with a file of 5M edges with identifiers in the range [0..2^16]. The MVector version is ~2x faster than the IntMap code (12s vs 25s on my computer).
The code is here [Gist].
I will edit when more profiling is done on my side.
