Split string to substring - haskell

I want to create a function that splits a string into a list of substrings, each of length k:
*Main> split_string_to_kmers "some_text" 3
["som","ome","me_","e_t","_te","tex","ext"]
Here is my solution:
split_string_to_kmers s k = split_string_to_kmers_helper s k []
  where
    split_string_to_kmers_helper [] k acc = acc
    split_string_to_kmers_helper s k acc
      | length s >= k = split_string_to_kmers_helper (tail s) k (acc ++ [(take k s)])
      | otherwise = acc
I am just wondering if there is a way to rewrite my code so that it is more idiomatic Haskell.

I guess this is different enough.
import Data.List (tails)
mySplit :: String -> Int -> [String]
mySplit str k = filter (\s -> length s == k) $ map (take k) (tails str)
You could make this more efficient by combining the filter and the map. But that is up to you.
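One way to avoid the separate filter pass is sketched below. It is only an illustration: the name kmers and the argument order are assumptions, not part of the answer above. Since only the last few windows can be shorter than k, takeWhile can stop at the first short one instead of checking every suffix.

```haskell
import Data.List (tails)

-- Hypothetical name: all length-k windows of a string, left to right.
-- Only the trailing windows can be short, so takeWhile stops at the
-- first one instead of filtering every suffix.
kmers :: Int -> String -> [String]
kmers k
  | k <= 0    = const []   -- assumed convention: no windows for k <= 0
  | otherwise = takeWhile ((== k) . length) . map (take k) . tails
```

With this definition, kmers 3 "some_text" gives the same seven substrings as in the question.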

A simple solution is the following (the tail of the list is not yet the same):
import Data.List.Split(chop)
splitRepN n = chop (\xs -> (take n xs,tail xs))
This gives the following results:
> splitRepN 3 "some_text"
["som","ome","me_","e_t","_te","tex","ext","xt","t"]
For the full solution, we cut off the short tails:
splitRepN' n = takeWhile ((== n). length) . splitRepN n
> splitRepN' 3 "some_text"
["som","ome","me_","e_t","_te","tex","ext"]

Related

rotate a string in haskell with some exceptions

I want to rotate a string in Haskell, so that giving "Now I want to scream" to rotate produces [[want to scream now I],[scream now I want to]]. If a rotation starts with "I" or "to", it must be eliminated. So far I still have problems with the rotation.
reverseWords :: String -> String
reverseWords = unwords . reverse . words
shiftt :: [a] -> Int -> [a]
shiftt l n = drop n l ++ take n l
rot::String->[String]
rot l = [ reverseWords l i | i <- [0 .. (length l) -1]]
Create a list of all rotations, then filter based on your predicate. For example,
rotations x = take (length x) $ iterate rot1 x
  where rot1 xs = drop 1 xs ++ take 1 xs
filteredRots = map unwords . filter (\x -> length (head x) > 2) . rotations . words
and use as
> filteredRots "Now I want to scream"
["Now I want to scream","want to scream Now I","scream Now I want to"]
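A variant that filters on the exact words named in the question, rather than on word length, might look like the following sketch. The names rotations and filteredRotations and the excluded-word list are assumptions made for illustration.

```haskell
import Data.List (inits, tails)

-- All rotations of a list: pair each suffix with the prefix it left behind.
-- The last pairing reproduces the original list, so init drops it.
rotations :: [a] -> [[a]]
rotations xs = init (zipWith (++) (tails xs) (inits xs))

-- Assumed rule from the question: drop rotations whose first word
-- is "I" or "to".
filteredRotations :: String -> [String]
filteredRotations = map unwords . filter keep . rotations . words
  where
    keep []      = False
    keep (w : _) = w `notElem` ["I", "to"]
```

For the example sentence this keeps the same three rotations as the length-based filter.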

Generate next lexicographical string in Haskell

If I was given a string like skhfbvqa, how would I generate the next string? For this example, it would be skhfbvqb, and the next string of that would be skhfbvqc, and so on. The given string (and the answer) will always be N characters long (in this case, N=8).
What I tried:
I tried to generate the entire (infinite) list of possible combinations, and get the required (next) string of the given string, but unsurprisingly, it's so slow, that I don't even get the answer for N=6.
I used list comprehension:
allStrings = [ c : s | s <- "" : allStrings, c <- ['a'..'z'] ]
main = do
  input <- readFile "k.in"
  putStrLn . head . tail . dropWhile (not . (==) input) . map reverse $ allStrings
(Please excuse my incredibly bad Haskell-ing :) Still a noob)
So my question is, how can I do this? If there are multiple methods, a comparison between them is much appreciated. Thanks!
Here's a version with base conversion (this way you could add and subtract arbitrarily if you like):
encode x base = encode' x [] where
  encode' x' z | x' == 0 = z
               | otherwise = encode' (div x' base) ((mod x' base):z)
decode num base =
  fst $ foldr (\a (b,i) -> (b + a * base^i,i + 1)) (0,0) num
Output:
*Main> map (\x -> toEnum (x + 97)::Char) $ encode (decode (map (\x -> fromEnum x - 97) "skhfbvqa") 26 + 1) 26
"skhfbvqb"
I would go and make a helper function f :: Integer -> String and one g :: String -> Integer, where f 1 = "a", ... f 27 = "aa", f 28 = "ab" and so on and the inverse g.
Then incrementString = f . succ . g
Note: I omitted the implementation of f on purpose for learning
Update
For a different approach, you could define an increment-with-carry function inc' :: Char -> (Char, Bool), and then
incString :: String -> String
incString = reverse . incString' . reverse
  where incString' [] = []
        incString' (x:xs) = case inc' x of (x',True) -> x': incString' xs
                                           (x',False) -> x':xs
Note: this function is not tail recursive!
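The answer leaves inc' undefined. A minimal sketch of one possible definition follows, assuming lowercase input where 'z' wraps to 'a' and signals a carry. The helper name go is hypothetical, and the string is reversed before and after so that the carry starts at the last character.

```haskell
-- Assumed behaviour: 'z' wraps to 'a' and reports a carry;
-- any other letter simply advances with no carry.
inc' :: Char -> (Char, Bool)
inc' 'z' = ('a', True)
inc' c   = (succ c, False)

incString :: String -> String
incString = reverse . go . reverse
  where
    -- Walk from the least significant character, propagating the carry.
    go []       = []
    go (x : xs) = case inc' x of
      (x', True)  -> x' : go xs
      (x', False) -> x' : xs
```

In this sketch a string consisting only of 'z' wraps around to all 'a', which keeps the length fixed.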
I found this to work. It just uses pattern matching to see if the string begins with a z and adds an additional a accordingly.
incrementString' :: String -> String
incrementString' [] = ['a']
incrementString' ('z':xs) = 'a' : incrementString' xs
incrementString' (x:xs) = succ x : xs
incrementString :: String -> String
incrementString = reverse . incrementString' . reverse
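The question says the result always has N characters, whereas a plain base-26 conversion drops leading 'a' characters (they act as leading zeros). The following is a sketch of a fixed-width variant; the names toNumber, fromNumber and nextString are hypothetical, and lowercase input is assumed.

```haskell
import Data.Char (chr, ord)

-- Hypothetical helper: read a lowercase string as a base-26 number.
toNumber :: String -> Integer
toNumber = foldl (\acc c -> acc * 26 + toInteger (ord c - ord 'a')) 0

-- Hypothetical helper: render n as exactly `width` base-26 digits.
-- Higher digits are discarded, so the value wraps around on overflow.
fromNumber :: Int -> Integer -> String
fromNumber width n = reverse (take width (digits n))
  where
    -- Infinite list of digits, least significant first.
    digits m = chr (ord 'a' + fromInteger (m `mod` 26)) : digits (m `div` 26)

nextString :: String -> String
nextString s = fromNumber (length s) (toNumber s + 1)
```

Because the conversion goes through Integer, adding or subtracting an arbitrary offset works the same way as adding 1.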

Implementing Backtracking on Haskell

I have a problem implementing backtracking in Haskell. I know how to write recursive functions, but I run into trouble when I try to get multiple solutions or the best one (backtracking).
There is a list of strings, and I need to find a way to get from one string to another by changing one letter at a time. I am given the list, the first string, and the last one. If there is a solution, return the number of steps it took; if there is no solution, return -1. Here's an example:
wordF ["spice","stick","smice","stock","slice","slick","stock"] "spice" "stock"
Here I have my list, and I need to start with "spice" and get to "stock".
The best solution is ["spice","slice","slick","stick","stock"], with four steps to get from "spice" to "stock", so it returns 4.
Another solution is ["spice","smice","slice","slick","stick","stock"], with five steps to get to "stock", so it would return 5. But this is a wrong answer, because there is a better solution with fewer steps.
I'm having trouble writing the backtracking to get the best solution, because I don't know how to make my code search for other solutions and not just one.
Here's the code I tried to write, but I get some errors. By the way, I don't know if my way of doing backtracking is good or if there are mistakes that I'm not seeing.
wordF :: [String] -> String -> String -> (String, String, Int)
wordF [] a b = (a, b, -1)
wordF list a b
  | (notElem a list || notElem b list) = (a, b, -1)
  | otherwise = (a, b, (wordF2 list a b [a] 0 (length list)))
wordF2 :: [String] -> String -> String -> [String] -> Int -> Int -> Int
wordF2 list a b list_aux cont maxi
  | (cont==maxi) = 1000
  | (a==b) = length list_aux
  | (a/=b) && (cont<maxi) && notElemFound && (checkin /= "ThisWRONG") && (wording1<=wording2) = wording1
  | (a/=b) && (cont<maxi) && notElemFound && (checkin /= "ThisWRONG") && (wording1>wording2) = wording2
  | (a/=b) && (checkin == "ThisWRONG") = wordF2 list a b list_aux (cont+1) maxi
  where
    checkin = (check_word2 a (list!!cont) (list!!cont) 0)
    wording1 = (wordF2 list checkin b (list_aux++[checkin]) 0 maxi)
    wording2 = (wordF2 list checkin b (list_aux++[checkin]) 1 maxi)
    notElemFound = ((any (==(list!!cont)) list_aux) == False)
check_word2 :: String -> String -> String -> Int -> String
check_word2 word1 word2 word3 dif
  | (dif > 1) = "ThisWRONG"
  | ((length word1 == 1) && (length word2 == 1) && (head word1 == head word2)) = word3
  | ((length word1 == 1) && (length word2 == 1) && (head word1 /= head word2) && (dif<1)) = word3
  | ((head word1) == (head word2)) = check_word2 (tail word1) (tail word2) word3 dif
  | otherwise = check_word2 (tail word1) (tail word2) word3 (dif+1)
My first function, wordF2, takes the list, the start, the end, an auxiliary list holding the current solution (its first element is always the start, [a]), a counter starting at 0, and the maximum value of the counter (length list).
The second function, check_word2, checks whether one word can change into another, like "spice" to "slice"; if it can't, like "spice" to "spoca", it returns "ThisWRONG".
This solution fails with a pattern match failure:
Program error: pattern match failure: wordF2 ["slice","slick"] "slice" "slick" ["slice"] 0 1
I tried small cases without success, even though I am guarding against indexing a wrong position of the list with the counter and the maximum.
Or maybe I don't know how to implement backtracking in Haskell to get multiple solutions, the best solution, etc.
UPDATE: I wrote a solution, but it's not backtracking:
wordF :: [String] -> String -> String -> (String, String, Int)
wordF [] a b = (a, b, -1)
wordF list a b
  | (notElem a list || notElem b list) = (a, b, -1)
  | otherwise = (a, b, (wordF1 list a b))
wordF1 :: [String] -> String -> String -> Int
wordF1 list a b
  | ((map length (wordF2 (subconjuntos2 (subconjuntos list) a b))) == []) = -1
  | (calculo > 0) = calculo
  | otherwise = -1
  where
    calculo = (minimum (map length (wordF2 (subconjuntos2 (subconjuntos list) a b))))-1
wordF2 :: [[String]] -> [[String]]
wordF2 [[]] = []
wordF2 (x:xs)
  | ((length xs == 1) && ((check_word x) == True) && ((check_word (head xs)) == True)) = x:xs
  | ((length xs == 1) && ((check_word x) == False) && ((check_word (head xs)) == True)) = xs
  | ((length xs == 1) && ((check_word x) == True) && ((check_word (head xs)) == False)) = [x]
  | ((length xs == 1) && ((check_word x) == False) && ((check_word (head xs)) == False)) = []
  | ((check_word x) == True) = x:wordF2 xs
  | ((check_word x) == False ) = wordF2 xs
check_word :: [String] -> Bool
check_word [] = False
check_word (x:xs)
  | ((length xs == 1) && ((check_word2 x (head xs) 0) == True)) = True
  | ((length xs >1) && ((check_word2 x (head xs) 0) == True)) = True && (check_word xs)
  | otherwise = False
check_word2 :: String -> String -> Int -> Bool
check_word2 word1 word2 dif
  | (dif > 1) = False
  | ((length word1 == 1) && (length word2 == 1) && (head word1 == head word2)) = True
  | ((length word1 == 1) && (length word2 == 1) && (head word1 /= head word2) && (dif<1)) = True
  | ((head word1) == (head word2)) = check_word2 (tail word1) (tail word2) dif
  | otherwise = check_word2 (tail word1) (tail word2) (dif+1)
subconjuntos2 :: [[String]] -> String -> String -> [[String]]
subconjuntos2 [] a b = []
subconjuntos2 (x:xs) a b
  | (length x <= 1) = subconjuntos2 xs a b
  | ((head x == a) && (last x == b)) = (x:subconjuntos2 xs a b)
  | ((head x /= a) || (last x /= b)) = (subconjuntos2 xs a b)
subconjuntos :: [a] -> [[a]]
subconjuntos [] = [[]]
subconjuntos (x:xs) = [x:ys | ys <- sub] ++ sub
  where sub = subconjuntos xs
Hmm, maybe it's inefficient, but at least it produces the solution.
I generate all possible candidates, keep those whose first word is the start and whose last word is the end, then filter the ones that are valid solutions and print the shortest one.
Thanks, and if you have any suggestions, please share them :)
Not thoroughly tested, but this hopefully will help:
import Prelude hiding (Word)
import Data.Function (on)
import Data.List (minimumBy, delete)
import Control.Monad (guard)
-- Prelude already exports a type named Word, hence the `hiding` import above.
type Word = String
type Path = [String]
wordF :: [Word] -> Word -> Word -> Path
wordF words start end =
  start : minimumBy (compare `on` length) (generatePaths words start end)
-- Use the list monad to do the nondeterminism and backtracking.
-- Returns a list of all paths that lead from `start` to `end`
-- in steps that satisfy `differsByOne`.
generatePaths :: [Word] -> Word -> Word -> [Path]
generatePaths words start end = do
  -- Choose one of the words, nondeterministically
  word <- words
  -- If the word doesn't satisfy `differsByOne` with `start`, reject the
  -- choice and backtrack.
  guard $ differsByOne word start
  if word == end
    then return [word]
    else do
      next <- generatePaths (delete word words) word end
      return $ word : next
differsByOne :: Word -> Word -> Bool
differsByOne "" "" = False
differsByOne (a:as) (b:bs)
  | a == b = differsByOne as bs
  | otherwise = as == bs
-- Words of different lengths never differ by exactly one letter.
differsByOne _ _ = False
Example run:
>>> wordF ["spice","stick","smice","stock","slice","slick","stock"] "spice" "stock"
["spice","slice","slick","stick","stock"]
The list monad in Haskell is commonly described as a form of nondeterministic, backtracking computation. What the code above is doing is allowing the list monad to take on the responsibility of generating alternatives, testing whether they satisfy criteria, and backtracking on failure to the most recent choice point. The bind of the list monad, e.g. word <- words, means "nondeterministically pick one of the words". guard means "if the choices so far don't satisfy this condition, backtrack and make a different choice". The result of a list monad computation is the list of all the results that stem from choices that did not violate any guards.
If this looks like list comprehensions, well, list comprehensions are the same thing as the list monad—I chose to express it with the monad instead of comprehensions.
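To make the correspondence concrete, here is a small sketch that is unrelated to the word puzzle: the same search written once with do-notation and guard, and once as a comprehension. The function names and the toy problem (pairs that sum to a target) are made up for illustration.

```haskell
import Control.Monad (guard)

-- Hypothetical example: pairs (x, y) with 1 <= x <= y <= n and x + y == target.
pairsDo :: Int -> Int -> [(Int, Int)]
pairsDo n target = do
  x <- [1 .. n]              -- choice point
  y <- [x .. n]              -- second choice point
  guard (x + y == target)    -- reject this branch and backtrack on failure
  return (x, y)

-- The same search as a list comprehension.
pairsComp :: Int -> Int -> [(Int, Int)]
pairsComp n target = [(x, y) | x <- [1 .. n], y <- [x .. n], x + y == target]
```

Both definitions enumerate the candidates in the same order and return the same list.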
There have been several articles published recently on classic brute-force search problems.
Mark Dominus published a simple example of using lists for a simple exhaustive search.
Justin Le followed up with a small modification to the previous article that simplified tracking the current state of the search.
I followed up with a further modification that allowed measuring the gains from early rejection of part of the search tree.
Note that the code in my article is quite slow because it's measuring the amount of work done as well as doing it. My article has good examples for how to quickly reject parts of the search tree, but it should be considered only an illustration - not production code.
A brute force approach using recursion:
import Data.List (filter, (\\), reverse, delete, sortBy)
import Data.Ord (comparing)
neighbour :: String -> String -> Bool
neighbour word = (1 ==) . length . (\\ word)
process :: String -> String -> [String] -> [(Int, [String])]
process start end dict =
  let
    loop :: String -> String -> [String] -> [String] -> [(Int,[String])] -> [(Int,[String])]
    loop start end dict path results =
      case next of
        [] -> results
        xs ->
          if elem end xs
            then (length solution, solution) : results
            else results ++ branches xs
      where
        next = filter (neighbour start) dict'
        dict' = delete start dict
        path' = start : path
        branches xs = [a | x <- xs, a <- loop x end dict' path' results]
        solution = reverse (end : path')
  in
    loop start end dict [] []
shortestSolution :: Maybe Int
shortestSolution = shortest solutions
  where
    solutions = process start end dict
    shortest s =
      case s of
        [] -> Nothing
        xs -> Just $ fst $ head $ sortBy (comparing fst) xs
start = "spice"
end = "stock"
dict = ["spice","stick","smice","slice","slick","stock"]
Notes:
This code computes all possible solutions (process) and selects the shortest one (shortestSolution). As Carl said, you might want to prune parts of the search tree for better performance.
Using a Maybe instead of returning -1 when a function can fail to return results is preferred.
Another way using a tree with breadth-first search:
import Data.Tree
import Data.List( filter, (\\), delete )
import Data.Maybe
node :: String -> [String] -> Tree String
node label dict = Node{ rootLabel = label, subForest = branches label (delete label dict) }
branches :: String -> [String] -> [Tree String]
branches start dict = map (flip node dict) (filter (neighbour start) dict)
neighbour :: String -> String -> Bool
neighbour word = (1 ==) . length . (\\ word)
-- breadth first traversal
shortestBF tree end = find [tree] end 0
  where
    find ts end depth
      | null ts = Nothing
      | elem end (map rootLabel ts) = Just depth
      | otherwise = find (concat (map subForest ts)) end (depth+1)
result = shortestBF tree end
tree :: Tree String
tree = node start dict
start = "spice"
end = "stock"
dict = ["spice","stick","smice","slice","slick","stock"]
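As a possible refinement, the breadth-first idea can be written without building a tree by expanding one frontier at a time and removing visited words, so no word is explored twice. This is a sketch with hypothetical names (oneLetterApart, shortestSteps). It also compares letters position by position, whereas the `(\\)`-based neighbour test ignores positions.

```haskell
import Data.List (partition)

-- True when the words have equal length and differ in exactly one position.
oneLetterApart :: String -> String -> Bool
oneLetterApart a b =
  length a == length b && length (filter id (zipWith (/=) a b)) == 1

-- Number of steps on a shortest path, or Nothing if `end` is unreachable.
shortestSteps :: [String] -> String -> String -> Maybe Int
shortestSteps dict start end = go [start] (filter (/= start) dict) (0 :: Int)
  where
    go frontier unvisited depth
      | null frontier       = Nothing
      | end `elem` frontier = Just depth
      | otherwise           =
          -- Move every unvisited word adjacent to the frontier into the
          -- next level; the rest stays unvisited.
          let (next, rest) =
                partition (\w -> any (oneLetterApart w) frontier) unvisited
          in go next rest (depth + 1)
```

On the word list from the question, shortestSteps returns Just 4 for "spice" to "stock".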

Haskell - how to generate permutations

How can I create a function which lazily makes permutations for the chars '_' and '*' like this:
For example:
Main> function 3
["___","*__","_*_","__*","**_","_**","*_*","***"]
The first element is made only of _, the next 3 are the permutations of *__, the following 3 are the permutations of **_, and the last element contains only *.
How can I do that?
Here's another "correct order" version:
function :: Int -> [String]
function c = concatMap helper $ zip (reverse [0..c]) [0..c]
helper :: (Int, Int) -> [String]
helper (c, 0) = [replicate c '_']
helper (0, c) = [replicate c '*']
helper (cUnderscores, cAsterisks) = map ('_' :) (helper (cUnderscores - 1, cAsterisks))
  ++ map ('*' :) (helper (cUnderscores, cAsterisks - 1))
You might want to look at replicateM.
let k = ["_", "*"]
let p = [ a ++ b ++ c | a <- k, b <- k, c <- k ]
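A sketch of what the replicateM suggestion could look like for arbitrary length follows; the names allPatterns and groupedPatterns are hypothetical. Note that replicateM alone yields the strings in a different order from the one in the question. The grouped variant uses a stable sort on the number of asterisks, which restores the grouping but has to consume the whole list first, so it is not lazy.

```haskell
import Control.Monad (replicateM)
import Data.List (sortOn)

-- Every string of length n over '_' and '*', in replicateM's order.
allPatterns :: Int -> [String]
allPatterns n = replicateM n "_*"

-- The same strings grouped by how many '*' they contain.
-- sortOn is stable, so the order within each group is preserved.
groupedPatterns :: Int -> [String]
groupedPatterns = sortOn (length . filter (== '*')) . allPatterns
```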
The “correct order” version:
import Data.List
function k = concatMap (nub . permutations . pat) [0..k]
  where pat x = replicate x '*' ++ replicate (k-x) '_'
I don’t know how to step from one permutation to another in constant time, though.

Retrieve strings from Matrix

I'm stuck with my homework task. Could somebody help, please?
Here is the task:
Find all possible partitions of a string into words of some dictionary.
And here is how I'm trying to do it:
I use dynamic programming to fill a matrix, and then I'm stuck on how to retrieve the data from it.
-- Task5_2
retrieve :: [[Int]] -> [String] -> Int -> Int -> Int -> [[String]]
retrieve matrix dict i j size
  | i >= size || j >= size = []
  | index /= 0 = [(dict !! index)]:(retrieve matrix dict (i + sizeOfWord) (i + sizeOfWord) size) ++ retrieve matrix dict i (next matrix i j) size
  where index = (matrix !! i !! j) - 1; sizeOfWord = length (dict !! index)
next matrix i j
  | j >= (length matrix) = j
  | matrix !! i !! j > 0 = j
  | otherwise = next matrix i (j + 1)
getPartitionMatrix :: String -> [String] -> [[Int]]
getPartitionMatrix text dict = [[ indiceOfWord (getWord text i j) dict 1 | j <- [1..(length text)]] | i <- [1..(length text)]]
--------------------------
getWord :: String -> Int -> Int -> String
getWord text from to = map fst $ filter (\a -> (snd a) >= from && (snd a) <= to) $ zip text [1..]
indiceOfWord :: String -> [String] -> Int -> Int
indiceOfWord _ [] _ = 0
indiceOfWord word (x:xs) n
  | word == x = n
  | otherwise = indiceOfWord word xs (n + 1)
-- TESTS
dictionary = ["la", "a", "laa", "l"]
string = "laa"
matr = getPartitionMatrix string dictionary
test = retrieve matr dictionary 0 0 (length string)
Here is code that does what you ask for. It doesn't work exactly like your solution, but it should be as fast if (and only if) both our dictionary lookups were improved to use tries, as would be reasonable. As it is, I think it may be a bit faster than your solution:
module Partitions (partitions) where
import Data.Array
import Data.List
data Branches a = Empty | B [([a],Branches a)] deriving (Show)
isEmpty Empty = True
isEmpty _ = False
flatten :: Branches a -> [ [ [a] ] ]
flatten Empty = []
flatten (B []) = [[]]
flatten (B ps) = concatMap (\(word, bs) -> ...) ps
type Dictionary a = [[a]]
partitions :: (Ord a) => Dictionary a -> [a] -> [ [ [a] ] ]
partitions dict xs = flatten (parts ! 0)
  where
    parts = listArray (0,length xs) $ zipWith (\i ys -> starting i ys) [0..] (tails xs)
    starting _ [] = B []
    starting i ys
      | null words = ...
      | otherwise = ...
      where
        words = filter (`isPrefixOf` ys) $ dict
        go word = (word, parts ! (i + length word))
It works like this: at each position of the string, it searches the dictionary for all words starting there and evaluates to a Branches, which is either a dead end (Empty) or a list of pairs of a word and all possible continuations after it, discarding those words that can't be continued.
Dynamic programming enters the picture to record every possibility starting from a given index in a lazy array. Note that the knot is tied: we compute parts by using starting, which uses parts to look up which continuations are possible from a given index. This only works because we only look up indices after the one starting is computing, and starting doesn't use parts for the last index.
Retrieving the list of partitions from this Branches datatype is analogous to listing all paths in a tree.
EDIT : I removed some crucial parts of the solution in order to let the questioner search for himself. Though that shouldn't be too hard to complete with some thinking. I'll probably put them back with a somewhat cleaned up version later.
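The knot-tying idea can be seen in isolation on a simpler variant of the task: counting the partitions instead of listing them. The sketch below is not the answer's omitted code; countPartitions, table and cell are hypothetical names, and it leaves the actual retrieval of partitions to the reader, as the answer intends.

```haskell
import Data.Array (Array, listArray, (!))
import Data.List (isPrefixOf, tails)

-- Hypothetical helper: number of ways to split `xs` into dictionary words.
countPartitions :: [String] -> String -> Integer
countPartitions dict xs = table ! 0
  where
    n = length xs
    -- table ! i is the number of partitions of the suffix starting at i.
    -- Each cell refers only to cells at later indices, so the lazy array
    -- evaluates them on demand and each one at most once.
    table :: Array Int Integer
    table = listArray (0, n) (zipWith cell [0 ..] (tails xs))
    -- The empty suffix has exactly one partition: the empty one.
    cell _ [] = 1
    cell i ys =
      sum [ table ! (i + length w)
          | w <- dict, not (null w), w `isPrefixOf` ys ]
```

For the question's test data, countPartitions ["la", "a", "laa", "l"] "laa" evaluates to 3, matching the partitions la+a, laa and l+a+a.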

Resources