Splitting a String in Haskell

I want to split a String in Haskell.
My initial String would look something like
["Split a String in Haskell"]
and my expected output would be:
["Split","a","String","in","Haskell"].
From what I've seen, words and lines don't work here, because I have the type [String] instead of just String.
I've tried Data.List.Split, but no luck there either.

import Data.List
split = (>>= words)
main = print $ split ["Split a String in Haskell"]
map words turns ["Split a String in Haskell"] into [["Split","a","String","in","Haskell"]], and concat flattens a list of lists ([[x]]) into a single list ([x]). concat (map f xs) is equal to xs >>= f, and h xs = xs >>= f is equal to h = (>>= f).
A simpler alternative, which only looks at the first string in the list, would be
split = words . head
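To see the variants side by side, here is a minimal sketch. The names splitBind, splitConcatMap and splitHead are made up for this comparison and do not come from the answer.

```haskell
-- The version from the answer: bind in the list monad.
splitBind :: [String] -> [String]
splitBind = (>>= words)

-- The same thing spelled with concatMap (concat . map words).
splitConcatMap :: [String] -> [String]
splitConcatMap = concatMap words

-- Only splits the first string; head fails on an empty list.
splitHead :: [String] -> [String]
splitHead = words . head
```

The first two agree on every input, including lists with several strings. The third ignores everything after the first element, so it only fits if the list is known to hold exactly one string.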

Related

Capitalizing first letter of words while removing spaces (Haskell)

I'm just starting out in Haskell and this is like the third thing I'm writing, so, naturally, I'm finding myself a little stumped.
I'm trying to write a bit of code that will take a string, delete the spaces, and capitalize the first letter of each word.
For example, if I input "this is a test", I would like to get back something like: "thisIsATest"
import qualified Data.Char as Char
toCaps :: String -> String
toCaps [] = []
toCaps xs = filter(/=' ') xs
toCaps (_:xs) = map Char.toUpper xs
I think the method I'm using is wrong. With my code in this order, I am able to remove all the spaces using the filter function, but nothing becomes capitalized.
When I move the filter bit to the very end of the code, I am able to use the map Char.toUpper bit. But when I map Char.toUpper, it just capitalizes everything: "HISISATEST", for example.
I was trying to make use of an if function to say something similar to
if ' ' then map Char.toUpper xs else Char.toLower xs, but that didn't work out for me. I haven't utilized if in Haskell yet, and I don't think I'm doing it correctly. I also know using "xs" is wrong, but I'm not sure how to fix it.
Can anyone offer any pointers on this particular problem?

I think it might be better if you split the problem into smaller subproblems. First we can make a function that, for a given word, capitalizes the first character. For camel case, we can implement this as:
import Data.Char(toUpper)
capWord :: String -> String
capWord "" = ""
capWord (c:cs) = toUpper c : cs
We can then use words to obtain the list of words:
toCaps :: String -> String
toCaps = go . words
  where go [] = ""
        go (w:ws) = concat (w : map capWord ws)
For example:
Prelude Data.Char> toCaps "this is a test"
"thisIsATest"
For Pascal case, we can make use of concatMap instead:
toCaps :: String -> String
toCaps = concatMap capWord . words
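Since both snippets above are called toCaps, here is a sketch that puts the two variants in one file under separate names. toCamel and toPascal are names I made up; capWord is the helper from above.

```haskell
import Data.Char (toUpper)

-- Capitalize the first character of a word (from the answer above).
capWord :: String -> String
capWord "" = ""
capWord (c:cs) = toUpper c : cs

-- Camel case: the first word is left as is, the rest are capitalized.
toCamel :: String -> String
toCamel s = case words s of
  []     -> ""
  (w:ws) -> concat (w : map capWord ws)

-- Pascal case: every word is capitalized.
toPascal :: String -> String
toPascal = concatMap capWord . words
```

With "this is a test" as input, toCamel gives "thisIsATest" and toPascal gives "ThisIsATest".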

Inspired by this answer from Will Ness, here's a way to do it that avoids unnecessary Booleans and comparisons:
import qualified Data.Char as Char
toCaps :: String -> String
toCaps = flip (foldr go (const [])) id
  where go ' ' acc _ = acc Char.toUpper
        go x acc f = f x:acc id
Or more understandably, but perhaps slightly less efficient:
import qualified Data.Char as Char
toCaps :: String -> String
toCaps = go id
  where go _ [] = []
        go _ (' ':xs) = go Char.toUpper xs
        go f (x :xs) = f x:go id xs

There are a number of ways of doing it, but if I were trying to keep it as close to how you've set up your example, I might do something like:
import Data.Char (toUpper)
toCaps :: String -> String
toCaps [] = [] -- base case
toCaps (' ':c:cs) = toUpper c : toCaps cs -- throws out the space and capitalizes next letter
toCaps (c:cs) = c : toCaps cs -- anything else is left as is
This is just using basic recursion, dealing with a character (element of the list) at a time, but if you wanted to use higher-order functions such as map or filter that work on the entire list, then you would probably want to compose them (the way that Willem suggested is one way) and in that case you could probably do without using recursion at all.
It should be noted that this solution is brittle in the sense that it assumes the input string does not contain leading, trailing, or multiple consecutive spaces.
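To make that caveat concrete, here is a small sketch. toCapsRec is the recursive function above under a different (made-up) name, and toCapsNormalized is a hypothetical wrapper, not part of the answer, that normalizes the spacing with unwords . words before recursing.

```haskell
import Data.Char (toUpper)

-- The recursive version from above, renamed for the comparison.
toCapsRec :: String -> String
toCapsRec [] = []
toCapsRec (' ':c:cs) = toUpper c : toCapsRec cs
toCapsRec (c:cs) = c : toCapsRec cs

-- Hypothetical wrapper: unwords . words trims the string and collapses
-- runs of whitespace into single spaces before the recursion runs.
toCapsNormalized :: String -> String
toCapsNormalized = toCapsRec . unwords . words
```

For example, toCapsRec "a  b" (two spaces) gives "a b", because the second space is treated as the "next letter", while toCapsNormalized "a  b" gives "aB".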

Inspired by Joseph Sible's answer, a coroutine-style solution with two mutually recursive functions:
import Data.Char
toCamelCase :: String -> String
toCamelCase [] = []
toCamelCase (' ': xs) = toPascalCase xs
toCamelCase (x : xs) = x : toCamelCase xs
toPascalCase :: String -> String
toPascalCase [] = []
toPascalCase (' ': xs) = toPascalCase xs
toPascalCase (x : xs) = toUpper x : toCamelCase xs
Be careful not to start the input string with a space, or the first word will be capitalized as well.

How can I split a string with two conditions?

So basically I want to split my string on two conditions: when there is a space, or when a letter is different from the next one.
An example:
if I have the string "AAA ADDD DD", I want to split it into ["AAA","A","DDD","DD"]
So I wrote this code:
sliceIt :: String -> [String]
sliceIt xs = words xs
But it only splits the initial string where a space exists.
How can I also split when a character is next to a different one?
Can this problem be solved more easily with recursion?

So you want to split by words and then group equal elements in each word. The standard library already has functions for both:
import Data.List
sliceIt :: String -> [String]
sliceIt s = concatMap group $ words s
sliceItPointFree = concatMap group . words -- Point free notation. Same but cooler

split :: String -> [String]
split [] = []
split (' ':xs) = split xs
split (x:xs) = (takeWhile (== x) (x:xs)) : (split $ dropWhile (== x) (x:xs))
So this is a recursive definition where there are 2 cases:
If head is a space then ignore it.
Otherwise, take as many of the same characters as you can, then call the function on the remaining part of the string.
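As a sketch of how the two answers relate, the recursive version can also be written with span, which returns the run of equal characters and the rest of the string in a single pass. sliceRec is a name I made up; sliceIt is the words/group version from the other answer.

```haskell
import Data.List (group)

-- Library version: split into words, then group equal neighbours.
sliceIt :: String -> [String]
sliceIt = concatMap group . words

-- Recursive version using span instead of takeWhile plus dropWhile.
sliceRec :: String -> [String]
sliceRec [] = []
sliceRec (' ':xs) = sliceRec xs            -- skip spaces
sliceRec (x:xs) = (x : run) : sliceRec rest
  where (run, rest) = span (== x) xs       -- run of characters equal to x
```

Both give ["AAA","A","DDD","DD"] for "AAA ADDD DD". One difference to keep in mind: words splits on any whitespace (tabs, newlines), while the recursive version only treats ' ' as a separator.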

Haskell - Removing non-letter characters but ignoring white spaces?

I am very new to Haskell. I am trying to return a list of strings from a given string (which could contain non-letter characters) but I get a single string in the list.
The code below shows what I have tried so far:
toLowerStr xs = map toLower xs
--drop non-letters characters
dropNonLetters xs = words $ (filter (\x -> x `elem` ['a'..'z'])) $ toLowerStr xs
The steps are:
lowercase all the characters by using the toLower function
remove non-letter characters by using the filter function
return a list of strings by using the words function
I think the filter function is removing the white spaces and therefore it becomes a single string. I tried using isSpace function but I don't know exactly how to implement it in this case.
What is it that I am doing wrong? I get this output:
λ> dropNonLetters "ORANGE, apple! APPLE!!"
["orangeappleapple"]
But I want to achieve the below output:
λ> dropNonLetters "ORANGE, apple! APPLE!!"
["orange","apple","apple"]

I think the filter function is removing the white spaces and therefore it becomes a single string.
That is correct. As filter predicate you write \x -> x `elem` ['a'..'z']. ['a'..'z'] is a list that contains lowercase letters, so for whitespace, the predicate will fail, and thus you should allow spaces as well.
We can for instance add the space character to the list:
dropNonLetters xs = words $ filter (\x -> x `elem` (' ':['a'..'z'])) $ toLowerStr xs
But this is inelegant and does not really explain itself. The Data.Char module, however, ships with two functions that are interesting here: isLower :: Char -> Bool, and isSpace :: Char -> Bool. We can use them like this:
dropNonLetters xs = words $ (filter (\x -> isLower x || isSpace x)) $ toLowerStr xs
isLower and isSpace are not only more descriptive and elegant; they are usually also faster than a membership check on a list (which takes O(n) in the length of the list), and isSpace also takes tabs, newlines, etc. into account.
We can also perform an eta-reduction on the function:
dropNonLetters = words . (filter (\x -> isLower x || isSpace x)) . toLowerStr
This then produces:
Prelude Data.Char> dropNonLetters "ORANGE, apple! APPLE!!"
["orange","apple","apple"]
I advise you to rename the function dropNonLetters, since now it does not fully explain that it will generate a list of words. Based on the name, I would think that it only drops non-letters, not that it converts the string to lowercase nor that it constructs words.
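As a sketch of what that could look like, here is the same pipeline as a single self-contained definition. The name lowercaseWords is only a suggestion of mine, not something from the question.

```haskell
import Data.Char (isLower, isSpace, toLower)

-- Lowercase the input, keep only lowercase letters and whitespace,
-- then split the remainder into words.
lowercaseWords :: String -> [String]
lowercaseWords = words . filter (\c -> isLower c || isSpace c) . map toLower
```

Applied to "ORANGE, apple! APPLE!!" this gives ["orange","apple","apple"], the same result as above.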

Here's an example of separating characters into separate string lists:
sortNumbers :: [Char] -> [String]
sortNumbers args = filter (\strings -> strings /= "") $
    zipWith (\x numbers -> filter (\char -> char == numbers) x) (repeat args) ['1'..'9']

Project Euler 8 - I don't understand it

I looked up for a solution in Haskell for the 8th Euler problem, but I don't quite understand it.
import Data.List
import Data.Char
euler_8 = do
    str <- readFile "number.txt"
    print . maximum . map product
          . foldr (zipWith (:)) (repeat [])
          . take 13 . tails . map (fromIntegral . digitToInt)
          . concat . lines $ str
Here is the link for the solution and here you can find the task.
Could anyone explain the solution to me step by step?

Reading the data
readFile reads the file "number.txt". If we put a small 16 digit number in a file called number.txt
7316
9698
8586
1254
Running
euler_8 = do
    str <- readFile "number.txt"
    print $ str
Results in
"7316\n9698\n8586\n1254"
This string has extra newline characters in it. To remove them, the author splits the string into lines.
euler_8 = do
    str <- readFile "number.txt"
    print . lines $ str
The result no longer has any '\n' characters, but is a list of strings.
["7316","9698","8586","1254"]
To turn this into a single string, the strings are concatenated together.
euler_8 = do
    str <- readFile "number.txt"
    print . concat . lines $ str
The concatenated string is a list of characters instead of a list of numbers
"7316969885861254"
Each character is converted into an Int by digitToInt, then converted into an Integer by fromIntegral. On 32 bit hardware using a full-sized Integer is important since the product of 13 digits could be larger than 2^31-1. This conversion is mapped onto each item in the list.
euler_8 = do
    str <- readFile "number.txt"
    print . map (fromIntegral . digitToInt)
          . concat . lines $ str
The resulting list is full of Integers.
[7,3,1,6,9,6,9,8,8,5,8,6,1,2,5,4]
Subsequences
The author's next goal is to find all of the 13 digit runs in this list of integers. tails returns all of the suffixes (final segments) of a list: the sublists that start at any position and run till the end of the list.
euler_8 = do
    str <- readFile "number.txt"
    print . tails
          . map (fromIntegral . digitToInt)
          . concat . lines $ str
This results in 17 lists for our 16 digit example. (I've added formatting)
[
[7,3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
  [3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
    [1,6,9,6,9,8,8,5,8,6,1,2,5,4],
      [6,9,6,9,8,8,5,8,6,1,2,5,4],
        [9,6,9,8,8,5,8,6,1,2,5,4],
          [6,9,8,8,5,8,6,1,2,5,4],
            [9,8,8,5,8,6,1,2,5,4],
              [8,8,5,8,6,1,2,5,4],
                [8,5,8,6,1,2,5,4],
                  [5,8,6,1,2,5,4],
                    [8,6,1,2,5,4],
                      [6,1,2,5,4],
                        [1,2,5,4],
                          [2,5,4],
                            [5,4],
                              [4],
                               []
]
The author is going to pull a trick where we rearrange these lists to read off 13 digit long sub lists. If we look at these lists left-aligned instead of right-aligned we can see the sub sequences running down each column.
[
[7,3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[6,9,6,9,8,8,5,8,6,1,2,5,4],
[9,6,9,8,8,5,8,6,1,2,5,4],
[6,9,8,8,5,8,6,1,2,5,4],
[9,8,8,5,8,6,1,2,5,4],
[8,8,5,8,6,1,2,5,4],
[8,5,8,6,1,2,5,4],
[5,8,6,1,2,5,4],
[8,6,1,2,5,4],
[6,1,2,5,4],
[1,2,5,4],
[2,5,4],
[5,4],
[4],
[]
]
We only want these columns to be 13 digits long, so we only want to take the first 13 rows.
[
[7,3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[6,9,6,9,8,8,5,8,6,1,2,5,4],
[9,6,9,8,8,5,8,6,1,2,5,4],
[6,9,8,8,5,8,6,1,2,5,4],
[9,8,8,5,8,6,1,2,5,4],
[8,8,5,8,6,1,2,5,4],
[8,5,8,6,1,2,5,4],
[5,8,6,1,2,5,4],
[8,6,1,2,5,4],
[6,1,2,5,4],
[1,2,5,4]
]
foldr (zipWith (:)) (repeat []) transposes a list of lists (explaining it belongs to perhaps another question). It discards the parts of the rows longer than the shortest row.
euler_8 = do
    str <- readFile "number.txt"
    print . foldr (zipWith (:)) (repeat [])
          . take 13 . tails
          . map (fromIntegral . digitToInt)
          . concat . lines $ str
We are now reading the sub-sequences across the lists as usual
[
[7,3,1,6,9,6,9,8,8,5,8,6,1],
[3,1,6,9,6,9,8,8,5,8,6,1,2],
[1,6,9,6,9,8,8,5,8,6,1,2,5],
[6,9,6,9,8,8,5,8,6,1,2,5,4]
]
The problem
We find the product of each of the sub-sequences by mapping product on to them.
euler_8 = do
    str <- readFile "number.txt"
    print . map product
          . foldr (zipWith (:)) (repeat [])
          . take 13 . tails
          . map (fromIntegral . digitToInt)
          . concat . lines $ str
This reduces the lists to a single number each
[940584960,268738560,447897600,1791590400]
From which we must find the maximum.
euler_8 = do
    str <- readFile "number.txt"
    print . maximum . map product
          . foldr (zipWith (:)) (repeat [])
          . take 13 . tails
          . map (fromIntegral . digitToInt)
          . concat . lines $ str
The answer is
1791590400
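The whole pipeline can also be tried without a file. The following is a sketch under my own assumptions: maxWindowProduct is a made-up name, and the window length is passed as a parameter instead of being fixed at 13.

```haskell
import Data.Char (digitToInt)
import Data.List (tails)

-- Largest product of n adjacent digits in a (possibly multi-line) string
-- of digits. Fails (maximum of an empty list) if there are fewer than n digits.
maxWindowProduct :: Int -> String -> Integer
maxWindowProduct n =
    maximum . map product
  . foldr (zipWith (:)) (repeat [])
  . take n . tails
  . map (fromIntegral . digitToInt)
  . concat . lines
```

In GHCi, maxWindowProduct 13 "7316\n9698\n8586\n1254" reproduces the 1791590400 from the 16 digit example.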

If you're not familiar with the functions used, the first thing you should do is examine the types of each function. Since this is function composition, you apply from inside out (i.e. operations occur right to left, bottom to top when reading). We can walk through this line by line.
Starting from the last line, we'll first examine the types.
:t str
str :: String -- This is your input
:t lines
lines :: String -> [String] -- Turn a string into a list of strings, splitting on newlines
:t concat
concat :: [[a]] -> [a] -- Merge a list of lists into a single list (hint: type String = [Char])
Since type String = [Char] (so [String] is equivalent to [[Char]]), this line is converting the multi-line number into a single list of digit characters. More precisely, it first creates a list of strings from the full string, one string per line. It then merges all of these lines (now containing only digit characters) into a single list of characters (that is, a single String).
The next line takes this new String as input. Again, let's observe the types:
:t digitToInt
digitToInt :: Char -> Int -- Convert a digit char to an int
:t fromIntegral
fromIntegral :: (Num b, Integral a) => a -> b -- Convert integral to num type
:t map
map :: (a -> b) -> [a] -> [b] -- Apply a function to each element of the list
:t tails
tails :: [a] -> [[a]] -- Returns all final segments of input (see: http://hackage.haskell.org/package/base-4.8.0.0/docs/Data-List.html#v:tails)
:t take
take :: Int -> [a] -> [a] -- Return the first n values of the list
If we apply these operations to our current input string, the first thing that happens is that we map the composed function (fromIntegral . digitToInt) over each character in the string. This turns our string of digits into a list of numbers. EDIT: As pointed out below in the comments, the fromIntegral in this example is there to prevent overflow on 32-bit integer types. Now that we have converted our string into actual numeric types, we run tails on the result. Since (by the problem statement) all values must be adjacent and we know that all of the integers are non-negative (by virtue of being digits of a larger number), we take only the first 13 tails, since we want each product to be over a group of 13 consecutive elements. How this works is difficult to understand without considering the next line.
So, let's do a quick experiment. After converting our string into numeric types and applying tails and take 13, we have a big list of lists. It is actually kind of hard to picture what we have here. For the sake of understanding, the contents of the lists are not very important. What is important is their size. So let's take a look at an artificial example:
(map length . take 13 . tails) [1..1000]
[1000,999,998,997,996,995,994,993,992,991,990,989,988]
You can see what we have here is a big list of 13 elements. Each element is a list of size 1000 (i.e. the full dataset) down to 988 in descending order. So this is what we currently have for input into the next line which is, arguably, the most difficult-- yet most important-- line to understand. Why understanding this is important should become clear as we walk through the next line.
:t foldr
foldr :: (a -> b -> b) -> b -> [a] -> b -- Combine values into a single value
:t zipWith
zipWith :: (a -> b -> c) -> [a] -> [b] -> [c] -- Generalization of zip
:t (:)
(:) :: a -> [a] -> [a] -- Cons operator. Add element to list
:t repeat
repeat :: a -> [a] -- Infinite list containing specified value
Remember how I mentioned we had a list of 13 elements before (of varying-sized lists)? This is important now. The line is going to iterate over that list and apply (zipWith (:)) to it. The (repeat []) is such that each time zipWith is called on a subsequence, it starts with an empty list as its base. This allows us to construct a list of lists containing our adjacent subsequences of length 13.
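A small experiment may make the fold easier to follow. This is only a sketch: transposeTruncating and windows are names I made up, and the original solution inlines the expression instead.

```haskell
import Data.List (tails)

-- Each step of the fold conses one element of the current row onto the
-- column built so far; zipWith stops at the shorter list, so the result
-- has as many columns as the shortest row has elements.
transposeTruncating :: [[a]] -> [[a]]
transposeTruncating = foldr (zipWith (:)) (repeat [])

-- All runs of n adjacent elements (assumes n >= 1).
windows :: Int -> [a] -> [[a]]
windows n = transposeTruncating . take n . tails
```

For instance, take 3 (tails [1,2,3,4]) is [[1,2,3,4],[2,3,4],[3,4]], and reading that column by column gives windows 3 [1,2,3,4] == [[1,2,3],[2,3,4]]. If the list is shorter than n, one of the tails is empty and the result is [].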
Finally, we get to the last line which is pretty easy. That said, we should still be mindful of our types
:t product
product :: Num a => [a] -> a -- Multiply all elements of a list together and return result
:t maximum
maximum :: Ord a => [a] -> a -- Return maximum element in the list
The first thing we do is map the product function over each subsequence. When this has completed we end up with a list of numeric types (hey, we finally don't have a list of lists anymore!). These values are the products of each subsequence. Finally, we apply the maximum function which returns only the largest element in the list.

EDIT: I found out later what the foldr expression was for. (See comments below my answer).
I think that this could be expressed in a different way: you can simply add a guard at the end of the list.
My verbose version of that solution would be:
import Data.List
import Data.Char
euler_8 = do
    let len = 13
    let str1 = "123456789\n123456789"
    -- Join lines
    let str2 = concat (lines str1)
    -- Transform the list of characters into a list of numbers
    let lst1 = map (fromIntegral . digitToInt) str2
    -- EDIT: Add a guard at the end of list
    let lst2 = lst1 ++ [-1]
    -- Get all tails of the list of digits
    let lst3 = tails lst2
    -- Get first 13 digits from each tail
    let lst4 = map (take len) lst3
    -- Get a list of products
    let prod = map product lst4
    -- Find max product
    let m = maximum prod
    print m

How to capitalize a string using control lens?

I'm playing with the lens package and I'm trying to capitalize a string using only lens.
Basically I want to call toUpper on the first element of every word. That seems like it should be easy with lens, but I can't figure out at all how to do it. Do I need a traversable? How do I split by spaces, etc.?

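It's not really an isomorphism to call words then unwords because it'll convert repeated spaces to single ones, but let's pretend (note that this definition clashes with Prelude's words, so it assumes something like import Prelude hiding (words) together with import qualified Prelude):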
words :: Iso' String [String]
words = iso Prelude.words Prelude.unwords
Now we can capitalize words by building a lens which focuses on the first letter of each word and applying over and toUpper
capitalize :: String -> String
capitalize = over (words . traverse . _head) toUpper

capitalize xs = xs & words <&> _head %~ toUpper & unwords
Okay, that's the solution, but how to get there? Let's remove some lens parts. Exchange (<&>) with fmap and (&) with ($):
capitalize xs = unwords $ fmap (_head %~ toUpper) $ words $ xs
This looks familiar. _head %~ f will apply f on the first element of the list. At the end, this is (almost*) equivalent to
capitalize xs = unwords $ fmap (\(x:xs) -> toUpper x : xs) $ words $ xs
which you are probably familiar with.
* _head also takes care of the empty list case
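For comparison, here is a sketch of the same function without lens that also spells out the empty-list case from the footnote. capitalizePlain and capFirst are names I made up.

```haskell
import Data.Char (toUpper)

-- Split into words, capitalize the first letter of each, join again.
capitalizePlain :: String -> String
capitalizePlain = unwords . map capFirst . words
  where
    capFirst []     = []              -- total: the empty word stays empty
    capFirst (c:cs) = toUpper c : cs
```

Like the words/unwords based lens versions, this collapses repeated spaces: capitalizePlain "hello  world" gives "Hello World" with a single space.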

A solution that doesn't collapse repeated spaces:
import Control.Lens
import Data.List.Split
import Data.List.Split.Lens
import Data.Char
capitalize :: String -> String
capitalize = view $ splitting (whenElt isSpace) traversed.to (over _head toUpper)
