How to search a pattern from a file/String in Haskell - search

** old**
Suppose we have a pattern ex. "1101000111001110".
Now I have a pattern to be searched ex. "1101". I am new to Haskell world, I am trying it at my end. I am able to do it in c but need to do it in Haskell.
Given Pattern := "1101000111001110"
Pattern To Be Searched :- "110
Desired Output:-"Pattern Found"`
** New**
import Data.List (isInfixOf)
main = do x <- readFile "read.txt"
putStr x
isSubb :: [Char] -> [Char] -> Bool
isSubb sub str = isInfixOf sub str
This code reads a file named "read", which contains the following string 110100001101. Using isInfixOf you can check the pattern "1101" in the string and result will be True.
But the problem is i am not able to search "1101" in the string present in "read.txt".
I need to compare the "read.txt" string with the user provided string. i.e
one string is their in the file "read.txt"
and second string user will provid (user defined) and we will perform search and find whether user defined string is present in the string present in "read.txt"

Answer to new:
To achieve this, you have to use readLn:
sub <- readLn
readLn accepts input until a \n is encountered and <- binds the result to sub. Watch out that if the input should be a string you have to explicitly type the "s around your string.
Alternatively if you do not feel like typing the quotation marks every time, you can use getLine in place of readLn which has the type IO String which becomes String after being bound to sub
For further information on all functions included in the standard libraries of Haskell see Hoogle. Using Hoogle you can search functions by various criteria and will often find functions which suit your needs.
Answer to old:
Use the isInfixOf function from Data.List to search for the pattern:
import Data.List (isInfixOf)
isInfixOf "1101" "1101000111001110" -- outputs true
It returns true if the first sequence exists in the second and false otherwise.
To read a file and get its contents use readFile:
contents <- readFile "filename.txt"
You will get the whole file as one string, which you can now perform standard functions on.
Outputting "Pattern found" should be trivial then.

Related

Print contains data type how to disable

using gogol package,
follow example got
> exampleGetValue
-- ValueRange' {_vrValues = Just [String "2018/1/1",String "2018/1/2"], _vrRange = Just "'\24037\20316\34920\&1'!A1:1", _vrMajorDimension = Just VRMDRows}
> exampleGetValue >>= return . view vrValues
-- [String "2018/1/1",String "2018/1/2"]
> mapM_ (print) (exampleGetValue >>= return . view vrValues)
String "2018/1/1"
String "2018/1/2"
Why there will be a string of words
How to do I can only show
2018/1/1
2018/1/2
Take a look at
[String "2018/1/1",String "2018/1/2"]
the result of
> exampleGetValue >>= return . view vrValues
Here the strings you are interested in, like "2018/1/1" are contained in another datatype String, which has, I assume, an automatically derived show instance, which will print the name of the Data constructor String.
You need to unpack the strings somehow to get rid of the printing of the word String.
As this is stackoverflow, and we are considered to provide answers, I will give you one possibility now, but before you read it, try to do it yourself:
unpackString (String w) = w
mapM_ (print . unpackString) (exampleGetValue >>= return . view vrValues)
You have to determine the type signature for unpackString yourself, as you didn't provided any types.

Use only Integers of a file with text

My function works ok. But I want to use this function with a file's text. The text file has a word before an integer list. How can I do this?
This is the function:
broke :: Integer -> Integer
broke n = pollard 1 2 n 2 2
The contents of the file is:
Word (11,12)
I want to apply the function broke to the first number.
Well this might be kind of a cheat, but the contents of that file is a valid Haskell expression so you could use Read to do it:
import System.IO (readFile)
data Word = Word (Integer,Integer)
deriving (Read)
main = do
contents <- readFile "path/to/file" -- or wherever your file is
let Word (x,y) = read contents
print $ broke x
The reason this works is that deriving (Read) automatically writes a parser for you, so you get the function read :: String -> Word for free. So this technique is only going to work for files whose contents look like Haskell -- otherwise you will need to write your own parser.

How can I write a parser using Parsec that only accepts unique elements?

I have recently started learning Haskell and have been trying my hand at Parsec. However, for the past couple of days I have been stuck with a problem that I have been unable to find the solution to. So what I am trying to do is write a parser that can parse a string like this:
<"apple", "pear", "pineapple", "orange">
The code that I wrote to do that is:
collection :: Parser [String]
collection = (char '<') *> (string `sepBy` char ',')) <* (char '>')
string :: Parser String
string = char '"' *> (many (noneOf ['\"', '\r', '\n', '"'])) <* char '"'
This works fine for me as it is able to parse the string that I have defined above. Nevertheless, I would now like to enforce the rule that every element in this collection must be unique and that is where I am having trouble. One of the first results I found when searching on the internet was this one, which suggest the usage of the nub function. Although the problem stated in that question is not the same, it would in theory solve my problem. But what I don't understand is how I can apply this function within a Parser. I have tried adding the nub function to several parts of the code above without any success. Later I also tried doing it the following way:
collection :: Parser [String]
collection = do
char '<'
value <- (string `sepBy` char ','))
char '>'
return nub value
But this does not work as the type does not match what nub is expecting, which I believe is one of the problems I am struggling with. I am also not entirely sure whether nub is the right way to go. My fear is that I am going in the wrong direction and that I won't be able to solve my problem like this. Is there perhaps something I am missing? Any advice or help anyone could provide would be greatly appreciated.
The Parsec Parser type is an instance of MonadPlus which means that we can always fail (ie cause a parse error) whenever we want. A handy function for this is guard:
guard :: MonadPlus m => Bool -> m ()
This function takes a boolean. If it's true, it return () and the whole computation (a parse in this case) does not fail. If it's false, the whole thing fails.
So, as long as you don't care about efficiency, here's a reasonable approach: parse the whole list, check for whether all the elements are unique and fail if they aren't.
To do this, the first thing we have to do is write a predicate that checks if every element of a list is unique. nub does not quite do the right thing: it return a list with all the duplicates taken out. But if we don't care much about performance, we can use it to check:
allUnique ls = length (nub ls) == length ls
With this predicate in hand, we can write a function unique that wraps any parser that produces a list and ensures that list is unique:
unique parser = do res <- parser
guard (allUnique res)
return res
Again, if guard is give True, it doesn't affect the rest of the parse. But if it's given False, it will cause an error.
Here's how we could use it:
λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\">"
Right ["apple","pear","pineapple","orange"]
λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\",\"apple\">"
Left "<interactive>" (line 1, column 46):unknown parse error
This does what you want. However, there's a problem: there is no error message supplied. That's not very user friendly! Happily, we can fix this using <?>. This is an operator provided by Parsec that lets us set the error message of a parser.
unique parser = do res <- parser
guard (allUnique res) <?> "unique elements"
return res
Ahhh, much better:
λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\",\"apple\">"
Left "<interactive>" (line 1, column 46):
expecting unique elements
All this works but, again, it's worth noting that it isn't efficient. It parses the whole list before realizing elements aren't unique, and nub takes quadratic time. However, this works and it's probably more than good enough for parsing small to medium-sized files: ie most things written by hand rather than autogenerated.

BioHaskell: Read FASTA file

Using BioHaskell, how can I read a FASTA file containing aminoacid sequences?
I want to be able to:
Get a list of String sequences
Get a Map String String (from Data.Map ) from the FASTA comment (assumed to be unique) to the sequence String
Use the sequences in algorithms implemented in BioHaskell.
Note: This question intentionally does not show research effort as it was immediately answered in a Q&A-style manner.
Extracting raw sequence strings
We will assume from now on that the file aa.fa contains some aminoacid FASTA sequences. Let's start with a simple example that extracts a list of sequences.
import Bio.Sequence.Fasta (readFasta)
import Bio.Sequence.SeqData (seqdata)
import qualified Data.ByteString.Lazy.Char8 as LB
main = do
sequences <- readFasta "aa.fa"
let listOfSequences = map (LB.unpack . seqdata) sequences :: [String]
-- Just for show, we will print one sequence per line here
-- This will basically execute putStrLn for each sequence
mapM_ putStrLn listOfSequences
readFasta returns IO [Sequence Unknown]. Basically that means there is no information about whether the sequences contain Aminoacids or nucleotides.
Note that we use LB.unpack instead of show here, because show adds double quotes (") at the beginning and the end of the resulting String. Using LB.unpack works, because in the current BioHaskell version 0.5.3., SeqData is just defined as lazy ByteString.
We can fix this by using castToAmino or castToNuc:
Converting to AA/Nucleotide sequences
let aaSequences = map castToAmino sequences :: [Sequence Amino]
Note that those function currently (BioHaskell version 0.5.3) do not perform any validity checks. You can use the [Sequence Amino] or [Sequence Nuc] in the BioHaskell algorithms.
Lookup sequence by FASTA header
We will now assume that our aa.fa contains a sequence
>abc123
MGLIFARATNA...
Now, we will build a Map String String (we will use Data.Map.Strict in this example) from the FASTA file. We can use this map to lookup the sequence.
The lookup will yield a Maybe String. The intended behaviour in this example is to print the sequence if it was found, or not to print anything if nothing was found in the Map.
As Data.Maybe is a Monad, we can use Data.Foldable.mapM_ for this task.
import Bio.Sequence.Fasta (readFasta)
import Bio.Sequence.SeqData (Sequence, seqdata, seqheader)
import qualified Data.ByteString.Lazy.Char8 as LB
import Data.Foldable (mapM_)
import qualified Data.Map.Strict as Map
-- | Convert a Sequence to a String tuple (sequence label, sequence)
sequenceToMapTuple :: Sequence a -> (String, String)
sequenceToMapTuple s = (LB.unpack $ seqheader s, LB.unpack $ seqdata s)
main = do
sequences <- readFasta "aa.fa"
-- Build the sequence map (by header)
let sequenceMap = Map.fromList $ map sequenceToMapTuple sequences
-- Lookup the sequence for the "abc123" header
mapM_ print $ Map.lookup "abc123" sequenceMap
Edit: Thanks to #GabrielGonzalez suggestion, the final example now uses Data.Foldable.mapM_ instead of Data.Maybe.fromJust

Haskell: Delimit a string by chosen sub-strings and whitespace

Am still new to Haskell, so apologize if there is an obvious answer to this...
I would like to make a function that splits up the all following lists of strings i.e. [String]:
["int x = 1", "y := x + 123"]
["int x= 1", "y:= x+123"]
["int x=1", "y:=x+123"]
All into the same string of strings i.e. [[String]]:
[["int", "x", "=", "1"], ["y", ":=", "x", "+", "123"]]
You can use map words.lines for the first [String].
But I do not know any really neat ways to also take into account the others - where you would be using the various sub-strings "=", ":=", "+" etc. to break up the main string.
Thank you for taking the time to enlighten me on Haskell :-)
The Prelude comes with a little-known handy function called lex, which is a lexer for Haskell expressions. These match the form you need.
lex :: String -> [(String,String)]
What a weird type though! The list is there for interfacing with a standard type of parser, but I'm pretty sure lex always returns either 1 or 0 elements (0 indicating a parse failure). The tuple is (token-lexed, rest-of-input), so lex only pulls off one token. So a simple way to lex a whole string would be:
lexStr :: String -> [String]
lexStr "" = []
lexStr s =
case lex s of
[(tok,rest)] -> tok : lexStr rest
[] -> error "Failed lex"
To appease the pedants, this code is in terrible form. An explicit call to error instead of returning a reasonable error using Maybe, assuming lex only returns 1 or 0 elements, etc. The code that does this reliably is about the same length, but is significantly more abstract, so I spared your beginner eyes.
I would take a look at parsec and build a simple grammar for parsing your strings.
how about using words .)
words :: String -> [String]
and words wont care for whitespaces..
words "Hello World"
= words "Hello World"
= ["Hello", "World"]

Resources