I need to manipulate the binary encoding, as a string of '0' and '1' characters, of simple strings given as input, using 7-bit ASCII.
For the encoding I have used the function Data.ByteString.Lazy.Builder.string7 :: String -> Builder
However, I have not found a way to convert the resulting Builder back into a string of '0' and '1'. Is it possible? Is there another way?
Subsidiary question: what if I wanted it in hexadecimal form, as text?
There's an unpackChars function in Data.ByteString.Lazy.Internal. There's also a non-lazy counterpart in Data.ByteString.Internal.
import qualified Data.ByteString.Lazy.Builder as Build
import qualified Data.ByteString.Lazy as BS
import qualified Data.ByteString.Lazy.Internal as BSI
-- >>> BSI.unpackChars $ Build.toLazyByteString $ Build.string7 "010101"
-- "010101"
You can also use map (chr . fromIntegral) . BS.unpack instead of unpackChars, but unpackChars is probably faster.
Alternatively, as Michael Snoyman commented below, you could use Data.ByteString.Char8 or its lazy version and you'll get the right conversions to begin with.
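For the subsidiary question, one option (a sketch of my own, not part of the original answer; builderToHex and byteToHex are made-up names) is to render each byte of the Builder's output as two hex digits with showHex from base:
import qualified Data.ByteString.Lazy as BS
import qualified Data.ByteString.Lazy.Builder as Build
import Numeric (showHex)

-- Render every byte produced by the Builder as two lowercase hex digits.
builderToHex :: Build.Builder -> String
builderToHex = concatMap byteToHex . BS.unpack . Build.toLazyByteString
  where
    byteToHex b = case showHex b "" of
      [d] -> ['0', d]   -- pad single-digit bytes to two characters
      ds  -> ds

-- builderToHex (Build.string7 "010101") == "303130313031"
If your bytestring version is recent enough, Data.ByteString.Builder also provides byteStringHex and lazyByteStringHex, which produce the hex rendering directly as a Builder.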
This is the code I have:
import qualified System.IO as IO
writeSurrogate :: IO ()
writeSurrogate = do
  IO.writeFile "/home/sibi/surrogate.txt" ['\xD800']
Executing the above code gives this error:
text-tests: /home/sibi/surrogate.txt: commitBuffer: invalid argument (invalid character)
The reason is that GHC itself prevents this, because these are surrogate code points: https://github.com/ghc/ghc/blob/21f0f56164f50844c2150c62f950983b2376f8b6/libraries/base/GHC/IO/Encoding/Failure.hs#L114
I want to write some test files which need to contain that data. Right now I'm using Python to achieve what I want, but I would love to know if there is a way (a workaround using Haskell) to achieve this.
Sure, just write the bytes you want:
import Data.ByteString as BS
main = BS.writeFile "surrogate.txt" (pack [0xd8, 0x00])
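The bytes above happen to be U+D800 in UTF-16 big-endian form. If your test file instead needs the three-byte "UTF-8 style" (WTF-8/CESU-8) representation of the surrogate, the kind produced e.g. by Python's surrogatepass error handler, the same trick applies; a sketch under that assumption:
import qualified Data.ByteString as BS

-- U+D800 written as the (deliberately invalid) UTF-8 byte sequence ED A0 80.
main :: IO ()
main = BS.writeFile "surrogate-utf8.txt" (BS.pack [0xed, 0xa0, 0x80])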
I have searched Pursuit, and only two results seem to match well:
charList from purescript-optlicative (module: Node.Optlicative.Internal)
toChars from purescript-yarn (module: Data.String.Yarn)
And neither yarn nor optlicative is available with psc-package (using psc-package 0.4.0 and {"set": "psc-0.12.0", "source": "https://github.com/purescript/package-sets.git"}).
Related question: How do I convert a list of chars to a string in purescript
I would first convert to Array Char via toCharArray, then convert to list:
import Data.List as List
import Data.String.CodeUnits as String
...
List.fromFoldable $ String.toCharArray "abcd"
NOTE: as of purescript-strings v4.0.0, toCharArray is exported from Data.String.CodeUnits, but before that it was in Data.String. Adjust according to the compiler/library version you're using.
Incidentally: are you sure you need a list and not an array? Lists are way less idiomatic in PureScript than in Haskell. Arrays are way more common.
I'm writing an answer instead of a comment on the previous answer because I don't yet have the 50 reputation points required to comment.
To convert a string to a list of characters, you can use the following code:
import Data.String.CodeUnits (toCharArray) --from package purescript-strings#4.0.0
import Data.List (fromFoldable, List)
import Data.Function ( ($) ) --from package purescript-prelude#4.0.1
convertStringToListOfChars :: String -> List Char
convertStringToListOfChars str = fromFoldable $ toCharArray str
From the REPL, using it gives the following result:
> convertStringToListOfChars "abcde"
('a' : 'b' : 'c' : 'd' : 'e' : Nil)
I'm trying to write a module which returns the external IP address of my computer.
Using Network.Wreq's get function and then applying a lens to obtain the responseBody, the type I end up with is Data.ByteString.Lazy.Internal.ByteString. As I want to filter out the trailing "\n" of the result body, I want to run a regular expression over it afterwards.
Problem: that seemingly very specific ByteString type is not accepted by the regex library, and I have found no way to convert it to a String.
Here is my feeble attempt so far (not compiling).
{-# LANGUAGE OverloadedStrings #-}
module ExtIp (getExtIp) where
import Network.Wreq
import Control.Lens
import Data.ByteString.Lazy
import Text.Regex.Posix
getExtIp :: IO String
getExtIp = do
  r <- get "http://myexternalip.com/raw"
  let body = r ^. responseBody
  let addr = body =~ "[^\n]*\n"
  return (addr)
So my question is obviously: how do I convert that funny special ByteString to a String? An explanation of how I can approach such a problem myself would also be appreciated. I tried to use unpack and toString but have no idea what to import to get those functions, if they exist.
Being a very sporadic Haskell user, I also wonder if someone could show me the idiomatic Haskell way of defining such a function. The version I show here does not account for possible runtime errors/exceptions, after all.
Short answer: Use unpack from Data.ByteString.Lazy.Char8
Longer answer:
In general when you want to convert a ByteString (of any variety) to a String or Text you have to specify an encoding - e.g. UTF-8 or Latin1, etc.
When retrieving an HTML page, the encoding you are supposed to use may appear in the Content-Type header or in the response body itself as a <meta ...> tag.
Alternatively you can just guess at what the encoding of the body is.
In your case I presume you are accessing a site like http://whatsmyip.org and you only need to parse out your IP address. So without examining the headers or looking through the HTML, a safe encoding to use would be Latin1.
To convert ByteStrings to Text via an encoding, have a look at the functions in Data.Text.Encoding
For instance, the decodeLatin1 function.
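A minimal sketch of both routes (the helper names lazyToString and lazyToText are just for illustration):
import qualified Data.ByteString.Lazy       as BL
import qualified Data.ByteString.Lazy.Char8 as L8
import qualified Data.Text                  as T
import qualified Data.Text.Encoding         as TE

-- Treat every byte as a character; fine for ASCII/Latin-1 content
-- such as a plain-text IP address.
lazyToString :: BL.ByteString -> String
lazyToString = L8.unpack

-- Decode to Text via Latin-1. decodeLatin1 works on strict ByteStrings,
-- so convert the lazy one with toStrict first.
lazyToText :: BL.ByteString -> T.Text
lazyToText = TE.decodeLatin1 . BL.toStrict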
I simply do not understand why you insist on using String when you already have a ByteString at hand, which is the faster/more efficient representation.
Importing a regex library gives you almost no benefit here; for parsing an IP address I would use attoparsec, which works great with ByteStrings (see the sketch after the code below).
Here is a version that does not use regex but still returns a String. Note that I have not compiled it, as I have no Haskell setup at hand right now.
{-# LANGUAGE OverloadedStrings #-}
module ExtIp (getExtIp) where
import Network.Wreq
import Control.Lens
import qualified Data.ByteString.Lazy.Char8 as Char8
import Data.Char (isSpace)

getExtIp :: IO String
getExtIp = do
    r <- get "http://myexternalip.com/raw"
    return $ Char8.unpack $ trim (r ^. responseBody)
  where
    trim = Char8.reverse . Char8.dropWhile isSpace . Char8.reverse . Char8.dropWhile isSpace
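For completeness, here is a rough sketch of the attoparsec route mentioned above; the parser shape and names are my own illustration, not part of this answer. wreq gives you a lazy ByteString, so it is converted to a strict one before parsing:
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString.Lazy as BL

-- Parse the four dot-separated decimal components of an IPv4 address.
ipV4 :: A.Parser (Int, Int, Int, Int)
ipV4 = do
  a <- A.decimal
  _ <- A.char '.'
  b <- A.decimal
  _ <- A.char '.'
  c <- A.decimal
  _ <- A.char '.'
  d <- A.decimal
  pure (a, b, c, d)

-- Run the parser on the response body; trailing whitespace is simply ignored.
parseIp :: BL.ByteString -> Either String (Int, Int, Int, Int)
parseIp = A.parseOnly ipV4 . BL.toStrict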
Using BioHaskell, how can I read a FASTA file containing aminoacid sequences?
I want to be able to:
Get a list of String sequences
Get a Map String String (from Data.Map) from the FASTA comment (assumed to be unique) to the sequence String
Use the sequences in algorithms implemented in BioHaskell.
Note: This question intentionally does not show research effort as it was immediately answered in a Q&A-style manner.
Extracting raw sequence strings
We will assume from now on that the file aa.fa contains some amino acid FASTA sequences. Let's start with a simple example that extracts a list of sequences.
import Bio.Sequence.Fasta (readFasta)
import Bio.Sequence.SeqData (seqdata)
import qualified Data.ByteString.Lazy.Char8 as LB
main = do
  sequences <- readFasta "aa.fa"
  let listOfSequences = map (LB.unpack . seqdata) sequences :: [String]
  -- Just for show, we will print one sequence per line here
  -- This will basically execute putStrLn for each sequence
  mapM_ putStrLn listOfSequences
readFasta returns IO [Sequence Unknown]. Basically that means there is no information about whether the sequences contain amino acids or nucleotides.
Note that we use LB.unpack instead of show here, because show adds double quotes (") at the beginning and the end of the resulting String. Using LB.unpack works because, in the current BioHaskell version 0.5.3, SeqData is just defined as a lazy ByteString.
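A quick illustration of that quoting difference (a sketch reusing the LB alias from above; the example sequence is made up):
quotingDemo :: IO ()
quotingDemo = do
  let s = LB.pack "MGLIFARATNA"
  putStrLn (show s)       -- prints "MGLIFARATNA" (with the quotes)
  putStrLn (LB.unpack s)  -- prints MGLIFARATNA   (raw characters)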
Converting to AA/Nucleotide sequences
We can fix the missing type information by casting with castToAmino or castToNuc:
let aaSequences = map castToAmino sequences :: [Sequence Amino]
Note that these functions currently (BioHaskell version 0.5.3) do not perform any validity checks. You can use the [Sequence Amino] or [Sequence Nuc] in the BioHaskell algorithms.
Lookup sequence by FASTA header
We will now assume that our aa.fa contains a sequence
>abc123
MGLIFARATNA...
Now, we will build a Map String String (we will use Data.Map.Strict in this example) from the FASTA file. We can use this map to lookup the sequence.
The lookup will yield a Maybe String. The intended behaviour in this example is to print the sequence if it was found, or not to print anything if nothing was found in the Map.
As Maybe is an instance of Foldable, we can use Data.Foldable.mapM_ for this task.
import Bio.Sequence.Fasta (readFasta)
import Bio.Sequence.SeqData (Sequence, seqdata, seqheader)
import qualified Data.ByteString.Lazy.Char8 as LB
import Data.Foldable (mapM_)
import qualified Data.Map.Strict as Map
-- | Convert a Sequence to a String tuple (sequence label, sequence)
sequenceToMapTuple :: Sequence a -> (String, String)
sequenceToMapTuple s = (LB.unpack $ seqheader s, LB.unpack $ seqdata s)
main = do
  sequences <- readFasta "aa.fa"
  -- Build the sequence map (by header)
  let sequenceMap = Map.fromList $ map sequenceToMapTuple sequences
  -- Lookup the sequence for the "abc123" header
  mapM_ print $ Map.lookup "abc123" sequenceMap
Edit: thanks to @GabrielGonzalez's suggestion, the final example now uses Data.Foldable.mapM_ instead of Data.Maybe.fromJust.
I am trying to process a file which contains Russian characters. When reading it, and after writing some text to the file, I get something like:
\160\192\231\229\240\225\224\233\228\230\224\237
How can I get normal symbols?
If you are getting strings with backslashes and numbers in them, then it sounds like you might be calling "print" when you want to call "putStr".
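A minimal illustration of the difference (the example string is arbitrary):
main :: IO ()
main = do
  let s = "Вася"
  print s     -- show escapes non-ASCII characters: "\1042\1072\1089\1103"
  putStrLn s  -- writes the characters themselves: Вася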
If you deal with Unicode, you might try the utf8-string package:
import System.IO hiding (hPutStr, hPutStrLn, hGetLine, hGetContents, putStrLn)
import System.IO.UTF8
import Codec.Binary.UTF8.String (utf8Encode)
main = System.IO.UTF8.putStrLn "Вася Пупкин"
However, it didn't work well in my Windows CLI, garbling the output because of the codepage. I expect it to work fine on Unix-like systems if your locale is set correctly. In any case, writing to a file should be successful on all systems.
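If you would rather stay within base, an alternative sketch (assuming a GHC whose System.IO exports hSetEncoding and utf8, i.e. GHC 6.12 or later) is to set the handle encodings explicitly instead of relying on the locale or codepage; note that the Windows console may additionally need chcp 65001 to display the output:
import System.IO

main :: IO ()
main = do
  -- Force UTF-8 on stdout and on the file handle, independent of the
  -- system locale / console codepage.
  hSetEncoding stdout utf8
  putStrLn "Вася Пупкин"
  withFile "out.txt" WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStrLn h "Вася Пупкин"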
UPDATE:
Here is an example of using the encoding package; I got this working:
{-# LANGUAGE ImplicitParams #-}
import Network.HTTP
import Text.HTML.TagSoup
import Data.Encoding
import Data.Encoding.CP1251
import Data.Encoding.UTF8
-- Fetch a page over HTTP and decode its CP1251-encoded body to a String.
openURL url = do
  resp <- simpleHTTP (getRequest url)
  fmap (decodeString CP1251) (getResponseBody resp)

main :: IO ()
main = do
  tags <- fmap parseTags $ openURL "http://www.trade.su/search?ext=1"
  -- Pull the text out of one particular tag of the parsed page.
  let TagText r = partitions (~== "<input type=checkbox>") tags !! 1 !! 4
  appendFile "out" r