Haskell: convert Unicode escape sequences to UTF-8
I am working on an HTTP client in Haskell (my first non-exercise project).
There is an API that returns JSON with all non-ASCII text written as Unicode escape sequences, something like
\u041e\u043d\u0430 \u043f\u0440\u0438\u0432\u0435\u0434\u0435\u0442 \u0432\u0430\u0441 \u0432 \u0434\u043b\u0438\u043d\u043d\u044b\u0439 \u0441\u043f\u0438\u0441\u043e\u043a
I want to decode this JSON to UTF-8 so that I can print some data from the message.
I searched for existing libraries but found nothing for this purpose.
So I wrote a function to convert the data (I am using lazy ByteStrings because that is the type I get from the wreq library):
ununicode :: BL.ByteString -> BL.ByteString
ununicode s = replace s where
    replace :: BL.ByteString -> BL.ByteString
    replace str = case (Map.lookup (BL.take 6 str) table) of
        (Just x) -> BL.append x (replace $ BL.drop 6 str)
        (Nothing) -> BL.cons (BL.head str) (replace $ BL.tail str)
    table = Map.fromList $ zip letters rus
    rus = ["Ё", "ё", "А", "Б", "В", "Г", "Д", "Е", "Ж", "З", "И", "Й", "К", "Л", "М",
           "Н", "О", "П", "Р", "С", "Т", "У", "Ф", "Х", "Ц", "Ч", "Ш", "Щ", "Ъ", "Ы",
           "Ь", "Э", "Ю", "Я", "а", "б", "в", "г", "д", "е", "ж", "з", "и", "й", "к",
           "л", "м", "н", "о", "п", "р", "с", "т", "у", "ф", "х", "ц", "ч", "ш", "щ",
           "ъ", "ы", "ь", "э", "ю", "я"]
    letters = ["\\u0401", "\\u0451", "\\u0410", "\\u0411", "\\u0412", "\\u0413",
               "\\u0414", "\\u0415", "\\u0416", "\\u0417", "\\u0418", "\\u0419",
               "\\u041a", "\\u041b", "\\u041c", "\\u041d", "\\u041e", "\\u041f",
               "\\u0420", "\\u0421", "\\u0422", "\\u0423", "\\u0424", "\\u0425",
               "\\u0426", "\\u0427", "\\u0428", "\\u0429", "\\u042a", "\\u042b",
               "\\u042c", "\\u042d", "\\u042e", "\\u042f", "\\u0430", "\\u0431",
               "\\u0432", "\\u0433", "\\u0434", "\\u0435", "\\u0436", "\\u0437",
               "\\u0438", "\\u0439", "\\u043a", "\\u043b", "\\u043c", "\\u043d",
               "\\u043e", "\\u043f", "\\u0440", "\\u0441", "\\u0442", "\\u0443",
               "\\u0444", "\\u0445", "\\u0446", "\\u0447", "\\u0448", "\\u0449",
               "\\u044a", "\\u044b", "\\u044c", "\\u044d", "\\u044e", "\\u044f"]
But it doesn't work as I expected. It replaces the text, but instead of Cyrillic letters I get something like
345 ?C1;8:C5< 8=B5#2LN A #4=52=8:>2F0<8 8=B5#5A=KE ?#>D5AA89 8 E>118
The second problem is that I can't debug my function.
When I call it directly with a custom string, I get the error Data.ByteString.Lazy.head: empty ByteString
I have no idea why it's empty.
It works fine during normal program execution:
umailGet env params = do
    r <- apiGet env (("method", "umail.get"):params)
    x <- return $ case r of
        (Right a) -> a
        (Left a) -> ""
    return $ ununicode $ x
and then in Main
r2 <- umailGet client []
print $ r2
The last problem is that the API can return any Unicode symbol, so this solution is bad by design.
Of course, the function implementation seems bad too, so after solving the main problem I am going to rewrite it using foldr.
UPDATED:
It seems I did not describe the problem clearly enough.
I send a request via the wreq library and get a JSON answer. For example
{"result":"12","error":"\u041d\u0435\u0432\u0435\u0440\u043d\u044b\u0439 \u0438\u0434\u0435\u043d\u0442\u0438\u0444\u0438\u043a\u0430\u0442\u043e\u0440 \u0441\u0435\u0441\u0441\u0438\u0438"}
That is not Haskell's representation of the result; those are real ASCII characters. I get the same text using curl or Firefox: 190 bytes, 190 ASCII characters.
Using this site, for example, http://unicode.online-toolz.com/tools/text-unicode-entities-convertor.php I can convert it to Cyrillic text: {"result":"12","error":"Неверный идентификатор сессии"}
I need to implement something like this service in Haskell (or find a package where it is already implemented), where a response like this has type lazy ByteString.
I also tried changing the types to use Text instead of ByteString (both lazy and strict), and changed the first line to ununicode s = encodeUtf8 $ replace $ L.toStrict $ LE.decodeUtf8 s
With that new implementation I get an error when executing my program:
Data.Text.Internal.Fusion.Common.head: Empty stream. So it looks like I have an error in my replacing function; maybe fixing it will also fix the main problem.
I am not sure whether you are falling into the "print Unicode" trap (see here). For encoding and decoding there are already functions on Hackage: in Data.Text.Encoding, decodeUtf8 :: ByteString -> Text and encodeUtf8 :: Text -> ByteString should do the task.
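One caveat: decodeUtf8 only interprets bytes as UTF-8; it does not expand literal \uXXXX escapes like the ones in the question. A JSON parser (aeson, for example) expands them while parsing string values, which is probably the more robust route. As a rough illustration of what that expansion involves, here is a minimal sketch using only base. The name unescapeUnicode is hypothetical, it works on String rather than ByteString, and it ignores surrogate pairs and escaped backslashes, so treat it as a demonstration rather than a replacement for a real parser.

```haskell
import Data.Char (chr, isHexDigit)
import Numeric (readHex)

-- | Replace each \uXXXX escape (exactly four hex digits) with the character
-- it names. Hypothetical helper: surrogate pairs are not combined and an
-- escaped backslash before the 'u' is not recognised.
unescapeUnicode :: String -> String
unescapeUnicode ('\\' : 'u' : rest)
  | length digits == 4
  , all isHexDigit digits
  , [(code, "")] <- readHex digits
  = chr code : unescapeUnicode remaining
  where
    (digits, remaining) = splitAt 4 rest
-- Anything else (including a malformed escape) is copied through unchanged.
unescapeUnicode (c : cs) = c : unescapeUnicode cs
-- Explicit base case, so the recursion stops cleanly on empty input.
unescapeUnicode [] = []
```

Note the explicit case for the empty string. The replace function in the question has no such case, so its recursion appears to end by calling head on an empty ByteString, which would explain the reported error.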
Edit:
I have played around with text/bytestring for some time to reproduce your "\u1234" characters, but I couldn't:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Data.Text (Text)
import qualified Data.Text.Encoding as E
import qualified Data.Text.IO as T
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B
inputB :: ByteString
inputB = "ДЕЖЗИЙКЛМНОПРСТУФ"
inputT :: Text
inputT = "ДЕЖЗИЙКЛМНОПРСТУФ"
main :: IO ()
main = do putStr "T.putStrLn inputT: " ; T.putStrLn inputT
          putStr "B.putStrLn inputB: " ; B.putStrLn inputB
          putStr "print inputB: " ; print inputB
          putStr "print inputT: " ; print inputT
          putStr "B.putStrLn $ E.encodeUtf8 inputT: " ; B.putStrLn $ E.encodeUtf8 inputT
          putStr "T.putStrLn $ E.decodeUtf8 inputB: " ; T.putStrLn $ E.decodeUtf8 inputB
          putStr "print $ E.decodeUtf8 inputB: " ; print $ E.decodeUtf8 inputB
          putStr "print $ E.encodeUtf8 inputT: " ; print $ E.encodeUtf8 inputT
Here is the result:
T.putStrLn inputT: ДЕЖЗИЙКЛМНОПРСТУФ
B.putStrLn inputB:
rint inputB: "\DC4\NAK\SYN\ETB\CAN\EM\SUB\ESC\FS\GS\RS\US !\"#$"
print inputT: "\1044\1045\1046\1047\1048\1049\1050\1051\1052\1053\1054\1055\1056\1057\1058\1059\1060"
B.putStrLn $ E.encodeUtf8 inputT: ДЕЖЗИЙКЛМНОПРСТУФ
T.putStrLn $ E.decodeUtf8 inputB:
rint $ E.decodeUtf8 inputB: "\DC4\NAK\SYN\ETB\CAN\EM\SUB\ESC\FS\GS\RS\US !\"#$"
print $ E.encodeUtf8 inputT: "\208\148\208\149\208\150\208\151\208\152\208\153\208\154\208\155\208\156\208\157\208\158\208\159\208\160\208\161\208\162\208\163\208\164"
Honestly, I don't know why I get the "rint" lines after the ByteString print lines that yield no output.
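A likely explanation for the empty-looking lines, and for the garbage in the question: with OverloadedStrings, a ByteString literal is packed the way Data.ByteString.Char8.pack does it, which truncates every Char to its low 8 bits. 'Д' (U+0414) therefore becomes the control byte 0x14 (DC4), which matches the print inputB output above. The sketch below reproduces that truncation with base only; truncateTo8Bits is a hypothetical name used for illustration.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Keep only the low 8 bits of each code point, mimicking what Char8-style
-- packing does to characters above U+00FF.
truncateTo8Bits :: String -> [Word8]
truncateTo8Bits = map (fromIntegral . ord)
```

For example, truncateTo8Bits "ДЕ" gives [20,21], i.e. DC4 and NAK. To get real UTF-8 bytes, encode a Text value with encodeUtf8 rather than writing Cyrillic text in a ByteString literal.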
Related
putStr and putStrLn messing with the output [duplicate]
This question already has an answer here: Wrong IO actions order using putStr and getLine (1 answer). Closed 1 year ago.
I was learning Haskell from the book Learn You a Haskell for Great Good. It contains this code:
import Control.Monad
import Data.Char
main = forever $ do
    putStr "Give me some input: "
    l <- getLine
    putStrLn $ map toUpper l
When I run this code in Git Bash, at first it just waits for input. After I type the input text and hit Enter (say the input was soham), it shows Give me some input: SOHAM.
Then I changed the code to
import Control.Monad
import Data.Char
main = forever $ do
    putStrLn "Give me some input: "
    l <- getLine
    putStrLn $ map toUpper l
and after running it shows Give me some input: and asks for input. After giving the same input, soham, it shows SOHAM.
Changing the code again to
import Control.Monad
import Data.Char
main = forever $ do
    putStr "Give me some input: "
    l <- getLine
    putStr $ map toUpper l
it just takes input again and again, and when I press the end-of-file key (Ctrl+C) it shows all the output one after another, side by side, but the outputs are like those of the original code. Why do such variations happen?
This is likely due to buffering: with LineBuffering, the output is flushed whenever a newline is written. This means that if you use putStr and the string itself does not contain a newline character '\n', the output is buffered until a newline is written, and only then does it reach the console.
You can set the mode to NoBuffering to write the content to the console immediately. You can change the buffering settings for stdout with hSetBuffering :: Handle -> BufferMode -> IO ():
import Control.Monad
import Data.Char
import System.IO
main = do
    hSetBuffering stdout NoBuffering
    forever $ do
        putStr "Give me some input: "
        l <- getLine
        putStrLn $ map toUpper l
Another option is to flush the buffer only after putStr with hFlush :: Handle -> IO (), and thus not change the buffering policy itself:
import Control.Monad
import Data.Char
import System.IO
main = forever $ do
    putStr "Give me some input: "
    hFlush stdout
    l <- getLine
    putStrLn $ map toUpper l
How to output progress information in spite of Haskell's laziness? [duplicate]
This question already has an answer here: GHCi and compiled code seem to behave differently (1 answer). Closed 1 year ago.
Today I want Haskell to behave like any imperative language. Look at this:
import Data.HashMap.Strict as HashMap
import Data.Text.IO
import Data.Text
import Data.Functor ((<&>))
putStr "Reading data from file ..."
ls <- lines <$> readFile myFile
putStrLn " done."
putStr "Processing data ..."
let hmap = HashMap.fromList $ ls <&> \l -> case splitOn " " l of
        [k, v] -> (k, v)
        _ -> error "expecting \"key value\""
putStrLn " done."
Basically, the user should know what the program is doing at the moment. The result of this code is the immediate output of
> Reading data from file ... done.
> Sorting data ... done.
... and then it starts doing the actual work, so the output defeats its purpose.
I am well aware that this is a feature. Haskell is declarative, and the order of evaluation is determined by actual dependencies, not by line numbers in my .hs file. Thus I tried the following approach:
putStr "Reading data from file ..."
lines <- lines <$> readFile myFile
putStrLn $ lines `seq` " done."
putStr "Processing data ..."
let hmap = HashMap.fromList $ ls <&> \l -> case splitOn " " l of
        [k, v] -> (k, v)
        _ -> error "expecting \"key value\""
putStrLn $ hmap `seq` " done."
The idea: seq only returns once its first argument has been evaluated to weak head normal form. And it works, kind of. The output of my program is now nothing for a while and then, once the work has been done, all the IO occurs. Is there a way out of this?
EDIT: I changed the question in reply to Ben's answer. The imports should now make more sense and the program really runs.
DanielWagner commented about this related question: GHCi and compiled code seem to behave differently, which indeed solves my problem. putStrLn $ hmap `seq` " done." does exactly what it's supposed to. I was only missing flushing stdout. So this actually does what I need:
putStr "Reading data from file ..."
hFlush stdout -- from System.IO
lines <- lines <$> readFile myFile
putStrLn $ lines `seq` " done."
putStr "Processing data ..."
hFlush stdout
let hmap = HashMap.fromList $ ls <&> \l -> case splitOn " " l of
        [k, v] -> (k, v)
        _ -> error "expecting \"key value\""
putStrLn $ hmap `seq` " done."
You haven't given us the actual code that you say has this behaviour:
The output of my program is now nothing for a while and then, once the work has been done, all the IO occurs.
How do I know it's not the code you're running? Your code doesn't compile, so it can't be run at all! A few problems:
- You get a type error from lines, because it's in the standard Prelude, but that version works on String and you're working with Text.
- You haven't imported splitOn from anywhere.
- The obvious splitOn to import is from Data.Text, but that has type Text -> Text -> [Text], i.e. it returns a list of Text, splitting at all occurrences of the separator. You're obviously expecting a pair, splitting only on the first separator.
So at the very minimum this is code you were running in GHCi after more imports/definitions that you haven't shown us. Changing it as little as I could to get it to run gave me this:
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.HashMap.Strict as HashMap
import qualified Data.Text.IO as StrictIO
import qualified Data.Text as Text
myFile = "data.txt"
main = do
    putStr "Reading data from file ..."
    lines <- Text.lines <$> StrictIO.readFile myFile
    putStrLn $ lines `seq` " done."
    putStr "Processing data ..."
    let hmap = HashMap.fromList $ Text.breakOn " " <$> lines
    putStrLn $ hmap `seq` " done."
I generated a very simple data file with 5,000,000 lines and ran the program with runhaskell foo.hs, and there are in fact noticeable pauses between the appearance of the reading/processing messages and the "done" appearing on each line.
I see no reason why all of the IO would be delayed and appear at once (including the result of the first putStrLn). How are you actually running this code (or rather, the full and/or different code that actually runs)? In the post you've written it as input for GHCi rather than a full program (judging by the imports and IO statements at the same level, with no do block or definition of any top-level functions).
The only thought I had is that perhaps your data file is much smaller such that the processing takes a barely perceptible amount of time, and the initial startup processing of the Haskell code itself by ghci or runhaskell is the only noticeable delay; then I can imagine there being a slight delay followed by the printing of all the messages seemingly at once.
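The pattern discussed in this thread (print a message, flush, force the work, then report completion) can be wrapped in a small helper. This is a minimal sketch using only base; withProgress is a hypothetical name, and evaluate forces its argument only to weak head normal form, just like seq, so deeply lazy structures may still be unevaluated afterwards.

```haskell
import Control.Exception (evaluate)
import System.IO (hFlush, stdout)

-- Print a label, flush stdout so it is visible immediately, force the value
-- to weak head normal form, then report completion.
withProgress :: String -> a -> IO a
withProgress label x = do
  putStr (label ++ " ...")
  hFlush stdout
  r <- evaluate x
  putStrLn " done."
  pure r
```

A call such as withProgress "Summing" (sum [1 .. 1000000 :: Int]) prints the label first, does the summation, and only then prints " done.".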
Handling exceptions (ExceptT) in chain of actions
I am trying to use an exception to skip parts of the code here. Instead of being caught by catchE so that normal behavior resumes, all following actions in the mapM_ chain get skipped. I looked at this question, and it appears that catchE ~ main and checkMaybe ~ intercept. I also checked the implementation of mapM_ to be sure it does what I want, but I don't understand how the Left value can escape dlAsset to affect the behavior of mapM_.
I refactored this from a version where I simply used an empty string as an exception marker for the failed lookup. In that version checkMaybe just returned a Right value immediately, and it worked (matching on "" to 'catch').
import Data.HashMap.Strict as HM hiding (map)
import qualified Data.ByteString.Lazy as BS
import qualified Data.ByteString.Char8 as BSC8
import qualified JSONParser as P -- my module
retrieveAssets :: (Text -> Text) -> ExceptT Text IO ()
retrieveAssets withName = withManager $ (lift ((HM.keys . P.assets) <$> P.raw) ) >>= mapM_ f
  where f = \x -> dlAsset x "0.1246" (withName x)
dlAsset :: Text -> Text -> Text -> ReaderT Manager (ExceptT Text IO) ()
dlAsset name size dest = do
    req <- lift $ (P.assetLookup name size <$> P.raw) >>= checkMaybe name >>= parseUrl . unpack -- lookup of a url
    res <- httpLbs req
    lift $ (liftIO $ BS.writeFile (unpack dest) $ responseBody res) `catchE` (\_ -> return ()) -- always a Right value?
  where
    checkMaybe name a = case a of
        Nothing -> ExceptT $ fmap Left $ do
            BSC8.appendFile "./resources/images/missingFiles.txt" $ BSC8.pack $ (unpack name) ++ "\n"
            putStrLn $ "lookup of " ++ (unpack name) ++ " failed"
            return name
        Just x -> lift $ pure x
(I had to reformat the code to make it somewhat readable here.)
Edit: I'd like to understand what actually happens here; that would probably help me more than knowing which part of the code is wrong.
The problem is that your call to catchE only covered the very last line of dlAsset. It needs to be moved to the left of the do-notation indentation level to cover all of the do notation.
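The difference in scope can be seen in a stripped-down sketch. It uses Control.Monad.Trans.Except from transformers; step, narrow and wide are hypothetical names standing in for the lookup and the two placements of catchE, not the code from the question.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (ExceptT, catchE, throwE)

-- Hypothetical stand-in for a lookup that may fail.
step :: Int -> ExceptT String IO Int
step n
  | n < 0     = throwE ("negative input: " ++ show n)
  | otherwise = pure (n * 2)

-- catchE guards only the final action, so a failure in 'step' still escapes.
narrow :: Int -> ExceptT String IO ()
narrow n = do
  _ <- step n
  lift (putStrLn "saved") `catchE` \_ -> pure ()

-- catchE wraps the whole block, so a failure in 'step' is handled here.
wide :: Int -> ExceptT String IO ()
wide n = body `catchE` \_ -> pure ()
  where
    body = do
      _ <- step n
      lift (putStrLn "saved")
```

With these definitions, runExceptT (mapM_ narrow [1, -1, 2]) stops at the second element with a Left, whereas the same traversal with wide processes all three, because each failure is absorbed before mapM_ sees it.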
Elegant haskell case/error handling in sequential monads
Because I oversimplified in my other question, I would like to give a clearer example here. How can I handle situations where I have to check for certain conditions sequentially without nesting multiple cases? By "sequentially" I mean getting a value (e.g. from stdin), checking this value for a certain condition and, depending on the outcome, getting another value, and so on. Example:
sequen :: IO String
sequen = do
    a <- getLine
    case a of
        "hi" -> do
            putStrLn "hello!"
            b <- getLine
            case b of
                "how are you?" -> do
                    putStrLn "fine, thanks"
                    return "nice conversation"
                _ -> return "error 2"
        _ -> return "error 1"
I know that there are better ways to write such a chat bot; it should just demonstrate the sequential nature of the problem. As you can see, with every nested case the code also gets indented deeper. Is there a way to better structure such code? I'm thinking of handling the "errors" in one place and describing the "success path" without the error handling distributed all over it.
Of course. This is precisely what EitherT was made for. You can get it from Control.Monad.Trans.Either in the eitherT package.
import Control.Monad.Trans.Class
import Control.Monad.Trans.Either
main = do
    e <- runEitherT $ do
        a <- lift getLine
        case a of
            "hi" -> lift $ putStrLn "hello!"
            _ -> left 1
        b <- lift getLine
        case b of
            "how are you?" -> lift $ putStrLn "fine, thanks!"
            _ -> left 2
        return "nice conversation"
    case e of
        Left n -> putStrLn $ "Error - Code: " ++ show n
        Right str -> putStrLn $ "Success - String: " ++ str
EitherT aborts the current code block whenever it encounters a left statement, and people typically use this to indicate error conditions.
The inner block's type is EitherT Int IO String. When you runEitherT it, you get IO (Either Int String). The Left type corresponds to the case where it failed with a left, and the Right value means it successfully reached the end of the block.
I wrote a series of posts a while back going over my own learnings of the Either and EitherT types. You can read it here: http://watchchrislearn.com/blog/2013/12/01/working-entirely-in-eithert/
I use the errors package to get a bunch of nice helpers around using EitherT (left and right functions, for instance, to return lifted versions of Left and Right).
By extracting your potential failure conditions into their own helpers, you can make the mainline of your code read totally sequentially, with no case statements checking results.
From that post, you can see how the runEitherT section is a sequential chunk of work; it just happens to have the failure mechanics of EitherT. Obviously this code is fairly contrived to show how MaybeT plays inside of EitherT as well. In real code it'd just be the story you were wanting to tell, with a single Left/Right at the end.
import Control.Error
import Control.Monad.Trans
-- A type for my example functions to pass or fail on.
data Flag = Pass | Error
main :: IO ()
main = do
    putStrLn "Starting to do work:"
    result <- runEitherT $ do
        lift $ putStrLn "Give me the first input please:"
        initialText <- lift getLine
        x <- eitherFailure Error initialText
        lift $ putStrLn "Give me the second input please:"
        secondText <- lift getLine
        y <- eitherFailure Pass (secondText ++ x)
        noteT ("Failed the Maybe: " ++ y) $ maybeFailure Pass y
    case result of
        Left val -> putStrLn $ "Work Result: Failed\n " ++ val
        Right val -> putStrLn $ "Work Result: Passed\n " ++ val
    putStrLn "Ok, finished. Have a nice day"
eitherFailure :: Monad m => Flag -> String -> EitherT String m String
eitherFailure Pass val = right $ "-> Passed " ++ val
eitherFailure Error val = left $ "-> Failed " ++ val
maybeFailure :: Monad m => Flag -> String -> MaybeT m String
maybeFailure Pass val = just $ "-> Passed maybe " ++ val
maybeFailure Error _ = nothing
Since you are necessarily in the IO monad, you are better off using the IO monad's error handling capabilities instead of stacking an error monad on top of IO. It avoids all of the heavy lifting:
import Control.Monad ( unless )
import Control.Exception ( catch )
import Prelude hiding ( catch )
import System.IO.Error ( ioeGetErrorString )
main' = do
    a <- getLine
    unless (a == "hi") $ fail "error 1"
    putStrLn "hello!"
    b <- getLine
    unless (b == "how are you?") $ fail "error 2"
    putStrLn "fine, thanks"
    return "nice conversation"
main = catch main' $ return . ioeGetErrorString
In this case, your errors are simply Strings, which are thrown by IO's fail as a userError. If you want to throw some other type, you will need to use throwIO instead of fail.
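To illustrate the last point, here is a minimal sketch that throws a custom exception type with throwIO and recovers it with try. ChatError, check and reply are hypothetical names, and the user input is passed in as arguments (an assumption made so the logic can be exercised without stdin).

```haskell
import Control.Exception (Exception, throwIO, try)

-- Hypothetical error type; any type with Show and an Exception instance
-- can be thrown with throwIO.
data ChatError = BadGreeting | BadQuestion
  deriving (Show, Eq)

instance Exception ChatError

-- Throw the given error unless the condition holds.
check :: ChatError -> Bool -> IO ()
check err ok = if ok then pure () else throwIO err

-- The success path reads sequentially; failures leave via exceptions.
reply :: String -> String -> IO String
reply a b = do
  check BadGreeting (a == "hi")
  check BadQuestion (b == "how are you?")
  pure "nice conversation"

-- Turn the exception back into a value at the edge of the program.
runReply :: String -> String -> IO (Either ChatError String)
runReply a b = try (reply a b)
```

Here runReply "hi" "how are you?" yields Right "nice conversation", while runReply "yo" "how are you?" yields Left BadGreeting.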
At some point the EitherT package was deprecated (though transformers-either offers a similar API). Fortunately there's an alternative to EitherT that doesn't even require installing a separate package.
The standard Haskell installation comes with the Control.Monad.Trans.Except module (from the transformers package, which is bundled with GHC), which behaves almost identically to EitherT. The resulting code is almost identical to the code in Gabriella Gonzalez's answer, but using runExceptT instead of runEitherT and throwE instead of left.
import Control.Monad.Trans.Class
import Control.Monad.Trans.Except
main = do
    e <- runExceptT $ do
        a <- lift getLine
        case a of
            "hi" -> lift $ putStrLn "hello!"
            _ -> throwE 1
        b <- lift getLine
        case b of
            "how are you?" -> lift $ putStrLn "fine, thanks!"
            _ -> throwE 2
        return "nice conversation"
    case e of
        Left n -> putStrLn $ "Error - Code: " ++ show n
        Right str -> putStrLn $ "Success - String: " ++ str
(Note that the aforementioned transformers-either package is in fact a wrapper for ExceptT designed for providing compatibility with code that still uses EitherT.)
Warning: fellow Haskell newbie answering. You can avoid this sort of staircasing with the Maybe monad. There is a good example at the start of this chapter. However, you'd want something similar with a monadic Either (presumably there is one), since you're returning error codes. The basic idea is that once you've got a "Left 1" error, any later steps are short-circuited (this follows from how the monad's bind is defined rather than from lazy evaluation).
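The monadic Either does exist: Either e has a Monad instance in base. A minimal pure sketch of the short-circuiting, with hypothetical helper names and the two inputs passed as arguments instead of being read from stdin:

```haskell
-- Hypothetical validation steps; each returns Left with an error code on failure.
checkGreeting :: String -> Either Int String
checkGreeting "hi" = Right "hello!"
checkGreeting _    = Left 1

checkQuestion :: String -> Either Int String
checkQuestion "how are you?" = Right "fine, thanks"
checkQuestion _              = Left 2

-- The first Left ends the block; later steps are never run.
conversation :: String -> String -> Either Int String
conversation a b = do
  _ <- checkGreeting a
  _ <- checkQuestion b
  pure "nice conversation"
```

conversation "yo" "how are you?" gives Left 1 without ever calling checkQuestion, and conversation "hi" "how are you?" gives Right "nice conversation".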
Idiomatic way to conditionally process IO in Haskell
I'm writing a little shell script in Haskell which can take an optional argument. However, if the argument is not present, I'd like to read a line from stdin to ask for a value. What would be the idiomatic way to do this in Haskell?
#!/usr/bin/env runhaskell
import Control.Applicative ((<$>))
import Data.Char (toLower)
import System.IO (hFlush, stdout)
import System.Environment (getArgs)
main :: IO ()
main = do
    args <- getArgs
    -- here should be some sort of branching logic that reads
    -- the prompt unless `length args == 1`
    name <- lowerCase <$> readPrompt "Gimme arg: "
    putStrLn name
lowerCase = map toLower
flushString :: String -> IO ()
flushString s = putStr s >> hFlush stdout
readPrompt :: String -> IO String
readPrompt prompt = flushString prompt >> getLine
Oh, and if there's a way to do it with something from Control.Applicative or Control.Arrow, I'd like to know. I've become quite keen on these two modules. Thanks!
main :: IO ()
main = do
    args <- getArgs
    name <- lowerCase <$> case args of
        [arg] -> return arg
        _ -> readPrompt "Gimme arg: "
    putStrLn name
This doesn't fit your specific use case, but the question title made me think immediately of when from Control.Monad. Straight from the docs:
when :: Monad m => Bool -> m () -> m ()
Conditional execution of monadic expressions.
Example:
main = do
    args <- getArgs
    -- arg <- something like what FUZxxl did..
    when (length args == 1) (putStrLn $ "Using command line arg: " ++ arg)
    -- continue using arg...
You can also use when's cousin unless in similar fashion.
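Combining the two answers, the branching can be pulled out into a helper that takes the argument list and a fallback action, with when used only for the optional message. This is a minimal sketch; argOrElse and report are hypothetical names, and the fallback would be something like readPrompt "Gimme arg: " from the question.

```haskell
import Control.Monad (when)

-- Use the single command-line argument if there is exactly one,
-- otherwise run the fallback action (for example, a prompt).
argOrElse :: IO String -> [String] -> IO String
argOrElse _        [arg] = pure arg
argOrElse fallback _     = fallback

-- Mention the argument only when it actually came from the command line.
report :: [String] -> String -> IO ()
report args name =
  when (length args == 1) (putStrLn ("Using command line arg: " ++ name))
```

In main this would be used as args <- getArgs, then name <- argOrElse (readPrompt "Gimme arg: ") args, followed by report args name.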