System.Directory.getDirectoryContents unicode support - haskell

The following code prints something like °Ð½Ð´Ð¸Ñ-ÐÑпаниÑ
getDirectoryContents "path/to/directory/that/contains/files/with/nonASCII/names"
  >>= mapM_ putStrLn
It looks like a GHC bug, and it has already been fixed in the repository. But what can I do until everybody upgrades GHC?
The last time I ran into this kind of problem (a few years ago, by the way), I used the utf8-string package to convert strings, but I don't remember how I did it, and GHC's Unicode support has changed noticeably in recent years.
So, what is the best (or at least a working) way to get directory contents with full Unicode support?
ghc version 7.0.4
locale en_US.UTF-8

Here's a simple workaround using decodeString and encodeString from utf8-string.
import System.Directory
import qualified Codec.Binary.UTF8.String as UTF8

main = do
  getDirectoryContents "." >>= mapM_ (putStrLn . UTF8.decodeString)
  putStrLn "------------"
  readFile (UTF8.encodeString "brøken-file-nåme.txt") >>= putStrLn
Output:
.
..
brøken-file-nåme.txt
Broken.hs
------------
hello
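For illustration only, here is a base-only sketch of what the decodeString step in this workaround does, assuming (as the garbled output in the question suggests) that getDirectoryContents on GHC 7.0 hands back one Char per raw UTF-8 byte. decodeUtf8Bytes is a hypothetical name, not part of any library, and it skips checks a real decoder performs (overlong encodings are not rejected), so utf8-string remains the better choice. On GHC versions that already decode file names (the fix the question mentions), applying it would garble non-ASCII names.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)

-- Hypothetical helper: decode a String holding one Char per raw UTF-8 byte.
-- Malformed or truncated sequences become U+FFFD; overlong forms are
-- not rejected, so this is a sketch rather than a validating decoder.
decodeUtf8Bytes :: String -> String
decodeUtf8Bytes [] = []
decodeUtf8Bytes (c : cs)
  | b < 0x80 = c : decodeUtf8Bytes cs          -- ASCII byte
  | b < 0xC0 = '\xFFFD' : decodeUtf8Bytes cs   -- stray continuation byte
  | b < 0xE0 = multi 1 (b .&. 0x1F)            -- lead byte of a 2-byte sequence
  | b < 0xF0 = multi 2 (b .&. 0x0F)            -- lead byte of a 3-byte sequence
  | b < 0xF8 = multi 3 (b .&. 0x07)            -- lead byte of a 4-byte sequence
  | otherwise = '\xFFFD' : decodeUtf8Bytes cs  -- invalid lead byte
  where
    b = ord c
    isCont k = ord k .&. 0xC0 == 0x80
    -- Combine the lead bits with n continuation bytes (6 payload bits each).
    multi n lead
      | length conts == n && all isCont conts && cp <= 0x10FFFF =
          chr cp : decodeUtf8Bytes rest
      | otherwise = '\xFFFD' : decodeUtf8Bytes cs
      where
        (conts, rest) = splitAt n cs
        cp = foldl (\acc k -> (acc `shiftL` 6) .|. (ord k .&. 0x3F)) lead conts

main :: IO ()
main = print (decodeUtf8Bytes "br\195\184ken-file-n\195\165me.txt")
```

Because print uses show, the result appears with decimal escapes, as "br\248ken-file-n\229me.txt", which avoids depending on the terminal's locale.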

I would recommend looking at system-filepath, which provides an abstract datatype for representing filepaths. I've used it extensively for some internal code and it works wonderfully.

Related

Split a string by a chosen character in haskell

I'm trying to split a string every time there is a chosen character. So if I receive "1,2,3,4,5", and my chosen character is "," the result is a list such as ["1","2","3","4","5"].
I've been looking through the already answered questions here, and they point me to using splitOn. However, when I try to import Data.List.Split in order to use it, Haskell gives me the following error: Could not find module ‘Data.List.Split’. When I tried to just use splitOn without importing the module, it showed me Variable not in scope: splitOn.
So my questions are,
Is it normal that I'm getting this error? Is splitOn a viable option, or should I just try something else?
What other simple solutions are there?
I can just write something that will do this for me but I'm wondering why I'm not able to import Data.List.Split and if there are other simpler options out there that I'm not seeing. Thank you!
If you're using GHC, it comes with the standard Prelude, the modules in the base package, and perhaps a few other packages.
Most packages, like the split package (which contains the Data.List.Split module), aren't part of GHC itself. You'll have to add them as dependencies in an explicit step. This is easiest done with a build tool; most Haskellers use either Cabal or Stack.
With Stack, for example, you can add the split package to your package.yaml file:
dependencies:
- base >= 4.7 && < 5
- split
You can also load an extra package when you use Stack to start GHCi, e.g. ‘stack ghci --package split’. This is useful for ad-hoc experiments.
‘Data.List.Split’ is not in the Prelude or in base; the split package that provides it needs to be installed as a dependency.
The install command depends on the environment you are using:
‘stack install split’ for stack
‘cabal install split’ for cabal
Basically this is a job for foldr, so you can simply do:
λ> foldr (\c (s:ss) -> if c == ',' then "":s:ss else (c:s):ss) [""] "1,2,3,42,5"
["1","2","3","42","5"]
So:
splitOn :: Char -> String -> [String]
splitOn x = foldr (\c (s:ss) -> if c == x then "":s:ss else (c:s):ss) [""]
However, this gives reasonable but perhaps unwanted results when the string starts or ends with the separator:
λ> splitOn ',' ",1,2,3,42,5,"
["","1","2","3","42","5",""]
In this particular case it might be nice to trim the unwanted characters off the string in advance. In Haskell this functionality is conventionally called dropAround (Data.Text provides one for Text); for String we can define it ourselves:
import Data.List (dropWhileEnd)

dropAround :: (Char -> Bool) -> String -> String
dropAround b = dropWhile b . dropWhileEnd b
λ> dropAround (==',') ",1,2,3,42,5,"
"1,2,3,42,5"
Accordingly:
λ> splitOn ',' . dropAround (==',') $ ",1,2,3,42,5,"
["1","2","3","42","5"]
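Pulling the foldr answer together into one self-contained file may make it easier to experiment with. The sketch below also adds an alternative built on break, which is a different technique from the answers above (explicit recursion, in the same spirit as the Prelude's lines); splitOnBreak is a hypothetical name. Both versions treat leading and trailing separators the same way, producing empty chunks that dropAround can remove first.

```haskell
import Data.List (dropWhileEnd)

-- The foldr-based splitter from the answer: a separator starts a new chunk,
-- any other element is consed onto the current (first) chunk.
splitOn :: Eq a => a -> [a] -> [[a]]
splitOn sep = foldr step [[]]
  where
    step c acc@(s : ss)
      | c == sep = [] : acc
      | otherwise = (c : s) : ss
    step _ [] = [[]] -- unreachable: the accumulator is never empty

-- Trim elements matching the predicate from both ends.
dropAround :: (a -> Bool) -> [a] -> [a]
dropAround p = dropWhile p . dropWhileEnd p

-- Hypothetical alternative: repeatedly 'break' at the next separator.
splitOnBreak :: Eq a => a -> [a] -> [[a]]
splitOnBreak sep xs = case break (== sep) xs of
  (chunk, []) -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnBreak sep rest

main :: IO ()
main = do
  print (splitOn ',' "1,2,3,42,5")
  print (splitOn ',' (dropAround (== ',') ",1,2,3,42,5,"))
  print (splitOnBreak ',' "1,2,3,42,5")
```

Each of the three lines prints ["1","2","3","42","5"].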

In TemplateHaskell, how do I figure out that an imported module has been renamed?

I am writing a bit of Template Haskell for stringing together QuickCheck-style specifications. I require every module containing properties to export a symbol called axiom_set. Then my checkAxioms function finds all the axiom_set symbols from the modules imported in the module where I call checkAxioms.
import Control.Monad (forM_)
import Language.Haskell.TH
import Language.Haskell.TH.Syntax

checkAxioms :: DecsQ
checkAxioms = do
  ModuleInfo ms <- reifyModule =<< thisModule
  forM_ ms $ \(Module _ m) ->
    runIO . print =<< lookupValueName (modString m ++ ".axiom_set")
  return []
The above code should find all the imported axiom_set symbols. However, if Module.Axioms defines axiom_set but I import it under an alias, as follows:
import Module.Axioms as MA
my code can't find MA.axiom_set. Any advice?
I don't think there's a way to do that. This seems to be a limitation of TemplateHaskell.
It kind of makes sense to have only fully qualified names in ModuleInfo's list of imported modules, but the fact that we can't use those fully qualified names in lookupValueName is bad. I think we need a variant of lookupValueName or lookupName that takes a Module as an argument.
I suggest opening an issue on the GHC issue tracker (https://ghc.haskell.org/trac/ghc/newticket). We have ongoing work to improve TH for the next major release; part of that work is improving the package documentation, exported functions, etc., and this could be one of the improvements.

Error reading and writing same file simultaneously in Haskell

I need to modify a file in-place. So I planned to read file contents, process them, then write the output to the same file:
import Data.Char (toUpper)

main = do
  input <- readFile "file.txt"
  let output = map toUpper input
  -- putStrLn $ show $ length output
  writeFile "file.txt" output
But the problem is, it works as expected only if I uncomment the putStrLn line, where I just output the number of characters to the console. If I leave it commented out, I get
openFile: resource busy (file is locked)
Is there a way to force reading of that file?
The simplest thing might be strict ByteString IO:
import qualified Data.ByteString.Char8 as B
import Data.Char (toUpper)

main = do
  input <- B.readFile "file.txt"
  B.writeFile "file.txt" $ B.map toUpper input
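As you can see, it's the same code, just with some functions replaced by their ByteString versions. B.readFile reads the whole file strictly and closes it before returning, so B.writeFile can then open it.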
Lazy IO
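The problem that you're running into is that some of Haskell's IO functions, including readFile, use "lazy IO", which has surprising semantics: readFile returns immediately and only reads the file as the string is consumed, keeping the file open until the end is reached. Since nothing consumes input before writeFile runs, the file is still open for reading, and GHC's file locking (many readers or one writer) refuses to open it for writing. Uncommenting the length line forces the whole file to be read, which closes it. In almost every program I would avoid lazy IO.
These days, people are looking for replacements for lazy IO, such as conduit, and lazy IO is seen as an ugly hack that unfortunately is stuck in the standard library.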
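If adding bytestring is not wanted, a base-only sketch of the same fix is to force the lazy String all the way to the end before writing: reaching end of file closes the handle that readFile opened, so writeFile can then open the same path. upcaseInPlace is a hypothetical name. The whole file ends up in memory, which is fine for small files; recent versions of base (4.15 and later) also offer System.IO.readFile', which performs the strict read directly.

```haskell
import Control.Exception (evaluate)
import Data.Char (toUpper)

-- Hypothetical helper: upper-case a file in place using only base.
upcaseInPlace :: FilePath -> IO ()
upcaseInPlace path = do
  input <- readFile path
  -- Consume the lazy String to EOF; this closes readFile's handle
  -- and releases its read lock before we reopen the file.
  _ <- evaluate (length input)
  writeFile path (map toUpper input)

main :: IO ()
main = do
  writeFile "file.txt" "hello, world\n"
  upcaseInPlace "file.txt"
  readFile "file.txt" >>= putStr
```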

Syntax: What does $$ mean in Haskell?

"Ugh," you might think... "Another syntax question, here let me google that for you noob." But alas! I have googled it, and I am still stumped!
I found it in this code from the Yesod blog:
import System.IO
import Data.Enumerator
import Data.Enumerator.Binary

main =
  withFile "output.txt" WriteMode $ \output ->
    run_ $ enumFile "input.txt" $$ iterHandle output
However the "$$" operator is new to me. The Haskell 2010 report only mentions it once as an operator symbol. What does it do?
In Haskell, operators like $$ are not part of the syntax; they are ordinary user-definable functions. Hence, you need to look up the API documentation of the library that defines them. In your example, $$ comes from the enumerator package (imported via Data.Enumerator) rather than from Yesod itself, and it is documented in that module's Haddock pages.
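To illustrate that $$ is just a name, here is a toy sketch that defines its own ($$) together with a fixity declaration. This definition is hypothetical and unrelated to the enumerator package's ($$); it only shows that any run of symbol characters can be bound like a function and also used in prefix form.

```haskell
-- A toy operator named ($$). This is NOT the enumerator package's ($$).
infixl 6 $$

-- Append a decimal digit to a number: 1 $$ 2 == 12.
($$) :: Int -> Int -> Int
acc $$ digit = acc * 10 + digit

main :: IO ()
main = do
  print (1 $$ 2 $$ 3) -- infixl 6, so this parses as (1 $$ 2) $$ 3 and prints 123
  print (($$) 4 2)    -- an operator in parentheses is used prefix; prints 42
```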
There's Hoogle, which is pretty good but unfortunately doesn't know many packages.
Hayoo knows much more, but its interface seems quirky, and it doesn't seem to offer a command-line tool like hoogle does.
If you have an idea what package you're dealing with, you can go directly to its documentation, e.g. the docs of the enumerator package, with the module list at the bottom. Also, these docs always have an index and let you view the source code via the source links.
As a last resort, use cabal unpack enumerator and grep through the code.
Just use Hoogle, and be sure to tell it which packages you are using; it works fine.
http://haskell.org/hoogle/?hoogle=%28%24%24%29+%2Benumerator

Need a tutorial for using GHC to parse and typecheck Haskell

I'm working on a project for analyzing Haskell code. I decided to use GHC to parse the source and infer types rather than write my own code to do that. Right now, I'm slogging through the Haddock docs, but it's slow going. Does anyone know of a good tutorial?
EDIT: To clarify, I'm not looking for something like hlint. I'm writing my own tool to analyze the runtime characteristics of Haskell code, so it's like I'm writing a different hlint. What I'm looking for is basically an expansion of the wiki page GHC As a library.
Ah! I found a much better entry point into the docs at:
http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-6.12.1/GHC.html
I updated the wiki page with this example:
Here we demonstrate calling parseModule, typecheckModule, desugarModule, getNamesInScope, and getModuleGraph. This works for the Haskell Platform with ghc-6.12.1.
Known bug: libdir is hardcoded. The ghc-paths package (GHC.Paths.libdir, commented out in the imports below) avoids this.
--A.hs
--invoke: ghci -package ghc A.hs
import GHC
import Outputable
--import GHC.Paths ( libdir )
import DynFlags ( defaultDynFlags )

libdir = "/usr/local/lib/ghc-6.12.1"
targetFile = "B.hs"

main = do
  res <- example
  print $ showSDoc ( ppr res )

example =
  defaultErrorHandler defaultDynFlags $ do
    runGhc (Just libdir) $ do
      dflags <- getSessionDynFlags
      setSessionDynFlags dflags
      target <- guessTarget targetFile Nothing
      setTargets [target]
      load LoadAllTargets
      modSum <- getModSummary $ mkModuleName "B"
      p <- parseModule modSum
      t <- typecheckModule p
      d <- desugarModule t
      l <- loadModule d
      n <- getNamesInScope
      c <- return $ coreModule d
      g <- getModuleGraph
      mapM showModule g
      return $ (parsedSource d, "\n-----\n", typecheckedSource d)

--B.hs
module B where
main = print "Hello, World!"
Adam, this is pretty tough sledding. Ever since its launch in 2006, the GHC API has been somewhat underdocumented. What I would recommend is to try to find some small applications that have been written using the GHC API. The right place to ask is probably the GHC users' mailing list.
One such program is ghctags, which ships with the GHC source tree. I wrote the original version, but I can't recommend it: there are so many footprints on the code that I can no longer follow it. The best I can say is that although it's hard to follow, it's at least small and hard to follow, and much simpler than all of GHC.
If parsing is the most important thing, I recommend haskell-src-exts. It parses all of Haskell 98 and a whole pile of extensions, and it is very easy to use.
The Haskell wiki and GHC documentation probably have what you are looking for if you search for the articles. Another tool you might be interested in is hlint, for checking the syntax and other things about your source code.
