Why are unpack and show defined differently in Data.Text (and behave differently for non-ASCII characters)?

unpack and show are two ways to convert Text to a String. They, however, behave and are defined differently for non-ASCII characters:
Prelude Data.Text> putStrLn $ unpack $ pack "你好我的朋友"
你好我的朋友
Prelude Data.Text> putStrLn $ show $ pack "你好我的朋友"
"\20320\22909\25105\30340\26379\21451"
I believe show returns a string of escaped code points, while unpack returns the actual characters. I have found this to be a nuisance while coding: I had defined functions that take a Show instance, wanted to pass in Text, and expected them to return the actual non-ASCII characters as a String.
What was the design intent for this behavior? Why were show and unpack defined differently?
The source can be found at http://hackage.haskell.org/packages/archive/text/0.11.1.5/doc/html/src/Data-Text.html.

This is a general thing about Show: it is intended to produce a kind of preview of objects that can double as a portable serialisation, readable as Haskell code. Obviously, 你好我的朋友 is not valid Haskell (unless you define it as a variable, which you actually can!), so it would not be acceptable as output of show. It would be quite OK if it produced "你好我的朋友" with the quotes (in fact, I would prefer that), but this might cause platform-related problems when you're not using UTF-8 consistently throughout your toolchain, so the safer expansion to ASCII escapes was chosen.
If you want the nice non-escaped plain-string output as the GHCi echo, you can use the new custom-pretty-printer feature. I already wrote something about that here.
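As a minimal sketch of both points, assuming the text package is available: the escaped output of show can be read back with read, while a small wrapper type can hand the unescaped characters to Show-based functions, at the cost of no longer producing valid Haskell syntax. The name Plain is made up for this illustration and is not part of Data.Text.

```haskell
import qualified Data.Text as T

-- Sample value: the first two characters of the string in the question.
greeting :: T.Text
greeting = T.pack "你好"

-- show escapes non-ASCII characters, so the result is an ASCII-only
-- string literal that is valid Haskell source.
shown :: String
shown = show greeting

-- Because the output is valid syntax, read recovers the original value.
roundTrips :: Bool
roundTrips = read shown == greeting

-- Hypothetical wrapper (not part of Data.Text): its Show instance returns
-- the raw characters, so it no longer round-trips through read.
newtype Plain = Plain T.Text

instance Show Plain where
  show (Plain t) = T.unpack t
```

A function that only knows about Show can then be given Plain someText instead of someText when the unescaped form is wanted.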

Related

Preprocessor for haskell source: is cpp the only option?

I can see from plenty of Q&As that cpp is the usual preprocessor for Haskell source; but that it isn't a good fit for the job. What other options are there?
Specifically:
Haskell syntax is newline-sensitive and space/indent-sensitive -- unlike C, so cpp just tramples on whitespace;
' in Haskell might surround a character literal, but also might be part of an identifier (in which case it won't be paired) -- but cpp complains if not a char literal;
\ gets a trailing space inserted -- which is not a terrible inconvenience, but I'd prefer not.
I'm trying to produce a macro that generates an instance from parameters for a newtype type and its corresponding data constructor. It needs to generate the instance head, the constraints and a method binding, just by slotting the constructors into an instance skeleton.
(Probably Template Haskell could do this; but it seems rather a large hammer.)
cpphs seems to be just about enough for my (limited) purposes. I'm adding this answer for the record; an answer suggesting cpphs (and some sensible advice to prefer Template Haskell) was here and then gone.
But there are some gotchas, which meant that at first sight I overlooked how it helped.
Without setting any options, it behaves too much like cpp to be helpful. At least:
It doesn't complain about unpaired '. Indeed you can #define dit ' and that will expand happily.
More generally, it doesn't complain about any nonsense input: it grimly carries on and produces some sort of output file without warning you about ill-formed macro calls.
It doesn't insert space after \.
By default, it smashes together multiline macro expansions, so tramples on whitespace just as much.
Its tokenisation seems to get easily confused between Haskell and C. Specifically, using C-style comments /* ... */ seems to upset not only those lines, but a few lines below. (I had a #define I wanted to comment out; I should have used Haskell-style comments {- ... -}, but then that appears in the output.)
The calling convention for macros is C style, not Haskell. myMacro(someArg) -- or myMacro (someArg) seems to work; but not myMacro someArg. So to embed a macro call inside a Haskell expression probably needs surrounding the lot in extra parens. Looks like (LISP).
A bare macro call on a line by itself myInstance(MyType, MyConstr) would not be valid Haskell. The dear beastie seems to get easily confused, and fails to recognise that's a macro call.
I'm nervous about # and ## -- because in cpp they're for stringisation and catenation. I did manage to define (##) = (++) and it seemed to work; magicHash# identifiers seemed ok; but I didn't try those inside macro expansion.
Remedies
(The docos don't make this at all obvious.)
Getting multi-line output from a multi-line macro definition, with spaces/indentation preserved (yay!), needs the option --layout. With it, my instance definition is validly expanded and indented.
If your tokenisation is getting confused, maybe --text will help: this will "treat input as plain text, not Haskell code" -- although it does still tolerate ' and \ better. (I didn't encounter any downsides from using --text -- the Haskell code seemed to get through unscathed, and the macros expanded.)
If you have a C-style comment that you don't want to appear in output, use --strip.
There's an option --hashes, which I imagine might interact badly with magicHash#.
The output file starts with a header line #line .... The compiler won't like that; suppress with --noline.
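For comparison, here is a minimal sketch of the same kind of instance-generating macro using GHC's own CPP extension rather than cpphs. It sidesteps the whitespace problem by writing the instance with explicit braces, so the expansion fits on one line and layout does not matter. The macro name UNWRAP_SHOW and the two newtypes are made up for illustration, and Show merely stands in for whatever class the real instances are for.

```haskell
{-# LANGUAGE CPP #-}

-- Hypothetical macro: T is the newtype, C its data constructor.
-- Explicit braces keep the expansion valid on a single line.
#define UNWRAP_SHOW(T, C) instance Show T where { show (C x) = show x }

newtype Age  = Age Int
newtype Name = Name String

-- Each call expands to a complete instance declaration.
UNWRAP_SHOW(Age, Age)
UNWRAP_SHOW(Name, Name)
```

The call syntax is still C style, with parentheses and commas, so this only avoids the layout issue and not the other gotchas listed above.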
I would say that Template Haskell is the most perfect tool for this purpose. It is the standard set of combinators for constructing correct Haskell source code. After that there is GHC.Generics, which might allow you to write a single instance that would cover any type which is an instance of Generic.
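A rough sketch of the GHC.Generics route, under the assumption that the instances only need to reach the single value wrapped by the newtype. The class GShowInner, the function showInner and the example newtypes are invented for this illustration; showing the wrapped value stands in for whatever the real method does.

```haskell
{-# LANGUAGE DeriveGeneric    #-}
{-# LANGUAGE FlexibleContexts #-}

import GHC.Generics

-- Generic worker: walks the representation down to the single wrapped field.
class GShowInner f where
  gShowInner :: f p -> String

-- Skip the metadata wrappers for the datatype, constructor and field.
instance GShowInner f => GShowInner (M1 i c f) where
  gShowInner (M1 x) = gShowInner x

-- The wrapped field itself: defer to its own Show instance.
instance Show a => GShowInner (K1 i a) where
  gShowInner (K1 x) = show x

-- Works for any single-constructor, single-field type that derives Generic.
showInner :: (Generic a, GShowInner (Rep a)) => a -> String
showInner = gShowInner . from

-- Hypothetical newtypes standing in for the asker's types.
newtype Age  = Age Int     deriving Generic
newtype Name = Name String deriving Generic
```

With this in place each new newtype needs only deriving Generic; no per-type macro call or splice is required.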

Meaning of single colon in Haskell :t

As far as I've been able to gather, the single colon in Haskell is used in list comprehension. Why then does it show up in the :t command? Also in the :quit command? There isn't any list comprehension being done, is there?
The :t (short for :type) syntax is special to GHCi, and is not part of the Haskell language syntax. This is similar to how the SQLite interpreter accepts .tables as a command, even though this isn't a valid SQL statement. If you type :?, you can see a complete list of all the commands GHCi understands.
As for using the colon in actual Haskell code:
A colon by itself is a list constructor. This is a reserved name, and can never be redefined.
You should know that function names always start lowercase, while constructor names always start uppercase. Well, in a similar way, an infix constructor must start with a colon, whereas a normal infix operator must not start with a colon (but may contain colons elsewhere).
So, for example, "?:?" is a legal operator name, and :?? is a legal constructor operator name.
x ?:? y = ...whatever...
data Foobar = Int :?? Bool
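The two declarations above can be filled out into a small loadable sketch. The body of ?:? and the helper functions count and firstOr are arbitrary placeholders chosen for this illustration.

```haskell
-- An infix constructor: its name must start with a colon.
data Foobar = Int :?? Bool

-- An ordinary infix operator: it must not start with a colon, but may
-- contain one. The body is an arbitrary placeholder.
(?:?) :: Int -> Int -> Int
x ?:? y = x * 10 + y

-- Infix constructors can be used in patterns...
count :: Foobar -> Int
count (n :?? flag) = if flag then n else 0

-- ...in the same way as the built-in list constructor (:).
firstOr :: a -> [a] -> a
firstOr _   (x : _) = x
firstOr def []      = def
```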

Haskell the function 'main' is not defined?

Here is my basic program, but it states the function 'main' is not defined in module 'Main' how can I fix this?
here is my program
main = do
-- variable
a <- getLine
putStrLn a
Your code is missing indentation. Haskell uses indentation to figure out where a block ends.
main = do
    a <- getLine
    putStrLn a
Above is the proper indented form of your code; you should probably read the article here which explains it far better than I.
This error message means simply that the compiler didn't find a definition of your function main.
To run your compiled program, rather than interact with it in ghci (which I'd recommend you do as a beginner), you need a definition of main with type IO ().
If you don't give your module a name, it automagically does the equivalent of inserting module Main where at the top of your file.
I can't think of any way to produce this error other than to
accidentally comment out main with -- or {- other comment syntax -}
spell the word main incorrectly
accidentally compile an empty file.
(
Although your question appears to show incorrect indentation, that's because this site does not treat tabs as 8 characters wide. I suspect you indented the main by four spaces to get it to format as code in your question. In any case the compiler didn't give an error message consistent with an indentation error.
I'd like to recommend you use spaces rather than tabs for indentation, as it's unfailingly irritating to have to debug the whitespace of your program.
Most editors can be configured to turn a tab key press into an appropriate number of spaces, giving you the same line-it-up functionality with none of the character count discrepancies.
)

How to make Haskell or ghci able to show Chinese characters and run Chinese characters named scripts?

I want to make a Haskell script to read files in my /home folder. However there are many files named with Chinese characters, and Haskell and Ghci cannot manage it. It seems Haskell and Ghci aren't good at displaying UTF-8 characters.
Here is what I encountered:
Prelude> "让Haskell或者Ghci能正确显示汉字并且读取汉字命名的文档"
"\35753Haskell\25110\32773Ghci\33021\27491\30830\26174\31034\27721\23383\24182\19988\35835\21462\27721\23383\21629\21517\30340\25991\26723"
Prelude> putStrLn "\35753Haskell\25110\32773Ghci\33021\27491\30830\26174\31034\27721\23383\24182\19988\35835\21462\27721\23383\21629\21517\30340\25991\26723"
让Haskell或者Ghci能正确显示汉字并且读取汉字命名的文档
GHC handles unicode just fine. These are the things you should know about it:
It uses your system encoding for converting from byte to characters and back when reading from or writing to the console. Since it did the conversion from bytes to characters properly in your example, I'd say your system encoding is set properly.
The show function on String has a limited output character set: characters outside ASCII are written as numeric escapes. The show function is used by GHCi to print the result of evaluating an expression, and by the print function to convert the value passed in to a String representation.
The putStr and putStrLn functions are for actually writing a String to the console exactly as it was provided to them.
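The points above can be sketched as follows. The names sample, echoed, printEscaped and printRaw are made up for this illustration, and forcing UTF-8 on stdout is an assumption that only matters when the system encoding is not already UTF-8.

```haskell
import System.IO (hSetEncoding, stdout, utf8)

-- First part of the string from the question.
sample :: String
sample = "让Haskell"

-- What GHCi echoes for an expression: show escapes non-ASCII characters.
echoed :: String
echoed = show sample

-- print is putStrLn composed with show, so it emits the escaped form too.
printEscaped :: String -> IO ()
printEscaped = print

-- putStrLn writes the characters themselves. Setting the handle encoding
-- is an assumption of this sketch; by default GHC uses the system encoding.
printRaw :: String -> IO ()
printRaw s = do
  hSetEncoding stdout utf8
  putStrLn s
```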
Thanks to Carl, I used putStrLn as a wrapper around my function:
ghci> let removeNonUppercase st = [c | c <- st, c `elem` ['А'..'Я']]
ghci> putStrLn (removeNonUppercase "Ха-ха-ха! А-ха-ха!")
ХА
Everything works fine!

GHCi usage question

I am studying Haskell and use Emacs+Haskell mode as my editor.
After playing with some simple expressions in GHCi, I am wondering whether the following IDE/editor functionality, which exists in Visual Studio for F#, is available:
Can I send the content of the clipboard to the interpreter? Currently I can only :load the file into the interpreter. This is inconvenient when I gradually write functions in a script file. I am after something like 'Alt+Enter' in Visual Studio.
After compiling, I hope to see the signature of the function, e.g.
let double x = x + x
so that I can better understand the type inference mechanism in Haskell.
On Windows, there's WinGHCi, a gui including (poor, but often sufficient) support for copy and paste. Dunno about the command line version.
Use :type double (or the shortcut :t double) to get the type signature of double. There's also :info which applies to values (including functions) as well as types and typeclasses (e.g. :info Bool lists the definition of Bool and all typeclasses it is an instance of) and says where it was defined.
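For the double function from the question, the type that :t reports can also be written out as an explicit signature in a source file. A minimal sketch, where doubledInt and doubledDouble are illustrative names that are not from the question:

```haskell
-- The signature GHC infers, written out; :t double reports the same type.
double :: Num a => a -> a
double x = x + x

-- The same definition used at two different numeric types.
doubledInt :: Int
doubledInt = double 21

doubledDouble :: Double
doubledDouble = double 1.5
```

Loading the file with :load and then asking :t double should report the general Num-constrained type rather than a specific one such as Int.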
Regarding question 2, to see the inferred type of an expression every time you type one in, you can run :set +t inside ghci. I think you could also put that in a .ghci file in your home directory, as described in http://www.haskell.org/ghc/docs/6.12.2/html/users_guide/ghci-dot-files.html .
As far as I know, there is no support for sending the clipboard to the interpreter "out of the box", but it should not take more than a couple of lines of elisp. If I were you, I'd look at the support modes for other languages and copy it from there.
Regarding the types, you could type C-c C-t or C-c C-i on any symbol in your code, which would trigger the ":t <symbol>" and ":i <symbol>" commands in the ghci process.
TAIM claims to send selected expressions in vim to ghci (I haven't tried it).
I'm not sure about function signatures inside the editor, but in ghci it's ":t func".
Actually, looking at their YouTube video, it looks like TAIM may be able to select ":t func" in vim and send it to the interpreter.
