Perl6 (Rakudo) - How to handle special characters from file? - io

How can I read special characters from a external file ? Here a simple .txt file in French, which content is the first paragraph of https://fr.lipsum.com/ : as you can see on my screenshot, the file encoding is UTF-8 but the accents are not displayed correctly.
I tried various encodings within notepad++ and in my perl6 script, like these :
enc => "utf8"
enc => "latin1"
With Python or Ruby scripts I don't encounter the problem. I can't found any precise example about that matter, probably because perl 6 is still quite recent (??). Thank you.
My script as it is displayed in the screenshot :
my $text_contents = slurp "testfile.txt", enc => "utf8";
say $text_contents;
prompt;
Final edit : the solution is to enable an option, available in beta state with Windows 10 1803, to make the OS handle unicode characters properly : see answers and comments below ...

If you're not using Windows
This SO is either entirely or almost entirely irrelevant to you.
If you're using Windows 10
Check the "Beta: Use Unicode UTF-8 for worldwide language support" option checkbox.
At least at the time I originally wrote this answer, text near this Unicode related checkbox claimed it's for programs that do not support Unicode, but you should just ignore that.[1]
At the time I originally wrote this answer the checkbox was found under control panel, "Region" entry, "Administrative" tab, "Change system locale" button.
Microsoft may have changed this stuff since I wrote this answer, and may change it again, eg by moving and/or renaming the checkbox, or making things more involved than just clicking a single checkbox.
Per their comment below this answer, the OP notes:
For those who are interested in that particular option, it can be found in the "legacy" Control panel of windows -> Region -> Administrative -> Edit settings...
If you're using an older version of Windows
Arguably, the good news is that Raku and Rakudo have some of the world's best modern support for Unicode, and the OK news is that it relies on Microsoft correctly supporting Unicode, which they're now trying to do.
The bad news is that they made a lot of mistakes in older versions of Windows (and even in Windows 10, which they're now trying to fix), so any solution will be constrained by those mistakes. (Perhaps the biggest problem is Microsoft's doublespeak on the topic[1], but let's hope we can work around that.)
That all said, please read the following and then either return to searching for solutions or post a fresh SO question and we'll try to help.
Quoting Wikipedia's page Unicode in Microsoft Windows:
they are still in 2018 improving their operating system support for UTF-8
Microsoft got off on the wrong foot with their Unicode support last century. The good news is that they have at last begun digging their way out of the hole they dug for themselves and everyone else.
But they're definitely not there yet -- not at the time of originally writing this answer, and, I suspect not for another N years -- at least inasmuch as things don't work correctly out of the box for many end users. I think this is the root of most problems with Unicode on Windows.
Older languages like Python, Ruby and Perl came up with a range of hacks that hid the many problems with Microsoft's older UTF8 support from most users in simple scenarios by using what Microsoft ironically described as "Unicode support".
This has always come with the trade-off that things get very hairy or even completely unworkable for more complex applications in many locales around the world. (So much so that even the mighty Microsoft finally capitulated in 2018.)
In essence, until this new Microsoft effort to get with the program, software that ran on Windows has had no alternative but to either use their fundamentally broken "Unicode support" or to actually support Unicode properly.[1]
Raku and Rakudo focused on the latter, and problems with it when run on Windows are related to this conflicting with Microsoft's old broken approach. Fortunately Microsoft is now getting with the program and so we may be able find a way to get around problems you have with Unicode on Windows provided you are patient.
In particular, if you are using an older Windows version, please expect it to not work at first with modern Unicode aware software unless you are lucky. We'll still help if we can, but it'll likely involve you being patient with us and Microsoft and Rakudo and vice-versa.
Footnotes
[1] At the time I originally wrote this answer, there is text near the checkbox that it's for programs that do not support Unicode. This is entirely the opposite of what's really going on, but hey, it's Microsoft.

Related

TI-Basic Problems

I recently bought a TI-84 Plus CE, and have been making programs using TI-BASIC.
I'm trying to make a simple text editor, and I need to convert character codes to characters. However, it seems that the char() command doesn't exist?
Please Help!
I don't believe that 84+ TI-BASIC supports ascii in this way (though I know that 68k BASIC had the ord() command) but one thing you could do is store all the typeable glyphs into a string (see: prgmGLYPHS on TI-Basic Developer for example) and then use inString() and sub() to store/retrieve their values. It's not pretty, and it's not fast, but it works. Here's an example using only the uppercase letters:
:"ABCDEFGHIJKLMNOPQRSTUVWXYZ→Str1
:Input ">",Str2
:Disp inString(Str1,Str2
:Input ">",N
:Disp sub(Str1,N,1
Note: The following pertains to my experience with the TI-84+ SE. The 84+ CE runs on a newer processor than the zilog Z80, so YMMV:
I expect what you're doing is storing your text in a List. Another thing that might be more efficient and secure is storing your text as an AppVar. These are allocated blocks of RAM/ROM that you can read to/write to at will...as long as you have a library to do it. With the 84+ SE you needed to use Celtic3 (or Doors CS, which includes Celtic as a library) to do that. I haven't used the 84+ CE enough to tell you what exists there, as the assembly code is entirely different. According to this reddit post the best way to do that is use the C toolchain, but I don't have experience with this either.

Delphi caption is all in Chinese instead of English [duplicate]

We have large commercial app that we want to convert from Delphi 6 to 2010. Approx 10 3rd party component sets, all with source code... I have heard warnings about Unicode with 2010 - Does anyone have experience and or suggestions?
There are many resources available that you can read and that you will assist in the migration from Delphi 6 to Delphi 2009/2010 (Unicode).
You can use these articles as a guide.
Unicode Migration Statistics Tool (This utility will hopefully assist you in collecting useful statistics
on how hard (or not) it would be to migrate your older applications to
Unicode.)
Delphi 2009 and Unicode
Delphi 2009 strings explained by example
Upgrading a major project to Delphi 2009
Delphi and Unicode
Dr. Bob Delphi 2009 Unicode
Delphi 2009 - Unicode in Type Libraries
On Strings and Unicode in Delphi 2009
Delphi in a Unicode World Part I: What is Unicode, Why do you need it, and How do you work with it in Delphi?
Delphi in a Unicode World Part II: New RTL Features and Classes to Support Unicode
Delphi in a Unicode World Part III: Unicodifying Your Code
CodeRage 4 : Using Unicode and Other Encodings in your Programs
Bye.
You'll find some useful answers in these StackOverflow questions:
Move project from Delphi 3 to Delphi 2010
When and Why Should I Use TStringBuilder?
Convert function to delphi 2009/2010 (unicode)
Unicode problems with Delphi 2009 / 2010 and windows API calls
Also, for what it is worth, I purchased Marco Cantu's Delphi 2009 Handbook. It was all I needed to make a relatively smooth converstion from Delphi 4 to Delphi 2009 in only a few weeks.
I do, however, recommend that you ensure your 3rd party packages have a Delphi 2009 upgrade, or you may have some real difficulties. Converting your own code is one thing. Converting someone else's is another.
I use two 3rd party packages, both with source code. Both had upgrades available, and the developer of one of them wrote that he had a lot of trouble upgrading his very complex component to the Unicode of Delphi 2009. It took him a few months, but he completed it. And as a result, I had little trouble with my implementation of his component when I did my upgrade.
i've been in the same circumstance recently. you mostly need to pay attention to the "edges" of the app. INI files, file I/O, log files, etc. win API calls from delphi work since they've now connected the unicode API calls instead. check each 3rd party component set to make sure they're at least ready for Delphi 2009...better yet 2010. even my use of databases simply wasn't an issue...nearly everything worked right away. it just wasn't a big deal. anything that relies on the size of a character should be reviewed.
really the transition of most concern is 2007_or_earlier --> 2009_or_later.
there are plenty of discussions/blog entries about it. you could read, read, read...or you could get started & see what happens. (i did some of both). i'm sure there are "stack overflow" issues discussing your question. i'm not pretending to give a detailed description of what could happen.
it's simply not as scary as it sounds.
Approx 10 3rd party component sets, all with source code.
One thing I'd add is if the component doesn't support Delphi 2009/2010, don't try to upgrade it by hacking the code.
Following is what I posted on How do the new string types work in Delphi 2009/2010?:
See Delphi and Unicode, a white paper written by Marco Cantù and I guess
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), written by Joel.
One pitfall is that the default Win32 API call has been mapped to use the W (wide string) version instead of the A (ANSI) version, for example ShellExecuteA If your code is doing tricky pointer code assuming internal layout of AnsiString, it will break. A fallback is to substitute PChar with PAnsiChar, Char with AnsiChar, string with AnsiString, and append A at the end of Win32 API call for that portion of code. After the code actually compiles and runs normally, you could refactor your code to use string (UnicodeString).
I must add this article from Carey Jensen to others mentioned. It is labeled: "Delphi Unicode Migration for Mere Mortals: Stories and Advice from the Front Lines" (in english).
http://www.danysoft.com/free/delphiunicodemigration.pdf
As you can see in the title of it, you will find many experiences and hints and tips. I think it is the answer to your question. After carefully read it, you will sure knows what to do next. :)
Found in: http://www.danysoft.com/productos/migrar-aplicaciones-a-delphi-xe-o-cbuilder-xe.html
Another point to take care of, is the usage of Variant types with strings and the VarType function testing for strings: one needs to use varUString instead of varString.
Assuming AValue is of type Variant and has being assigned a Unicode string value, the following won't work:
if VarType(AValue) = varString then ...
and needs to be changed to
if VarType(AValue) = varUString then ...

CFileDialog getpathName not reading Japanese

I have a folder name in Japanese. CFileDialog getpathNameis returning some question marks when the folder is selected. Is there some way to solve it?
If your app is build with MBCS support rather than Unicode support, the japanese path will be handled correctly only if your "Language for non-Unicode programs" (aka system locale) is set to Japanese, which is the case for your Japanese users but might not be the case for you if you are not Japanese.
If your system locale is not Japanese, the path is translated to your codepage before it is returned by GetPathName(). It will either contain replacement (?) chars or garbage. Most likely a mix of both.
Here are a few possibilities available:
Don't do anything. Your app should work fine for Japanese most users. Or not...
Test your app under a Japanese codepage. To do so, either temporarily change your Language for non-Unicode programs (requires a reboot) or (much easier) test your app under AppLocale. (Note: Yes, it runs fine under Windows 7. This article may help if you have problems).
Switch it to Unicode. According to the size of your codebase, this can be a very tedious task mostly depending on inputs and outputs and whether you use _T("blah") string literals in your code. Of course, there are more aspects to it but these ones are the most important. BTW, all new projects should be done with Unicode support in mind.
Handle this path problem specifically. Since we're speaking of a file dialog, the whole dialog should be opened as Unicode. Which means you'll probably have to explicitely call the Unicode version of the underlying Win32 API rather than simply CFileDialog. It's not so complicated but the risk is that you are only solving the first problem in a row. After you have your Japanese path correctly, you'll have to deal with Japanese text input by user,... So I don't think this solution is a good one.
Solution #2 is certainly the quickest way to identify small issues. Solution #3 is for sure the best one on the long run. But make sure you actually need it because it may be tedious for existing apps.

An Emacs alternative which exposes a text model to a scriptable environment?

everybody out there!
I'm currently seeking for a technical solution to create a nice literate programming environment. Unfortunately, most editors are too much hard coded, and their functionalities just cover most famous needs, and can't cleanly cover special needs.
I came to Emacs (later after some others), but I also came to numerous troubles with Emacs (I will not talk about these, this is not the topic).
However, there is one thing I like with Emacs and which was indeed matching what I was looking for: it exposes a full text model to a scriptable environment, and the overall UI is designed so it is well suited to either graphical UIs or text UIs (because it is mostly text based). And last but not least, this is scriptable with a kind of LISP, and LISP indeed seems a good choice to me, in the area of text manipulation and interpretation.
I've searched the web for a text editor which would expose a full text model to a scriptable environment, but I have not found anything. I guess this is not an everyday request on the web, so it is probably better to ask some humans about it, better than to ask a robot.
I was, but in short, I'm looking for: an editor which exposes a full text model [*], and which exposes this model to a script engine (preferably LISP, but I would enjoy Python as well, or any others after all).
[*] Talking about text model, I mean: text attributes (optionally font face), text visibility, text read-write property, and text content iteration, at a level as low as the character basis.
Have a nice day! :)
--
Yannick Duchêne
JEdit seems to be very scriptable with Java, BeanShell, Jython and other languages targeting the JVM. Most of its functionality is implemented with OSGI plugins. If you really like LISP, maybe you could even try with Clojure! :-)
Emacs, Climacs, Portable Hemlock (and to some extent Hemlock).
I am sure there are other editors around that exposes a full text model to a script engine that are NOT in "the emacs family", but I don't know them.
Oh, yes, there's the VMS editor framework, but I cannot recall its name.
What Vatine said, plus there's a very minimal Scheme editor built into Fluxus, which I extended with Emacs key-bindings (in my personal copy), so I know it would work as something close to a stubbed implementation (if you rip out all the OpenGL stuff).
Edit:
Looks like I was working with fluxus-0.8, which doesn't even seem to be on the site anymore. If you end up needing to go that low-level to start, let me know and I'll send it over.
Not sure if this is useful, but there is a long list of Emacs-like editors: http://www.finseth.com/emacs.html
Btw., Craig A. Finseth also wrote a book on implementing an Emacs-like editor: http://www.finseth.com/craft/
The Book as PDF.
Report of an (unsuccessfully) ending quest :
Although a possible technical choice I could figure will not work for me (see later), I still point it here, if this can ever be useful to someone running UNIX-Like (I'm running Windows).
Context and state of the “ art ” : near to all (or all) so called Emacsen and Emacs clone, has nothing to compare with Emacs. They just mimics terms like major mode an minor mode, mimics key-bindings, and most of time also, the UI look and feel. But the core is not there. I've learned these are called “ Emacs Ersatz ”.
Disclaimer : for some reasons, I have not tested Climax and Hemlock, so the latter comment does not apply to these.
EFuns : the last one I came to, was EFuns, but unfortunately, I could not compile it on Windows (I suspect something is wrong with the sources, some directory are missing in the archive). Interested parties may get it here : EFuns, an Emacs-like scripted in OCaml. Fortunately for UNIX-Like users, binaries are provided (not for Windows).
Implementations List : to complete the list Rainer Joswig pointed to, here is another one, shorter, but more up-to-date : [ Sorry I can't post this link, it seems I'm not allowed to post more than one link - I'm sorry for interested parties (sad) ]

Open source spell check

Was evaluating adding spell check to a product I own. As per my research the major decisions that need to be made:
The library to use.
Dictionary( this can be region specific, British english, American etc).
Exclusion lists. Anytime a typo is detected its possible that its not a typo but is
verbiage specific to the user. At this point the users should be given the ability to
add this to his custom exclusion list.
Besides a per user custom list also a list of exclusion based on the user space of the
clients of the tool. That is terms/acronyms in the users work domain. For example FX will not be a typo for currency traders.
The open questions I had are listed below and if I could get input into them that would be very useful.
For 1, I was thinking of hunspell, whcih is the open source library offered under MPL and is used by firefox and OpenOffice family of products. Any horror stories out there using this?
Any grey areas with the licensing? The spell checking will happen on a windows client.
Dictionaries are available from a variety of sources some free under MPL while some are not. Any suggestions on good sources for free dictionaries.
Multi lingual support and what needs to be worked out to support them?
For 4, how are custom dictionaries kept in sync with the server side and the clientside? The spell check needs to happen on the clientside so are they pushed down with the initial launch everytime or are they synced up ever so often?
As already mentioned Hunspell is a state of the art spell checker. It is the Open Office, Thunderbird, Firefox and Google Chrome spell checker. Ports to all major programming languages are available. It works with the Open Office Directories, so a lot of languages are supported.
I've used Hunspell for a few things, and I don't really have any horror stories with it. I've only used it with English (American) though, but it claims to work with other languages.
As for licensing, it offers a choice of GPL, LGPL, and MPL. If you don't like the MPL, you can always choose to use the LGPL.
There are several pupular options that widely used: myspell, aspell. Check them.
http://en.wikipedia.org/wiki/MySpell
http://en.wikipedia.org/wiki/GNU_Aspell
Here is a good demonstration by Peter Norvig: I find this simple explanation much more intuitive. Follow the links in the doc as well for more indepth analysis.
http://norvig.com/spell-correct.html

Resources