I'm on Linux, and I just started using Dyalog APL. However, I would like to use the Control key instead of the Super key to input special characters. How can I do this?
It depends on your exact platform, so a full answer is too long to embed here. It is all explained in detail at apl.wiki/Typing_glyphs_on_Linux.
I recently bought a TI-84 Plus CE, and have been making programs using TI-BASIC.
I'm trying to make a simple text editor, and I need to convert character codes to characters. However, it seems that the char() command doesn't exist?
Please help!
I don't believe that 84+ TI-BASIC supports ASCII conversion in this way (though I know that 68k BASIC had the ord() command), but one thing you could do is store all the typeable glyphs in a string (see prgmGLYPHS on TI-Basic Developer, for example) and then use inString() and sub() to convert between characters and their positions. It's not pretty, and it's not fast, but it works. Here's an example using only the uppercase letters:
:"ABCDEFGHIJKLMNOPQRSTUVWXYZ→Str1
:Input ">",Str2
:Disp inString(Str1,Str2
:Input ">",N
:Disp sub(Str1,N,1
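With this in place, typing B at the first prompt should display 2, and entering 2 at the second prompt should display B. Note that inString( returns 0 when the character isn't found in Str1, so a real editor would want to handle that case.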
Note: The following pertains to my experience with the TI-84+ SE. The 84+ CE runs on a newer processor (the eZ80) than the Zilog Z80, so YMMV:
I expect what you're doing is storing your text in a List. Another thing that might be more efficient and secure is storing your text as an AppVar. These are allocated blocks of RAM/ROM that you can read from and write to at will... as long as you have a library to do it. With the 84+ SE you needed to use Celtic3 (or Doors CS, which includes Celtic as a library) to do that. I haven't used the 84+ CE enough to tell you what exists there, as the assembly code is entirely different. According to this reddit post the best way to do that is to use the C toolchain, but I don't have experience with this either.
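For what it's worth, here is a hedged sketch of the AppVar approach using the CE C toolchain's fileioc library (the function names and signatures are as I recall them from that toolchain's fileioc.h, so double-check against the current documentation). This is C, not TI-BASIC, so it only applies if you go down the toolchain route mentioned above:

/* Hedged sketch: writing and reading an AppVar with the CE C toolchain's
   fileioc library. Verify names/signatures against the toolchain docs. */
#include <stdint.h>
#include <fileioc.h>

int main(void) {
    const char text[] = "HELLO FROM AN APPVAR";
    char buffer[sizeof(text)];
    uint8_t handle;

    /* Create (or overwrite) an AppVar named "NOTES" and write to it. */
    handle = ti_Open("NOTES", "w");
    if (handle == 0) return 1;
    ti_Write(text, sizeof(text), 1, handle);
    ti_Close(handle);

    /* Read it back. */
    handle = ti_Open("NOTES", "r");
    if (handle == 0) return 1;
    ti_Read(buffer, sizeof(buffer), 1, handle);
    ti_Close(handle);

    return 0;
}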
I'm trying to implement Uniscribe for Linux to display complex text, such as Arabic. It seems like a huge task.
What I need to do is to implement the APIs below:
Check if the string is Complex.
Get string width.
Get next segment.
Get next break.
...
I want to use/refer to open source code to do it.
I have been reading the HarfBuzz source code for weeks, but couldn't find APIs for these. Is it feasible to use ONLY HarfBuzz to implement them?
It seems I should use Pango, but I can't do it due to its license. Is there any substitution? MIT license is OK.
Is ICU helpful to me?
Check out https://github.com/HOST-Oman/libraqm as the smallest thing that might meet your needs.
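As a point of reference on the "ONLY HarfBuzz" question: HarfBuzz by itself shapes a single run, so something like "get string width" is straightforward, but it does not itemize mixed-script text or compute segment/line breaks; that is the gap libraqm (or ICU plus your own itemizer) fills. Below is a minimal, hedged C sketch of measuring a run's advance width with plain HarfBuzz; the font path is a placeholder and error handling is omitted:

/* Minimal sketch: measuring the advance width of a single run with plain
   HarfBuzz. This does NOT itemize mixed-script text or find line breaks;
   that is what libraqm / ICU add on top. "font.ttf" is a placeholder. */
#include <stdio.h>
#include <harfbuzz/hb.h>

int main(void) {
    hb_blob_t *blob = hb_blob_create_from_file("font.ttf"); /* placeholder */
    hb_face_t *face = hb_face_create(blob, 0);
    hb_font_t *font = hb_font_create(face);
    unsigned int upem = hb_face_get_upem(face);
    hb_font_set_scale(font, (int)upem, (int)upem); /* advances in font units */

    hb_buffer_t *buf = hb_buffer_create();
    hb_buffer_add_utf8(buf, "سلام", -1, 0, -1);
    hb_buffer_guess_segment_properties(buf); /* script, direction, language */

    hb_shape(font, buf, NULL, 0);

    unsigned int len;
    hb_glyph_position_t *pos = hb_buffer_get_glyph_positions(buf, &len);
    long width = 0;
    for (unsigned int i = 0; i < len; i++)
        width += pos[i].x_advance;
    printf("run width: %ld font units\n", width);

    hb_buffer_destroy(buf);
    hb_font_destroy(font);
    hb_face_destroy(face);
    hb_blob_destroy(blob);
    return 0;
}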
How can I read special characters from an external file? Here is a simple .txt file in French, whose content is the first paragraph of https://fr.lipsum.com/ : as you can see in my screenshot, the file encoding is UTF-8 but the accents are not displayed correctly.
I tried various encodings within Notepad++ and in my Perl 6 script, like these:
enc => "utf8"
enc => "latin1"
With Python or Ruby scripts I don't encounter the problem. I can't find any precise example about this matter, probably because Perl 6 is still quite recent(?). Thank you.
My script as it is displayed in the screenshot:
my $text_contents = slurp "testfile.txt", enc => "utf8";
say $text_contents;
prompt;
Final edit: the solution is to enable an option, available in beta state since Windows 10 1803, that makes the OS handle Unicode characters properly; see the answers and comments below.
If you're not using Windows
This SO answer is either entirely or almost entirely irrelevant to you.
If you're using Windows 10
Check the "Beta: Use Unicode UTF-8 for worldwide language support" option checkbox.
At least at the time I originally wrote this answer, text near this Unicode-related checkbox claimed it's for programs that do not support Unicode, but you should just ignore that.[1]
At the time I originally wrote this answer the checkbox was found under control panel, "Region" entry, "Administrative" tab, "Change system locale" button.
Microsoft may have changed this stuff since I wrote this answer, and may change it again, eg by moving and/or renaming the checkbox, or making things more involved than just clicking a single checkbox.
Per their comment below this answer, the OP notes:
For those who are interested in that particular option, it can be found in the "legacy" Control panel of windows -> Region -> Administrative -> Edit settings...
If you're using an older version of Windows
Arguably, the good news is that Raku and Rakudo have some of the world's best modern support for Unicode, and the OK news is that it relies on Microsoft correctly supporting Unicode, which they're now trying to do.
The bad news is that they made a lot of mistakes in older versions of Windows (and even in Windows 10, which they're now trying to fix), so any solution will be constrained by those mistakes. (Perhaps the biggest problem is Microsoft's doublespeak on the topic[1], but let's hope we can work around that.)
That all said, please read the following and then either return to searching for solutions or post a fresh SO question and we'll try to help.
Quoting Wikipedia's page Unicode in Microsoft Windows:
they are still in 2018 improving their operating system support for UTF-8
Microsoft got off on the wrong foot with their Unicode support last century. The good news is that they have at last begun digging their way out of the hole they dug for themselves and everyone else.
But they're definitely not there yet -- not at the time of originally writing this answer, and, I suspect, not for another N years -- at least inasmuch as things don't work correctly out of the box for many end users. I think this is the root of most problems with Unicode on Windows.
Older languages like Python, Ruby and Perl came up with a range of hacks that hid the many problems with Microsoft's older UTF-8 support from most users in simple scenarios by using what Microsoft ironically described as "Unicode support".
This has always come with the trade-off that things get very hairy or even completely unworkable for more complex applications in many locales around the world. (So much so that even the mighty Microsoft finally capitulated in 2018.)
In essence, until this new Microsoft effort to get with the program, software that ran on Windows has had no alternative but to either use their fundamentally broken "Unicode support" or to actually support Unicode properly.[1]
Raku and Rakudo focused on the latter, and problems with it when run on Windows are related to this conflicting with Microsoft's old broken approach. Fortunately Microsoft is now getting with the program, and so we may be able to find a way to get around problems you have with Unicode on Windows, provided you are patient.
In particular, if you are using an older Windows version, please expect it to not work at first with modern Unicode aware software unless you are lucky. We'll still help if we can, but it'll likely involve you being patient with us and Microsoft and Rakudo and vice-versa.
Footnotes
[1] At the time I originally wrote this answer, there was text near the checkbox saying it's for programs that do not support Unicode. This is entirely the opposite of what's really going on, but hey, it's Microsoft.
In my application I have Unicode strings, and I need to tell which language each string is in.
I want to do this by narrowing the list of possible languages by determining which Unicode ranges the string's characters fall into.
I have the ranges from http://jrgraphix.net/research/unicode_blocks.php
and the possible languages from http://unicode-table.com/en/
The problem is that the algorithm has to detect all languages. Does anyone know of a more complete mapping of Unicode ranges to languages?
Thanks
Wojciech
This is not really possible, for a couple of reasons:
Many languages share the same writing system. Look at English and Dutch, for example. Both use the Basic Latin alphabet. By only looking at the range of code points, you simply cannot distinguish between them.
Some languages use more characters, but there is no guarantee that a specific piece of text contains them. German, for example, uses the Basic Latin alphabet plus "ä", "ö", "ü" and "ß". While these letters are not particularly rare, you can easily create whole sentences without them. So, a short text might not contain them. Thus, again, looking at code points alone is not enough.
Text is not always "pure". An English text may contain French letters because of a French loanword (e.g. "déjà vu"). Or it may contain foreign words, because the text is talking about foreign things (e.g. "Götterdämmerung is an opera by Richard Wagner...", or "The Great Wall of China (万里长城) is..."). Looking at code points alone would be misleading.
To sum up, no, you cannot reliably map code point ranges to languages.
What you could do: Count how often each character appears in the text and heuristically compare with statistics about known languages. Or analyse word structures, e.g. with Markov chains. Or search for the words in dictionaries (taking inflection, composition etc. into account). Or a combination of these.
But this is hard and a lot of work. You should rather use an existing solution, such as those recommended by deceze and Esailija.
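To make the first point concrete, here is a hedged C sketch that buckets code points into a few coarse script ranges (taken from the Unicode block list linked in the question). It can tell you that a string is mostly Han or Arabic script, but it cannot separate English from Dutch, since both fall entirely into Basic Latin:

/* Hedged sketch: map code points to coarse script buckets using a few
   Unicode block ranges. This narrows the script, not the language. */
#include <stdio.h>
#include <stdint.h>

static const char *script_of(uint32_t cp) {
    if (cp <= 0x024F)                 return "Latin";     /* Basic Latin + Latin-1 + Extended-A/B */
    if (cp >= 0x0400 && cp <= 0x04FF) return "Cyrillic";
    if (cp >= 0x0600 && cp <= 0x06FF) return "Arabic";
    if (cp >= 0x0F00 && cp <= 0x0FFF) return "Tibetan";
    if (cp >= 0x4E00 && cp <= 0x9FFF) return "Han (CJK)";
    return "Other";
}

int main(void) {
    /* Code points of 万里长城 ("The Great Wall"), the example used above. */
    uint32_t text[] = { 0x4E07, 0x91CC, 0x957F, 0x57CE };
    for (size_t i = 0; i < sizeof(text) / sizeof(text[0]); i++)
        printf("U+%04X -> %s\n", text[i], script_of(text[i]));
    return 0;
}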
I like the suggestion of using something like Google Translate, as they will be doing all the work for you.
You might be able to build a rule-based system that gets you part of the way there. Build heuristic rules for languages and see if that is sufficient. Certain Tibetan characters do indicate Tibetan, and there are unique characters in many languages that will be a giveaway. But as the other answer pointed out, a limited sample of text may not be that accurate, as you may not have a clear indicator.
Languages will however differ in the frequencies that each character appears, so you could have a basic fingerprint of each language you need to classify and make guesses based on letter frequency. This probably goes a bit further than a rule-based system. Probably a good tool to build this would be a text classification algorithm, which will do all the analysis for you. You would train an algorithm on different languages, instead of having to articulate the actual rules yourself.
A much more sophisticated version of this is presumably what Google does.
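To illustrate the letter-frequency idea (a toy, not a production classifier), here is a hedged sketch that builds a normalized frequency profile per language from sample texts supplied by the caller and picks the closest profile for an unknown string. It only counts ASCII letters, so it would need real Unicode handling before it could go beyond Latin-script languages; the sample strings are purely illustrative:

/* Toy letter-frequency classifier. Profiles are built from caller-supplied
   sample texts; only ASCII letters are counted, so this is an illustration
   of the idea rather than a usable language detector. */
#include <ctype.h>
#include <stdio.h>

#define NLETTERS 26

static void profile(const char *text, double freq[NLETTERS]) {
    int counts[NLETTERS] = {0};
    int total = 0;
    for (; *text; text++) {
        if (isalpha((unsigned char)*text)) {
            counts[tolower((unsigned char)*text) - 'a']++;
            total++;
        }
    }
    for (int i = 0; i < NLETTERS; i++)
        freq[i] = total ? (double)counts[i] / total : 0.0;
}

static double distance(const double a[NLETTERS], const double b[NLETTERS]) {
    double d = 0.0;
    for (int i = 0; i < NLETTERS; i++)
        d += (a[i] - b[i]) * (a[i] - b[i]);
    return d;
}

int main(void) {
    /* Illustrative training samples; real profiles need far more text. */
    const char *names[] = { "english", "german" };
    const char *samples[] = {
        "the quick brown fox jumps over the lazy dog and some more text",
        "der schnelle braune fuchs springt ueber den faulen hund und mehr text"
    };
    double profs[2][NLETTERS];
    for (int i = 0; i < 2; i++)
        profile(samples[i], profs[i]);

    const char *unknown = "springt der hund ueber den fuchs";
    double p[NLETTERS];
    profile(unknown, p);

    int best = 0;
    for (int i = 1; i < 2; i++)
        if (distance(p, profs[i]) < distance(p, profs[best]))
            best = i;
    printf("best guess: %s\n", names[best]);
    return 0;
}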
I have a windows DLL that currently only supports ASCII and I need to update it to work with Unicode strings. This DLL currently uses char* strings in a number of places, along with making a number of ASCII Windows API calls (like GetWindowTextA, RegQueryValueExA, CreateFileA, etc).
I want to switch to using the Unicode/ASCII macros defined in VC++. So instead of char or CHAR I'd use TCHAR. For char* I'd use LPTSTR. And I think things like sprintf_s would be changed to _stprintf_s.
I've never really dealt with Unicode before, so I'm wondering if there are any common pitfalls I should look out for while doing this. Should it be as simple as replacing the types and method names with the proper macros, or are there other complications to look out for?
First read this article by Joel Spolsky: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Then run through these links on Stack Overflow: What do I need to know about Unicode?
Generally, you are looking for any code that assumes one character = one byte (memory/buffer allocation, etc). But the links above will give you a pretty good rundown of the details.
The biggest danger is likely to be buffer sizes. If your memory allocations are made in terms of sizeof(TCHAR) you'll probably be OK, but if there is code where the original programmer was assuming that characters were 1 byte each and they used integers in malloc statements, that's hard to do a global search for.
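To make the buffer-size point concrete, here is a small hedged sketch of the pattern to look for when converting (get_title is a made-up helper; the Win32 calls are the generic TCHAR-mapped ones the question mentions):

/* Sketch of the classic buffer-size pitfall when moving from char to TCHAR.
   Compile with and without UNICODE/_UNICODE defined to see the difference. */
#include <windows.h>
#include <tchar.h>
#include <stdio.h>
#include <stdlib.h>

void get_title(HWND hwnd)   /* hypothetical helper */
{
    int len = GetWindowTextLength(hwnd);

    /* WRONG once TCHAR is wchar_t: this allocates bytes, not characters. */
    /* TCHAR *bad = malloc(len + 1); */

    /* RIGHT: allocate in characters, sized by sizeof(TCHAR). */
    TCHAR *title = (TCHAR *)malloc((len + 1) * sizeof(TCHAR));
    if (title == NULL)
        return;

    /* GetWindowText takes the buffer size in characters, not bytes. */
    GetWindowText(hwnd, title, len + 1);

    /* _tprintf maps to printf in ANSI builds and wprintf in Unicode builds. */
    _tprintf(_T("title: %s\n"), title);

    free(title);
}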