I have a folder name in Japanese. CFileDialog::GetPathName() is returning question marks when the folder is selected. Is there some way to solve this?
If your app is built with MBCS support rather than Unicode support, the Japanese path will be handled correctly only if your "Language for non-Unicode programs" (a.k.a. system locale) is set to Japanese, which is the case for your Japanese users but might not be the case for you if you are not Japanese.
If your system locale is not Japanese, the path is translated to your codepage before it is returned by GetPathName(). It will contain replacement characters ('?'), garbage, or most likely a mix of both.
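To see the mechanism at work: when a wide path is converted to a codepage that cannot represent its characters, Windows substitutes a default character (usually '?'). Here is a minimal standalone sketch of that conversion, with a made-up Japanese path and the raw Win32 call (MFC's MBCS dialog goes through the same A/W translation internally):

#include <windows.h>
#include <iostream>

int main() {
    // Made-up path containing Japanese characters ("C:\日本語\file.txt")
    const wchar_t wide[] = L"C:\\\u65e5\u672c\u8a9e\\file.txt";
    char narrow[MAX_PATH];
    BOOL lossy = FALSE;
    // Convert to the current ANSI codepage; unmappable chars become '?'
    WideCharToMultiByte(CP_ACP, 0, wide, -1, narrow, sizeof(narrow), nullptr, &lossy);
    std::cout << narrow << (lossy ? "   <- lossy" : "") << "\n";
}

On a non-Japanese system locale, lossy comes back TRUE and the printed path shows '?' in place of the Japanese characters.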
Here are a few possibilities:
Don't do anything. Your app should work fine for most Japanese users. Or not...
Test your app under a Japanese codepage. To do so, either temporarily change your "Language for non-Unicode programs" (requires a reboot) or, much easier, test your app under AppLocale. (Note: yes, it runs fine under Windows 7. This article may help if you have problems.)
Switch it to Unicode. Depending on the size of your codebase, this can be a very tedious task, mostly driven by your inputs and outputs and by whether you already use _T("blah") string literals in your code. Of course, there are more aspects to it, but these are the most important. By the way, all new projects should be started with Unicode support in mind.
Handle this path problem specifically. Since we're speaking of a file dialog, the whole dialog should be opened as Unicode, which means you'll probably have to explicitly call the Unicode version of the underlying Win32 API rather than simply CFileDialog (see the sketch below). It's not that complicated, but the risk is that you are only solving the first problem in a row: once you have your Japanese path handled correctly, you'll have to deal with Japanese text input by the user, and so on. So I don't think this solution is a good one.
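For the record, here is what option 4 could look like: a minimal sketch that calls the wide common dialog API directly, so the selected path never goes through a codepage conversion (the function name and flags are illustrative; link against comdlg32.lib):

#include <windows.h>
#include <commdlg.h>
#include <string>

// Open the file dialog as Unicode even in an MBCS build, so the
// selected path comes back as UTF-16 rather than a lossy ANSI string.
std::wstring PickFileW(HWND owner) {
    wchar_t buffer[MAX_PATH] = L"";
    OPENFILENAMEW ofn = {};           // zero every field first
    ofn.lStructSize = sizeof(ofn);
    ofn.hwndOwner = owner;
    ofn.lpstrFile = buffer;
    ofn.nMaxFile = MAX_PATH;
    ofn.Flags = OFN_FILEMUSTEXIST;
    if (GetOpenFileNameW(&ofn))
        return buffer;                // untranslated wide path
    return L"";
}

Of course, the returned wide string then has to stay wide (or be converted to UTF-8 by you), which is exactly the "first problem in a row" issue described above.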
Solution #2 is certainly the quickest way to identify small issues. Solution #3 is for sure the best one in the long run. But make sure you actually need it, because it may be tedious for existing apps.
I recently bought a TI-84 Plus CE, and have been making programs using TI-BASIC.
I'm trying to make a simple text editor, and I need to convert character codes to characters. However, it seems that the char() command doesn't exist?
Please Help!
I don't believe that 84+ TI-BASIC supports ASCII in this way (though I know that 68k BASIC had the ord() command), but one thing you could do is store all the typeable glyphs in a string (see prgmGLYPHS on TI-Basic Developer, for example) and then use inString() and sub() to store and retrieve their values. It's not pretty, and it's not fast, but it works. Here's an example using only the uppercase letters:
:"ABCDEFGHIJKLMNOPQRSTUVWXYZ→Str1
:Input ">",Str2
:Disp inString(Str1,Str2
:Input ">",N
:Disp sub(Str1,N,1
Note: The following pertains to my experience with the TI-84+ SE. The 84+ CE runs on a newer processor than the SE's Zilog Z80, so YMMV:
I expect what you're doing is storing your text in a list. Another approach that might be more efficient and secure is storing your text as an AppVar. AppVars are allocated blocks of RAM/ROM that you can read from and write to at will... as long as you have a library to do it. With the 84+ SE you needed to use Celtic3 (or Doors CS, which includes Celtic as a library) to do that. I haven't used the 84+ CE enough to tell you what exists there, as the assembly side is entirely different. According to this Reddit post, the best way to do it is to use the C toolchain, but I don't have experience with that either.
How can I read special characters from an external file? Here is a simple .txt file in French, whose content is the first paragraph of https://fr.lipsum.com/ : as you can see in my screenshot, the file encoding is UTF-8, but the accents are not displayed correctly.
I tried various encodings within Notepad++ and in my Perl 6 script, like these:
enc => "utf8"
enc => "latin1"
With Python or Ruby scripts I don't encounter the problem. I can't find any precise example about this matter, probably because Perl 6 is still quite recent (?). Thank you.
My script as it is displayed in the screenshot :
my $text_contents = slurp "testfile.txt", enc => "utf8";
say $text_contents;
prompt;
Final edit: the solution is to enable an option, available in beta state with Windows 10 1803, that makes the OS handle Unicode characters properly; see the answers and comments below.
If you're not using Windows
This SO answer is either entirely or almost entirely irrelevant to you.
If you're using Windows 10
Check the "Beta: Use Unicode UTF-8 for worldwide language support" option checkbox.
At least at the time I originally wrote this answer, the text near this Unicode-related checkbox claimed it's for programs that do not support Unicode, but you should just ignore that.[1]
At the time I originally wrote this answer, the checkbox was found under Control Panel, "Region" entry, "Administrative" tab, "Change system locale" button.
Microsoft may have changed this stuff since I wrote this answer, and may change it again, e.g. by moving and/or renaming the checkbox, or by making things more involved than just clicking a single checkbox.
Per their comment below this answer, the OP notes:
For those who are interested in that particular option, it can be found in the "legacy" Control Panel of Windows -> Region -> Administrative -> Edit settings...
If you're using an older version of Windows
Arguably, the good news is that Raku and Rakudo have some of the world's best modern support for Unicode, and the OK news is that it relies on Microsoft correctly supporting Unicode, which they're now trying to do.
The bad news is that they made a lot of mistakes in older versions of Windows (and even in Windows 10, which they're now trying to fix), so any solution will be constrained by those mistakes. (Perhaps the biggest problem is Microsoft's doublespeak on the topic[1], but let's hope we can work around that.)
That all said, please read the following and then either return to searching for solutions or post a fresh SO question and we'll try to help.
Quoting Wikipedia's page Unicode in Microsoft Windows:
they are still in 2018 improving their operating system support for UTF-8
Microsoft got off on the wrong foot with their Unicode support last century. The good news is that they have at last begun digging their way out of the hole they dug for themselves and everyone else.
But they're definitely not there yet -- not at the time of originally writing this answer, and, I suspect, not for another N years -- at least inasmuch as things don't work correctly out of the box for many end users. I think this is the root of most problems with Unicode on Windows.
Older languages like Python, Ruby, and Perl came up with a range of hacks that hid the many problems with Microsoft's older UTF-8 support from most users in simple scenarios, by using what Microsoft ironically described as "Unicode support".
This has always come with the trade-off that things get very hairy or even completely unworkable for more complex applications in many locales around the world. (So much so that even the mighty Microsoft finally capitulated in 2018.)
In essence, until this new Microsoft effort to get with the program, software that ran on Windows has had no alternative but to either use their fundamentally broken "Unicode support" or to actually support Unicode properly.[1]
Raku and Rakudo focused on the latter, and the problems that show up when it runs on Windows come from that approach conflicting with Microsoft's old broken one. Fortunately, Microsoft is now getting with the program, so we may be able to find a way around the problems you have with Unicode on Windows, provided you are patient.
In particular, if you are using an older Windows version, expect it not to work at first with modern Unicode-aware software unless you are lucky. We'll still help if we can, but it'll likely involve you being patient with us, Microsoft, and Rakudo, and vice versa.
Footnotes
[1] At the time I originally wrote this answer, there was text near the checkbox saying it's for programs that do not support Unicode. This is entirely the opposite of what's really going on, but hey, it's Microsoft.
Finally! We're starting to require that all our input files be encoded in UTF-8! This is something we've been wanting to do for years. Unfortunately, we suck at it, since none of us has ever tried it, and most of us are either Windows programmers or are used to operating systems where UTF-8 is the only real option anyway; neither group knows anything about reading UTF-8 strings in a platform-agnostic way.
So we started to look at how to deal with UTF-8 in a platform-agnostic way and found that it's pretty confusing (because Windows), and the other questions I've found here on Stack Overflow don't really seem to cover our scenario, or they are confusing. I found a reference to https://www.codeproject.com/Articles/38242/Reading-UTF-with-C-streams which, I find, is a bit confusing and contains a great deal of fluff.
So, a few assumptions (that must be true or we're in a state of GIGO):
All files are in UTF-8 (yay!)
The std::strings must contain UTF-8; no conversion allowed.
The solution must be locale agnostic and work on macOS (10.13+), Windows (10+), Android, and iOS (10+).
Stream support is not required; we're dealing with local files only (for now), but support for streams is appreciated.
We're trying to avoid using std::wstring if we can, and I see no reason to use it anyway. We're also trying to avoid any third-party libraries that do not use UTF-8-encoded std::string; a custom string type with functions that overload and convert all std::string arguments to the custom string is acceptable.
Is there any way to do this using just the standard C++ library? Preferably by imbuing the global locale with a facet that tells the stream library to just dump the content of files into strings (using custom delimiters as usual); no conversion allowed.
This question is only about reading UTF-8 files into std::strings and storing the content as UTF-8 encoded strings. Dealing with Windows APIs and such is a separate concern.
C++17 is available.
UTF-8 is just a sequence of bytes that follow a specific encoding. If you read a sequence of bytes that is legitimate UTF-8 data into a std::string, then the string contains UTF-8 data.
There's nothing special you have to actually do to make this happen. This works like any other C or C++ file loading. Just don't mess around with iostream locales and you'll be fine.
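For instance, here is a minimal sketch (the function name is mine; opening in binary mode avoids newline translation on Windows and keeps locales entirely out of the picture):

#include <fstream>
#include <sstream>
#include <string>

// Read a whole file into a std::string, byte for byte.
// If the file holds valid UTF-8, the string holds valid UTF-8.
std::string slurp(const std::string& path) {
    std::ifstream in(path, std::ios::binary);  // no newline or codepage translation
    std::ostringstream buffer;
    buffer << in.rdbuf();                      // dump the raw bytes
    return buffer.str();
}

The same goes for line-by-line reading with std::getline: the bytes of each line land in the string untouched, minus the delimiter.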
We have a large commercial app that we want to convert from Delphi 6 to 2010. Approx. 10 third-party component sets, all with source code... I have heard warnings about Unicode with 2010. Does anyone have experience and/or suggestions?
There are many resources available that you can read and that will assist you in the migration from Delphi 6 to Delphi 2009/2010 (Unicode).
You can use these articles as a guide.
Unicode Migration Statistics Tool (This utility will hopefully assist you in collecting useful statistics on how hard (or not) it would be to migrate your older applications to Unicode.)
Delphi 2009 and Unicode
Delphi 2009 strings explained by example
Upgrading a major project to Delphi 2009
Delphi and Unicode
Dr. Bob Delphi 2009 Unicode
Delphi 2009 - Unicode in Type Libraries
On Strings and Unicode in Delphi 2009
Delphi in a Unicode World Part I: What is Unicode, Why do you need it, and How do you work with it in Delphi?
Delphi in a Unicode World Part II: New RTL Features and Classes to Support Unicode
Delphi in a Unicode World Part III: Unicodifying Your Code
CodeRage 4 : Using Unicode and Other Encodings in your Programs
Bye.
You'll find some useful answers in these StackOverflow questions:
Move project from Delphi 3 to Delphi 2010
When and Why Should I Use TStringBuilder?
Convert function to delphi 2009/2010 (unicode)
Unicode problems with Delphi 2009 / 2010 and windows API calls
Also, for what it is worth, I purchased Marco Cantù's Delphi 2009 Handbook. It was all I needed to make a relatively smooth conversion from Delphi 4 to Delphi 2009 in only a few weeks.
I do, however, recommend that you ensure your 3rd party packages have a Delphi 2009 upgrade, or you may have some real difficulties. Converting your own code is one thing. Converting someone else's is another.
I use two 3rd party packages, both with source code. Both had upgrades available, and the developer of one of them wrote that he had a lot of trouble upgrading his very complex component to the Unicode of Delphi 2009. It took him a few months, but he completed it. And as a result, I had little trouble with my implementation of his component when I did my upgrade.
I've been in the same circumstance recently. You mostly need to pay attention to the "edges" of the app: INI files, file I/O, log files, etc. Win API calls from Delphi work, since they're now mapped to the Unicode API calls instead. Check each third-party component set to make sure it's at least ready for Delphi 2009, better yet 2010. Even my use of databases simply wasn't an issue; nearly everything worked right away. It just wasn't a big deal. Anything that relies on the size of a character should be reviewed.
Really, the transition of most concern is 2007 or earlier --> 2009 or later.
There are plenty of discussions and blog entries about it. You could read, read, read... or you could get started and see what happens (I did some of both). I'm sure there are Stack Overflow issues discussing your question; I'm not pretending to give a detailed description of what could happen.
It's simply not as scary as it sounds.
Approx 10 3rd party component sets, all with source code.
One thing I'd add: if a component doesn't support Delphi 2009/2010, don't try to upgrade it by hacking the code.
Following is what I posted on How do the new string types work in Delphi 2009/2010?:
See Delphi and Unicode, a white paper written by Marco Cantù, and, I guess, The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), written by Joel.
One pitfall is that the default Win32 API calls have been mapped to the W (wide string) versions instead of the A (ANSI) versions. If your code is doing tricky pointer work that assumes the internal layout of AnsiString, it will break. A fallback is to substitute PChar with PAnsiChar, Char with AnsiChar, and string with AnsiString, and to append A to the Win32 API calls (for example, ShellExecuteA) for that portion of code. After the code compiles and runs normally again, you can refactor it to use string (UnicodeString).
I must add this article from Cary Jensen to the others mentioned. It is titled "Delphi Unicode Migration for Mere Mortals: Stories and Advice from the Front Lines" (in English).
http://www.danysoft.com/free/delphiunicodemigration.pdf
As you can tell from its title, you will find many experiences, hints, and tips in it. I think it is the answer to your question. After carefully reading it, you will surely know what to do next. :)
Found in: http://www.danysoft.com/productos/migrar-aplicaciones-a-delphi-xe-o-cbuilder-xe.html
Another point to take care of is the use of Variant types with strings, and the VarType function when testing for strings: one needs to use varUString instead of varString.
Assuming AValue is of type Variant and has been assigned a Unicode string value, the following won't work:
if VarType(AValue) = varString then ...
and needs to be changed to
if VarType(AValue) = varUString then ...
I'm struggling to deliver a project to a client. The job is to package files into an archive; simple, right? Well, the files have (and must have) French characters in their names. I'm archiving from the Linux command line; she's opening from the desktop on Windows.
At first I tried zip, and it didn't work out. Character support appears to vary by implementation, from what I've read here on Stack Overflow. After unpacking, the resulting files didn't look right to me (Ubuntu Archive Manager) or to her (WinZip, Windows).
We next tried tar. Finally, things appear normal for me, but still not OK for the client (trying PeaZip and 7-Zip for Windows).
Going into this, I really didn't expect it to be a problem. French-speaking computer users must archive things; what are they using?
Any insight or assistance with this would be greatly appreciated. Thanks!
ZIP traditionally encodes filenames using the IBM437 encoding. However, to my knowledge, many tools (incorrectly) tend to use the system's default encoding instead, which will likely cause problems in a situation like this, because the two ends might use different encodings.
In theory, ZIP also supports UTF-8 by now, which should resolve these problems, but again, tool support will be the problem. For example, as far as I know, the ZIP support in Windows Explorer can't handle UTF-8-encoded filenames.
So we end up with this: both ends have to agree on the encoding used for filenames, and you will need an encoding that supports all the characters you have (any Unicode encoding will be fine; I'm not sure about IBM437, though). ZIP has come a long way, and thus there are many tools that disagree about encoding. If possible, explicitly specify the encoding to use and prefer Unicode. In terms of compatibility with arbitrary tools, you might be better off using a newer format that was designed with Unicode in mind.
7-Zip has supported it since 4.58 beta, according to the change log, but will only use it when the local code page doesn't support the required characters. Using the -mcu command line switch forces UTF-8 for anything but ASCII (see the example below). The local encodings usually differ only in the non-ASCII character range, so this will most likely do the trick; that is, if the tool used for unpacking also supports UTF-8 (which is more likely for 7z than for ZIP, because the format isn't as old and there are fewer unpacking tools).
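For example, something like the following should force UTF-8 filenames when creating the archive (the switch is the one documented for 7-Zip's ZIP codec; the archive and folder names are made up):

7z a -mcu=on archive.zip dossier_français/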
WinRAR might also be worth a try.
Try using an archive program that allows you to specify the character encoding (say, UTF-8), or figure out how to do it with the one you have. This forum thread might help you, because it's similar to what you're asking, albeit in reverse and for German rather than French: http://sourceforge.net/projects/sevenzip/forums/forum/45797/topic/3710172
Alternatively... you could nuke the accented characters. If francophones are on the receiving end of the file transfer, they may or may not be sympathetic (ask your users!).
French doesn't have all that many accents to worry about, really. You have [ae]-grave, e-aigu, [aeiou]-circumflex, and c-cedilla to worry about, capital and lowercase (though lowercase is more likely for the grave and aigu ones, unless someone hit the Caps Lock key).
Tar has a --transform option. If you create a sed pattern that turns every ISO-Latin-1 accented aeiou and c character into its unaccented version, you'll probably be okay; a sketch follows below.
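A rough sketch with GNU tar (the names are made up; it assumes your shell passes the accented characters in the same encoding as the filenames on disk, and it uses only sed's s command, which is all --transform supports):

tar -cf archive.tar \
    --transform 's/é/e/g' --transform 's/è/e/g' \
    --transform 's/ê/e/g' --transform 's/ç/c/g' \
    dossier_français/

You would need one expression per accented character you expect to encounter.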
I think you should go with compression in the 7z format.
Under Linux it can be done using PeaZip, or by installing p7zip and using it through a UI like Ark or File Roller, depending on your desktop (I prefer PeaZip because it can be used on any desktop).
The 7z format was designed from the ground up with UTF-8 in mind (the author is Russian), and in my experience it has never failed.