I'm working with a large codebase that is using a mix of ANSI characters and Unicode characters. Currently, there are like 13 home-rolled string classes in the project along with many home-rolled string manipulation functions (not sure why someone needed to write convert_to_upper that uses toupper).
I would like to standardize our string handling and from my past work with MFC, I really think using CAtlString is the best solution.
Ignoring the "I hate Microsoft" reasons, why would you suggest not using it and what alternative would you suggest?
One of the reasons I like CAtlSting is the easy handling of string conversion.
CAtlString myString;
myString = "Hello from ANSI";
myString = L"Hello from Unicode";
CAtlStringA ansiString("ANSI");
CAtlStringW unicodeString(ansiString); //Automatic translation to wide string
This kind of flexibility means we don't have to worry about what the string is. You just use the object.
Thanks
Related
I currently have a string as follows which I received through an API call:
\n\nIt\U2019s a great place to discover Berlin and a comfortable place
to come home to.
And I want to convert it into something like this which is more readable:
It's a great place to discover Berlin and a comfortable place to come
home to.
I've taken a look at this post, but that's manually writing down every conversion, and there may be more of these unicode scalar characters introduced.
What I understand is \u{2019} is unicode scalar, but the format for this is \U2019 and I'm quite confused. Are there any built in methods to do this conversion?
This answer suggests using the NSString method stringByFoldingWithOptions.
The Swift String class has a concept called a "view" which lets you operate on the string under different encodings. It's pretty neat, and there are some views that might help you.
If you're dealing with strings in Swift, read this excellent post by Mike Ash. He discusses the idea of what a string really is with great detail and has some helpful hints for Swift 2.
Assuming you are already splitting the string and can get the offending format separately:
func convertFormat(stringOrig: String) -> Character {
let subString = String(stringOrig.characters.split("U").map({$0})[1])
let scalarValue = Int(subString)
let scalar = UnicodeScalar(scalarValue!)
return Character(scalar)
}
This will convert the String "\U2019" to the Character represented by "\u{2019}".
We have to upgrade to XE2 (from Delphi6).
I collected many informations about this, but one of them isn't clear for me.
We are using String - what is AnsiString in XE.
As I know we must replace all (P)Ansi[String/Char] in our libraries to avoid the side effects of Unicode converts, and to we can compile our projects.
It is ok, but we are also using TStringList, and I don't found any TAnsiStringList class to change it simply... ;-)
What do you know about this? Can this cause problems too? Or this class have an option to preserve the strings?
(Ok, it seems to be 3 questions, but it is one only)
The program / OS language is hungarian, the charset is WIN-1250, what have some strange characters, like Ő, and Ű...
Thanks for your every information, link, etc.
1) 1st of all - WHY should u use AnsiStringList, rather than converting all your project to unicode-aware TStringList ? That should have certain detailed reasons, to suggest viable alternatives.
Unicode is a superset of windows-1250, windows-1251 and such.
Normally all you locale-specific string would be just losslessly converted to Unicode. IT is the opposite, Unicode to AnsiString, convertion that may loose data.
Explicit or implicit (like AnsiChar reduction in "if char-var in char-set")
You may have type-unsafe API like in DLLs, where compiler cannot check if you pass PChar or PAnsiChar, but you anyway should not pass objects liek TStrings into DLLs, there are BPLs for that.
So you probably just do not need TAnsiStringList
2) you can take TJclAnsiStringList from Jedi Code Library
3) You can use XE2 stock TList<AnsiString> type
I am coding a plugin for autodesk 3dsmax and they recommend to use the _T(x) macro for every string literal to make it work with unicode as well. I am using the stl string class a lot in this code. So do I have to rewrite the code: string("foo") to: string(_T("foo")) ? Actually the stl string class doesnt have a constructor for wchars, so it doesnt make sense, does it?
Thx
Look at the definition of "T" macro - it expands to "L" in "Unicode" builds or nothing in "non-Unicode" builds. If you want to keep using the string calss and follow the recommendation for your plugin, your best bet is to use something like tstring which would follow the same rules.
But the truth is - all this "T" business made a lot of sense 10 years ago - all modern Windows versions are Unicode-only and you can just use wstring.
You could create an own string class say xstring and use the _T for constants and then internally, depending on unicode or not switch to string or wstring. either that or instantiate xstring<yourchartype>
As part of my probably wrong and cumbersome solution to print out a form I have taken a MS-Word document, saved as XML and I'm trying to store that XML as a groovy string so that I can ${fillOutTheFormProgrammatically}
However, with MS-Word documents being as large as they are, the String is 113100 unicode characters and Groovy says its limited to 65536. Is there some way to change this or am I stuck with splitting up the string?
Groovy - need to make a printable form
That's what I'm trying to do.
Update: to be clear its too long of a Groovy String.. I think a regular string might be all good. Going to change strategy and put some strings in the file I can easily find like %!%variable_name%!% and then do .replace(... uh i feel a new question coming on here...
Are you embedding this string directly in your groovy code? The jvm itself has a limit on the length of string constants, see the VM Spec if you are interested in details.
A ugly workaround might be to split the string in smaller parts and concatenate them at runtime. A better solution would be to save the text in an external file and read the contents from your code. You could also package this file along with your code and access it from the classpath using Class#getResourceAsStream.
I'm working on a PInvoke wrapper for a library that does not support Unicode strings, but does support multi-byte ANSI strings. While investigating FxCop reports on the library, I noticed that the string marshaling being used had some interesting side effects. The PInvoke method was using "best fit" mapping to create a single-byte ANSI string. For illustration, this is what one method looked like:
[DllImport("thedll.dll", CharSet=CharSet.Ansi)]
public static extern int CreateNewResource(string resourceName);
The result of calling this function with a string that contains non-ASCII characters is that Windows finds a "close" character, generally this looks like it ends up being "???". If we pretend that 'a' is a non-ASCII character, then passing "cat" as a parameter would create a resource named "c?t".
If I follow the guidelines in the FxCop rule, I end up with something like this:
[DllImport("thedll.dll", CharSet=CharSet.Ansi, BestFitMapping = false, ThrowOnUnmappableChar = true)]
public static extern int CreateNewResource([MarshalAs(UnmanagedType.LPStr)] string resourceName);
This introduces a change in behavior; now when a character cannot be mapped an exception is thrown. This concerns me because this is a breaking change, so I'd like to try and marshal the strings as multi-byte ANSI but I cannot see a way to do so. UnmanagedType.LPStr is specified to be a single-byte ANSI string, LPTStr will be Unicode or ANSI depending on the system, and LPWStr is not what the library expects.
How would I tell PInvoke to marshal the string as a multibyte string? I see there's a WideCharToMultiByte() API function, could I change the signature to expect an IntPtr to a string I create in unmanaged memory? It seems like this still has many of the problems that the current implementation has (it still might have to drop or substitute characters), so I'm not sure if this is an improvement. Is there another method of marshaling that I'm missing?
ANSI is multi-byte, and ANSI strings are encoded according to the codepage currently enabled on the system. WideCharToMultiByte works the same way as P/Invoke.
Maybe what you're after is conversion to UTF-8. Although WideCharToMultiByte supports this, I don't think P/Invoke does, since it's not possible to adopt UTF-8 as the system-wide ANSI code page. At this point you'd be looking at passing the string as an IntPtr instead, although if you're doing that, you may as well use the managed Encoding class to do the conversion, rather than WideCharToMultiByte.
Here is the best way I've found to accomplish this. Instead of marshalling as a string, marshal as a byte[]. Put the responsibility on the caller of the pinvoke function API to convert to a byte array in the most appropriate fashion. Most likely by using one of the Text.Encoding classes.
If you end up having to call WideCharToMultiByte manually, I would get rid of the p/invoke and manually marshal this using WideCharToMultiByte in a C++/CLI wrapper function. Managed C++ is much better at these interop scenarios than C# is.
Though, if this is the only p/invoke you have, it's probably not worth it.