Formatting differences between sprintf() and wsprintf() in VS2015

I am moving some code from multibyte to Unicode and finding that my string formatting comes out wrong. It looks like Visual Studio 2015 handles the '*' width/precision argument differently between sprintf() and wsprintf(). Is this a compiler bug or side effect, or am I missing something really obvious?
Code below, with output:
char cOutA [ 64 ];
wchar_t wcOutA [ 64 ];
sprintf ( cOutA, "Multibyte = %.*f\n", 3, 2.12345 );
wsprintf ( wcOutA, L"Unicode = %.*f\n", 3, 2.12345 );
printf ( cOutA );
wprintf ( wcOutA );
Output:
Multibyte = 2.123
Unicode = *f
I was expecting both to give me a floating point number to 3 decimal places. What am I doing wrong?

As mentioned by Hans in the comments, the answer is that you should never use wsprintf(). It has always been broken: it does not support the same formatting arguments as the C standard swprintf(), and the Microsoft documentation does not make clear how it is broken or why.
I only discovered this while trying to debug a related function, wvsprintf(). It seems to have the same limitations and should likewise be replaced by its working counterpart, vswprintf(). The similarity of the names to the working versions is very unfortunate, as is the apparent closeness to standard C library functions and naming conventions. I have no idea why these functions are still shipped in 2017, nor why the Microsoft compiler does not warn when they are used with unsupported arguments, the way it does for sprintf().
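For reference, a minimal sketch of the fix under that advice, replacing wsprintf() with the standard swprintf(), which does support %f (the buffer size and format arguments mirror the question's example):
#include <cwchar>   // swprintf, wprintf

int main()
{
    wchar_t wcOutA[64];
    // Unlike wsprintf, swprintf takes the destination size and supports
    // the full printf-style format set, including the %.*f precision.
    swprintf(wcOutA, 64, L"Unicode = %.*f\n", 3, 2.12345);
    wprintf(L"%ls", wcOutA);   // prints: Unicode = 2.123
    return 0;
}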
I'm posting this for visibility as searching for these functions on Google doesn't seem to make these massive flaws obvious.

Related

VC6 \r\n and Write works; Visual Studio 2013 does not

The following code:
if (!cfile.Open(fileName, CFile::modeCreate | CFile::modeReadWrite)) {
    return;
}
ggg.Format(_T("0 \r\n"));
cfile.Write(ggg, ggg.GetLength());
ggg.Format(_T("SECTION \r\n"));
cfile.Write(ggg, ggg.GetLength());
produces the following:
0 SECTI
clearly this is wrong: (a) \r\n is ignored, and (b) the word SECTION is cut off.
Can someone please tell me what I am doing wrong?
The same code without _T() in VC6 produces the correct results.
Thank you
a.
Apparently, you are building a Unicode build; CString (presumably that's what ggg is) holds a sequence of wchar_t characters, each two bytes large. ggg.GetLength() is the length of the string in characters.
However, CFile::Write takes the length in bytes, not in characters. You are passing half the number of bytes actually taken by the string, so only half the number of characters gets written.
Have you considered changing lines like:
cfile.Write(ggg, ggg.GetLength());
to:
cfile.Write(ggg, ggg.GetLength() * sizeof(TCHAR));
Write needs the number of bytes (not characters). Since a Unicode (UTF-16) character is 2 bytes wide, you need to account for that. sizeof(TCHAR) is the number of bytes each character takes for a given build: 1 for an ANSI build, 2 for a Unicode build. Multiply that by the string length and the byte count will be correct.
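Put together, a sketch of the question's writes with only the length argument changed (assuming the same CFile and CString setup as in the question):
ggg.Format(_T("0 \r\n"));
cfile.Write(ggg, ggg.GetLength() * sizeof(TCHAR));        // bytes, not characters
ggg.Format(_T("SECTION \r\n"));
cfile.Write(ggg, ggg.GetLength() * sizeof(TCHAR));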
Information on TCHAR can be found in the MSDN documentation here. In particular, it is defined as:
The _TCHAR data type is defined conditionally in Tchar.h. If the symbol _UNICODE is defined for your build, _TCHAR is defined as wchar_t; otherwise, for single-byte and MBCS builds, it is defined as char. (wchar_t, the basic Unicode wide-character data type, is the 16-bit counterpart to an 8-bit signed char.)
TCHAR and _TCHAR in your usage should be synonymous. However, I believe these days Microsoft recommends including <tchar.h> and using _TCHAR. What I can't tell you is whether _TCHAR existed in VC 6.0.
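As a sketch of what that quoted definition amounts to (TCHAR_like is a hypothetical stand-in; the real typedefs come from tchar.h):
#include <tchar.h>

#ifdef _UNICODE
typedef wchar_t TCHAR_like;   // Unicode build: 2 bytes per character
#else
typedef char TCHAR_like;      // ANSI/MBCS build: 1 byte per character
#endif

static_assert(sizeof(TCHAR_like) == sizeof(_TCHAR), "mirrors the real _TCHAR");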
If you use the method above, then building for Unicode writes your output files as Unicode, and building for ANSI writes them as 8-bit ASCII.
Want CFile::Write to output ASCII no matter what? Read on...
If you want all text written to the file as 8-bit ASCII, you are going to have to use one of the conversion macros, in particular CT2A. More on the macros can be found in this MSDN article. Each macro's name can be decoded; CT2A says: convert the generic Text character string (equivalent to W when _UNICODE is defined, equivalent to A otherwise) to ASCII, per the chart at the link. So whether the build is Unicode or ANSI, it outputs ASCII. Your code would look something like:
ggg.Format(_T("0 \r\n"));
cfile.Write(CT2A(ggg), ggg.GetLength());
ggg.Format(_T("SECTION \r\n"));
cfile.Write(CT2A(ggg), ggg.GetLength());
Since the macro converts everything to ASCII, CString's GetLength() will suffice.

Preventing off-by-one errors with CRT secure string functions

As of Visual Studio 2005, the CRT has replaced most string functions with secure versions that add a size argument indicating the limits of the destination buffer(s). This is fine, but it's not clear how that size should be specified: does it include the terminating zero? Take the following code for example:
…
TCHAR path[MAX_PATH] = TEXT("");
_tcscpy_s(path, MAX_PATH, filename);
…
Is it okay or does it induce an off-by-one error?
It would be a failure of API design if the correct value were MAX_PATH plus or minus one, as that would be confusing and lead to more buffer overflows.
The documentation states clearly that for dest[10], _countof(dest) should be used, which is 10; the size argument is the full destination capacity in characters, terminating zero included.
So a simple MAX_PATH will suffice.
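A minimal sketch of that usage (CopyPath is a hypothetical helper; MAX_PATH comes from windows.h and _countof from the CRT):
#include <windows.h>
#include <tchar.h>

void CopyPath(const TCHAR* filename)
{
    TCHAR path[MAX_PATH] = TEXT("");
    // The size argument is the total capacity in characters, terminating
    // zero included, so _countof(path) == MAX_PATH is exactly right.
    _tcscpy_s(path, _countof(path), filename);
}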

In Linux, do there exist functions similar to _clearfp() and _statusfp()?

I am currently working on a porting job and have run into a problem: for some Windows APIs, such as _clearfp() and _statusfp(), I can't find corresponding functions in Linux.
So I am here to ask for help.
You would need a POSIX system, or a C99 compiler that supports Annex F of the C99 Standard. You can test whether Annex F is supported by checking whether the macro __STDC_IEC_559__ is defined. The relevant functions are found in <fenv.h>.
int feclearexcept(int excepts); // clears exceptions (returns 0 on success)
int fetestexcept(int excepts); // returns exceptions that are set
The exceptions passed in as excepts, and those returned by fetestexcept(), form a bitmask that can be tested against the following macros:
FE_DIVBYZERO
FE_INEXACT
FE_INVALID
FE_OVERFLOW
FE_UNDERFLOW
FE_ALL_EXCEPT
The last macro, FE_ALL_EXCEPT, is just the bitwise-or of all the macros above it.
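A minimal sketch of the correspondence, assuming a hosted C++ compiler with <cfenv> (the parallels to _clearfp()/_statusfp() are approximate, and the volatile qualifiers discourage constant folding where FENV_ACCESS is unsupported):
#include <cfenv>
#include <cstdio>

int main()
{
    std::feclearexcept(FE_ALL_EXCEPT);              // roughly _clearfp()

    volatile double zero = 0.0;
    volatile double r = 1.0 / zero;                 // raises FE_DIVBYZERO
    (void)r;

    if (std::fetestexcept(FE_ALL_EXCEPT) & FE_DIVBYZERO)   // roughly _statusfp()
        std::puts("FE_DIVBYZERO is set");
    return 0;
}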

Why Visual C++ version numbers have a comma in them instead of a dot

I have seen that my Visual C++ projects have the following declarations that use COMMAS instead of DOTS for versions:
#define FILEVER 11,0,2,0
#define PRODUCTVER 11,0,2,0
#define STRFILEVER "11, 0, 2, 0\0"
#define STRPRODUCTVER "11, 0, 2, 0\0"
The MS article here also has the same values with commas (in fact, the declarations above are based on that article). Why are we using COMMAS here? When I open the compiled file's properties, I see FileVersion as 11.0.2.0 but ProductVersion as 11,0,2,0, which my QA friends say is a bug :). Is there some standard, or maybe some internal mechanism, that I am missing?
The first two definitions use commas because the Microsoft resource file syntax calls for commas there. For the latter two definitions, the Microsoft convention simply sticks to commas as well. Microsoft wants to differ, probably.
The file version is taken from the non-string variant and gets printed with dots in File Explorer. The product version is taken from the string. You could probably write the string with dots yourself; it's a string, so it shouldn't matter. But you'll have to edit the .rc file manually, since Visual Studio will write commas.
As for the first two definitions, I can see a reason for choosing commas in general C++. If you had 11.0.2.0, it would be a syntax error in almost any context, and the only thing you could do with it is convert it to a string with the # operator. But with commas, you can expand it into the definition of an array or structure, like int version[] = { 11,0,2,0 };. That's useful if you want a version check in code.
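For illustration, a small sketch of that array trick, using the FILEVER macro from the question:
#define FILEVER 11,0,2,0

// The comma form expands straight into an initializer list.
static const int version[] = { FILEVER };   // { 11, 0, 2, 0 }

static_assert(sizeof(version) / sizeof(version[0]) == 4,
              "expected four version components");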

Newest Delphi compiler versions and String type compatibility

I'm trying to make some String-processing routines compatible with the newest Delphi versions. I'm using Delphi 2005 and 2007, but I'm not totally sure of the compatibility.
Here are a few samples; are they compatible with both the old and the new string type?
(I'll use an imaginary UNICODE_STRING directive.)
a Type definition:
{$IFNDEF UNICODE_STRING}
TextBuffer = Array[0..13] Of Char;
{$ELSE}
TextBuffer = Array[0..13] Of WideChar;
{$ENDIF}
Useless or not? Does the Char type become what WideChar was before the Unicode String, or is there still a difference?
a Function:
Function RemoveBlanks(Text: String): String;
Var
  i: Integer;
Begin
  Result := '';
  For i := 0 To Length(Text) Do
  Begin
    {$IFNDEF UNICODE_STRING}
    If Byte(Text[i]) < 21 Then Continue;
    {$ELSE}
    If Word(Text[i]) < 21 Then Continue;
    {$ENDIF}
    If Text[i] = ' ' Then Continue;
    Result := Result + Text[i];
  End;
End;
Is the Word() cast OK?
There is also the ' ' problem: how is the space handled in the Unicode version? Should I also use the directive to differentiate an ANSI ' ' from a Unicode ' ', or will ' ' automatically be handled as a 2-byte blank?
a line jump:
NewLineBegin := CanReadText( aPTextBuffer, #13#10 );
How is the second argument (#13#10) interpreted in the Unicode version? Is it compatible? Will it be translated to the byte block 00130010? If not, should the directive be used instead, with the constant #0013#0010?
The first thing to do is read Marco Cantú's paper on Unicode: http://edn.embarcadero.com/article/38980
Question 1
Just use Char all the time with no conditional code and it will work in old and new.
Char is a special type that is an 8 bit type in old versions of Delphi and a 16 bit type in new Unicode versions.
Question 2
Char is an ordinal type, so you can write if s[i] < #21.
You also need to start loops at 1 for strings, since they use 1-based indexing.
Question 3
Writing #0013 is not needed; #13 is fine.
In short almost all well written code will need no changes.
Compiler Directives
In general, I'd advise you to be very wary of compiler directives. They serve their purpose, but for general use, they should probably be avoided altogether.
The first problem is that you have to compile and test your app twice, because it is fundamentally and/or subtly different with a directive on or off.
This situation gets worse with each additional directive, because you usually have to permute the combinations:
D1 On, D2 On
D1 On, D2 Off
D1 Off, D2 On
D1 Off, D2 Off
Three directives give 8 permutations, and so on.
Unicode Strings
Please see: Get ready for Delphi 2009 and up when developing with Delphi 7?
It has some nice answers for you to consider.
Question 1
As said, I advise against it. I also advise against it for other reasons in my answer to the above-mentioned question.
More specifically:
In Delphi <2009, the two lines are different.
In Delphi >=2009, the two lines are effectively the same.
Question 2
Not only is this ill-advised for the same reasons as in Question 1, but it actually has some subtle problems.
The precise type of Text (String) is determined by your Delphi version. So:
In Delphi <2009, the else part of your conditional casts a single-byte character to a Word (probably with no ill effect).
In Delphi >=2009, the if part of your conditional casts a double-byte character to a Byte (with loss of information).
Also, there are some special considerations, and new support classes for 'special' characters. You'll want to look into those. Refer to: How to identify unicode keys on key press?
Question 3
I'm pretty sure #13 will be treated as a single character, so in Delphi >=2009, where Char == WideChar, that character will take up 2 bytes.
However, again, look for the line-break constants in Delphi; System.sLineBreak was probably introduced back in the Kylix days.
The generic type Char becomes either the fundamental type AnsiChar or the fundamental type WideChar (read up on generic vs. fundamental types). BTW, there is a UNICODE symbol $DEFINEd for you already; however, there is no need to branch at all until a specific byte size is required.
The second part smells; scratch it completely. It is an abuse of typecasts and artificially creates a need for conditional compilation. To get the unsigned integer character code of a given Char, use the Ord() function instead (or, as said in the other answer, use the ordinal traits of the Char type).
For the third part, character constants are of the generic type Char already. Again, there is no need to worry: #13 becomes either the byte-sized $0D or the word-sized $000D, stored as the bytes 0D 00 (remember little-endianness).
