What is the meaning of "Yes, Virginia, it had better be unsigned"?


In the Linux source code, version 3.18 (and earlier), the function strncasecmp() in string.c begins with:
/* Yes, Virginia, it had better be unsigned */
unsigned char c1, c2;
As can be seen here: http://lxr.free-electrons.com/source/lib/string.c
What is the meaning of this?

string.c:strncasecmp() calls __tolower from include/linux/ctype.h, which expects an unsigned char.
Edit: In general, you always want to pass unsigned char values to the ctype.h functions, because the C Standard (§7.4) says the behavior is undefined if their argument is neither representable as an unsigned char nor equal to EOF. That probably explains the "Yes, Virginia" bit.
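To illustrate why the type matters, here is a minimal userspace sketch modelled on the kernel function. The name my_strncasecmp is hypothetical, and it uses the standard <ctype.h> tolower() rather than the kernel's __tolower(). Keeping c1 and c2 as unsigned char keeps every tolower() argument in the range the standard requires, and it also makes the final subtraction order bytes 0x80 to 0xFF after plain ASCII; with plain char, that subtraction could come out negative on platforms where char is signed.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive, length-limited comparison (hypothetical name).
 * unsigned char matters twice: tolower() needs a value representable as
 * unsigned char (or EOF), and the final subtraction must treat bytes
 * >= 0x80 as large positive values rather than negative ones. */
int my_strncasecmp(const char *s1, const char *s2, size_t len)
{
    unsigned char c1 = 0, c2 = 0;

    if (len == 0)
        return 0;

    do {
        c1 = (unsigned char)*s1++;
        c2 = (unsigned char)*s2++;
        if (c1 == '\0' || c2 == '\0')
            break;              /* one string ended */
        if (c1 == c2)
            continue;           /* identical bytes, check the next pair */
        c1 = (unsigned char)tolower(c1);
        c2 = (unsigned char)tolower(c2);
        if (c1 != c2)
            break;              /* differ even after case folding */
    } while (--len);

    return (int)c1 - (int)c2;
}
```

For example, comparing "\xe9" with "a" returns a positive value here, whereas the same code written with plain char could return a negative one where char is signed.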
What is a bit more mysterious is that include/linux/ctype.h actually appears idiot-proof in this respect, because it does its own safety-minded cast in #define __ismask(x) (_ctype[(int)(unsigned char)(x)]). I'm not sure when the "Yes, Virginia" comment was added relative to this line, but with the current version of include/linux/ctype.h it appears that string.c:strncasecmp() would do the case conversion correctly even with plain char for c1 and c2. The function's final return (int)c1 - (int)c2; would still order bytes above 0x7F differently on architectures where char is signed, though. I haven't tried changing and testing it that way.
Also, if you go back to Linux 2.0.40's ctype.h, the safety-minded cast ((int)(unsigned char)) is not there yet. There is no "Virginia" comment in 2.0.40's string.c either, but there is not even a strncasecmp in it. It looks like both changes were made somewhere between Linux 2.0 and 2.2, but I can't tell you which came first.

Related

Formatting differences between sprintf() and wsprintf() in VS2015

I am moving some code from multibyte to Unicode, and my string formatting is coming out wrong. It looks like Visual Studio 2015 handles the '*' precision specifier differently between sprintf() and wsprintf(). Is this a compiler bug or side effect, or am I missing something really obvious?
Code below, with output:
char cOutA [ 64 ];
wchar_t wcOutA [ 64 ];
sprintf ( cOutA, "Multibyte = %.*f\n", 3, 2.12345 );
wsprintf ( wcOutA, L"Unicode = %.*f\n", 3, 2.12345 );
printf ( cOutA );
wprintf ( wcOutA );
Output:
Multibyte = 2.123
Unicode = *f
I was expecting both to give me a floating point number to 3 decimal places. What am I doing wrong?

As Hans mentioned in the comments, the answer is that you should never use wsprintf(). It has always been broken: it does not support the same format specifications as the standard C swprintf() (in particular, it has no floating-point conversions such as %f), and the Microsoft documentation does not make clear how it is broken or why.
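A minimal sketch of the standard replacement, assuming a C99-conforming swprintf() (which takes the buffer size, in wide characters, as its second argument). The helper name format_unicode_line is hypothetical; the format string is the one from the question.

```c
#include <wchar.h>

/* Hypothetical helper: formats a double with a run-time precision into a
 * wide buffer using the standard swprintf(), which supports %f and the
 * "*" precision. Returns the number of wide characters written, or a
 * negative value if the output did not fit in n wide characters. */
int format_unicode_line(wchar_t *out, size_t n, int precision, double value)
{
    return swprintf(out, n, L"Unicode = %.*f\n", precision, value);
}
```

Called as format_unicode_line(buf, 64, 3, 2.12345), it should produce L"Unicode = 2.123\n", matching the multibyte sprintf() output.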
I only discovered this when trying to debug a related function: wvsprintf(). This function seems to have the same limitations, and should also be replaced by its working replacement: "vswprintf()". The similarity of the names to the working versions is very unfortunate, as is the apparent closeness to standard C library functions and naming methodologies. I have no idea why these functions are still delivered in 2017, nor why the Microsoft compiler does not generate a warning when used with unsupported arguments in the same way it does for "sprintf()".
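For the wvsprintf() case, a printf-style wrapper can forward its arguments to the standard vswprintf() instead. This is only a sketch; the name wformat is hypothetical. It uses %ls for wide-string arguments, which the C standard defines for wide formatted output, so it avoids relying on Microsoft's non-standard meaning of %s in wide functions.

```c
#include <stdarg.h>
#include <wchar.h>

/* Hypothetical printf-style helper built on the standard vswprintf(). */
int wformat(wchar_t *out, size_t n, const wchar_t *fmt, ...)
{
    va_list args;
    int written;

    va_start(args, fmt);
    written = vswprintf(out, n, fmt, args);
    va_end(args);
    return written; /* negative if the result did not fit in n wide chars */
}
```

Unlike wvsprintf(), vswprintf() is told the buffer size and reports truncation through a negative return value.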
I'm posting this for visibility as searching for these functions on Google doesn't seem to make these massive flaws obvious.

How to use strstrip for parsing a string in two parts

I would like to know how to turn a string like "hello world" into "helloworld" using the strstrip kernel function. I am developing a Linux kernel char device, and this function causes a kernel panic (or kernel Oops).
The way I'm using this function is the following:
char result[100];
strcpy(result, "hello world");
strstrip(result);
strstrip(&result); //Also tried this
strstrip("100+200"); //Also tried this
The Kernel error is caused as soon as the strstrip line gets executed. What is the proper way to call this function?

Actually, strstrip() removes whitespace at the front and at the end of the string (leading and trailing whitespace). It does not remove whitespace within the string.
Please look at the below example.
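char result[100];
char *stripped;
strcpy(result, " hello world from stack exchange");
printk(KERN_INFO "before: '%s'\n", result);
/* strstrip() returns a pointer into result, past the leading whitespace;
   do not strcpy() it back into result, since the buffers would overlap */
stripped = strstrip(result);
printk(KERN_INFO "after: '%s'\n", stripped);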
Hope it helps.
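The trimming behaviour described in this answer can be sketched in userspace C. This is not the kernel source; the name my_strim is hypothetical and it uses the standard <ctype.h> isspace().

```c
#include <ctype.h>
#include <string.h>

/* Userspace sketch of what strim()/strstrip() do (hypothetical name).
 * Trailing whitespace is cut off by writing NULs into the writable buffer,
 * and the returned pointer skips leading whitespace. Whitespace inside
 * the string is left alone. */
char *my_strim(char *s)
{
    size_t len = strlen(s);

    while (len > 0 && isspace((unsigned char)s[len - 1]))
        s[--len] = '\0';            /* drop trailing whitespace in place */
    while (isspace((unsigned char)*s))
        s++;                        /* skip leading whitespace */
    return s;
}
```

Because the result usually points into the middle of the original buffer, the caller should keep and use the returned pointer.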

strstrip() is a wrapper function for strim() (http://lxr.linux.no/linux+v3.11.2/lib/string.c#L361) in modern kernels. Because it modifies the string in place, you cannot call it with a string literal as in your third attempt: modifying a string literal is undefined behaviour, and the literal may live in read-only memory.
Your second attempt passes &result, the address of the array. An array is not a pointer, so this is not a char ** but a char (*)[100]. It holds the same address as result, but it is the wrong type for strstrip(), which takes a char * (see the link above), so the compiler should warn about it.
The first attempt should not cause a kernel error, but you are discarding the return value, which points to the first non-whitespace character; you should store it in a local variable. What kind of error are you receiving? I will update this answer if you can provide that information.
In the end, though, as Balamurugan A points out, this function does not do what you seem to think it does. strsep() (http://lxr.linux.no/linux+v3.11.2/lib/string.c#L485) may help you here, but it will only be a stepping stone to removing all spaces. There is no single call that removes all spaces, so you will have to copy the non-space characters yourself, either into a new buffer or by compacting the string in place.
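Compacting in place can be sketched with two pointers over the same buffer. The name remove_spaces is hypothetical, and the sketch uses userspace <ctype.h>; in a kernel module the same loop should work with <linux/ctype.h> instead. Like strstrip(), it needs a writable buffer, not a string literal.

```c
#include <ctype.h>

/* Hypothetical helper: removes every whitespace character from a writable,
 * NUL-terminated string in place, so "hello world" becomes "helloworld".
 * src reads every character; dst only advances when a character is kept. */
void remove_spaces(char *s)
{
    const char *src;
    char *dst = s;

    for (src = s; *src != '\0'; src++) {
        /* isspace() needs a value representable as unsigned char */
        if (!isspace((unsigned char)*src))
            *dst++ = *src;
    }
    *dst = '\0';
}
```

Since dst never overtakes src, no second buffer is needed.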

Can I use $urandom_range with time variables?

I wanted to know if I can simply write:
time time_var;
time_var = $urandom_range (10ms, 7ms);
I've tried using it directly, and there are no errors/warnings issued.
However, the returned value is not between 7-10ms.
I guess it's legal to use $urandom_range with time literals, since I didn't receive any errors. But, why can't I get a value in the proper range?

IEEE Std 1800-2009 declares the arguments to $urandom_range to be of type int unsigned, which is not the same as time. I don't think you can rely on the system function to behave predictably, even if you are not getting errors or warnings from your simulator.
It is a compile error in VCS and a warning with Incisive.
Can you use something like this?
int unsigned del = $urandom_range(10, 7);
#(1ms * del);

Using the C preprocessor to determine current scope?

I am developing an application in C / Objective-C (No C++ please, I already have a solution there), and I came across an interesting use case.
Because clang does not support nested functions, my original approach will not work:
#define CREATE_STATIC_VAR(Type, Name, Dflt) static Type Name; __attribute__((constructor)) void static_ ## Type ## _ ## Name ## _init_var(void) { /* loading code here */ }
This code would compile fine with GCC, but because clang doesn't support nested functions, I get a compile error:
Expected ';' at end of declaration.
So, I found a solution that works for Clang on variables inside a function:
#define CREATE_STATIC_VAR_LOCAL(Type, Name, Dflt) static Type Name; ^{ /* loading code here */ }(); // anonymous block usage
However, I was wondering if there was a way to leverage macro concatenation to choose the appropriate one for the situation, something like:
#define CREATE_STATIC_VAR_GLOBAL(Type, Name, Dflt) static Type Name; __attribute__((constructor)) void static_ ## Type ## _ ## Name ## _init_var(void) { /* loading code here */ }
#define CREATE_STATIC_VAR_LOCAL(Type, Name, Dflt) static Type Name; ^{ /* loading code here */ }(); // anonymous block usage
#define SCOPE_CHOOSER LOCAL || GLOBAL
#define CREATE_STATIC_VAR(Type, Name, Dflt) CREATE_STATIC_VAR_ ## SCOPE_CHOOSER(Type, Name, Dflt)
Obviously, the ending implementation doesn't have to be exactly that, but something similar will suffice.
I have attempted to use __builtin_constant_p with __func__, but because __func__ is not a compile-time constant, that wasn't working.
I have also tried to use __builtin_choose_expr, but that doesn't appear to work at the global scope.
Is there something else I am missing in the docs? Seems like this should be something fairly easy to do, and yet, I cannot seem to figure it out.
Note: I am aware that I could simply type CREATE_STATIC_VAR_GLOBAL or CREATE_STATIC_VAR_LOCAL instead of messing with macro concatenation, but this is me attempting to push the limits of the compiler. I am also aware that I could use C++ and get this over with right away, but that's not my goal here.

#define SCOPE_CHOOSER LOCAL || GLOBAL
#define CREATE_STATIC_VAR(Type, Name, Dflt) CREATE_STATIC_VAR_ ## SCOPE_CHOOSER(Type, Name, Dflt)
The biggest difficulty here is that the C preprocessor works by textual substitution, so even if you figured out how to get SCOPE_CHOOSER to do what you want, you'd end up with a macro expansion that looked something like
CREATE_STATIC_VAR_LOCAL || GLOBAL(Type, Name, Dflt);
There's no way to get the preprocessor to "constant-fold" macro expansions during substitution; the only time things are "folded" is when they appear in #if expressions. So your only hope (modulo slight handwaving) is to find a single construction that will work both inside and outside of a function.
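The textual-substitution point can be seen with the usual two-level pasting idiom. This is a sketch with hypothetical stand-in names (VARIANT_LOCAL, VARIANT_GLOBAL, CHOSEN_VARIANT); it selects a variant only because SCOPE_CHOOSER is hard-coded to a single token. If SCOPE_CHOOSER were LOCAL || GLOBAL, the paste would produce VARIANT_LOCAL || GLOBAL, the same kind of expansion shown above, and nothing could make it depend on where the macro is used.

```c
/* Two-level pasting: PASTE() lets its arguments be macro-expanded before
 * PASTE_() applies ##, so SCOPE_CHOOSER is replaced by GLOBAL first. */
#define PASTE_(a, b) a ## b
#define PASTE(a, b) PASTE_(a, b)

/* Hypothetical stand-ins for the two CREATE_STATIC_VAR_* variants. */
#define VARIANT_LOCAL  "local variant"
#define VARIANT_GLOBAL "global variant"

/* Must expand to exactly one token known at preprocessing time;
 * it cannot be computed from the scope where the macro is used. */
#define SCOPE_CHOOSER GLOBAL
#define CHOSEN_VARIANT PASTE(VARIANT_, SCOPE_CHOOSER)

const char *chosen_variant(void)
{
    return CHOSEN_VARIANT; /* expands to VARIANT_GLOBAL, i.e. "global variant" */
}
```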
Can you explain more about the ultimate goal here? I don't think you can load the variable's initial value with __attribute__((constructor)), but maybe there's a way to load the initial value the first time the function body is entered... or register all the addresses of these variables into a global list at compile-time and have a single __attribute__((constructor)) function that traverses that list... or some mishmash of those approaches. I don't have any specific ideas in mind, but maybe if you give more information something will emerge.
EDIT: I don't think this helps you either, since it's not a preprocessor trick, but here is a constant expression that will evaluate to 0 at function scope and 1 at global scope.
#define AT_GLOBAL_SCOPE __builtin_types_compatible_p(const char (*)[1], __typeof__(&__func__))
However, notice that I said "evaluate" and not "expand". These constructs are compile-time, not preprocessing-time.
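To make the "evaluate, not expand" distinction concrete, here is a sketch that uses the AT_GLOBAL_SCOPE macro from this answer (a GCC/Clang extension) inside a hypothetical function. Using it as an enumerator shows it is an integer constant expression computed by the compiler; it could not appear in an #if. Only the function-scope case is shown, since at file scope GCC warns about __func__.

```c
/* The macro from this answer (GCC/Clang builtins). */
#define AT_GLOBAL_SCOPE \
    __builtin_types_compatible_p(const char (*)[1], __typeof__(&__func__))

/* Hypothetical function: inside any function, __func__ holds a non-empty
 * name, so its type is const char[N] with N > 1, the pointer types are
 * incompatible, and the builtin yields 0. */
int scope_flag_in_function(void)
{
    enum { FLAG = AT_GLOBAL_SCOPE }; /* requires a compile-time constant */
    return FLAG;
}
```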

Inspired by @Quuxplusone's answer.
The suggested macro for AT_GLOBAL_SCOPE does indeed work (in GCC), but it causes a compiler warning, and I am pretty sure the warning cannot be silenced with a diagnostic pragma, because it is emitted by a pedwarn call in GCC's source.
Unless you turn on -w you will always see these warnings and have, in the back of your mind, a horrible feeling that you probably shouldn't be doing whatever it is that you are doing.
Fortunately, there is a solution that can silence these lingering doubts.
In the "Other Built-in Functions" section of the GCC manual, there is __builtin_FUNCTION with this very interesting description (emphasis mine):
This function is the equivalent of the __FUNCTION__ symbol and returns an address constant pointing to the name of the function from which the built-in was invoked, or the empty string if the invocation is not at function scope.
It turns out, at least in version 8.3 of GCC, you can do this:
#define AT_GLOBAL_SCOPE (__builtin_FUNCTION()[0] == '\0')
This still probably won't answer the original question, but until GCC decides this too will cause a warning (it kind of seems like it's intentionally designed not to though), it lets me continue doing questionable things using macros without anything to warn me that it's a bad idea.

Newest Delphi compiler versions and String type compatibilty

I'm trying to make some string-processing routines compatible with the newest Delphi version. I'm using Delphi 2005 and 2007, but I'm not totally sure about the compatibility.
Here are a few samples. Are they compatible with both the old and the new string type?
(I'll use an imaginary UNICODE_STRING directive.)
A type definition:
{$IFNDEF UNICODE_STRING}
TextBuffer = Array[0..13] Of Char;
{$ELSE}
TextBuffer = Array[0..13] Of WideChar;
{$ENDIF}
Is this useless or not? Does the Char type become what WideChar was before Unicode strings, or is there still a difference?
A function:
Function RemoveBlanks(Text: String): String;
Var
i: integer;
Begin
result := '';
For i:= 0 To Length(Text) Do
Begin
{$IFNDEF UNICODE_STRING}
If Byte(Text[i]) < 21 Then Continue;
{$ELSE}
If Word(Text[i]) < 21 Then Continue;
{$ENDIF}
If Text[i] = ' ' Then Continue;
Result := Result + Text[i];
End;
Is the Word() cast OK?
There is also the ' ' problem: how is the space handled in the Unicode version? Should I also use the directive to differentiate ' ' and ' ', or will ' ' automatically be handled as a 2-byte blank?
A line break:
NewLineBegin := CanReadText( aPTextBuffer, #13#10 );
How is the second argument (#13#10) interpreted in the Unicode version? Is it compatible? Will it be translated to the byte block 00130010? If not, should the directive be used instead, with the constant #0013#0010?

The first thing to do is read Marco Cantù's paper on Unicode: http://edn.embarcadero.com/article/38980
Question 1
Just use Char all the time with no conditional code and it will work in old and new.
Char is a special type that is an 8 bit type in old versions of Delphi and a 16 bit type in new Unicode versions.
Question 2
Char is an ordinal type so you can write if s[i]<#21.
You also need to start loops at 1 for strings since they use 1-based indexing.
Question 3
Writing #0013 is not needed, #13 is fine.
In short, almost all well-written code will need no changes.

Compiler Directives
In general, I'd advise you to be very wary of compiler directives. They serve their purpose, but for general use, they should probably be avoided altogether.
The first problem is that you have to compile and test your app twice, because it behaves fundamentally or subtly differently with the directive on and off.
This gets worse with each additional directive, because you usually have to test every combination:
D1 On, D2 On
D1 On, D2 Off
D1 Off, D2 On
D1 Off, D2 Off
Three directives give 8 combinations, and so on.
Unicode Strings
Please see: Get ready for Delphi 2009 and up when developing with Delphi 7?
It has some nice answers for you to consider.
Question 1
As said, I advise against it. I also advise against it for other reasons in my answer to the question mentioned above.
More specifically:
In Delphi <2009, both lines are different.
In Delphi >=2009 both lines are effectively the same.
Question 2
Not only is this ill advised for the same reasons as Question 1, but it actually has some subtle problems.
The more precise type of Text (String) is determined by your Delphi version. So:
In Delphi <2009, the else part of your conditional casts a single character to a Word. (Probably with no ill effect.)
In Delphi >=2009, the if part of your conditional casts a double-byte character to a Byte. (With loss of information.)
Also, there are some special considerations, and new support classes for 'special' characters. You'll want to look into those. Refer to: How to identify unicode keys on key press?
Question 3
I'm pretty sure that #13 will be treated as a single character, so in Delphi >=2009 where Char == WideChar, that character will take up 2 bytes.
However, again, look for line-break constants in Delphi. System.sLineBreak was probably introduced back in the Kylix days.

The generic type Char becomes either the fundamental type AnsiChar or the fundamental type WideChar (read up on generic vs. fundamental types). By the way, the UNICODE symbol is already $DEFINEd for you; however, there is no need to branch at all until a specific byte size is required.
The second part smells; scratch it completely. It abuses typecasts and artificially creates a need for conditional compilation. To get the unsigned integer character code of a given Char, use the Ord() function instead (or, as the other answer says, use the ordinal traits of the Char type).
For the third part, character constants are already of the generic type Char. Again, there is nothing to worry about: #13 becomes either the byte-sized $0D or the word-sized $000D, which is stored in memory as the bytes $0D $00 (remember little-endianness).
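The endianness remark can be illustrated in C rather than Delphi. This is a sketch under the assumption that a uint16_t stands in for a Delphi 2009+ WideChar; both helper names are hypothetical. It copies out the raw bytes of the code unit 13 so you can see which byte comes first in memory.

```c
#include <stdint.h>
#include <string.h>

/* Copies the in-memory bytes of a 16-bit code unit (the size of a
 * WideChar) into out[0] and out[1]. */
void code_unit_bytes(uint16_t unit, unsigned char out[2])
{
    memcpy(out, &unit, sizeof unit);
}

/* Returns 1 on a little-endian host such as x86, 0 otherwise. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On x86, code_unit_bytes(13, b) should leave b[0] == 0x0D and b[1] == 0x00: the value is $000D, but the low byte is stored first.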
