A new Thing observed in "Scanf"

Generally, the scanf syntax appears as scanf(format, arg1, arg2, arg3, ...);
The format characters I know are %d, %s, %c and so on, but I noticed a new one,
"%[^\n]", in some code: scanf("%[^\n]", ptr). What does "%[^\n]" specify there? Is it a new format character?
If this question doesn't fit here, I am sorry.
Thanks a lot.

It isn't new; it's part of standard C. %[^\n] is a scanset conversion: it reads characters up to, but not including, the next newline. You'll find what it does discussed in the scanf manual page and this part of the POSIX standard.

Related

Formatting differences between sprintf() and wsprintf() in VS2015

I am moving some code from multibyte to unicode, and finding my string formatting coming out wrong. It looks like Visual Studio 2015 handles the width argument specifier '*' differently between sprintf() and wsprintf(). Is this a compiler bug or side-effect, or am I missing something really obvious?
Code below, with output:
char cOutA [ 64 ];
wchar_t wcOutA [ 64 ];
sprintf ( cOutA, "Multibyte = %.*f\n", 3, 2.12345 );
wsprintf ( wcOutA, L"Unicode = %.*f\n", 3, 2.12345 );
printf ( cOutA );
wprintf ( wcOutA );
Output:
Multibyte = 2.123
Unicode = *f
I was expecting both to give me a floating point number to 3 decimal places. What am I doing wrong?
As mentioned by Hans in the comments, the answer is that you should never use wsprintf(). It has always been broken: it does not support the same format specifiers as the standard C swprintf(), and the Microsoft documentation does not make clear how it is broken or why.
I only discovered this when trying to debug a related function: wvsprintf(). This function seems to have the same limitations, and should also be replaced by its working replacement: "vswprintf()". The similarity of the names to the working versions is very unfortunate, as is the apparent closeness to standard C library functions and naming methodologies. I have no idea why these functions are still delivered in 2017, nor why the Microsoft compiler does not generate a warning when used with unsupported arguments in the same way it does for "sprintf()".
I'm posting this for visibility as searching for these functions on Google doesn't seem to make these massive flaws obvious.

Extract single unicode character from string

The problem begins when I stumble upon Unicode characters, for example árbol. Right now I handle this by checking whether the character at position i, that is, string(i:i), is less than 128. That means it belongs to the ASCII table, so I know for sure that string(i:i) is a complete single character. In the other case (>= 128), and for my example 'árbol', string(1:2) is the complete character.
I think the way I'm handling the strings solves the problem for my practical purposes (handling files in Spanish, Polish and Russian), but when handling Chinese characters, which may take up to 4 bytes, I would have problems.
Is there a way in Fortran to single out Unicode characters inside a string?
gfortran does not currently support non-ASCII characters in UTF-8 encoded files, see here. You can find the corresponding bug report here.
As a work-around, you can specify the unicode char in Hex-notation: char(int(z'00E1'), ucs4), or '\u00E1'. The latter requires the compile option -fbackslash to enable the evaluation of the backslash.
program character_kind
use iso_fortran_env
implicit none
integer, parameter :: ucs4 = selected_char_kind ('ISO_10646')
character(kind=ucs4, len=20) :: string
! string = ucs4_'árbol' ! This does not work
! string = char(int(z'00E1'), ucs4) // ucs4_'rbol' ! This works
string = ucs4_'\u00E1rbol' ! This is also working
open (output_unit, encoding='UTF-8')
print *, string(1:1)
print *, string
end program character_kind
ifort seems not to support ISO_10646 at all, selected_char_kind ('ISO_10646') returns -1. With ifort 15.0.0 I get the same message as described here.

How to use strstrip for parsing a string in two parts

I would like to know how to parse a string like "hello world" into "helloworld" using the strstrip kernel function. I am developing a Linux kernel char device and this function causes a kernel panic (or kernel oops).
The way I'm using this function is the following:
char result[100];
strcpy(result, "hello world");
strstrip(result);
strstrip(&result); //Also tried this
strstrip("100+200"); //Also tried this
The Kernel error is caused as soon as the strstrip line gets executed. What is the proper way to call this function?
Actually, strstrip removes the whitespace at the beginning and end of the string. It does not remove whitespace within the string.
Please look at the below example.
char result[100];
char *stripped;
strcpy(result, " hello world from stack exchange");
printk("\n before: %s", result);
/* strstrip returns a pointer into result; copying it back with strcpy would overlap */
stripped = strstrip(result);
printk("\n after: %s", stripped);
Hope it helps.
strstrip() is a wrapper function for strim() (http://lxr.linux.no/linux+v3.11.2/lib/string.c#L361) in modern kernels. Because it modifies the string in place, you cannot call it with a string literal as you do in the third attempt.
The second attempt passes the address of the array itself. Its type is char (*)[100], not the char * that the function expects, so it is not correct either.
The first attempt should not cause a kernel error, but you do not appear to store the return value in a local variable. What kind of error are you receiving? I will update this answer if you can provide that information.
In the end, though, as Balamurugan A points out, this function does not do what you seem to think it does. strsep() (http://lxr.linux.no/linux+v3.11.2/lib/string.c#L485) may help you out here, but it will only be a stepping stone to removing all spaces. You will actually have to copy the string into a new buffer word by word, as there is no way to simply "shift memory contents", as it were.

Script or command to automatically change some stuffs in my C code?

I have several C files, and I created a function named
X_STRING(arg1,arg2,arg3,arg4);
I call this function many times from different C files. I want to replace all of these calls with X_STRING(arg1, arg2, (arg1) * 2, (arg2) * 3);
awk seems to be the solution, but I don't know how to handle all the cases, because I have to consider the cases where:
the call has newlines inserted between different arguments
e.g.
X_STRING(
arg1, arg2,
arg3,
arg4);
an argument contains a parenthesis :
e.g.
X_STRING(arg1, arg2, (arg3 - 4)*3, arg4);
Can someone point me to good tools to solve my problem?
If you don't mind changing the format in the output, you might be happy with an m4 solution. Put the following in a file:
define( X_STRING, ``X_STRING''( $1, $2, ($1) * 2, ($2) * 3 ))
And then run:
$ m4 def_file file.c
where file.c is your code and def_file is the name of the file with the above content. (The name is irrelevant.)
This should work if your code is well-formatted. (If you have unmatched parentheses, it will fail.)
This will change the whitespace, but otherwise should do what you want.
Another solution might be to customize the GCC compiler for that purpose, through GCC plugins written in C or GCC extensions written in MELT.
The advantage of working inside the compiler is that you process one of the compiler's internal representations (like Gimple for GCC), and not only text. Textual approaches won't work that well if, for instance, the call to your X_STRING appears through the expansion of a macro or the inlining of a function.
Extending the GCC compiler has also several drawbacks: it is GCC specific (and might even depend upon the version of GCC), and it would require several days of work.

Protection from Format String Vulnerability

What exactly is a "Format String Vulnerability" in a Windows System, how does it work, and how can I protect against it?
A format string attack, at its simplest, is this:
char buffer[128];
gets(buffer);
printf(buffer);
There's a buffer overflow vulnerability in there as well, but the point is this: you're passing untrusted data (from the user) to printf (or one of its cousins) that uses that argument as a format string.
That is: if the user types in "%s", you've got an information-disclosure vulnerability, because printf will treat the user input as a format string, and will attempt to print the next thing on the stack as a string. It's as if your code said printf("%s");. Since you didn't pass any other arguments to printf, it'll display something arbitrary.
If the user types in "%n", you've got a potential elevation of privilege attack (at least a denial of service attack), because the %n conversion specifier causes printf to write the number of characters printed so far through the pointer it expects as its next argument. Since you didn't give it a place to put this value, it'll write to somewhere arbitrary.
This is all bad, and is one reason why you should be extremely careful when using printf and cousins.
What you should do is this:
printf("%s", buffer);
This means that the user's input is never treated as a format string, so you're safe from that particular attack vector.
In Visual C++, you can use the _Printf_format_string_ SAL annotation (__format_string in older SAL versions) to tell it to validate the arguments to printf-style functions. %n is disallowed by default. In GCC, you can use __attribute__((format(printf, m, n))) for the same thing.
In this pseudo code the user enters some characters to be printed, like "hello"
string s=getUserInput();
write(s)
That works as intended. But write can also format strings, for example:
int i=getUnits();
write("%02d units",i);
This outputs "03 units". But what if the user typed "%02d" as the input in the first place? Since no matching argument was passed, something else will be fetched from the stack. What that is, and whether it is a problem, depends on the program.
An easy fix is to tell the program to output a string:
write("%s",s);
or use another method that doesn't try to format the string:
output(s);
Wikipedia's "Uncontrolled format string" article has more info.
