CString constructor from char[] - visual-c++

Is it safe to construct a CString from a char[]?
char R[5000];
CString s = R;
These lines sometimes raise an exception:
Windows has triggered a breakpoint in tst.exe.
This may be due to a corruption of the heap, which indicates a bug in tst.exe or any of the DLLs it has loaded.
This may also be due to the user pressing F12 while tst.exe has focus.
The output window may have more diagnostic information

The CString constructor attempts to copy a null-terminated string from the input argument (more precisely, from the memory address pointed to by the input argument).
So it copies characters until reaching a 0 (null) character, and since the contents of R[5000] are not initialized, there's a good chance that none of the characters in it is equal to 0.
If there is no such character within the legal memory region pointed to by the input argument, then the CString constructor reads past the end of that memory region, and most likely causes an illegal memory access.
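A minimal sketch of two ways to make sure the buffer contains a terminator before the string is constructed. It uses std::string as a stand-in for CString (an assumption, so that the example compiles without MFC); both constructors scan for the first 0 character. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: the array is value-initialized, so every element is 0
// and the buffer is a valid (empty) null-terminated string.
std::string make_from_zeroed_buffer() {
    char R[5000] = {};       // all 5000 bytes are 0
    return std::string(R);   // the scan stops immediately at R[0]
}

// Hypothetical helper: fills the buffer, then terminates it explicitly.
std::string make_from_filled_buffer(const char* text) {
    char R[5000];
    std::strncpy(R, text, sizeof R - 1);  // copies at most 4999 characters
    R[sizeof R - 1] = '\0';               // guarantees a terminator even if text is too long
    return std::string(R);
}
```

With either approach the constructor never reads past the end of R, because a 0 character is guaranteed to be inside the array.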

Related

How to input a string value of unknown length from console in Fortran? [duplicate]

I would like to use deferred-length character strings in a "simple" manner to read user input. The reason that I want to do this is that I do not want to have to declare the size of a character string before knowing how large the user input will be. I know that there are "complicated" ways to do this. For example, the iso_varying_string module can be used: https://www.fortran.com/iso_varying_string.f95. Also, there is a solution here: Fortran Character Input at Undefined Length. However, I was hoping for something as simple, or almost as simple, as the following:
program main
character(len = :), allocatable :: my_string
read(*, '(a)') my_string
write(*,'(a)') my_string
print *, allocated(my_string), len(my_string)
end program
When I run this program, the output is:
./a.out
here is the user input
F 32765
Notice that there is no output from write(*,'(a)') my_string. Why?
Also, my_string has not been allocated. Why?
Why isn't this a simple feature of Fortran? Do other languages have this simple feature? Am I lacking some basic understanding about this issue in general?
vincentjs's answer isn't quite right.
Modern (2003+) Fortran does allow automatic allocation and re-allocation of strings on assignment, so a sequence of statements such as this
character(len=:), allocatable :: string
...
string = 'Hello'
write(*,*) string
string = 'my friend'
write(*,*) string
string = 'Hello '//string
write(*,*) string
is correct and will work as expected and write out 3 strings of different lengths. At least one compiler in widespread use, the Intel Fortran compiler, does not engage 2003 semantics by default so may raise an error on trying to compile this. Refer to the documentation for the setting to use Fortran 2003.
However, this feature is not available when reading a string so you have to resort to the tried and tested (aka old-fashioned if you prefer) approach of declaring a buffer of sufficient size for any input and of then assigning the allocatable variable. Like this:
character(len=long) :: buffer
character(len=:), allocatable :: string
...
read(*,'(a)') buffer   ! '(a)' reads the whole line; list-directed read(*,*) stops at the first blank
string = trim(buffer)
No, I don't know why the language standard forbids automatic allocation on read, just that it does.
Deferred length character is a Fortran 2003 feature. Note that many of the complicated methods linked to are written against earlier language versions.
With Fortran 2003 support, reading a complete record into a character variable is relatively straightforward. A simple example with very minimal error handling is below. Such a procedure only needs to be written once, and can be customized to suit a user's particular requirements.
PROGRAM main
USE, INTRINSIC :: ISO_FORTRAN_ENV, ONLY: INPUT_UNIT
IMPLICIT NONE
CHARACTER(:), ALLOCATABLE :: my_string
CALL read_line(input_unit, my_string)
WRITE (*, "(A)") my_string
PRINT *, ALLOCATED(my_string), LEN(my_string)
CONTAINS
SUBROUTINE read_line(unit, line)
! The unit, connected for formatted input, to read the record from.
INTEGER, INTENT(IN) :: unit
! The contents of the record.
CHARACTER(:), INTENT(OUT), ALLOCATABLE :: line
INTEGER :: stat ! IO statement IOSTAT result.
CHARACTER(256) :: buffer ! Buffer to read a piece of the record.
INTEGER :: size ! Number of characters read from the file.
!***
line = ''
DO
READ (unit, "(A)", ADVANCE='NO', IOSTAT=stat, SIZE=size) buffer
IF (stat > 0) STOP 'Error reading file.'
line = line // buffer(:size)
! An end of record condition or end of file condition stops the loop.
IF (stat < 0) RETURN
END DO
END SUBROUTINE read_line
END PROGRAM main
Deferred length arrays are just that: deferred length. You still need to allocate the size of the array using the allocate statement before you can assign values to it. Once you allocate it, you can't change the size of the array unless you deallocate and then reallocate with a new size. That's why you're getting a debug error.
Fortran does not provide a way to dynamically resize character arrays like the std::string class does in C++, for example. In C++, you could initialize std::string var = "temp", then redefine it to var = "temporary" without any extra work, and this would be valid. This is only possible because the resizing is done behind the scenes by the functions in the std::string class (it grows the buffer when the limit is exceeded, typically by a constant factor such as 2x, which is functionally equivalent to reallocating with a bigger array).
Practically speaking, the easiest way I've found when dealing with strings in Fortran is to allocate a reasonably large character array that will fit most expected inputs. If the size of the input exceeds the buffer, then simply increase the size of your array by reallocating with a larger size. Removing trailing white space can be done using trim.
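For comparison with the C++ behaviour mentioned in this answer, here is a minimal sketch of reading a line of unknown length into a std::string. The function name read_whole_line is hypothetical; std::getline is the standard library call that grows the string as needed.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Reads one complete line of any length from the stream.
// std::getline resizes the string behind the scenes, so no buffer
// size has to be chosen in advance.
std::string read_whole_line(std::istream& in) {
    std::string line;
    std::getline(in, line);   // stops at '\n' (not stored) or at end of input
    return line;
}
```

Calling read_whole_line(std::cin) would read the user's input from the console; any std::istream works, which also makes the function easy to exercise with a std::istringstream.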
You know that there are "complicated" ways of doing what you want. Rather than address those, I'll answer your first two "why?"s.
Unlike intrinsic assignment, a read statement does not have the target variable first allocated to the correct size and type parameters for the thing coming in (if it isn't already like that). Indeed, it is a requirement that the items in an input list be allocated. Fortran 2008, 9.6.3, clearly states:
If an input item or an output item is allocatable, it shall be allocated.
This is the case whether the allocatable variable is a character with deferred length, a variable with other deferred length-type parameters, or an array.
There is another way to declare a character with deferred length: giving it the pointer attribute. This doesn't help you, though, as we also see
If an input item is a pointer, it shall be associated with a definable target ...
Why you have no output from your write statement is related to why you see that the character variable isn't allocated: you haven't followed the requirements of Fortran and so you can't expect the behaviour that isn't specified.
I'll speculate as to why this restriction is here. I see two obvious ways to relax the restriction:
allow automatic allocation generally;
allow allocation of a deferred length character.
The second case would be easy:
If an input item or an output item is allocatable, it shall be allocated unless it is a scalar character variable with deferred length.
This, though, is clumsy and such special cases seem against the ethos of the standard as a whole. We'd also need a carefully thought out rule about allocation for this special case.
If we go for the general case for allocation, we'd presumably require that the unallocated effective item is the final effective item in the list:
integer, allocatable :: a(:), b(:)
character(7) :: ifile = '1 2 3 4'
read(ifile,*) a, b
and then we have to worry about
type aaargh(len)
integer, len :: len
integer, dimension(len) :: a, b
end type
type(aaargh), allocatable :: a(:)
character(9) :: ifile = '1 2 3 4 5'
read(ifile,*) a
It gets quite messy very quickly, which seems like a lot of problems to resolve when there are already ways, of varying difficulty, of solving the read problem.
Finally, I'll also note that allocation is possible during a data transfer statement. Although a variable must be allocated (as the rules are now) when appearing in an input list, components of an allocated variable of derived type needn't be if that effective item is processed by defined input.

String (constants) Literals pointer

I'm relatively new to C++ and I searched for an answer to my question, but I only got more confused. As I understand it, string literals must be pointed to by "const" pointers, since they are considered read-only. I also understand that the pointer itself is not constant (and could be changed), but it is pointing to a string constant. I also understand that the string itself cannot be modified. So in this example:
const char* cstr="string";
*cstr = 'a';
I get an error: "assignment of read-only location."
Now, if I define my C-string as following, and define a pointer to it, I'll be able to change the string:
char str[7]="string";
char* cstr = str;
*cstr = 'a';
cout << cstr <<endl;
the string is modified (output --> atring), which means the first element of the string has changed. My two questions are:
1- Why am I able to modify the C-string in the second example, but cannot make any changes to the string in the first case?
2- In both cases I am using pointers, so why should I use a constant char pointer in the first case?
When you use the syntax
const char* cstr="string";
C++ defines:
An array of 7 characters in the read-only section of memory, with the contents "string\0" in it.
A pointer on the stack (or in the writable global section of memory), holding the address of that array.
However, when you use the syntax:
char str[7]="string";
C++ defines:
An array of 7 characters on the stack (or in the writable global section of memory), with the contents "string\0" in it.
In the first case, the actual values are in read-only memory, so you can't change them. In the second case, they are in writable memory (stack or global).
C++ tries to enforce this semantic, so if the definition is read-only memory, you should use a const pointer.
Note that not all architectures have read-only memory, but because most of them do, and C++ might want to use the read-only memory feature (for better correctness), then C++ programmers should assume (for the purpose of pointer types) that constants are going to be placed in read-only memory.
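The two cases can be put side by side in a short sketch. The function names are hypothetical; the commented-out line is the one the compiler rejects.

```cpp
#include <cassert>
#include <string>

// Case 2 from the question: the literal's characters are copied into a
// writable array, so modifying the array is fine.
std::string modify_array_copy() {
    char str[7] = "string";   // 6 characters plus the terminating '\0'
    char* p = str;            // a non-const pointer is fine: the array is writable
    *p = 'a';                 // changes only the array, not the literal
    return std::string(str);
}

// Case 1 from the question: the pointer refers to the literal itself,
// so it is declared const and can only be read through.
std::string read_literal() {
    const char* cstr = "string";
    // *cstr = 'a';           // does not compile: assignment of read-only location
    return std::string(cstr);
}
```

The const on the pointer is what lets the compiler catch the invalid write at compile time instead of leaving it to fail (or silently misbehave) at run time.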

Why doesn't printf function show string after ROP attack? [duplicate]

#include<stdio.h>
int main()
{
char *name = "Vikram";
printf("%s",name);
name[1]='s';
printf("%s",name);
return 0;
}
There is no output printed on terminal and just get segmentation fault. But when I run it in GDB, I get following -
Program received signal SIGSEGV, Segmentation fault.
0x0000000000400525 in main () at seg2.c:7
7 name[1]='s';
(gdb)
This means the program receives a segmentation fault on line 7 (obviously I can't write to a constant char array). Then why is the printf() on line 6 not executed?
This is due to stream buffering of stdout. Unless you call fflush(stdout) or print a newline "\n", the output may be buffered.
In this case, it's segfaulting before the buffer is flushed and printed.
You can try this instead:
printf("%s",name);
fflush(stdout); // Flush the stream.
name[1]='s'; // Segfault here (undefined behavior)
or:
printf("%s\n",name); // Flush the stream with '\n'
name[1]='s'; // Segfault here (undefined behavior)
First you should end your printfs with "\n" (or at least the last one). But that is not related to the segfault.
When the compiler compiles your code, it splits the binary into several sections. Some are read-only, while others are writable.
Writing to a read-only section may cause a segfault.
String literals are usually placed in a read only section (gcc should put it in ".rodata").
The pointer name points to that ro section. Therefore you must use
const char *name = "Vikram";
In my response I've used a few "may" "should". The behaviour depends on your OS, compiler and compilation settings (The linker script defines the sections).
Adding
-Wa,-ahlms=myfile.lst
to gcc's command line produces a file called myfile.lst with the generated assembler code.
At the top you can see
.section .rodata
.LC0:
.string "Vikram"
Which shows that the string is in .rodata.
The same code using an array instead of a pointer (it must be at global scope, otherwise gcc may store it on the stack)
char name[] = "Vikram";
produces
.data
.type name, #object
.size name, 7
name:
.string "Vikram"
The syntax is a bit different, but see how it is in the .data section now, which is read-write.
By the way, this version works.
The reason you are getting a segmentation fault is that C string literals are read only according to the C standard, and you are attempting to write 's' over the second element of the literal array "Vikram".
The reason you are getting no output is because your program is buffering its output and crashes before it has a chance to flush its buffer. The purpose of the stdio library, in addition to providing friendly formatting functions like printf(3), is to reduce the overhead of i/o operations by buffering data in in-memory buffers and only flushing output when necessary, and only performing input occasionally instead of constantly. Actual input and output will not, in the general case, occur at the moment when you call the stdio function, but only when the output buffer is full (or the input buffer is empty).
Things are slightly different if a FILE object has been set so it flushes constantly (like stderr), but in general, that's the gist.
If you're debugging, it is best to fprintf to stderr to assure that your debug printouts will get flushed before a crash.
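The effect of buffering can be observed without crashing anything. The sketch below is an illustration under assumptions: it writes to a regular file instead of stdout, forces full buffering with setvbuf, and checks the file size through a second handle before and after fflush. The names file_size_on_disk, FlushDemo and demonstrate_buffering are hypothetical, and the "nothing written before the flush" observation reflects typical stdio implementations rather than a guarantee of the standard.

```cpp
#include <cassert>
#include <cstdio>

// Size of the file as seen through a fresh handle, or -1 if it cannot be opened.
long file_size_on_disk(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return -1;
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fclose(f);
    return size;
}

struct FlushDemo { long before; long after; };

// Writes "Vikram" to a fully buffered stream and records how many bytes
// have reached the file before and after an explicit flush.
FlushDemo demonstrate_buffering(const char* path) {
    static char buffer[4096];                          // stdio buffer for the stream
    std::FILE* out = std::fopen(path, "wb");
    if (!out) return {-1, -1};
    std::setvbuf(out, buffer, _IOFBF, sizeof buffer);  // full buffering
    std::fprintf(out, "%s", "Vikram");                 // stays in the buffer for now
    long before = file_size_on_disk(path);
    std::fflush(out);                                  // pushes the buffer to the file
    long after = file_size_on_disk(path);
    std::fclose(out);
    std::remove(path);                                 // clean up the scratch file
    return {before, after};
}
```

If the program crashed between the fprintf and the fflush, the buffered characters would be lost, which is the situation in the question.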
By default when stdout is connected to a terminal, the stream is line-buffered. In practice, in your example the absence of '\n' (or of an explicit stream flush) is why you don't get the characters printed.
But in theory undefined behavior is not bounded (from the Standard "behavior [...] for which this International Standard imposes no requirements") and the segfault can happen even before the undefined behavior occurs, for example before the first printf call!

What is the difference between a null terminated string and a string that is not terminated by null in x86 assembly language

I'm currently learning assembly programming by following Kip Irvine's "assembly language x86 programming" book.
In the book, the author states
The most common type of string ends with a null byte (containing 0).
Called a null-terminated string
In the subsequent section of the book, the author had a string example without the null byte
greeting1 \
BYTE "Welcome to the Encryption Demo program "
So I was just wondering, what is the difference between a null terminated string and a string that is not terminated by null in x86 assembly language? Are they interchangeable? Or are they not equivalent to each other?
There's nothing specific to asm here; it's the same issue in C. It's all about how you store strings in memory and keep track of where they end.
what is the difference between a null terminated string and a string that is not terminated by null?
A null-terminated string has a 0 byte after it, so you can find the end with strlen (e.g. with a slow repne scasb). This makes it usable as an implicit-length string, like C uses.
NASM Assembly - what is the ", 0" after this variable for? explains the NASM syntax for creating one in static storage with db. db usage in nasm, try to store and print string shows what happens when you forget the 0 terminator.
Are they interchangeable?
If you know the length of a null-terminated string, you can pass pointer+length to a function that wants an explicit-length string. That function will never look at the 0 byte, because you will pass a length that doesn't include the 0 byte. It's not part of the string data proper.
But if you have a string without a terminator, you can't pass it to a function or system-call that wants a null-terminated string. (If the memory is writeable, you could store a 0 after the string to make it into a null-terminated string.)
In Linux, many system calls take strings as C-style implicit-length null-terminated strings. (i.e. just a char* without passing a length).
For example, open(2) takes a string for the path: int open(const char *pathname, int flags); You must pass a null-terminated string to the system call. It's impossible to create a file with a name that includes a '\0' in Linux (same as most other Unix systems), because all the system calls for dealing with files use null-terminated strings.
OTOH, write(2) takes a memory buffer which isn't necessarily a string. It has the signature ssize_t write(int fd, const void *buf, size_t count);. It doesn't care if there's a 0 at buf+count because it only looks at the bytes from buf to buf+count-1.
You can pass a string to write(). It doesn't care. It's basically just a memcpy into the kernel's pagecache (or into a pipe buffer or whatever for non-regular files). But like I said, you can't pass an arbitrary non-terminated buffer as the path arg to open().
Or they are not equivalent of each other?
Implicit-length and explicit-length are the two major ways of keeping track of string data/constants in memory and passing them around. They solve the same problem, but in opposite ways.
Long implicit-length strings are a bad choice if you sometimes need to find their length before walking through them. Looping through a string is a lot slower than just reading an integer. Finding the length of an implicit-length string is O(n), but an explicit-length string is of course O(1) time to find the length. (It's already known!). At least the length in bytes is known, but the length in Unicode characters might not be known, if it's in a variable-length encoding like UTF-8 or UTF-16.
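The same distinction can be sketched in C++, where std::string_view (C++17) is an explicit-length pointer+length pair and strlen is the implicit-length scan. The function names are hypothetical, and the 7-byte buffer deliberately has no terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Implicit length: scan for the 0 byte, O(n) in the length of the string.
std::size_t implicit_length(const char* s) {
    return std::strlen(s);
}

// Explicit length over a buffer with no 0 terminator: the length travels
// with the pointer, so finding it is O(1) and no byte past the end is read.
std::string_view unterminated_greeting() {
    static const char bytes[7] = {'W', 'e', 'l', 'c', 'o', 'm', 'e'};
    return std::string_view(bytes, sizeof bytes);
}

// Converting a terminated string to explicit-length form: the 0 byte
// is not counted and is never looked at by consumers of the view.
std::string_view to_explicit(const char* s) {
    return std::string_view(s, std::strlen(s));
}
```

Going the other way is the unsafe direction: passing the data pointer of unterminated_greeting() to strlen would scan past the end of the 7-byte array.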
How a string is terminated has nothing to do with assembly. Historically, terminators have included '$' and CRLF ([13,10] or [0D,0A] in hex), and those two bytes are sometimes reversed, as with GEDIT under Linux. Conventions are determined by how your system is going to interact with itself or other systems. As an example, my applications are strictly oriented around ASCII; therefore, if I were to read a file that's UTF-8 or UTF-16, my application would fail miserably. NULLs or any kind of termination could be optional.
Consider this example
Title: db 'Proto_Sys 1.00.0', 0, 0xc4, 0xdc, 0xdf, 'CPUID', 0, 'A20', 0
db 'AXCXDXBXSPBPSIDIESDSCSSSFSGS'
Err00: db 'Retry [Y/N]', 0
I've implemented a routine where if CX=0 then it's assumed a NULL terminated string is to be displayed, otherwise only one character is read and repeated CX times. That is why 0xc4 0xdc 0xdf are not terminated. Similarly, there isn't a terminator before 'Retry [Y/N]' because the way my algo is designed, there doesn't need to be.
The only things you need to concern yourself with are the source of your data and whether your application needs to be compatible with something else. Then you simply implement whatever you need to make it work.

Can I limit the size of MFC CString buffer

I have an old application which uses CString throughout the code.
The maximum size of a string written to a CString is 8 or 9 characters, but I noticed that it allocates more (at least 128 bytes per CString).
Is there a way to limit the size of the CString buffer, for example to 64 bytes?
Thanks in advance,
No.
In detail:
The CString implementation is internal. You find the code in CSimpleStringT::PrepareWrite2 and in the Reallocate function of the string manager.
PrepareWrite2 allocates the buffer. If there was no buffer before, it requests the exact size. If the buffer has to change, the new buffer is newLength*1.5.
The request is then passed to the Reallocate function of the string manager, and finally this size is passed to the CRT function realloc.
Keep in mind that the memory manager itself decides again what blocksize is "effective" and might change the size again.
So as I see (in VS-2013/VS-2010) you have no chance to change the blocksize. The job is finally done by realloc. And even this function passes its request to HeapAlloc...
