How can I increase the size of a CString once it reaches its maximum size? Alternatively, can you tell me which class can hold more data than CString?
CString uses heap allocation for the string buffer, so the actual limit on string length depends on a number of conditions and is on the order of hundreds of megabytes.
In general, each time the string needs to grow its buffer it allocates a new buffer larger than the previous one - there's a strategy for determining the new buffer size. Depending on the amount of memory actually available in the system, this reallocation may either fail or succeed. If it fails, you have very few options - the best choice is usually to restart the program.
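For instance, here is a minimal sketch of guarding a growing CString against that failure (assuming an MFC project; MFC signals allocation failure by throwing a CMemoryException*):

    #include <afx.h>   // MFC core header; assumes an MFC project

    bool AppendChunk(CString& buffer, const CString& chunk)
    {
        try
        {
            buffer += chunk;           // may reallocate the internal buffer
            return true;
        }
        catch (CMemoryException* e)
        {
            // Reallocation failed: clean up and let the caller decide
            // (often the only sane option is to restart the program).
            e->Delete();
            return false;
        }
    }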
For the task you're solving - working with a COM port - you can use MFC's CArray, which is very convenient as a variable-size array. You could also use std::vector for the same purpose; a sketch follows.
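A rough sketch of accumulating COM-port bytes in a std::vector (the handle name and chunk size below are illustrative, assuming the port was opened with CreateFile):

    #include <vector>
    #include <windows.h>

    // Append whatever is currently readable from the port to 'data'.
    void ReadPortChunk(HANDLE hPort, std::vector<BYTE>& data)
    {
        BYTE chunk[4096];
        DWORD bytesRead = 0;
        if (ReadFile(hPort, chunk, sizeof(chunk), &bytesRead, nullptr) && bytesRead > 0)
        {
            // std::vector grows its buffer geometrically, much like CString
            data.insert(data.end(), chunk, chunk + bytesRead);
        }
    }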
In CString, the actual string length and the allocated buffer size are held in signed ints (check out CStringData). The string buffer itself is dynamically allocated. This means the theoretical limit is 2^31 characters. In practice, on a 32-bit environment you'll get much less due to memory fragmentation. Also, if you're using a Unicode CString, each character takes two bytes, which means the CString buffer will hold less text. On a 64-bit environment you might be able to get the full 2^31 characters.
That said, are you really trying to work with strings that long? There's probably a lot to do before you hit the CString length limit.
What is the maximum number of characters that Dart's String class can hold? I searched for this in Google and couldn't find the answer.
It depends.
When compiled for the web, the limit is the allowed JS String length.
Checking, it seems Chrome has a 0x1FFF_FFE8 limit on string length, while Firefox uses 0x3FFF_FFFE.
On the Dart native VM, the limit is available memory (and, in principle, the size of integers, but a 64-bit integer cannot be the limiting factor on contemporary hardware: you can't actually have 2^64 bytes of memory on any current 64-bit architecture).
I ran into "Exhausted heap space" errors after trying to double a string of length 0x4_0000_0000 (16 GB). At that point the code was already fairly slow, which is not surprising since the computer "only" has 32 GB of physical memory.
If using a 32-bit VM, the memory limit will definitely be lower.
PHP has an internal data structure called smart string (smart_str?), which stores both the length and the buffer size. That is, more memory than the length of the string is allocated, to improve concatenation performance. Why isn't this data structure used for the actual PHP strings? Wouldn't that lead to fewer memory allocations and better performance?
Normal PHP strings (as of PHP 7) are represented by the zend_string type, which includes both the length of the string and its character data array. zend_strings are usually allocated to fit the character data precisely (alignment notwithstanding): they leave no room for appending additional characters.
The smart_str structure includes a pointer to a zend_string and an allocation size. This time the zend_string is not precisely allocated; instead, the allocation is made deliberately too large, so that additional characters can be appended without expensive reallocations.
The reallocation policy for smart_str is as follows: First, it will be allocated to have a total size of 256 bytes (minus the zend_string header, minus allocator overhead). If this size is exceeded it will be reallocated to 4096 bytes (minus overhead). After that, the size will increase in increments of 4096 bytes.
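As a rough illustration of that policy (constants only; the real logic lives in PHP's smart_str macros and also subtracts the zend_string header and allocator overhead, which are elided here):

    #include <cstddef>

    // Illustrative approximation of the smart_str growth policy.
    std::size_t NextSmartStrCapacity(std::size_t needed)
    {
        if (needed <= 256)
            return 256;     // first allocation: 256 bytes
        if (needed <= 4096)
            return 4096;    // second step: one 4096-byte block
        // afterwards: round up to the next multiple of 4096
        return (needed + 4095) & ~std::size_t(4095);
    }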
Now, imagine that we replaced all strings with smart_strs. This would mean that even a single-character string would have a minimum allocation size of 256 bytes. Given that most strings in use are small, this is an unacceptable overhead.
So essentially, this is a classic performance/memory tradeoff. We use a memory-compact representation by default and switch to a faster but less memory-efficient representation in the cases that benefit most from it, i.e. cases where large strings are constructed from small parts.
According to the Rust Reference:
The isize type is a signed integer type with the same number of bits as the platform's pointer type. The theoretical upper bound on object and array size is the maximum isize value. This ensures that isize can be used to calculate differences between pointers into an object or array and can address every byte within an object along with one byte past the end.
This obviously constrains an array to at most 2G elements on a 32-bit system; however, it is not clear whether an array is also constrained to at most 2GB of memory.
In C or C++, you could cast the pointers to the first element and one past the last element to char* and take their difference, effectively limiting the array to 2GB (lest it overflow ptrdiff_t).
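To make that concrete, the C++ computation I have in mind is something like this (a minimal sketch):

    #include <cstdio>

    int main()
    {
        int arr[1000] = {};
        // The difference of two pointers into the same object is a signed
        // std::ptrdiff_t, so the byte distance must fit into that type.
        auto bytes = reinterpret_cast<char*>(arr + 1000)
                   - reinterpret_cast<char*>(arr);
        std::printf("%td bytes\n", bytes);   // 4000 on typical platforms
    }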
Is an array in 32 bits also limited to 2GB in Rust? Or not?
The internals of Vec do cap the value to 4GB, both in with_capacity and grow_capacity, using
    let size = capacity.checked_mul(mem::size_of::<T>())
        .expect("capacity overflow");
which panics if the size computation overflows.
As such, Vec-allocated slices are also capped in this way in Rust. Given that this is because of an underlying restriction in the allocation API, I would be surprised if any typical type could circumvent this. And if they did, Index on slices would be unsafe due to pointer overflow. So I hope not.
It might still not be possible to allocate all 4GB for other reasons, though. In particular, allocate won't let you allocate more than 2GB (isize::MAX bytes), so Vec is restricted to that.
Rust uses LLVM as compiler backend. The LLVM instruction for pointer arithmetic (GetElementPtr) takes signed integer offsets and has undefined behavior on overflow, so it is impossible to index into arrays larger than 2GB when targeting a 32-bit platform.
To avoid undefined behavior, Rust will refuse to allocate more than 2 GB in a single allocation. See Rust issue #18726 for details.
When I was looking at how std::string is implemented in GCC, I noticed that sizeof(std::string) is exactly equal to the size of a pointer (4 bytes in a 32-bit build, 8 bytes in a 64-bit build). Since a string should hold at least a pointer to the string buffer and its length, this made me think that a std::string object in GCC is actually a pointer to some internal structure holding this data.
As a consequence, one dynamic memory allocation should occur whenever a new string is created (even if the string is empty).
In addition to the performance overhead, this also causes memory overhead (of the kind that occurs when allocating very small chunks of memory).
So I see only downsides to such a design. What am I missing? What are the upsides, and what was the reason for such an implementation in the first place?
Read the long comment at the top of <bits/basic_string.h>, it explains what the pointer points to and where the string length (and reference count) are stored and why it's done that way.
However, C++11 doesn't allow a reference-counted copy-on-write std::string, so the GCC implementation will have to change. Doing so would break the ABI, though, so the change is being delayed until an ABI break is inevitable. We don't want to change the ABI, then have to change it again a few months later, then again; when it changes, it should change only once, to minimise the hassle for users.
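One way to observe the shared-buffer behaviour (on a pre-C++11-ABI libstdc++, where the copy-on-write string is the default) is to compare the buffer pointers of two copies:

    #include <cstdio>
    #include <string>

    int main()
    {
        std::string a = "hello world";
        std::string b = a;   // no copy yet, just a reference-count bump
        // On the COW implementation both objects point at the same buffer...
        std::printf("%s\n", a.data() == b.data() ? "shared" : "copied");
        b += '!';            // ...until one of them writes to it
        std::printf("%s\n", a.data() == b.data() ? "shared" : "copied");
        std::printf("sizeof(std::string) = %zu\n", sizeof(std::string));
    }

On the new, C++11-conforming ABI the first line already prints "copied" and sizeof(std::string) is larger, since the object stores the size and a small-string buffer inline.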
If we have a string "A" and a number 65, since they look identical in memory, how does the OS know which is the string and which is the number?
Another question - assume that a program allocates some memory (say, one byte). How does the OS remember where that memory has been allocated?
Neither of these details is handled by the operating system; they're handled by user programs.
For your first question, internally in memory there is absolutely no distinction between the character 'A' and the numeric value 65 (assuming, of course, that you're looking at just one byte of data). The difference arises in how those bits are interpreted by the program. For example, if the user program tries to print the string to the screen, it will probably make a system call asking the OS to print the character. In that case, the code in the OS consists of a series of assembly instructions that replicate those bits somewhere in the display device, and the display is then tasked with rendering an appropriate set of pixels to draw the character 'A'. In other words, at no point did the program ever "know" that the value was an 'A'; the hardware simply pushed around bits which controlled another piece of code that was ultimately tasked with turning those bits into pixels.
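A small C++ illustration of the same byte being interpreted in two different ways:

    #include <cstdio>

    int main()
    {
        unsigned char byte = 65;        // the bit pattern 01000001
        std::printf("%c\n", byte);      // interpreted as a character: prints A
        std::printf("%d\n", byte);      // interpreted as a number: prints 65
    }

The memory contents are identical in both calls; only the format specifier, i.e. the interpretation, differs.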
For your second question, that really depends on the memory manager. There are many ways for a program to allocate memory and know where it's stored. I'm not entirely sure what you're asking, but I believe the following should be sufficient:
At the OS level, the OS kernel doesn't even know that the byte was allocated. Instead, the OS just allocates giant blocks of memory for the user program to use as it runs. When the program terminates, all that memory is reclaimed.
At the program level, most programs contain a memory manager, a piece of code tasked with allocating and divvying up that large chunk of memory into smaller pieces that can then be used by the program. This usually keeps track of allocated memory as a series of "chunks" forming a doubly-linked list. Each chunk is usually annotated with information indicating whether it's in use and how large it is, which allows the memory manager to reclaim the memory once it's freed.
At the user code level, when you ask for memory, you typically store the result in a pointer to keep track of where the memory is. The pointer is just a series of bytes in memory storing an address, which the OS and the memory manager never look at unless instructed to.
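To make the chunk bookkeeping described above concrete, here is a minimal sketch of the kind of header a memory manager might prepend to each chunk (the field names are illustrative, not taken from any particular allocator):

    #include <cstddef>

    // Illustrative chunk header; real allocators (dlmalloc, ptmalloc, ...)
    // use more compact and far more elaborate layouts.
    struct ChunkHeader
    {
        std::size_t  size;    // how large this chunk is
        bool         in_use;  // whether it is currently allocated
        ChunkHeader* prev;    // the chunks form a doubly-linked list
        ChunkHeader* next;
    };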
Hope this helps!
No. 2 - the system keeps a record of all allocations (of a given process) and can thus remove them, e.g. when the process terminates. I propose you read a book on operating system principles (e.g. Tanenbaum's "Modern Operating Systems").
The character 'A' and the integer 65 are stored the same way in memory (at least on 32-bit systems). The string "A", however, is stored differently, and how depends on the system or the programming language. Take C, for example, which stores strings essentially as an array of the characters followed by the null character.
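For example, a minimal C++ check showing that the string literal "A" occupies two bytes:

    #include <cstdio>

    int main()
    {
        char s[] = "A";                   // stored as {'A', '\0'}
        std::printf("%zu\n", sizeof(s));  // prints 2: character plus null terminator
    }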
Operating systems use memory managers to keep track of which processes are using which parts of memory.
For a computer, a string is a number. The simplest example is an ASCII table, where every letter has a number attached to it. So if you're familiar with C, you could write printf("%c", 65) and actually get an A instead of a number. Hope that made sense.
The OS doesn't remember the location of the memory the program has allocated. That's what pointers are for!
The 'OS' applies an algorithm which looks something like "if every character in the string is a digit, then the string is a number", and it gets more complicated for decimals, +/-, etc.!
http://en.wikipedia.org/wiki/Dynamic_memory_allocation