The methods for getting the length of a string in C++ all seem to count up to a null terminating byte and then return that count, either including the terminator or not. Well, let's say I have a string like this:
'h','e','l','l','o','\0','t','h','e','r','e'
Now if you call length on this string you will get 5 as the length. I need a call on this string that will return 11. Is this possible?
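A C-string is by definition null terminated, so anything that treats the data as a C-string (strlen, or constructing a std::string from a plain char pointer) stops at the first '\0'. For this particular problem, I recommend using std::vector< char >, whose size() reports all 11 elements. A std::string can also hold embedded null characters, provided it is constructed with an explicit length.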
A few weeks ago I started learning the programming language C. I have knowledge of web technologies like HTML/CSS, JavaScript, PHP, and basic server administration, but C is confusing me. To my understanding, the C language does not have a data type for strings, just characters; however, I may be wrong.
I have heard there are two ways of declaring a string. What is the difference between these two lines of declaring a string:
a.) char stringName[];
b.) char *stringName;
I get that char stringName[]; is an array of characters. However, the second line confuses me. To my understanding the second line makes a pointer variable. Aren't pointer variables supposed to be the memory address of another variable?
In the C language, a "string" is, as you say, an array of char. Most string functions built into the C spec expect the string to be "NUL terminated", meaning the last char of the string is a 0. Not the code representing the numeral zero, but the actual value of 0.
For example, if your platform uses ASCII, then the following "string" is "ABC":
char myString[4] = {65, 66, 67, 0};
When you use the char varName[] = "foo" syntax, you're allocating the string on the stack (or, if it's at global scope, you're allocating it statically rather than dynamically).
Memory management in C is more manual than in many other languages you may have experience with. In particular, there is the concept of a "pointer".
char *myString = "ABC"; /* Points to a string literal that the compiler places somewhere in memory. */
Now, a char * is "an address that points to a char or char array". Notice the "or" in that statement: it is up to you, the programmer, to know which case applies.
It's important to also ensure that any string operations you perform don't exceed the amount of memory you've allocated to a pointer.
char myString[5];
strcpy(myString, "12345"); /* copy "12345" into myString.
* Oh no! I forgot to leave space for the NUL terminator and
* have overwritten some memory I don't own. */
"12345" is actually 6 characters long (don't forget the 0 at the end), but I've only reserved 5 characters. This is what's called a "buffer overflow", and is the cause of many serious bugs.
The other difference between "[]" and "*", is that one is creating an array (as you guessed). The other one is not reserving any space (other than the space to hold the pointer itself.) That means that until you point it somewhere that you know is valid, the value of the pointer should not be used, for either reading or writing.
Another point (made by someone in the comments):
You cannot pass an array as a parameter to a function in C. When you try, it gets converted to a pointer automatically. This is why we pass around pointers to strings rather than the strings themselves.
In C, a string is a sequence of character values followed by a 0-valued byte [1]. All the library functions that deal with strings use the 0 terminator to identify the end of the string. Strings are stored as arrays of char, but not all arrays of char contain strings.
For example, the string "hello" is represented as the character sequence {'h', 'e', 'l', 'l', 'o', 0} [2]. To store the string, you need a 6-element array of char - 5 characters plus the 0 terminator:
char greeting[6] = "hello";
or
char greeting[] = "hello";
In the second case, the size of the array is computed from the size of the string used to initialize it (counting the 0 terminator). In both cases, you're creating a 6-element array of char and copying the contents of the string literal to it. Unless the array is declared at file scope (outside of any function) or with the static keyword, it only exists for the duration of the block in which it was declared.
The string literal "hello" is also stored in a 6-element array of char, but it's stored in such a way that it is allocated when the program is loaded into memory and held until the program terminates [3], and is visible throughout the program. When you write
char *greeting = "hello";
you are assigning the address of the first element of the array that contains the string literal to the pointer variable greeting.
As always, a picture is worth a thousand words. Here's a simple little program:
#include <string.h>
#include <stdio.h>
#include <ctype.h>
int main( void )
{
    char greeting[] = "hello";     // greeting contains a *copy* of the string "hello";
                                   // size is taken from the length of the string plus the
                                   // 0 terminator
    char *greetingPtr = "hello";   // greetingPtr contains the *address* of the
                                   // string literal "hello"
    printf( "size of greeting array: %zu\n", sizeof greeting );
    printf( "length of greeting string: %zu\n", strlen( greeting ) );
    printf( "size of greetingPtr variable: %zu\n", sizeof greetingPtr );
    printf( "address of string literal \"hello\": %p\n", (void * ) "hello" );
    printf( "address of greeting array: %p\n", (void * ) greeting );
    printf( "address of greetingPtr: %p\n", (void * ) &greetingPtr );
    printf( "content of greetingPtr: %p\n", (void * ) greetingPtr );
    printf( "greeting: %s\n", greeting );
    printf( "greetingPtr: %s\n", greetingPtr );
    return 0;
}
And here's the output:
size of greeting array: 6
length of greeting string: 5
size of greetingPtr variable: 8
address of string literal "hello": 0x4007f8
address of greeting array: 0x7fff59079cf0
address of greetingPtr: 0x7fff59079ce8
content of greetingPtr: 0x4007f8
greeting: hello
greetingPtr: hello
Note the difference between sizeof and strlen - strlen counts all the characters up to (but not including) the 0 terminator.
So here's what things look like in memory:
Item         Address         0x00  0x01  0x02  0x03
----         -------         ----  ----  ----  ----
"hello"      0x4007f8        'h'   'e'   'l'   'l'
             0x4007fc        'o'   0x00  ???   ???
...
greetingPtr  0x7fff59079ce8  0x00  0x00  0x00  0x00
             0x7fff59079cec  0x00  0x40  0x07  0xf8
greeting     0x7fff59079cf0  'h'   'e'   'l'   'l'
             0x7fff59079cf4  'o'   0x00  ???   ???
The string literal "hello" is stored at a very low address (on my system, this corresponds to the .rodata section of the executable, which is for static, constant data). The variables greeting and greetingPtr are stored at much higher addresses, corresponding to the stack on my system. As you can see, greetingPtr stores the address of the string literal "hello", while greeting stores a copy of the string contents.
Here's where things can get kind of confusing. Let's look at the following print statements:
printf( "greeting: %s\n", greeting );
printf( "greetingPtr: %s\n", greetingPtr );
greeting is a 6-element array of char, and greetingPtr is a pointer to char, yet we're passing them both to printf in exactly the same way, and the string is being printed out correctly; how can that work?
Unless it is the operand of the sizeof or unary & operators, or is a string literal used to initialize another array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array.
In the printf call, the expression greeting has type "6-element array of char"; since it isn't the operand of the sizeof or unary & operators, it is converted ("decays") to an expression of type "pointer to char" (char *), and the address of the first element is actually passed to printf. In other words, it behaves exactly like the greetingPtr expression in the next printf call [4].
The %s conversion specifier tells printf that its corresponding argument has type char *, and that it should print out the character values starting from that address until it sees the 0 terminator.
Hope that helps a bit.
1. Often referred to as the NUL terminator; this should not be confused with the NULL pointer constant, which is also 0-valued but used in a different context.
2. You'll also see the terminating 0-valued byte written as '\0'. The leading backslash "escapes" the value, so instead of being treated as the character '0' (ASCII 48), it's treated as the value 0 (ASCII 0).
3. In practice, space is set aside for it in the generated binary file, often in a section marked read-only; attempting to modify the contents of a string literal invokes undefined behavior.
4. This is also why the declaration of greeting copies the string contents to the array, while the declaration of greetingPtr copies the address of the first element of the string. The string literal "hello" is also an array expression. In the first declaration, since it's being used to initialize another array in a declaration, the contents of the array are copied. In the second declaration, the target is a pointer, not an array, so the expression is converted from an array type to a pointer type, and the resulting pointer value is copied to the variable.
In C (and in C++), arrays and pointers are represented similarly; an array is represented by the address of the first element in the array (which is sufficient to gain access to the other elements, since elements are contiguous in memory within an array). This also means that an array does not, by itself, indicate where it ends, and thus you need some way of identifying the end of the array, either by passing around the length as a separate variable or by using some convention (such as that there is a sentinel value that is placed in the last position of the array to indicate the end of the array). For strings, the latter is the common convention, with '\0' (the NUL character) indicating the end of the string.
I am getting input from the user, however when I try to compare it later on to a string literal it does not work. That is just a test though.
I would like to set it up so that when a blank line is entered (just hitting the enter/return key) the program exits. I don't understand why the strings are not comparing because when I print it, it comes out identical.
in := bufio.NewReader(os.Stdin)
input, err := in.ReadBytes('\n')
if err != nil {
    fmt.Println("Error: ", err)
}
if string(input) == "example" {
    os.Exit(0)
}
string vs []byte
string definition:
string is the set of all strings of 8-bit bytes, conventionally but not necessarily representing UTF-8-encoded text. A string may be empty, but not nil. Values of string type are immutable.
byte definition:
byte is an alias for uint8 and is equivalent to uint8 in all ways. It is used, by convention, to distinguish byte values from 8-bit unsigned integer values.
What does it mean?
[]byte is a byte slice. A slice can be empty or nil.
A string conventionally holds UTF-8 text, in which a single Unicode character can take more than 1 byte.
A string carries an implied meaning for its data (an encoding); a []byte does not.
The equality operator is defined for the string type but not for slice types.
As you see they are two different types with different properties.
There is a great blog post explaining the different string-related types [1].
Regarding the issue in your code snippet:
Bear in mind that in.ReadBytes(delim) returns a byte slice that includes the delimiter. So in your code, input ends with '\n'. If you want your code to work as intended, try this:
if string(input) == "example\n" { // or "example\r\n" when on Windows
    os.Exit(0)
}
Also make sure that your terminal code page matches the encoding of your .go source file. Be aware of the different line-ending styles (Windows uses "\r\n"). The standard Go compiler uses UTF-8 internally.
[1] Comparison of Go data types for string processing.
I have seen code like this:
struct failed_login_res {
string errorMsg<>;
unsigned int error;
};
What does the <> at the end mean? How is it different from a normal declaration like string errorMsg?
Correction: this is for an RPC stub, not C++, and I can confirm that it does compile. The question is still valid.
From a quick googling, I came across this PDF.
Section 6.9 is as follows:
Strings: C has no built-in string type, but instead uses the null-terminated “char *” convention. In XDR language, strings are declared using the “string” keyword, and compiled into “char *”s in the output header file. The maximum size contained in the angle brackets specifies the maximum number of characters allowed in the strings (not counting the NULL character). The maximum size may be left off, indicating a string of arbitrary length.
Examples:
string name<32>; --> char *name;
string longname<>; --> char *longname;
I am trying to learn assembler and want to write a function to convert a number to a string. The signature of the function I want to write would look like this in a C-like fashion:
int numToStr(long int num, unsigned int bufLen, char* buf)
The function should return the number of bytes that were used if conversion was successful, and 0 otherwise.
My current approach is a simple algorithm. In all cases, if the buffer is full, return 0.
1. Check if the number is negative. If it is, write a - char into buf[0] and increment the current place in the buffer.
2. Repeatedly divide by 10 and store the remainders in the buffer, until the division yields 0.
3. Reverse the number in the buffer.
Is this the best way to do this conversion?
This is pretty much how every single implementation of itoa that I've seen works.
One thing that you don't mention but do want to take care of is bounds checking (i.e. making sure you don't write past bufLen).
With regards to the sign: once you've written the -, you need to negate the value. Also, the - needs to be excluded from the final reversal; an alternative is to remember the sign at the start but only write it at the end (just before the reversal).
One final corner case is to make sure that zero gets written out correctly, i.e. as 0 and not as an empty string.