Can someone clarify strings for me in the C programming language [duplicate] - programming-languages

This question already has answers here:
How to declare strings in C [duplicate]
(4 answers)
Closed 8 years ago.
A few weeks ago I started learning the programming language C. I have knowledge of web technologies like HTML/CSS, JavaScript, PHP, and basic server administration, but C is confusing me. To my understanding, the C language does not have a data type for strings, just characters; however, I may be wrong.
I have heard there are two ways of declaring a string. What is the difference between these two lines of declaring a string:
a.) char stringName[];
b.) char *stringName;
I get that char stringName[]; is an array of characters. However, the second line confuses me. To my understanding the second line makes a pointer variable. Aren't pointer variables supposed to be the memory address of another variable?

In the C language, a "string" is, as you say, an array of char. Most string functions built into the C spec expect the string to be "NUL terminated", meaning the last char of the string is a 0. Not the code representing the numeral zero, but the actual value of 0.
For example, if your platform uses ASCII, then the following "string" is "ABC":
char myString[4] = {65, 66, 67, 0};
When you use the char varName[] = "foo" syntax, you're allocating the string on the stack (or if it's in a global space, you're allocating it globally, but not dynamically).
Memory management in C is more manual than in many other languages you may have experience with. In particular, there is the concept of a "pointer".
char *myString = "ABC"; /* Points to a string that the compiler has placed somewhere in memory. */
Now, a char * is "an address that points to a char or char array". Notice the "or" in that statement, it is important for you, the programmer, to know what the case is.
It's important to also ensure that any string operations you perform don't exceed the amount of memory you've allocated to a pointer.
char myString[5];
strcpy(myString, "12345"); /* copy "12345" into myString.
* Oh no! I've forgotten space for my NUL terminator and
* have overwritten some memory I don't own. */
"12345" is actually 6 characters long (don't forget the 0 at the end), but I've only reserved 5 characters. This is what's called a "buffer overflow", and is the cause of many serious bugs.
The other difference between "[]" and "*", is that one is creating an array (as you guessed). The other one is not reserving any space (other than the space to hold the pointer itself.) That means that until you point it somewhere that you know is valid, the value of the pointer should not be used, for either reading or writing.
Another point (made by someone in the comment)
You cannot pass an array as a parameter to a function in C. When you try, it gets converted to a pointer automatically. This is why we pass around pointers to strings rather than the strings themselves.

In C, a string is a sequence of character values followed by a 0-valued byte [1]. All the library functions that deal with strings use the 0 terminator to identify the end of the string. Strings are stored as arrays of char, but not all arrays of char contain strings.
For example, the string "hello" is represented as the character sequence {'h', 'e', 'l', 'l', 'o', 0} [2]. To store the string, you need a 6-element array of char - 5 characters plus the 0 terminator:
char greeting[6] = "hello";
or
char greeting[] = "hello";
In the second case, the size of the array is computed from the size of the string used to initialize it (counting the 0 terminator). In both cases, you're creating a 6-element array of char and copying the contents of the string literal to it. Unless the array is declared at file scope (outside of any function) or with the static keyword, it only exists for the duration of the block in which it was declared.
The string literal "hello" is also stored in a 6-element array of char, but it's stored in such a way that it is allocated when the program is loaded into memory and held until the program terminates [3], and is visible throughout the program. When you write
char *greeting = "hello";
you are assigning the address of the first element of the array that contains the string literal to the pointer variable greeting.
As always, a picture is worth a thousand words. Here's a simple little program:
#include <string.h>
#include <stdio.h>
#include <ctype.h>
int main( void )
{
    char greeting[] = "hello";   // greeting contains a *copy* of the string "hello";
                                 // size is taken from the length of the string plus the
                                 // 0 terminator
    char *greetingPtr = "hello"; // greetingPtr contains the *address* of the
                                 // string literal "hello"
    printf( "size of greeting array: %zu\n", sizeof greeting );
    printf( "length of greeting string: %zu\n", strlen( greeting ) );
    printf( "size of greetingPtr variable: %zu\n", sizeof greetingPtr );
    printf( "address of string literal \"hello\": %p\n", (void * ) "hello" );
    printf( "address of greeting array: %p\n", (void * ) greeting );
    printf( "address of greetingPtr: %p\n", (void * ) &greetingPtr );
    printf( "content of greetingPtr: %p\n", (void * ) greetingPtr );
    printf( "greeting: %s\n", greeting );
    printf( "greetingPtr: %s\n", greetingPtr );
    return 0;
}
And here's the output:
size of greeting array: 6
length of greeting string: 5
size of greetingPtr variable: 8
address of string literal "hello": 0x4007f8
address of greeting array: 0x7fff59079cf0
address of greetingPtr: 0x7fff59079ce8
content of greetingPtr: 0x4007f8
greeting: hello
greetingPtr: hello
Note the difference between sizeof and strlen - strlen counts all the characters up to (but not including) the 0 terminator.
So here's what things look like in memory:
Item          Address          0x00  0x01  0x02  0x03
----          -------          ----  ----  ----  ----
"hello"       0x4007f8         'h'   'e'   'l'   'l'
              0x4007fc         'o'   0x00  ???   ???
...
greetingPtr   0x7fff59079ce8   0x00  0x00  0x00  0x00
              0x7fff59079cec   0x00  0x40  0x07  0xf8
greeting      0x7fff59079cf0   'h'   'e'   'l'   'l'
              0x7fff59079cf4   'o'   0x00  ???   ???
The string literal "hello" is stored at a very low address (on my system, this corresponds to the .rodata section of the executable, which is for static, constant data). The variables greeting and greetingPtr are stored at much higher addresses, corresponding to the stack on my system. As you can see, greetingPtr stores the address of the string literal "hello", while greeting stores a copy of the string contents.
Here's where things can get kind of confusing. Let's look at the following print statements:
printf( "greeting: %s\n", greeting );
printf( "greetingPtr: %s\n", greetingPtr );
greeting is a 6-element array of char, and greetingPtr is a pointer to char, yet we're passing them both to printf in exactly the same way, and the string is being printed out correctly; how can that work?
Unless it is the operand of the sizeof or unary & operators, or is a string literal used to initialize another array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array.
In the printf call, the expression greeting has type "6-element array of char"; since it isn't the operand of the sizeof or unary & operators, it is converted ("decays") to an expression of type "pointer to char" (char *), and the address of the first element is actually passed to printf. In other words, it behaves exactly like the greetingPtr expression in the next printf call [4].
The %s conversion specifier tells printf that its corresponding argument has type char *, and that it should print out the character values starting from that address until it sees the 0 terminator.
Hope that helps a bit.
1. Often referred to as the NUL terminator; this should not be confused with the NULL pointer constant, which is also 0-valued but used in a different context.
2. You'll also see the terminating 0-valued byte written as '\0'. The leading backslash "escapes" the value, so instead of being treated as the character '0' (ASCII 48), it's treated as the value 0 (ASCII 0).
3. In practice, space is set aside for it in the generated binary file, often in a section marked read-only; attempting to modify the contents of a string literal invokes undefined behavior.
4. This is also why the declaration of greeting copies the string contents to the array, while the declaration of greetingPtr copies the address of the first element of the string. The string literal "hello" is also an array expression. In the first declaration, since it's being used to initialize another array in a declaration, the contents of the array are copied. In the second declaration, the target is a pointer, not an array, so the expression is converted from an array type to a pointer type, and the resulting pointer value is copied to the variable.

In C (and in C++), arrays and pointers are represented similarly; an array is represented by the address of the first element in the array (which is sufficient to gain access to the other elements, since elements are contiguous in memory within an array). This also means that an array does not, by itself, indicate where it ends, and thus you need some way of identifying the end of the array, either by passing around the length as a separate variable or by using some convention (such as that there is a sentinel value that is placed in the last position of the array to indicate the end of the array). For strings, the latter is the common convention, with '\0' (the NUL character) indicating the end of the string.

Related

How can a pointer to char act like an array?

I am trying to understand pointers in C.
char *name = "HASAN";
When I store "HASAN" using char *, the string is stored somewhere in the memory as an array of characters.
I can access every single character of the string by treating "name" as an array.
//second character of HASAN is A.
printf("%c", name[1]);
But I have learnt that "name" is just a pointer that stores the address of the first character 'H' of the string "HASAN".
So, how can "name", a pointer, act like an array?
It's quite simple.
As the standard says:
*(name + N) == name[N]
And it does not matter how name is declared as arrays decay to pointers.
char *name = "HASAN";
or
char name[] = "HASAN";
Consider the following two examples:
char arr[] = "HASAN"; // array
It allocates 6 consecutive bytes of memory and associates the address of the first allocated byte with arr.
On the other hand,
char *ptr = "HASAN"; // pointer
Here the string literal itself occupies 6 consecutive bytes, and the pointer variable ptr takes additional space of its own (4 bytes on a typical 32-bit system, 8 on a 64-bit one); ptr is assigned the address of the string literal. So, with 4-byte pointers, a total of 10 bytes are used.

Without null char in second argument for strcpy() function is returning unexpected result. Why?

#include<stdio.h>
#include <string.h>
int main()
{
char a[5];
char b[2]="12";
strcpy(a,b);
printf("%s\n",a);
}
There is no null char in string b; that is why the output is not as expected. Output: 12#
Why does the output come out like this?
Your program has undefined behavior.
Your array b contains { '1', '2' }. As you say, there is no null character in the array -- which means it doesn't contain a string.
strcpy's second argument must be a pointer to a string. You gave it a char* value that is not a pointer to a string.
In practice, strcpy will probably continue copying characters from the memory following b. That memory contains arbitrary garbage -- and even the attempt to access it has undefined behavior.
In a sense, you're lucky that you got output that is visibly garbage. If there had happened to be a null character immediately following your array in memory, and if your program didn't blow up trying to access it, it could have just printed 12, and you might not have known that your program is buggy.
If you want to correct your program, you can change
char b[2] = "12";
to
char b[] = "12";
The compiler will figure out how big b needs to be to hold the string (including the required terminating null character).
strcpy keeps copying until it hits a null character (byte with value 0x00). It copies whatever it encounters on the way. In your case, memory after the array b happens to contain a byte with value 0x23 ('#') and then a byte with value 0x00.

Problems assigning char string to char array

I have already read all prior answers regarding my problem. However, I'm not a very bright coder, so I am unable to grasp it. Could someone please look into my problem?
I am trying to write a CSV file using entries from a 2D array. string.h has already been included in main().
void create_marks_csv(int rout[][20],float p[][20],float c[][20],int n)
{
system("cls");
char str1[100],str2[100],str3[100];
printf("\nEnter filename for routing matrix: ");
gets(str1);
printf("\n Creating %s.csv file",str1);
FILE *fp;
int i,j;
str1=strcat(str1,".csv");
str1=strcat("C:\\Users\\Neil\\Documents\\Trust CSV Logs\\",str1) ;
fp=fopen(str1,"w+");
for(i=1;i<=n;i++)
{
for(j=1;j<=n;j++)
{
if(i==j)
fprintf(fp,"X");
else
fprintf(fp,"%d",rout[i][j]);
}
fprintf(fp,"\n");
}
fclose(fp);
printf("\nFile created: %s",str1);
system("cls");
}
The warnings and errors are as follows:
5 20 C:\Users\Neil\Documents\main.c [Warning] extra tokens at end of #include directive [enabled by default]
C:\Users\Neil\Documents\main.c In function 'create_marks_csv':
168 6 C:\Users\Neil\Documents\main.c [Error] incompatible types when assigning to type 'char[100]' from type 'char *'
169 6 C:\Users\Neil\Documents\main.c [Error] incompatible types when assigning to type 'char[100]' from type 'char *'
28 C:\Users\Neil\Documents\Makefile.win recipe for target 'main.o' failed
Every time you write str1 =, you are telling the compiler to change str1 so that it points to whatever location in memory is found on the right-hand side of the = sign. But you declared char str1[100], which means that str1, interpreted as a pointer, can only point to the start of the block of 100 characters at the location where this declaration allocated them. So it makes no sense to write str1 =.
Passing a C string constant as the first argument of strcat is likely to be a disaster, although the compiler seems not to mind. The first argument of strcat should be a character buffer big enough to hold the results of the concatenation. In order to concatenate something onto the end of a constant string, you can allocate a buffer big enough, then copy the constant string to it, then call strcat.
In general you can probably do whatever you need to do without using the return value of strcat, that is, no need to ever write strcat on the right-hand side of =.
It is advisable to use fgets instead of gets because then you can protect against the possibility that you will get too much input to fit in your allocated character buffer. If you allocate 100 characters in your largest buffer, you can only afford to accept 95 characters minus the length of the string "C:\\Users\\Neil\\Documents\\Trust CSV Logs\\". (The other 5 characters are required to hold the string ".csv" and the terminating null character).
I saw also that you declare str2 and str3 but I didn't see where you used either of them. It looks like you don't need both of them, but you might find it convenient to use str2 as the buffer for your last concatenation of strings.

(char)NULL , '\0' and 0 all mean the same thing in C's memset?

We are migrating a 32 bit application from rhel 5.3 to 6.4
We are getting a warning "Cast from pointer to integer of different size" on the new system's memset.
Do (char)NULL, '\0' and 0 all mean the same thing in C's memset?
The following code is giving the warning in new environment.
#define NULLC (char)NULL
#define MAX_LEN 11
…
memset(process_name, NULLC, MAX_LEN + 1);
strncpy(process_name, "oreo", MAX_LEN);
They do not all mean the same thing, though they're likely to yield the same result.
(char)NULL converts the value of NULL, which is an implementation-defined null pointer constant, to char. The type of NULL may be int, or void*, or some other integer type. If it's of an integer type, the conversion is well defined and yields 0. If it's void*, you're converting a null pointer value to char, which has an implementation-defined result (which is likely, but not guaranteed, to be 0).
The macro NULL is intended to refer to a null pointer value, not a null character, which is a very different thing.
Your macro NULLC is not particularly useful. If you want to refer to a null character, just use the literal constant '\0'. (And NULLC is IMHO too easily confused with NULL.)
The other two constants, '\0' and 0, have exactly the same type (int) and value (zero).
(It's admittedly counterintuitive that '\0' has type int rather than char. It's that way for historical reasons, and it rarely matters. In C++, character constants are of type char, but you asked about C.)
They all have the same value 0 but they don't mean the same thing.
(char)NULL - You are casting the value of NULL pointer to character with value 0
'\0' - End of string character with value 0 (NUL)
0 - integer constant of type int (typically 32 bits) with value 0.
You are getting a warning because somewhere in your code you're likely using something like:
short somevar = NULL;
or something similar.
0 and '\0' are both the integer 0 (the type of a character literal is int, not char), so they are exactly equivalent. The second argument to memset is an int, from which only the low-order 8 bits will be used.
NULL is a different beast. It is a pointer type that is guaranteed by the standard to be different from any pointer to a real object. The standard does NOT say that this is done by giving it the value 0, though it must compare equal to zero. Also, it may be of different width than int, so passing it as the second argument to memset() might not compile.
In defining NULLC, you are casting NULL from a native pointer (64-bits, probably defined as (void*)0) to char (8-bits). If you wanted to declare NULLC, you should just do
#define NULLC 0
and do away with NULL and the (char). The formal argument to memset is int, not char.
0 = zero of int datatype
'\0' = (char)0 //null char
NULL = (void*)0 //null pointer
See how they are interlinked to each other. Gcc often gives warnings for all the typecast that are implicitly done by the compiler.
You are using
#define NULLC (char)NULL
.....
memset(process_name, NULLC, MAX_LEN + 1);
equivalent to:
memset(process_name, (char)NULL, MAX_LEN + 1);
equivalent to:
memset(process_name, '\0', MAX_LEN + 1);
You are passing char data (i.e. '\0') as the second parameter, where an int is accepted, so the compiler converts it to int implicitly. The warning itself comes from the (char)NULL cast, which converts a pointer to a narrower integer type. You can avoid it by changing the call to:
memset(process_name, 0, MAX_LEN + 1);

Basics of Strings

Ok, I've always kind of known that computers treat strings as a series of numbers under the covers, but I never really looked at the details of how it works. What sort of magic is going on in the average compiler/processor when we do, for instance, the following?
string myString = "foo";
myString += "bar";
print(myString) //replace with printing function of your choice
The answer is completely dependent on the language in question. But C is usually a good language to kind of see how things happen behind the scenes.
In C:
In C, strings are arrays of char with a 0 at the end:
char str[1024];
strcpy(str, "hello ");
strcat(str, "world!");
Behind the scenes str[0] == 'h' (which has an int value), str[1] == 'e', ...
str[11] == '!', str[12] == '\0';
A char is simply a number which can contain one of 256 values. Each character has a numeric value.
In C++:
strings are supported in the same way as C but you also have a string type which is part of STL.
string literals are part of static storage and cannot be changed directly unless you want undefined behavior.
It's implementation dependent how the string type actually works behind the scenes, but the string objects themselves are mutable.
In C#:
strings are immutable, which means you can't directly change a string once it's created. When you do +=, what happens is that a new string gets created and your variable now references that new string.
The implementation varies between language and compiler of course, but typically for C it's something like the following. Note that strings are essentially syntactical sugar for char arrays (char[]) in C.
1.
string myString = "foo";
Allocate 3 bytes of memory for the array and set the value of the 1st byte to 'f' (its ASCII code rather), the 2nd byte to 'o', the 3rd byte to 'o'.
2.
foo += "bar";
Read existing string (char array) from memory pointed to by foo.
Allocate 6 bytes of memory, fill the first 3 bytes with the read contents of foo, and the next 3 bytes with b, a, and r.
3.
print(foo)
Read the string foo now points to from memory, and print it to the screen.
This is a pretty rough overview, but hopefully should give you the general idea.
Side note: In some languages/compilers, char != byte - for example, C#, where strings are stored in Unicode format by default, and notably the length of the string is also stored in memory. C++ typically uses null-terminated strings, which solves the problem in another way, though it means determining its length is O(n) rather than O(1).
It's very language dependent. However, in most cases strings are immutable, so doing that is going to allocate a new string and release the old one's memory.
I'm assuming a typo in your sample and that there is only one variable called either foo or myString, not two variables?
I'd say that it'll depend a lot on what compiler you're using. In .Net strings are immutable so when you add "bar" you're not actually adding it but rather creating a new string containing "foobar" and telling it to put that in your variable.
In other languages it will work differently.
