strlen not counting newlines?

I embedded Lua into my project and came across a behavior of strlen and the Lua interpreter that seemed strange to me. I was trying to load a string containing Lua code with luaL_loadbuffer, and it consistently raised an "unexpected symbol" error on whatever was the last line of the Lua code, unless the whole chunk was written on one line. So for example:
function start()
print("start")
end
would always result in an error: unexpected symbol on the 3rd line, but
function start() print("start") end
loads successfully.
I figured out that loading the same chunk with luaL_loadstring gives no errors, and saw that it uses strlen to determine the length of the specified string (I used std::string::size). Using strlen to provide the length of the string to luaL_loadbuffer also results in successful loading.
Now the question was: what may be the difference between strlen and std::string::size? To my great surprise, the answer seems to be that strlen is not counting newlines ('\n'). That is:
const char* str = "this is a string\nthis is a newline";
std::string str2(str);
str2.size(); // gives 34
strlen(str); // gives 33
The difference between the size, and the value returned by strlen was always the number of new line characters.
My questions are:
Does strlen really not count newlines, or am I missing something?
How do newlines affect the interpretation of the Lua code internally?
I am using VS 2015 and Lua 5.3.0.
EDIT:
My first example was not exact and did not reproduce the described effect for me either, but I was able to recreate the problem from the original code:
std::fstream _Stream("test.lua", std::ios::ate | std::ios::in);
std::string _Source;
if(_Stream.is_open()) {
    _Source.resize(_Stream.tellg());
    _Stream.seekg(0, std::ios::beg);
    _Stream.read(&_Source[0], _Source.size());
    _Stream.close();
}
std::cout << "std::string::size() = " << _Source.size() << std::endl;
std::cout << "strlen() = " << strlen(_Source.c_str()) << std::endl;
The content of test.lua is "function start()\n\tprint("start")\nend\n\nstart()"
The difference is the number of newlines:
http://i.stack.imgur.com/y0QOW.png

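Windows line endings (CR+LF) are two characters on disk, but a stream opened in text mode delivers each pair as a single \n. tellg() at the end of the file reports the size in bytes, so the resize operation uses the file size rather than the number of characters that read() actually stores, and the unused tail of the string stays filled with '\0' characters. strlen counts \n as one character like any other; it reports a smaller value only because it stops at the first of those '\0' characters, while std::string::size() includes them. That is presumably why luaL_loadbuffer reported an unexpected symbol after the last line. You can make the size match the length of the C string by resizing the string afterwards:
_Source.resize(strlen(_Source.c_str()));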
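A minimal sketch of the same idea that does not scan for the terminator: after read(), gcount() reports how many characters were actually stored, so the string can be shrunk to that count. The helper name read_text_file is hypothetical and not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper (not from the question): reads a whole file that is
// opened in text mode and returns exactly the characters that were read.
std::string read_text_file(const std::string& path) {
    std::ifstream in(path, std::ios::in | std::ios::ate);
    std::string source;
    if (!in.is_open()) {
        return source;
    }
    const std::streamoff file_size = in.tellg();  // size on disk, in bytes
    if (file_size <= 0) {
        return source;
    }
    source.resize(static_cast<std::size_t>(file_size));
    in.seekg(0, std::ios::beg);
    in.read(&source[0], static_cast<std::streamsize>(source.size()));
    // In text mode on Windows each CR+LF arrives as a single '\n', so fewer
    // characters may be stored than file_size; drop the unused tail.
    assert(static_cast<std::size_t>(in.gcount()) <= source.size());
    source.resize(static_cast<std::size_t>(in.gcount()));
    return source;
}
```

The result can then be handed to luaL_loadbuffer with its data() and size(). Opening the file with std::ios::binary would be another option, since no translation happens in that mode and the file size then matches the number of characters read.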

Related

How to put a new line inside a string in C++?

I was wondering whether it is possible to make a single string in C++ that contains several lines of data. I don't want to make a string of strings or an array of strings.
Suppose I have a string mv:
mv =
"hello
new
world "
"hello", "new" and "world" are in different lines. Now if we print mv, then "hello", "new" and "world" should come on different lines.
I was also thinking with respect to competitive programming. If I concatenate all the answers of queries in a single string and then output the answer, or cout all the queries one by one, will there be a time difference in both the outputs?

"I want my string variable to store the information in this format, so my
string variable mv should have some strings on the first line and some
strings on the second line, and the whole should work like a single string."
What is a string (std::string) in C++?
It is a container for a dynamically resizable array of characters, equipped
with methods for manipulating that array.
An array of characters is:
characters: |c0|c1|c2|c3|...|cN|
That's the nature of an std::string. There's nothing you can do about that.
What is a line (of text)?
There is no formal definition of a line in C++, or any other programming
language. A line is a visual concept that belongs to reading and writing
text arranged in 2-dimensional space. One line is vertically above or below
another.
Computers don't arrange data in 2-dimensional space. They arrange it all in
linear, 1-dimensional, storage. No data is vertically above or below any other data.
But of course programming languages can represent lines of text and they
all do it by the same convention. Conventionally, a line is an array of
characters that ends with a new-line sequence.
A new-line sequence is itself an array of one or two characters, depending
on your operating system's convention. Windows uses the 2-character sequence
carriage-return,line-feed. Unix-like operating systems use the 1-character
sequence line-feed. You can study the subject in the Wikipedia article on
Newline. But for portability, in C++ source code the newline sequence,
whatever it actually is on disk, is represented by the escape sequence \n,
which is a single character in memory; streams opened in text mode translate
it to and from the operating system's sequence when reading and writing.
So the array of characters:
|h|e|l|l|o|\n|
represents the line of text:
hello
in C++. And the array of characters:
|h|e|l|l|o|\n|n|e|w|\n|w|o|r|l|d|\n|
represents the three lines of text:
hello
new
world
And if you want to store that array in a single std::string, C++ lets you do it
like this:
std::string s0{'h','e','l','l','o','\n','n','e','w','\n','w','o','r','l','d','\n'};
or more conveniently like this:
std::string s1{"hello\nnew\nworld\n"};
or even - if you have a phobia about using the \n escape sequence - like this:
std::string s2
{R"(hello
new
world
)"};
All of these ways of storing the three lines in a string create exactly the
same character array in that string, namely:
|h|e|l|l|o|\n|n|e|w|\n|w|o|r|l|d|\n|
They all create exactly the same std::string.
And if you print any of those strings you will see the same thing, e.g.
this program:
#include <string>
#include <iostream>
int main() {
    std::string s0{'h','e','l','l','o','\n','n','e','w','\n','w','o','r','l','d','\n'};
    std::string s1{"hello\nnew\nworld\n"};
    std::string s2 // Yes...
    {R"(hello
new
world
)"}; // ...it is meant to be formatted like this
    std::cout << "--------------" << std::endl;
    std::cout << s0;
    std::cout << "--------------" << std::endl;
    std::cout << s1;
    std::cout << "--------------" << std::endl;
    std::cout << s2;
    std::cout << "--------------" << std::endl;
    return 0;
}
outputs:
--------------
hello
new
world
--------------
hello
new
world
--------------
hello
new
world
--------------

If I have understood your question correctly, here is your answer: put \n where each line should end, and add a backslash at the very end of each source line to continue the string literal on the next one.
string mv = "hello\n\
new\n\
world\n";
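\n -> new line
\ at the end of a source line -> line continuation (the next line must not be indented, or the indentation becomes part of the string)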

You can just add the \n character to the string to create a new line.
cout << "Hello\nNew\nWorld" << endl;
This will output:
Hello
New
World
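On the competitive-programming part of the question, one way to collect all answers in a single string and print it once can be sketched as follows. The helper name join_answers and the choice of long long for the answers are assumptions made for illustration only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds one string that holds every answer on its own line.
std::string join_answers(const std::vector<long long>& answers) {
    std::string out;
    for (long long a : answers) {
        out += std::to_string(a);
        out += '\n';  // a newline is just one more character in the string
    }
    assert(out.empty() == answers.empty());
    return out;
}
```

A single std::cout << join_answers(answers); then writes everything in one go. Note that std::endl flushes the stream every time it is used while '\n' does not; whether one large write is measurably faster than many small ones depends on the implementation, so it is worth timing both.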

Iterator malfunction when looping through string

What I'm trying to do is get the user to input a phone number in a format they prefer and then remove the helper characters from their input. A loop compares each character in the string with a set of defined helper characters, and if there is a match it erases that character from the string. I'm doing this as a practice problem to develop my understanding of iterators. I have successfully done this with a trivial for loop. However, when I try to do it this way, to my surprise, whenever there are two helper characters in a row like the "(+", the loop does not run for the next character, which in this case is the "+". It directly skips to the "9" and works fine after that. It shows the same behaviour if other helper characters are present later on in the string. I have checked this by placing a cout << *i just under the first for loop. I don't understand why this would happen. Because of this the program fails to do what it's supposed to and outputs "+91892333" instead of the desired "91892333".
#include <iostream>
#include <string>
using namespace std;
int main()
{
    string main = "(+91)892-333";
    string dictionary = "(+)-";
    for( string::iterator i = main.begin(); i != main.end(); i++)
    {
        for( char word : dictionary)
        {
            if(*i == word)
            {
                main.erase(i);
                break;
            }
        }
    }
    cout << main;
}

According to the documentation, erase invalidates iterators, so after calling erase you must not use iterators obtained before the call, or you get undefined behaviour. In your case erase happens to leave the iterator where it was and shifts the rest of the string one position to the left, so the iterator now points to the next character. The i++ of the for loop then steps over that character without testing it, which is why the "+" after "(" survives. Even that behaviour is not guaranteed: std::string may allocate a new buffer and move the data there, leaving old iterators pointing nowhere.
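One common way around this is to continue from the iterator that erase returns instead of the one passed in. The following is a minimal sketch of that pattern; the function name strip_helpers is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns `input` without any character found in `helpers`.
std::string strip_helpers(std::string input, const std::string& helpers) {
    for (std::string::iterator it = input.begin(); it != input.end(); ) {
        if (helpers.find(*it) != std::string::npos) {
            // erase() returns a valid iterator to the character that followed
            // the erased one, so the loop must not increment in this branch.
            it = input.erase(it);
        } else {
            ++it;
        }
    }
    assert(input.find_first_of(helpers) == std::string::npos);
    return input;
}
```

With the values from the question, strip_helpers("(+91)892-333", "(+)-") yields "91892333". The increment moved out of the for header on purpose, so that a character is only skipped once it has been tested.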

Read a String with spaces till a new line in C

I am in a pickle right now. I'm having trouble taking in an input of example
1994 The Shawshank Redemption
1994 Pulp Fiction
2008 The Dark Knight
1957 12 Angry Men
I first read the number into an integer, then I need to read the name of the movie into a string using a character array; however, I have not been able to get this done.
Here is the code at the moment:
while(scanf("%d", &myear) != EOF)
{
    i = 0;
    while(scanf("%[^\n]", &ch))
    {
        title[i] = ch;
        i++;
    }
    addNode(makeData(title,myear));
}
The title array is arbitrarily large and the function is to add the data as a node to a linked list. Right now the output I keep getting for each node is as follows
" hank Redemption"
" ion"
" Knight"
" Men"
Yes, it oddly prints a space in front of the cut-off title. I checked the variables and it adds the space in the data. (I am not printing the year as that is taken in correctly)
How can I fix this?

You are passing the wrong type of argument to scanf(): instead of scanning into a single character, scan straight into the string buffer. %[^\n] scans an entire string up to (but not including) the newline; it does not scan only one character. It also does not skip leading whitespace, so the space that separates the year from the title is the first character it reads.
(Secondary problem: comparing the return value of scanf() against EOF alone is not enough. scanf() returns EOF only when input ends before the first conversion; on a matching failure it returns 0, so check that the return value equals the number of conversions you expect.)
I hope you see now: scanf() is hard to get right. It's evil. Why not input the whole line at once then parse it using sane functions?
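// needs <limits.h> (LINE_MAX, POSIX), <stdio.h>, <stdlib.h> and <string.h>
char buf[LINE_MAX];
while (fgets(buf, sizeof buf, stdin) != NULL) {
    buf[strcspn(buf, "\n")] = '\0'; // drop the trailing newline, if any
    int year = strtol(buf, NULL, 0);
    const char *p = strchr(buf, ' '); // the space between year and title
    if (p != NULL) {
        char name[LINE_MAX];
        strcpy(name, p + 1); // safe because name is as large as buf
        addNode(makeData(name, year));
    }
}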
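The per-line parsing step can also be isolated into a small function, which makes it easier to check. The sketch below is written as C++ but only relies on strtol, strspn, strcspn and memcpy, which also exist in the C standard library; the function name parse_movie_line and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: splits a line such as "1994 The Shawshank Redemption\n"
// into a year and a title. Returns false if the line does not start with a number.
bool parse_movie_line(const char* line, int* year, char* title, std::size_t title_size) {
    assert(line != nullptr && year != nullptr && title != nullptr && title_size > 0);
    char* end = nullptr;
    const long value = std::strtol(line, &end, 10);
    if (end == line) {
        return false;  // no digits were converted
    }
    *year = static_cast<int>(value);
    end += std::strspn(end, " \t");               // skip the separator after the year
    std::size_t len = std::strcspn(end, "\r\n");  // stop before the line ending
    if (len >= title_size) {
        len = title_size - 1;                     // truncate instead of overflowing
    }
    std::memcpy(title, end, len);
    title[len] = '\0';
    return true;
}
```

Each line obtained from fgets would be passed to this function, and the resulting title and year handed to makeData. Using the end pointer from strtol, rather than searching for the first space, also copes with several spaces or a tab between the year and the title.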

c++ c_str adding strange characters to the end of the string

let me first thank you for taking the time to read this.
I'm trying to read a file in c++. Currently I have a method that allows the user to select a file in explorer and returns this as an 'std::string'. I then have to open this file, but the method I have for this uses const char*. Therefore I need to convert from one to the other.
If there is an easy method in windows for reading a file using a string instead then let me know as it would solve my entire problem.
When I convert from string to const char*, using str.c_str(), I get a lot of weird characters at the end. I've researched other topics with the same problem, but the answers all seem very specific to those projects, or say to just stick to string/vector instead. Obviously I would happily do this, but I don't have a method that opens the file using a string/vector.
Any help is appreciated :) The code and output of where this occurs is pasted below.
SOURCE
std::string f;
if (LOWORD(wParam) == 1) {
    f = openFile();
    char *fchar = new char[f.size()+1]; // +1 to account for \0 byte
    //char *fchar = std::vector[1];
    std::strncpy(fchar, f.c_str(), f.size());
    file1 = fchar;
    std::cout<<"string size: " << f.size() << std::endl;
    std::cout<<"string: " << f << std::endl;
    std::cout<<"fchar: " << fchar << std::endl;
    std::cout<<"file1: " << file1 << std::endl;
}
OUTPUT
string size: 33
string: C:\Users\Joseph\Pictures\back_raw
fchar: C:\Users\Joseph\Pictures\back_raw═²²²²½½½½½½½½¯■
file1: C:\Users\Joseph\Pictures\back_raw═²²²²½½½½½½½½¯■

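std::strncpy() copies at most the number of characters you give it and does not append a '\0' when the source is at least that long, so with f.size() the terminator is never written. See http://www.cplusplus.com/reference/clibrary/cstring/strncpy/. You need to copy the terminator as well:
std::strncpy(fchar, f.c_str(), f.size() + 1);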
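If a writable copy is really needed, a std::vector<char> can own it, so nothing has to be deleted by hand. This is only a sketch of that alternative; the helper name make_c_buffer is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns a writable, '\0'-terminated copy of `s`.
std::vector<char> make_c_buffer(const std::string& s) {
    // c_str() points at size() characters followed by a terminating '\0',
    // so copying size() + 1 characters carries the terminator along.
    std::vector<char> buffer(s.c_str(), s.c_str() + s.size() + 1);
    assert(buffer.back() == '\0');
    return buffer;
}
```

The pointer for the C-style API is then buffer.data(). When the function that opens the file only reads the path, passing f.c_str() directly is enough, as long as f is neither modified nor destroyed while the pointer is in use.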

Where is file1 declared ?
What about openFile ? (would like to know its return type)
Could you try to manually put that '\0' ? IMHO, your char* is simply lacking the \0 at the end...

Make sure you put a \0 at the end, everywhere you need to.

Making a WCHAR null terminated

I've got this
WCHAR fileName[1];
as a returned value from a function (it's a sys 32 function so I am not able to change the returned type). I need to make fileName to be null terminated so I am trying to append '\0' to it, but nothing seems to work.
Once I get a null terminated WCHAR I will need to pass it to another sys 32 function so I need it to stay as WCHAR.
Could anyone give me any suggestion please?
================================================
Thanks a lot for all your help. Looks like my problem has to do with more than missing a null terminated string.
//This works:
WCHAR szPath1[50] = L"\\Invalid2.txt.txt";
dwResult = FbwfCommitFile(szDrive, pPath1); //Successful
//This does not:
std::wstring l_fn(L"\\");
//Because Cache_detail->fileName is \Invalid2.txt.txt and I need two
l_fn.append(Cache_detail->fileName);
l_fn += L""; //To ensure null terminated
fprintf(output, "l_fn.c_str: %ls\n", l_fn.c_str()); //Prints "\\Invalid2.txt.txt"
iCommitErr = FbwfCommitFile(L"C:", (WCHAR*)l_fn.c_str()); //Unsuccessful
//Then when I do a comparison on these two they are unequal.
int iCompareResult = l_fn.compare(pPath1); // returns -1
So I need to figure out how these two ended up to be different.
Thanks a lot!

Since you mentioned fbwffindfirst/fbwffindnext in a comment, you're talking about the file name returned in FbwfCacheDetail. From the fileNameLength field you know the length of fileName in bytes, so the length of fileName in WCHARs is fileNameLength/sizeof(WCHAR). The simple answer is that you can set
fileName[fileNameLength/sizeof(WCHAR)] = L'\0';
Now this is important: you need to make sure that the buffer you pass for the cacheDetail parameter of fbwffindfirst/fbwffindnext is sizeof(WCHAR) bytes larger than you need, otherwise the above snippet may write outside the bounds of your array. So for the size parameter of fbwffindfirst/fbwffindnext, pass in the buffer size - sizeof(WCHAR).
For example this:
// *** Caution: This example has no error checking, nor has it been compiled ***
ULONG error;
ULONG size;
FbwfCacheDetail *cacheDetail;
// Make an initial call to find how big of a buffer we need
size = 0;
error = FbwfFindFirst(volume, NULL, &size);
if (error == ERROR_MORE_DATA) {
    // Allocate more than we need
    cacheDetail = (FbwfCacheDetail*)malloc(size + sizeof(WCHAR));
    // Don't tell this call about the bytes we allocated for the null
    error = FbwfFindFirst(volume, cacheDetail, &size);
    // fileNameLength is in bytes; the terminator goes right after the last character
    cacheDetail->fileName[cacheDetail->fileNameLength/sizeof(WCHAR)] = L'\0';
    // ... Use fileName as a null terminated string ...
    // Have to free what we allocate
    free(cacheDetail);
}
Of course you'll have to change a good bit to fit in with your code (plus you'll have to call fbwffindnext as well)
If you are interested in why the FbwfCacheDetail struct ends with a WCHAR[1] field, see this blog post. It's a pretty common pattern in the Windows API.

Use L'\0', not '\0'.

As each character of a WCHAR is 16-bit in size, you should perhaps append \0\0 to it, but I'm not sure if this works. By the way, WCHAR fileName[1]; is creating a WCHAR of length 1, perhaps you want something like WCHAR fileName[1024]; instead.

WCHAR fileName[1]; is an array of 1 character, so if null terminated it will contain only the null terminator L'\0'.
Which API function are you calling?
Edited
The fileName member in FbwfCacheDetail is only 1 character long, which is a common technique used when the length of the array is unknown and the member is the last member in a structure. As you have likely already noticed, if your allocated buffer is only sizeof (FbwfCacheDetail) long then FbwfFindFirst returns ERROR_NOT_ENOUGH_MEMORY.
So if I understand, what you desire to do is output the non-NULL-terminated filename using fprintf. This can be done as follows
fprintf (outputfile, "%.*ls", (int)(cacheDetail.fileNameLength / sizeof (WCHAR)), cacheDetail.fileName);
This will print only the first fileNameLength / sizeof (WCHAR) characters of fileName (fileNameLength is a byte count, and fprintf takes a narrow format string).
An alternative approach would be to append a NULL terminator to the end of fileName. First you'll need to ensure that the buffer is long enough, which can be done by subtracting sizeof (WCHAR) from the size argument you pass to FbwfFindFirst. So if you allocate a buffer of 1000 bytes, you'll pass 998 to FbwfFindFirst, reserving the last two bytes in the buffer for your own use. Then to add the NULL terminator and output the file name use
cacheDetail.fileName[cacheDetail.fileNameLength / sizeof (WCHAR)] = L'\0';
fprintf (outputfile, "%ls", cacheDetail.fileName);
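Another option that avoids writing into the API's buffer at all is to copy the counted characters into a std::wstring, which always exposes a terminated string through c_str(). The sketch below is portable and uses wchar_t, which is what WCHAR is a typedef for on Windows. The helper name counted_to_wstring is hypothetical, and the length is assumed to be a byte count, as the first answer describes for fileNameLength.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copies a counted, unterminated wide-character buffer
// into a std::wstring. `length_in_bytes` is assumed to be a byte count.
std::wstring counted_to_wstring(const wchar_t* chars, std::size_t length_in_bytes) {
    assert(length_in_bytes % sizeof(wchar_t) == 0);
    const std::size_t count = length_in_bytes / sizeof(wchar_t);
    // The (pointer, count) constructor copies exactly `count` characters;
    // c_str() on the result is always terminated with L'\0'.
    return std::wstring(chars, count);
}
```

With the structure from the question this would be called as counted_to_wstring(cacheDetail->fileName, cacheDetail->fileNameLength), and the result's c_str() passed on to the next function.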
