Remove newline character from a string?

I have a string that is like so:
"string content here
"
because it is too long to fit on the screen in one line
The string is the name of a file I would like to read, but I always get an error message saying the file was not found, because the string includes a newline character that obviously isn't part of the file name. I cannot rename the file, and I have tried the strip function to remove the newline, but it doesn't work. How can I remove the newline (Enter) character from my string so I can load my file?

You can use the function strip to remove leading and trailing whitespace, including newline characters, from a string.
>> text = "hello" + newline; %Create test string.
>> disp(text)
hello
>> text_stripped = strip(text);
>> disp(text_stripped)
hello
>>
The ">>" prompts are included above to show where each command's output ends, which makes the removal of the whitespace easier to see.

Consider replacing the newline character with nothing using strrep. For example:
s = sprintf('abc\ndef') % Create a string s with a newline character in the middle
s = strrep(s, newline, '') % Replace newline with nothing
Alternatively, you could use regular expressions (regexprep) if several different characters are causing you issues.
Or you could use strip if you know the newline always occurs at the beginning or end of the string.
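For comparison only, here is a minimal Python sketch (not MATLAB) of the two approaches above: deleting every newline, as strrep(s, newline, '') does, versus trimming whitespace only at the ends, as strip does. The function names and the file name are hypothetical.

```python
def remove_all_newlines(s: str) -> str:
    """Delete every "\n" anywhere in the string (like strrep with '')."""
    return s.replace("\n", "")


def trim_ends(s: str) -> str:
    """Remove whitespace only at the start and end (like MATLAB's strip)."""
    return s.strip()


# Hypothetical file name that was read with a trailing newline.
name = "data_file.txt\n"
print(repr(remove_all_newlines(name)))  # 'data_file.txt'
print(repr(trim_ends(name)))            # 'data_file.txt'
print(repr(trim_ends("abc\ndef")))      # 'abc\ndef' (the inner newline is kept)
```

The last line shows the design trade-off: trimming is safe when the newline can only be at the ends, while replacing also removes newlines in the middle.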

Related

Remove spaces from a string but not new lines in lua

I am trying to use string.gsub with the pattern "%s+" to remove spaces from a string without removing newlines, for example:
str = "string with\nnew line"
str = string.gsub(str, "%s+", "")
print(str)
and I'm expecting the output to be:
stringwith
newline
What pattern should I use to get that result?
It seems you want to match any whitespace that %s matches, but exclude the newline character from the pattern.
You can put the inverse %S class (which matches any non-whitespace character) inside a negated character set, [^...], and add \n there:
local str = "string with\nnew line"
str = string.gsub(str, "[^%S\n]+", "")
print(str)
This prints:
stringwith
newline
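The same negated-class trick carries over to Python's re module, where \S also means "non-whitespace". This is a comparison sketch, not part of the Lua answer; the function name is hypothetical.

```python
import re


def remove_spaces_keep_newlines(s: str) -> str:
    # [^\S\n] means "not (non-whitespace or newline)", i.e. any whitespace
    # character except "\n". The "+" removes whole runs of it at once.
    return re.sub(r"[^\S\n]+", "", s)


print(remove_spaces_keep_newlines("string with\nnew line"))
```

Note that "\r" also counts as whitespace (in Lua's %s as well as Python's \s), so with Windows line endings you would add it to the set, e.g. [^\S\r\n], to keep the full "\r\n".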
"%s" matches any whitespace character, including newlines. If you want to match only a space, use " ". If you want to match a specific number of spaces, either write them out explicitly ("     ") or use string.rep(" ", 5).

How do you count and replace a string in a text file that starts at the end of one line and continues on the next, using Linux commands?

I have a large (4 GB) Windows .csv text file (each line ends in "\r\n") in a Linux environment. It was supposed to be a delimited file (delimiter = '|', text qualifier = '"'), with each field separated by a pipe and enclosed in double quotes. Any narrative text field with embedded double quotes was supposed to have each double quote escaped with a second double quote (i.e. "the quick "brown" fox" should have been represented as "the quick ""brown"" fox"). Unfortunately, the embedded double quotes were not escaped. Furthermore, the text fields may include embedded line breaks (Windows CRLF, \r\n) which need to be retained.
Sample lines might look as follows:
"1234567890123456"|"2016-07-30"|"2016-08-01"|"123"|"456"|"789"|"text narrative field starts\r\n
with text lines that may have embedded double quotes "For example"\r\n
and may include measurements such as 1/2" x 2" with \r\n
the text continuing and includes embedded line breaks \r\n
which will finally be terminated with a double quote"\r\n
"9876543210654321"|"2017-01-31"|"2018-08-01"|"123"|"456"|"789"|"text narrative field"\r\n
"2345678901234567"|"...."\r\n
The objective is to have the output appear as follows:
~1234567890123456~|~2016-07-30~|~2016-08-01~|~123~|~456~|~789~|~text narrative field starts\r\n
with text lines that may have embedded double quotes ""For example""\r\n
and may include measurements such as 1/2"" x 2"" with \r\n
the text continuing and includes embedded line breaks \r\n
which will finally be terminated with a double quote~\r\n
~9876543210654321~|~2017-01-31~|~2018-08-01~|~123~|~456~|~789~|~text narrative field~\r\n
~2345678901234567~|~....~\r\n
The solution I was attempting to implement was to:
1. SUCCESSFUL: change all the "|" sequences to ~|~
2. SUCCESSFUL: change the double quote (") at the start of the first line and at the end of the last line to a tilde (~)
3. where a line ends with a double quote followed by CRLF (e.g. ..."\r\n) and the next line begins with a double quote, a 16-digit number and a tilde (e.g. "1234567890123456~...), i.e. the start of a new record, change the ending and starting double quotes to tildes
4. convert all remaining double quote characters to two successive double quotes (change " to "")
5. then reverse the first 3 steps above, changing all ~ back to double quotes.
I started by using sed to replace every occurrence of a double quote, followed by a pipe, followed by a double quote (i.e. "|") with a tilde, pipe, tilde (i.e. ~|~). I then manually replaced the first and last double quote in the file with a tilde.
This is where I ran into issues. I tried to count the number of places where a line ends with a double quote (") and the next line begins with a double quote followed by a 16-digit number and a "~", which would tell me the actual number of CSV records in the file (minus one) as opposed to the number of lines. I attempted to do this using grep: grep '"\r\n"\d{16}~' | wc -l but that didn't work.
I then need to replace the double quotes where one record ends and the next record begins with a double quote followed by a 16-digit number and a "~", leaving everything else intact.
I tried to use sed: sed 's/"\r\n"(\d{16}~)/~\r\n~\1' windows_file.txt but it is not working as hoped.
I would welcome any recommendations as to how to accomplish the above.
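The script below uses awk to convert the record and field delimiters to tildes as you want (it does not double the embedded quotes). It works for every record except the very last one in the file, because awk does not know where that record ends.
It could be fixed by counting the lines in the file, but that would be impractical for a file this large.
Looking at the data structure, records are separated by "\r\n" (a closing quote, a CRLF and the next opening quote) and fields by "|" (a pipe between two quotes), so let's use those as separators in awk.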
gawk 'BEGIN{
RS="\"\r\n\"" # input record separator RS, 2 double quotes with a DOS line ending in the middle
FS="\"\\|\"" # input field separator FS, 2 double quotes with a pipe in the middle
ORS="~\r\n~" # your record separator
OFS="~|~" # your field separator
} {
$1=$1 # trick awk into believing something has changed
if (NR == 1){ # first record, replace first character
print "~" substr($0,2)
}else{
print $0
}
} ' test.txt
Result (assuming lines end with \r\n):
~1234567890123456~|~2016-07-30~|~2016-08-01~|~123~|~456~|~789~|~text narrative field starts
with text lines that may have embedded double quotes "For example"
and may include measurements such as 1/2" x 2" with
the text continuing and includes embedded line breaks
which will finally be terminated with a double quote~
~9876543210654321~|~2017-01-31~|~2018-08-01~|~123~|~456~|~789~|~text narrative field~
~10654321~|~2018-09-31~|~2018-08-01~|~123~|~456~|~789~|~asdasdasdasdad asasda"
~
~
PS: this will break if a field contains a line that starts with " and the preceding line within the same field ends with "\r\n, since that sequence matches the proposed RS. For example:
"10654321"|"2018-09-31"|"2018-08-01"|"123"|"456"|"789"|"asdasdasdasdad asasda"\r\n
"some more"\r\n
"22222"|".... (another record)
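Steps 3 to 5 of the question (find real record boundaries, double the remaining quotes, restore the qualifiers) can also be done in one pass without the tilde detour. The Python sketch below is not the awk answer. It assumes, from the sample data only, that every record starts with a double quote, a 16-digit id and the "|" separator, ends with a double quote plus CRLF, and that "|" never appears inside a narrative field. The function name is hypothetical. It reads the whole text into memory, so a 4 GB file would need chunked processing, and it shares the ambiguity described in the PS above.

```python
import re

# Assumption from the sample data: a record boundary is the closing '"' + CRLF
# followed by an opening '"', a 16-digit id and the '"|"' field separator.
RECORD_SEP = re.compile(r'"\r\n"(?=\d{16}"\|)')
FIELD_SEP = '"|"'


def fix_embedded_quotes(data: str) -> tuple[str, int]:
    """Return (fixed_text, record_count) with embedded quotes doubled."""
    if not (data.startswith('"') and data.endswith('"\r\n')):
        raise ValueError('data must start with " and end with "\\r\\n')
    body = data[1:-3]  # drop the very first '"' and the final '"\r\n'
    records = RECORD_SEP.split(body)
    fixed = []
    for record in records:
        fields = record.split(FIELD_SEP)
        # Any quote still inside a field is embedded text: escape it as "".
        fields = [f.replace('"', '""') for f in fields]
        fixed.append('"' + FIELD_SEP.join(fields) + '"\r\n')
    return "".join(fixed), len(records)
```

To keep the embedded "\r\n" intact, open the file with open(path, newline="") so Python does not translate line endings. The returned count is the number of records rather than the number of lines, which is what the grep attempt was after.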

gsubbing a string with a pattern containing a newline character in Lua

Does string.gsub recognize the newline character in a string literal? I am trying to gsub the portion of a string that runs from a given operator to the end of the line, like so:
local function removeComments(str, operator)
local new_Sc = (str):gsub(operator..".*\n", "");
return new_Sc;
end
local source = [[
int hi = 123; //a basic comment
char ok = "abc"; //another comment
]];
source = removeComments(source, "//");
print(source);
However, the output shows that it removed the rest of the string literal after the first comment:
int hi = 123;
I tried using the literal newline character via string.char(10), like so: (str):gsub(operator..".*"..string.char(10), ""); however, I still got the same output: it removes the comment and the rest of the string instead of only the text from the start of the comment to the newline.
So is there any way to gsub a string literal with a pattern containing a newline character?
Thanks
The problem you are facing is akin to greedy vs. lazy matching in regular expressions (.* vs .*?).
In Lua patterns, X.*\n means "match X, then as many characters as possible, followed by a newline". Note that . in a Lua pattern matches any character, including a newline, and gsub has no special handling for newlines, so the match continues up to the last newline in the string, replacing as many characters as it can. You want to match as few characters as possible, which is written .- in Lua patterns.
Also, I am not sure if it is intended or not, but this strategy will not remove the comment from the last line, if it is not (properly) ended by a newline. I am not sure if it can be represented by a single pattern, but this function will remove comments from all lines:
local function removeComments(str, operator)
local new_Sc = str:gsub(operator..".-\n", "\n");
new_Sc = new_Sc:gsub(operator.."[^\n]*$", ""); -- comment on a last line with no trailing newline
return new_Sc;
end
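For comparison, here is the same greedy-versus-lazy idea as a Python sketch (not Lua). re.DOTALL makes "." match newlines, like "." in a Lua pattern, and Python's lazy ".*?" plays the role of Lua's ".-". The function name and the default operator are hypothetical.

```python
import re


def remove_comments(source: str, operator: str = "//") -> str:
    """Strip everything from `operator` to the end of each line."""
    op = re.escape(operator)  # treat the operator literally, not as a regex
    # Lazy ".*?" stops at the FIRST newline after the operator; DOTALL makes
    # "." match newlines too, mirroring "." in Lua patterns.
    source = re.sub(op + r".*?\n", "\n", source, flags=re.DOTALL)
    # A comment on a last line that has no trailing newline.
    return re.sub(op + r"[^\n]*\Z", "", source)
```

Without DOTALL, Python's "." already stops at a newline, so op + ".*" alone would work per line; DOTALL is used here only to reproduce the Lua behaviour. re.escape matters if the operator contains regex metacharacters, the same concern that applies to Lua's magic characters (for example "-" in "--").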

perl: print remaining string only if there is no character before the matched value.

The following prints the entire content of the line after "B. "
perl -ne'print if /B[.] (.*)/s' $string > file
How can I match/print the line only if there is no other character before the "B. "? In other words, if there is any character before the "B. " (e.g. "TAB."), skip the line and do not print it.
The correct "B." is always on a new line, the only correct line to match appears as follows:
B. some text here
A regex with a leading caret (^) matches only at the start of the line. The pattern /^B[.] (.*)/s should get you the result you're looking for.
Put ^ in front of the B. It anchors the match so that the line must start with B. So your regex should be /^B\. (.*)/. Then you don't need the s flag in your pattern match.
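The same anchoring in Python, as a comparison sketch: perl -n tests one line at a time, whereas this hypothetical helper scans a whole text, so it uses re.MULTILINE to make ^ and $ match at every line boundary.

```python
import re


def b_lines(text: str) -> list[str]:
    """Return the text after "B. " for lines that start with "B. "."""
    # ^ anchors at the start of each line because of re.MULTILINE, so
    # "TAB. ..." does not match: "B" is not the first character there.
    return re.findall(r"^B\. (.*)$", text, flags=re.MULTILINE)


sample = "B. some text here\nTAB. skip this\nB. and this one\n"
print(b_lines(sample))  # ['some text here', 'and this one']
```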

New to Perl and was wondering why my code isn't doing what it's supposed to

I have an assignment asking me to print a string once for each character in that string. So if the input string is "Gum", it should print:
Gum
Gum
Gum
Right now my code is
my $string = <>;
my $length = length($string);
print ($string x $length, "\n");
And I'm getting gum printed five times as my output.
Those who have said you will get CR + LF at the end of the line on a Windows system are mistaken. Perl will convert the native line ending to a simple newline \n on any platform.
You must bear this in mind whether you are reading from the terminal or from a file.
The built-in chomp function will remove the line terminator character from the end of a string variable. If the string doesn't end with a line terminator then it will have no effect.
So when you type Gum and press Enter, you are setting $string to "Gum\n", and length will show that it has four characters.
You are seeing it five times on your screen because the first line is what you typed in yourself. The following four are printed by the program.
After a chomp, $string is just "Gum" with a length of three characters, which is what you want.
To output this on separate lines you have to print a newline after each line, so you can write
my $string = <>;
chomp $string;
my $length = length $string;
print ("$string\n" x $length);
or perhaps
print $string, "\n" for 1 .. $length;
I hope that helps
Because you are using the input string as-is, it still contains the newline at the end, which is also counted as a character. On my system, it outputs "Gum\n" four times.
chomp($string) will remove the line ending, but the output will then run together, resulting in GumGumGum\n.
When you type the input and press Enter, you don't enter "Gum" but "Gum\r\n", which is a string of length 5. You should trim the input.
Your code is working fine. See this: http://ideone.com/AsPFh3
Possibility 1: You might be typing 2 spaces after the input on the command line; that's why the length comes out as 5 and it prints 5 times. Something like this: http://ideone.com/fsvnrd
In that case my $string=<>; gives you $string = "gum  ", so the length will be 5.
Possibility 2:
If you are on Windows, a carriage return (\r) and a newline (\n, from pressing Enter) are added at the end of the string, which also makes the length 5.
Edit: To print each copy on a new line, use the code below:
#!/usr/bin/perl
# read one line, strip its line ending, then print it once per character
chomp(my $string=<>);
my $length = length($string);
print ("$string\n" x $length);
Edit 2: To remove \r\n, use:
$string =~ s/\r|\n//g;
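The fix in these answers (drop the line ending, then repeat the string once per character) looks like this as a Python comparison sketch; the function name is hypothetical. rstrip("\r\n") removes any trailing mix of "\r" and "\n", so it covers both chomp and the s/\r|\n//g variant for the end of the line.

```python
def repeat_per_char(line: str) -> str:
    """Return `line` once per character, each copy on its own line."""
    word = line.rstrip("\r\n")  # like chomp, also dropping a stray "\r"
    return f"{word}\n" * len(word)


print(len("Gum\n"))  # 4: the newline is counted, as in the original bug
print(repeat_per_char("Gum\n"), end="")
```

To use it interactively you could pass sys.stdin.readline(). Python's text-mode stdin already translates "\r\n" to "\n" by default, much like the native line-ending conversion described in the first Perl answer.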
