I would like to cut off unneeded additional new line characters in strings using R.
For example, if I have the string:
"This is an example test string. \n \n \n"
I would like it to look like this:
"This is an example test string. \n"
Try
x <- gsub("\\n\\s*", "\n", x)
This matches each newline together with any whitespace that follows it (including further newlines) and replaces the whole run with a single newline.
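For comparison, the same pattern can be sketched in Python. This is only an illustration of the idea, not part of the original answer; the function name collapse_newlines is made up for the example.

```python
import re


def collapse_newlines(text: str) -> str:
    # Hypothetical helper: replace each newline plus any whitespace that
    # follows it (spaces, tabs, further newlines) with a single newline.
    return re.sub(r"\n\s*", "\n", text)


print(repr(collapse_newlines("This is an example test string. \n \n \n")))
# prints 'This is an example test string. \n'
```

The space before the first newline is kept, because the pattern only starts matching at a newline.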
I used string.gsub(str, "%s+") to remove spaces from a string, but I don't want it to remove newlines. Example:
str = "string with\nnew line"
string.gsub(str, "%s+")
print(str)
and I'm expecting the output to be like:
stringwith
newline
What pattern should I use to get that result?
It seems you want to match any whitespace matched with %s but exclude a newline char from the pattern.
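You can put the complement class %S (which matches any non-whitespace char) inside a negated character set, [^...], and add \n there. [^%S\n] then matches any char that is neither non-whitespace nor a newline, i.e. any whitespace except a newline: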
local str = "string with\nnew line"
str = string.gsub(str, "[^%S\n]+", "")
print(str)
This prints:
stringwith
newline
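The same double-negation trick carries over to other regex dialects. As a hedged sketch (the helper name is hypothetical, and Python's \s covers Unicode whitespace, which may differ from what Lua's %s matches in a given locale):

```python
import re


def remove_spaces_keep_newlines(text: str) -> str:
    # [^\S\n] = "not (non-whitespace or newline)", i.e. any whitespace
    # character except the newline itself.
    return re.sub(r"[^\S\n]+", "", text)


print(remove_spaces_keep_newlines("string with\nnew line"))
```

Run on the sample string, this should print stringwith and newline on two separate lines, matching the Lua output above.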
"%s" matches any whitespace character. If you want to match only a space, use " ". If you want to match a specific number of spaces, either write them out explicitly in the pattern or use string.rep(" ", 5).
I have a string that is like so:
"string content here
"
because it is too long to fit on the screen in one line
The string is the name of a file I would like to read, but I always get an error message that the file name wasn't found, because the string includes the newline character, which obviously isn't in the file name. I cannot rename the file, and I have tried the strip function to remove it, but this doesn't work. How can I remove the newline character from my string so I can load my file?
You can use the function strip to remove any leading and trailing whitespace, including newlines, from a string.
>> text = "hello" + newline; %Create test string.
>> disp(text)
hello
>> text_stripped = strip(text);
>> disp(text_stripped)
hello
>>
In the above ">>" has been included to better present the removal of the whitespace in the string.
Consider replacing the newline character with nothing using strrep.
As an example:
s = sprintf('abc\ndef') % Create a string s with a newline character in the middle
s = strrep(s, newline, '') % Replace newline with nothing
Alternatively, you could use regular expressions if there are several characters causing you issues.
Alternatively, you could use strip if you know the newline always occurs at the beginning or end.
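The difference between the two approaches (trimming the ends versus replacing everywhere) can be sketched in Python. This is an analogy only, not MATLAB code, and the variable names are made up for the illustration:

```python
# A name with a trailing newline: trimming the ends is enough.
file_name = "hello\n"
trimmed = file_name.strip()

# A newline in the middle: trimming leaves it alone, replacing removes it.
text = "abc\ndef"
still_has_newline = text.strip()
no_newline = text.replace("\n", "")

print(repr(trimmed), repr(still_has_newline), repr(no_newline))
# prints 'hello' 'abc\ndef' 'abcdef'
```

If stripping appears not to work, it may be worth checking whether the unwanted character actually sits at the end of the string or somewhere inside it.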
I want to replace \n with a space in a String with a recursive function using pattern matching, but I can't figure out how to match the \ char.
This is my function:
replace :: String -> String
replace ('\\':'n':xs) = ' ' : replace xs
replace (x:xs) = x : replace xs
replace "" = ""
In ('\':'n':xs) the backslash would escape the single quote and mess up the code, so I wrote ('\\':'n':xs) expecting that the first \ would escape the escape of the second \ and would match a backslash in a String. However, it doesn't.
This is what happens when I try the function in GHCi:
*Example> replace "m\nop"
"m\nop"
*Example> replace "m\\nop"
"m op"
How can I match a single backslash?
\n is a single character. If we use \n in a string like "Hello\nWorld!", then the resulting list looks like this: ['H','e','l','l','o','\n','W','o','r','l','d','!']. \n denotes a newline character, a single ASCII byte 10. However, since a newline isn't really easy to type in many programming languages, the escape sequence \n is used instead in string literals.
If you want to pattern match on a newline, you must use the whole escape sequence:
replace :: String -> String
replace ('\n':xs) = ' ' : replace xs
replace (x:xs) = x : replace xs
replace "" = ""
Otherwise, you will only match a literal backslash followed by the letter n.
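The distinction between the one-character newline and the two-character sequence backslash plus n is not specific to Haskell. A small Python sketch (the function name is hypothetical) shows the same behaviour as the GHCi session in the question:

```python
newline = "\n"        # one character: the newline (code 10)
backslash_n = "\\n"   # two characters: a backslash and the letter n

print(len(newline), len(backslash_n))  # prints 1 2


def replace_newlines(text: str) -> str:
    # Swap every real newline character for a space; a literal
    # backslash followed by 'n' is left untouched.
    return "".join(" " if ch == "\n" else ch for ch in text)


print(repr(replace_newlines("m\nop")))   # prints 'm op'
print(repr(replace_newlines("m\\nop")))  # prints 'm\\nop'
```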
Exercise: Now that replace works, try to use map instead of explicit recursion.
I have a large (4 GB) Windows .csv text file (each line ends in "\r\n") in a Linux environment that was supposed to have been a delimited csv file (delimiter = '|', text qualifier = '"'), with each field separated by a pipe and enclosed in double quotes. Any narrative text field with embedded double quotes was supposed to have each double quote escaped with a second double quote (i.e. "the quick "brown" fox" was supposed to have been represented as "the quick ""brown"" fox"). Unfortunately, the embedded double quotes were not escaped. Further, the text fields may include embedded line breaks (i.e. Windows CRLF (\r\n)), which need to be retained.
Sample lines might look as follows:
"1234567890123456"|"2016-07-30"|"2016-08-01"|"123"|"456"|"789"|"text narrative field starts\r\n
with text lines that may have embedded double quotes "For example"\r\n
and may include measurements such as 1/2" x 2" with \r\n
the text continuing and includes embedded line breaks \r\n
which will finally be terminated with a double quote"\r\n
"9876543210654321"|"2017-01-31"|"2018-08-01"|"123"|"456"|"789"|"text narrative field"\r\n
"2345678901234567"|"...."\r\n
with the objective to have the output appear as follows:
~1234567890123456~|~2016-07-30~|~2016-08-01~|~123~|~456~|~789~|~text narrative field starts\r\n
with text lines that may have embedded double quotes ""For example""\r\n
and may include measurements such as 1/2"" x 2"" with \r\n
the text continuing and includes embedded line breaks \r\n
which will finally be terminated with a double quote~\r\n
~9876543210654321~|~2017-01-31~|~2018-08-01~|~123~|~456~|~789~|~text narrative field~\r\n
~2345678901234567~|~....~\r\n
The solution I was attempting to implement was to:
1. SUCCESSFUL: change all the "|" sequences to ~|~
2. SUCCESSFUL: change the double quote (") at the start of the first line and at the end of the last line to a tilde (~)
3. change the ending and starting double quotes to tildes wherever a line ends in a double quote followed by a CRLF (\r\n) (e.g. ..."\r\n) and the next line begins with a double quote followed by a 16-digit number and a tilde (e.g. "1234567890123456~...), i.e. it is the start of a new record
4. convert all remaining double quote characters to two successive double quotes (change " to "")
5. then reverse the first 3 steps above, changing all ~ back to double quotes.
I started by using sed to replace all strings with double quote, followed by a pipe, followed by a double quote (i.e. "|") with a tilde, pipe, tilde (i.e. ~|~). I then manually replaced the first and last doublequote in the file with a tilde.
This is where I ran into issues. I tried to count the number of occurrences where a line ends with a double quote (") and the next line begins with a double quote followed by a 16-digit number and a "~", which would tell me the actual number of csv records in the file (minus one), as opposed to the number of lines. I attempted to do this using grep: grep '"\r\n"\d{16}~' | wc -l but that didn't work.
I then need to replace those double quotes wherein a double quote ends a record and the succeeding record begins with a double quote followed by a 16 digit number and a "~" leaving everything else intact.
I tried to use sed: sed 's/"\r\n"(\d{16}~)/~\r\n~\1' windows_file.txt but it is not working as hoped.
I would welcome any recommendations as to how to accomplish the above.
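The script below converts the record and field delimiters to tildes using awk (the embedded double quotes are left as they are). It handles everything except the very last record in the file, since it does not know where that record ends.
That could be fixed by counting the lines in the file first, but this would be impractical since it's a big file.
Looking at the data structure, records are separated by "\r\n" (a double quote, a DOS line ending, and another double quote) and fields by "|", so let's use those as separators in awk.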
gawk 'BEGIN {
    RS  = "\"\r\n\""   # input record separator: two double quotes with a DOS line ending in between
    FS  = "\"\\|\""    # input field separator: two double quotes with a pipe in between
    ORS = "~\r\n~"     # output record separator
    OFS = "~|~"        # output field separator
}
{
    $1 = $1            # reassign a field so awk rebuilds $0 using OFS
    if (NR == 1) {     # first record: replace its leading double quote
        print "~" substr($0, 2)
    } else {
        print $0
    }
}' test.txt
Result (assuming lines end with \r\n):
~1234567890123456~|~2016-07-30~|~2016-08-01~|~123~|~456~|~789~|~text narrative field starts
with text lines that may have embedded double quotes "For example"
and may include measurements such as 1/2" x 2" with
the text continuing and includes embedded line breaks
which will finally be terminated with a double quote~
~9876543210654321~|~2017-01-31~|~2018-08-01~|~123~|~456~|~789~|~text narrative field~
~10654321~|~2018-09-31~|~2018-08-01~|~123~|~456~|~789~|~asdasdasdasdad asasda"
~
~
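For the full goal in the question (doubling the embedded double quotes while keeping the delimiters), the same record and field boundaries can be used. The sketch below is a Python illustration under stated assumptions, not a tested solution for the real 4 GB file: it assumes every record starts with a quoted 16-digit number as described in the question, that the sequence "|" never occurs inside a narrative field, and that the data fits in memory. The names escape_embedded_quotes, RECORD_BOUNDARY, and FIELD_SEPARATOR are made up for the example.

```python
import re

# Assumption from the question: a new record starts with a quoted 16-digit
# number, so a record boundary is quote, CRLF, quote, followed by 16 digits
# and the field separator "|".
RECORD_BOUNDARY = re.compile(r'"\r\n"(?=\d{16}"\|")')
FIELD_SEPARATOR = '"|"'


def escape_embedded_quotes(data: str) -> str:
    """Double every double quote that is not part of a delimiter."""
    trailing = "\r\n" if data.endswith("\r\n") else ""
    core = data[: len(data) - len(trailing)]
    if len(core) < 2 or core[0] != '"' or core[-1] != '"':
        raise ValueError("expected data to start and end with a double quote")
    core = core[1:-1]  # drop the very first and very last text qualifier

    records = []
    for record in RECORD_BOUNDARY.split(core):
        # Whatever quotes remain inside a field are embedded quotes.
        fields = [f.replace('"', '""') for f in record.split(FIELD_SEPARATOR)]
        records.append(FIELD_SEPARATOR.join(fields))

    return '"' + '"\r\n"'.join(records) + '"' + trailing


sample = (
    '"1234567890123456"|"2016-07-30"|"say "hi"\r\n'
    '1/2" x 2" end"\r\n'
    '"9876543210654321"|"2017-01-31"|"plain"\r\n'
)
print(escape_embedded_quotes(sample))
```

When reading the real file in Python, opening it with newline="" keeps the \r\n endings from being translated. For a file of this size, a streaming variant that buffers one record at a time would probably be needed.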
PS: the awk script will break if a field contains a line that starts with " and the preceding line within the same field ends with "\r\n, since that sequence matches the proposed RS. For example:
"10654321"|"2018-09-31"|"2018-08-01"|"123"|"456"|"789"|"asdasdasdasdad asasda"\r\n
"some more"\r\n
"22222"|".... (another record)
I've hit a small block with string parsing. I have a string like:
footage/down/temp/cars_[100]_upper/cars_[100]_upper.exr
and I'm having difficulty using gsub to delete a portion of the string. Normally I would do this
lineA = "footage/down/temp/cars_[100]_upper/cars_[100]_upper.exr"
lineB = "footage/down/temp/cars_[100]_upper/"
newline = lineA:gsub(lineB, "")
which would normally give me 'cars_[100]_upper.exr'
The problem is that gsub doesn't like the [] or other special characters in the string and unlike string.find gsub doesn't have the option of using the 'plain' flag to cancel pattern searching.
I am not able to manually edit the lines to include escape characters for the special characters, as I'm writing a file comparison script.
Any help to get from lineA to newline using lineB would be most appreciated.
Taking from page 181 of Programming in Lua 2e:
The magic characters are:
( ) . % + - * ? [ ] ^ $
The character '%' works as an escape for these magic characters.
So, we can just come up with a simple function to escape these magic characters, and apply it to your input string (lineB):
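function literalize(str)
    -- Prefix every magic character with '%'; the outer parentheses drop gsub's second return value (the match count)
    return (str:gsub("[%(%)%.%%%+%-%*%?%[%]%^%$]", function(c) return "%" .. c end))
end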
lineA = "footage/down/temp/cars_[100]_upper/cars_[100]_upper.exr"
lineB = literalize("footage/down/temp/cars_[100]_upper/")
newline = lineA:gsub(lineB, "")
print(newline)
Which of course prints: cars_[100]_upper.exr.
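The same "escape the pattern before using it" idea exists in other regex libraries. As a hedged comparison, not Lua code, Python's standard re.escape plays the role of literalize; the variable names below simply mirror the question:

```python
import re

line_a = "footage/down/temp/cars_[100]_upper/cars_[100]_upper.exr"
line_b = "footage/down/temp/cars_[100]_upper/"

# re.escape backslash-escapes the regex metacharacters in line_b, so the
# brackets are matched literally instead of being read as a character set.
result = re.sub(re.escape(line_b), "", line_a)
print(result)  # prints cars_[100]_upper.exr
```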
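Another approach is a plain-text find, which needs no escaping. Here lineB must be the original, unescaped string:
local i1, i2 = lineA:find(lineB, nil, true)
local result = i2 and lineA:sub(i2 + 1) or lineA  -- fall back to lineA if lineB is not found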
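The find-and-slice approach avoids patterns altogether. A minimal Python sketch of the same logic (the function name remove_up_to is hypothetical) looks like this:

```python
def remove_up_to(line_a: str, line_b: str) -> str:
    # Plain substring search, no pattern matching involved.
    start = line_a.find(line_b)
    if start == -1:
        return line_a  # line_b does not occur: leave line_a unchanged
    # Keep everything after the end of the first occurrence of line_b.
    return line_a[start + len(line_b):]


print(remove_up_to(
    "footage/down/temp/cars_[100]_upper/cars_[100]_upper.exr",
    "footage/down/temp/cars_[100]_upper/",
))
# prints cars_[100]_upper.exr
```

Unlike the gsub version, this discards everything up to and including the first match rather than removing every occurrence, which is usually what is wanted when stripping a directory prefix.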
You can also escape punctuation in a text string, str, using:
str:gsub ("%p", "%%%0")