How to split string using a whole word as separator?

Is there a way to split string like:
"importanttext1 has gathered importanttext2 from stackoverflow."
I want to grab everything before the word "has", and only the one word after "gathered", so the result doesn't include "from stackoverflow.". I want to be left with 2 variables that contain importanttext1 and importanttext2.

local str = "importanttext1 has gathered importanttext2 from stackoverflow."
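-- lazy match from the start, stopping before the first standalone "has" (no trailing space captured)
local s1 = str:match("^(.-)%s+has%s")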
local s2 = str:match("gathered%s+(%S+)")
print(s1)
print(s2)
Note that "%s" matches a whitespace character while "%S" matches a non-whitespace character.
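
As a cross-check of the two captures above, here is a minimal sketch of the same extraction using Python's re module. Python is used only for illustration; the function name split_on_words and the \b word boundaries are assumptions that do not appear in the original answer.

```python
import re

def split_on_words(text):
    """Return (text before the word 'has', first word after 'gathered')."""
    # Lazy match up to the first standalone word "has"; \b keeps "hash" from matching.
    before = re.search(r"^(.*?)\s+\bhas\b", text)
    # \s+ skips the whitespace after "gathered"; \S+ captures one non-whitespace run.
    after = re.search(r"\bgathered\s+(\S+)", text)
    return (before.group(1) if before else None,
            after.group(1) if after else None)

s1, s2 = split_on_words("importanttext1 has gathered importanttext2 from stackoverflow.")
print(s1)
print(s2)
```

Either element of the returned pair is None when its separator word is missing, so callers can tell "no match" apart from an empty capture.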

Related

SAS finding an uppercase word within a string

I have a string which contains one word in uppercase somewhere within it. I want to extract that one word into a new variable using SAS.
I think I need to find a way to code up finding a word which contains two or more uppercase letters (as the start of a sentence would begin with an uppercase letter).
i.e. How do I create the variable 'word':
data example;
length txtString $50;
length word $20;
infile datalines dlm=',';
input txtString $ word $;
datalines;
This is one EXAMPLE. Of what I need.,EXAMPLE
THIS is another.,THIS
etc ETC,ETC
;
run;
Hope someone can help and the question is clear
Thanks in advance

Consider a regex replace with a negative lookahead that keeps two types of matches:
runs of at least two uppercase letters or spaces (the two-character minimum avoids matching the capital at the start of a sentence): (([A-Z ]){2,})
runs of at least two uppercase letters or periods (same minimum, for the same reason): (([A-Z.]){2,})
CAVEAT: This solution also matches the pronoun I, which is technically a valid match because it is an all-uppercase one-letter word. As the usual such case in English, it can be handled as a special case with a tranwrd() replace. Relatedly, note that this solution matches ALL uppercase words in the string, not just one.
data example;
length txtString $50;
length word $20;
infile datalines dlm=',';
input txtString $ word $;
datalines;
This is one EXAMPLE. Of what I need.,EXAMPLE
THIS is another.,THIS
etc ETC,ETC
;
run;
data example;
set example;
pattern_num = prxparse("s/(?!(([A-Z ]){2,})|(([A-Z.]){2,})).//");
wordextract = prxchange(pattern_num, -1, txtString);
wordextract = tranwrd(wordextract, " I ", "");
drop pattern_num;
run;
txtString                             word     wordextract
This is one EXAMPLE. Of what I need.  EXAMPLE  EXAMPLE
THIS is another.                      THIS     THIS
etc ETC                               ETC      ETC

SAS has a CALL PRXSUBSTR routine that finds the starting position and length of the substring that matches a given regex pattern within a given string. Here's a sample solution using it:
data solution;
set example;
/* Build a regex pattern of the word to search for, and hang on to it */
/* (The regex below means: word boundary, then two or more capital letters,
then word boundary. Word boundary here means the start or the end of a string
of letters, digits and/or underscores.) */
if _N_ = 1 then pattern_num = prxparse("/\b[A-Z]{2,}\b/");
retain pattern_num;
/* Get the starting position and the length of the word to extract */
call prxsubstr(pattern_num, txtString, mypos, mylength);
/* If a word matching the regex pattern is found, extract it */
if mypos ^= 0 then word = substr(txtString, mypos, mylength);
run;
SAS prxsubstr() documentation: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002295971.htm
Regex word boundary info: http://www.regular-expressions.info/wordboundaries.html
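
The word-boundary pattern above can be tried outside SAS. The following is a minimal sketch in Python, assuming Python's re module treats \b[A-Z]{2,}\b the same way for these ASCII samples; the helper name first_upper_word is hypothetical and the sample strings are the ones from the question.

```python
import re

# Same pattern as the prxparse() call: two or more capitals between word boundaries.
UPPER_WORD = re.compile(r"\b[A-Z]{2,}\b")

def first_upper_word(text):
    """Return the first all-uppercase word of length >= 2, or None."""
    m = UPPER_WORD.search(text)
    return m.group(0) if m else None

for line in ["This is one EXAMPLE. Of what I need.", "THIS is another.", "etc ETC"]:
    print(first_upper_word(line))
```

Because the pattern requires at least two capitals, the single-letter word "I" and the capitalized "This" are both skipped.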

In Swift how to obtain the "invisible" escape characters in a string variable into another variable

In Swift I can create a String variable such as this:
let s = "Hello\nMy name is Jack!"
And if I use s, the output will be:
Hello
My name is Jack!
(because the \n is a linefeed)
But what if I want to programmatically obtain the raw characters in the s variable? As in if I want to actually do something like:
let sRaw = s.raw
I made the .raw up, but something like this. So that the literal value of sRaw would be:
Hello\nMy name is Jack!
and it would literally print the string, complete with literal "\n"
Thank you!

The newline is the "raw character" contained in the string.
How exactly you formed the string (in this case from a string literal with an escape sequence in source code) is not retained (it is only available in the source code, but not preserved in the resulting program). It would look exactly the same if you read it from a file, a database, the concatenation of multiple literals, a multi-line literal, a numeric escape sequence, etc.
If you want to print newline as \n you have to convert it back (by doing text replacement) -- but again, you don't know if the string was really created from such a literal.
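
The "convert it back by text replacement" step is language-independent. Here is a minimal sketch in Python rather than Swift; the function name show_escapes, and the choice to also escape tabs and existing backslashes, are assumptions made for the example.

```python
def show_escapes(text):
    """Replace control characters with their backslash escape spelling."""
    # Escape the backslash first so an existing backslash stays distinguishable.
    return (text.replace("\\", "\\\\")
                .replace("\n", "\\n")
                .replace("\t", "\\t"))

s = "Hello\nMy name is Jack!"
print(show_escapes(s))
```

As the answer points out, the result only shows one possible spelling of the string; it cannot tell whether the newline originally came from an escape sequence, a file, or a multi-line literal.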

You can do this with escaped characters such as \n:
let secondaryString = "really"
let s = "Hello\nMy name is \(secondaryString) Jack!"
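let find: Character = "\n"
// Split on the newline (keeping empty pieces) and rejoin with a literal backslash followed by n
let r = s.split(separator: find, omittingEmptySubsequences: false).joined(separator: "\\n")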
print(r) // -> "Hello\nMy name is really Jack!"
However, once the string s is generated the \(secondaryString) has already been interpolated to "really" and there is no trace of it other than the replaced word. I suppose if you already know the interpolated string you could search for it and replace it with "\\(secondaryString)" to get the result you want. Otherwise it's gone.

How to display each individual word of a string in MATLAB

How to display each individual word of a string in MATLAB version R2012a? The function strsplit doesn't work in this version. For example 'Hello, here I am'. I want to display every word on a single line.

Each word in a single line means replacing each blank with a new line:
strrep(s,' ',sprintf('\n'))

You can use regexp with the 'split' option:
>> str = 'Hello, here I am';
>> words = regexp(str, '\s+', 'split').'
words =
'Hello,'
'here'
'I'
'am'
Change '\s+' to a more elaborate pattern if needed.

How do I remove lines that begin with a specific string from a string in Lua?

How do I remove the lines of a string that begin with another string in Lua? For instance, I want to remove every line of the string result that begins with <Table. This is the code I've written so far:
for line in result:gmatch"<Table [^\n]*" do line = "" end

string.gmatch is used to get all occurrences of a pattern. To replace a pattern, you need string.gsub.
Another problem is that your pattern <Table [^\n]* matches every line containing <Table, not just the lines that begin with it.
Lua patterns have no beginning-of-line anchor (^ anchors only at the start of the whole string), so this almost works:
local str = result:gsub("\n<Table [^\n]*", "")
except that it will miss on the first line. My solution is using a second run to test the first line:
local str1 = result:gsub("\n<Table [^\n]*", "")
-- \n? also covers the case where the first line is the only line
local str2 = str1:gsub("^<Table [^\n]*\n?", "")

The LPEG library is perfect
for this kind of task.
Just write a function to create custom line strippers:
local mk_striplines
do
local lpeg = require "lpeg"
local P = lpeg.P
local Cs = lpeg.Cs
local lpegmatch = lpeg.match
local eol = P"\n\r" + P"\r\n" + P"\n" + P"\r"
local eof = P(-1)
local linerest = (1 - eol)^1 * (eol + eof) + eol
mk_striplines = function (pat)
pat = P (pat)
local matchline = pat * linerest
local striplines = Cs (((matchline / "") + linerest)^1)
return function (str)
return lpegmatch (striplines, str)
end
end
end
Note that the argument to mk_striplines() may be a string or a
pattern.
Thus the result is very flexible:
mk_striplines (P"<Table" + P"</Table>") would create a stripper
that drops lines with two different patterns.
mk_striplines (P"x" * P"y"^0) drops each line starting with an
x followed by any number of y’s -- you get the idea.
Usage example:
local linestripper = mk_striplines "foo"
local test = [[
foo lorem ipsum
bar baz
buzz
foo bar
xyzzy
]]
print (linestripper (test))

The other answers provide good solutions to actually stripping lines from a string, but don't address why your code is failing to do that.
Reformatting for clarity, you wrote:
for line in result:gmatch"<Table [^\n]*" do
line = ""
end
The first part is a reasonable way to iterate over result and extract all spans of text that begin with <Table and continue up to but not including the next newline character. The iterator returned by gmatch returns a copy of the matching text on each call, and the local variable line holds that copy for the body of the for loop.
Since the matching text is copied to line, changes made to line do not and cannot modify the actual text stored in result.
This is due to a more fundamental property of Lua strings. All strings in Lua are immutable. Once stored, they cannot be changed. Variables holding strings actually hold a reference into Lua's internal table of immutable strings, which supports only two operations: internalization of a new string, and garbage collection of an internalized string with no remaining references.
So any approach to editing the content of the string stored in result is going to require the creation of an entirely new string. Where string.gmatch provides an iteration over the content but cannot allow it to be changed, string.gsub provides for creation of a new string where all text matching a pattern has been replaced by something new. But even string.gsub is not changing the immutable source text; it is creating a new immutable string that is a copy of the old with substitutions made.
Using gsub could be as simple as this:
result = result:gsub("<Table [^\n]*", "")
but that will disclose other defects in the pattern itself. First, and most obviously, nothing requires that the pattern match at only the beginning of the line. Second, the pattern does not include the newline, so it will leave the line present but empty.
All of that can be refined by careful and clever use of the pattern library. But it doesn't change the fact that you are starting with XML text and are not handling it with XML aware tools. In that case, any approach based on pattern matching or even regular expressions is likely to end in tears.

result = result:gsub('%f[^\n%z]<Table [^\n]*', '')
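The start of this pattern, %f[^\n%z], is a frontier pattern which matches any transition from either a newline or a zero character to another character, and for frontier patterns the position before the first character counts as a zero character. In other words, that prefix allows the rest of the pattern to match at either the first line or any other start of line. Note that %z is the Lua 5.1 class for the zero character; it was deprecated in Lua 5.2, where patterns may contain '\0' directly, so on newer versions the set is written [^\n\0] instead.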
Reference: the Lua 5.3 manual, section 6.4.1 on string patterns
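
All of the answers above aim at the same result: drop every line that starts with a given prefix, including its newline. A minimal sketch of that goal in Python follows. It splits the text into lines instead of using a frontier pattern, so it is a different technique from the gsub call above; the function name and sample text are hypothetical.

```python
def strip_lines_starting_with(text, prefix):
    """Drop every line of text that begins with prefix, keeping line endings."""
    # keepends=True preserves each line's terminator, so removed lines
    # take their newline with them and kept lines stay byte-for-byte intact.
    kept = [line for line in text.splitlines(keepends=True)
            if not line.startswith(prefix)]
    return "".join(kept)

sample = "<Table a>\nkeep me\n  <Table indented>\n<Table b>\nlast"
print(strip_lines_starting_with(sample, "<Table"))
```

The indented line is kept because it does not begin with the prefix, which is the distinction the unanchored pattern in the question misses. Note that splitlines() also splits on \r and a few other line separators, not only on \n.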

Convert underscores to spaces in Matlab string?

So say I have a string with some underscores like hi_there.
Is there a way to auto-convert that string into "hi there"?
(the original string, by the way, is a variable name that I'm converting into a plot title).

Surprising that no-one has yet mentioned strrep:
>> strrep('string_with_underscores', '_', ' ')
ans =
string with underscores
which should be the official way to do simple string replacements. For such a simple case, regexprep is overkill: yes, regular expressions are Swiss Army knives that can do everything possible, but they come with a long manual. The logical indexing shown by AndreasH only works for replacing single characters; it cannot do this:
>> s = 'string*-*with*-*funny*-*separators';
>> strrep(s, '*-*', ' ')
ans =
string with funny separators
>> s(s=='*-*') = ' '
Error using ==
Matrix dimensions must agree.
As a bonus, it also works for cell-arrays with strings:
>> strrep({'This_is_a','cell_array_with','strings_with','underscores'},'_',' ')
ans =
'This is a' 'cell array with' 'strings with' 'underscores'

Try this Matlab code for a string variable 's'
s(s=='_') = ' ';

If you ever have to do anything more complicated, say doing a replacement of multiple variable length strings,
s(s == '_') = ' ' will be a huge pain. If your replacement needs ever get more complicated consider using regexprep:
>> regexprep({'hi_there', 'hey_there'}, '_', ' ')
ans =
'hi there' 'hey there'
That being said, in your case @AndreasH.'s solution is the most appropriate and regexprep is overkill.
A more interesting question is why you are passing variables around as strings?

regexprep() may be what you're looking for and is a handy function in general.
regexprep('hi_there','_',' ')
Will take the first argument string, and replace instances of the second argument with the third. In this case it replaces all underscores with a space.

In Matlab strings are vectors, so performing simple string manipulations can be achieved using standard operators e.g. replacing _ with whitespace.
text = 'variable_name';
text(text=='_') = ' '; % replace all occurrences of underscore with whitespace
% => text = variable name

I know this was already answered; however, in my case I was looking for a way to correct plot titles so that I could include a filename (which could have underscores). I wanted the underscores to be printed literally, NOT interpreted as subscripts. So, using the great info above, I escaped the underscore in the substitution rather than replacing it with a space.
For example:
% Have the user select a file:
[infile inpath]=uigetfile('*.txt','Get some text file');
figure
% this is a problem for filenames with underscores
title(infile)
% this correctly displays filenames with underscores
title(strrep(infile,'_','\_'))
