Particular string split in R


I'd like to split a text string in R, but with some special handling. For instance, if the string contains a dot (.) or an exclamation mark (!), I want them to appear as separate elements of the resulting list. Below is an example of what I want to get.
mytext="Caracas. Montevideo! Chicago."
split= "Caracas", "." ,"Montevideo", "!", "Chicago", "."
My current approach is to first use the built-in R function gsub to replace "." with " ." (and "!" with " !"), and then to call strsplit on the result.
mytext=gsub("\\."," .",mytext)
mytext=gsub("\\!"," !",mytext)
unlist(strsplit(mytext,split=' '))
So, my question is: is there another way to implement this, either by configuring the parameters of strsplit or with another approach that you consider more efficient?
Any help or suggestion is appreciated.

Look-ahead is what you're looking for here:
strsplit(mytext, split = "(?=(\\.|!))", perl = TRUE)
#[[1]]
#[1] "Caracas" "." " Montevideo" "!" " Chicago" "."

eddi's solution doesn't split on the whitespace. Try this:
> regmatches(mytext, gregexpr(text=mytext, pattern="(?=[\\.\\!])|(?:\\s)", perl=T), invert=T)
[[1]]
[1] "Caracas" "." "Montevideo" "!" "Chicago" "."
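For readers comparing across languages, the same tokenization can be sketched in Python. This is not part of the original answers: the function name split_keep_punct is hypothetical, and the approach matches tokens with re.findall rather than splitting on a look-ahead as the R answers do.

```python
import re

def split_keep_punct(text):
    """Return words plus the marks '.' and '!' as separate tokens."""
    # A token is either a run of characters that are not whitespace,
    # '.' or '!', or a single '.' or '!'. Whitespace is dropped.
    return re.findall(r"[^\s.!]+|[.!]", text)

mytext = "Caracas. Montevideo! Chicago."
tokens = split_keep_punct(mytext)
print(tokens)  # ['Caracas', '.', 'Montevideo', '!', 'Chicago', '.']
```

Matching the tokens directly avoids the leading spaces that a pure look-ahead split leaves in front of "Montevideo" and "Chicago".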

Related

VB.NET Compare two string for full or partial match

How do you programmatically compare two strings for a full or partial match?
I may have a string called "ItemName" with a value like "003.00.112.0", and I want to compare it to a string named "ItemNameToFind" with a value like "001...**.*", where "*" stands for an unknown character.
What I have figured out so far is that I need some sort of If statement that compares the strings correctly.
If ItemName = ItemNameToFind Then
MsgBox("Item " & ItemName & " was found based on " & ItemNameToFind)
End If
In the example above it would report that there is no match, because the first three characters of the two strings differ.
Could someone please help with the code to make this work as I explained?

In VB.NET you can use the Like operator:
Dim ItemName = "001.00.112.0"
Dim ItemNameToFind = "001.??.???.?"
If ItemName Like ItemNameToFind Then
Console.Write("Item " & ItemName & " was found based on " & ItemNameToFind)
Else
Console.Write("not found")
End If
Note that I have replaced your * with ? since that means "Any single character". The Like operator recognizes these pattern characters:
? Any single character
* Zero or more characters
# Any single digit (0–9)
[charlist] Any single character in charlist
[!charlist] Any single character not in charlist
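As a rough cross-language comparison (not from the original answer), Python's standard fnmatch module offers similar shell-style wildcards. The helper name matches is hypothetical. Note that fnmatch supports ?, *, [seq] and [!seq], but has no counterpart to the # digit wildcard of Like.

```python
from fnmatch import fnmatchcase

def matches(item_name, pattern):
    """Case-sensitive wildcard match: '?' is exactly one character."""
    return fnmatchcase(item_name, pattern)

pattern = "001.??.???.?"
print(matches("001.00.112.0", pattern))  # True
print(matches("003.00.112.0", pattern))  # False: the prefix differs
```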

Delimiters in split function Julia

I'm having trouble splitting a sentence into individual substrings. Using the split function, I can't manage to use more than one delimiter. The code is:
sentence = "If you're visiting this page, you're likely here because you're searching for a random sentence"
split(sentence, "," , " " , ";")
I get this error:
LoadError: MethodError: no method matching split(::String, ::String, ::String, ::String).
I would like to get an array of single words.

Provide a vector of Chars as the split argument (in Julia, double quotes " delimit a String and single quotes ' delimit a Char):
julia> split(sentence,[' ',',',';'])
16-element Vector{SubString{String}}:
"If"
"you're"
"visiting"
"this"
"page"
""
"you're"
"likely"
"here"
"because"
"you're"
"searching"
"for"
"a"
"random"
"sentence"
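The empty string after "page" in the output above comes from the comma and the space being treated as two separate delimiters. A minimal Python sketch of the same idea (an illustration only, not the Julia API) shows both behaviours: a character class reproduces the empty element, and adding + to the class collapses runs of delimiters.

```python
import re

sentence = ("If you're visiting this page, you're likely here "
            "because you're searching for a random sentence")

# Split on each single space, comma, or semicolon (mirrors the Julia call).
parts = re.split(r"[ ,;]", sentence)

# Treat a run of delimiters as one separator, so ", " yields no empty string.
words = re.split(r"[ ,;]+", sentence)

print(len(parts), len(words))  # 16 15
```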

Convert underscores to spaces in Matlab string?

So say I have a string with some underscores like hi_there.
Is there a way to auto-convert that string into "hi there"?
(the original string, by the way, is a variable name that I'm converting into a plot title).

Surprising that no-one has yet mentioned strrep:
>> strrep('string_with_underscores', '_', ' ')
ans =
string with underscores
which should be the official way to do simple string replacements. For such a simple case, regexprep is overkill: yes, regular expressions are Swiss Army knives that can do everything possible, but they come with a long manual. The string indexing shown by AndreasH only works for replacing single characters; it cannot do this:
>> s = 'string*-*with*-*funny*-*separators';
>> strrep(s, '*-*', ' ')
ans =
string with funny separators
>> s(s=='*-*') = ' '
Error using ==
Matrix dimensions must agree.
As a bonus, it also works for cell-arrays with strings:
>> strrep({'This_is_a','cell_array_with','strings_with','underscores'},'_',' ')
ans =
'This is a' 'cell array with' 'strings with' 'underscores'

Try this Matlab code for a string variable 's'
s(s=='_') = ' ';

If you ever have to do anything more complicated, say replacing multiple variable-length strings, s(s == '_') = ' ' will be a huge pain. If your replacement needs ever get more complicated, consider using regexprep:
>> regexprep({'hi_there', 'hey_there'}, '_', ' ')
ans =
'hi there' 'hey there'
That being said, in your case @AndreasH.'s solution is the most appropriate and regexprep is overkill.
A more interesting question is why you are passing variables around as strings?

regexprep() may be what you're looking for and is a handy function in general.
regexprep('hi_there','_',' ')
This takes the first argument string and replaces every instance of the second argument with the third. In this case it replaces all underscores with a space.

In Matlab, strings are character vectors, so simple string manipulations can be done with standard operators, e.g. replacing _ with whitespace.
text = 'variable_name';
text(text=='_') = ' '; % replace all occurrences of underscore with whitespace
% text is now 'variable name'

I know this was already answered; however, in my case I was looking for a way to correct plot titles so that I could include a filename (which could contain underscores). I wanted to print the underscores themselves instead of having them displayed as subscripts. So, using the great info above, I escaped the underscore in the substitution rather than replacing it with a space.
For example:
% Have the user select a file:
[infile inpath]=uigetfile('*.txt','Get some text file');
figure
% this is a problem for filenames with underscores
title(infile)
% this correctly displays filenames with underscores
title(strrep(infile,'_','\_'))

String concatenation with spaces

I would like to concatenate strings. I tried using strcat:
x = 5;
m = strcat('is', num2str(x))
but this function removes trailing white-space characters from each string. Is there another MATLAB function to perform string concatenation which maintains trailing white-space?

You can use horzcat instead of strcat:
>> strcat('one ','two')
ans =
onetwo
>> horzcat('one ','two')
ans =
one two
Alternatively, if you're going to be substituting numbers into strings, it might be better to use sprintf:
>> x = 5;
>> sprintf('is %d',x)
ans =
is 5

How about
strcat({' is '},{num2str(5)})
that gives
' is 5'

Have a look at the final example in the strcat documentation: try using horizontal array concatenation instead of strcat:
m = ['is ', num2str(x)]
Also, have a look at sprintf for more information on string formatting (leading/trailing spaces etc.).

How about using strjoin ?
x = 5;
m ={'is', num2str(x)};
strjoin(m, ' ')

What spaces does this not take into account? Only the spaces you haven't mentioned! Did you mean:
m = strcat( ' is ',num2str(x) )
perhaps?
Matlab isn't going to guess (a) that you want spaces or (b) where to put the spaces it guesses you want.

What's wrong with this line?

I am trying to remove all the " characters from a string called s1. I have this line:
s1=replace (s1, """, "")
But I get a compile error saying that it expects a list separator or ).
How can I fix it?
Thanks in advance.

Your second string isn’t properly delimited. If you want to use a quotation mark (") inside your string, you need to double it. Since your string only consists of a quotation mark, it looks as follows:
Quotation mark to start the string, ".
Double quotation mark that represents a single quotation mark inside the string, "".
Ending quotation mark, ".
In summary:
s1 = Replace(s1, """", "")

Konrad's suggestion is the one you should go with, but here's another for completeness/amusement.
s1 = Replace(s1, Chr(34), "")
And if you ever get bored at a party and need something to read on your phone, here's the list of the 256 such ASCII codes you can use with Chr().
http://msdn.microsoft.com/en-us/library/4z4t9ed1%28v=VS.80%29.aspx
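For comparison only, here is how the two approaches above look in Python, which is not the language of the question. Python lets a double quote sit inside a single-quoted string, so no doubling is needed, and chr(34) plays the role of Chr(34). The sample string is made up for illustration.

```python
s1 = 'say "hello" to "world"'  # made-up sample input

# A double quote can be written directly inside single quotes.
via_literal = s1.replace('"', "")

# chr(34) is the double-quote character, like Chr(34) above.
via_code = s1.replace(chr(34), "")

print(via_literal)  # say hello to world
```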

Did you really write """? You have to escape the " in the middle - just double it like:
replace( s1, """", "" )

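Commonly used syntaxes are listed below; which one works depends on the language (VB and VBA accept only the doubled-quote form):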
s1=replace (s1, "\"", "")
s1=replace (s1, """", "")
s1=replace (s1, '"', "")
