I'd like to split a text string in R, but I want to take some aspects into consideration. For instance, if the string contains a dot (.) or an exclamation mark (!), I want my function to keep them as elements of the resulting list. Below is an example of what I want to get.
mytext="Caracas. Montevideo! Chicago."
split = c("Caracas", ".", "Montevideo", "!", "Chicago", ".")
My current approach is to first use the built-in R function gsub to insert a space before each "." and "!", and then split the result on spaces with strsplit:
mytext=gsub("\\."," .",mytext)
mytext=gsub("\\!"," !",mytext)
unlist(strsplit(mytext,split=' '))
So my question is: is there another way to implement this, either by configuring the parameters of the strsplit function or with another approach you consider more efficient?
Any help or suggestion is appreciated.
Look-ahead is what you're looking for here:
strsplit(mytext, split = "(?=(\\.|!))", perl = TRUE)
#[[1]]
#[1] "Caracas" "." " Montevideo" "!" " Chicago" "."
eddi's solution doesn't split on the whitespace (note the leading spaces in " Montevideo" and " Chicago"). Try this:
> regmatches(mytext, gregexpr(text=mytext, pattern="(?=[\\.\\!])|(?:\\s)", perl=TRUE), invert=TRUE)
[[1]]
[1] "Caracas" "." "Montevideo" "!" "Chicago" "."
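For readers outside R, both regex ideas above can be sketched with Python's `re` module. The function names are hypothetical.

- `tokens_findall` matches the tokens themselves: either a run of characters that are not whitespace, "." or "!", or a single "." or "!".
- `tokens_split` mirrors the regmatches answer. It splits at the empty position before "." or "!" (a look-ahead) and at each whitespace character. Splitting on empty matches assumes Python 3.7 or later.

```python
import re

TEXT = "Caracas. Montevideo! Chicago."


def tokens_findall(text):
    # A token is a run of non-whitespace, non-"." non-"!" characters,
    # or a single "." or "!".
    return re.findall(r"[^\s.!]+|[.!]", text)


def tokens_split(text):
    # Split at the zero-width position just before "." or "!"
    # and at every single whitespace character.
    return re.split(r"(?=[.!])|\s", text)


print(tokens_findall(TEXT))
print(tokens_split(TEXT))
# both print ['Caracas', '.', 'Montevideo', '!', 'Chicago', '.']
```

The matching approach is somewhat more robust. With two consecutive spaces, `tokens_split` would yield an empty string between them, while `tokens_findall` would not.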
Related
How do you programmatically compare two strings for a full or partial match?
I have a string called "ItemName", such as "003.00.112.0", and I am trying to compare it to a string named "ItemNameToFind", such as "001.**.***.*", where each "*" is meant to be an unknown character.
What I have figured out so far is that it needs some sort of If statement that compares the strings correctly:
If ItemName = ItemNameToFind Then
MsgBox("Item " & ItemName & " was found based on " & ItemNameToFind)
End If
In the example above, it should report that there is no match, because the first three characters differ.
Could someone please help with the code to make this comparison work as I described?
In VB.NET you can use the Like operator:
Dim ItemName = "001.00.112.0"
Dim ItemNameToFind = "001.??.???.?"
If ItemName Like ItemNameToFind Then
Console.Write("Item " & ItemName & " was found based on " & ItemNameToFind)
Else
Console.Write("not found")
End If
Note that I have replaced your * with ?, since ? matches any single character, whereas * matches zero or more characters. The wildcard characters available in a Like pattern are:
? : Any single character
* : Zero or more characters
# : Any single digit (0–9)
[charlist] : Any single character in charlist
[!charlist] : Any single character not in charlist
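To show what the wildcards listed above mean, here is a Python sketch that translates a Like-style pattern into a regular expression. The names `like_to_regex` and `like` are hypothetical; this is not how VB.NET implements Like. It assumes case-sensitive matching (as with VB's default Option Compare Binary). It does not handle edge cases such as an empty "[]" group.

```python
import re


def like_to_regex(pattern):
    """Translate a VB-style Like pattern into a Python regex (sketch)."""
    parts = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "?":
            parts.append(".")        # any single character
        elif c == "*":
            parts.append(".*")       # zero or more characters
        elif c == "#":
            parts.append("[0-9]")    # any single digit
        elif c == "[":
            end = pattern.index("]", i + 1)  # ValueError if "[" is unclosed
            body = pattern[i + 1:end]
            negate = body.startswith("!")
            if negate:
                body = body[1:]
            # Escape characters special inside a regex class, keep "-" for ranges.
            body = body.replace("\\", "\\\\").replace("^", "\\^").replace("[", "\\[")
            parts.append("[" + ("^" if negate else "") + body + "]")
            i = end
        else:
            parts.append(re.escape(c))  # literal character, e.g. "."
        i += 1
    return "".join(parts)


def like(text, pattern):
    # The whole string must match the pattern, as with the Like operator.
    return re.fullmatch(like_to_regex(pattern), text, flags=re.DOTALL) is not None


print(like("001.00.112.0", "001.??.???.?"))  # True
print(like("003.00.112.0", "001.??.???.?"))  # False
print(like("001.00.112.0", "001.##.###.#"))  # True
```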
I'm having some issues splitting a sentence into individual substrings in Julia. With the split function, I can't manage to use more than one delimiter. The code is:
sentence = "If you're visiting this page, you're likely here because you're searching for a random sentence"
split(sentence, "," , " " , ";")
I get this error:
LoadError: MethodError: no method matching split(::String, ::String, ::String, ::String).
I would like to get an array of single words.
Provide a vector of Chars as the delimiter argument (in Julia, double quotes " delimit a String and single quotes ' delimit a Char):
julia> split(sentence,[' ',',',';'])
16-element Vector{SubString{String}}:
"If"
"you're"
"visiting"
"this"
"page"
""
"you're"
"likely"
"here"
"because"
"you're"
"searching"
"for"
"a"
"random"
"sentence"
Say I have a string with some underscores, like hi_there.
Is there a way to automatically convert that string into "hi there"?
(The original string, by the way, is a variable name that I'm converting into a plot title.)
It's surprising that no one has mentioned strrep yet:
>> strrep('string_with_underscores', '_', ' ')
ans =
string with underscores
This should be the standard way to do simple string replacements. For such a simple case, regexprep is overkill: regular-expression functions are Swiss Army knives that can do almost anything, but they come with a long manual. The logical indexing shown by AndreasH only works for replacing single characters; it cannot do this:
>> s = 'string*-*with*-*funny*-*separators';
>> strrep(s, '*-*', ' ')
ans =
string with funny separators
>> s(s=='*-*') = ' '
Error using ==
Matrix dimensions must agree.
As a bonus, it also works for cell-arrays with strings:
>> strrep({'This_is_a','cell_array_with','strings_with','underscores'},'_',' ')
ans =
'This is a' 'cell array with' 'strings with' 'underscores'
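For comparison, Python's built-in `str.replace` plays the same role as strrep: it replaces every occurrence of a substring of any length. Applying it to each element of a list mirrors strrep on a cell array. This is a Python sketch, not MATLAB code.

```python
s = "string*-*with*-*funny*-*separators"
# Replace every occurrence of a multi-character substring, like strrep(s, '*-*', ' ').
cleaned = s.replace("*-*", " ")
print(cleaned)   # string with funny separators

# Apply the same replacement to each element, like strrep on a cell array.
cells = ["This_is_a", "cell_array_with", "strings_with", "underscores"]
replaced = [c.replace("_", " ") for c in cells]
print(replaced)
# ['This is a', 'cell array with', 'strings with', 'underscores']
```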
Try this MATLAB code for a string variable s:
s(s=='_') = ' ';
If you ever need to do anything more complicated, say replacing multiple variable-length strings, s(s == '_') = ' ' will be a huge pain. In that case, consider using regexprep:
>> regexprep({'hi_there', 'hey_there'}, '_', ' ')
ans =
'hi there' 'hey there'
That being said, in your case AndreasH's solution is the most appropriate and regexprep is overkill.
A more interesting question is why you are passing variables around as strings.
regexprep() may be what you're looking for and is a handy function in general.
regexprep('hi_there','_',' ')
This takes the first argument string and replaces every match of the second argument (a regular expression) with the third. In this case it replaces all underscores with spaces.
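The Python counterpart of regexprep is `re.sub`, sketched below. The second part shows where a pattern helps more than a plain replacement: collapsing runs of underscores or hyphens into one space. The example names are hypothetical.

```python
import re

# Replace every match of the pattern "_" with a space, like regexprep('hi_there', '_', ' ').
title = re.sub("_", " ", "hi_there")
print(title)   # hi there

# Collapse runs of "_" or "-" into a single space (hypothetical example names).
names = ["hi_there", "hey__there", "var-name_2"]
titles = [re.sub(r"[_-]+", " ", n) for n in names]
print(titles)  # ['hi there', 'hey there', 'var name 2']
```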
In MATLAB, character arrays are vectors, so simple string manipulations, e.g. replacing _ with a space, can be done with standard operators and logical indexing:
text = 'variable_name';
text(text=='_') = ' '; % replace all occurrences of underscore with a space
% text is now 'variable name'
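Python strings are immutable, so there is no direct equivalent of assigning through a logical mask. A rough Python sketch of the same character-by-character substitution rebuilds the string instead. Like the MATLAB indexing trick, this only works for single characters.

```python
text = "variable_name"
# Rebuild the string, swapping each "_" for " " (similar to text(text == '_') = ' ').
text = "".join(" " if ch == "_" else ch for ch in text)
print(text)   # variable name
```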
I know this was already answered; however, in my case I was looking for a way to fix plot titles so that I could include a filename (which could contain underscores). I wanted the underscores to be printed literally, NOT interpreted as subscripts. So, using the info above, I replaced each underscore with an escaped underscore (\_) rather than a space.
For example:
% Have the user select a file:
[infile inpath]=uigetfile('*.txt','Get some text file');
figure
% this is a problem for filenames with underscores
title(infile)
% this correctly displays filenames with underscores
title(strrep(infile,'_','\_'))
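The substitution itself is language-independent. Here is a Python sketch with a hypothetical helper, `escape_underscores`, that prefixes every underscore with a backslash. Whether this escaping is needed depends on the plotting library's text interpreter. In MATLAB, passing 'Interpreter','none' to title should also display underscores literally.

```python
def escape_underscores(name):
    """Hypothetical helper: prefix each "_" with a backslash so a TeX-style
    interpreter prints it literally instead of starting a subscript."""
    return name.replace("_", "\\_")


escaped = escape_underscores("my_data_file.txt")
print(escaped)   # my\_data\_file.txt
```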
I would like to concatenate strings. I tried using strcat:
x = 5;
m = strcat('is', num2str(x))
but this function removes trailing whitespace characters from each character-array input. Is there another MATLAB function for string concatenation that preserves trailing whitespace?
You can use horzcat instead of strcat:
>> strcat('one ','two')
ans =
onetwo
>> horzcat('one ','two')
ans =
one two
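To make the difference concrete, here is a Python sketch with two hypothetical helpers. `strcat_like` roughly imitates strcat on character arrays by stripping trailing whitespace from each input before joining. `horzcat_like` keeps every character, like horzcat or [a, b]. Python's `rstrip()` removes all trailing whitespace, which only approximates MATLAB's rules.

```python
def strcat_like(*parts):
    # Rough imitation of strcat on char arrays: strip trailing whitespace, then join.
    return "".join(p.rstrip() for p in parts)


def horzcat_like(*parts):
    # Plain concatenation keeps every character, like horzcat or [a, b].
    return "".join(parts)


print(strcat_like("one ", "two"))    # onetwo
print(horzcat_like("one ", "two"))   # one two
```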
Alternatively, if you're going to be substituting numbers into strings, it might be better to use sprintf:
>> x = 5;
>> sprintf('is %d',x)
ans =
is 5
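For comparison, Python offers printf-style formatting that works much like sprintf, plus f-strings and `str.format`. The last variant shows a field width padding the number with leading spaces. This is a Python sketch, not MATLAB.

```python
x = 5
printf_style = "is %d" % x       # closest to sprintf('is %d', x)
f_string = f"is {x}"             # f-string (Python 3.6+)
padded = "is {:3d}".format(x)    # width 3 pads the number with leading spaces

print(printf_style)   # is 5
print(f_string)       # is 5
print(repr(padded))   # 'is   5'
```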
How about
strcat({' is '},{num2str(5)})
that gives
' is 5'
Have a look at the final example in the strcat documentation: try using horizontal array concatenation instead of strcat:
m = ['is ', num2str(x)]
Also, have a look at sprintf for more information on string formatting (leading/trailing spaces etc.).
How about using strjoin?
x = 5;
m ={'is', num2str(x)};
strjoin(m, ' ')
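The Python analogue of strjoin is the `str.join` method, called on the separator. A short sketch:

```python
x = 5
m = ["is", str(x)]
# Join the pieces with a single space, like strjoin(m, ' ').
joined = " ".join(m)
print(joined)   # is 5
```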
Which spaces does this not take into account? Only the ones you haven't included! Did you mean:
m = strcat( ' is ',num2str(x) )
perhaps?
Matlab isn't going to guess (a) that you want spaces or (b) where to put the spaces it guesses you want.
I am trying to remove all the " characters from a string called s1. I have this line:
s1=replace (s1, """, "")
But I get a compile error saying it is expecting a list separator or ).
How can I fix it?
Thanks in advance.
Your second string isn’t properly delimited. If you want to use a quotation mark (") inside your string, you need to double it. Since your string only consists of a quotation mark, it looks as follows:
Quotation mark to start the string, ".
Double quotation mark that represents a single quotation mark inside the string, "".
Ending quotation mark, ".
In summary:
s1 = Replace(s1, """", "")
Konrad's suggestion is the one you should go with, but here's another for completeness/amusement.
s1 = Replace(s1, Chr(34), "")
And if you ever get bored at a party and need something to read on your phone, here's the list of the 256 character codes (0 to 255) you can use with Chr(); only the first 128 of them are ASCII.
http://msdn.microsoft.com/en-us/library/4z4t9ed1%28v=VS.80%29.aspx
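The character-code approach carries over to other languages. In Python, for example, `chr(34)` also returns the double quotation mark, so the same removal can be sketched as follows (the sample string is hypothetical):

```python
s1 = 'He said "hello"'   # hypothetical sample string
# chr(34) is the double quotation mark, the same code used with Chr(34) in VB.
quote = chr(34)
cleaned = s1.replace(quote, "")
print(quote == '"')   # True
print(cleaned)        # He said hello
```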
Did you really write """? You have to escape the " in the middle; just double it, like this:
replace( s1, """", "" )
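Commonly used syntaxes in different languages are listed below. Note that in VB/VBA only the doubled-quote form ("""") is valid, since backslash is not an escape character there and ' starts a comment: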
s1=replace (s1, "\"", "")
s1=replace (s1, """", "")
s1=replace (s1, '"', "")
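As an example of the backslash and single-quote forms, here is a Python sketch. In Python, "\"" and '"' denote the same one-character string. The doubled form """" is VB-specific; in Python it would begin a triple-quoted string literal. The sample string is hypothetical.

```python
s1 = 'She said "hi"'   # hypothetical sample string
# Both literals below are the same one-character string containing ".
with_single_quotes = s1.replace('"', "")
with_backslash = s1.replace("\"", "")
print(with_single_quotes)   # She said hi
print(with_backslash)       # She said hi
```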