Delimiters in split function Julia - string

I'm having trouble splitting a sentence into individual substrings. With the split function, I can't manage to use more than one delimiter. The code is:
sentence = "If you're visiting this page, you're likely here because you're searching for a random sentence"
split(sentence, "," , " " , ";")
I get this error:
LoadError: MethodError: no method matching split(::String, ::String, ::String, ::String).
I would like to get an array of single words.

Pass a vector of Chars as the delimiter argument (in Julia, double quotes " delimit a String and single quotes ' delimit a Char):
julia> split(sentence,[' ',',',';'])
16-element Vector{SubString{String}}:
"If"
"you're"
"visiting"
"this"
"page"
""
"you're"
"likely"
"here"
"because"
"you're"
"searching"
"for"
"a"
"random"
"sentence"

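Note the empty string in the output above: "page, you're" contains a comma directly followed by a space, and each of the two delimiters produces its own split. In Julia, passing keepempty=false to split drops such empty pieces. For comparison only, here is a minimal Python sketch of the same idea (the helper name split_words is hypothetical and not part of the original answer); it treats a run of delimiters as a single separator and filters out empty strings.

```python
import re

sentence = ("If you're visiting this page, you're likely here because "
            "you're searching for a random sentence")

def split_words(text, delimiters=" ,;"):
    """Split text on any run of the given delimiter characters."""
    # Build a character class such as [\ ,;]+ from the delimiter string.
    pattern = "[" + re.escape(delimiters) + "]+"
    # Drop empty pieces that appear at the edges or between adjacent delimiters.
    return [word for word in re.split(pattern, text) if word]

words = split_words(sentence)
print(words)
```

With the empty piece removed, the sentence yields 15 words instead of 16 elements.
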
Related

Replace atom in array of strings

Suppose I have an array of strings "31,793.1" "29,798.6" "30,455.7" "29,700.9".
How do I replace , with nothing to give me "31793.1" "29798.6" "30455.7" "29700.9"?
Another example is to replace - in "-5" "-3" "-4" with _ to give "_5" "_3" "_4".
"31,793.1" "29,798.6" "30,455.7" "29,700.9" would not be an "array of strings" in J. I will suppose that you have a line like this and you want to end up with an array of numbers:
data =: '"31,793.1" "29,798.6" "30,455.7" "29,700.9" "-5"'
NB. Convert commas to "null" and '-'s to '_'s
NB. rplc works in pairs 'old';'new'
NB. (reassign data so that the later steps see the cleaned string)
]data =: data rplc ',';'';'-';'_'
"31793.1" "29798.6" "30455.7" "29700.9" "_5"
NB. remove '"'s
]data =: data rplc '"';''
31793.1 29798.6 30455.7 29700.9 _5
Normally you would now have to split on whitespace (there are many ways to do this), but converting to numbers using ". takes care of this here:
". data
31793.1 29798.6 30455.7 29700.9 _5
NB. sum the converted numbers
+/ ". data
121743
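The same clean-then-convert steps can be sketched in Python for comparison. This is only an illustration under the answer's assumption that the input is a single line of quoted numbers; the names raw and parse_numbers are hypothetical. Python parses a leading - directly, so the - to _ replacement that J needs is not required here.

```python
raw = '"31,793.1" "29,798.6" "30,455.7" "29,700.9" "-5"'

def parse_numbers(line):
    """Strip thousands separators and quotes, then convert each token to float."""
    cleaned = line.replace(",", "").replace('"', "")
    # split() with no argument splits on any run of whitespace.
    return [float(token) for token in cleaned.split()]

numbers = parse_numbers(raw)
total = sum(numbers)
print(numbers)
print(round(total, 1))
```

The exact sum is 121743.3; the 121743 shown in the J session is presumably that value rounded for display.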

Excel: Find and Replace without also grabbing the beginning of another word

I'm currently working on shortening a large Excel sheet using Find/Replace. I'm finding all instances of strings like ", Inc.", ", Co.", " LLC", etc. and replacing them with nothing (i.e. removing them). The problem is that I can't do similar searches for " Inc", ", Inc", ", Co", etc., because they would also remove the beginnings of words like ", Inc"orporated and ", Co"mpany.
Is there a blank character, or something I can do in VBA, that would let me find/replace only items with nothing after the text I'm searching for (i.e. finding ", Co" without also catching ", Co"rporated)?
In VBA you can use Regular Expressions to ensure that there are "word boundaries" before and after the abbreviation you are trying to remove. You can also remove extraneous spaces that might appear, depending on the original string.
Function remAbbrevs(S As String, ParamArray abbrevs()) As String
    Dim RE As Object
    Dim sPat As String
    sPat = "\s*\b(?:" & Join(abbrevs, "|") & ")\b\.?"
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .ignorecase = False
        .Pattern = sPat
        remAbbrevs = .Replace(S, "")
    End With
End Function
For arguments to this function you can enter a series of abbreviations. The function creates an appropriate regex to use.
For example, I entered:
=remAbbrevs(A1,"Inc","Corp")
and filled down.
Explanation of the regex:
remAbbrevs
\s*\b(?:Inc|Corp)\b\.?
Options: Case sensitive
Match a single character that is a “whitespace character”: \s*
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy): *
Assert position at a word boundary: \b
Match the regular expression below: (?:Inc|Corp)
    Match this alternative: Inc
        Match the character string “Inc” literally: Inc
    Or match this alternative: Corp
        Match the character string “Corp” literally: Corp
Assert position at a word boundary: \b
Match the character “.” literally: \.?
    Between zero and one times, as many times as possible, giving back as needed (greedy): ?
Created with RegexBuddy
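To see the word-boundary behaviour outside Excel, here is a minimal Python sketch that builds the same pattern from a list of abbreviations. It is an illustration, not a port of the VBA function: the name rem_abbrevs is hypothetical, and escaping each abbreviation with re.escape is an addition that the VBA version does not perform.

```python
import re

def rem_abbrevs(text, *abbrevs):
    """Remove whole-word abbreviations, plus leading spaces and an optional trailing dot."""
    # Escaping is an assumption added here so that characters like "." stay literal.
    alternatives = "|".join(re.escape(a) for a in abbrevs)
    pattern = r"\s*\b(?:" + alternatives + r")\b\.?"
    # Matching is case sensitive by default, like .ignorecase = False in the VBA code.
    return re.sub(pattern, "", text)

print(rem_abbrevs("Acme Inc.", "Inc", "Corp"))
print(rem_abbrevs("Acme Incorporated", "Inc", "Corp"))
print(rem_abbrevs("Global Corp Ltd", "Inc", "Corp"))
```

The second call leaves the text unchanged because there is no word boundary between "Inc" and "orporated".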

java String.format - how to put a space between two characters

I am looking for a way to use a formatter to put a space between two characters. I thought it would be easy with a string formatter.
Here is what I am trying to accomplish:
given: "AB" it will produce "A B"
Here is what I have tried so far:
"AB".format("%#s")
but this keeps returning "AB"; I want "A B". I thought the number sign could be used for a space.
I also tried this:
"26".format("%#d") but it still prints "26".
Is there any way to do this with String.format?
It is kind of possible with the string formatter although not directly with a pattern.
jshell> String.format("%1$c %2$c", "AB".chars().boxed().toArray())
$10 ==> "A B"
We need to turn the string into an object array so it can be passed in as varargs and the formatter pattern can extract characters based on index (1$ and 2$) and format them as characters (c).
A much simpler regex solution is the following which scales to any number of characters:
jshell> "ABC^&*123".replaceAll(".", "$0 ").trim()
$3 ==> "A B C ^ & * 1 2 3"
Every single character is replaced with itself ($0) followed by a space. The trailing extra space is then removed by the trim() call.
I could not find a way to do this using String#format. But here is a way to accomplish this using regex replacement:
String input = "AB";
String output = input.replaceAll("(?<=[A-Z])(?=[A-Z])", " ");
System.out.println(output);
The regex pattern (?<=[A-Z])(?=[A-Z]) matches every position between two capital letters, and a space is inserted at each such position. The above code prints:
A B
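The difference between the two regex answers (a space after every character versus a space only between two capital letters) can be checked with a small Python sketch. Python is used only for comparison; the function names are hypothetical, and space_out uses str.join rather than a regex to get the same result as replaceAll followed by trim.

```python
import re

def space_out(text):
    """Put a single space between every pair of adjacent characters."""
    return " ".join(text)

def space_between_capitals(text):
    """Insert a space only at positions that sit between two capital letters."""
    # Both lookarounds are zero-width, so no characters are consumed or removed.
    return re.sub(r"(?<=[A-Z])(?=[A-Z])", " ", text)

print(space_out("ABC^&*123"))
print(space_between_capitals("AB"))
print(space_between_capitals("ABc"))
```

For "ABc" the lookaround version produces "A Bc", since the position between "B" and "c" is not between two capitals.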

Converting long string to individual words in Red/Rebol

How can a string containing a sentence be converted to a series of words? For example, convert the following string:
str: "This is a sentence with some words"
to a series of:
["This" "is" "a" "sentence" "with" "some" "words"]
There seems to be a split function in Rebol3, but no such function in Rebol2.
I tried the following code with parse, but it does not work:
str: "This is a sentence with some words"
strlist: []
parse str [
some got: " " (append strlist got) ]
Error is:
** Script Error: Invalid argument: got
How can this be achieved (a method with parse will be preferable)?
In Rebol 2, this would be:
str: "This is a sentence with some words"
parse str none
resulting in:
["This" "is" "a" "sentence" "with" "some" "words"]
As mentioned in the comments on your post, see the documentation: parse has two modes, one of which is string splitting.
In Rebol 3, split will work:
split str " "
Here split is a function whose first argument is your string and whose second argument is the delimiter.

Particular string split in R

I'd like to split a text string in R, but with some extra requirements. For instance, if the string contains a dot . or a !, I want my function to return them as separate elements of the resulting list. Below is an example of what I want to get.
mytext="Caracas. Montevideo! Chicago."
split= "Caracas", "." ,"Montevideo", "!", "Chicago", "."
My current approach is to first use the built-in R function gsub to replace "." with " ." (and "!" with " !"), and then to apply strsplit.
mytext=gsub("\\."," .",mytext)
mytext=gsub("\\!"," !",mytext)
unlist(strsplit(mytext,split=' '))
So, my question is: is there another way of implementing this by configuring the parameters of the strsplit function, or another approach that you consider more efficient?
Any help or suggestion is appreciated.
Look-ahead is what you're looking for here:
strsplit(mytext, split = "(?=(\\.|!))", perl = TRUE)
#[[1]]
#[1] "Caracas" "." " Montevideo" "!" " Chicago" "."
eddi's solution doesn't split the whitespaces. Try this:
> regmatches(mytext, gregexpr(text=mytext, pattern="(?=[\\.\\!])|(?:\\s)", perl=T), invert=T)
[[1]]
[1] "Caracas" "." "Montevideo" "!" "Chicago" "."
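An alternative to splitting is to match the tokens directly: either a run of characters that are neither whitespace nor punctuation, or a single punctuation mark. The Python sketch below illustrates that idea for comparison (the name tokenize is hypothetical, and only . and ! are treated as punctuation, as in the question).

```python
import re

def tokenize(text):
    """Return words and the punctuation marks . and ! as separate tokens."""
    # First alternative: a run of characters that are not whitespace, '.' or '!'.
    # Second alternative: a single '.' or '!'.
    return re.findall(r"[^\s.!]+|[.!]", text)

mytext = "Caracas. Montevideo! Chicago."
tokens = tokenize(mytext)
print(tokens)
```

Because whitespace is never part of a match, no separate trimming step is needed.
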
