Why does split behave differently on different Strings?

Here are the two cases :
Case 1:
scala> "".split('f')
res3: Array[String] = Array("")
Case 2:
scala> "f".split('f')
res5: Array[String] = Array()
Why does it behave differently here? A concrete explanation would be great!

In the first case you provide a string and a separator that doesn't match any of the characters in that string, so split just returns an array containing the original string. This can be illustrated with a non-empty string:
scala> "abcd".split('f')
res2: Array[String] = Array(abcd)
However, your second string consists only of the separator. The separator matches and splits the string into two empty strings, one before the match and one after it. Both count as trailing empty strings and are removed, so the result is an empty array. According to the Java String docs:
If expression doesn't match:
If the expression does not match any part of the input then the resulting array has just one element, namely this string.
If expression matches:
Trailing empty strings are therefore not included in the resulting array.
Source: http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String,%20int)

If you look at the implementation of split, you will notice that it checks for the index of the delimiter inside the String, and if the delimiter doesn't occur in the given String, the result is the String itself.
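The two rules quoted from the Java docs can be checked side by side. This is only a small sketch, not the library's implementation; the object name `SplitCases` and its value names are made up for illustration. It uses the two-argument `split(regex, limit)` of `java.lang.String`, where a negative limit keeps trailing empty strings.

```scala
object SplitCases {
  // No separator in the input: the result is a single element, the input itself.
  val noMatch: List[String] = "abcd".split('f').toList
  val emptyInput: List[String] = "".split('f').toList

  // Input is only the separator: with a negative limit both empty pieces
  // around the match are kept ...
  val keptEmpties: List[String] = "f".split("f", -1).toList
  // ... while the default split drops trailing empty strings, so nothing is left.
  val dropped: List[String] = "f".split('f').toList

  def main(args: Array[String]): Unit = {
    println(s"noMatch has ${noMatch.size} element(s)")
    println(s"emptyInput has ${emptyInput.size} element(s)")
    println(s"keptEmpties has ${keptEmpties.size} element(s)")
    println(s"dropped has ${dropped.size} element(s)")
  }
}
```

Printing the sizes rather than the lists avoids the confusion that `List("")` and `List()` look the same when printed.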

Related

Idiomatic way to split string in Groovy

Is there a nicer/shorter/better way of performing the following:
filename = "AA_BB_CC_DD_EE_FF.xyz"
parts = filename.split("_")
packageName = "${parts[0]}_${parts[1]}_${parts[2]}_${parts[3]}"
//packageName == "AA_BB_CC_DD"
The format remains constant (6 parts, _ separator) but some of the values and lengths of AA,BB are variable.
You can do the same thing by writing the "joining" part differently.
The following gives the same result as packageName:
filename.split('_')[0..3].join('_')
It just uses a range to slice the array, and .join to concatenate with a delimiter.
As the separator char between the "segments" in the source filename and in the
result is the same (_), you don't need to split the filename and join the parts again.
Your task can be done with a single regex:
def result = filename.find(/([A-Z0-9]+_){3}[A-Z0-9]+/)

How to split a string in scala?

I have a below string which I want to parse in Scala.
word, {"..Json Structure..."}
In Python I can split the string by giving (", {") as an argument. However, Scala is not accepting the space as an argument.
Can you guys please help me with the query?
Scala's string split method takes a regular expression, and { is a special character in regular expressions, used for quantifying matched patterns. If you want to treat it as a literal, you need to escape it with a backslash, written \\{ in an ordinary string literal:
val s = """word, {"..Json Structure..."}"""
// s: String = word, {"..Json Structure..."}
s.split(", \\{")
// res32: Array[String] = Array(word, "..Json Structure..."})
Or:
s.split(""", \{""")
// res33: Array[String] = Array(word, "..Json Structure..."})
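If the separator is a fixed piece of text rather than a pattern, another option is to let `java.util.regex.Pattern.quote` do the escaping. The sketch below assumes the input always has the shape shown in the question; `SplitLiteral` and `splitOnce` are illustrative names, not part of any library. The limit of 2 is a design choice so that a later `, {` inside the JSON part is not split again.

```scala
import java.util.regex.Pattern

object SplitLiteral {
  // Split on the first literal occurrence of `sep`. Pattern.quote turns the
  // separator into a regex that matches it character for character.
  def splitOnce(input: String, sep: String): List[String] =
    input.split(Pattern.quote(sep), 2).toList

  def main(args: Array[String]): Unit = {
    val s = """word, {"..Json Structure..."}"""
    splitOnce(s, ", {").foreach(println)
  }
}
```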

split string by char

Scala has a standard way of splitting a string in StringOps.split.
Its behaviour somewhat surprised me, though.
To demonstrate, using the quick convenience function
def sp(str: String) = str.split('.').toList
the following expressions all evaluate to true
(sp("") == List("")) //expected
(sp(".") == List()) //I would have expected List("", "")
(sp("a.b") == List("a", "b")) //expected
(sp(".b") == List("", "b")) //expected
(sp("a.") == List("a")) //I would have expected List("a", "")
(sp("..") == List()) // I would have expected List("", "", "")
(sp(".a.") == List("", "a")) // I would have expected List("", "a", "")
So I expected that split would return an array with (the number of separator occurrences) + 1 elements, but that's clearly not the case.
It is almost the above with all trailing empty strings removed, but that's not true for splitting the empty string.
I'm failing to identify the pattern here. What rules does StringOps.split follow?
For bonus points, is there a good way (without too much copying/string appending) to get the split I'm expecting?
For the curious, you can find the code here: https://github.com/scala/scala/blob/v2.12.0-M1/src/library/scala/collection/immutable/StringLike.scala
See the split function that takes a character as its argument (line 206).
I think, the general pattern going on over here is, all the trailing empty splits results are getting ignored.
Except for the first one, for which "if no separator char is found then just send the whole string" logic is getting applied.
I am trying to find if there is any design documentation around these.
Also, if you use a String instead of a Char as the separator, it falls back to the Java regex split. As mentioned by @LRLucena, if you provide the limit parameter with a value larger than the number of pieces, you will get your trailing empty results. See http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String,%20int)
You can use split with a regular expression. I'm not sure, but I guess that the second parameter is the largest size of the resulting array.
def sp(str: String) = str.split("\\.", str.length+1).toList
Seems to be consistent with these three rules:
1) Trailing empty substrings are dropped.
2) An empty substring is considered trailing before it is considered leading, if applicable.
3) First case, with no separators is an exception.
split follows the behaviour of http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String)
That is split "around" the separator character, with the following exceptions:
Regardless of anything else, splitting the empty string will always give Array("")
Any trailing empty substrings are removed
Surrogate characters only match if the matched character is not part of a surrogate pair.
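For the bonus question, one way to get (number of separator occurrences) + 1 pieces is to pass a negative limit to the Java regex split, which the Java docs describe as keeping trailing empty strings. The following is a sketch under that assumption; `SplitAll` and `splitAll` are hypothetical names, and `Pattern.quote` is used so that characters such as `.` are treated literally.

```scala
import java.util.regex.Pattern

object SplitAll {
  // A negative limit tells String.split to apply the pattern as often as
  // possible and to keep trailing empty strings.
  def splitAll(str: String, sep: Char): List[String] =
    str.split(Pattern.quote(sep.toString), -1).toList

  def main(args: Array[String]): Unit = {
    for (input <- List("", ".", "a.b", ".b", "a.", "..", ".a."))
      println(s"'$input' -> ${splitAll(input, '.').size} piece(s)")
  }
}
```

Unlike `str.length + 1`, the negative limit does not depend on the input length, and the result is produced by a single call without extra string appending.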

What's the difference between raw string interpolation and triple quotes in scala

Scala has triple quoted strings """String\nString""" to use special characters in the string without escaping. Scala 2.10 also added raw"String\nString" for the same purpose.
Is there any difference in how raw"" and """""" work? Can they produce different output for the same string?
Looking at the source for the default interpolators (found here: https://github.com/scala/scala/blob/2.11.x/src/library/scala/StringContext.scala) it looks like the "raw" interpolator calls the identity function on each letter, so what you put in is what you get out. The biggest difference that you will find is that if you are providing a string literal in your source that includes the quote character, the raw interpolator still won't work. i.e. you can't say
raw"this whole "thing" should be one string object"
but you can say
"""this whole "thing" should be one string object"""
So you might be wondering "Why would I ever bother using the raw interpolator then?" and the answer is that the raw interpolator still performs variable substitution. So
val helloVar = "hello"
val helloWorldString = raw"""$helloVar, "World"!\n"""
Will give you the string "hello, "World"!\n" with the \n not being converted to a newline, and the quotes around the word world.
It is surprising that using the s-interpolator turns escapes back on, even when using triple quotes:
scala> "hi\nthere."
res5: String =
hi
there.
scala> """hi\nthere."""
res6: String = hi\nthere.
scala> s"""hi\nthere."""
res7: String =
hi
there.
The s-interpolator doesn't know that it's processing string parts that were originally triple-quoted. Hence:
scala> raw"""hi\nthere."""
res8: String = hi\nthere.
This matters when you're using backslashes in other ways, such as regexes:
scala> val n = """\d"""
n: String = \d
scala> s"$n".r
res9: scala.util.matching.Regex = \d
scala> s"\d".r
scala.StringContext$InvalidEscapeException: invalid escape character at index 0 in "\d"
at scala.StringContext$.loop$1(StringContext.scala:231)
at scala.StringContext$.replace$1(StringContext.scala:241)
at scala.StringContext$.treatEscapes0(StringContext.scala:245)
at scala.StringContext$.treatEscapes(StringContext.scala:190)
at scala.StringContext$$anonfun$s$1.apply(StringContext.scala:94)
at scala.StringContext$$anonfun$s$1.apply(StringContext.scala:94)
at scala.StringContext.standardInterpolator(StringContext.scala:124)
at scala.StringContext.s(StringContext.scala:94)
... 33 elided
scala> s"""\d""".r
scala.StringContext$InvalidEscapeException: invalid escape character at index 0 in "\d"
at scala.StringContext$.loop$1(StringContext.scala:231)
at scala.StringContext$.replace$1(StringContext.scala:241)
at scala.StringContext$.treatEscapes0(StringContext.scala:245)
at scala.StringContext$.treatEscapes(StringContext.scala:190)
at scala.StringContext$$anonfun$s$1.apply(StringContext.scala:94)
at scala.StringContext$$anonfun$s$1.apply(StringContext.scala:94)
at scala.StringContext.standardInterpolator(StringContext.scala:124)
at scala.StringContext.s(StringContext.scala:94)
... 33 elided
scala> raw"""\d$n""".r
res12: scala.util.matching.Regex = \d\d
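The three variants discussed above can be compared in one place. This is a minimal sketch; the object name `RawVsTriple` and its value names are invented for the example, and the same text is written once with triple quotes alone, once with `raw` and once with `s`.

```scala
object RawVsTriple {
  val name = "world"

  // Triple quotes alone: no escape processing and no interpolation,
  // so both \n and $name stay exactly as typed.
  val tripleOnly: String = """hi\n$name"""
  // raw + triple quotes: $name is substituted, \n stays as backslash + n.
  val rawTriple: String = raw"""hi\n$name"""
  // s + triple quotes: $name is substituted and \n becomes a newline.
  val sTriple: String = s"""hi\n$name"""

  def main(args: Array[String]): Unit = {
    println(tripleOnly)
    println(rawTriple)
    println(sTriple)
  }
}
```

So the two forms can produce different output for the same source text as soon as a `$` is involved: triple quotes keep it literal, while `raw` treats it as the start of a substitution.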

How to extract substring in Groovy?

I have a Groovy method that currently works but is real ugly/hacky looking:
def parseId(String str) {
    System.out.println("str: " + str)
    int index = str.indexOf("repositoryId")
    System.out.println("index: " + index)
    // skip past "repositoryId=" (12 characters plus the '=')
    int repoIndex = index + 13
    System.out.println("repoIndex: " + repoIndex)
    String repoId = str.substring(repoIndex)
    System.out.println("repoId is: " + repoId)
}
When this runs, you might get output like:
str: wsodk3oke30d30kdl4kof94j93jr94f3kd03k043k?planKey=si23j383&repositoryId=31850514
index: 59
repoIndex: 72
repoId is: 31850514
As you can see, I'm simply interested in obtaining the repositoryId value (everything after the = operator) out of the String. Is there a more efficient/Groovier way of doing this, or is this the only way?
There are a lot of ways to achieve what you want. I'll suggest a simple one using split:
sub = { it.split("repositoryId=")[1] }
str='wsodk3oke30d30kdl4kof94j93jr94f3kd03k043k?planKey=si23j383&repositoryId=31850514'
assert sub(str) == '31850514'
Using a regular expression you could do
def repositoryId = (str =~ "repositoryId=(.*)")[0][1]
The =~ operator creates a regex matcher.
Or use a shortcut regexp, if you are looking only for a single match:
String repoId = str.replaceFirst( /.*&repositoryId=(\w+).*/, '$1' )
All the answers here contain regular expressions; however, there are a bunch of string methods in Groovy.

| String Function | Sample | Description |
| --- | --- | --- |
| contains | myStringVar.contains(substring) | Returns true if and only if this string contains the specified sequence of char values |
| equals | myStringVar.equals(substring) | Similar to the above, but it has to be an exact match for the check to return true |
| endsWith | myStringVar.endsWith(suffix) | Checks whether the string ends with the given suffix |
| startsWith | myStringVar.startsWith(prefix) | Checks whether the string starts with the given prefix |
| equalsIgnoreCase | myStringVar.equalsIgnoreCase(substring) | The same as equals but without case sensitivity |
| isEmpty | myStringVar.isEmpty() | Checks whether myStringVar is empty (length 0) |
| matches | myStringVar.matches(regex) | Like equals, except that the argument is a regular expression that the whole string must match |
| replace | myStringVar.replace(old,new) | Returns a string resulting from replacing all occurrences of old in this string with new |
| replaceAll | myStringVar.replaceAll(old_regex,new) | Replaces each substring of this string that matches the given regular expression with the given replacement |
| split | myStringVar.split(regex) | Splits this string around matches of the given regular expression |