How to split a string in Scala?

I have the string below, which I want to parse in Scala.
word, {"..Json Structure..."}
In Python I can split the string by passing (", {") as the argument. However, Scala does not seem to accept a delimiter that contains a space.
Can you please help me with this?

Scala's String.split method takes a regular expression, and { is a special character in regular expressions, where it introduces a quantifier such as {2,3}. To treat it as a literal, escape it with a backslash, which is written \\{ inside an ordinary string literal:
val s = """word, {"..Json Structure..."}"""
// s: String = word, {"..Json Structure..."}
s.split(", \\{")
// res32: Array[String] = Array(word, "..Json Structure..."})
Or, with a triple-quoted string, where the backslash does not need to be doubled:
s.split(""", \{""")
// res33: Array[String] = Array(word, "..Json Structure..."})
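If the delimiter should always be taken literally, one option is to let java.util.regex.Pattern.quote do the escaping instead of escaping by hand. The following is a minimal sketch under that assumption; LiteralSplitDemo and splitLiteral are hypothetical names, and the limit of 2 is an assumption made here so that a later ", {" inside the JSON part is not split again.

```scala
import java.util.regex.Pattern

object LiteralSplitDemo {
  // Hypothetical helper: treats `delimiter` as plain text, not as a regex.
  // `limit` is passed straight to java.lang.String.split(regex, limit).
  def splitLiteral(s: String, delimiter: String, limit: Int = 0): Array[String] =
    s.split(Pattern.quote(delimiter), limit)

  def main(args: Array[String]): Unit = {
    val s = """word, {"..Json Structure..."}"""
    // At most two parts: the leading word and the rest of the JSON text.
    val parts = splitLiteral(s, ", {", 2)
    parts.foreach(println)
  }
}
```

Note that the opening brace is consumed as part of the delimiter, so it has to be added back before the second part is handed to a JSON parser.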

Related

Kotlin String.split, ignore when delimiter is inside a quote

I have a string:
Hi there, "Bananas are, by nature, evil.", Hey there.
I want to split the string with commas as the delimiter. How do I get the .split method to ignore the commas inside the quotes, so that it returns 3 strings and not 5?
You can pass a regex to the split method.
According to this answer, the following regex matches only commas that are outside of quotation marks:
,(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)
so try this code:
str.split(",(?=(?:[^\\\"]*\\\"[^\\\"]*\\\")*[^\\\"]*\$)".toRegex())
You can use the split overload that accepts a regular expression for that:
val text = """Hi there, "Bananas are, by nature, evil.", Hey there."""
val matchCommaNotInQuotes = Regex("""\,(?=([^"]*"[^"]*")*[^"]*$)""")
println(text.split(matchCommaNotInQuotes))
Would print:
[Hi there, "Bananas are, by nature, evil.", Hey there.]
Consider reading this answer on how the regular expression works in this case.
You have to use a regular expression capable of handling quoted values. See Java: splitting a comma-separated string but ignoring commas in quotes and C#, regular expressions : how to parse comma-separated values, where some values might be quoted strings themselves containing commas
The following code shows a very simple version of such a regular expression.
fun main(args: Array<String>) {
    "Hi there, \"Bananas are, by nature, evil.\", Hey there."
        .split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)".toRegex())
        .forEach { println("> $it") }
}
outputs
> Hi there
> "Bananas are, by nature, evil."
> Hey there.
Be aware of the regex backtracking problem: https://www.regular-expressions.info/catastrophic.html. You might be better off writing a parser.
If you don't want regular expressions:
val s = "Hi there, \"Bananas are, by nature, evil.\", Hey there."
val hold = s.substringAfter("\"").substringBefore("\"")
val temp = s.split("\"")
val splitted: MutableList<String> = (temp[0] + "\"" + temp[2]).split(",").toMutableList()
splitted[1] = "\"" + hold + "\""
splitted is now the list you want. Note that this only works when the string contains exactly one quoted section.
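The suggestion above to write a parser instead of relying on a regex can be sketched as a single pass over the characters. The example is in Scala rather than Kotlin, but the same loop carries over directly. QuoteAwareSplit and splitOutsideQuotes are hypothetical names, and the sketch assumes quotes are never escaped or nested.

```scala
object QuoteAwareSplit {
  // Hypothetical helper: splits on `delimiter`, except where it appears
  // between a pair of double quotes. Quote characters are kept in the output.
  def splitOutsideQuotes(input: String, delimiter: Char = ','): List[String] = {
    val fields = List.newBuilder[String]
    val current = new StringBuilder
    var inQuotes = false
    for (c <- input) {
      if (c == '"') {
        inQuotes = !inQuotes        // toggle on every quote character
        current.append(c)
      } else if (c == delimiter && !inQuotes) {
        fields += current.toString  // delimiter outside quotes ends the field
        current.clear()
      } else {
        current.append(c)
      }
    }
    fields += current.toString      // last field has no trailing delimiter
    fields.result()
  }

  def main(args: Array[String]): Unit = {
    val text = "Hi there, \"Bananas are, by nature, evil.\", Hey there."
    splitOutsideQuotes(text).map(_.trim).foreach(s => println("> " + s))
  }
}
```

Because it reads each character once, this approach has no backtracking to worry about.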

Could anyone explain this spark expression for me?

I'm new to Spark. There's one line of code estimating pi, but I don't quite understand how it works.
scala> val pi_approx = f"pi = ${355f/113}%.5f"
pi_approx: String = pi = 3.14159
I don't understand the 'f' '$' and '%' in the expression above. Could anyone explain the usage of them? Thanks!
This is an example of string interpolation, which allows users to embed variable references directly in processed string literals. For example:
scala> val name = "Scala"
name: String = Scala
scala> println(s"Hello, $name")
Hello, Scala
In the above example, the literal s"Hello, $name" is a processed string literal.
Scala provides three string interpolation methods out of the box: s, f and raw.
Prepending f to any string literal allows the creation of simple formatted strings, similar to printf in other languages.
The format specifier after the % character, here %.5f, says that the value should be formatted as a decimal number with five digits after the decimal point, while ${} allows an arbitrary expression to be embedded. For example:
scala> println(s"1 + 1 = ${1 + 1}")
1 + 1 = 2
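To tie this back to the line in the question, here is a small sketch of the f interpolator next to an explicit format call. FInterpolatorDemo and its member names are hypothetical, chosen only for illustration.

```scala
object FInterpolatorDemo {
  val ratio: Float = 355f / 113

  // f checks each format specifier against the argument type at compile time,
  // then formats the value in the style of java.util.Formatter.
  val piApprox: String = f"pi = $ratio%.5f"

  // The same text built with an explicit format call.
  val viaFormat: String = "pi = %.5f".format(ratio)

  // %3d right-aligns an Int in a field of width 3.
  val padded: String = f"[${7}%3d]"

  def main(args: Array[String]): Unit = {
    // pi = 3.14159 in a locale that uses a dot as the decimal separator
    println(piApprox)
    println(padded)
  }
}
```

So in the original expression, f selects the interpolator, ${...} marks the embedded expression, and %.5f is the format applied to its value.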
More detailed information can be found on:
Scala String Interpolation
Java Formatter

Why Split behaves differently on different Strings?

Here are the two cases :
Case 1:
scala> "".split('f')
res3: Array[String] = Array("")
Case 2:
scala> "f".split('f')
res5: Array[String] = Array()
Why does it behave differently here? A concrete explanation would be great!
In the first case you provide a string and a separator that doesn't match any of the characters in that string, so split just returns the original string. This can be illustrated with a non-empty string:
scala> "abcd".split('f')
res2: Array[String] = Array(abcd)
However, your second string consists only of the separator. The separator matches, which leaves an empty string on each side of the match. Since trailing empty strings are dropped, the result is an empty array. According to the Java String docs:
If expression doesn't match:
If the expression does not match any part of the input then the resulting array has just one element, namely this string.
If expression matches:
Trailing empty strings are therefore not included in the resulting array.
Source: http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String,%20int)
If you look at the implementation of split, you will notice that it checks for the index of the delimiter inside the string, and if the delimiter does not occur in the given string, the result is the string itself.
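The two rules quoted above can be checked side by side. This sketch uses the two-argument java.lang.String.split(regex, limit), where a negative limit keeps trailing empty strings; SplitLimitDemo and its member names are hypothetical.

```scala
object SplitLimitDemo {
  // No match: the result is a one-element array holding the original string.
  val noMatch: List[String] = "".split("f").toList

  // Match, default limit of 0: trailing empty strings are dropped.
  val dropped: List[String] = "f".split("f").toList

  // Match, negative limit: trailing empty strings are kept.
  val kept: List[String] = "f".split("f", -1).toList

  // The same effect on a longer input.
  val csvDefault: List[String] = "a,b,,".split(",").toList
  val csvKept: List[String] = "a,b,,".split(",", -1).toList

  def main(args: Array[String]): Unit = {
    println(noMatch.length)    // 1
    println(dropped.length)    // 0
    println(kept.length)       // 2
    println(csvDefault.length) // 2
    println(csvKept.length)    // 4
  }
}
```

Passing -1 is the usual way to get a result whose length does not depend on whether the last fields happen to be empty.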

How to wrap a raw string literal without inserting newlines into the raw string?

I have a raw string literal which is very long. Is it possible to split this across multiple lines without adding newline characters to the string?
file.write(r#"This is an example of a line which is well over 100 characters in length. Id like to know if its possible to wrap it! Now some characters to justify using a raw string \foo\bar\baz :)"#)
In Python and C for example, you can simply write this as multiple string literals.
# "some string"
(r"some "
r"string")
Is it possible to do something similar in Rust?
While raw string literals don't support this, it can be achieved using the concat! macro:
let a = concat!(
    r#"some very "#,
    r#"long string "#,
    r#"split over lines"#);
let b = r#"some very long string split over lines"#;
assert_eq!(a, b);
It is possible with indoc.
The indoc!() macro takes a multiline string literal and un-indents it at compile time so the leftmost non-space character is in the first column.
let testing = indoc! {"
    def hello():
        print('Hello, world!')

    hello()
"};
let expected = "def hello():\n    print('Hello, world!')\n\nhello()\n";
assert_eq!(testing, expected);
PS: I really think we could use an AI that recommends good crates to Rust users.

What's the difference between raw string interpolation and triple quotes in Scala?

Scala has triple quoted strings """String\nString""" to use special characters in the string without escaping. Scala 2.10 also added raw"String\nString" for the same purpose.
Is there any difference in how raw"" and """""" work? Can they produce different output for the same string?
Looking at the source for the default interpolators (found here: https://github.com/scala/scala/blob/2.11.x/src/library/scala/StringContext.scala) it looks like the "raw" interpolator calls the identity function on each letter, so what you put in is what you get out. The biggest difference that you will find is that if you are providing a string literal in your source that includes the quote character, the raw interpolator still won't work. i.e. you can't say
raw"this whole "thing" should be one string object"
but you can say
"""this whole "thing" should be one string object"""
So you might be wondering "Why would I ever bother using the raw interpolator then?" and the answer is that the raw interpolator still performs variable substitution. So
val helloVar = "hello"
val helloWorldString = raw"""$helloVar, "World"!\n"""
This gives you the string hello, "World"!\n, with the \n not converted to a newline and the quotes around the word World preserved.
It is surprising that using the s-interpolator turns escapes back on, even when using triple quotes:
scala> "hi\nthere."
res5: String =
hi
there.
scala> """hi\nthere."""
res6: String = hi\nthere.
scala> s"""hi\nthere."""
res7: String =
hi
there.
The s-interpolator doesn't know that it's processing string parts that were originally triple-quoted. Hence:
scala> raw"""hi\nthere."""
res8: String = hi\nthere.
This matters when you're using backslashes in other ways, such as regexes:
scala> val n = """\d"""
n: String = \d
scala> s"$n".r
res9: scala.util.matching.Regex = \d
scala> s"\d".r
scala.StringContext$InvalidEscapeException: invalid escape character at index 0 in "\d"
at scala.StringContext$.loop$1(StringContext.scala:231)
at scala.StringContext$.replace$1(StringContext.scala:241)
at scala.StringContext$.treatEscapes0(StringContext.scala:245)
at scala.StringContext$.treatEscapes(StringContext.scala:190)
at scala.StringContext$$anonfun$s$1.apply(StringContext.scala:94)
at scala.StringContext$$anonfun$s$1.apply(StringContext.scala:94)
at scala.StringContext.standardInterpolator(StringContext.scala:124)
at scala.StringContext.s(StringContext.scala:94)
... 33 elided
scala> s"""\d""".r
scala.StringContext$InvalidEscapeException: invalid escape character at index 0 in "\d"
at scala.StringContext$.loop$1(StringContext.scala:231)
at scala.StringContext$.replace$1(StringContext.scala:241)
at scala.StringContext$.treatEscapes0(StringContext.scala:245)
at scala.StringContext$.treatEscapes(StringContext.scala:190)
at scala.StringContext$$anonfun$s$1.apply(StringContext.scala:94)
at scala.StringContext$$anonfun$s$1.apply(StringContext.scala:94)
at scala.StringContext.standardInterpolator(StringContext.scala:124)
at scala.StringContext.s(StringContext.scala:94)
... 33 elided
scala> raw"""\d$n""".r
res12: scala.util.matching.Regex = \d\d
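The behaviours described in the answers above can be put side by side in one small sketch. RawVsTripleDemo and its member names are hypothetical; the three literals contain the same source text and differ only in their prefix.

```scala
object RawVsTripleDemo {
  val name = "World"

  // Plain triple quotes: no interpolation and no escape processing.
  val plain: String = """Hello, $name\n"""

  // s interpolator: $name is substituted and \n becomes a real newline.
  val viaS: String = s"""Hello, $name\n"""

  // raw interpolator: $name is substituted, \n stays as backslash followed by n.
  val viaRaw: String = raw"""Hello, $name\n"""

  // raw keeps regex backslashes intact while still allowing substitution.
  val digit = """\d"""
  val twoDigits = raw"""\d$digit""".r

  def main(args: Array[String]): Unit = {
    println(plain)
    println(viaS)
    println(viaRaw)
    println(twoDigits.findFirstIn("abc 42"))
  }
}
```

In short, triple quotes alone give neither substitution nor escapes, s gives both, and raw gives substitution only.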

Resources