Scala - Remove all elements in a list/map of strings from a single String

I'm working on an internal website where the URL contains the source reference from other systems. This is a business requirement and cannot be changed.
e.g. "http://localhost:9000/source.address.com/7808/project/repo"
"http://localhost:9000/build.address.com/17808/project/repo"
I need to strip these source strings so that only the "project/repo" part remains, and I want to do it in a trait so it can be reused by multiple services. I also want to be able to add more sources to this (already existing) list without modifying the method.
"def normalizePath" is the method the services call. I have two non-ideal but reasonable attempts so far. I'm getting stuck on using foldLeft, which I would like some help with, or on a simpler way of doing this. Code samples below.
1st attempt using an if-else (not ideal, as I would need to add more if/else statements down the line, and it is less readable than a pattern match):
trait NormalizePath {
def normalizePath(path: String): String = {
if (path.startsWith("build.address.com/17808")) {
path.substring("build.address.com/17808".length, path.length)
} else {
path
}
}
}
And the 2nd attempt (not ideal, as more patterns will likely be added, and it generates more bytecode than if/else):
trait NormalizePath {
val pattern = "build.address.com/17808/"
val pattern2 = "source.address.com/7808/"
def normalizePath(path: String) = path match {
case s if s.startsWith(pattern) => s.substring(pattern.length, s.length)
case s if s.startsWith(pattern2) => s.substring(pattern2.length, s.length)
case _ => path
}
}
The last attempt uses an address list (it already exists elsewhere but is defined here for a minimal working example) to remove occurrences from the path string, and it doesn't work:
trait NormalizePath {
val replacements = (
"build.address.com/17808",
"source.address.com/7808/")
private def remove(path: String, string: String) = {
path-string
}
def normalizePath(path: String): String = {
replacements.foldLeft(path)(remove)
}
}
Appreciate any help on this!

If you are just stripping out those strings:
val replacements = Seq(
"build.address.com/17808",
"source.address.com/7808/")
replacements.foldLeft("http://localhost:9000/source.address.com/7808/project/repo"){
// replace matches literally; replaceAll would treat each "." as a regex wildcard
case(path, toReplace) => path.replace(toReplace, "")
}
// http://localhost:9000/project/repo
If you are replacing those string by something else:
val replacementsMap = Seq(
"build.address.com/17808" -> "one",
"source.address.com/7808/" -> "two/")
replacementsMap.foldLeft("http://localhost:9000/source.address.com/7808/project/repo"){
// literal replacement, so dots in the source strings are not regex wildcards
case(path, (toReplace, replacement)) => path.replace(toReplace, replacement)
}
// http://localhost:9000/two/project/repo
The replacements collection can come from elsewhere in the code, so normalizePath itself will not need to change (or be redeployed) when sources are added.
// method replacing by empty string
def normalizePath(path: String) = {
replacements.foldLeft(path){
// String.replace removes every literal occurrence of toReplace
case(startingPoint, toReplace) => startingPoint.replace(toReplace, "")
}
}
normalizePath("foobar/build.address.com/17808/project/repo")
// foobar//project/repo
normalizePath("whateverPath")
// whateverPath
normalizePath("build.address.com/17808build.address.com/17808/project/repo")
// /project/repo
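Putting the foldLeft approach into the trait the question asks for could look like the sketch below. It assumes each service (or a shared config object) provides the existing address list through an abstract member; the names `replacements` and `BuildService` are hypothetical. The entries include the trailing slash so the result does not start with "/".

```scala
// Sketch: the trait only declares which source strings to strip; the list is
// supplied by whoever mixes it in, so adding a source never touches normalizePath.
trait NormalizePath {
  // Hypothetical abstract member: wire the existing address list in here.
  def replacements: Seq[String]

  // Remove every literal occurrence of every configured source string.
  def normalizePath(path: String): String =
    replacements.foldLeft(path)((acc, toRemove) => acc.replace(toRemove, ""))
}

// Hypothetical service mixing in the trait.
object BuildService extends NormalizePath {
  val replacements: Seq[String] = Seq(
    "build.address.com/17808/",
    "source.address.com/7808/"
  )
}
```

If only a leading prefix should ever be removed, swapping `acc.replace(toRemove, "")` for `acc.stripPrefix(toRemove)` restricts the change to the start of the string.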

A very simple replacement could be made as follows:
val replacements = Seq(
"build.address.com/17808",
"source.address.com/7808/")
def normalizePath(path: String): String = {
replacements.find(path.startsWith(_)) // find the first occurrence
.map(prefix => path.substring(prefix.length)) // remove the prefix
.getOrElse(path) // if not found, return the original string
}
Since the expected replacements are very similar, have you tried to generalize them and use regex matching?
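As a sketch of that regex idea, assuming every source prefix has the shape `<name>.address.com/<digits>/` seen in the question's examples (the real host names may differ, so the pattern is an assumption to adjust), a single anchored pattern can replace the whole list:

```scala
import scala.util.matching.Regex

object RegexNormalizer {
  // Assumed prefix shape: "<word chars or dashes>.address.com/<digits>/" at the very start.
  private val SourcePrefix: Regex = """^[\w-]+\.address\.com/\d+/""".r

  // Remove the leading source prefix if present; other paths are returned unchanged.
  def normalizePath(path: String): String =
    SourcePrefix.replaceFirstIn(path, "")
}
```

The trade-off is that new sources need no list entry as long as they follow the pattern, but any source that does not fit it is silently left in place.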

There are a million and one ways to extract /project/repo from a String in Scala. Here are a few I came up with:
val list = List("build.address.com/17808", "source.address.com/7808") //etc
def normalizePath(path: String) = {
path.stripPrefix(list.find(x => path.contains(x)).getOrElse(""))
}
Output:
scala> normalizePath("build.address.com/17808/project/repo")
res0: String = /project/repo
val list = List("build.address.com/17808", "source.address.com/7808") //etc
def normalizePath(path: String) = {
list.map(x => if (path.contains(x)) {
path.takeRight(path.length - x.length)
}).filter(y => y != ()).head
}
Output:
scala> normalizePath("build.address.com/17808/project/repo")
res0: Any = /project/repo
val list = List("build.address.com/17808", "source.address.com/7808") //etc
def normalizePath(path: String) = {
list.foldLeft(path)((a, b) => a.replace(b, ""))
}
Output:
scala> normalizePath("build.address.com/17808/project/repo")
res0: String = /project/repo
It depends on how complicated you want your code to look (or how silly you want to be), really. Note that the second example has return type Any, which might not be ideal for your scenario, and that its .head throws a NoSuchElementException when no entry matches. Also, these examples aren't meant to take the String out of the middle of your path... they can be fairly easily modified if you want to do that, though. Let me know if you want me to add some examples that just strip things like build.address.com/17808 out of a String; I'd be happy to do so.
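A variant of the second example that keeps the return type String and does not throw when nothing matches could use collectFirst, sketched below. It checks startsWith rather than contains, because dropping prefix.length characters only makes sense when the entry really is a prefix; the object name is hypothetical.

```scala
object CollectFirstNormalizer {
  val list = List("build.address.com/17808", "source.address.com/7808") // etc.

  // collectFirst returns None when no entry is a prefix, so getOrElse falls back
  // to the original path instead of failing like .head on an empty list.
  def normalizePath(path: String): String =
    list.collectFirst { case prefix if path.startsWith(prefix) => path.drop(prefix.length) }
      .getOrElse(path)
}
```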

Related

Is there a shorter replacement for Kotlin's deprecated String.capitalize() function?

Kotlin deprecated the capitalize function on String class, and their suggested replacement is obnoxiously long. This is an example of a situation where they made the right call on deprecating it, but the wrong call on the user experience.
For example, this code:
val x = listOf("foo", "bar", "baz").map { it.capitalize() }
is "cleaned up" by the IDE to become:
val x = listOf("foo", "bar", "baz").map { it.replaceFirstChar {
if (it.isLowerCase()) it.titlecase(
Locale.getDefault()
) else it.toString()
} }
This is preeeeetty ugly. What can we do about it?
The suggested replacement is ugly because it needs to be equivalent to what capitalize() used to do:
1. dependent on the default locale
2. NOT converting an uppercase first char into titlecase (e.g. capitalize does NOT transform a leading 'DŽ' into 'Dž'; both are single characters here, try to select them)
If you don't care too much about this behaviour, you can use a simpler expression that uses an invariant locale and unconditionally titlecases the first character, even if it is uppercase:
val x = listOf("foo", "bar", "baz").map { it.replaceFirstChar(Char::titlecase) }
This means that if the first character is uppercase like 'DŽ', it will be transformed into the titlecase variant 'Dž' anyway, while the original code wouldn't touch it. This might actually be desirable.
One of the reasons capitalize() has been deprecated is because the behaviour of the method was unclear. For instance:
behaviour #2 is pretty weird
not capitalizing words in a sentence might be unexpected (C# would titlecase every space-separated word)
not lowercasing other characters of the words might be unexpected as well
If you want to keep the exact current behaviour on purpose, but make it more convenient to use, you can always roll your own extension function with a name that suits you ("capitalize(d)" might not give enough info to the unaware reader):
fun String.titlecaseFirstCharIfItIsLowercase() = replaceFirstChar {
if (it.isLowerCase()) it.titlecase(Locale.getDefault()) else it.toString()
}
Or for the version with invariant locale that titlecases the uppercase chars:
fun String.titlecaseFirstChar() = replaceFirstChar(Char::titlecase)
A neat solution is to define a new extension function on String, which hides the gory details with a cleaner name:
/**
* Replacement for Kotlin's deprecated `capitalize()` function.
*/
fun String.capitalized(): String {
return this.replaceFirstChar {
if (it.isLowerCase())
it.titlecase(Locale.getDefault())
else it.toString()
}
}
Now your old code can look like this:
val x = listOf("foo", "bar", "baz").map { it.capitalized() }
You'll need to define the extension function at the top level in some package that you can import easily. For example, if you have a Kotlin file called KotlinUtils.kt in the package com.example.utils, and you put the definition inside it like so:
package com.example.utils
fun String.capitalized(): String {...}
Then you can import it in your other packages with:
import com.example.utils.capitalized
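If you also want the rest of each word lowercased, you can lowercase the whole string first and then uppercase its first character:
val fruits = listOf("baNana", "avocAdo", "apPle", "kiwifRuit")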
fruits
.filter { it.startsWith("a") }
.sortedBy { it }
.map { it.lowercase().replaceFirstChar(Char::uppercase) }
.forEach { println(it) }
Output:
Apple
Avocado
You can call the replaceFirstChar function on the original string and pass the transform function as input. The transform function takes the first character and converts it to an uppercase character using the uppercase() function.
val list = listOf("foo", "bar", "baz").map {
it.replaceFirstChar { firstChar ->
firstChar.uppercase()
}
}
println("List - > $list")
Output
List - > [Foo, Bar, Baz]
How about this?
fun main() {
val x = listOf("foo", "bar", "baz").map { it[0].uppercase() + it.drop(1) }
println(x)
}
Output:
[Foo, Bar, Baz]
If you are not sure whether the first letter is upper or lower case (maybe you receive the Strings from an API), you can use the method below:
val title = "myTitle"
val newTitle = title.replaceFirstChar {
if (it.isLowerCase()) it.titlecase(Locale.getDefault()) else
it.toString()
}
newTitle will be "MyTitle". replaceFirstChar returns a new String rather than modifying title, so the result has to be assigned.
You can use this extension function to capitalize the first character of a String:
fun String.capitalize(): String {
return this.replaceFirstChar {
if (it.isLowerCase()) it.titlecase(Locale.getDefault())
else it.toString()
}
}
And call this method like
"abcd".capitalize()
I was trying to capitalize a string that came from an API, found this in the Kotlin docs, and it worked:
println("kotlin".replaceFirstChar { it.uppercase() }) // Kotlin
I use it like this in my code:
binding.textDescriptions.text = "${it.Year} - ${it.Type.replaceFirstChar {it.uppercase()}}"

BSON to Play JSON support for Long values

I've started using the play-json/play-json-compat libraries with reactivemongo 0.20.11.
So I can use JSON Play reads/writes while importing the 'reactivemongo.play.json._' package and then easily fetch data from a JSONCollection instead of a BSONCollection.
For most cases, this works great but for Long fields, it doesn't :(
For example:
case class TestClass(name: String, age: Long)
object TestClass {
implicit val reads = Json.reads[TestClass]
}
If I try querying using the following func:
def getData: Map[String, TestClass] = {
val res = collection.find(emptyDoc)
.cursor[TestClass]()
.collect[List](-1, Cursor.ContOnError[List[TestClass]] { case (_, t) =>
failureLogger.error(s"Failed deserializing TestClass from Mongo", t)
})
.map { items =>
items.map(item => item.name -> item).toMap
}
Await.result(res, 10 seconds)
}
Then I get the following error:
play.api.libs.json.JsResultException: JsResultException(errors:List((/age,List(ValidationError(List(error.expected.jsnumber),WrappedArray())))))
I've debugged the reading of the document and noticed that when the BSON is first converted to a JsObject, the long field looks like this:
"age": {"$long": 1526389200000}
I found a way to make this work but I really don't like it:
case class MyBSONLong(`$long`: Long)
object MyBSONLong {
implicit val longReads = Json.reads[MyBSONLong]
}
case class TestClass(name: String, age: Long)
object TestClass {
implicit val reads = (
(__ \ "name").read[String] and
(__ \ "age").read[MyBSONLong].map(_.`$long`)
) (apply _)
}
So this works, but it's a very ugly solution.
Is there a better way to do this?
Thanks in advance :)

Convert RDD[Array[Row]] to RDD[Row]

How to convert RDD[Array[Row]] to RDD[Row]?
Details:
I have a use case where my parsing function returns an Array[Row] for some data and a single Row for other data. How can I convert both of these to an RDD[Row] for further use?
CODE SAMPLE
private def getRows(rdd: RDD[String], parser: Parser): RDD[Row] = {
val processedLines = rdd.map(line => parser.processBeacon(line))
val rddOfRowsList = processedLines.map { x =>
x match {
case Right(obj) => obj.map { p =>
MyRow.getValue(p)
}//I can use flatmap here
case Left(obj) =>
MyRow.getValue(obj)
}//Can't use flatMap here
}
// Here I have to convert rddOfRowsList to RDD[Row]
//?????
val rowsRdd =?????
//
rowsRdd
}
def processLine(logMap: Map[String, String]):Either[Map[String, Object], Array[Map[String, Object]]] =
{
//process
}
Use flatMap:
rdd.flatMap(identity)
You can use flatMap to get a new RDD, and then use union to combine them.
Use flatMap to flatten the contents of the RDD.
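To make the flatMap suggestion concrete for the Either in the question, here is a sketch on a plain Scala Seq so it runs without Spark; FakeRow is a hypothetical stand-in for org.apache.spark.sql.Row. The point is that both branches must return a collection: the single row is wrapped in a Seq, and the array is already one. RDD.flatMap takes the same kind of function, so the same pattern match can be passed to it.

```scala
// Hypothetical stand-in for org.apache.spark.sql.Row so the sketch needs no Spark.
final case class FakeRow(value: String)

object FlattenEither {
  // Left carries one row, Right carries several, mirroring the question's Either.
  def flatten(parsed: Seq[Either[FakeRow, Array[FakeRow]]]): Seq[FakeRow] =
    parsed.flatMap {
      case Left(row)   => Seq(row)   // wrap the single row so both branches yield a collection
      case Right(rows) => rows.toSeq // several rows are already a collection
    }
}
```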

How can I retrieve the alias for a DataFrame in Spark

I'm using Spark 2.0.2. I have a DataFrame that has an alias on it, and I'd like to be able to retrieve that. A simplified example of why I'd want that is below.
def check(ds: DataFrame) = {
assert(ds.count > 0, s"${ds.getAlias} has zero rows!")
}
The above code of course fails because DataFrame has no getAlias function. Is there a way to do this?
You can try something like this, but I wouldn't go so far as to claim it is supported:
Spark < 2.1:
import org.apache.spark.sql.catalyst.plans.logical.SubqueryAlias
import org.apache.spark.sql.Dataset
def getAlias(ds: Dataset[_]) = ds.queryExecution.analyzed match {
case SubqueryAlias(alias, _) => Some(alias)
case _ => None
}
Spark 2.1+:
def getAlias(ds: Dataset[_]) = ds.queryExecution.analyzed match {
case SubqueryAlias(alias, _, _) => Some(alias)
case _ => None
}
Example usage:
val plain = Seq((1, "foo")).toDF
getAlias(plain)
Option[String] = None
val aliased = plain.alias("a dataset")
getAlias(aliased)
Option[String] = Some(a dataset)
Disclaimer: as stated above, this code relies on undocumented APIs subject to change. It works as of Spark 2.3.
After much digging into mostly undocumented Spark methods, here is the full code to pull the list of fields, along with the table alias for a dataframe in PySpark:
def schema_from_plan(df):
    plan = df._jdf.queryExecution().analyzed()
    all_fields = _schema_from_plan(plan)
    iterator = plan.output().iterator()
    output_fields = {}
    while iterator.hasNext():
        field = iterator.next()
        queryfield = all_fields.get(field.exprId().id(), {})
        if queryfield:
            tablealias = queryfield["tablealias"]
        else:
            tablealias = ""
        output_fields[field.exprId().id()] = {
            "tablealias": tablealias,
            "dataType": field.dataType().typeName(),
            "name": field.name()
        }
    return list(output_fields.values())

def _schema_from_plan(root, tablealias=None, fields=None):
    # use a fresh dict per top-level call; a mutable default would be shared between calls
    if fields is None:
        fields = {}
    iterator = root.children().iterator()
    while iterator.hasNext():
        node = iterator.next()
        nodeClass = node.getClass().getSimpleName()
        if nodeClass == "SubqueryAlias":
            # get the alias and process the subnodes with this alias
            _schema_from_plan(node, node.alias(), fields)
        else:
            if tablealias:
                # add all the fields, along with the unique IDs, and a new tablealias field
                # (separate iterator name so the loop over children is not overwritten)
                field_iterator = node.output().iterator()
                while field_iterator.hasNext():
                    field = field_iterator.next()
                    fields[field.exprId().id()] = {
                        "tablealias": tablealias,
                        "dataType": field.dataType().typeName(),
                        "name": field.name()
                    }
            _schema_from_plan(node, tablealias, fields)
    return fields

# example: fields = schema_from_plan(df)
For Java:
As @veinhorn mentioned, it is also possible to get the alias in Java. Here is a utility method example:
public static <T> Optional<String> getAlias(Dataset<T> dataset){
final LogicalPlan analyzed = dataset.queryExecution().analyzed();
if(analyzed instanceof SubqueryAlias) {
SubqueryAlias subqueryAlias = (SubqueryAlias) analyzed;
return Optional.of(subqueryAlias.alias());
}
return Optional.empty();
}

How to use stringByAddingPercentEncodingWithAllowedCharacters() for a URL in Swift 2.0

I was using this, in Swift 1.2
let urlwithPercentEscapes = myurlstring.stringByAddingPercentEscapesUsingEncoding(NSUTF8StringEncoding)
This now gives me a warning asking me to use
stringByAddingPercentEncodingWithAllowedCharacters
I need to pass an NSCharacterSet as an argument, but there are so many and I cannot determine which one will give me the same outcome as the previously used method.
An example URL I want to use will be like this
http://www.mapquestapi.com/geocoding/v1/batch?key=YOUR_KEY_HERE&callback=renderBatch&location=Pottsville,PA&location=Red Lion&location=19036&location=1090 N Charlotte St, Lancaster, PA
The URL character sets for encoding seem to cover only one component of my URL, i.e.:
The path component of a URL is the component immediately following the
host component (if present). It ends wherever the query or fragment
component begins. For example, in the URL
http://www.example.com/index.php?key1=value1, the path component is
/index.php.
However I don't want to trim any aspect of it.
When I used my String as-is (for example, myurlstring), it would fail.
But when I used the following, there were no issues. It encoded the string with some magic and I could get my URL data.
let urlwithPercentEscapes = myurlstring.stringByAddingPercentEscapesUsingEncoding(NSUTF8StringEncoding)
As its documentation says, it
"Returns a representation of the String using a given encoding to determine the percent escapes necessary to convert the String into a legal URL string."
Thanks
For the given URL string the equivalent to
let urlwithPercentEscapes = myurlstring.stringByAddingPercentEscapesUsingEncoding(NSUTF8StringEncoding)
is the character set URLQueryAllowedCharacterSet
let urlwithPercentEscapes = myurlstring.stringByAddingPercentEncodingWithAllowedCharacters( NSCharacterSet.URLQueryAllowedCharacterSet())
Swift 3:
let urlwithPercentEscapes = myurlstring.addingPercentEncoding( withAllowedCharacters: .urlQueryAllowed)
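It percent-encodes every character that is not allowed in a URL query. For this URL that means the spaces in the location values, because the characters of the scheme, host and path, as well as the ?, & and = separators, are all in the allowed set.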
Since the method stringByAddingPercentEncodingWithAllowedCharacters can return nil, use optional bindings as suggested in the answer of Leo Dabus.
It will depend on your URL. If your URL is a path, you can use the character set urlPathAllowed:
let myFileString = "My File.txt"
if let urlwithPercentEscapes = myFileString.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed) {
print(urlwithPercentEscapes) // "My%20File.txt"
}
Creating a Character Set for URL Encoding
urlFragmentAllowed
urlHostAllowed
urlPasswordAllowed
urlQueryAllowed
urlUserAllowed
You can also create your own URL character set:
let myUrlString = "http://www.mapquestapi.com/geocoding/v1/batch?key=YOUR_KEY_HERE&callback=renderBatch&location=Pottsville,PA&location=Red Lion&location=19036&location=1090 N Charlotte St, Lancaster, PA"
let urlSet = CharacterSet.urlFragmentAllowed
.union(.urlHostAllowed)
.union(.urlPasswordAllowed)
.union(.urlQueryAllowed)
.union(.urlUserAllowed)
extension CharacterSet {
static let urlAllowed = CharacterSet.urlFragmentAllowed
.union(.urlHostAllowed)
.union(.urlPasswordAllowed)
.union(.urlQueryAllowed)
.union(.urlUserAllowed)
}
if let urlwithPercentEscapes = myUrlString.addingPercentEncoding(withAllowedCharacters: .urlAllowed) {
print(urlwithPercentEscapes) // "http://www.mapquestapi.com/geocoding/v1/batch?key=YOUR_KEY_HERE&callback=renderBatch&location=Pottsville,PA&location=Red%20Lion&location=19036&location=1090%20N%20Charlotte%20St,%20Lancaster,%20PA"
}
Another option is to use URLComponents to properly create your url
Swift 3.0 (From grokswift)
Creating URLs from strings is a minefield for bugs. Just miss a single / or accidentally URL encode the ? in a query and your API call will fail and your app won’t have any data to display (or even crash if you didn’t anticipate that possibility). Since iOS 8 there’s a better way to build URLs using NSURLComponents and NSURLQueryItems.
func createURLWithComponents() -> URL? {
var urlComponents = URLComponents()
urlComponents.scheme = "http"
urlComponents.host = "www.mapquestapi.com"
urlComponents.path = "/geocoding/v1/batch"
let key = URLQueryItem(name: "key", value: "YOUR_KEY_HERE")
let callback = URLQueryItem(name: "callback", value: "renderBatch")
let locationA = URLQueryItem(name: "location", value: "Pottsville,PA")
let locationB = URLQueryItem(name: "location", value: "Red Lion")
let locationC = URLQueryItem(name: "location", value: "19036")
let locationD = URLQueryItem(name: "location", value: "1090 N Charlotte St, Lancaster, PA")
urlComponents.queryItems = [key, callback, locationA, locationB, locationC, locationD]
return urlComponents.url
}
Below is the code to get the URL with a guard statement (inside a function that returns an optional):
guard let url = createURLWithComponents() else {
print("invalid URL")
return nil
}
print(url)
Output:
http://www.mapquestapi.com/geocoding/v1/batch?key=YOUR_KEY_HERE&callback=renderBatch&location=Pottsville,PA&location=Red%20Lion&location=19036&location=1090%20N%20Charlotte%20St,%20Lancaster,%20PA
In Swift 3.1, I am using something like the following:
let query = "param1=value1&param2=" + (valueToEncode.addingPercentEncoding(withAllowedCharacters: .alphanumerics) ?? "")
It's safer than .urlQueryAllowed and the others, because it will encode every character other than A-Z, a-z and 0-9. This works better when the value you are encoding may contain special characters like ?, &, =, + and spaces.
In my case, where the last path component contained non-Latin characters, I did the following in Swift 2.2:
extension String {
func encodeUTF8() -> String? {
//If I can create an NSURL out of the string nothing is wrong with it
if let _ = NSURL(string: self) {
return self
}
//Get the last component from the string this will return subSequence
let optionalLastComponent = self.characters.split { $0 == "/" }.last
if let lastComponent = optionalLastComponent {
//Get the string from the sub sequence by mapping the characters to [String] then reduce the array to String
let lastComponentAsString = lastComponent.map { String($0) }.reduce("", combine: +)
//Get the range of the last component
if let rangeOfLastComponent = self.rangeOfString(lastComponentAsString) {
//Get the string without its last component
let stringWithoutLastComponent = self.substringToIndex(rangeOfLastComponent.startIndex)
//Encode the last component
if let lastComponentEncoded = lastComponentAsString.stringByAddingPercentEncodingWithAllowedCharacters(NSCharacterSet.alphanumericCharacterSet()) {
//Finally append the original string (without its last component) to the encoded part (encoded last component)
let encodedString = stringWithoutLastComponent + lastComponentEncoded
//Return the string (original string/encoded string)
return encodedString
}
}
}
return nil;
}
}
Swift 4.0
let encodedData = myUrlString.addingPercentEncoding(withAllowedCharacters: CharacterSet.urlHostAllowed)
