Transform a Spark (Java) object into a SparkR object - apache-spark

I need to convert the Spark (Java) object below into a SparkR object. Here is my object:
struct_schema_megdp(output_count)
.....
Java ref type org.apache.spark.sql.types.StructType id 32
I am only able to get its fields:
sparkR.callJMethod(struct_schema_megdp(output_count), "fields")
[[1]]
Java ref type org.apache.spark.sql.types.StructField id 38
[[2]]
Java ref type org.apache.spark.sql.types.StructField id 39
I want it converted to a SparkR object, e.g. the schema object below:
struct_schema <- function(output_count)
{
  return(do.call(SparkR::structType, lapply(SparkR::dtypes(output_count), function(x) SparkR::structField(x[1], x[2]))))
}
struct_schema(output_count)
StructType
|-name = "word", type = "StringType", nullable = TRUE
|-name = "count", type = "StringType", nullable = TRUE
I expect the result to look like this:
> struct_schema(output_count)
StructType
|-name = "word", type = "StringType", nullable = TRUE
|-name = "count", type = "StringType", nullable = TRUE

Related

How to add standard attributes whose names contain a space

How do I add the standard attributes "given name" and "family name"?
schema {
  attribute_data_type = "String"
  developer_only_attribute = false
  mutable = false
  name = "family name"
  required = false
  string_attribute_constraints {
    min_length = 7
    max_length = 150
  }
}

Extract Datatype from Cell Values in PowerQuery?

How can I save the datatype of each cell into a new column? I want the 'Custom' column to display the datatypes of each item in the "values" column. I've tried using Value.Type([values]) but the output just displays this 'Type' value. If I click on 'Type' it creates a new navigation query and I can see that the datatypes are saved inside of it but I can't seem to extract them.
Primitive vs Ascribed types
@AlexisOlson:
This is a possible solution but I don't like it.
You're right, that answer is not correct. It uses Table.Schema, which tests for ascribed types. That's usually the right type, but if the data isn't transformed correctly, it can be wrong.
You need to use Value.Type() to get the actual primitive type.
Query
let
    Sample = {
        123, 0.45, true,
        "text",
        Currency.From(12.34),
        Int64.From(91),
        DateTime.LocalNow(), DateTimeZone.FixedLocalNow(), #date(2021,1,3),
        #duration(0,0,1,3), #binary({2,3,4}),
        #table(type table[Foo=text], {{"bar"}}),
        [ a = "b" ],
        each "x",
        {0..3}
    },
    records = List.Transform(
        Sample,
        (item) => [
            Original = item,
            Name = Type.ToText( Value.Type( item ))
        ]
    ),
    Final = Table.FromRecords( records, type table[Original = any, Name = text])
in
    Final
Type.ToText.pq
This is the simplified version of _serialize_typename() which contains true serialization at DataConnectors/UnitTesting.query.pq
let
    Type.ToText = (x, optional funtype as logical) =>
        let
            isFunctionType = (x as type) => try if Type.FunctionReturn(x) is type then true else false otherwise false,
            isTableType = (x as type) => try if Type.TableSchema(x) is table then true else false otherwise false,
            isRecordType = (x as type) => try if Type.ClosedRecord(x) is type then true else false otherwise false,
            isListType = (x as type) => try if Type.ListItem(x) is type then true else false otherwise false
        in
            if funtype = null and isTableType(x) then "Table"
            else if funtype = null and isListType(x) then "list"
            else if funtype = null and isFunctionType(x) then "Function"
            else if funtype = null and isRecordType(x) then "Record"
            else if x = type any then "any"
            else let base = Type.NonNullable(x) in
                (if Type.IsNullable(x) then "nullable " else "") &
                (if base = type anynonnull then "anynonnull" else
                if base = type binary then "binary" else
                if base = type date then "date" else
                if base = type datetime then "datetime" else
                if base = type datetimezone then "datetimezone" else
                if base = type duration then "duration" else
                if base = type logical then "logical" else
                if base = type none then "none" else
                if base = type null then "null" else
                if base = type number then "number" else
                if base = type text then "text" else
                if base = type time then "time" else
                if base = type type then "type" else
                /* Abstract types: */
                if base = type function then "function" else
                if base = type table then "table" else
                if base = type record then "record" else
                if base = type list then "list"
                else "any /*Actually unknown type*/")
in
    Type.ToText

Merging overlapping strings

Suppose I need to merge two overlapping strings like this:
def mergeOverlap(s1: String, s2: String): String = ???
mergeOverlap("", "") // ""
mergeOverlap("", "abc") // abc
mergeOverlap("xyz", "abc") // xyzabc
mergeOverlap("xab", "abc") // xabc
I can write this function using the answer to one of my previous questions:
def mergeOverlap(s1: String, s2: String): String = {
  val n = s1.tails.find(tail => s2.startsWith(tail)).map(_.size).getOrElse(0)
  s1 ++ s2.drop(n)
}
Could you suggest either a simpler or maybe more efficient implementation of mergeOverlap?
You can find the overlap between two strings in time proportional to the total length of the strings, O(n + k), using the algorithm that computes the prefix function. The prefix function of a string at index i is the length of the longest proper suffix of the substring ending at index i that is also a prefix of the whole string ("proper" excludes the trivial case where the suffix is the whole substring).
See those links for more explanation of the definition and the algorithm to compute it:
https://cp-algorithms.com/string/prefix-function.html
https://hyperskill.org/learn/step/6413#a-definition-of-the-prefix-function
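To make the definition concrete, here is a minimal sketch of the standard prefix-function computation. The name prefixFunction and the sample string are my own choices for illustration, not something from the linked pages.

```scala
// Computes the prefix function: result(i) is the length of the longest proper
// suffix of s.take(i + 1) that is also a prefix of s.
def prefixFunction(s: String): Array[Int] = {
  val pi = new Array[Int](s.length)
  for (i <- 1 until s.length) {
    // Start from the best border of the previous position and shrink it
    // until it can be extended by s(i) (or becomes empty).
    var k = pi(i - 1)
    while (k > 0 && s(i) != s(k)) k = pi(k - 1)
    if (s(i) == s(k)) k += 1
    pi(i) = k
  }
  pi
}

val example = prefixFunction("aabaaab").toList
// example == List(0, 1, 0, 1, 2, 2, 3)
```

Each fallback step reuses an already computed value, which is why the whole pass stays linear in the length of the string.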
Here is an implementation of a modified algorithm that calculates the longest prefix of the second argument, equal to the suffix of the first argument:
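def overlap(hasSuffix: String, hasPrefix: String): Int = {
  // prefix(i) = length of the longest proper suffix of hasPrefix.take(i + 1)
  // that is also a prefix of hasPrefix
  val prefix = new Array[Int](hasPrefix.length)
  for (i <- 1 until hasPrefix.length) {
    var k = prefix(i - 1)
    while (k > 0 && hasPrefix(i) != hasPrefix(k)) k = prefix(k - 1)
    prefix(i) = if (hasPrefix(i) == hasPrefix(k)) k + 1 else k
  }
  // Scan hasSuffix, tracking the longest prefix of hasPrefix that ends at the current character
  var matched = 0
  for (c <- hasSuffix) {
    // A full match cannot be extended, so fall back to its longest border first
    if (matched > 0 && matched == hasPrefix.length) matched = prefix(matched - 1)
    while (matched > 0 && hasPrefix(matched) != c) matched = prefix(matched - 1)
    if (matched < hasPrefix.length && hasPrefix(matched) == c) matched += 1
  }
  matched
}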
With that, mergeOverlap is just:
def mergeOverlap(s1: String, s2: String) =
s1 ++ s2.drop(overlap(s1, s2))
And some tests of this implementation:
scala> mergeOverlap("", "")
res0: String = ""
scala> mergeOverlap("abc", "")
res1: String = abc
scala> mergeOverlap("", "abc")
res2: String = abc
scala> mergeOverlap("xyz", "abc")
res3: String = xyzabc
scala> mergeOverlap("xab", "abc")
res4: String = xabc
scala> mergeOverlap("aabaaab", "aab")
res5: String = aabaaab
scala> mergeOverlap("aabaaab", "aabc")
res6: String = aabaaabc
scala> mergeOverlap("aabaaab", "bc")
res7: String = aabaaabc
scala> mergeOverlap("aabaaab", "bbc")
res8: String = aabaaabbc
scala> mergeOverlap("ababab", "ababc")
res9: String = abababc
scala> mergeOverlap("ababab", "babc")
res10: String = abababc
scala> mergeOverlap("abab", "aab")
res11: String = ababaab
It's not tail recursive but it is a very simple algorithm.
def mergeOverlap(s1: String, s2: String): String =
  if (s2 startsWith s1) s2
  else s1.head +: mergeOverlap(s1.tail, s2)
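If stack depth is a concern for long inputs, the same idea can be written with an index instead of building the result on the way back. This is only a sketch: the name mergeOverlapTailRec and the inner loop helper are hypothetical, and it keeps the same quadratic worst case as the recursive version.

```scala
import scala.annotation.tailrec

// Same idea as the simple recursive version: drop characters from the front
// of s1 until what remains is a prefix of s2. Tracking an index instead of
// prepending characters lets the compiler verify tail recursion.
def mergeOverlapTailRec(s1: String, s2: String): String = {
  @tailrec
  def loop(start: Int): String =
    // When start == s1.length the suffix is empty, so this always terminates.
    if (s2.startsWith(s1.substring(start))) s1.substring(0, start) + s2
    else loop(start + 1)

  loop(0)
}

val merged = mergeOverlapTailRec("xab", "abc")
// merged == "xabc"
```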

Scala, Spark code: Iterating over an array and evaluating an expression using an element of the array

I am coding in Scala with Spark and trying to separate the string columns from the columns of other data types.
I get output for columns(2)_2, albeit with a warning, but when I use the same expression in the if statement I get an error. Any idea why? This part was solved by changing it to columns(2)._2 (thanks to David Griffin).
var df = some dataframe
var columns = df.dtypes
var colnames = df.columns.size
var stringColumns:Array[(String,String)] = null;
var doubleColumns:Array[(String,String)] = null;
var otherColumns:Array [(String,String)] = null;
columns(2)._2
columns(2)._1
for (x<-1 to colnames)
{
if (columns(x)._2 == "StringType")
{stringColumns = stringColumns ++ Seq((columns(x)))}
if (columns(x)._2 == "DoubleType")
{doubleColumns = doubleColumns ++ Seq((columns(x)))}
else
{otherColumns = otherColumns ++ Seq((columns(x)))}
}
Previous Output:
stringColumns: Array[(String, String)] = null
doubleColumns: Array[(String, String)] = null
otherColumns: Array[(String, String)] = null
res158: String = DoubleType
<console>:127: error: type mismatch;
found : (String, String)
required: scala.collection.GenTraversableOnce[?]
{stringColumns = stringColumns ++ columns(x)}
Current Output:
stringColumns: Array[(String, String)] = null
doubleColumns: Array[(String, String)] = null
otherColumns: Array[(String, String)] = null
res382: String = DoubleType
res383: String = CVB
java.lang.NullPointerException
^
I believe you are missing a dot (.). Change this:
columns(2)_2
to
columns(2)._2
If nothing else, it will get rid of the warning.
And then, you need to do:
++ Seq(columns(x))
Here's a cleaner example:
scala> val arr = Array[(String,String)]()
arr: Array[(String, String)] = Array()
scala> arr ++ (("foo", "bar"))
<console>:9: error: type mismatch;
found : (String, String)
required: scala.collection.GenTraversableOnce[?]
arr ++ (("foo", "bar"))
scala> arr ++ Seq(("foo", "bar"))
res2: Array[(String, String)] = Array((foo,bar))
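The difference comes down to which operator expects a collection and which expects a single element. The following is a small sketch in plain Scala (no Spark needed); the value names are made up for illustration.

```scala
val empty = Array.empty[(String, String)]
val pair = ("foo", "bar")

// ++ concatenates two collections, so a single tuple must be wrapped first.
val viaConcat = empty ++ Seq(pair)

// :+ appends exactly one element, so no wrapping is needed.
val viaAppend = empty :+ pair

// +: prepends one element; note that the element goes on the left.
val viaPrepend = ("baz", "qux") +: viaAppend
```

Here viaConcat and viaAppend both hold the single pair, while viaPrepend holds two pairs with ("baz", "qux") first.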
This answer is adapted from David Griffin's answer, so please up-vote his too. I only changed ++ to +:= and initialized the arrays as empty instead of null.
var columns = df.dtypes
var colnames = df.columns.size
var stringColumns = Array[(String, String)]()
var doubleColumns = Array[(String, String)]()
var otherColumns = Array[(String, String)]()
for (x <- 0 to colnames - 1) {
  if (columns(x)._2 == "StringType") {
    stringColumns +:= columns(x)
  } else if (columns(x)._2 == "DoubleType") {
    doubleColumns +:= columns(x)
  } else {
    otherColumns +:= columns(x)
  }
}
// mkString prints the elements; println on a bare Array prints only its object reference
println(stringColumns.mkString(", "))
println(doubleColumns.mkString(", "))
println(otherColumns.mkString(", "))
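As an alternative sketch without mutable variables or index arithmetic, the same split can be done with partition. To keep the example runnable without Spark, the dtypes array below is a hypothetical stand-in for df.dtypes, which returns (column name, type name) pairs of the same shape.

```scala
// Hypothetical stand-in for df.dtypes
val dtypes: Array[(String, String)] = Array(
  ("name", "StringType"),
  ("price", "DoubleType"),
  ("quantity", "IntegerType"),
  ("city", "StringType")
)

// partition splits a collection in two by a predicate and keeps the original order.
val (stringColumns, rest) = dtypes.partition(_._2 == "StringType")
val (doubleColumns, otherColumns) = rest.partition(_._2 == "DoubleType")

println(stringColumns.mkString(", "))
println(doubleColumns.mkString(", "))
println(otherColumns.mkString(", "))
```

Because nothing is ever initialized to null, this variant also avoids the NullPointerException shown in the question's output.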

How can I convert a String to a Symbol at runtime in Scala?

I have a case class that looks like this:
case class Outcome(text: Symbol)
Now I need to change the value of text at runtime. I try to do something like this:
val o2 = o1.copy(text.name = "foo" ++ text.name)
This obviously gives me a compilation error:
type mismatch; found : String required: Symbol
How can I convert a symbol to string, append/prepend something and again change it to a symbol? Or to be more simple, how can I change the name of a symbol?
You could use the Symbol.apply method:
Symbol("a" + "b")
// Symbol = 'ab
val o2 = o1.copy(text = Symbol("foo" + o1.text.name))
There is a useful tool to work with nested structures in scalaz - Lens
import scalaz._, Scalaz._
case class Outcome(symbol: Symbol)
val symbolName = Lens.lensu[Symbol, String]( (_, str) => Symbol(str), _.name)
val outcomeSymbol =
  Lens.lensu[Outcome, Symbol]( (o, s) => o.copy(symbol = s), _.symbol)
val outcomeSymbolName = outcomeSymbol >=> symbolName
val o = Outcome('Bar)
val o2 = outcomeSymbolName.mod("foo" + _, o)
// o2: Outcome = Outcome('fooBar)
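To show what the lens is doing without pulling in scalaz, here is a hand-rolled sketch of the same idea. SimpleLens, mod and andThen are hypothetical names for this illustration and are not the scalaz API; andThen plays the role that >=> plays in the snippet above.

```scala
// A lens is a getter plus a setter that returns an updated copy.
final case class SimpleLens[A, B](get: A => B, set: (A, B) => A) {
  // Apply f to the focused value and write the result back.
  def mod(f: B => B, a: A): A = set(a, f(get(a)))

  // Compose with a lens that focuses deeper into B.
  def andThen[C](inner: SimpleLens[B, C]): SimpleLens[A, C] =
    SimpleLens[A, C](
      a => inner.get(get(a)),
      (a, c) => set(a, inner.set(get(a), c))
    )
}

final case class Outcome(symbol: Symbol)

val symbolName = SimpleLens[Symbol, String](_.name, (_, str) => Symbol(str))
val outcomeSymbol = SimpleLens[Outcome, Symbol](_.symbol, (o, s) => o.copy(symbol = s))
val outcomeSymbolName = outcomeSymbol.andThen(symbolName)

val renamed = outcomeSymbolName.mod("foo" + _, Outcome(Symbol("Bar")))
// renamed.symbol.name == "fooBar"
```

For a single field this is more machinery than copy with Symbol(...), but the composition starts to pay off once the case classes are nested several levels deep.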
