Pass row to UDF and select column based on pattern match - apache-spark

How can I achieve the following by passing a row to a UDF?
val df1 = df.withColumn("col_Z",
when($"col_x" === "a", $"col_A")
.when($"col_x" === "b", $"col_B")
.when($"col_x" === "c", $"col_C")
.when($"col_x" === "d", $"col_D")
.when($"col_x" === "e", $"col_E")
.when($"col_x" === "f", $"col_F")
.when($"col_x" === "g", $"col_G")
)
As I understand it, only columns can be passed as arguments to a UDF in Scala Spark.
I have taken a look at this question:
How to pass whole Row to UDF - Spark DataFrame filter
and tried to implement this udf:
def myUDF(r:Row) = udf {
val z : Float = r.getAs("col_x") match {
case "a" => r.getAs("col_A")
case "b" => r.getAs("col_B")
case other => lit(0.0)
}
z
}
but I'm getting a type mismatch error:
error: type mismatch;
found : String("a")
required: Nothing
case "a" => r.getAs("col_A")
^
What am I doing wrong?

Terraform - Multiple maps with lists [closed]

I am struggling to loop through my data structure. Any feedback on the data structure itself is also very welcome.
Data structure
locals {
values = {
key1 = ["a", "b"],
key2 = ["c", "d"]
}
}
What I've tried
value = { for key, value in local.values : key => value }
This basically prints out local.values as is. I know I should be able to loop through the value given in the expression above, but I'm not able to do so.
Desired output
# Following does NOT work
value = { for key, values in local.values : key =>
for v in values : key => v}
Key1: a
Key1: b
Key2: c
Key2: d
A map must have unique keys; you cannot use the same key twice. You could, for example, invert the structure and build a map from each value to its key:
locals {
values = {
key1 = ["a", "b"],
key2 = ["c", "d"]
}
}
output "vals" {
value = merge([for key, values in local.values: { for value in values: value => key}]...)
}
Output:
Outputs:
vals = {
"a" = "key1"
"b" = "key1"
"c" = "key2"
"d" = "key2"
}
Based on your comment, you could instead convert it to a list of maps, where each map holds a single key/value pair that you can iterate over:
locals {
values = {
key1 = ["a", "b"],
key2 = ["c", "d"]
}
}
output "vals" {
value = concat([for key, values in local.values: [for value in values: {(key) = value}]]...)
}
Output:
Outputs:
vals = [
{
"key1" = "a"
},
{
"key1" = "b"
},
{
"key2" = "c"
},
{
"key2" = "d"
},
]
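The same flattening idea, written outside Terraform, may make the shape of the result easier to see. The following is only a sketch in Go, not Terraform code: the names `Pair` and `flattenPairs` are made up for illustration, and the keys are sorted so the order is deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// Pair holds one key together with one of its values (hypothetical type).
type Pair struct {
	Key   string
	Value string
}

// flattenPairs turns a map of lists into a flat list of key/value pairs.
// Keys are visited in sorted order so the result is deterministic.
func flattenPairs(m map[string][]string) []Pair {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var out []Pair
	for _, k := range keys {
		for _, v := range m[k] {
			out = append(out, Pair{Key: k, Value: v})
		}
	}
	return out
}

func main() {
	values := map[string][]string{
		"key1": {"a", "b"},
		"key2": {"c", "d"},
	}
	for _, p := range flattenPairs(values) {
		fmt.Printf("%s: %s\n", p.Key, p.Value)
	}
}
```

Because the same key appears more than once, the result has to be a list of pairs rather than a map, which is the same constraint the answer above describes.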

String Permutations of Different Lengths

I have been trying to wrap my head around something and can't seem to find an answer. I know how to get all the permutations of a string as it is fairly easy. What I want to try and do is get all the permutations of the string in different sizes. For example:
Given "ABCD" and a lower limit of 3 chars I would want to get back ABC, ABD, ACB, ACD, ADB, ADC, ... , ABCD, ACBD, ADBC, .. etc.
I'm not quite sure how to accomplish that. I have it in my head that it is something that could be very complicated or very simple. Any help pointing me in a direction is appreciated. Thanks.
If you've already got the full-length permutations, you can drop elements from the front or back of each one and insert the results into a set, which removes the repeats.
XCTAssertEqual(
Permutations(["A", "B", "C"]).reduce( into: Set() ) { set, permutation in
permutation.indices.forEach {
set.insert( permutation.dropLast($0) )
}
},
[ ["A", "B", "C"],
["A", "C", "B"],
["B", "C", "A"],
["B", "A", "C"],
["C", "A", "B"],
["C", "B", "A"],
["B", "C"],
["C", "B"],
["C", "A"],
["A", "C"],
["A", "B"],
["B", "A"],
["A"],
["B"],
["C"]
]
)
public struct Permutations<Sequence: Swift.Sequence>: Swift.Sequence, IteratorProtocol {
public typealias Array = [Sequence.Element]
private let array: Array
private var iteration = 0
public init(_ sequence: Sequence) {
array = Array(sequence)
}
public mutating func next() -> Array? {
guard iteration < array.count.factorial!
else { return nil }
defer { iteration += 1 }
return array.indices.reduce(into: array) { permutation, index in
let shift =
iteration / (array.count - 1 - index).factorial!
% (array.count - index)
permutation.replaceSubrange(
index...,
with: permutation.dropFirst(index).shifted(by: shift)
)
}
}
}
public extension Collection where SubSequence: RangeReplaceableCollection {
func shifted(by shift: Int) -> SubSequence {
let drops =
shift > 0
? (shift, count - shift)
: (count + shift, -shift)
return dropFirst(drops.0) + dropLast(drops.1)
}
}
public extension BinaryInteger where Stride: SignedInteger {
/// - Note: `nil` for negative numbers
var factorial: Self? {
switch self {
case ..<0:
return nil
case 0...1:
return 1
default:
return (2...self).reduce(1, *)
}
}
}
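An alternative to trimming full-length permutations is to build the shorter ones directly with backtracking and record every prefix that has reached the lower limit. The sketch below is in Go rather than Swift and is not the code from the answer above; `permutationsMinLen` is a hypothetical name, and it assumes the input characters are distinct.

```go
package main

import "fmt"

// permutationsMinLen returns every ordered arrangement of the characters of s
// whose length is between minLen and len(s), inclusive.
// It assumes the characters are distinct; a minLen of 0 also yields "".
func permutationsMinLen(s string, minLen int) []string {
	letters := []rune(s)
	used := make([]bool, len(letters))
	current := make([]rune, 0, len(letters))
	var out []string

	var extend func()
	extend = func() {
		// Record the current arrangement once it is long enough.
		if len(current) >= minLen {
			out = append(out, string(current))
		}
		// Try every unused character as the next position.
		for i, r := range letters {
			if used[i] {
				continue
			}
			used[i] = true
			current = append(current, r)
			extend()
			current = current[:len(current)-1]
			used[i] = false
		}
	}
	extend()
	return out
}

func main() {
	result := permutationsMinLen("ABCD", 3)
	fmt.Println(len(result), result)
}
```

For "ABCD" with a lower limit of 3 this produces the 24 arrangements of length 3 plus the 24 of length 4.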

spark sql dynamic filter condition

How can I construct a boolean filter condition dynamically in spark sql?
Having:
val d = Seq(1, 2, 3, 5, 6).toDF
d.filter(col("value") === 1 or col("value") === 3).show
How can I replicate this dynamically:
val desiredThings = Seq(1,3)
I try to build the filter:
val myCondition = desiredThings.map(col("value") === _)
d.filter(myCondition).show
but fail with:
overloaded method value filter with alternatives:
org.apache.spark.api.java.function.FilterFunction[org.apache.spark.sql.Row]
cannot be applied to (Seq[org.apache.spark.sql.Column])
When executing
d.filter(myCondition).show
Also when experimenting with fold left:
val myCondition = desiredThings.foldLeft()((result, entry) => result && col(c.columnCounterId) === entry)
I have compile errors.
How can I adapt the code to dynamically generate the filter predicate?
Just use isin:
d.filter(col("value").isin(desiredThings: _*))
but if you really want to foldLeft you have to provide the base condition:
d.filter(desiredThings.foldLeft(lit(false))(
(acc, x) => (acc || col("value") === (x)))
)
Alternatively, to use with filter or where, you can generate a SQL expression using:
val filterExpr = desiredThings.map( v => s"value = $v").mkString(" or ")
And then use it like
d.filter(filterExpr).show
// or
d.where(filterExpr).show
//+-----+
//|value|
//+-----+
//| 1|
//| 3|
//+-----+
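The reason the fold needs a base condition can be seen without Spark at all. The sketch below is plain Go, not the Spark API: `Predicate`, `anyOf`, and `equals` are hypothetical names. It folds a list of predicates with logical OR, starting from a predicate that is always false, which plays the role of lit(false) above.

```go
package main

import "fmt"

// Predicate reports whether a value passes a condition (hypothetical type).
type Predicate func(int) bool

// anyOf folds predicates with logical OR, starting from a base that
// always returns false, so an empty list matches nothing.
func anyOf(preds []Predicate) Predicate {
	acc := Predicate(func(int) bool { return false })
	for _, p := range preds {
		prev, next := acc, p
		acc = func(v int) bool { return prev(v) || next(v) }
	}
	return acc
}

// equals builds a predicate that matches exactly one wanted value.
func equals(want int) Predicate {
	return func(v int) bool { return v == want }
}

func main() {
	desired := []int{1, 3}
	preds := make([]Predicate, 0, len(desired))
	for _, d := range desired {
		preds = append(preds, equals(d))
	}
	keep := anyOf(preds)
	for _, v := range []int{1, 2, 3, 5, 6} {
		if keep(v) {
			fmt.Println(v)
		}
	}
}
```

Starting from false is what makes OR the right combinator here; a fold that combines with AND would need to start from true instead.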

Golang Alphabetic representation of a number

Is there an easy way to convert a number to a letter?
For example,
3 => "C" and 23 => "W"?
For simplicity, the range check is omitted from the solutions below.
They all can be tried on the Go Playground.
Number -> rune
Simply add the number to the constant 'A' - 1: adding 1 gives 'A', adding 2 gives 'B', and so on:
func toChar(i int) rune {
return rune('A' - 1 + i)
}
Testing it:
for _, i := range []int{1, 2, 23, 26} {
fmt.Printf("%d %q\n", i, toChar(i))
}
Output:
1 'A'
2 'B'
23 'W'
26 'Z'
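If the omitted range check is needed, one possible shape is to return an error alongside the rune. This is a sketch only; the name `toCharChecked` and the choice of 1..26 as the valid range are assumptions based on the examples above.

```go
package main

import "fmt"

// toCharChecked maps 1..26 to 'A'..'Z' and reports an error otherwise.
func toCharChecked(i int) (rune, error) {
	if i < 1 || i > 26 {
		return 0, fmt.Errorf("toCharChecked: %d is outside 1..26", i)
	}
	return rune('A' - 1 + i), nil
}

func main() {
	for _, i := range []int{3, 23, 27} {
		r, err := toCharChecked(i)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%d %q\n", i, r)
	}
}
```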
Number -> string
Or if you want it as a string:
func toCharStr(i int) string {
// Convert through rune: go vet flags a direct string(int) conversion.
return string(rune('A' - 1 + i))
}
Output:
1 "A"
2 "B"
23 "W"
26 "Z"
This last one (converting a number to string) is documented in the Spec: Conversions to and from a string type:
Converting a signed or unsigned integer value to a string type yields a string containing the UTF-8 representation of the integer.
Number -> string (cached)
If you need to do this many times, it pays to store the strings in an array, for example, and simply return the string from it:
var arr = [...]string{"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
"N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"}
func toCharStrArr(i int) string {
return arr[i-1]
}
Note: a slice (instead of the array) would also be fine.
Note #2: you may improve this if you add a dummy first character so you don't have to subtract 1 from i:
var arr = [...]string{".", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
"N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"}
func toCharStrArr(i int) string { return arr[i] }
Number -> string (slicing a string constant)
Also another interesting solution:
const abc = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
func toCharStrConst(i int) string {
return abc[i-1 : i]
}
Slicing a string is efficient: the new string will share the backing array (it can be done because strings are immutable).
If you need a string rather than a rune, and possibly more than one character (for example, Excel-style column names):
package main
import (
"fmt"
)
func IntToLetters(number int32) (letters string){
number--
if firstLetter := number/26; firstLetter >0{
letters += IntToLetters(firstLetter)
letters += string('A' + number%26)
} else {
letters += string('A' + number)
}
return
}
func main() {
fmt.Println(IntToLetters(1))// print A
fmt.Println(IntToLetters(26))// print Z
fmt.Println(IntToLetters(27))// print AA
fmt.Println(IntToLetters(1999))// print BXW
}
preview here: https://play.golang.org/p/GAWebM_QCKi
I made also package with this: https://github.com/arturwwl/gointtoletters
The simplest solution would be
func stringValueOf(i int) string {
var foo = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
return string(foo[i-1])
}
Hope this will help you to solve your problem. Happy Coding!!

Groovy: the simplest way to detect duplicate, non-consecutive values in a list

I know that in Groovy, if
list = [1,2,3,1]
then
list.unique()
will return
[1,2,3]
But what if I want to detect duplicate values only when the duplicates are non-consecutive? How can I do this?
detect([1,2,3,1]) => true
detect([1,2,3,2]) => true
detect([1,1,2,3]) => false
detect([1,2,2,3,3]) => false
detect([1,2,3,4]) => false
Thanks.
Edit:
add these two cases
detect([1,2,2,1]) => true
detect([1,2,1,1]) => true
true means that at least one non-consecutive duplicate occurs.
This should do it:
List list = ["a", "b", "c", "a", "d", "c", "a"]
and
list.countBy{it}.grep{it.value > 1}.collect{it.key}
In case you need to obtain duplicate elements:
def nonUniqueElements = {list ->
list.findAll{a -> list.findAll{b -> b == a}.size() > 1}.unique()
}
assert nonUniqueElements(['a', 'b', 'b', 'c', 'd', 'c']) == ['b', 'c']
To determine whether a collection contains non-unique items (your first two examples), you can do something like this:
def a = [1, 2, 3, 1]
boolean nonUnique = a.clone().unique().size() != a.size()
(Note that unique() modifies the list).
Meanwhile, Collection.unique() seems to do what you asked as far as 'grouping' items (your last three examples).
Edit: unique() works properly regardless of whether the collection is sorted.
You should be able to metaClass list and add your own detect method as below:
List.metaClass.detect = {
def rslt = delegate.inject([]){ ret, elem ->
ret << (ret && ret.last() != elem ? elem : !ret ? elem : 'Dup')
}
return (!rslt.contains('Dup') && rslt != rslt.unique(false))
}
assert [1,2,3,1].detect() == true //Non-consecutive Dups 1
assert [1,2,3,2].detect() == true //Non-consecutive Dups 2
assert [1,1,2,3].detect() == false //Consecutive Dups 1
assert [1,2,2,3,3].detect() == false //Consecutive Dups 2 and 3
assert [1,2,3,4].detect() == false //Unique no dups
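For comparison, the rule from the question (including the two cases added in the edit) can be expressed as a single pass: a value counts as a non-consecutive duplicate when it starts a new run but has already been seen earlier. The sketch below is in Go rather than Groovy, and `hasNonConsecutiveDuplicate` is a hypothetical name, not part of any answer above.

```go
package main

import "fmt"

// hasNonConsecutiveDuplicate reports whether some value reappears after
// at least one different value has occurred in between.
func hasNonConsecutiveDuplicate(items []int) bool {
	seen := make(map[int]bool)
	for i, item := range items {
		// A new run starts at index 0 or whenever the value changes.
		startsNewRun := i == 0 || items[i-1] != item
		if startsNewRun && seen[item] {
			return true
		}
		seen[item] = true
	}
	return false
}

func main() {
	cases := [][]int{
		{1, 2, 3, 1}, {1, 2, 3, 2}, {1, 1, 2, 3}, {1, 2, 2, 3, 3},
		{1, 2, 3, 4}, {1, 2, 2, 1}, {1, 2, 1, 1},
	}
	for _, c := range cases {
		fmt.Println(c, hasNonConsecutiveDuplicate(c))
	}
}
```

Consecutive repeats never trigger the check because they do not start a new run, which is what separates [1,1,2,3] from [1,2,1,1].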
To know if it has duplicates:
stringList.size() == stringList.toSet().size() // if true, it has no duplicates
To know which values are duplicated, you can do something like this:
class ListUtils {
static List<String> getDuplicates(List<String> completeList) {
List<String> duplicates = []
Set<String> nonDuplicates = new HashSet<>()
for (String string in completeList) {
boolean added = nonDuplicates.add(string)
if (!added) {
duplicates << string
}
}
return duplicates
}
}
And here is its Spock test case:
import spock.lang.Specification
class ListUtilsSpec extends Specification {
def "getDuplicates"() {
when:
List<String> duplicates = ListUtils.getDuplicates(["a", "b", "c", "a"])
then:
duplicates == ["a"]
}
}
