Nested struct in a struct - apache-spark

Given some rows coming from a SQL data source with a schema like...
| A | B | C | D | E | F |
...I'd like to transform it into:
{
  A: {
    invented: { B, C },
    D,
    E,
    F
  }
}
AFAIK, dataFrame.withColumn won't let me implement such a transformation (it doesn't support nesting a struct inside a first-level struct).
Is my goal even possible?

I think the following code should work (if I understood your question correctly):
import org.apache.spark.sql.functions.{col, struct}

df.withColumn("nested_struct", struct(
  col("A"),
  struct(
    col("B"),
    struct(
      col("C"),
      struct(col("E"), col("F"))
    ),
    col("D")
  )
))

First of all, thanks to @partlov for his answer. Actually, when I first posted my question, I forgot to mention that some of the nested structs had to sit under an invented field name that doesn't exist as a column.
That said, the issue was very easy to resolve.
My first attempt was:
dataFrame.WithColumn("invented",
    Struct(
        Struct("invented2", "A")
    ));
But this threw an exception: Spark complained "could not resolve 'invented'", because invented isn't in the schema.
Then I realized I could try not providing "invented" at all. That worked, but Spark named the nested field col1. Finally, I tried aliasing that col1 struct, and it solved the issue!
dataFrame.WithColumn("invented",
    Struct(
        Struct("invented2", "A").As("X")
    ));
Note: the sample above is C# code; I'm using .NET for Spark. The same approach should work in Scala, Python, Java, R...
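For reference, here is a rough Scala sketch of the same trick (untested; dataFrame and the column A are the names used above). The point is only that aliasing the inner struct replaces the auto-generated col1 field name:
import org.apache.spark.sql.functions.{col, struct}

// alias the inner struct so the nested field is named "X" instead of "col1"
val result = dataFrame.withColumn("invented",
  struct(
    struct(col("A")).as("X")
  ))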

Related

Understanding Rust tracing info(target: "") macro and types

I've been enjoying navigating Rust through its type system, but when it comes to macros I find it difficult to follow. In the example below, why is it OK to pass "target_name" to target directly, but not to assign it to a variable first and pass that in? How do you navigate the macro in tracing so that the behaviour below is obvious to you? I'm asking this as much from a developer-experience perspective as a programming one. (I'm definitely looking for a "teach a man to fish" style answer.)
info!(target: "target_name", "message"); // fine, must be cast to &str?
let target_name = "target_name"; // must be cast to String?
info!(target: target_name, "message"); // not fine
The latter call results in:
error[E0435]: attempt to use a non-constant value in a constant
   |
44 | info!(target: target_name, "message");
   |               ^^^^^^^^^^^ non-constant value
Even if I switch to &target_name.as_str(), which I believe should be constant (not growable like String), the macro still fails with the same error. This is where my mental map breaks down. I can understand that the type assumed at assignment is wrong, but when I recast it, why would it still fail?
The solution here is to use a const with a type that's compatible with expectations, like:
const target_name : &str = "target_name";
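For example, here is a minimal sketch assuming the tracing crate from the question (subscriber setup omitted; the const is renamed to the idiomatic upper case):
use tracing::info;

const TARGET_NAME: &str = "target_name";

fn main() {
    // a constant works where the `let` binding above did not, because the macro
    // needs the target in a constant context (that is what E0435 is complaining about)
    info!(target: TARGET_NAME, "message");
}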
You can usually view the source for these macros in the documentation, as shown here:
#[macro_export(local_inner_macros)]
macro_rules! info {
    (target: $target:expr, $($arg:tt)+) => (
        log!(target: $target, $crate::Level::Info, $($arg)+)
    );
    ($($arg:tt)+) => (
        log!($crate::Level::Info, $($arg)+)
    )
}
That's just a wrapper around log!, so it's not especially informative, and log! is just a wrapper around __private_api_log which is even less helpful.

sig literals in Alloy

How can I write out a literal for a sig in Alloy? Consider the example below.
sig Foo { a: Int }
fact { #Foo = 1 }
If I execute this, I get
| this/Foo | a |
|----------|---|
| Foo⁰ | 7 |
In the evaluator, I know I can get a reference to the Foo instance with Foo$0 but how can I write a literal that represents the same value?
I've tried {a: 7}, but this is not equal to Foo$0. This is intentionally a trivial example, but I'm debugging a more complex model and I need to be able to write out literals of sigs with multiple fields.
Ah, this is one of the well-hidden secrets! :-) Clearly, in your model you cannot refer to atoms, since the model defines all possible values of those atoms. However, quite often you need to get your hands on some atom to reason about it. That is, you want to be able to name some objects.
The best way to get 'constants' is to create a predicate that you call from a run clause. In this predicate, you define names for the atoms you want to discuss. You only have to make sure this predicate is true.
pred collision[ car1, car2 : Car, road : Road ] {
  // here you can reason about car1 and car2
}
run collision for 10
Another way is to create a quantification whenever you need to have some named objects:
run {
  some car1, car2 : Car, road : Road {
    // here you can reason about car1 and car2 and road
  }
} for 10
There was a recent discussion about adding these kinds of instances to the language so that Kodkod could take advantage of them. (It would allow faster solving, and it is extremely useful for test cases of your model.) However, during that discussion the solution I presented here came forward, and it does not require any new syntax.
Try putting a limitation (scope) on Int in the run command. I mean:
sig Foo {a : Int}
fact{ #Foo = 1}
pred show {}
run show for 1 Foo, 2 Int

Relational override on 'objects'?

I have a signature
sig Test {
  a: Int,
  b: Int,
  c: Int
}
If I have two instances (atoms?) of this (x, y: Test), can I define a relation between them where only some fields have changed, without having to list all the other fields as equal? I want to avoid having to list all the unchanged fields, as this is error-prone when I have many fields.
Currently I am using x.(a+b+c) = y.(a+next[b]+c) but would like to use something like x = y ++ (b->next[y.b])
From what I understand about Alloy, I think the answer is no: you cannot talk about all the relations an atom is involved in without explicitly naming those relations. But some experts may correct me if I'm wrong.
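So the change has to be spelled out field by field. A rough sketch of what that looks like (assuming next is whatever successor function your model already uses for b):
pred changeOnlyB[x, y: Test] {
  y.b = next[x.b]
  // the unchanged fields must be pinned explicitly
  y.a = x.a
  y.c = x.c
}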

Lazy evaluation of chained functional methods in Groovy

What I've seen in Java
Java 8 allows lazy evaluation of chained functions in order to avoid performance penalties.
For instance, I can have a list of values and process it like this:
someList.stream()
    .filter(v -> v > 0)
    .map(v -> v * 4)
    .filter(v -> v < 100)
    .findFirst();
I pass a number of closures to the methods called on a stream to process the values in a collection and then only grab the first one.
This looks as if the code had to iterate over the entire collection, filter it, then iterate over the entire result and apply some logic, then filter the whole result again and finally grab just a single element.
In reality, the stream pipeline handles this in a smarter way and keeps the number of iterations to a minimum.
This is possible because no actual processing is done until findFirst is called; the terminal operation knows what I want to achieve, so the pipeline can figure out how to do it efficiently.
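A small sketch that makes this short-circuiting visible (JDK 8 streams with Groovy closures coerced to the functional interfaces; the println calls are only there to show which elements are actually visited):
def first = [1, 2, 3, 4, 5].stream()
        .filter { println "filter $it"; it > 1 }
        .map { println "map $it"; it * 4 }
        .findFirst()
println first.get()   // prints: filter 1, filter 2, map 2, then 8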
Take a look at this video of a presentation by Venkat Subramaniam for a longer explanation.
What I'd like to do in Groovy
While answering a question about Groovy here on StackOverflow I figured out a way to perform the task the OP was trying to achieve in a more readable manner. I refrained from suggesting it because it meant a performance decrease.
Here's the example:
collectionOfSomeStrings.inject([]) { list, conf -> if (conf.contains('homepage')) { list } else { list << conf.trim() } }
Semantically, this could be rewritten as
collectionOfSomeStrings.grep{ !it.contains('homepage')}.collect{ it.trim() }
I find it easier to understand but the readability comes at a price. This code requires a pass of the original collection and another iteration over the result of grep. This is less than ideal.
It doesn't look like the GDK's grep, collect and findAll methods are lazily evaluated like the methods in Java 8's streams API. Is there any way to have them behave like this? Is there any alternative library in Groovy that I could use?
I imagine it might be possible to use Java 8 somehow in Groovy and have this functionality. I'd welcome an explanation on the details but ideally, I'd like to be able to do that with older versions of Java.
I found a way to combine closures but it's not really what I want to do. I'd like to chain not only closures themselves but also the functions I pass them to.
Googling for Groovy and streams mostly yields I/O-related results. I haven't found anything of interest by searching for lazy evaluation, functional, and Groovy either.
Adding the suggestion as an answer, taking cfrick's comment as an example:
@Grab('com.bloidonia:groovy-stream:0.8.1')
import groovy.stream.Stream
List integers = [ -1, 1, 2, 3, 4 ]
//.first() or .last() whatever is needed
Stream.from integers filter{ it > 0 } map{ it * 4 } filter{ it < 15 }.collect()
Tim, I still know what you did a few summers ago. ;-)
Groovy 2.3 supports JDK 8 (groovy.codehaus.org/Groovy+2.3+release+notes). Your example works fine using Groovy closures:
[-1,1,2,3,4].stream().filter{it>0}.map{it*4}.filter{it < 100}.findFirst().get()
If you can't use jdk8, you can follow the suggestion from the other answer or achieve "the same" using RxJava/RxGroovy:
@Grab('com.netflix.rxjava:rxjava-groovy:0.20.7')
import rx.Observable

Observable.from([-1, 1, 2, 3, 4, 666])
    .filter { println "f1 $it"; it > 0 }
    .map { println "m1 $it"; it * 4 }
    .filter { println "f2 $it"; it < 100 }
    .subscribe { println "result $it" }

Scala String format named parameters (Winner: Ugliest Code)

I came up with a trick to use named parameters in Scala. Is there a better way? What are the downsides?
<x>
|CREATE OR REPLACE FUNCTION myFunction({columns.map(column => column.name).mkString(",\n")})
|RETURNS BOOLEAN AS $$
|BEGIN
| -- more stuff
|END;
|$$ LANGUAGE 'plpgsql';
|</x>.text.stripMargin
Watch out for ampersands in the XML body; they need to be escaped as &amp; or placed in braces like {"&"}. Do I win a prize for ugliest code? :-)
I think that if you need a string formatter on this scale, you need a Builder or a templating engine, like Velocity. Incidentally, I've found Scala is good for builders and DSLs.
If you don't mind a compiler plugin, try Johannes Rudolph's Scala Enhanced Strings. I like it a lot.
Good news! Scala 2.10.0 introduced real, functional string interpolation!
The docs are available here: http://docs.scala-lang.org/overviews/core/string-interpolation.html
Here's a quick sample:
In Python, I used to do things like:
print "%(from)s -> %(to)s" % {"from": foo, "to": bar}
now, in Scala 2.10.0+, we can do this!
val from = "Foo"
val to = 256
println(s"$from -> $to") // Prints: Foo -> 256
There's also format string support, which is pretty awesome:
val from = 10.00 // USD
val to = 984.30 // JPY
println(f"$$$from%.2f -> $to%.2fJPY") // Prints: $10.00 -> 984.30JPY
Since the second example has some minimal type expressiveness, it also gives us basic type checking!
val from = 10.00
println(f"$$$from%d") // <-- Type error! Found "Double", required "Int"!
