Why does Spark MLlib define DoubleParam, IntParam, FloatParam, etc.? - apache-spark

In Spark's params.scala, there are definitions for DoubleParam, IntParam, FloatParam, etc. Why did the developers define those classes?

As can be seen in the params.scala file, Param is a class used by Spark. There are two comments in the code explaining that primitive-typed params are needed to make the API more Java-friendly:
... Primitive-typed param should use the specialized versions, which are more friendly to Java users.
and:
// specialize primitive-typed params because Java doesn't recognize scala.Double, scala.Int, ...
Hence, Double, Int, Float, Long, Boolean and some Array types all have their own specific implementations. These use the Java classes, as can be seen here (for Array[Array[Double]]):
/** Creates a param pair with a `java.util.List` of values (for Java and Python). */
def w(value: java.util.List[java.util.List[java.lang.Double]]): ParamPair[Array[Array[Double]]] =
  w(value.asScala.map(_.asScala.map(_.asInstanceOf[Double]).toArray).toArray)
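To see what the specialization buys a Java caller, here is a minimal sketch (hypothetical code, not from Spark's sources; it assumes the lines run inside a class that implements org.apache.spark.ml.param.Params, such as a custom Transformer):

import org.apache.spark.ml.param.IntParam;
import org.apache.spark.ml.param.ParamMap;

// The specialized IntParam takes a plain Java int, no scala.Int in sight.
IntParam maxIter = new IntParam(this, "maxIter", "maximum number of iterations (>= 0)");

// w(int) builds a ParamPair from a primitive; a generic Param would surface
// in Java as Param<Object> and lose the static type.
ParamMap map = new ParamMap().put(maxIter.w(10));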

Related

JOOQ generator: use existing enums

I am using the nu.studer.jooq Gradle plugin to generate POJOs, tables and records for a PostgreSQL database with tables that have fields of type ENUM.
We already have the enums in the application, so I would like the generator to use those enums instead of generating new ones.
I defined udts = false in build.gradle for the generator, so it doesn't generate the enums, and I wrote a custom generator strategy that sets the package for the enums to be the one of the already existing enums.
I have an issue in the generated table fields: SQLDataType.VARCHAR.asEnumDataType(mypackage.ExistingEnum.class) doesn't work because mypackage.ExistingEnum does not implement org.jooq.EnumType.
public enum ExistingEnum {
    VAL1, VAL2
}
Generated table record:
public class EntryTable extends TableImpl<EntryRecord> {
    public final TableField<EntryRecord, ExistingEnum> MY_FIELD = createField(DSL.name("my_field"), SQLDataType.VARCHAR.asEnumDataType(mypackage.ExistingEnum.class), this, "");
}
Is there something I can do to fix this issue? Also, we have a lot of enums, so writing a converter for each of them is not practical.
The point of having custom enum types is that they are individual types, independent of whatever you encode with your database enum types. As such, the jOOQ code generator cannot make any automated assumptions related to how to map the generated types to the custom types. You'll have to implement Converter types of some sort.
If you're not relying on the jOOQ-provided EnumType types, you could use the <enumConverter/> configuration, or write implementations based on org.jooq.impl.EnumConverter, which helps reduce boilerplate code.
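For example, a converter for the ExistingEnum above based on org.jooq.impl.EnumConverter can be this small (a sketch; the class name is hypothetical, and it still has to be registered with the code generator, e.g. via <forcedTypes/>):

import org.jooq.impl.EnumConverter;

// Maps between the VARCHAR column value and the existing application enum.
public class ExistingEnumConverter extends EnumConverter<String, ExistingEnum> {
    public ExistingEnumConverter() {
        super(String.class, ExistingEnum.class);
    }
}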
If you have some conventions or rules for how to map things a bit more automatically (jOOQ may not know your convention, but you do), you could implement a programmatic code generation configuration, where you query your dictionary views (e.g. PG_CATALOG.PG_ENUM) to generate ForcedType objects. You can even use jOOQ-meta for that purpose.
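A rough sketch of that programmatic idea (names are hypothetical; the list of enum types would come from querying PG_CATALOG.PG_ENUM under your own naming convention):

import org.jooq.meta.jaxb.ForcedType;

// One ForcedType per database enum type, pointing the generator at the
// existing application enum and the converter that maps it.
ForcedType forcedType = new ForcedType()
    .withUserType("mypackage.ExistingEnum")
    .withConverter("mypackage.ExistingEnumConverter")
    .withIncludeTypes("existing_enum"); // the PG enum type's name, hypothetical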

RowType support in Presto

A question for those who know the Presto plugin API.
I am implementing a BigQuery plugin. BigQuery supports a struct type, which can be represented by the RowType class in Presto.
RowType creates a RowBlockBuilder in RowType::createBlockBuilder, whose RowBlockBuilder::appendStructure method accepts only instances of the AbstractSingleRowBlock class.
This means that in my implementation of Presto's RecordCursor, the BigQueryRecordCursor::getObject method has to return something that is an AbstractSingleRowBlock for a field of type RowType.
But AbstractSingleRowBlock has a package-private abstract method, which prevents me from subclassing it. Its only child, SingleRowBlock, has a package-private constructor, and there are no factories or builders that could build an instance for me.
How do I implement struct support in BigQueryRecordCursor::getObject?
(Reminder: BigQueryRecordCursor is a child of RecordCursor.)
You need to assemble the block for the row by calling beginBlockEntry, appending the values for each column via Type.writeXXX with the column's type, and then calling closeEntry. Here's some pseudo-code.
BlockBuilder builder = type.createBlockBuilder(..);
BlockBuilder entryBuilder = builder.beginBlockEntry();
for each column {
    ...
    columnType.writeXXX(entryBuilder, ...);
}
builder.closeEntry();
return (Block) type.getObject(builder, 0);
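Fleshed out a bit, a getObject helper might look like the following. This is a sketch under assumptions, not code from the Presto tree: it uses the io.prestosql SPI packages, assumes the struct's fields have already been decoded from BigQuery into a fieldValues list, and only handles two of the possible Java types.

import io.airlift.slice.Slice;
import io.airlift.slice.Slices;
import io.prestosql.spi.block.Block;
import io.prestosql.spi.block.BlockBuilder;
import io.prestosql.spi.type.RowType;
import io.prestosql.spi.type.Type;
import java.util.List;

// Hypothetical helper: builds the single-row Block that getObject must return.
private static Block buildRowBlock(RowType rowType, List<Object> fieldValues) {
    BlockBuilder blockBuilder = rowType.createBlockBuilder(null, 1);
    BlockBuilder entryBuilder = blockBuilder.beginBlockEntry();
    List<RowType.Field> fields = rowType.getFields();
    for (int i = 0; i < fields.size(); i++) {
        Type fieldType = fields.get(i).getType();
        Object value = fieldValues.get(i);
        if (value == null) {
            entryBuilder.appendNull();
        }
        else if (fieldType.getJavaType() == long.class) {
            fieldType.writeLong(entryBuilder, ((Number) value).longValue());
        }
        else if (fieldType.getJavaType() == Slice.class) {
            fieldType.writeSlice(entryBuilder, Slices.utf8Slice(value.toString()));
        }
        // ... handle double, boolean and nested Block field types analogously
    }
    blockBuilder.closeEntry();
    return (Block) rowType.getObject(blockBuilder, 0);
}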
However, I suggest you use the columnar APIs, instead (i.e., ConnectorPageSource and friends). Take a look at how the Elasticsearch connector implements it:
https://github.com/prestosql/presto/blob/master/presto-elasticsearch/src/main/java/io/prestosql/elasticsearch/ElasticsearchPageSourceProvider.java
https://github.com/prestosql/presto/blob/master/presto-elasticsearch/src/main/java/io/prestosql/elasticsearch/ElasticsearchPageSource.java
Here's how it handles Row types:
https://github.com/prestosql/presto/blob/master/presto-elasticsearch/src/main/java/io/prestosql/elasticsearch/decoders/RowDecoder.java
Also, I suggest you join #dev channel on the Presto Community Slack, where all the Presto developers hang out.

SOAPUI context variables - How does Groovy make this possible?

Sorry to all you Groovy dudes if this is a bit of a noob question.
In SoapUI, I can create a Groovy script where I can add an arbitrary variable to the run context and retrieve it at a later time.
context.previouslyUndefinedVariable = 3
def num = context.previouslyUndefinedVariable
What feature of Groovy allows previously undefined variables to be added to an object like this? I would like to learn more about it.
Many thanks in advance!
Groovy has the ability to dynamically add methods to a class through metaprogramming.
To learn more, see:
What is Groovy's MetaClass used for?
Groovy Goodness: Add Methods Dynamically to Classes with ExpandoMetaClass
Runtime and compile-time metaprogramming
The accepted answer is a rather poor explanation of how SoapUI does it.
In this case, context is always an instance of some SoapUI library Java class (such as WsdlTestRunContext), and these are all implementations of Map. You can check context.getClass() and assert context in Map.
When you look up a property on a Map, Groovy uses the getAt and putAt methods. There are various syntaxes you can use; given a variable holding the key, def someUndef = 'someUndef', all of these are equivalent:
context.someUndef
context.'someUndef'
context[someUndef]
context['someUndef']
context.getAt('someUndef')
And
context.someUndef = 3
context.'someUndef' = 3
context[someUndef] = 3
context['someUndef'] = 3
context.putAt('someUndef', 3)
I like to use any of the above that include quote marks, so that Groovy-Eclipse doesn't flag it as a missing property.
It's also interesting that Groovy looks for a getAt() method before it checks for a getter being invoked via property syntax.
For example, consider evaluating "foo".class. The String instance doesn't have a property called class and it also doesn't have a method getAt(String), so the next thing it tries is to look for a "get" method with that name, i.e. getClass(), which it finds, and we get our result: String.
But with a map, ['class':'bar'].class refers to the method call getAt('class') first, which will be 'bar'. If we want to know what type of Map it is, we have to be more specific and write in full: ['class':'bar'].getClass() which will be LinkedHashMap.
We still have to specify getClass() even if that Map doesn't have a matching key, because ['foo':'bar'].class will still mean ['foo':'bar'].getAt('class'), which will be null.

In Groovy, can I override java-style casting syntax on POJO classes?

I would like to be able to use plain Java-style implicit/explicit casting instead of asType overrides so that sources written in Java work properly. I've overridden asType on String, similarly to the approach suggested in How to overload some Groovy Type conversion for avoiding try/catch of NumberFormatException?, like:
oldAsType = String.metaClass.getMetaMethod("asType", [Class] as Class[])
String.metaClass.asType = { Class typ ->
    if (Foo.class.isAssignableFrom(typ)) {
        Foo.myCast(delegate)
    } else {
        oldAsType.invoke(delegate, typ)
    }
}
I'd like all of these options to work:
// groovy
String barString
Foo foo = barString asType(Foo.class) // asType works but
Foo foo = barString // implicit cast fails
Foo foo = (Foo) barString // explicit cast fails
The latter two fail because Groovy is using DefaultTypeTransformation.castToType, which doesn't attempt to invoke new Foo() unless the object to be cast is either one of a slew of special cases or some sort of Collection type.
Note that the solution Can I override cast operator in Groovy? doesn't solve the issue because the code that is doing the casting is regular Java code that I cannot alter, at least not at the source code level. I'm hoping that there is either a secret hook into casting or a way to override the static castToType method (in a Java class, called by another Java class - which Can you use Groovy meta programming to override a private method on a Java class says is unsupported)... or some other clever approach I haven't thought of.
Edit: The question is about using Java-style casting syntax, essentially to use Groovy facilities to add an autoboxing method. Groovy calls this mechanism "casting," for better or worse (see DefaultTypeTransformation.castToType as referenced above). In particular, I have replaced an enum with a resourced class and want to retain JSON serialization. Groovy's JSON package automatically un/marshals enum values of instance members to strings, and I'm trying to make the replacement class serialize compatibly with minimal changes to the source code.
Part of the problem here is you are confusing conversion with casting. Using the "as" operator is not the same thing as imposing a cast. They seem similar, but they serve separate purposes.
Foo foo = (Foo) barString
That doesn't say something like "create a Foo out of barString". That says "Declare a reference named foo, associate the static type Foo with that reference and then point that reference at the object on the heap that the reference barString currently points to.". Unlike languages like C++, Groovy and Java do not allow you to ever get in a situation where a reference points at an object that is of a type that is incompatible with the reference's type. If you ever got into a situation where a Foo reference was pointing to a String on the heap, that would represent a bug in the JVM. It cannot be done. You can come up with ways to create Foo objects out of String objects, but that isn't what the code above is about.
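To make the distinction concrete in plain Java (a hedged sketch; Foo stands in for any type that String does not implement):

// A cast only retypes a reference; it never converts the underlying object.
Object boxed = "bar";                    // the heap object is a String
CharSequence cs = (CharSequence) boxed;  // fine: String is-a CharSequence, no new object created

// Foo foo = (Foo) boxed;                // compiles when Foo is an interface, but throws
//                                       // ClassCastException at runtime: the cast never
//                                       // manufactures a Foo out of the String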
The answer appears to be "no". Absent a rewrite of DefaultTypeTransformation.castToType to allow for this sort of metaprogramming, the implication is to use another implementation strategy or a different language.

Storing object in Esent persistent dictionary gives: Not supported for SetColumn Parameter error

I am trying to save an Object which implements an Interface say IInterface.
private PersistentDictionary<string, IInterface> Object = new PersistentDictionary<string, IInterface>(Environment.CurrentDirectory + @"\Object");
Since many classes implement the same interface (all of which need to be cached), for a generic approach I want to store an object of type IInterface in the dictionary,
so that anywhere I can pull that object out, cast it as IInterface, and use that object's internal implementation of methods, etc.
But as soon as the Esent cache is initialized, it throws this error:
Not supported for SetColumn
Parameter name: TColumn
Actual value was IInterface.
I have tried using XmlSerializer to do the same, but it is unable to deserialize an interface type. Also, the [Serializable] attribute cannot be used on top of an interface, so I am stuck.
I have also tried marking all the implementations (classes) of the interface as [Serializable] as a last-ditch attempt, but to no avail.
Does anyone know a way out? Thanks in advance!
The only reason that only structs are supported (as well as some basic immutable classes such as string) is that the PersistentDictionary is meant to be a drop-in replacement for Dictionary, SortedDictionary and other similar classes.
Suppose I have the following code:
class MyClass
{
    public int val;
}

// ...

var dict = new Dictionary<int, MyClass>();
var x = new MyClass();
x.val = 1;
dict.Add(0, x);
x.val = 2;
var y = dict[0];
Console.WriteLine(y.val);
The output in this case would be 2. But if I'd used the PersistentDictionary instead of the regular one, the output would be 1. The class was created with value 1, and then changed after it was added to the dictionary. Since a class is a reference type, when we retrieve the item from the dictionary, we will also have the changed data.
Since the PersistentDictionary writes the data to disk, it cannot really handle reference types this way. Serializing it, and writing it to disk is essentially the same as treating the object as a value type (an entire copy is made).
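The "entire copy is made" point holds for any serializer. Here is a minimal round-trip sketch, written in Java purely as an analogy to the C# behavior described above (not Esent API usage; the helper name is hypothetical):

import java.io.*;

// Serializing and then deserializing yields a deep copy: mutations made to
// the original afterwards are not visible in the copy, i.e. value-type semantics.
static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(obj);
    }
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        @SuppressWarnings("unchecked")
        T copy = (T) in.readObject();
        return copy;
    }
}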
Because it's intended to be used instead of the standard dictionaries, and the fact that it cannot handle reference types with complete transparency, the developers instead opted to support only structs, because structs are value types already.
However, if you're aware of this limitation and promise to be careful not to fall into this trap, you can allow it to serialize classes quite easily. Just download the source code and compile your own version of the EsentCollections library. The only change you need to make to it is to change this line:
if (!(type.IsValueType && type.IsSerializable))
to this:
if (!type.IsSerializable)
This will allow classes to be written to the PersistentDictionary as well, provided that the class is Serializable and its members are Serializable too. A huge benefit is that it will also allow you to store arrays this way. All you have to keep in mind is that it's not a real dictionary; when you write an object to it, it stores a copy of the object. Therefore, updating any of your object's members after adding it to the PersistentDictionary will not automatically update the copy in the dictionary; you'd need to remember to update it manually.
PersistentDictionary can only store value-structs and a very limited subset of classes (string, Uri, IPAddress). Take a look at ColumnConverter.cs, at private static bool IsSerializable(Type type) for the full restrictions. You'd be hitting the typeinfo.IsValueType() restriction.
By the way, you can also try posting questions about PersistentDictionary at http://managedesent.codeplex.com/discussions.
-martin
