Elasticsearch DSL, store fields and dictionary - python-3.x

I have some fields with a known mapping and some with unknown mappings, and I want to store all of them.
Mapping:
class MyDoctype(DocType):
    ...
    known_field = String(index='not_analyzed')
    ...
    unknown_dict = Nested()  # How can I store this dict ???
This should be possible, as Elasticsearch 2.x can handle such a mixed mapping.
Is the Elasticsearch DSL based on strict mappings behind the scenes?
I also looked at the persistence docs, but they seem to rely on strict mappings everywhere.

You can use Object.
Tested on Elasticsearch 6.x and elasticsearch-dsl 6.x:
from elasticsearch_dsl import DocType, Keyword, Object

class MyDoctype(DocType):
    ...
    known_field = Keyword()  # on 6.x, Keyword() replaces String(index='not_analyzed')
    ...
    unknown_dict = Object()

Related

JOOQ generator: use existing enums

I am using the nu.studer.jooq Gradle plugin to generate POJOs, tables and records for a PostgreSQL database whose tables have fields of type ENUM.
We already have the enums in the application, so I would like the generator to use those enums instead of generating new ones.
In build.gradle I set udts = false for the generator, so it doesn't generate the enums, and I wrote a custom generator strategy that sets the package for the enums to the one of the already existing enums.
The issue is in the generated table fields: SQLDataType.VARCHAR.asEnumDataType(mypackage.ExistingEnum.class) doesn't work because mypackage.ExistingEnum does not implement org.jooq.EnumType.
public enum ExistingEnum {
    VAL1, VAL2
}
Generated table record:
public class EntryTable extends TableImpl<EntryRecord> {
    public final TableField<EntryRecord, ExistingEnum> MY_FIELD =
        createField(DSL.name("my_field"), SQLDataType.VARCHAR.asEnumDataType(mypackage.ExistingEnum.class), this, "");
}
Is there something I can do to fix this issue? Also, we have a lot of enums, so writing a converter for each of them by hand is not practical.
The point of having custom enum types is that they are individual types, independent of whatever you encode with your database enum types. As such, the jOOQ code generator cannot make any automated assumptions related to how to map the generated types to the custom types. You'll have to implement Converter types of some sort.
If you're not relying on the jOOQ provided EnumType types, you could use the <enumConverter/> configuration, or write implementations based on org.jooq.impl.EnumConverter, which help reduce boilerplate code.
If you have some conventions or rules for how to map things a bit more automatically (just because jOOQ doesn't know your convention doesn't mean you can't encode it yourself), you could implement a programmatic code generation configuration, where you query your dictionary views (e.g. PG_CATALOG.PG_ENUM) to generate ForcedType objects. You can even use jOOQ-meta for that purpose.
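For illustration, here is a minimal sketch of the EnumConverter route (the class names, packages and the include expression are assumptions based on the question, not tested configuration). A reusable converter built on org.jooq.impl.EnumConverter maps the VARCHAR value read from PostgreSQL onto the existing application enum by literal name:

package mypackage;

import org.jooq.impl.EnumConverter;

// Converts between the database value (read as a String) and the existing
// application enum, matching on the enum literal's name.
public class ExistingEnumConverter extends EnumConverter<String, ExistingEnum> {
    public ExistingEnumConverter() {
        super(String.class, ExistingEnum.class);
    }
}

The converter is then attached to the relevant columns through a forced type. With a programmatic generator configuration you can build the ForcedType objects in code, for example from whatever you read out of PG_CATALOG.PG_ENUM, instead of listing every enum by hand (again, names and the match expression below are illustrative):

import java.util.List;

import org.jooq.meta.jaxb.ForcedType;

// Illustrative helper producing ForcedType entries for the code generator.
// A real implementation could query PG_CATALOG.PG_ENUM and emit one entry
// per database enum / application enum pair.
public class EnumForcedTypes {
    public static List<ForcedType> all() {
        return List.of(
            new ForcedType()
                .withUserType("mypackage.ExistingEnum")
                .withConverter("mypackage.ExistingEnumConverter")
                .withIncludeExpression(".*\\.my_field"));
    }
}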

RowType support in Presto

A question for those who know the Presto plugin API.
I'm implementing a BigQuery plugin. BigQuery supports a struct type, which can be represented by the RowType class in Presto.
RowType creates a RowBlockBuilder in RowType::createBlockBuilder, whose RowBlockBuilder::appendStructure method accepts only instances of the AbstractSingleRowBlock class.
This means that in my implementation of Presto's RecordCursor, the BigQueryRecordCursor::getObject method has to return something that is an AbstractSingleRowBlock for a field of type RowType.
But AbstractSingleRowBlock has a package-private abstract method, which prevents me from implementing that class. Its only child, SingleRowBlock, has a package-private constructor, and there are no factories or builders that could build an instance for me.
How do I implement struct support in BigQueryRecordCursor::getObject?
(Reminder: BigQueryRecordCursor is a child of RecordCursor.)
You need to assemble the block for the row by calling beginBlockEntry, appending the values for each column via Type.writeXXX with the column's type and then closeEntry. Here's some pseudo-code.
BlockBuilder rowBuilder = type.createBlockBuilder(null, 1);  // type is the field's RowType
BlockBuilder entryBuilder = rowBuilder.beginBlockEntry();
for (/* each column of the row */) {
    ...
    columnType.writeXXX(entryBuilder, ...);  // write with the column's own Type
}
// close the entry and read the row back from the outer builder, not the entry builder
rowBuilder.closeEntry();
return (Block) type.getObject(rowBuilder.build(), 0);
However, I suggest you use the columnar APIs instead (i.e., ConnectorPageSource and friends). Take a look at how the Elasticsearch connector implements it:
https://github.com/prestosql/presto/blob/master/presto-elasticsearch/src/main/java/io/prestosql/elasticsearch/ElasticsearchPageSourceProvider.java
https://github.com/prestosql/presto/blob/master/presto-elasticsearch/src/main/java/io/prestosql/elasticsearch/ElasticsearchPageSource.java
Here's how it handles Row types:
https://github.com/prestosql/presto/blob/master/presto-elasticsearch/src/main/java/io/prestosql/elasticsearch/decoders/RowDecoder.java
Also, I suggest you join #dev channel on the Presto Community Slack, where all the Presto developers hang out.

Is use of enums justified in this case?

I used to maintain configurations as dicts in the past, and then I stumbled upon enums in Python.
The following is what I used to do before:
CONFIG = {
    "field1": {"field11": "value11", ....},
    "field2": {"field12": "value22", .....},
}
This would be a global that contains some configuration my application uses.
I then converted the same thing to enums as follows:
from enum import Enum, unique

@unique
class Config(Enum):
    field1 = {"field11": "value11", .....}
    field2 = {"field22": "value22", .....}
The benefit of using enums was quite hazy at first, but when I dug deeper I found that enums are immutable, that uniqueness can be enforced, and that they offer a cleaner way to iterate over their members.
I checked whether this pattern is used in any third-party or standard Python libraries, and found that the majority of them use a plain class as follows:
class Config:
    field1 = {"field11": "value11", .....}
    field2 = {"field22": "value22", .....}
So my question is: are enums a good choice for holding configs which shouldn't be accidentally changed, or is that overkill and can one get away with using a class instead?
I'd like to know which one is considered best practice.
The main advantage of using an enum here is that it allows writing symbolic constants in the code, whereas with a dictionary you have to look the value up by key, e.g.:
Config.field1
versus
CONFIG["field1"]
So the difference is an advantage in syntax, but also that an enum is inherently immutable unlike a dictionary, and that an enum can't be extended unlike a class.

Why does spark mllib define DoubleParam, IntParam, FloatParam, etc?

In Spark's params.scala, there are definitions for DoubleParam, IntParam, FloatParam, etc. Why did the developers define those classes?
As can be seen in the params.scala file, Param is a class used by Spark. There are two comments in the code explaining that the primitive-typed params exist to make them more friendly to Java users:
... Primitive-typed param should use the specialized versions, which are more friendly to Java users.
and:
// specialize primitive-typed params because Java doesn't recognize scala.Double, scala.Int, ...
Hence, Double, Int, Float, Long, Boolean and some Array types all have their own specific implementations. These use Java classes, as can be seen here (for Array[Array[Double]]):
/** Creates a param pair with a `java.util.List` of values (for Java and Python). */
def w(value: java.util.List[java.util.List[java.lang.Double]]): ParamPair[Array[Array[Double]]] =
w(value.asScala.map(_.asScala.map(_.asInstanceOf[Double]).toArray).toArray)
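As a rough illustration of that Java-friendliness (a sketch only: it assumes Spark MLlib on the classpath and uses LogisticRegression merely because it exposes an IntParam, maxIter, and a DoubleParam, regParam):

import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.param.ParamPair;

public class ParamDemo {
    public static void main(String[] args) {
        LogisticRegression lr = new LogisticRegression();

        // maxIter() is an IntParam and regParam() a DoubleParam; their
        // specialized w(...) overloads accept plain Java primitives, so the
        // Java caller never has to deal with scala.Int or scala.Double.
        ParamPair<?> maxIter = lr.maxIter().w(10);
        ParamPair<?> regParam = lr.regParam().w(0.01);

        System.out.println(maxIter);
        System.out.println(regParam);
    }
}

Without the specialized subclasses, a Param[Int] would only expose the erased w(Object) signature to Java callers, which is the Java-friendliness the quoted comments refer to.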

Spring Data Mongo: mapping objects using Jackson annotations

I'm using spring-data Mongo (1.3.3) as a mechanism for accessing Mongo.
My domain objects are written in Groovy and I use Jackson annotations to define properties and names:
@JsonProperty('is_author')
boolean author = false
@JsonProperty('author_info')
AuthorInfo authorInfo
When I persist one of my domain objects to Mongo, the @JsonProperty annotation is ignored and the field is persisted under the object's standard field name.
By digging into the Spring Data Mongo documentation, I found out that the library expects a @Field annotation to change a field's name in Mongo.
Is there a way to use only the Jackson annotations instead of two annotations to achieve the same result? Maybe a "customized" version of MappingMongoConverter?
Since my application is in Groovy, I have used the new @AnnotationCollector AST transformation (http://blog.andresteingress.com/2013/01/25/groovy-2-1-the-annotationcollector-annotation/) to "merge" the Jackson and Spring Data Mongo annotations. Here is how it looks: simple and effective!
package com.someapp
import com.fasterxml.jackson.annotation.JsonProperty
import groovy.transform.AnnotationCollector
import org.springframework.data.mongodb.core.mapping.Field
@AnnotationCollector([Field, JsonProperty])
public @interface JsonMongoProperty {}
And here is how it is used:
@JsonMongoProperty('is_author')
boolean author = false
@JsonMongoProperty('author_info')
AuthorInfo authorInfo
