FSharp.Data: Optional Elements in XML

Within a sample XML file for the FSharp.Data XML type provider, I have elements that are optional, like so:
<RootElement>
<MandatoryElement> ... </MandatoryElement>
<OptionalElement> ... </OptionalElement>
<AnotherElement> ... </AnotherElement>
</RootElement>
I don't know how to specify the OptionalElement as optional. There can be only one RootElement, so I cannot add another one lacking the OptionalElement. How can I tell the parser that OptionalElement is actually optional?

The XML Type Provider works by inferring the type from the sample. You can provide more than one sample by using the optional SampleIsList argument:
open FSharp.Data
type RootElement = XmlProvider<"""
<samples>
<RootElement>
<MandatoryElement> ... </MandatoryElement>
<OptionalElement> ... </OptionalElement>
<AnotherElement> ... </AnotherElement>
</RootElement>
<RootElement>
<MandatoryElement> ... </MandatoryElement>
<AnotherElement> ... </AnotherElement>
</RootElement>
</samples>""", SampleIsList = true>
From this list of samples, the XML Type Provider infers that OptionalElement is, well... optional, and types it as a string option:
let x = RootElement.Parse """
<RootElement>
<MandatoryElement> ... </MandatoryElement>
<OptionalElement> ... </OptionalElement>
<AnotherElement> ... </AnotherElement>
</RootElement>"""
let y = RootElement.Parse """
<RootElement>
<MandatoryElement> ... </MandatoryElement>
<AnotherElement> ... </AnotherElement>
</RootElement>"""
Usage:
> y.OptionalElement.IsSome;;
val it : bool = false
> x.OptionalElement.IsSome;;
val it : bool = true
> x.OptionalElement |> Option.get;;
val it : string = " ... "

There is no way to explicitly specify "optional" within XML itself. The XML type provider infers this if it sees the element in some places but not others, but that is only an educated guess.
To explicitly and strictly specify which elements are optional, which may occur multiple times, and so on, there is "XML Schema", also known as "XSD". Unfortunately, the XML type provider does not support XSD at the moment, although there is an open issue for it.
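For reference, XSD expresses optionality with minOccurs="0". A sketch of what such a schema could look like for the question's elements (the schema itself is hypothetical and not usable with the type provider today):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="RootElement">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="MandatoryElement" type="xs:string"/>
        <!-- minOccurs="0" is XSD's way of saying "optional" -->
        <xs:element name="OptionalElement" type="xs:string" minOccurs="0"/>
        <xs:element name="AnotherElement" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```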
One hack I can offer you is this: nest your root element under another, "super-root" element, and include two copies of the real root, one with and one without the optional element, which will let the type provider infer the optionality. The type provider will then generate a "super-root" type for you, which you can promptly disregard, using only the nested one, the real root.
Of course, since the XML Type Provider, sadly, doesn't support parsing non-root elements, you will also have to wrap the XML text in the "super-root" element every time you parse it, which limits this solution to small documents.
type Xml = XmlProvider<"""
<SuperRoot>
<RootElement>
<MandatoryElement> ... </MandatoryElement>
<OptionalElement> ... </OptionalElement>
<AnotherElement> ... </AnotherElement>
</RootElement>
<RootElement>
<MandatoryElement> ... </MandatoryElement>
<AnotherElement> ... </AnotherElement>
</RootElement>
</SuperRoot>
""">
let parse xml = (Xml.Parse ("<SuperRoot>" + xml + "</SuperRoot>")).RootElements.[0]

Related

jOOQ XML database generation

I am manually defining a database XML schema in order to use jOOQ's capabilities to generate the corresponding code from the definition.
I am using Gradle to generate the code with jOOQ:
jooq {
version = '3.13.5'
edition = nu.studer.gradle.jooq.JooqEdition.OSS
configurations {
crate {
generationTool {
logging = org.jooq.meta.jaxb.Logging.INFO
generator {
database {
name = 'org.jooq.meta.xml.XMLDatabase'
properties {
property {
key = 'dialect'
value = 'POSTGRES'
}
property {
key = 'xmlFile'
value = 'src/main/resources/crate_information_schema.xml'
}
}
}
target {
packageName = 'it.fox.crate'
directory = 'src/generated/crate'
}
strategy.name = "it.fox.generator.CrateGenerationStrategy"
}
}
}
}
}
and this is the XML file crate_information_schema.xml I am referencing:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<information_schema xmlns="http://www.jooq.org/xsd/jooq-meta-3.14.0.xsd">
<schemata>
<schema>
<catalog_name></catalog_name>
<schema_name>doc</schema_name>
<comment></comment>
</schema>
</schemata>
<tables>
<table>
<table_catalog></table_catalog>
<table_schema>doc</table_schema>
<table_name>events</table_name>
<table_type>BASE TABLE</table_type>
<comment></comment>
</table>
</tables>
<columns>
<column>
<table_catalog></table_catalog>
<table_schema>doc</table_schema>
<table_name>events</table_name>
<column_name>data_block['angularPositionArray']</column_name>
<data_type>real_array</data_type>
<character_maximum_length>0</character_maximum_length>
<numeric_precision>19</numeric_precision>
<numeric_scale>0</numeric_scale>
<ordinal_position>1</ordinal_position>
<is_nullable>false</is_nullable>
<comment>angularPositionArray</comment>
</column>
<column>
<table_catalog></table_catalog>
<table_schema>doc</table_schema>
<table_name>events</table_name>
<column_name>data_block['eventId']</column_name>
<data_type>bigint(20)</data_type>
<character_maximum_length>0</character_maximum_length>
<numeric_precision>19</numeric_precision>
<numeric_scale>0</numeric_scale>
<ordinal_position>1</ordinal_position>
<is_nullable>false</is_nullable>
<comment>eventId</comment>
</column>
</columns>
</information_schema>
The generated code is not correct, because it indicates that the data type used is unknown:
/**
* @deprecated Unknown data type. Please define an explicit {@link org.jooq.Binding} to specify how this type should be handled. Deprecation can be turned off using {@literal <deprecationOnUnknownTypes/>} in your code generator configuration.
*/
@java.lang.Deprecated
public final TableField<EventsRecord, Object> angularPositionArray = createField(DSL.name("data_block['angularPositionArray']"), org.jooq.impl.DefaultDataType.getDefaultDataType("\"real_array\"").nullable(false), this, "angularPositionArray");
I have a couple of questions:
Which is the correct data type for a real array?
Where is the list of supported data types, with the keys to use in the XML?
N.B. CrateDB is an unsupported database, but jOOQ can talk to it using the PostgreSQL driver; the only problem is creating the schema manually.
which is the correct data type for Real Array?
Use <data_type>REAL ARRAY</data_type> (with a space, and in upper case; see the comments and issue #12611).
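Applied to the XML above, the first column definition would then look like this (a sketch based on the question's schema; only the data_type line actually changes):

```xml
<column>
  <table_catalog></table_catalog>
  <table_schema>doc</table_schema>
  <table_name>events</table_name>
  <column_name>data_block['angularPositionArray']</column_name>
  <!-- upper case, with a space, per jOOQ issue #12611 -->
  <data_type>REAL ARRAY</data_type>
  <ordinal_position>1</ordinal_position>
  <is_nullable>false</is_nullable>
  <comment>angularPositionArray</comment>
</column>
```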
where is the list of supported data type with the keys to use in the XML?
It's the same as for any other code generation data source: All the types in SQLDataType are supported. The convention around array types is currently undocumented, but any of HSQLDB's or PostgreSQL's notations should work. The feature request to formally support array types as user defined types via standard SQL INFORMATION_SCHEMA.ELEMENT_TYPES is here: https://github.com/jOOQ/jOOQ/issues/8090
N.B. CrateDB is an unsupported database, but jOOQ can talk to it using the PostgreSQL driver; the only problem is creating the schema manually.
You can obviously use the XMLDatabase for this. I'm guessing you cannot use the JDBCDatabase, because the INFORMATION_SCHEMA is too different, and the PG_CATALOG schema doesn't exist? However, you could easily implement your own org.jooq.meta.Database, too, if that makes more sense.

Groovy DSL given syntax validation

I'm currently experimenting with writing a DSL in Groovy. So far ...
There are some things unclear to me regarding delegation and intercepting unwanted (Closure) structures:
First of all: how can I throw a (type of?) exception that points to the correct line of code in the DSL that fails?
assuming
abstract class MyScript extends Script {
def type(@DelegateTo(MyType) Closure cl) {
cl.delegate = new MyType()
cl()
this
}
}
under
new GroovyShell(this.class.classLoader, new CompilerConfiguration(scriptBaseClass: MyScript.name)).evaluate(…)
the passed DSL / closure
type {
foo: "bar"
}
passes silently.
I'm aware that foo: is just a plain Java label, but I'm not sure what that defined Closure is interpreted as.
Nor did I find anything regarding AST metaprogramming to get hold of any defined labels in order to use them.
When passing in
type {
foo = "bar"
}
it's clear that it will try to set the property foo, but do I really have to intercept unwanted fields/properties with
class MyType {
def propertyMissing(String name) {
… // where I'm unable to println name since this leads to field access 'name' ...
}
}
while the user is still allowed to pass
type {
foo "bar"
}
which leads to a "method not defined" error, so I additionally have to write some metaClass.methodMissing or metaClass.invokeMethod code.
Meanwhile I tend to dismiss closures in my DSL entirely and only work with a simple
def type(Map vars) {
store << new MyType(vars)
// where in the constructor I was forced to write metaClass stuff to validate that only fields are given in the map that are defined in the class
}
That works, but neither draft is what I expected when reading "Groovy is so great for making DSLs" ...
I would experiment with the different options and then settle for one.
To guide your users you should give feedback similar to that of the regular compiler (i.e. line-number and column, maybe the expression).
Enforcing the correctness of the input can be non-trivial -- depending on your DSL.
For example:
type {
foo: "bar"
}
is just a closure that returns the String "bar". Is that something your user is supposed to do? The label will be part of the AST, AFAIK in org.codehaus.groovy.ast.stmt.Statement.statementLabels. If you want this syntax to assign something to foo, then you'll need to rewrite the AST: the Expression could become a Declaration for the local Variable foo, or could become an assignment to the Field foo. That's really up to you. However, Groovy gives you some capabilities that make creating a DSL easier:
You already used @DelegateTo(MyType), so you could just add a Field foo to MyType:
class MyType {
String foo
}
And then either use @CompileStatic or @TypeChecked to verify your script. Note that @CompileStatic will deactivate run-time metaprogramming (i.e. propertyMissing etc. won't be called anymore), while @TypeChecked does not. This, however, will only verify type correctness: assigning to anything but a declared Field will fail, and assigning an incompatible type will fail. It does not verify that something has been assigned to foo at all. If that is required, you can verify the contents of the delegate after calling the Closure.
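One way to wire that up, assuming the GroovyShell setup from the question (a sketch; MyScript and dslText are the question's own names, and the customizer applies @TypeChecked to the whole script):

```groovy
import groovy.transform.TypeChecked
import org.codehaus.groovy.control.CompilerConfiguration
import org.codehaus.groovy.control.customizers.ASTTransformationCustomizer

// Apply @TypeChecked globally to scripts compiled by this shell
def config = new CompilerConfiguration(scriptBaseClass: MyScript.name)
config.addCompilationCustomizers(new ASTTransformationCustomizer(TypeChecked))

// Type errors now surface as compilation errors carrying
// line/column information pointing into the DSL source
new GroovyShell(this.class.classLoader, config).evaluate(dslText)
```

This also addresses the first question: type-checking failures are reported as compilation errors with the offending DSL line number, rather than passing silently at run time.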

Setting types of parsed values in Antlr

I have a rule that looks like this:
INTEGER : [0-9]+;
myFields : uno=INTEGER COMMA dos=INTEGER
Right now to access uno I need to code:
Integer i = Integer.parseInt(myFields.uno.getText())
It would be much cleaner if I could tell ANTLR to do that conversion for me; then I would just need to code:
Integer i = myFields.uno
What are my options?
You could write the code as an action, but it would still be an explicit conversion (eventually). The parser (like every parser) parses the text, and then it's up to "parsing events" (achieved by a listener, visitor, or actions in ANTLR4) to create meaningful structures/objects.
Of course you could extend some of the generated or built-in classes and then get the type directly, but as mentioned before, at some point you'll always need to convert text to some type needed.
A standard way of handling custom operations on tokens is to embed them in a custom token class:
public class MyToken extends CommonToken {
....
public Integer getInt() {
return Integer.parseInt(getText()); // TODO: error handling
}
}
Also create
public class MyTokenFactory extends TokenFactory { .... }
to source the custom tokens. Add the factory to the lexer using Lexer#setTokenFactory().
Within the custom TokenFactory, override the method
Symbol create(int type, String text); // (typically override both factory methods)
to construct and return a new MyToken.
Given that the signature includes the target token type type, custom type-specific token subclasses could be returned, each with their own custom methods.
A couple of issues with this, though. First, in practice it is not typically needed: the assignment variable is statically typed, so as in the OP's example,
Integer i = myFields.uno.getInt(); // no cast required
If an Integer is desired and expected, use getInt(). If a Boolean ....
Second, ANTLR's options allow setting a TokenLabelType to preclude the requirement to manually cast custom tokens:
options { TokenLabelType = "MyToken"; }
Use of only one token label type is supported, so to use multiple token types, manual casting is required.

marshal JAXB generated classes without XmlRootElement with Apache camel

In order to marshal JAXB classes with Apache Camel, the JAXB class needs to include an @XmlRootElement annotation.
When generating JAXB classes from an XSD, the @XmlRootElement annotation might not be generated.
This will lead to an exception during marshalling:
"No type converter available to convert from type: "
As soon as I add the @XmlRootElement manually, everything works fine, but since these JAXB classes are generated, adding the annotation manually is no option.
According to the Camel documentation, in such a case the JaxbDataFormat can be set to fragment(true):
JaxbDataFormat jaxbMarshal = new JaxbDataFormat();
jaxbMarshal.setContextPath(ObjectFactory.class.getPackage().getName());
jaxbMarshal.setFragment(true);
Unfortunately I still get the same exception.
Is there a way to configure JaxbDataFormat different, i.e. to define the JAXBElement which is the root element, like I would do in Java
marshaller.marshal( new JAXBElement( new QName("uri","local"),
MessageType.class, messageType ));
or is there another strategy available to get the XML marshalled?
EDIT
the used route :
from("file://inbox").unmarshal(jaxbDataFormat)
.marshal(jaxbDataFormat).to("file://outbox");
the stacktrace:
java.io.IOException: org.apache.camel.NoTypeConversionAvailableException: No type converter
available to convert from type: com.xyz.AddressType to the required
type: java.io.InputStream with value com.xyz.AddressType@32317e9d at
org.apache.camel.converter.jaxb.JaxbDataFormat.marshal(JaxbDataFormat.java:148)
~[camel-jaxb-2.16.0.jar:2.16.0] at
org.apache.camel.processor.MarshalProcessor.process(MarshalProcessor.java:83)
~[camel-core-2.16.0.jar:2.16.0] at
...
[na:1.8.0_25] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_25]
Caused by: org.apache.camel.NoTypeConversionAvailableException: No
type converter available to convert from type: com.xyz.AddressType to
the required type: java.io.InputStream with value
com.xyz.AddressType@32317e9d at
org.apache.camel.impl.converter.BaseTypeConverterRegistry.mandatoryConvertTo(BaseTypeConverterRegistry.java:185)
~[camel-core-2.16.0.jar:2.16.0] at
...
In Camel 2.17, the @XmlRootElement was not required. As of 2.21, it is. Unless...
The class org.apache.camel.converter.jaxb.FallBackTypeConverter changed its implementation from:
protected <T> boolean isJaxbType(Class<T> type) {
return hasXmlRootElement(type) || JaxbHelper.getJaxbElementFactoryMethod(camelContext, type) != null;
}
To:
protected <T> boolean isJaxbType(Class<T> type) {
if (isObjectFactory()) {
return hasXmlRootElement(type) || JaxbHelper.getJaxbElementFactoryMethod(camelContext, type) != null;
} else {
return hasXmlRootElement(type);
}
}
By default the isObjectFactory() method returns false. If you set the property CamelJaxbObjectFactory on your CamelContext to true, then the JaxbHelper.getJaxbElementFactoryMethod(camelContext, type) lookup is consulted and deserialization works again as before, without the need for an @XmlRootElement. For completeness:
<camelContext xmlns="http://camel.apache.org/schema/spring" id="camelContext">
<properties>
<property key="CamelJaxbObjectFactory" value="true"/>
</properties>
</camelContext>
I experienced the equivalent behaviour with JAXB (the @XmlRootElement annotation not being present in the generated class), and I suppose it comes from the way the root element is defined in the XML schema.
For example:
<xsd:element name="DiffReport" type="DiffReportType" />
<xsd:complexType name="DiffReportType">
...
</xsd:complexType>
it will generate the DiffReportType class without the @XmlRootElement annotation. But if you directly define your root element as follows, you'll get the annotation set in your generated class (the name of the root class is then DiffReport in my example).
<xsd:element name="DiffReport">
<xsd:complexType>
...
Note: I used the first way to define the complex types in my schema for class name consistency.
You can use the "partClass" option of Camel's JAXB data format. Your question is answered in the Camel docs for JAXB, which describe how to marshal XML fragments (or XML generated without the @XmlRootElement annotation).
Use partClass and provide the actual class name to which you wish to marshal. In the case of marshalling you also have to provide the partNamespace, which is the target namespace of the desired XML object.
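Building on the JaxbDataFormat configuration from the question, that could look like the following sketch (com.xyz.MessageType and the "uri"/"local" namespace values are placeholders taken from the question, not a verified schema):

```java
import javax.xml.namespace.QName;
import org.apache.camel.converter.jaxb.JaxbDataFormat;

JaxbDataFormat jaxbMarshal = new JaxbDataFormat();
jaxbMarshal.setContextPath(ObjectFactory.class.getPackage().getName());
jaxbMarshal.setFragment(true);
// Tell Camel which generated class to treat as the root element...
jaxbMarshal.setPartClass("com.xyz.MessageType");
// ...and under which qualified name to emit it
jaxbMarshal.setPartNamespace(new QName("uri", "local"));
```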

How to define a CAS in database as external resource for an annotator in uimaFIT?

I am trying to structure my data processing pipeline using uimaFIT as follows:
[annotatorA] => [Consumer to dump annotatorA's annotations from CAS into DB]
[annotatorB (should take on annotatorA's annotations from DB as input)]=>[Consumer for annotatorB]
The driver code:
/* Step 0: Create a reader */
CollectionReader readerInstance= CollectionReaderFactory.createCollectionReader(
FilePathReader.class, typeSystem,
FilePathReader.PARAM_INPUT_FILE,"/path/to/file/to/be/processed");
/*Step 1: Define Annotator A*/
AnalysisEngineDescription annotatorAInstance=
AnalysisEngineFactory.createPrimitiveDescription(
annotatorADbConsumer.class, typeSystem,
annotatorADbConsumer.PARAM_DB_URL,"localhost",
annotatorADbConsumer.PARAM_DB_NAME,"xyz",
annotatorADbConsumer.PARAM_DB_USER_NAME,"name",
annotatorADbConsumer.PARAM_DB_USER_PWD,"pw");
builder.add(annotatorAInstance);
/* Step2: Define binding for annotatorB to take
what-annotator-a put in DB above as input */
/*Step 3: Define annotator B */
AnalysisEngineDescription annotatorBInstance =
AnalysisEngineFactory.createPrimitiveDescription(
GateDateTimeLengthAnnotator.class,typeSystem)
builder.add(annotatorBInstance);
/*Step 4: Run the pipeline*/
SimplePipeline.runPipeline(readerInstance, builder.createAggregate());
Questions I have are:
Is the above approach correct?
How do we define the dependency on annotatorA's output in annotatorB in step 2?
Is the approach suggested at https://code.google.com/p/uimafit/wiki/ExternalResources#Resource_injection the right direction to achieve it?
You can define the dependency with @TypeCapability like this:
@TypeCapability(inputs = { "com.myproject.types.MyType", ... }, outputs = { ... })
public class MyAnnotator extends JCasAnnotator_ImplBase {
....
}
Note that it defines a contract at the annotation level, not the engine level (meaning that any Engine could create com.myproject.types.MyType).
I don't think there are ways to enforce it.
I did create some code to check that an Engine is provided with the required Annotations upstream in a pipeline, and it prints an error log otherwise (see Pipeline.checkAndAddCapabilities() and Pipeline.addCapabilities()). Note however that it will only work if all Engines define their TypeCapabilities, which is often not the case when one uses external Engines/libraries.