Databricks spark-xml: values are null when reading tags ending in "/>"

I'm using the latest version of spark-xml (0.4.1) with Scala 2.11. When I read XML that contains self-closing tags (ending with "/>"), the corresponding values are null. An example follows:
XML:
<Clients>
<Client ID="1" name="teste1" age="10">
<Operation ID="1" name="operation1">
</Operation>
<Operation ID="2" name="operation2">
</Operation>
</Client>
<Client ID="2" name="teste2" age="20"/>
<Client ID="3" name="teste3" age="30">
<Operation ID="1" name="operation1">
</Operation>
<Operation ID="2" name="operation2">
</Operation>
</Client>
</Clients>
Dataframe:
+----+------+----+--------------------+
| _ID| _name|_age| Operation|
+----+------+----+--------------------+
| 1|teste1| 10|[[1,operation1], ...|
|null| null|null| null|
+----+------+----+--------------------+
Code:
Dataset<Row> clients = sparkSession.sqlContext().read()
        .format("com.databricks.spark.xml")
        .option("rowTag", "Client")
        .schema(getSchemaClient())
        .load(dirtorio);
clients.show(10);

public StructType getSchemaClient() {
    return new StructType(
            new StructField[] {
                    new StructField("_ID", DataTypes.StringType, true, Metadata.empty()),
                    new StructField("_name", DataTypes.StringType, true, Metadata.empty()),
                    new StructField("_age", DataTypes.StringType, true, Metadata.empty()),
                    new StructField("Operation", DataTypes.createArrayType(this.getSchemaOperation()), true, Metadata.empty()) });
}

public StructType getSchemaOperation() {
    return new StructType(new StructField[] {
            new StructField("_ID", DataTypes.StringType, true, Metadata.empty()),
            new StructField("_name", DataTypes.StringType, true, Metadata.empty()),
    });
}

Version 0.5.0 was just released, which resolved issues with self-closing tags. It may resolve this issue. See https://github.com/databricks/spark-xml/pull/352
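Before upgrading, it can help to confirm that the input itself is fine. The sketch below does not use spark-xml at all. It parses a shortened copy of the XML from the question with the JDK's DOM parser and lists the name attribute of every Client element, including the self-closing one. The class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of spark-xml: a plain DOM sanity check.
class SelfClosingCheck {

    // Returns the "name" attribute of every <Client> element, in document order.
    static List<String> clientNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList clients = doc.getElementsByTagName("Client");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < clients.getLength(); i++) {
                names.add(((Element) clients.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Clients>"
                + "<Client ID=\"1\" name=\"teste1\" age=\"10\">"
                + "<Operation ID=\"1\" name=\"operation1\"></Operation>"
                + "</Client>"
                + "<Client ID=\"2\" name=\"teste2\" age=\"20\"/>"
                + "<Client ID=\"3\" name=\"teste3\" age=\"30\"></Client>"
                + "</Clients>";
        // prints [teste1, teste2, teste3]
        System.out.println(clientNames(xml));
    }
}
```

If a standard parser sees all three clients, the null row comes from how the reader handles the self-closing tag and not from the data.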

Related

Clojure: passing a list of records to a Java object

I am experimenting with Clojure. I was able to generate a report in JasperReports, but
it gives null values. The Java object in question is JRBeanCollectionDataSource.
I pass it a list of records, but somehow, the PDF contains only nulls.
Also, why am I not able to pass it {} as an empty HashMap? In Groovy, the [:]
syntax works OK.
(ns jasper.core
(:import
(net.sf.jasperreports.engine JasperCompileManager
JasperFillManager
JasperExportManager)
(net.sf.jasperreports.engine.data JRBeanCollectionDataSource)))
(import 'java.util.HashMap)
(defrecord Car [id name price])
(def data [(->Car 1, "Audi", 52642)
(->Car 2, "Mercedes", 57127)
(->Car 3, "Skoda", 9000)
(->Car 4, "Volvo", 29000)
(->Car 5, "Bentley", 350000)
(->Car 6, "Citroen", 21000)
(->Car 7, "Hummer", 41400)
(->Car 8, "Volkswagen", 21600)])
(def xmlFile "resources/report.xml")
(def jrReport (JasperCompileManager/compileReport xmlFile))
;; (def params {})
(def params (HashMap.))
(def ds (JRBeanCollectionDataSource. data))
(println (.toString ds))
(def jrPrint (JasperFillManager/fillReport jrReport params ds))
(defn -main
[]
(JasperExportManager/exportReportToPdfFile jrPrint "report.pdf"))
This is a rewrite of the following working Groovy solution:
@Grab(group='net.sf.jasperreports', module='jasperreports', version='6.17.0')
import net.sf.jasperreports.engine.JasperCompileManager
import net.sf.jasperreports.engine.JasperFillManager
import net.sf.jasperreports.engine.JasperExportManager
import net.sf.jasperreports.engine.data.JRBeanCollectionDataSource
import groovy.transform.Immutable
@Immutable
class Car {
    Long id;
    String name;
    int price;
}
def data = [
new Car(1L, 'Audi', 52642),
new Car(2L, 'Mercedes', 57127),
new Car(3L, 'Skoda', 9000),
new Car(4L, 'Volvo', 29000),
new Car(5L, 'Bentley', 350000),
new Car(6L, 'Citroen', 21000),
new Car(7L, 'Hummer', 41400),
new Car(8L, 'Volkswagen', 21600),
]
def empty = []
def xmlFile = "report.xml"
def jrReport = JasperCompileManager.compileReport(xmlFile)
def ds = new JRBeanCollectionDataSource(data)
def params = [:]
def jrPrint = JasperFillManager.fillReport(jrReport, params, ds)
JasperExportManager.exportReportToPdfFile(jrPrint, "report.pdf")
Edit: For a fully working example, place the following file in the current working directory for Groovy, and in the resources directory for Clojure.
<?xml version = "1.0" encoding = "UTF-8"?>
<!DOCTYPE jasperReport PUBLIC "//JasperReports//DTD Report Design//EN"
"http://jasperreports.sourceforge.net/dtds/jasperreport.dtd">
<jasperReport xmlns="http://jasperreports.sourceforge.net/jasperreports"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://jasperreports.sourceforge.net/jasperreports
http://jasperreports.sourceforge.net/xsd/jasperreport.xsd"
whenNoDataType="NoDataSection"
name="report" topMargin="20" bottomMargin="20">
<field name="id" class="java.lang.Long"/>
<field name="name"/>
<field name="price" class="java.lang.Integer"/>
<detail>
<band height="15">
<textField>
<reportElement x="0" y="0" width="50" height="15"/>
<textElement textAlignment="Right" verticalAlignment="Middle"/>
<textFieldExpression class="java.lang.Long">
<![CDATA[$F{id}]]>
</textFieldExpression>
</textField>
<textField>
<reportElement x="150" y="0" width="100" height="15" />
<textElement textAlignment="Left" verticalAlignment="Middle"/>
<textFieldExpression class="java.lang.String">
<![CDATA[$F{name}]]>
</textFieldExpression>
</textField>
<textField>
<reportElement x="200" y="0" width="100" height="15" />
<textElement textAlignment="Right" verticalAlignment="Middle"/>
<textFieldExpression class="java.lang.Integer">
<![CDATA[$F{price}]]>
</textFieldExpression>
</textField>
</band>
</detail>
<noData>
<band height="15">
<staticText>
<reportElement x="0" y="0" width="200" height="15"/>
<box>
<bottomPen lineWidth="1.0" lineColor="#CCCCCC"/>
</box>
<textElement />
<text><![CDATA[The report has no data]]></text>
</staticText>
</band>
</noData>
</jasperReport>
With the hints from the comments, I was able to resolve the issue. The problem is that Clojure records do not implement the JavaBeans spec, while the JRBeanCollectionDataSource expects such beans.
With the clj-bean library, I was able to make it work.
(defbean Car
[[Long id]
[String name]
[Integer price]])
(def data [(Car. 1 "Audi" 52642),
(Car. 2 "Mercedes" 57127),
(Car. 3 "Skoda" 9000),
(Car. 4 "Volvo" 29000),
(Car. 5 "Bentley" 350000),
(Car. 6 "Citroen" 21000),
(Car. 7 "Hummer" 41400),
(Car. 8 "Volkswagen" 21600)])
Now the report contains data.
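The difference between a record-like class and a JavaBean can be seen with the JDK's own introspection API. In this sketch FieldCar stands in for a Clojure record (public final fields, no getters) and BeanCar for the class that defbean generates. Both are hypothetical stand-ins. That JasperReports resolves fields through the same getter convention is an assumption based on the answer above, not something verified here.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: compares what JavaBeans introspection sees in two class shapes.
class BeanCheck {

    // Record-like shape: data lives in public final fields, there are no getters.
    public static class FieldCar {
        public final Long id;
        public final String name;
        public final Integer price;

        public FieldCar(Long id, String name, Integer price) {
            this.id = id;
            this.name = name;
            this.price = price;
        }
    }

    // JavaBean shape: private fields exposed through getXxx() methods.
    public static class BeanCar {
        private final Long id;
        private final String name;
        private final Integer price;

        public BeanCar(Long id, String name, Integer price) {
            this.id = id;
            this.name = name;
            this.price = price;
        }

        public Long getId() { return id; }
        public String getName() { return name; }
        public Integer getPrice() { return price; }
    }

    // Names of all properties that have a getter, sorted alphabetically.
    static Set<String> readableProperties(Class<?> type) {
        try {
            Set<String> names = new TreeSet<>();
            // Object.class as stop class keeps the inherited "class" property out.
            for (PropertyDescriptor pd
                    : Introspector.getBeanInfo(type, Object.class).getPropertyDescriptors()) {
                if (pd.getReadMethod() != null) {
                    names.add(pd.getName());
                }
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("introspection failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readableProperties(FieldCar.class)); // prints []
        System.out.println(readableProperties(BeanCar.class));  // prints [id, name, price]
    }
}
```

A consumer that looks up properties this way finds nothing on the field-only class, which would explain a report filled with nulls.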

Spark-Xml: Array within an Array in Dataframe to generate XML

I need to generate XML with the structure below:
<parent>
<name>parent</name>
<childs>
<child>
<name>child1</name>
</child>
<child>
<name>child1</name>
<grandchilds>
<grandchild>
<name>grand1</name>
</grandchild>
<grandchild>
<name>grand2</name>
</grandchild>
<grandchild>
<name>grand3</name>
</grandchild>
</grandchilds>
</child>
<child>
<name>child1</name>
</child>
</childs>
</parent>
As you can see, a parent has one or more child nodes, and a child node may have grandchild nodes.
https://github.com/databricks/spark-xml#conversion-from-dataframe-to-xml
From the spark-xml documentation, I understand that for a nested array structure the DataFrame should look like this:
+------------------------------------+
| a|
+------------------------------------+
|[WrappedArray(aa), WrappedArray(bb)]|
+------------------------------------+
Can you please help me with a small example of how to build a flattened DataFrame for my desired XML? I am working with Spark 2.x and spark-xml 0.4.5 (the latest version).
My schema:
StructType categoryMapSchema = new StructType(new StructField[]{
    new StructField("name", DataTypes.StringType, true, Metadata.empty()),
    new StructField("childs", new StructType(new StructField[]{
        new StructField("child",
            DataTypes.createArrayType(new StructType(new StructField[]{
                new StructField("name", DataTypes.StringType, true, Metadata.empty()),
                new StructField("grandchilds", new StructType(new StructField[]{
                    new StructField("grandchild",
                        DataTypes.createArrayType(new StructType(new StructField[]{
                            new StructField("name", DataTypes.StringType, true, Metadata.empty())
                        })), true, Metadata.empty())
                }), true, Metadata.empty())
            })), true, Metadata.empty())
    }), true, Metadata.empty()),
});
My Row RDD data. This is not the actual code, but it looks roughly like this:
final JavaRDD<Row> rowRdd = mapAttributes
        .map(parent -> {
            return RowFactory.create(
                    parent.getParentName(),
                    RowFactory.create(RowFactory.create((Object) parent.getChild()))
            );
        });
So far I have tried nesting a WrappedArray inside the parent WrappedArray, which does not work.

How to pass a parameter as an object with xmlrpc - Node module (github.com/baalexander/node-xmlrpc)

I need to pass a parameter that is an XML structure, as an object, using the xmlrpc Node module (github.com/baalexander/node-xmlrpc).
The client call is:
client.methodCall('anAction', ['aParam'], function (error, value) {})
I need to replace aParam with something like this:
<?xml version='1.0'?>
<methodCall>
<methodName>GetBalanceAndDate</methodName>
<params>
<param>
<value><struct>
<member>
<name>originTransactionID</name>
<value><string>90000001</string></value>
</member>
<member>
<name>originHostName</name>
<value><string>PADFVPZ452P01</string></value>
</member>
<member>
<name>subscriberNumber</name>
<value><string>24107482594</string></value>
</member>
<member>
<name>subscriberNumberNAI</name>
<value><int>1</int></value>
</member>
<member>
<name>originTimeStamp</name>
<value><dateTime.iso8601>20140626T14:53:53+0100</dateTime.iso8601></value>
</member>
<member>
<name>originNodeType</name>
<value><string>EXT</string></value>
</member>
</struct></value>
</param>
</params>
</methodCall>
I've tried:
var req1 = {
value: {
struct:{
member:{$name:'originTransactionID',value:'90000001'},
member:{$name:'originHostName',value:'GAGAPOPAAPZP01'},
member:{$name:'subscriberNumber',value:'24104151708'},
member:{$name:'subscriberNumberNAI',value: 1},
member:{$name:'originTimeStamp', value: new Date()},
member:{$name:'originNodeType', value: 'EXT'}
}
}
};
client.methodCall('GetBalanceAndDate', [req1], function (error, value) {})
but I get the error: faultString: 'Illegal request message'
Any idea how this is supposed to work?
Try passing the XML as a parameter. This is what I do:
const request = `
<?xml version="1.0"?>
<methodCall>
<methodName>CheckBal</methodName>
<params>
<param>
<value>
<struct>
<member>
<name>RMID</name>
<value><string>${rmid}</string></value>
</member>
<member>
...
</methodCall>`;
client.methodCall('CheckBal', [request], (error, value) => {
console.log(value);
});

How to append a child to an element using Smooks and Groovy

I am trying to use Smooks with Groovy to appendChild() to elements. So far I've been unable to do so.
Sample input file
<?xml version="1.0"?>
<orders>
<order res="0">
<header>
<date>Wed Nov 27 13:45:28 EST 2013</date>
<customer number="9404089839">Harish Nanda M</customer>
</header>
<orderitems>
<orderitem>
<product>007</product>
<quantity>7</quantity>
<price>7.90</price>
</orderitem>
<orderitem>
<product>005</product>
<quantity>3</quantity>
<price>7.20</price>
</orderitem>
</orderitems>
</order>
<order res="0">
<header>
<date>Wed Nov 27 13:45:28 EST 2013</date>
<customer number="2036139296">Sandesh G</customer>
</header>
<orderitems>
<orderitem>
<product>0023</product>
<quantity>87</quantity>
<price>57.90</price>
</orderitem>
<orderitem>
<product>005</product>
<quantity>3</quantity>
<price>7.20</price>
</orderitem>
</orderitems>
</order>
</orders>
Smooks config file
<?xml version="1.0"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd" xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd" xmlns:g="http://www.milyn.org/xsd/smooks/groovy-1.1.xsd">
<core:filterSettings type="SAX"/>
<g:groovy executeOnElement="order">
<g:imports>
import org.milyn.xml.DomUtils;
import org.milyn.javabean.context.BeanContext;
import org.milyn.javabean.repository.BeanRepository;
import org.w3c.dom.*;
import groovy.xml.dom.DOMCategory;
import groovy.xml.dom.DOMUtil;
import groovy.xml.DOMBuilder;
import groovy.util.XmlSlurper;
</g:imports>
<g:script>
<!--
javax.xml.transform.dom.DOMSource source = new javax.xml.transform.dom.DOMSource(element);
javax.xml.transform.Transformer transformer = javax.xml.transform.TransformerFactory.newInstance().newTransformer();
javax.xml.transform.stream.StreamResult result = new javax.xml.transform.stream.StreamResult(new java.io.StringWriter());
transformer.transform(source, result);
String ss = result.getWriter().toString();
def ord = new XmlSlurper().parseText(ss)
//ord.xxxx = "2222";
println ord.orderitems.orderitem.product.text()*.toInteger().sum();
element.appendChild("res");
//element['@res'] = element.'@res'.toInteger() + ord.orderitems.orderitem.product.text()*.toInteger().sum();
writeFragment(element);
-->
</g:script>
</g:groovy>
</smooks-resource-list>

Parse XML using Groovy: How do I keep the CDATA when parsing a XML file

Using Groovy 2.0.5 JVM 1.6.0_31, I have created a script that takes an existing XML-file as input
def root = new XmlParser().parse(new File('filename'))
I parse the file and replace certain attributes like this:
root.Settings.Setting.each {
    if (it.'@NAME' =~ 'CASEID_SEQUENCE_SIZE') {
        it.'@VALUE' = '100'
    }
}
At the end, I write the changes to a new file like this:
def outputfile = new File( levelConfig.RESULTFILE )
new XmlNodePrinter(new PrintWriter(outputfile)).print(root)
All this is fine, no problem, except when the XML has CDATA, like this
<HandlerURL>
<![CDATA[admin/MainWindow.jsp]]>
</HandlerURL>
the result is then
<HandlerURL>
admin/MainWindow.jsp
</HandlerURL>
My question: how can I get my script to leave the CDATA untouched?
I found that you can do this:
import groovy.xml.*
import groovy.xml.dom.DOMCategory
def xml = '''<root>
| <Settings>
| <Setting name="CASEID_SEQUENCE_SIZE">
| <HandlerURL>
| <![CDATA[ admin/MainWindow.jsp ]]>
| </HandlerURL>
| </Setting>
| <Setting name="SOMETHING_ELSE">
| <HandlerURL>
| <![CDATA[ admin/MainWindow.jsp ]]>
| </HandlerURL>
| </Setting>
| </Settings>
|</root>'''.stripMargin()
def document = DOMBuilder.parse( new StringReader( xml ) )
def root = document.documentElement
use(DOMCategory) {
    root.Settings.Setting.each {
        if( it.'@name' == 'CASEID_SEQUENCE_SIZE' ) {
            it[ '@value' ] = 100
        }
    }
}
def result = XmlUtil.serialize( root )
println result
To get the output:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<Settings>
<Setting name="CASEID_SEQUENCE_SIZE" value="100">
<HandlerURL>
<![CDATA[ admin/MainWindow.jsp ]]>
</HandlerURL>
</Setting>
<Setting name="SOMETHING_ELSE">
<HandlerURL>
<![CDATA[ admin/MainWindow.jsp ]]>
</HandlerURL>
</Setting>
</Settings>
</root>
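The DOM-based approach works because a DOM tree keeps CDATA sections as their own node type, so they survive a parse, edit and serialize round trip. The same behaviour can be sketched with the plain JDK APIs. This is an illustration under the assumption that the default, non-coalescing parser is used. The class and method names are made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: sets value="..." on one <Setting> and writes the document back.
class CdataRoundTrip {

    static String setValueKeepingCdata(String xml, String settingName, String value) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // false is the default: CDATA sections stay separate nodes instead of
            // being merged into ordinary text nodes.
            factory.setCoalescing(false);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList settings = doc.getElementsByTagName("Setting");
            for (int i = 0; i < settings.getLength(); i++) {
                Element setting = (Element) settings.item(i);
                if (settingName.equals(setting.getAttribute("name"))) {
                    setting.setAttribute("value", value);
                }
            }

            // Identity transform: serializes the DOM tree without restructuring it.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML round trip failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><Settings><Setting name=\"CASEID_SEQUENCE_SIZE\">"
                + "<HandlerURL><![CDATA[admin/MainWindow.jsp]]></HandlerURL>"
                + "</Setting></Settings></root>";
        System.out.println(setValueKeepingCdata(xml, "CASEID_SEQUENCE_SIZE", "100"));
    }
}
```

Calling setCoalescing(true) instead would merge the CDATA content into plain text, which is the effect seen with XmlParser in the question.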
