Error : "An accumulator rule that matches attribute or namespace nodes has no effect" - xslt-3.0

This is my data:
<ECC>
<Grp EId="2123" CC="1"/>
<Grp EId="4345" CC="1"/>
<Grp EId="1074" CC="2"/>
<Grp EId="1254" CC="1"/>
<Grp EId="1342" CC="3"/>
<Grp EId="1261" CC="1"/>
</ECC>
I'm trying to load it into accumulators like this:
<xsl:accumulator name="CurrentLookupValue" as="xs:string" initial-value="''" streamable="yes">
<xsl:accumulator-rule match="ECC/Grp/@EId/text()" select="."/>
</xsl:accumulator>
<xsl:accumulator name="EmplIDLookup" as="map(xs:string,xs:decimal)" initial-value="map{}"
streamable="yes">
<xsl:accumulator-rule match="ECC/Grp/@CC/text()"
select="map:put($value, accumulator-before('CurrentLookupValue'), xs:decimal(.))"/>
</xsl:accumulator>
I get the warning "An accumulator rule that matches attribute or namespace nodes has no effect".
The documentation describes the match attribute as a "Pattern defining the set of nodes to which the accumulator rule applies".
Is there a workaround that lets me use attributes?
Or should I restructure this XML to use child elements instead?

@EId/text() does not make any sense, as attribute nodes don't have any child nodes.
In general, even with streaming you can read an element's attributes when you match on the element node, so I think you simply want
<xsl:accumulator name="CurrentLookupValue" as="xs:string" initial-value="''" streamable="yes">
<xsl:accumulator-rule match="ECC/Grp" select="string(@EId)"/>
</xsl:accumulator>
and
<xsl:accumulator name="EmplIDLookup" as="map(xs:string,xs:decimal)" initial-value="map{}"
streamable="yes">
<xsl:accumulator-rule match="ECC/Grp"
select="map:put($value, accumulator-before('CurrentLookupValue'), xs:decimal(@CC))"/>
</xsl:accumulator>
or, more simply, a single accumulator:
<xsl:accumulator name="EmplIDLookup" as="map(xs:string,xs:decimal)" initial-value="map{}"
streamable="yes">
<xsl:accumulator-rule match="ECC/Grp"
select="map:put($value, string(@EId), xs:decimal(@CC))"/>
</xsl:accumulator>
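For comparison only (this is Python, not XSLT), here is a minimal sketch of what the single accumulator computes. It makes one forward pass with the standard library's `iterparse` and reads `EId` and `CC` from the `Grp` start event, which is the same idea as matching the element and reading its attributes. The function name `build_lookup` is hypothetical. The parent check mimics the `ECC/Grp` pattern.

```python
import io
import xml.etree.ElementTree as ET
from decimal import Decimal

DATA = b"""<ECC>
<Grp EId="2123" CC="1"/>
<Grp EId="4345" CC="1"/>
<Grp EId="1074" CC="2"/>
<Grp EId="1254" CC="1"/>
<Grp EId="1342" CC="3"/>
<Grp EId="1261" CC="1"/>
</ECC>"""

def build_lookup(source):
    """Build an EId -> CC map in one forward pass over the document."""
    lookup = {}
    path = []  # names of the currently open elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            # Attributes are already available on the start event, just as
            # the accumulator rule reads @EId and @CC when it matches Grp.
            if path[-2:] == ["ECC", "Grp"]:
                lookup[elem.get("EId")] = Decimal(elem.get("CC"))
        else:
            path.pop()
            elem.clear()  # free elements we have finished with

    return lookup

lookup = build_lookup(io.BytesIO(DATA))
print(lookup["1342"])  # prints 3
```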

Related

How to get the text based on xml tags in python?

A Newbie here!
Can anyone help me extract the text SAMPLE HEADING from between the XML tags? Is there also a way to extract text based on the heading levels 1 to 6 present in the XML? If so, how would I factor that in?
Below is the XML:
<w:p w:rsidR="00000000" w:rsidDel="00000000" w:rsidP="00000000" w:rsidRDefault="00000000" w:rsidRPr="00000000" w14:paraId="00000033">
<w:pPr>
<w:pStyle w:val="Heading2"/>
<w:numPr>
<w:ilvl w:val="0"/>
<w:numId w:val="1"/>
</w:numPr>
<w:ind w:left="520" w:hanging="360"/>
<w:rPr>
<w:b w:val="1"/>
<w:color w:val="000000"/>
</w:rPr>
</w:pPr>
<w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000">
<w:rPr>
<w:b w:val="1"/>
<w:color w:val="000000"/>
<w:rtl w:val="0"/>
</w:rPr>
<w:t xml:space="preserve">SAMPLE HEADING</w:t>
</w:r>
</w:p>
You could read the XML file into a dictionary with xmltodict. You can then navigate the dictionary by the element names of the XML file.
For example:
import xmltodict

# The with block closes the file automatically, so no explicit close() is needed
with open('file.xml') as xml_file:
    data = xmltodict.parse(xml_file.read())
This will yield an ordered dictionary from your XML file:
OrderedDict([('w:p',
OrderedDict([('@w:rsidR', '00000000'),
('@w:rsidDel', '00000000'),
('@w:rsidP', '00000000'),
('@w:rsidRDefault', '00000000'),
('@w:rsidRPr', '00000000'),
('@w14:paraId', '00000033'),
('w:pPr',
OrderedDict([('w:pStyle',
OrderedDict([('@w:val',
'Heading2')])),
('w:numPr',
OrderedDict([('w:ilvl',
OrderedDict([('@w:val',
'0')])),
('w:numId',
OrderedDict([('@w:val',
'1')]))])),
('w:ind',
OrderedDict([('@w:left', '520'),
('@w:hanging', '360')])),
('w:rPr',
OrderedDict([('w:b',
OrderedDict([('@w:val',
'1')])),
('w:color',
OrderedDict([('@w:val',
'000000')]))]))])),
('w:r',
OrderedDict([('@w:rsidDel', '00000000'),
('@w:rsidR', '00000000'),
('@w:rsidRPr', '00000000'),
('w:rPr',
OrderedDict([('w:b',
OrderedDict([('@w:val',
'1')])),
('w:color',
OrderedDict([('@w:val',
'000000')])),
('w:rtl',
OrderedDict([('@w:val',
'0')]))])),
('w:t',
OrderedDict([('@xml:space',
'preserve'),
('#text',
'SAMPLE '
'HEADING')]))]))]))])
From this dictionary, you can access the SAMPLE HEADING through:
data['w:p']['w:r']['w:t']['#text']
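The second part of the question asks about headings 1 to 6. Here is a minimal sketch using the standard library's `xml.etree.ElementTree` instead of xmltodict.

It rests on several assumptions:
- The snippet comes from a .docx `word/document.xml`, where the `w` prefix is bound to the WordprocessingML namespace. The snippet above omits that declaration, so a namespace-aware parser would reject it as-is.
- Headings use the style IDs `Heading1` to `Heading6`. Documents with custom or localized styles may use other IDs.
- The function `extract_headings` and the sample document are illustrative only.

```python
import xml.etree.ElementTree as ET

# Assumption: "w" is bound to the WordprocessingML main namespace, as in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

SAMPLE = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p>
      <w:pPr><w:pStyle w:val="Heading2"/></w:pPr>
      <w:r><w:t xml:space="preserve">SAMPLE HEADING</w:t></w:r>
    </w:p>
    <w:p>
      <w:r><w:t>Body text</w:t></w:r>
    </w:p>
  </w:body>
</w:document>"""

# Assumed style IDs for heading levels 1 to 6
HEADING_STYLES = {f"Heading{i}" for i in range(1, 7)}

def extract_headings(xml_text):
    """Return (style, text) pairs for paragraphs styled Heading1..Heading6."""
    root = ET.fromstring(xml_text)
    headings = []
    for p in root.iter(f"{{{W_NS}}}p"):
        style = p.find("w:pPr/w:pStyle", NS)
        if style is None:
            continue  # paragraph has no explicit style
        val = style.get(f"{{{W_NS}}}val")
        if val in HEADING_STYLES:
            # A paragraph's text can be split across several runs (w:r/w:t)
            text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
            headings.append((val, text))
    return headings

print(extract_headings(SAMPLE))  # prints [('Heading2', 'SAMPLE HEADING')]
```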

Assertion to validate columns and data types using groovy

I am using a JDBC request in ReadyAPI and running a DESCRIBE query to get the columns and their data types. How can I assert these columns and data types against the expected ones?
Assuming you're using MySQL, a DESCRIBE query returns XML that looks like this:
<Results>
<ResultSet fetchSize="0">
<Row rowNumber="1">
<COLUMNS.COLUMN_NAME>id</COLUMNS.COLUMN_NAME>
<COLUMNS.COLUMN_TYPE>bigint(20)</COLUMNS.COLUMN_TYPE>
<COLUMNS.IS_NULLABLE>NO</COLUMNS.IS_NULLABLE>
<COLUMNS.COLUMN_KEY>PRI</COLUMNS.COLUMN_KEY>
<COLUMNS.COLUMN_DEFAULT/>
<COLUMNS.EXTRA>auto_increment</COLUMNS.EXTRA>
</Row>
...
If you want to test that column 'id' is a bigint(20), add an XPath Match assertion with the following XPath expression:
//ResultSet/Row/COLUMNS.COLUMN_NAME[text()='id']/following-sibling::COLUMNS.COLUMN_TYPE
with the expected result 'bigint(20)'.
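To check every column at once rather than one XPath per column, you can parse the result XML, build a name-to-type map, and compare it with an expected map. Below is a minimal sketch of that logic in Python.

In ReadyAPI you would write the equivalent in Groovy inside a Script Assertion. That translation is left to you. The `EXPECTED` map and both function names are hypothetical. The sample XML is trimmed to the single row shown above.

```python
import xml.etree.ElementTree as ET

# Result XML in the shape shown above (trimmed to the one row from the answer)
RESULT_XML = """<Results>
  <ResultSet fetchSize="0">
    <Row rowNumber="1">
      <COLUMNS.COLUMN_NAME>id</COLUMNS.COLUMN_NAME>
      <COLUMNS.COLUMN_TYPE>bigint(20)</COLUMNS.COLUMN_TYPE>
      <COLUMNS.IS_NULLABLE>NO</COLUMNS.IS_NULLABLE>
    </Row>
  </ResultSet>
</Results>"""

# Hypothetical expectation: column name -> column type
EXPECTED = {"id": "bigint(20)"}

def actual_columns(xml_text):
    """Map COLUMN_NAME -> COLUMN_TYPE for every Row in the JDBC result."""
    root = ET.fromstring(xml_text)
    columns = {}
    for row in root.iter("Row"):
        # Index the row's child elements by tag name
        fields = {child.tag: (child.text or "").strip() for child in row}
        columns[fields.get("COLUMNS.COLUMN_NAME")] = fields.get("COLUMNS.COLUMN_TYPE")
    return columns

def column_mismatches(expected, actual):
    """List readable differences; an empty list means the schema matches."""
    problems = []
    for name, col_type in expected.items():
        if name not in actual:
            problems.append(f"missing column {name!r}")
        elif actual[name] != col_type:
            problems.append(f"{name!r}: expected {col_type!r}, got {actual[name]!r}")
    for name in sorted(actual.keys() - expected.keys()):
        problems.append(f"unexpected column {name!r}")
    return problems

mismatches = column_mismatches(EXPECTED, actual_columns(RESULT_XML))
assert not mismatches, mismatches
```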

Hive Query to parse an xml nested as value

I have XML in the format below. The value of the raw_xml attribute is itself an XML document, which I am trying to parse.
The nested XML has elements and attributes, which I want to parse only when active="True", as shown in the results.
XML:
<?xml version="1.0" encoding="utf-16"?>
<ClassApplications xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="1234567" bundle_id="323232" version="1.0">
<Reports>
<Report active="True" raw_xml="<Response Score="474">
<StudentXML>
<StudentSegments><StudentSegment MaleORFemale="M" StudentID="FS54AS44F" studentname="Kathy" kob="F" ></StudentSegment><StudentSegment MaleORFemale="M" StudentID="ASD555ASF" studentname="Kelli" kob="A" ></StudentSegment><StudentSegment MaleORFemale="M" StudentID="AD5A5S5D5" studentname="Christy" kob="F" ></StudentSegment> <StudentSegment MaleORFemale="M" StudentID="AS5FE84AD" studentname="Julia" kob="Z" > </StudentSegment> <StudentSegment MaleORFemale="M" StudentID="ASD5FD1D8" studentname="Martina" kob="F" > </StudentSegment> <StudentSegment MaleORFemale="M" StudentID="ASD45454A" studentname="Sam" kob="F"> </StudentSegments> </StudentXML> </Response>"/>
<Report active="False" raw_xml="<Response Score="474">
<StudentXML>
<StudentSegments><StudentSegment MaleORFemale="M" StudentID="FS54AS44F" studentname="Kathy" kob="F" ></StudentSegment><StudentSegment MaleORFemale="F" StudentID="145sfg51g" studentname="Kelli" kob="A" ></StudentSegment><StudentSegment MaleORFemale="M" StudentID="AD5A5S5D5" studentname="Christy" kob="F" ></StudentSegment> <StudentSegment MaleORFemale="M" StudentID="AS5FE84AD" studentname="Julia" kob="Z" > </StudentSegment> <StudentSegment MaleORFemale="M" StudentID="ASD5FD1D8" studentname="Martina" kob="F" > </StudentSegment> <StudentSegment MaleORFemale="M" StudentID="ASD45454A" studentname="Sam" kob="F"> </StudentSegments> </StudentXML> </Response>"/>
</Reports>
</ClassApplications>
Results:
ID       MaleOrFemale  StudentID  StudentName
1234567  M             FS54AS44F  Kathy
1234567  M             ASD555ASF  Kelli
1234567  M             AD5A5S5D5  Christy
1234567  M             AS5FE84AD  Julia
1234567  M             ASD5FD1D8  Martina
1234567  M             ASD45454A  Sam
I tried writing the query with LATERAL VIEW and explode, but it fails with an error.
Error:
Hive Runtime Error while processing row {"id":1234567,"rawxml":null,"input__file__name":"wasb://root@microsofttest.blob.core.windows.net/Test/Testxml001.xml"}
How can I write a Hive query to parse this structure?
Code:
ADD JAR wasb:///user/hivexmlserde-1.0.5.3.jar;
SET mapred.input.dir.recursive=true;
SET hive.mapred.supports.subdirectories=true;
DROP TABLE IF EXISTS ClassTable;
CREATE EXTERNAL TABLE ClassTable(
ID BIGINT,
rawxml string
)
ROW FORMAT SERDE
'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.ID" = "/ClassApplications/@id",
"column.xpath.rawxml" = "/ClassApplications/Reports/Report[@active = 'True']/@raw_xml"
)
STORED AS INPUTFORMAT
'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'wasb://root@microsofttest.blob.core.windows.net/Test/'
TBLPROPERTIES ('serialization.null.format'='', "xmlinput.start"="<ClassApplications xmlns","xmlinput.end"="</ClassApplications>" );

Pyspark : ValueError: need more than 2 values to unpack

After a join, my data has the following format:
# (u'session_id', ((u'prod_id', u'user_id'), (u'prod_label', u'user_id')))
# (u'u'session_id', ((u'20133', u'129001032'), None))
# (u'u'session_id', ((u'2024574', u'61370212'), (u'Loc1', u'61370212')))
I want to handle the records where the second tuple is None separately from those where it is not. I tried filtering with the following code, but I get an error. How do I filter these out?
left_outer_joined_no_null = left_outer_joined.filter(lambda (session_id, ((tuple1), (tuple2))): (tuple2) != None)
ValueError: need more than 2 values to unpack
at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:135)
at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:176)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
Your code works fine on my machine (Spark 1.5.1).
I tried it like this:
left_outer_joined = sc.parallelize([
    (u'session_id', ((u'prod_id', u'user_id'), (u'prod_label', u'user_id'))),
    (u'session_id', ((u'20133', u'129001032'), None)),
    (u'session_id', ((u'2024574', u'61370212'), (u'Loc1', u'61370212')))])
left_outer_joined_no_null = left_outer_joined \
    .filter(lambda (session_id, ((tuple1), (tuple2))): (tuple2) != None)
for value in left_outer_joined_no_null.collect():
    print(value)
The result is what you expected:
(u'session_id', ((u'prod_id', u'user_id'), (u'prod_label', u'user_id')))
(u'session_id', ((u'2024574', u'61370212'), (u'Loc1', u'61370212')))
NOTE:
Your input lines contain (u'u'session_id, which has an extra u'. It appears on both the second and third lines. Maybe that is the problem?
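The `lambda (session_id, ((tuple1), (tuple2))): ...` form relies on tuple parameter unpacking. That feature is Python 2 only; PEP 3113 removed it in Python 3. Below is a minimal Python 3 sketch of the same filter that unpacks inside the function body instead.

It runs on a plain list with the built-in `filter`, so no Spark is needed. The same predicate functions can be passed to `RDD.filter`, on the assumption that records keep the `(session_id, (left, right))` shape. The function names are hypothetical.

```python
# Sample records in the (session_id, ((prod_id, user_id), (prod_label, user_id))) shape
records = [
    (u'session_id', ((u'prod_id', u'user_id'), (u'prod_label', u'user_id'))),
    (u'session_id', ((u'20133', u'129001032'), None)),
    (u'session_id', ((u'2024574', u'61370212'), (u'Loc1', u'61370212'))),
]

def has_right_side(record):
    """True when the right-hand side of the left outer join matched."""
    # Unpack in the body instead of the parameter list (valid in Python 3)
    _session_id, (_left, right) = record
    return right is not None

def missing_right_side(record):
    """True for records where the join produced None on the right."""
    return not has_right_side(record)

matched = list(filter(has_right_side, records))
unmatched = list(filter(missing_right_side, records))
# With Spark the same predicates would be: left_outer_joined.filter(has_right_side)
```

Comparing with `is not None` rather than `!= None` is the idiomatic way to test for None in Python.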

Create a FetchXML query that uses ISNULL

I want to write a FetchXML query that uses ISNULL, as in SQL.
In SQL:
SELECT * FROM Contact WHERE ISNULL(FirstName, '') = ''
Is there an equivalent operator in FetchXML?
It's not exactly the same, but the query below should give you something to work with.
<fetch mapping="logical">
<entity name="contact">
<all-attributes />
<filter>
<condition attribute="firstname" operator="null" />
</filter>
</entity>
</fetch>
