Consolidation of Data using certain keys in XSLT - xslt-3.0

I have an input XML file:
<root>
<row>
<col1>cust001</col1>
<col2>cc1</col2>
<col3>po1</col3>
<col4>2020-02-22</col4>
<col5>Men</col5>
<col6>item1</col6>
<col7>60</col7>
</row>
<row>
<col1>cust001</col1>
<col2>cc1</col2>
<col3>po1</col3>
<col4>2020-02-22</col4>
<col5>Men</col5>
<col6>item2</col6>
<col7>50</col7>
</row>
</root>
Desired output: (if col1 to col5 are the same, consolidate into one row.)
<root>
<row>
<col1>cust001</col1>
<col2>cc1</col2>
<col3>po1</col3>
<col4>2020-02-22</col4>
<col5>Men</col5>
<col6>item1</col6>
<col7>60</col7>
<col6>item2</col6>
<col7>50</col7>
</row>
</root>
I'm trying the code from "XSLT Consolidating data when ID is the same", but I'm getting this error: Error in Expression current-group(): Unknown system function: current group.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.castiron.com//response">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="resultSets">
<xsl:apply-templates select="@*|node()"/>
</xsl:template>
<xsl:template match="resultSet">
<xsl:apply-templates select="@*"/>
<xsl:for-each-group select="root/row" group-by="concat(col1,col2,col3,col4,col5)">
<xsl:value-of select="current-grouping-key()"/>
<xsl:apply-templates select="current-group()" />
</xsl:for-each-group>
</xsl:template>

Assuming you use an XSLT 3 processor such as Saxon 9.8 or later or AltovaXML 2017 R3 or later, you can group on a composite key made up of the elements you want to group by. Inside the for-each-group you then need to create a single row per group, process the elements forming the grouping key only once (e.g. for the first item in the group, which is the context item inside the for-each-group), and then process all the other elements of every item in the group:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
version="3.0">
<xsl:output indent="yes"/>
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="root">
<xsl:copy>
<xsl:for-each-group select="row" composite="yes" group-by="col1, col2, col3, col4, col5">
<xsl:copy>
<xsl:apply-templates select="col1, col2, col3, col4, col5, current-group()/(* except (col1, col2, col3, col4, col5))"/>
</xsl:copy>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
https://xsltfiddle.liberty-development.net/jz1Q1yt
Alternatively, the body of the for-each-group can avoid repeating the grouping key elements:
<xsl:template match="root">
<xsl:copy>
<xsl:for-each-group select="row!copy-of()" composite="yes" group-by="col1, col2, col3, col4, col5">
<xsl:copy>
<xsl:apply-templates select="current-group()/(if (position() eq 1) then * else (* except (col1, col2, col3, col4, col5)))"/>
</xsl:copy>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
https://xsltfiddle.liberty-development.net/jz1Q1yt/1
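The grouping logic of the stylesheets above can also be illustrated outside XSLT. The following is a minimal Python sketch, not part of the original answer and not a replacement for an XSLT 3 processor: it uses the tuple of col1 to col5 as the composite key, emits the key elements once per group, and appends the remaining elements of every row in that group. The names consolidate and KEY_TAGS are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical constant: the elements that form the composite grouping key.
KEY_TAGS = ["col1", "col2", "col3", "col4", "col5"]


def consolidate(xml_text):
    """Merge <row> elements whose col1..col5 values are all equal."""
    root = ET.fromstring(xml_text)
    groups = {}  # composite key -> merged <row>; dicts keep insertion order
    for row in root.findall("row"):
        key = tuple(row.findtext(tag) for tag in KEY_TAGS)
        merged = groups.get(key)
        if merged is None:
            # First row of a new group: write the key elements exactly once.
            merged = ET.Element("row")
            for tag in KEY_TAGS:
                ET.SubElement(merged, tag).text = row.findtext(tag)
            groups[key] = merged
        # Every row of the group contributes its non-key elements.
        for child in row:
            if child.tag not in KEY_TAGS:
                merged.append(child)
    out = ET.Element(root.tag)
    out.extend(groups.values())
    return out
```

Calling consolidate on the input from the question and serializing the result with ET.tostring(result, encoding="unicode") gives a single row containing col1 to col5 followed by both col6/col7 pairs.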

Related

Pivot String Values in Snowflake

How can I pivot this table
ID
attribute_name
attribute_value
1
Name
John
1
Country
UK
1
City
London
into structure?
ID
Name
Country
City
1
John
UK
London
According to the documentation, PIVOT requires an aggregate function:
SELECT ...
FROM ...
PIVOT ( <aggregate_function> ( <pivot_column> )
FOR <value_column> IN ( <pivot_value_1> [ , <pivot_value_2> ... ] ) )
How can I apply this to string values?
The aggregating function can be max(). For example:
select *
from (
select xx.seq, xx.value:"@id" id, xx.value:"$" title
from BooksXML, table(flatten(xml:"$":"$")) xx
)
pivot(max(title) for id in ('bk101', 'bk102', 'bk103', 'bk104', 'bk105')) as p
order by seq
With the table:
CREATE temp TABLE BooksXML
as
select parse_xml('<catalog issue="spring">
<Books>
<book id="bk101">The Good Book</book>
<book id="bk102">The OK Book</book>
<book id="bk103">The NOT Ok Book</book>
<book id="bk104">All OK Book</book>
<book id="bk105">Every OK Book</book>
</Books>
</catalog>') xml
union all select parse_xml('
<catalog issue="spring">
<Books>
<book id="bk102">The OK Book1</book>
<book id="bk103">The NOT Ok Book1</book>
<book id="bk104">All OK Book1</book>
</Books>
</catalog>')
union all select parse_xml('
<catalog issue="spring">
<Books>
<book id="bk101">The Good Book2</book>
<book id="bk103">The NOT Ok Book2</book>
<book id="bk104">All OK Book2</book>
<book id="bk105">Every OK Book2</book>
</Books>
</catalog>');
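Why max() works for strings can be shown on the table from the question. The following Python sketch is only an illustration of the pivot semantics, not Snowflake code: for each ID it keeps the maximum attribute_value per attribute_name, and because each (ID, attribute_name) pair holds a single string, the maximum is simply that string. The function name pivot_max and the dictionary layout of the result are assumptions made for this example.

```python
from collections import defaultdict


def pivot_max(rows, pivot_values):
    """Pivot (id, attribute_name, attribute_value) tuples using max()."""
    cells = defaultdict(dict)
    for row_id, name, value in rows:
        attrs = cells[row_id]  # every ID gets an output row
        if name in pivot_values:
            current = attrs.get(name)
            # max() on strings compares them lexicographically; with one
            # value per cell it just returns that value.
            attrs[name] = value if current is None else max(current, value)
    return [
        {"ID": row_id, **{name: attrs.get(name) for name in pivot_values}}
        for row_id, attrs in sorted(cells.items())
    ]


table = [
    (1, "Name", "John"),
    (1, "Country", "UK"),
    (1, "City", "London"),
]
pivoted = pivot_max(table, ["Name", "Country", "City"])
```

Here pivoted holds one dictionary for ID 1 with the keys Name, Country and City. In Snowflake the same idea would presumably be written as a PIVOT with max(attribute_value) for attribute_name over the three attribute names.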

Inserting a single string text file with multi character delimiters into a spark dataframe

New to spark and I am learning as I go. I have a very large text file with columns delimited by "|||||" that I would like to insert into a spark dataframe. However, the file is just one line string. The file looks something like this:
col1|||||col2|||||col3|||||col4|||||col5|||||col1|||||col2|||||col3...
So column 1 through 5 just essentially loop in the one line. I've tried to insert a new line after every 5th "|||||" with a sed command via:
sed -r 's/([^|||||]*|||||){5}/&\n/g'
This worked for the most part but ultimately did not work properly for some reason. I suspect col4 (an enormous text field) is causing the issue, but I do not know enough to say why.
Now when I read the single line text file into spark via:
val df = spark.read.textFile(file)
This puts everything into one column, and I would like to split it out into 5 columns and kind of have the dataframe "wrap" the string after every 5 columns.
My Goal is to get it into something like this:
+--------------------+---------------+--------------------+--------------------+--------------------+
| col1| col2| col3| col4| col5|
+--------------------+---------------+--------------------+--------------------+--------------------+
| val| val| val| val| val|
| val| val| val| val| val|
+--------------------+---------------+--------------------+--------------------+--------------------+
So my question is: Because my file is just one massive string, is there a way to get the dataframe to enter a new record/row after 5 columns?
This is a solution to your first question.
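Usually you would read the file as regular text and then use the split method to convert the line into columns. split takes a regular expression, so the pipes have to be escaped.
// Outside spark-shell this needs: import org.apache.spark.sql.functions.split and import spark.implicits._
df.withColumn("tmp", split($"value", "\\|{5}")).select(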
$"tmp".getItem(0).as("first"),
$"tmp".getItem(1).as("second"),
$"tmp".getItem(2).as("third")
).drop("tmp")
For your second question, you can use this regex to match the pattern:
(([a-z0-9A-Z]+)(\|\|\|\|\|)([a-z0-9A-Z]+)(\|\|\|\|\|)([a-z0-9A-Z]+)(\|\|\|\|\|)([a-z0-9A-Z]+)(\|\|\|\|\|))
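Note that, as written, it matches four alphanumeric fields, each followed by the delimiter, so it needs adapting to five fields and to the real field contents.
If you have enough memory, you can read the whole file and then use this pattern to extract its parts.
If not, you have to read it byte by byte and check whether you match this pattern.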
Good luck!
If the file is big and consists of a single line, consider a Perl solution. Perl variables can hold the whole file contents (even gigabytes, memory permitting), so you can do all the preprocessing in Perl itself. See if the following works for you:
> cat 5cols
col1|||||col2|||||col3|||||col4|||||col5|||||col1|||||col2|||||col3|||||col4|||||col5|||||col1|||||col2|||||col3|||||col4|||||col5|||||col1|||||col2|||||col3|||||col4|||||col5|||||col1|||||col2|||||col3|||||col4|||||col5|||||col1|||||col2|||||col3|||||col4|||||col5|||||col1|||||col2|||||col3|||||col4|||||col5|||||col1|||||col2|||||col3|||||col4|||||col5|||||col1|||||col2|||||col3|||||col4|||||col5|||||col1|||||col2|||||col3|||||col4|||||col5|||||
> perl -e ' BEGIN {$x=qx(cat 5cols);while($x=~m/([^|]+?)(?=[|]{5})/g){ print "$1,\n"} exit } ' | xargs -n5 | sed 's/,$//g'
col1, col2, col3, col4, col5
col1, col2, col3, col4, col5
col1, col2, col3, col4, col5
col1, col2, col3, col4, col5
col1, col2, col3, col4, col5
col1, col2, col3, col4, col5
col1, col2, col3, col4, col5
col1, col2, col3, col4, col5
col1, col2, col3, col4, col5
col1, col2, col3, col4, col5
>
Redirect the above output to a CSV file. You can then read it with spark.read.csv as a regular CSV file with 5 columns.
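The "wrap after every 5 columns" step that the question asks about can be sketched in plain Python, assuming the whole line fits in memory. This is an illustration of the idea rather than Spark code: it splits on the literal five-pipe delimiter and then slices the flat field list into rows of five. The function name split_into_rows and the width parameter are made up for this example.

```python
import re

DELIMITER = "|||||"


def split_into_rows(text, width=5):
    """Split one long delimited line into rows of `width` fields."""
    # re.escape matters: an unescaped "|" means alternation in a regex,
    # so the raw pattern "|||||" would not match the delimiter itself.
    fields = re.split(re.escape(DELIMITER), text)
    if fields and fields[-1] == "":
        fields.pop()  # drop the empty field after a trailing delimiter
    return [fields[i:i + width] for i in range(0, len(fields), width)]


line = "a1|||||b1|||||c1|||||d1|||||e1|||||a2|||||b2|||||c2|||||d2|||||e2|||||"
rows = split_into_rows(line)
```

Here rows contains two lists of five values each. A single pipe inside a field is left alone, because only a run of five pipes is treated as the delimiter. The resulting rows could then be turned into a dataframe or written out as one record per line.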

Converting Excel spreadsheet to XML

I have an Excel spreadsheet. When I try to save it as XML, all the leading zeros are lost. I am using Excel 2016, and the spreadsheet has 63138 rows. Below is some sample data from it.
Col1 col2 col3 col4 col5
32 000001 000 001 1
32 000032 000 032 22
32 000111 000 111 032
How can I prevent Excel from dropping the leading zeros when I save the file as XML? All the last columns have been formatted as text.
Any help will be appreciated.
You can pass an XlRangeValueDataType enumeration value to the Value property of the Range object; all zeros are preserved. For your example with five columns you will get:
Sub GGG()
Dim rng As Range
Dim xml$
xml = Range("A1").CurrentRegion.Value(XlRangeValueDataType.xlRangeValueXMLSpreadsheet)
'// Save string here somewhere
End Sub
Output:
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font ss:FontName="Calibri" x:CharSet="204" x:Family="Swiss" ss:Size="11"
ss:Color="#000000"/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="s63">
<Alignment ss:Horizontal="Left" ss:Vertical="Center"/>
<Font ss:FontName="Inherit" x:CharSet="204" ss:Color="#303336"/>
</Style>
<Style ss:ID="s68">
<NumberFormat ss:Format="@"/>
</Style>
</Styles>
<Worksheet ss:Name="Sheet1">
<Table ss:ExpandedColumnCount="5" ss:ExpandedRowCount="4" ss:StyleID="s68"
ss:DefaultRowHeight="15">
<Row>
<Cell ss:StyleID="s63"><Data ss:Type="String">Col1</Data></Cell>
<Cell><Data ss:Type="String">col2</Data></Cell>
<Cell><Data ss:Type="String">col3</Data></Cell>
<Cell><Data ss:Type="String">col4</Data></Cell>
<Cell><Data ss:Type="String">col5</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="String">32</Data></Cell>
<Cell><Data ss:Type="String">000001</Data></Cell>
<Cell><Data ss:Type="String">000</Data></Cell>
<Cell><Data ss:Type="String">001</Data></Cell>
<Cell><Data ss:Type="String">1</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="String">32</Data></Cell>
<Cell><Data ss:Type="String">000032</Data></Cell>
<Cell><Data ss:Type="String">000</Data></Cell>
<Cell><Data ss:Type="String">032</Data></Cell>
<Cell><Data ss:Type="String">22</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="String">32</Data></Cell>
<Cell><Data ss:Type="String">000111</Data></Cell>
<Cell><Data ss:Type="String">000</Data></Cell>
<Cell><Data ss:Type="String">111</Data></Cell>
<Cell><Data ss:Type="String">032</Data></Cell>
</Row>
</Table>
</Worksheet>
</Workbook>

Create a table grid using a single repeat control in xpages

I would like to create a table with 4 columns and 4 or more rows (so 16 or more items per page) using a single repeat control. Is this possible at all? I have achieved the desired effect in the past using div tags with display: inline, but would like to know whether it's possible with a table. When the code is generated by a repeat control, how can I tell it to start a new row when it reaches the 4th element? Any ideas?
The repeat control has facets for the header and footer that you can use to output the html tags required for the table header and footer like this...
<xp:this.facets>
<xp:text disableTheme="true" xp:key="header" escape="false">
<xp:this.value><![CDATA[
<table>
<thead>
<tr>
<th>Column 1</th>
<th>Column 2</th>
<th>Column 3</th>
<th>Column 4</th>
</tr>
</thead>
<tbody>]]></xp:this.value>
</xp:text>
<xp:text disableTheme="true" xp:key="footer" escape="false">
<xp:this.value><![CDATA[
</tbody>
</table>]]></xp:this.value>
</xp:text>
</xp:this.facets>
Then inside your repeat control you could repeat a single computed field that outputs the HTML and cell contents for the table. Use the repeat index variable to determine whether the computed field should include the <tr> or </tr> tags, and make sure the control is set to display its contents as HTML.
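The index test described above can be sketched independently of XPages. The following Python function is only an illustration of the logic a computed field might apply, assuming a zero-based repeat index: it opens a row when the index is a multiple of the column count and closes it after the last cell of a row or after the final item. The name render_grid and its parameters are hypothetical.

```python
from html import escape


def render_grid(items, columns=4):
    """Emit table rows of `columns` cells from a flat list of items."""
    parts = ["<table>", "<tbody>"]
    last = len(items) - 1
    for index, item in enumerate(items):
        if index % columns == 0:
            parts.append("<tr>")  # first cell of a row
        parts.append(f"<td>{escape(str(item))}</td>")
        if index % columns == columns - 1 or index == last:
            parts.append("</tr>")  # row is full, or no items are left
    parts += ["</tbody>", "</table>"]
    return "\n".join(parts)


markup = render_grid([f"Item {n}" for n in range(1, 7)])
```

With six items and four columns, markup contains two rows: one with four cells and one with the remaining two. The check against the last index is what keeps a partially filled final row from being left unclosed.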

SharePoint 2010 Populate Lookup Field on Activation

I have two lists: [Parents] [Children]. The Children list has a one-to-one lookup column to the Parents list. When deploying the Children List I have data written into the tag using
<Data>
<Rows>
<Row>
<Field Name="ChildName">Stephanie</Field>
<Field Name="ParentNameLookup">What value goes here?</Field>
</Row>
</Rows>
</Data>
My question: is there a way to populate the data into the tag for the ParentNameLookup field?
Try this
<Field Name="ParentNameLookup">ID;#VALUE</Field>
Here ID is the ID of the parent list item and VALUE is the actual text.

Resources