extract words of a certain language out of an xml file - xslt-3.0

Given the following XML (which in practice consists of many records),
I would like to output the unique Greek words it contains and also generate a report listing the records in which each word was found.
<collection>
<record>
<controlfield tag="001">1</controlfield>
<datafield tag="200" ind1="1" ind2=" ">
<subfield code="a">Metafore po</subfield>
<subfield code="e">Δοκίμια</subfield>
<subfield code="f">Περικλής αρχαία Ελλάδα</subfield>
</datafield>
<datafield tag="210" ind1="|" ind2="|">
<subfield code="a">Η Αθήνα</subfield>
<subfield code="c">Νοέμβριος</subfield>
<subfield code="d">1999</subfield>
</datafield>
<datafield tag="215" ind1=" " ind2=" ">
<subfield code="a">263 s.</subfield>
</datafield>
<datafield tag="606" ind1="|" ind2=" ">
<subfield code="3">250000087120140311174609</subfield>
<subfield code="a">Πλάτων ιστορία</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2="1">
<subfield code="3">200000000120140228092156</subfield>
<subfield code="4">070</subfield>
<subfield code="a">Liper</subfield>
<subfield code="b">Berit von der</subfield>
</datafield>
</record>
<record>
<controlfield tag="001">here text may also exist</controlfield>
<datafield tag="200" ind1="1" ind2=" ">
<subfield code="a">Metafore po</subfield>
<subfield code="e">Δοκίμια</subfield>
<subfield code="f">Περικλής</subfield>
</datafield>
</record>
</collection>
Desired output (XML format, or whatever is more easily achieved):
Δοκίμια: 1, here text may also exist
Περικλής: 1, here text may also exist
αρχαία: 1
Η: 1
etc...
The regex I have tried:
/[Α-Ωα-ω]{1,}/

It seems you can treat that like a grouping problem:
<xsl:template match="collection">
<xsl:where-populated>
<ul>
<xsl:for-each-group select="record" group-by="datafield/subfield!tokenize(., '\s')[matches(., '\p{IsGreek}')]">
<li>
{current-grouping-key()} : <xsl:value-of select="current-group()/controlfield" separator=", "/>
</li>
</xsl:for-each-group>
</ul>
</xsl:where-populated>
</xsl:template>
https://xsltfiddle.liberty-development.net/gWmuiKi/1 outputs
<ul>
<li>
Δοκίμια : 1, here text may also exist
</li>
<li>
Περικλής : 1, here text may also exist
</li>
<li>
αρχαία : 1
</li>
<li>
Ελλάδα : 1
</li>
<li>
Η : 1
</li>
<li>
Αθήνα : 1
</li>
<li>
Νοέμβριος : 1
</li>
<li>
Πλάτων : 1
</li>
<li>
ιστορία : 1
</li>
</ul>
that way.
Of course, identifying a "word" by simply tokenizing on white space is going to fail in most texts and languages because of punctuation characters and language-specific rules. But XSLT/XPath/XQuery regular expressions don't have a word-break metacharacter anyway, so one has to use tokenize or analyze-string somehow.
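For comparison, here is a minimal sketch of the same word-to-records index in Python rather than XSLT, which sidesteps the punctuation issue by matching runs of Greek letters instead of splitting on white space. The character ranges, the SAMPLE data, and the function name greek_word_index are assumptions made for illustration; they are not part of the question or of the XSLT answer above.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption: "Greek" means the Greek and Coptic block plus Greek Extended.
GREEK_WORD = re.compile(r"[\u0370-\u03FF\u1F00-\u1FFF]+")

# Hypothetical, well-formed sample in the shape of the question's data.
SAMPLE = """<collection>
<record>
<controlfield tag="001">1</controlfield>
<datafield tag="200"><subfield code="e">Δοκίμια,</subfield>
<subfield code="f">Περικλής αρχαία</subfield></datafield>
</record>
<record>
<controlfield tag="001">2</controlfield>
<datafield tag="200"><subfield code="e">Δοκίμια</subfield></datafield>
</record>
</collection>"""


def greek_word_index(xml_text):
    """Map each Greek word to the controlfield values of the records containing it."""
    index = defaultdict(list)
    root = ET.fromstring(xml_text)
    for record in root.iter("record"):
        record_id = record.findtext("controlfield", default="")
        for subfield in record.iter("subfield"):
            for word in GREEK_WORD.findall(subfield.text or ""):
                # Keep each record id only once per word.
                if record_id not in index[word]:
                    index[word].append(record_id)
    return dict(index)


index = greek_word_index(SAMPLE)
for word, ids in index.items():
    print(f"{word}: {', '.join(ids)}")
```

Because the pattern matches only Greek letters, the trailing comma in the first subfield is not treated as part of the word, which a plain white-space split would not guarantee.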

Related

xml search and update using python

my sample xml object is:
<Assessment version="10" dateCreated="4/19/2020 10:41:20 PM">
<Section name="Space">
<Glossary>
<Item name="***"><b>Indicates high priority data that is required</b></Item>
</Glossary>
<InputNumber type="int" min="0" max="100" title="[1a] q2 some data">
<Value>20</Value>
</InputNumber>
<InputNumber type="int" min="0" max="10000" title="[2] some data">
<Value>1</Value>
</InputNumber>
<InputNumber type="int" min="0" max="10000" title="[3] some text">
<Value>2</Value>
</InputNumber>
</Section>
<Section name="Power">
<Glossary>
<Item name="***"><b>Indicates high priority data that is required</b></Item>
</Glossary>
<InputNumber type="int" min="0" max="100000" title="[8] some text">
<Value>15</Value>
</InputNumber>
<PickList title="[11] some text">
<Option selected="true">Yes</Option>
<Option>No</Option>
<Option>there is no UPS</Option>
</PickList>
</Section>
<Section name="Cooling">
<Glossary>
<Item name="***"><b>Indicates high priority data that is required</b></Item>
</Glossary>
<InputNumber type="int" min="0" max="100000" title="[18] some text">
<Value>30</Value>
</InputNumber>
</InputText>
<InputText title="[21] some data">
<Value>3</Value>
</InputText>
</Section>
<Section name="Comments">
<InputTextArea title="[22] General Comments" format="normal">
<Value>test value for capacity assessment test by monika</Value>
</InputTextArea>
</Section>
</Assessment>
I get this from the database.
In my Python script:
import xml.etree.ElementTree as ET

template = cursor.fetchall()
for row in template:
    # xmlTemplate = ET.ElementTree(ET.fromstring(row[6]))
    # row[6] holds the XML document as a string
    xmlTemplateStr = row[6]
    xmlTemplate = ET.ElementTree(ET.fromstring(xmlTemplateStr))
    root = ET.fromstring(xmlTemplateStr)
I need to find a node based on its number and update the value of that particular node.
For example,
if I need to update the following value to 4:
</InputText>
<InputText title="[21] some data">
<Value>3</Value>
</InputText>
I need to search for the node with the key [21] and update the value of that node to 4:
</InputText>
<InputText title="[21] some data">
<Value>4</Value>
</InputText>
How do I do it?
You can do this with lxml, using xpath:
assess = """[your xml above - corrected]""" #note: the sample xml in the question was invalid because it had an extra tag
from lxml import etree
doc = etree.XML(assess.encode('utf-8'))
val = doc.xpath('//InputText[@title="[21] some data"]/Value')
val[0].text = '4'
print(etree.tostring(doc).decode())
Relevant part of the output:
<InputText title="[21] some data">
<Value>4</Value>
</InputText>
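If only the bracketed number is known rather than the full title, a loop over the elements works with the standard library alone. The following is a rough sketch using xml.etree.ElementTree instead of lxml; the helper name set_value_by_key and the trimmed SAMPLE document are hypothetical and only illustrate the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down version of the question's document.
SAMPLE = """<Assessment>
<Section name="Cooling">
<InputNumber type="int" title="[18] some text"><Value>30</Value></InputNumber>
<InputText title="[21] some data"><Value>3</Value></InputText>
</Section>
</Assessment>"""


def set_value_by_key(root, key, new_value):
    """Set <Value> of the first element whose title starts with "[key]".

    Returns True if an element was updated, False otherwise.
    """
    prefix = f"[{key}]"
    for element in root.iter():
        if element.get("title", "").startswith(prefix):
            value = element.find("Value")
            if value is not None:
                value.text = str(new_value)
                return True
    return False


root = ET.fromstring(SAMPLE)
updated = set_value_by_key(root, 21, 4)
print(ET.tostring(root, encoding="unicode"))
```

Including the closing bracket in the prefix keeps a key such as 2 from accidentally matching a title that starts with [21].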

vim sparkup item number in content

Using the sparkup plugin for vim (specifically vim-gnome on Ubuntu 15.04, although I doubt that matters), I am generating a list with item numbers:
ion-content.has-tabs > .list > a.item[href=#/item/$]{Item $}*3
The result substitutes the item number in [href=#/item/$] but not in {Item $}:
<ion-content class="has-tabs">
<div class="list">
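<a href="#/item/1" class="item">Item $</a>
<a href="#/item/2" class="item">Item $</a>
<a href="#/item/3" class="item">Item $</a>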
</div>
</ion-content>
Feature, bug, or user error?
I don't remember Sparkup ever supporting incrementing numbers inside "content" braces so I would say "feature".
Don't waste your time asking for a fix on the plugin's issue tracker, though.
I am afraid this plugin treats the $ inside curly braces as literal text that should not be touched. To number your Item $ list, you can try this command:
:let @a=1 | %s/\$/\=(@a+setreg('a',@a+1))/g
or, for a block selected in visual mode:
:let @a=1 | '<,'>s/\$/\=(@a+setreg('a',@a+1))/g
I accepted an answer to the question as I asked it, but I'm adding this to show an alternate approach for a slightly different requirement. I needed to add a nested tag along with the text, which Sparkup does not support. So I found a single 2-step solution to this that also solved the original item number problem (more complex in my real world solution but simplified here).
This sparkup:
ion-content > .list > a.item.item-icon-left.Item$[href=#/items/$]*3 > i.icon.ion-email
generates this:
<ion-content>
<div class="list">
<a href="#/items/1" class="item item-icon-left Item1">
<i class="icon ion-email"></i>
</a>
<a href="#/items/2" class="item item-icon-left Item2">
<i class="icon ion-email"></i>
</a>
<a href="#/items/3" class="item item-icon-left Item3">
<i class="icon ion-email"></i>
</a>
</div>
</ion-content>
Then after I select the lines in visual mode, running this:
:'<,'>s/ Item\([0-9]*\)">/">Item \1/g
produces my desired result:
<ion-content>
<div class="list">
<a href="#/items/1" class="item item-icon-left">Item 1
<i class="icon ion-email"></i>
</a>
<a href="#/items/2" class="item item-icon-left">Item 2
<i class="icon ion-email"></i>
</a>
<a href="#/items/3" class="item item-icon-left">Item 3
<i class="icon ion-email"></i>
</a>
</div>
</ion-content>

expression engine search result related

I want to display up to 200 words of each related result on the line right below its title,
but I am not getting the text that {excerpt} should display.
My code is below:
{exp:search:search_results switch="resultRowOne|resultRowTwo"}
<table border="0" cellpadding="6" cellspacing="1" width="100%">
{exp:search:search_results switch="resultRowOne|resultRowTwo"}
<tr class="{switch}">
{if page_meta_title != ""} <td width="30%" valign="top"><b>{title}</b></td>{/if}
</tr>
<tr><td style="color:red!important">{excerpt}</td></tr>
{if count == total_results}
</table>
{/if}
{paginate}
<p>Page {current_page} of {total_pages} pages {pagination_links}</p>
{/paginate}
{/exp:search:search_results}
</table>
Maybe this was just a typo in your question, but it looks like you have the opening search tag listed twice.
{exp:search:search_results switch="resultRowOne|resultRowTwo"}
<table border="0" cellpadding="6" cellspacing="1" width="100%">
{exp:search:search_results switch="resultRowOne|resultRowTwo"}
Also, the excerpt tag returns only 50 words by default. You could also consider the Character Limiter plugin (http://devot-ee.com/add-ons/character-limiter), a free plugin from EllisLab. Once you have it set up, you would use it like so:
{exp:char_limit total="200" exact="no"}{your_text_field}{/exp:char_limit}

XSLT raising a webpart error because of a list structure

I'm developing a custom CQWP using a custom ContentQueryMain.xsl. I am using a list structure in which I would like a separator that starts a new list every three items. Here is the code of the template:
<xsl:template name="CustomGroupTemplateSimple2">
<ul>
<li>
<ul class="liste1">
<xsl:variable name="Rows" select="/dsQueryResponse/Rows/Row"/>
<xsl:for-each select="$Rows">
<xsl:call-template name="OuterTemplate.CallPresenceStatusIconTemplate"/>
<li>
test
</li>
<xsl:if test="position() mod 3 = 0">
</ul>
</li>
<li>
<ul class="separator">
</xsl:if>
</xsl:for-each>
</ul>
</li>
</ul>
</xsl:template>
The separator:
</ul>
</li>
<li>
<ul class="separator">
is responsible for the web part error that is raised. The following code works perfectly:
<xsl:template name="CustomGroupTemplateSimple2">
<ul>
<li>
<ul class="liste1">
<xsl:variable name="Rows" select="/dsQueryResponse/Rows/Row"/>
<xsl:for-each select="$Rows">
<xsl:call-template name="OuterTemplate.CallPresenceStatusIconTemplate"/>
<li>
test
</li>
<xsl:if test="position() mod 3 = 0">
SEPARATOR
</xsl:if>
</xsl:for-each>
</ul>
</li>
</ul>
</xsl:template>
And when I DIRECTLY replace the "SEPARATOR" with:
</ul>
</li>
<li>
<ul class="separator">
in the aspx page (after compilation), everything is perfectly working, too.
Therefore, I am really lost with this situation as I really need this separator.
Thank you very much
The reason for the error is obvious: Any XSLT stylesheet must be a well-formed XML document and this provided stylesheet isn't. This is why even the XML parser that the XSLT processor uses to get its stylesheet module, raises a non-well-formedness exception.
In particular, this fragment:
<xsl:if test="position() mod 3 = 0">
</ul>
</li>
<li>
<ul class="separator">
</xsl:if>
isn't a well-formed XML fragment, because there isn't any start tag for the end tags </ul> and </li>.
Finally, here is a correct example of such positional grouping:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="num[position() mod 3 = 1]">
<group>
<xsl:copy-of select=
". | following-sibling::*[not(position() > 2)]"/>
</group>
</xsl:template>
<xsl:template match="num"/>
</xsl:stylesheet>
When this transformation is applied to the following XML document:
<nums>
<num>01</num>
<num>02</num>
<num>03</num>
<num>04</num>
<num>05</num>
<num>06</num>
<num>07</num>
<num>08</num>
<num>09</num>
<num>10</num>
</nums>
the wanted, correctly grouped result is produced:
<nums>
<group>
<num>01</num>
<num>02</num>
<num>03</num>
</group>
<group>
<num>04</num>
<num>05</num>
<num>06</num>
</group>
<group>
<num>07</num>
<num>08</num>
<num>09</num>
</group>
<group>
<num>10</num>
</group>
</nums>
At a surface level, your stylesheet is invalid because it is not well-formed XML.
At a deeper level, you have failed to understand that XSLT deals with XML as a tree of nodes. You are trying to think of <a> and </a> as two separate instructions, one of which writes a start tag to the output, the other writing an end tag to the output. That's the wrong mental model of how XSLT works. In fact <a>...</a> is the lexical representation of an element node in the stylesheet; the element node in the stylesheet is a single instruction, whose effect when evaluated is to write an element node to the result tree. Nodes are indivisible, and you can't separate the operation of writing a node into two parts, each of which writes half a node.
Your problem is a grouping problem. Grouping problems are much easier to solve in XSLT 2.0 than in 1.0 - but solutions are always possible even in 1.0, without departing from the XSLT processing model.
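The same tree-first way of thinking can be illustrated outside XSLT. Below is a small Python sketch, not the XSLT solution itself, that builds the nested list by creating whole element nodes, one inner list per chunk of three items, rather than emitting start and end tags separately. The function name group_in_threes and the sample item texts are made up for the example; the class names liste1 and separator are taken from the question.

```python
import xml.etree.ElementTree as ET


def group_in_threes(items, size=3):
    """Build <ul><li><ul>...</ul></li>...</ul> with one inner list per chunk."""
    outer = ET.Element("ul")
    for start in range(0, len(items), size):
        li = ET.SubElement(outer, "li")
        # First inner list uses the question's "liste1" class, later ones "separator".
        css_class = "liste1" if start == 0 else "separator"
        inner = ET.SubElement(li, "ul", {"class": css_class})
        for text in items[start:start + size]:
            ET.SubElement(inner, "li").text = text
    return outer


tree = group_in_threes(["a", "b", "c", "d"])
print(ET.tostring(tree, encoding="unicode"))
```

Each inner list is opened and closed as a single node, so there is never a point where an unmatched tag has to be written.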
The only way to achieve the functionality you are asking for is the option 2 you suggested: put all the closing tags in xsl:text and render them.
Otherwise you cannot place a closing tag inside an if condition; that would be treated as an error.
I have found the solution:
In fact the problem was that inside a for-each, you cannot insert unmatched tags, therefore, to do it, you have to wrap them into:
<xsl:text disable-output-escaping="yes"><![CDATA[
any HTML in here will not be validated
]]></xsl:text>

Compare Author to UserID in SharePoint XSLT

I've got a simple DataFormWebPart where I'm using XSLT to render the contents of a list. I want to compare the @Author field of each list item to the current user; however, the following won't evaluate to true:
in the header of the XSL:
<xsl:param name="UserID" />
and within the template that evaluates the rows:
<xsl:value-of select="@Author" />
<xsl:if test="@AuthorID = $UserID">(you)</xsl:if>
I have values for both @Author and $UserID:
@Author renders as a hyperlink to their user-profile
$UserID renders as the same text, but without the hyperlink.
What expression can I use to get the non-hyperlink value of the user-profile?
Found a quick win:
<xsl:value-of select="contains(@Author,concat('>',$UserID,'<'))" />
You should refer to:
https://sharepoint.stackexchange.com/questions/21202/custom-form-does-not-display-created-by-value
<tr>
<td valign="top" class="ms-formlabel"><nobr>Created by</nobr></td>
<td valign="top" class="ms-formbody">
<SharePoint:CreatedModifiedInfo ControlMode="Display" runat="server">
<CustomTemplate>
<SharePoint:FormField FieldName="Author" runat="server" ControlMode="Display" DisableInputFieldLabel="true" /><br/>
<SharePoint:FieldValue FieldName="Modified" runat="server" ControlMode="Display" DisableInputFieldLabel="true"/>
</CustomTemplate>
</SharePoint:CreatedModifiedInfo>
</td>

Resources