docx4j reports differences on unchanged table data - jaxb

I have created a *.docx file with a 2x2 table, each cell containing the text Cell x-y where x=row number and y=column number.
When I pass this document through a simple transformation process, docx4j's Differencer.diff() method reports no differences (i.e. no w:ins or w:del tags).
This is expected and handled cleanly, in spite of the fact that the original .docx has its text broken up like this inside the <w:tc> -> <w:p> tags:
<w:r>
<w:t>Cell</w:t>
</w:r>
<w:r>
<w:t xml:space="preserve"> 1-1</w:t>
</w:r>
and this in the transformed document:
<w:r>
<w:t xml:space="preserve">Cell 1-1</w:t>
</w:r>
However, if I add the text "Table Title" above the table in the document, the contents of each cell in the original document merge into one <w:r> (Word's handling, nothing I can do about it):
<w:r>
<w:t>Cell 1-1</w:t>
</w:r>
And the only difference in the transformed document is that xml:space="preserve" is inserted:
<w:r>
<w:t xml:space="preserve">Cell 1-1</w:t>
</w:r>
However, docx4j's Differencer.diff() method now reports that the content of each cell is inserted, and shows the following as the content of each w:tc's w:p in the generated diff document:
<w:ins xmlns:xalan="http://xml.apache.org/xalan" xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage" w:date="2009-03-11T17:57:00Z" w:author="someone" w:id="1">
<w:r>
<w:t xml:space="preserve">Cell 1-1</w:t>
</w:r>
</w:ins>
and shows the content of each cell as deleted, immediately following the closing <w:tbl> tag:
<!--Handling simple deleted w:p-->
<w:p xmlns:xalan="http://xml.apache.org/xalan" xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage">
<w:del w:date="2009-03-11T17:57:00Z" w:author="someone" w:id="5">
<w:r>
<w:delText>Cell 1-1</w:delText>
</w:r>
</w:del>
</w:p>
I know that the Differencer is capable of ignoring the xml:space="preserve" attributes because it does so with the inserted text before the table, so I doubt that's the cause.
Are these table scenarios outside the intended use case for the Differencer? Is it an error in usage / invocation? Bug?
Any guidance is appreciated.
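As a sanity check that the split and merged runs above carry identical text, here is a minimal Python sketch. It is not docx4j's Differencer and says nothing about how that class compares documents. The wrapping w:p element and the namespace declaration are added here only so that the fragments parse on their own.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in the fragments).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_text(p_xml: str) -> str:
    """Concatenate the text of every w:t in a w:p, ignoring run boundaries."""
    p = ET.fromstring(p_xml)
    return "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))


# Original document: the cell text is split over two runs.
original = f'''<w:p xmlns:w="{W}">
  <w:r><w:t>Cell</w:t></w:r>
  <w:r><w:t xml:space="preserve"> 1-1</w:t></w:r>
</w:p>'''

# Transformed document: one run, with xml:space="preserve".
transformed = f'''<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">Cell 1-1</w:t></w:r>
</w:p>'''

print(paragraph_text(original) == paragraph_text(transformed))
```

If this comparison holds for every cell, the w:ins / w:del output is not caused by a difference in the visible text.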

Related

Excel - parent-child from a list of values

I have a list of values (column 1) that contains multiple levels of parents and children, and in another column I need to record the parent of each cell.
Any ideas on how to do that easily in Excel?
Thanks a lot!
COLUMN-1         COLUMN-2
A
A.01             A
A.01.01          A.01
A.01.01.01       A.01.01
A.01.01.01.01    A.01.01.01
A.01.01.01.02    A.01.01.01
A.01.01.01.03    A.01.01.01
A.01.01.01.04    A.01.01.01
A.01.01.02       A.01.01
A.01.01.02.01    A.01.01.02
A.01.01.02.02    A.01.01.02
A.01.01.02.03    A.01.01.02
A.01.01.02.04    A.01.01.02
A.01.01.03       A.01.01
A.01.01.03.01    A.01.01.03
A.01.01.03.02    A.01.01.03
A.01.01.03.03    A.01.01.03
A.01.01.03.04    A.01.01.03
FINAL GOAL
Jos Woolley's approach looks likely.
For how to do it, see How can I perform a reverse string search in Excel without using VBA? (adapting it to search for . instead of spaces)
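The underlying idea is to find the last . in each value and keep everything before it. As an illustration outside Excel, here is a small Python sketch of that rule. The function name parent_of is made up for this example, and it assumes every level is separated by a single dot, as in the sample data.

```python
def parent_of(code: str, sep: str = ".") -> str:
    """Return everything before the last separator, or "" for a top-level code."""
    head, found, _ = code.rpartition(sep)
    return head if found else ""


# A few values taken from COLUMN-1 of the question.
for code in ["A", "A.01", "A.01.01", "A.01.01.02.03"]:
    print(code, "->", parent_of(code))
```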

SVG - reuse a line node with <defs> and <use>

My goal is to re-use a line node as follows:
<defs>
<desc>x1 and x2 values never change, would like to provide y1 and y2 in use</desc>
<line id="p" x1="5" x2="1019" stroke-width="1" stroke="#808080" opacity=".3"/>
</defs>
<use xlink:href="#p" y1="718.5" y2="718.5"/>
In learning SVG I thought any parameter provided in the use statement was passed to the template in the defs, but apparently not? According to W3 docs:
The ‘use’ element has optional attributes ‘x’, ‘y’, ‘width’ and ‘height’ which are used to map the graphical contents of the referenced element onto a rectangular region within the current coordinate system
However, 'use' is supposed to support "Any...graphical element...", line included. Well, line doesn't have x,y,width or height attributes; it has x1, y1, x2, y2.
Also this would preclude passing in all sorts of other attributes like stroke, stroke-width, etc.
Is the use statement really limited to just x,y,width, and height or is there another way to get attributes merged into the def template node?
Since line is neither an <svg> element nor a <symbol> element it is covered here:
In the generated content, the ‘use’ will be replaced by ‘g’, where all attributes from the ‘use’ element except for ‘x’, ‘y’, ‘width’, ‘height’ and ‘xlink:href’ are transferred to the generated ‘g’ element. An additional transformation translate(x,y) is appended to the end (i.e., right-side) of the ‘transform’ attribute on the generated ‘g’, where x and y represent the values of the ‘x’ and ‘y’ attributes on the ‘use’ element. The referenced object and its contents are deep-cloned into the generated tree.
So width and height are ignored and x and y become ways to translate the line. That's basically all you can do with it.
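One way to work within that limit, sketched here as an assumption rather than the only option, is to define the line at y1="0" y2="0" and let the y attribute on each use element translate it into place. The Python helper below (horizontal_rules is a made-up name) just generates such markup so the pattern is easy to see.

```python
def horizontal_rules(ys, x1=5, x2=1019):
    """Return an SVG string that places one shared horizontal line at each y in ys."""
    # Each <use> carries only y; the translate(0, y) it implies moves the line.
    uses = "\n".join(f'  <use xlink:href="#p" y="{y}"/>' for y in ys)
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        'xmlns:xlink="http://www.w3.org/1999/xlink">\n'
        "  <defs>\n"
        # The template line sits at y=0 so that the translation gives the final position.
        f'    <line id="p" x1="{x1}" x2="{x2}" y1="0" y2="0" '
        'stroke-width="1" stroke="#808080" opacity=".3"/>\n'
        "  </defs>\n"
        f"{uses}\n"
        "</svg>"
    )


print(horizontal_rules([718.5]))
```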

MarkLogic search:search not returning snippets

I am doing a search:search on a MarkLogic database. I can search on the term "pineal" and return 297 results with snippets. I can search on "city:Vancouver" and return 83 results with snippets. The query "pineal OR city:Vancouver" returns 374 results with snippets. However, the query "pineal AND city:Vancouver" returns a count of 6 results, but no result elements and no snippets. Any idea why I am not getting result text?
Thanks!
Ravi Har
I seem to have found the problem.
The xml being searched looks like this:
<lecture objectType="lecture">
<city>Vancouver</city>
<state>British Columbia</state>
<country>Canada</country>
<formattedTranscript>
<body class="lecture-transcript" xmlns="http://www.w3.org/1999/xhtml">
...
The city constraint looks like this:
<constraint name="city">
<range type="xs:string" facet="true">
<element ns="" name="city"/>
<facet-option>frequency-order</facet-option>
<facet-option>descending</facet-option>
</range>
</constraint>
I had the following statement in my $options declaration:
<searchable-expression>
//(formattedTranscript|title|city|state|country|objectDate)
</searchable-expression>
When I take this statement out the search returns results as expected. I'm curious why the searchable-expression statement breaks the search results.
Thanks everyone for your comments.

sparql query about object to find another object

Given this RDF:
<?xml version="1.0" encoding="utf-8"?><!DOCTYPE rdf:RDF [<!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<!ENTITY rdfs 'http://www.w3.org/2000/01/rdf-schema#'>
<!ENTITY xsd 'http://www.w3.org/2001/XMLSchema#'>]>
<rdf:RDF xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xml:base="http://www.example.org/"
xmlns:dnr="http://www.dotnetrdf.org/configuration#"
xmlns:nss="http://www.example.org/startTime"
xmlns:nse="http://www.example.org/endTime#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
<rdf:Description rdf:about="Fadi">
<ns2914:be xmlns:ns2914="http://example.org/">May</ns2914:be>
<nss:startTime>00:00:13</nss:startTime>
<nse:endTime>00:00:16</nse:endTime>
</rdf:Description>
<rdf:Description rdf:about="Fadi">
<ns194:not xmlns:ns194="http://example.org/">Good</ns194:not>
<nss:startTime>00:00:19</nss:startTime>
<nse:endTime>00:00:21</nse:endTime>
</rdf:Description>
<rdf:Description rdf:about="She">
<ns195:be xmlns:ns195="http://example.org/">Good</ns195:be>
<nss:startTime>00:00:21</nss:startTime>
<nse:endTime>00:00:24</nse:endTime>
</rdf:Description>
</rdf:RDF>
How can I get the startTime and endTime with a query on the object?
I tried to use:
PREFIX nss: <http://www.example.org/startTime>
PREFIX nse: <http://www.example.org/endTime#>
SELECT *
WHERE
{
?s ?p ?o .
FILTER(REGEX(?o, 'Good', 'i'))
?s nss:startTime ?startTime ;
nse:endTime ?endTime .
}
But it only gave me the first ?startTime and ?endTime for the subject it finds for the object Good.
I need the following answers:
?s,?p,?o,?startTime,?endTime
Fadi,not,Good,00:00:19,00:00:21
She,be,Good,00:00:21,00:00:24
Your query doesn't select that data, so why are you surprised it isn't returned? As I suggested in the comment, go read a good SPARQL tutorial like SPARQL by Example, or pick up a copy of the excellent Learning SPARQL book from O'Reilly.
The query you wrote selects triples where the object matches a regular expression, and only those triples. If you want to select the start and end times as well, you need to add additional patterns to your query, e.g.
PREFIX nss: <http://www.example.org/startTime>
PREFIX nse: <http://www.example.org/endTime#>
SELECT *
WHERE
{
?s ?p ?o .
FILTER(REGEX(?o, "May", "i"))
?s nss:startTime ?startTime ;
nse:endTime ?endTime .
}

Structure of docx field

A field in docx is represented this way.
<w:r>
<w:fldChar w:fldCharType="begin"/>
</w:r>
AAA
<w:r>
<w:instrText xml:space="preserve"> NOTEREF _Ref111111 \h </w:instrText>
</w:r>
BBB
<w:r>
<w:fldChar w:fldCharType="separate"/>
</w:r>
CONTENT
<w:r>
<w:fldChar w:fldCharType="end"/>
</w:r>
The field content goes in the CONTENT placeholder. My question is: can anything go in AAA or BBB? Or are they always empty? I suspect the creators of this format had something in mind when they used four separator elements instead of just two, but I haven't seen any examples of this being used.
It's better to think of it as only three separator elements and two slots for content, which can be complex thanks to the separators.
<w:r><w:fldChar w:fldCharType="begin"/></w:r>
LABEL
<w:r><w:fldChar w:fldCharType="separate"/></w:r>
VALUE
<w:r><w:fldChar w:fldCharType="end"/></w:r>
So your AAA and BBB are just extra content for the LABEL.
There's an example in the spec, where LABEL is:
<w:r><w:rPr><w:b/><w:color w:val="ED1C24"/><w:u w:val="single"/></w:rPr>
<w:instrText>D</w:instrText></w:r>
<w:r><w:instrText xml:space="preserve">ATE</w:instrText></w:r>
to make the D in DATE a different style.
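To make the begin / separate / end reading concrete, here is a minimal Python sketch that walks the runs and collects the instruction (LABEL) and result (VALUE) text. It assumes a single, non-nested field; the wrapping w:p element, the split_field name, and the literal VALUE result text are placeholders added for this example.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in the fragments).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def split_field(runs_xml: str):
    """Return (instruction, result) text for one non-nested complex field."""
    root = ET.fromstring(runs_xml)
    state = None  # None: outside the field, "instr": after begin, "result": after separate
    instr, result = [], []
    for run in root.iter(f"{{{W}}}r"):
        fld = run.find(f"{{{W}}}fldChar")
        if fld is not None:
            kind = fld.get(f"{{{W}}}fldCharType")
            # A fldChar run only switches the state; it carries no text itself.
            state = {"begin": "instr", "separate": "result", "end": None}.get(kind, state)
            continue
        if state == "instr":
            instr += [t.text or "" for t in run.iter(f"{{{W}}}instrText")]
        elif state == "result":
            result += [t.text or "" for t in run.iter(f"{{{W}}}t")]
    return "".join(instr), "".join(result)


# The instruction is split over two runs, as in the D + ATE example.
sample = f'''<w:p xmlns:w="{W}">
  <w:r><w:fldChar w:fldCharType="begin"/></w:r>
  <w:r><w:rPr><w:b/></w:rPr><w:instrText>D</w:instrText></w:r>
  <w:r><w:instrText xml:space="preserve">ATE</w:instrText></w:r>
  <w:r><w:fldChar w:fldCharType="separate"/></w:r>
  <w:r><w:t>VALUE</w:t></w:r>
  <w:r><w:fldChar w:fldCharType="end"/></w:r>
</w:p>'''

print(split_field(sample))
```

Nested fields would need a stack of states instead of a single variable; that is left out here to keep the sketch short.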

Resources