Hi all,
I'm using Schematron and I need to do the following:
Sometimes the XML document I want to validate contains elements like this:
<Var.X name="B">
For these elements (whose name() has a dot in the middle), I need to check whether the XML file contains a directory named Var with a child element whose name attribute is X (in this case), like this:
<Var>
<Obj name="X"/>
</Var>
I thought of transforming the name() of those elements into a string representing the path, so for this particular case:
Var.X would be /*/Var/child::*[@name="X"]
Having this string, I then wanted to check whether an element actually exists at the path the string represents, but I can't cast the string to a path type, and I don't even know if that's possible...
Is there a simpler way of doing this?
You can also do this with the name() function, without a Saxon extension! Note that matches() is an XPath 2.0 function, so the schema needs queryBinding="xslt2".
<rule context="*[matches(name(),'\w\.\w')]">
  <let name="beforePoint" value="substring-before(name(),'.')"/>
  <let name="afterPoint" value="substring-after(name(),'.')"/>
  <assert test="/*/*[name() = $beforePoint]/*[@name=$afterPoint]">error message</assert>
</rule>
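For anyone who wants to sanity-check the same rule outside Schematron, here is a minimal Python sketch of the equivalent lookup using only the standard library. The function name unresolved_dotted_elements is made up for illustration, and the sketch assumes the document uses no namespaces (ElementTree would otherwise put the namespace URI, dots included, into the tag).

```python
import xml.etree.ElementTree as ET


def unresolved_dotted_elements(xml_text):
    """Return the tags of dotted elements (e.g. Var.X) that have no
    matching /*/Var/*[@name='X'] in the same document.

    Hypothetical helper; assumes a document without namespaces.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for elem in root.iter():
        if "." not in elem.tag:
            continue
        # Same split as substring-before / substring-after on the first dot
        before, _, after = elem.tag.partition(".")
        # Equivalent of /*/*[name() = $beforePoint]/*[@name = $afterPoint]
        found = any(
            child.get("name") == after
            for container in root.findall(before)
            for child in container
        )
        if not found:
            missing.append(elem.tag)
    return missing
```

Calling it on a document that contains both Var.X and Var.Y but only defines Obj name="X" under Var would report Var.Y as unresolved.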
I've realised that what I wanted to achieve can be done with the saxon:evaluate function, and I've already got it working.
I have a situation in which an XML document has information in varying depth (according to S1000D schemas), and I'm looking for a generic method to extract correct sentences.
I need to interpret a simple element containing text as one individual part/sentence. When an element containing text also contains other elements that in turn contain text, I need to flatten/concatenate it all into one string/sentence. Once this is done, the nested elements must not be visited again.
Using Python's lxml library and applying the tostring function works OK if the source XML is pretty-printed, because I can then split the concatenated string on newlines to get each sentence. If the source isn't pretty-printed but sits on one single line, there are no newlines to split on.
I have tried the iter function and applying XPath expressions to each node, but this often gives different results in Python from what I get when applying the same XPath in XMLSpy.
I have started down some of the following paths, and my question is if you have some input on which ones to continue on, or if you have other solutions.
I think I could use XSLT to preprocess the XML file, and then use a simpler Python script to divide the content into a list of sentences for further processing. Using Saxon with Python is now doable, but here I run into problems if the XML source contains entities that I cannot get Saxon to resolve (such as &nbsp;). I have no problem parsing the files with lxml, so I tend to lean towards a cleaner Python solution.
lxml doesn't seem to have XPath support that can give me all nodes with text that contain one or more children containing text, plus all nodes that are simple elements with no parent containing text nodes. Is there a way to preprocess the parsed tree so that I can ensure it is pretty-printed in memory, so that tostring works the same way for every XML file? Otherwise, my logic gives me one string for a document with no white space, and multiple sentences/strings if the source had been pretty-printed. This doesn't feel OK.
What are my options? Use XSLT 1.0 in Python, other parsers to get a better handle on where I am in the tree, ...
Just to reiterate the issue here; I am looking for a generic way to extract text, and the only rules to the XML source are that a sentence may be built from an element with child elements with text, but there won't be additional levels. The other possibility is the simple element, but this one cannot be included in a parent element with text since this is included in the first rule.
Help/thoughts are appreciated.
This is downright ugly code, a hasty hack with no real thought given to form, beauty or finesse. All I am after is one way of doing this in Python. I'll tidy things up when I find a good solution that I want to keep. This is one possible solution, so I figured I'd post it to see if someone can be kind enough to show me a better way to do this.
The problem has been to write XPath expressions that get me all elements with text content, and then to act upon them depending on their context. All my XPath expressions have given me the correct nodes, but also a root or ancestor that pulled in a more or less complete string at the beginning, so I gave up on those. My XPath expressions work as they should in XSLT, but not in Python. I don't know why...
I had to resort to regex to find nodes that contain strings that are not white space only.
Using lxml with xpath and tostring gives different results depending on how the source XML is formatted, so I had to get around that.
The following formats have been tested:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<subroot>
<a>Intro, element a: <b>Nested b to be included in a, <c>and yet another nested c-element</c> and back to b.</b></a>
<!-- Comment -->
<a>Simple element.</a>
<a>Text with<b> 1st nested b</b>, back in a, <b>and yet another b-element</b>, before ending in a.</a>
</subroot>
</root>
<?xml version="1.0" encoding="UTF-8"?>
<root>
<subroot>
<a>Intro, element a: <b>Nested b to be included in a, <c>and yet another nested c-element,
</c> and back to b.</b>
</a>
<!-- Comment -->
<a>Simple element.</a>
<a>Text with<b> 1st nested b</b>, back in a, <b>and yet another b-element</b>, before ending in a.</a>
</subroot>
</root>
<?xml version="1.0" encoding="UTF-8"?><root><subroot><a>Intro, element a: <b>Nested b to be included in a, <c>and yet another nested c-element</c> and back to b.</b></a><!-- Comment --><a>Simple element.</a><a>Text with<b> 1st nested b</b>, back in a, <b>and yet another b-element</b>, before ending in a.</a></subroot></root>
Python code:
import re
from lxml import etree as ET

RE_NS = {"re": "http://exslt.org/regular-expressions"}
dmParser = ET.XMLParser(resolve_entities=False, recover=True)
xml_doc = r'C:/Temp/xml-testdoc.xml'
parsed = ET.parse(xml_doc, dmParser)  # Pass the parser, otherwise its options are never used
# Every element whose first text node contains a non-whitespace character
for elem in parsed.xpath(r"//*[re:match(text(), '\S')]", namespaces=RE_NS):
    tmp = elem.xpath(r"parent::*[re:match(text(), '\S')]", namespaces=RE_NS)
    if tmp and tmp[0].text and tmp[0].text.strip():  # Two first checks can yield None, and if there is something check if only white space
        continue  # If so, discard this node: it is already part of the parent's sentence
    elif elem.xpath(r"./*[re:match(text(), '\S')]", namespaces=RE_NS):  # If a child node also contains text
        line = re.sub(r'\s+', ' ', ET.tostring(elem, encoding='unicode', method='text').strip())  # Replace all unwanted whitespace
        if line:
            print(line)
    else:  # Simple element
        print(elem.text.strip())
Always yields:
Intro, element a: Nested b to be included in a, and yet another nested c-element, and back to b.
Simple element.
Text with 1st nested b, back in a, and yet another b-element, before ending in a.
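As a possible alternative that avoids both the EXSLT regex and tostring, here is a minimal sketch that walks the tree and collapses whitespace itself, so the result should not depend on how the source is formatted. It uses the standard library's xml.etree.ElementTree rather than lxml, and the helper names has_own_text and sentences are made up for illustration. It assumes the rule stated in the question: the top-most element that carries text of its own defines one sentence, and its descendants are not visited again.

```python
import re
import xml.etree.ElementTree as ET


def has_own_text(elem):
    """True if non-whitespace text sits directly inside elem
    (either before its first child or after any of its children)."""
    if elem.text and elem.text.strip():
        return True
    return any(child.tail and child.tail.strip() for child in elem)


def sentences(root):
    """Return one whitespace-normalised string per top-most text-bearing element."""
    result = []

    def visit(elem):
        if has_own_text(elem):
            # Flatten the element and all of its descendants into one string
            flat = "".join(elem.itertext())
            result.append(re.sub(r"\s+", " ", flat).strip())
            return  # descendants are already included, so do not visit them
        for child in elem:
            visit(child)

    visit(root)
    return result
```

With lxml the same idea should carry over, since its elements also offer text, tail and itertext(), but comments and processing instructions show up as nodes in lxml trees, so they would need to be skipped explicitly.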
Given a Julia object of composite type, how can one determine its fields?
I know one solution if you're working in the REPL: First you figure out the type of the object via a call to typeof, then enter help mode (?), and then look up the type. Is there a more programmatic way to achieve the same thing?
For v0.7+
Use fieldnames(x), where x is a DataType. For example, use fieldnames(Date), instead of fieldnames(today()), or else use fieldnames(typeof(today())).
This returns the field names as Symbols, in order (a Tuple in v0.7+; a Vector{Symbol} in earlier versions).
If a field name is myfield, then to retrieve the value in that field use either getfield(x, :myfield), or the shortcut syntax x.myfield.
Another useful and related function to play around with is dump(x).
Before v0.7
Use fieldnames(x), where x is either an instance of the composite type you are interested in, or else a DataType. That is, fieldnames(today()) and fieldnames(Date) are equally valid and have the same output.
Suppose the object is obj.
You can get all the information about its fields with the following code snippet:
T = typeof(obj)
for (name, typ) in zip(fieldnames(T), T.types)
    println("type of the fieldname $name is $typ")
end
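Here, fieldnames(T) returns the field names and T.types returns the corresponding field types, in the same order. Note that T.types is an internal field of DataType; on Julia 1.1 and later, fieldtypes(T) is the documented way to get the same information.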
I'm interested in knowing the data structure that a phonebook would use. One that contains objects with fields like a name string, a number string, etc. and allows searching (and partial searching, like the first few letters of the name) via ALL the fields.
What is the method that a phonebook would use? I was thinking it would be some version of a tree, but I'm having difficulty wrapping my head around efficient methods of doing so.
You could use an Array of Maps:
ArrayList<Map<String, String>> a;
// ...
a.get(i).get("name")
But XML is much better:
org.w3c.dom is quite easy to use and XML is extremely simple to save to a file etc.
<contacts>
  <contact name="..." phone="..." />
</contacts>
or
<contacts>
  <contact>
    <name>...</name>
    <phone>...</phone>
  </contact>
</contacts>
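Neither structure above addresses the partial-search part of the question by itself. One common approach, sketched here in Python as an assumption rather than as what any real phonebook does, is to keep one sorted index per field and use binary search to find the first entry that could start with the prefix. The class name PhoneBook and its methods are hypothetical; a trie per field would be another option.

```python
import bisect


class PhoneBook:
    """Prefix search over every field via one sorted index per field
    (hypothetical design for illustration)."""

    def __init__(self, contacts):
        self.contacts = list(contacts)
        self.index = {}  # field name -> sorted list of (lowercased value, position)
        for pos, contact in enumerate(self.contacts):
            for field, value in contact.items():
                self.index.setdefault(field, []).append((value.lower(), pos))
        for entries in self.index.values():
            entries.sort()

    def search(self, field, prefix):
        """Return all contacts whose `field` starts with `prefix` (case-insensitive)."""
        entries = self.index.get(field, [])
        prefix = prefix.lower()
        # (prefix,) sorts before every (prefix..., pos) tuple, so this is the first candidate
        start = bisect.bisect_left(entries, (prefix,))
        matches = []
        for value, pos in entries[start:]:
            if not value.startswith(prefix):
                break  # entries are sorted, so no later entry can match
            matches.append(self.contacts[pos])
        return matches
```

Each lookup costs a binary search plus the number of matches. The trade-off is that inserting a contact means inserting into every field's sorted list.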
I have started using the .NET API for YAML and it seems to be helpful. However, I have a few questions and am wondering if you can provide a sample or workaround for them.
(1) I have an object consisting of 4 strings, and I would like to serialize a collection of them (List or String[]). I wrote a helper method that returns the strings in the format I want; however, it adds an extra single quote before and after each string. So I am getting
-'{str1: str2, str3: str4}'
-'{str5: str6, str7: str8}'
instead of
-{str1: str2, str3: str4}
-{str5: str6, str7: str8}
Can you suggest any workarounds?
(2) I am trying to insert XAML as a string in a YAML document. My XAML is well-formed XML, but when I serialize it, the output is cut off before the third-to-last element. Any idea why?
Regarding the first question, if you are serializing an array of strings, then it is normal that each element is quoted because it starts with a '{'. In this case, you should be serializing the list of objects directly instead of converting them to string first.
Regarding the second question, you should add some code to the question to clarify what you are doing.
As of Scala 2.10, the following interpolation is possible.
val name = "someName"
val interpolated = s"Hello world, my name is $name"
It is now also possible to define custom string interpolations, as you can see in the "Advanced usage" section of the Scala documentation: http://docs.scala-lang.org/overviews/core/string-interpolation.html#advanced_usage
My question is: is there a way to obtain the original string, before interpolation, including any interpolated variable names, from inside the implicit class that defines the new interpolation for strings?
In other words, I want to be able to define an interpolation x in such a way that when I call
x"My interpolated string has a $name"
I can obtain the string exactly as seen above, without the $name part being replaced, inside the interpolation.
Edit: on a quick note, the reason I want to do this is that I want to take the original string, replace it with another, internationalized string, and then substitute the variable values. This is the main reason I want the original string with no interpolation performed on it.
Thanks in advance.
Since Scala's string interpolation can handle arbitrary expressions within ${} it has to evaluate the arguments before passing them to the formatting function. Thus, direct access to the variable names is not possible by design. As pointed out by Eugene, it is possible to get the name of a plain variable by using macros. I don't think this is a very scalable solution, though. After all, you'll lose the possibility to evaluate arbitrary expressions. What, for instance, will happen in this case:
x"My interpolated string has a ${"Mr. " + name}"
You might be able to extract the variable name by using macros, but it might get complicated for arbitrary expressions. My suggestion would be: if the name of your variable should be meaningful within the string interpolation, make it a part of the data structure. For example, you can do the following:
case class NamedValue(variableName: String, value: Any)
val name = NamedValue("name", "Some Name")
x"My interpolated string has a $name"
The objects are passed as Any* to x. Thus, you can now match for NamedValue within x and do specific things depending on the "variable name", which is now part of your data structure. Instead of storing the variable name explicitly you could also exploit a type hierarchy, for instance:
sealed trait InterpolationType
case class InterpolationTypeName(name: String) extends InterpolationType
case class InterpolationTypeDate(date: String) extends InterpolationType
val name = InterpolationTypeName("Someone")
val date = InterpolationTypeDate("2013-02-13")
x"$name is born on $date"
Again, within x you can match for the InterpolationType subtype and handle things according to the type.
It seems that's not possible. String interpolation seems to be a compile-time feature that rewrites the example to:
StringContext("My interpolated string has a ", "").x(name)
As you can see the $name part is already gone. It became really clear for me when I looked at the source code of StringContext: https://github.com/scala/scala/blob/v2.10.0/src/library/scala/StringContext.scala#L1
If you define x as a macro, then you will be able to see the tree of the desugaring produced by the compiler (as shown by @EECOLOR). In that tree, the "name" argument will be seen as Ident(newTermName("name")), so you'll be able to extract a name from there. Be sure to take a look at the macro and reflection guides at docs.scala-lang.org to learn how to write macros and work with trees.