Get all attributes of a node in XQuery

I'm trying to get all attributes of a node using the following XQuery code. Logically it should work, but it doesn't:
for $n in $nodes
return $n/@*

An attribute node must be an attribute of an element node and the result tree cannot contain a standalone attribute node.
If you want to produce a name/value sequence for each attribute, do:
for $attr in $nodes/@*
return
(name($attr), string($attr), '
')
For example, given the following XML document:
<t topA="1">
<a x="2" z="3">
<b message="Hello"/>
</a>
<c y="5"/>
</t>
and applying this query to it:
for $nodes in //*,
$attr in $nodes/@*
return
(name($attr), string($attr), '
')
the result is:
topA 1
x 2
z 3
message Hello
y 5
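For comparison only (this is Python, not XQuery), here is a minimal sketch of the same idea using the standard library's xml.etree.ElementTree and assuming the sample document above. It walks every element, as //* does, and emits plain name and value strings instead of the attribute nodes themselves. The function name attribute_pairs is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The sample document from the answer above.
XML = """<t topA="1">
  <a x="2" z="3">
    <b message="Hello"/>
  </a>
  <c y="5"/>
</t>"""


def attribute_pairs(xml_text):
    """Return a (name, value) string pair for every attribute of every element."""
    root = ET.fromstring(xml_text)
    pairs = []
    # root.iter() visits the root and all descendants in document order,
    # which is roughly what //* selects in XPath.
    for elem in root.iter():
        for name, value in elem.attrib.items():
            # Emit strings, not attribute nodes, just as the XQuery answer does.
            pairs.append((name, value))
    return pairs


for name, value in attribute_pairs(XML):
    print(name, value)
```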

Related

Find if text exists inside a nested div and, if so, print the whole string (Selenium, Python)

I'm very new to Selenium (3.141.0) and Python 3, and I have a problem I couldn't figure out.
The HTML looks similar to this:
<div class='a'>
<div>
<p><b>ABC</b></p>
<p><b>ABC#123</b></p>
<p><b>XYZ</b></p>
</div>
</div>
I want Selenium to find whether # exists inside that div. (I can't target only the paragraph element, because sometimes the text I want to extract is inside a different element, but it's always inside that <div class='a'>.) If # exists, print the whole <p><b>ABC#123</b></p> (or sometimes <div>ABC#123</div>).
To find an element by the text it contains, you must use XPath. From what you are describing, it looks like you want the locator:
//div[@class='a']//*[contains(text(),'#')]
^ a DIV with class 'a'
^ any descendant element whose own text (its first text-node child) contains '#'
The code would look something like
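from selenium.webdriver.common.by import By
# `driver` is an already-running WebDriver with the page loaded
for e in driver.find_elements(By.XPATH, "//div[@class='a']//*[contains(text(),'#')]"):
    print(e.get_attribute('outerHTML'))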
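It prints the full markup of every matching element: <b>ABC#123</b>, <div>ABC#123</div>, or <p>ABC#123</p>, whichever exists. For the sample HTML above, that is <b>ABC#123</b>, because the <b> is the element whose own text contains '#'.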
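To see which element that locator picks without starting a browser, here is a rough offline sketch of the same filtering logic using Python's xml.etree.ElementTree (not Selenium). It assumes the markup is well-formed, and the helper names outer_html and elements_with_hash are hypothetical.

```python
import xml.etree.ElementTree as ET

# The question's sample markup with the inner <div> closed properly;
# ElementTree, unlike a browser, needs well-formed input.
HTML = """<div class='a'>
<div>
<p><b>ABC</b></p>
<p><b>ABC#123</b></p>
<p><b>XYZ</b></p>
</div>
</div>"""


def outer_html(elem):
    """Serialize elem alone; tostring() would otherwise append elem.tail."""
    saved_tail = elem.tail
    elem.tail = None
    try:
        return ET.tostring(elem, encoding="unicode")
    finally:
        elem.tail = saved_tail


def elements_with_hash(markup):
    """Approximate //div[@class='a']//*[contains(text(),'#')] on well-formed markup."""
    root = ET.fromstring(markup)
    matches = []
    # iter("div") includes the root itself, so the outer div is checked too.
    for container in root.iter("div"):
        if container.get("class") != "a":
            continue
        for elem in container.iter():
            if elem is container:
                continue  # //* selects descendants only, not the div itself
            # elem.text is the text before the first child; this approximates
            # XPath 1.0 contains(text(), ...), which only looks at the first text node.
            if elem.text and "#" in elem.text:
                matches.append(outer_html(elem))
    return matches


print(elements_with_hash(HTML))  # ['<b>ABC#123</b>']
```

If you need the enclosing <p> instead, the locator (or the sketch) would have to select the parent of each match.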

How to parse only the second span tag in an HTML document using Python bs4

I want to parse only one span tag in my HTML document. There are three sibling span tags without any class or ID. I am targeting only the second one, using BeautifulSoup 4.
Given the following html document:
<div class="adress">
<span>35456 street</span>
<span>city, state</span>
<span>zipcode</span>
</div>
I tried:
for spn in soup.findAll('span'):
data = spn[1].text
but it didn't work. The expected result is the text of the second span stored in a variable:
data = "city, state"
I'd also like to know how to get both the first and second span concatenated in one variable.
You are trying to index into an individual span (a Tag instance); on a Tag, [] looks up an HTML attribute, so spn[1] fails. Get rid of the for loop and index the list returned by findAll instead:
>>> soup.findAll('span')[1]
<span>city, state</span>
You can get the first and second tags together using:
>>> soup.findAll('span')[:2]
[<span>35456 street</span>, <span>city, state</span>]
or, as a string:
>>> "".join([str(tag) for tag in soup.findAll('span')[:2]])
'<span>35456 street</span><span>city, state</span>'
Another option:
data = soup.select_one('div > span:nth-of-type(2)').get_text(strip=True)
print(data)
Output:
city, state
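The same "index the list, not the element" idea can be sketched with Python's standard-library xml.etree.ElementTree in case BeautifulSoup isn't available. This only works here because the snippet happens to be well-formed XML, and joining the text with ", " is an arbitrary choice for illustration.

```python
import xml.etree.ElementTree as ET

# The question's snippet (class name kept as written).
HTML = """<div class="adress">
<span>35456 street</span>
<span>city, state</span>
<span>zipcode</span>
</div>"""

root = ET.fromstring(HTML)
spans = root.findall("span")  # the <span> children, in document order

# Index the list, not an individual element: index 1 is the second span.
data = spans[1].text
print(data)  # city, state

# ElementTree's XPath subset also supports a 1-based positional predicate.
data_again = root.find("span[2]").text

# Text of the first two spans in one variable.
first_two = ", ".join(span.text for span in spans[:2])
print(first_two)  # 35456 street, city, state
```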

Extracting an xml value in Groovy

I have this code
String dbresponse = '''
<rows>
<row>
<file_data>One</file_data>
<time_inserted>2019-01-30T10:29:20.543</time_inserted>
</row>
<row>
<file_data>two</file_data>
<time_inserted>2019-01-30T10:29:20.547</time_inserted>
</row>
<row>
<file_data>three</file_data>
<time_inserted>2019-01-30T10:29:20.550</time_inserted>
</row>
<row>
<file_data>four</file_data>
<time_inserted>2019-01-30T10:29:20.550</time_inserted>
</row>
<row>
<file_data>five</file_data>
<time_inserted>2019-01-30T10:29:20.553</time_inserted>
</row>
</rows>
'''
def response = new XmlSlurper().parseText(dbresponse)
def data = response.rows.row[1].file_data
print data
I have two questions:
1] With the above code why am I not getting the response of: two ?
2] How do I iterate through the entire xml doc to get this response:
one
two
three
four
five
Thanks
1] With the above code why am I not getting the response of: two ?
According to the official Groovy documentation, it should be:
def rows = new XmlSlurper().parseText(dbresponse)
println(rows.row[1].file_data)
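The first line parses the XML and returns the root node as a GPathResult. In your case the root node is <rows> itself, so paths start below it: rows.row[1] is the second <row>, whereas your response.rows.row[1] looks for a <rows> element inside <rows>, finds none, and so prints nothing.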
2] How do I iterate through the entire xml doc to get this response: one two three four five
println("Iterating using each() method")
rows.row.file_data.each { row ->
println(row)
}
println("Iterating using Groovy for loop")
for (fileData in rows.row.file_data) {
println(fileData)
}
println("Getting a list of necessary elements using Groovy Spread operator")
def fileDataList = rows.row*.file_data
println(fileDataList)
Output:
Iterating using each() method
One
two
three
four
five
Iterating using Groovy for loop
One
two
three
four
five
Getting a list of necessary elements using Groovy Spread operator
[One, two, three, four, five]
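For readers coming from Python, here is a rough equivalent of both answers using the standard library's xml.etree.ElementTree (not Groovy). Like XmlSlurper.parseText(), ET.fromstring() returns the root element, so the same "paths start below the root" rule applies. Note that ElementTree's own positional predicates are 1-based, while the list indexing below is 0-based like GPath's row[1].

```python
import xml.etree.ElementTree as ET

# The question's dbresponse document.
DB_RESPONSE = """<rows>
<row><file_data>One</file_data><time_inserted>2019-01-30T10:29:20.543</time_inserted></row>
<row><file_data>two</file_data><time_inserted>2019-01-30T10:29:20.547</time_inserted></row>
<row><file_data>three</file_data><time_inserted>2019-01-30T10:29:20.550</time_inserted></row>
<row><file_data>four</file_data><time_inserted>2019-01-30T10:29:20.550</time_inserted></row>
<row><file_data>five</file_data><time_inserted>2019-01-30T10:29:20.553</time_inserted></row>
</rows>"""

# Like XmlSlurper.parseText(), fromstring() returns the root element, <rows>.
rows = ET.fromstring(DB_RESPONSE)

# Question 1: asking for a nested <rows> finds nothing, which mirrors
# response.rows.row[1] coming back empty...
print(rows.find("rows/row"))  # None

# ...while indexing the <row> children directly gives the second row.
second = rows.findall("row")[1].find("file_data").text
print(second)  # two

# Question 2: every <file_data> under every <row>, in document order.
for file_data in rows.findall("row/file_data"):
    print(file_data.text)

# Collected into a list, much like rows.row*.file_data in Groovy.
file_data_list = [fd.text for fd in rows.findall("row/file_data")]
print(file_data_list)
```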
Here is how it works:
def rows = new XmlSlurper().parseText(dbresponse)
print (rows.row[1])
print (rows.row[1].file_data)
The identifier rows gives a handle on the object returned when parsing dbresponse (the <rows> element in this case). I named it rows because naming the variable after the root element is a common XmlSlurper convention; it doesn't have to be.
You are almost there; the fix is trivial.
The script you posted tries to extract the data of a single row only, which is why the remaining data is not shown.
Here is a script that gets all the data:
def response = new XmlSlurper().parseText(dbresponse)
def data = response.'**'.findAll{it.name() =='row'}*.file_data*.text()
println data
You can quickly try it online: Demo
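The '**' in that script is a depth-first search over the whole tree, not just the direct children. A small Python analogue with xml.etree.ElementTree shows the difference; the extra <batch> wrapper is hypothetical and added only so that the two searches give different results.

```python
import xml.etree.ElementTree as ET

# Hypothetical variant of the question's data in which one row is wrapped
# in an extra <batch> element.
XML = """<rows>
<row><file_data>One</file_data></row>
<batch>
<row><file_data>two</file_data></row>
</batch>
</rows>"""

root = ET.fromstring(XML)

# findall("row") only looks at direct children of <rows>.
direct = [r.find("file_data").text for r in root.findall("row")]

# iter("row") walks the whole tree depth-first, similar in spirit to
# Groovy's response.'**'.findAll { it.name() == 'row' }.
everywhere = [r.find("file_data").text for r in root.iter("row")]

print(direct)      # ['One']
print(everywhere)  # ['One', 'two']
```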

Groovy XmlParser / XmlSlurper: node.localText() position?

I have a follow-up question for this question: Groovy XmlSlurper get value of the node without children.
It explains that, to get the local inner text of an (HTML) node without also recursively collecting the nested text of any child nodes, one has to use #localText() instead of #text().
For instance, a slightly enhanced example from the original question:
<html>
<body>
<div>
Text I would like to get1.
extra stuff
Text I would like to get2.
link to example
Text I would like to get3.
</div>
<span>
extra stuff
Text I would like to get2.
link to example
Text I would like to get3.
</span>
</body>
</html>
with the solution applied:
def tagsoupParser = new org.ccil.cowan.tagsoup.Parser()
def slurper = new XmlSlurper(tagsoupParser)
def htmlParsed = slurper.parseText(stringToParse)
println htmlParsed.body.div[0].localText()
would return:
[Text I would like to get1., Text I would like to get2., Text I would like to get3.]
However, when parsing the <span> part in this example
println htmlParsed.body.span[0].localText()
the output is
[Text I would like to get2., Text I would like to get3.]
The problem I am facing now is that it's apparently not possible to pinpoint the location ("between which child nodes") of the texts. I would have expected the second invocation to yield
[, Text I would like to get2., Text I would like to get3.]
This would have made it clear: Position 0 (before child 0) is empty, position 1 (between child 0 and 1) is "Text I would like to get2.", and position 2 (between child 1 and 2) is "Text I would like to get3." But given the API works as it does, there is apparently no way to determine whether the text returned at index 0 is actually positioned at index 0 or at any other index, and the same is true for all the other indices.
I have tried it with both XmlSlurper and XmlParser, yielding the same results.
If I'm not mistaken, it is consequently also impossible to fully recreate the original HTML document from the parser's output, because this "text index" information is lost.
My question is: Is there any way to find out those text positions? An answer requiring me to change the parser would also be acceptable.
UPDATE / SOLUTION:
For further reference, here's Will P's answer, applied to the original code:
def tagsoupParser = new org.ccil.cowan.tagsoup.Parser()
def slurper = new XmlParser(tagsoupParser)
def htmlParsed = slurper.parseText(stringToParse)
println htmlParsed.body.div[0].children().collect {it in String ? it : null}
This yields:
[Text I would like to get1., null, Text I would like to get2., null, Text I would like to get3.]
One has to use XmlParser instead of XmlSlurper with node.children().
I don't know TagSoup, and I hope it doesn't interfere with the solution, but with a plain XmlParser, children() returns a list that contains the raw strings as well as the child nodes:
html = '''<html>
<body>
<div>
Text I would like to get1.
extra stuff
Text I would like to get2.
link to example
Text I would like to get3.
</div>
<span>
extra stuff
Text I would like to get2.
link to example
Text I would like to get3.
</span>
</body>
</html>'''
def root = new XmlParser().parseText html
root.body.div[0].children().with {
assert get(0).trim() == 'Text I would like to get1.'
assert get(0).getClass() == String
assert get(1).name() == 'a'
assert get(1).getClass() == Node
assert get(2) == '''
Text I would like to get2.
'''
}
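For comparison, Python's xml.etree.ElementTree keeps text positions by design: text before the first child lives in elem.text and text after each child in that child's .tail, so the "between which child nodes" information the question asks about is recoverable. The sketch below interleaves them the way XmlParser's children() does. Because the original child tags were lost from the question, the <b> and <a> elements here are assumptions, and mixed_children is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <b> and <a> are assumed child tags for illustration.
SPAN = """<span>
<b>extra stuff</b>
Text I would like to get2.
<a href="http://example.com">link to example</a>
Text I would like to get3.
</span>"""


def mixed_children(elem):
    """Interleave text runs and child elements in document order.

    Text before the first child is elem.text; text after each child is
    child.tail. Whitespace-only runs are kept as "" so positions stay aligned.
    """
    parts = [(elem.text or "").strip()]
    for child in elem:
        parts.append(child)
        parts.append((child.tail or "").strip())
    return parts


parts = mixed_children(ET.fromstring(SPAN))
print([p if isinstance(p, str) else "<%s>" % p.tag for p in parts])
# ['', '<b>', 'Text I would like to get2.', '<a>', 'Text I would like to get3.']
```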

XSLT equation using SP list fields

I have a SharePoint (SP) Data View that I converted to XSLT so that I could add a header displaying a percentage (Complete). Before I converted the DVWP to XSLT, I added two count headers, one on Complete and another on LastName. They worked wonderfully, showing me the number of records and the number of records with a value in the Complete field. However, when I converted the DV to XSLT, I realized that I had lost my headers :(
So I am adding them back in using XSLT. Currently, the XPath code for the equation I have is <xsl:value-of select="count($Rows) div count($Rows)" />.
How do I get the total number of Yes values in my Complete field?
UPDATE1:
I found this http://www.endusersharepoint.com/STP/viewtopic.php?f=14&t=534 and tried it; however, it causes the following error: Failed setting processor stylesheet: 0x80004005: Argument 1 must return a node-set. -->count(/dsQueryResponse/Rows/Row='Y')<--
UPDATE2:
Complete is the name of a field within my XSLT dataset. Its value is either Y or blank. For grins I tried <xsl:value-of select="count(/xpath/to/parent/element[@Complete eq 'Y']) div count($Rows)" />; however, I received the following error: Failed setting processor stylesheet: 0x80004005: Expected token ']' found 'NAME'.count((/xpath/to/parent/element[@Complete -->eq <--'Y']) div count($Rows). I am starting to think that there may be a problem with 'eq'... Referencing my XML operators...
UPDATE3:
<xsl:value-of select="count(/xpath/to/parent/element[@Complete = 'Y']) div count($Rows)" />
Okay, so it still says 0, but I think the reason it's not showing the correct answer is that it expects to show an integer, and the value returned by the equation is obviously going to be a decimal... I have been fiddling with the equation in XPath; here's what I've tried:
count(/xpath/to/parent/element[@Complete = 'Y']) div count($Rows)*100
(count(/xpath/to/parent/element[@Complete = 'Y']) div count($Rows))*100
100(count(/xpath/to/parent/element[@Complete = 'Y']) div count($Rows))
UPDATE4:
So I now know that my previous thought, that the correct number wasn't showing because it was a float, is wrong, as all numbers in XPath and XSLT 1.0 are floats. Reference
UPDATE5:
Upon further investigation, I have found that the problem lies with the count(/xpath/to/parent/element[@Complete = 'Y']) part of my equation, as it returns 0 instead of a value. [I know I have at least 3 'Y' values in my Complete column.]
UPDATE6:
<records*>
<record*>
<last_name></last_name>
<first_name></first_name>
<mi></mi>
<office_symbol></office_symbol>
<geo_location></geo_location>
<complete></complete>
<date_complete></date_complete>
<date_expires></date_expires>
<email></email>
<supervisor></supervisor>
</record*>
</records*>
*I don't know what these nodes are called, as my data comes from a database and not an XML file; I just made up record/records.
UPDATE7
Going back to my original question: I am still trying to find the XPath expression that displays the number of parents (record in the XML I posted above) where the complete node = Y.
UPDATE8
OK, so I have edited and tested using http://www.w3schools.com/xsl/tryxslt.asp?xmlfile=cdcatalog&xsltfile=tryxsl_value-of. The working XSLT to count the number of Complete = Y is <xsl:value-of select="count(catalog/cd [complete = 'Y'])" />, so then I put EXACTLY what works on W3Schools into my SP Data View, and I get nothing... just an empty space. Why doesn't the code work in my SPDV?
If your "Complete" field is an element:
<xsl:value-of select="count(/xpath/to/complete/field/element[string(.) = 'Yes'])"/>
If your Complete field is an attribute of an element:
<xsl:value-of select="count(/xpath/to/parent/element[@complete = 'Yes'])"/>
Without knowing the structure of your XML, I can't provide the specific XPath required; the predicate [...] is what selects only the "Yes" values. Use = rather than eq here: eq is an XPath 2.0 operator, and an XSLT 1.0 processor rejects it (compare the "Expected token ']'" error in UPDATE2).
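The element-versus-attribute distinction above is easy to check outside SharePoint. Below is a small Python sketch using xml.etree.ElementTree, whose limited XPath subset supports both predicate forms. Both data sets are hypothetical: the first follows the asker's made-up UPDATE6 layout, and the second assumes a SharePoint-style layout in which each field is an attribute of a Row element (as the /dsQueryResponse/Rows/Row path in UPDATE1 suggests). If that assumption holds for your Data View, an XSLT 1.0 expression along the lines of count($Rows[@Complete = 'Y']) div count($Rows) * 100 may be what is needed; this is untested. percent_complete is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical data in the question's UPDATE6 layout (fields as child elements).
ELEMENT_XML = """<records>
  <record><last_name>A</last_name><complete>Y</complete></record>
  <record><last_name>B</last_name><complete></complete></record>
  <record><last_name>C</last_name><complete>Y</complete></record>
  <record><last_name>D</last_name><complete>Y</complete></record>
</records>"""

# Assumed SharePoint-style layout (fields as attributes of each Row).
ATTRIBUTE_XML = """<dsQueryResponse><Rows>
  <Row LastName="A" Complete="Y"/>
  <Row LastName="B" Complete=""/>
  <Row LastName="C" Complete="Y"/>
  <Row LastName="D" Complete="Y"/>
</Rows></dsQueryResponse>"""


def percent_complete(xml_text, all_rows, done_rows):
    """Return (done, total, percentage) for two ElementTree path expressions."""
    root = ET.fromstring(xml_text)
    total = len(root.findall(all_rows))
    done = len(root.findall(done_rows))
    return done, total, 100 * done / total


# [complete='Y'] tests a child element's text, like XPath record[complete = 'Y'].
print(percent_complete(ELEMENT_XML, "record", "record[complete='Y']"))  # (3, 4, 75.0)
# [@Complete='Y'] tests an attribute, like XPath Row[@Complete = 'Y'].
print(percent_complete(ATTRIBUTE_XML, "Rows/Row", "Rows/Row[@Complete='Y']"))  # (3, 4, 75.0)
```

Note that XML names are case-sensitive, so @complete and @Complete are different attributes.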

Resources