With ElementTree, I can print every occurrence of a specific tag (in my case ExpertSettingsSg):
#!/usr/bin/env python3
import xml.etree.ElementTree as ET
root = ET.parse('mydoc.xml').getroot()
for children in root:
    value = children.findall('.//ExpertSettingsSg')  # tag I'm looking for
    for settings in value:
        if settings.text is not None:
            print(settings.text)
But I didn't find a way to print the path of each occurrence. Because my XML file has many levels and ExpertSettingsSg can appear at almost any level, I need to know where each ExpertSettingsSg comes from. I'm looking for something like
Path to config xxxxxx = /root/xxx/aaaa/bbbb
If it's not possible with ElementTree, does any other library do the trick?
Thanks
If you already have the nodes, you can walk the tree and collect paths (borrowing the example from @valdi-bo):
from xml.etree import ElementTree as ET
txt ='''<main>
<x>
<a>
<ExpertSettingsSg id="1">x1</ExpertSettingsSg>
</a>
<b>
<dummy>xxxx</dummy>
</b>
</x>
<y>
<c>
<dummy>xxxx</dummy>
</c>
<d>
<ExpertSettingsSg id="2">x2</ExpertSettingsSg>
</d>
<e>
<ExpertSettingsSg id="3"/>
</e>
</y>
</main>'''
def node_walk(root: ET.Element):
    path_to_node = []
    node_stack = [root]
    while node_stack:
        node = node_stack[-1]
        if path_to_node and node is path_to_node[-1]:
            path_to_node.pop()
            node_stack.pop()
            yield (path_to_node, node)
        else:
            path_to_node.append(node)
            for child in reversed(node):
                node_stack.append(child)
root = ET.ElementTree(ET.fromstring(txt))
for node in root.findall('.//ExpertSettingsSg'):
    for node_path, n in node_walk(root.getroot()):
        if n is node:
            xpath = "/".join(["."] + [n.tag for n in node_path[1:]] + [n.tag])
            print(xpath, node)
            # NOTE: The assert just shows that the xpath is correct.
            assert root.getroot().find(xpath) == node
You would get output like this:
./x/a/ExpertSettingsSg <Element 'ExpertSettingsSg' at 0x102cf5b80>
./y/d/ExpertSettingsSg <Element 'ExpertSettingsSg' at 0x102cf5db0>
./y/e/ExpertSettingsSg <Element 'ExpertSettingsSg' at 0x102cf5e50>
Instead of walking multiple times, we can walk once and collect all relevant nodes with path, like this:
xpaths = []
for node_path, n in node_walk(root.getroot()):
    if n.tag == "ExpertSettingsSg":
        xpath = "/".join(["."] + [n.tag for n in node_path[1:]] + [n.tag])
        xpaths.append(xpath)
for xpath in xpaths:
    node = root.getroot().find(xpath)
    print(xpath, node)
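Another way to get the same kind of path, sketched here under the assumption that the whole document fits in memory: build a child-to-parent map once, then climb from each match up to the root. The helper names `build_parent_map` and `path_to` and the trimmed sample document are illustrative, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Small stand-in document; the structure is illustrative only.
SAMPLE = """<main>
  <x><a><ExpertSettingsSg id="1">x1</ExpertSettingsSg></a></x>
  <y><e><ExpertSettingsSg id="3"/></e></y>
</main>"""

def build_parent_map(root):
    """Map every element to its parent (ElementTree keeps no parent links)."""
    return {child: parent for parent in root.iter() for child in parent}

def path_to(node, parent_map):
    """Climb from node to the root and return a '/'-joined tag path."""
    tags = []
    while node is not None:
        tags.append(node.tag)
        node = parent_map.get(node)  # None once the root is reached
    return "/" + "/".join(reversed(tags))

root = ET.fromstring(SAMPLE)
parents = build_parent_map(root)
paths = [path_to(n, parents) for n in root.iter("ExpertSettingsSg")]
for p in paths:
    print("Path to config =", p)
```

The path is built from tag names only, so two siblings with the same tag produce the same string; it tells you where a setting lives, not which of several identical siblings it is.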
Related
First, I want to capture the pose values of a turtle controlled with teleop_key by subscribing to it. Then I want to modify these captured values and publish them to a second turtle. The problem is that I couldn't capture the pose values as global variables, so I couldn't modify them and publish the modified values.
I think my code is almost finished, so I'll post all of it directly.
#!/usr/bin/env python3
from turtlesim.msg import Pose
from geometry_msgs.msg import Twist
import rospy as rp
global pos_l_x,pos_l_y,pos_l_z,pos_a_x,pos_a_y,pos_a_z
def pose_callback(msg):
    rp.loginfo("("+ str(msg.x) + "," + str(msg.y) + "," + str(msg.theta)+ ")")
    pos_l_x = msg.x
    pos_l_y = msg.y
    pos_a_z = msg.theta
if __name__ == '__main__':
    rp.init_node("turtle_inverse")
    while not rp.is_shutdown():
        sub = rp.Subscriber("/turtlesim1/turtle1/pose", Pose, callback= pose_callback)
        rate = rp.Rate(1)
        rp.loginfo("Node has been started")
        cmd = Twist()
        cmd.linear.x = -1*pos_l_x
        cmd.linear.y = -1*pos_l_y
        cmd.linear.z = 0
        cmd.angular.x = 0
        cmd.angular.y = 0
        cmd.angular.z = -1*pos_a_z
        pub = rp.Publisher("/turtlesim2/turtle1/cmd_vel", Twist, queue_size=10)
        try:
            pub.publish(cmd)
        except rp.ServiceException as e:
            rp.logwarn(e)
        rate.sleep()
    rp.spin()
And I set up the connection between turtle1 and turtle2 in the launch file below:
<?xml version="1.0"?>
<launch>
<group ns="turtlesim1">
<node pkg="turtlesim" type="turtlesim_node" name="turtle1">
<remap from="/turtle1/cmd_vel" to="vel_1"/>
</node>
<node pkg="turtlesim" type="turtle_teleop_key" name="Joyistic" output= "screen">
<remap from="/turtle1/cmd_vel" to="vel_1"/>
</node>
</group>
<group ns="turtlesim2">
<node pkg="turtlesim" type="turtlesim_node" name="turtle1">
</node>
</group>
<node pkg="turtlesim" type="mimic" name="mimic">
<remap from="input" to="turtlesim1/turtle1"/>
<remap from="output" to="turtlesim2/turtle1"/>
</node>
</launch>
And lastly, here is my package.xml:
<?xml version="1.0"?>
<package format="2">
<name>my_robot_controller</name>
<version>0.0.0</version>
<description>The my_robot_controller package</description>
<!-- One maintainer tag required, multiple allowed, one person per tag -->
<!-- Example: -->
<!-- <maintainer email="jane.doe@example.com">Jane Doe</maintainer> -->
<maintainer email="(I delete it for sharing)">enes</maintainer>
<!-- One license tag required, multiple allowed, one license per tag -->
<!-- Commonly used license strings: -->
<!-- BSD, MIT, Boost Software License, GPLv2, GPLv3, LGPLv2.1, LGPLv3 -->
<license>TODO</license>
<buildtool_depend>catkin</buildtool_depend>
<build_depend>rospy</build_depend>
<build_depend>turtlesim</build_depend>
<build_depend>geometry_msgs</build_depend>
<build_export_depend>rospy</build_export_depend>
<build_export_depend>turtlesim</build_export_depend>
<build_export_depend>geometry_msgs</build_export_depend>
<exec_depend>rospy</exec_depend>
<exec_depend>turtlesim</exec_depend>
<exec_depend>geometry_msgs</exec_depend>
<export>
<!-- Other tools can request additional information be placed here -->
</export>
</package>
Note: I work in a catkin workspace. The mistake can't be there, because I have run many other programs without trouble.
As pointed out by one of the commenters, you need to declare your pos values as global inside your callback function. In Python, a variable must be declared global within the scope where it is assigned, i.e. the function scope. Otherwise the interpreter doesn't know you mean the global variable and simply creates a local one. Note that this only matters for assignment, so it is not needed where you read the values to publish. Take the following example:
def pose_callback(msg):
    rp.loginfo("("+ str(msg.x) + "," + str(msg.y) + "," + str(msg.theta)+ ")")
    global pos_l_x, pos_l_y, pos_a_z
    pos_l_x = msg.x
    pos_l_y = msg.y
    pos_a_z = msg.theta
As another note, this will most likely still break, since the global variables will not always be assigned before they are used. So you should give them initial values at the very top of the file. Finally, you should not create the subscriber inside the main run loop. Create it once, right after init_node.
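The scoping rule can be seen without ROS at all. The following is a minimal sketch in plain Python; the names `latest_x`, `callback_without_global` and `callback_with_global` are made up for illustration and stand in for the pose variables and the subscriber callback.

```python
# Module-level default so the name exists before any callback runs.
latest_x = 0.0

def callback_without_global(value):
    # Assignment here creates a new local name; the module-level
    # latest_x is left untouched.
    latest_x = value
    return latest_x

def callback_with_global(value):
    # The global statement makes the assignment rebind the module-level name.
    global latest_x
    latest_x = value

callback_without_global(1.5)
after_local = latest_x      # still the default value

callback_with_global(2.5)
after_global = latest_x     # now reflects the callback

print(after_local, after_global)
```

Reading a module-level name inside a function needs no declaration; only rebinding it does, which is why the publishing code can use the values as-is.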
It works now! Here is the final code:
#!/usr/bin/env python3
from turtlesim.msg import Pose
from geometry_msgs.msg import Twist
import rospy as rp
pos_l_x,pos_l_y,pos_l_z,pos_a_x,pos_a_y,pos_a_z = 0,0,0,0,0,0
def pose_callback(msg):
    rp.loginfo("("+ str(msg.linear.x) + "," + str(msg.linear.y) + "," + str(msg.angular.z)+ ")")
    global pos_l_x,pos_l_y,pos_l_z,pos_a_x,pos_a_y,pos_a_z
    pos_l_x = msg.linear.x
    pos_l_y = msg.linear.y
    pos_l_z = msg.linear.z
    pos_a_x = msg.angular.x
    pos_a_y = msg.angular.y
    pos_a_z = msg.angular.z
if __name__ == '__main__':
    rp.init_node("turtle_inverse")
    sub = rp.Subscriber("/turtlesim1/turtle1/cmd_vel", Twist, callback= pose_callback)
    rate = rp.Rate(1)
    rp.loginfo("Node has been started")
    while not rp.is_shutdown():
        cmd = Twist()
        cmd.linear.x = -1*pos_l_x
        cmd.linear.y = -1*pos_l_y
        cmd.linear.z = -1*pos_l_z
        cmd.angular.x = -1*pos_a_x
        cmd.angular.y = -1*pos_a_y
        cmd.angular.z = -1*pos_a_z
        pub = rp.Publisher("/turtlesim2/turtle1/cmd_vel", Twist, queue_size=10)
        try:
            pub.publish(cmd)
        except rp.ServiceException as e:
            pass
        pos_l_x,pos_l_y,pos_l_z,pos_a_x,pos_a_y,pos_a_z = 0,0,0,0,0,0
        rate.sleep()
    rp.spin()
I can parse an XML file and loop through the root, printing each child, but root.iter('tag'), root.find('tag') and root.findall('tag') do not work.
Here is a sample of the XML:
<?xml version='1.0' encoding='UTF-8'?>
<cpe-list xmlns:config="http://scap.nist.gov/schema/configuration/0.1" xmlns="http://cpe.mitre.org/dictionary/2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:scap-core="http://scap.nist.gov/schema/scap-core/0.3" xmlns:cpe-23="http://scap.nist.gov/schema/cpe-extension/2.3" xmlns:ns6="http://scap.nist.gov/schema/scap-core/0.1" xmlns:meta="http://scap.nist.gov/schema/cpe-dictionary-metadata/0.2" xsi:schemaLocation="http://scap.nist.gov/schema/cpe-extension/2.3 https://scap.nist.gov/schema/cpe/2.3/cpe-dictionary-extension_2.3.xsd http://cpe.mitre.org/dictionary/2.0 https://scap.nist.gov/schema/cpe/2.3/cpe-dictionary_2.3.xsd http://scap.nist.gov/schema/cpe-dictionary-metadata/0.2 https://scap.nist.gov/schema/cpe/2.1/cpe-dictionary-metadata_0.2.xsd http://scap.nist.gov/schema/scap-core/0.3 https://scap.nist.gov/schema/nvd/scap-core_0.3.xsd http://scap.nist.gov/schema/configuration/0.1 https://scap.nist.gov/schema/nvd/configuration_0.1.xsd http://scap.nist.gov/schema/scap-core/0.1 https://scap.nist.gov/schema/nvd/scap-core_0.1.xsd">
<generator>
<product_name>National Vulnerability Database (NVD)</product_name>
<product_version>4.4</product_version>
<schema_version>2.3</schema_version>
<timestamp>2021-05-21T03:50:31.204Z</timestamp>
</generator>
<cpe-item name="cpe:/a:%240.99_kindle_books_project:%240.99_kindle_books:6::~~~android~~">
<title xml:lang="en-US">$0.99 Kindle Books project $0.99 Kindle Books (aka com.kindle.books.for99) for android 6.0</title>
<references>
<reference href="https://play.google.com/store/apps/details?id=com.kindle.books.for99">Product information</reference>
<reference href="https://docs.google.com/spreadsheets/d/1t5GXwjw82SyunALVJb2w0zi3FoLRIkfGPc7AMjRF0r4/edit?pli=1#gid=1053404143">Government Advisory</reference>
</references>
<cpe-23:cpe23-item name="cpe:2.3:a:\$0.99_kindle_books_project:\$0.99_kindle_books:6:*:*:*:*:android:*:*"/>
</cpe-item>
<cpe-item name="cpe:/a:%40thi.ng%2fegf_project:%40thi.ng%2fegf:-::~~~node.js~~">
<title xml:lang="en-US">@thi.ng/egf Project @thi.ng/egf for Node.js</title>
<references>
<reference href="https://github.com/thi-ng/umbrella/security/advisories/GHSA-rj44-gpjc-29r7">Advisory</reference>
<reference href="https://www.npmjs.com/package/@thi.ng/egf">Version</reference>
</references>
<cpe-23:cpe23-item name="cpe:2.3:a:\@thi.ng\/egf_project:\@thi.ng\/egf:-:*:*:*:*:node.js:*:*"/>
</cpe-item>
</cpe-list>
The following Python (3.7) code works:
import xml.etree.ElementTree as ET
infile = open(filename, "r")
xml = infile.read()
infile.close()
parser = ET.XMLParser(encoding="utf-8")
root = ET.fromstring(xml, parser=parser)
print(root.tag)
for child in root:
    print(child.tag)
Output:
{http://cpe.mitre.org/dictionary/2.0}cpe-list
{http://cpe.mitre.org/dictionary/2.0}cpe-item
{http://cpe.mitre.org/dictionary/2.0}cpe-item
{http://cpe.mitre.org/dictionary/2.0}cpe-item
{http://cpe.mitre.org/dictionary/2.0}cpe-item
...
But when I try for item in root.iter('cpe-item') or for item in root.iter('cpe-list'), the loop body never runs. The same happens with for item in root.findall('cpe-item') and for item in root.findall('cpe-list'). If I try item = root.find('cpe-list'), item is None.
I don't work with XML very often, but this seems so strange to me, since I have example code from other projects where this works perfectly fine. Many other examples online show this exact process as the correct one.
What am I doing wrong?
It also seems odd to me that when I print(root.tag) or print(child.tag), something is printed before the tag name. I don't know why that is happening.
You are getting entangled with namespaces. A lot has been written about it and starting here may be a good place.
As for your specific example, the tl;dr is to disregard them altogether. For example:
for item in root.findall('.//{*}cpe-item'):
    print(item.tag)
Another option is to bite the bullet and declare the namespaces:
ns = {"xx":"http://cpe.mitre.org/dictionary/2.0"}
for item in root.findall('.//xx:cpe-item', ns):
    print(item.tag)
output is
{http://cpe.mitre.org/dictionary/2.0}cpe-item
{http://cpe.mitre.org/dictionary/2.0}cpe-item
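One caveat for the question as asked: the ElementTree documentation lists the {*} wildcard as added in Python 3.8, while the question mentions 3.7. The sketch below, which uses a trimmed-down stand-in for the CPE file, shows two forms that do not depend on the wildcard: the fully qualified tag name and a prefix map. The prefix `cpe` and the sample names are arbitrary choices for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the CPE dictionary; only the default namespace matters here.
SAMPLE = """<cpe-list xmlns="http://cpe.mitre.org/dictionary/2.0">
  <generator><product_name>demo</product_name></generator>
  <cpe-item name="cpe:/a:example:one"/>
  <cpe-item name="cpe:/a:example:two"/>
</cpe-list>"""

CPE_NS = "http://cpe.mitre.org/dictionary/2.0"
root = ET.fromstring(SAMPLE)

# A bare tag name does not match, because the real tag is '{uri}cpe-item'.
bare = root.findall("cpe-item")

# Fully qualified name, written as '{uri}tag'.
qualified = [item.get("name") for item in root.iter(f"{{{CPE_NS}}}cpe-item")]

# Prefix map passed to findall(); the prefix is a local alias for the URI.
prefixed = [item.get("name") for item in root.findall("cpe:cpe-item", {"cpe": CPE_NS})]

print(len(bare), qualified, prefixed)
```

This also explains the text printed before each tag in the question: it is the namespace URI that ElementTree stores as part of the tag name.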
I have an xml like this:
<library>
<content content-id="title001">
<content-links>
<content-link content-id="Number1" />
<content-link content-id="Number2" />
</content-links>
</content>
<content content-id="title002">
<content-links>
<content-link content-id="Number3" />
</content-links>
</content>
<content content-id="Number1">
<content-links>
<content-link content-id="Number1b" />
</content-links>
</content>
</library>
I need to get all the content-ids that are linked, directly or indirectly, to specific content-id titles. For example, in this case I need all the ids linked to title001 (I might need this for more titles, so the input would be a list of titles). All these ids should be added to a list that would look like:
[title001, Number1, Number2, Number1b]
So I guess I need to recursively check every content element: get the content-id from each content-link, go to the content with that id, check its content-links in turn, and so on until the whole XML has been read.
I am not able to find a recursive solution to this.
Here is the code I have so far:
from lxml import etree as et
def get_ids(content):
    """
    """
    content_links = content.findall('content-links/content-link')
    print(content_links)
    if content_links:
        for content_link in content_links:
            print(content_link,content_link.get('content-id'))
            cl = content_link.get('content-id')
            cont = x.find(f'content[@id="{cl}"]')
            if cont is not None:
                get_ids(cont)
if __name__ == '__main__':
    """
    """
    x = et.fromstring(xml)
    ids = ['title001']
    for id in ids:
        content = x.find(f'content[@id="{content-id}"]')
        get_ids(content)
Try the following code:
from lxml import etree as et
parser = et.XMLParser(remove_blank_text=True)
tree = et.parse('Input.xml', parser)
root = tree.getroot()
cidList = ['title001'] # Your source list
cidDct = { x: 0 for x in cidList }
for elem in root.iter('content'):
    cid = elem.attrib.get('content-id', '')
    # print(f'X  {elem.tag:15} cid:{cid}')
    if cid in cidDct.keys():
        # print(f'** Found: {cid}')
        for elem2 in elem.iter():
            if elem2 is not elem:
                cid2 = elem2.attrib.get('content-id', '')
                # print(f'XX {elem2.tag:15} cid:{cid2}')
                if len(cid2) > 0:
                    # print(f'** Add: {cid2}')
                    cidDct[cid2] = 0
To test, you can uncomment the printouts above.
Now when you print list(cidDct.keys()), you will get the wanted ids:
['title001', 'Number1', 'Number2', 'Number1b']
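The single pass above finds Number1b because the Number1 content appears after title001 in the file; a linked content placed earlier in the document would be missed. A possible alternative, sketched with the standard library's ElementTree instead of lxml so it runs without extra packages, is to index the links first and then follow them with a queue. The function name `linked_ids` is made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import deque

SAMPLE = """<library>
  <content content-id="title001">
    <content-links>
      <content-link content-id="Number1"/>
      <content-link content-id="Number2"/>
    </content-links>
  </content>
  <content content-id="title002">
    <content-links><content-link content-id="Number3"/></content-links>
  </content>
  <content content-id="Number1">
    <content-links><content-link content-id="Number1b"/></content-links>
  </content>
</library>"""

def linked_ids(root, start_ids):
    """Return start_ids plus every id reachable through content-link chains."""
    # Index each <content> element's direct links by its content-id.
    links = {
        c.get("content-id"): [
            link.get("content-id")
            for link in c.findall("content-links/content-link")
        ]
        for c in root.findall("content")
    }
    seen = list(start_ids)           # keeps discovery order
    queue = deque(start_ids)
    while queue:
        for target in links.get(queue.popleft(), []):
            if target not in seen:   # also guards against link cycles
                seen.append(target)
                queue.append(target)
    return seen

result = linked_ids(ET.fromstring(SAMPLE), ["title001"])
print(result)
```

An explicit queue avoids deep recursion on long chains, and the `seen` check means two contents that link to each other cannot loop forever.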
I have a big problem handling data in XML files in Python. I need the value in the ValorConta1 tag, but I only know the value of NumeroConta, which is a child of PlanoConta.
<InfoFinaDFin>
<NumeroIdentificadorInfoFinaDFin>15501</NumeroIdentificadorInfoFinaDFin>
...
<PlanoConta>
<NumeroConta>2.02.01</NumeroConta>
</PlanoConta>
...
<ValorConta1>300</ValorConta1>
The code I wrote:
import xml.etree.ElementTree as ET
InfoDin = ET.parse('arquivos_xml/InfoFinaDFin.xml')
target_element_value = '2.01.01'
passivo = InfoDin.findall('.//PlanoConta[NumeroConta="' + target_element_value +'"]/../ValorConta1')
Try this.
from simplified_scrapy import SimplifiedDoc
html = '''
<InfoFinaDFin>
<NumeroIdentificadorInfoFinaDFin>15501</NumeroIdentificadorInfoFinaDFin>
...
<PlanoConta>
<NumeroConta>2.02.01</NumeroConta>
</PlanoConta>
...
<ValorConta1>300</ValorConta1>
</InfoFinaDFin>
'''
doc = SimplifiedDoc(html)
# print (doc.select('PlanoConta>NumeroConta>text()'))
# print (doc.select('ValorConta1>text()'))
ele = doc.NumeroConta.parent.getNext('ValorConta1')
# or
ele = doc.getElementByTag('ValorConta1',start='</NumeroConta>')
print (ele.text)
Result:
300
Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
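If adding a dependency is not desirable, roughly the same lookup can be sketched with the standard library. Note that the question's code searches for '2.01.01' while the sample contains '2.02.01', so on that sample no match would be expected. The wrapper element `<root>`, the second record and the helper name `valor_for` are assumptions made for illustration, since the real file layout is elided in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: several InfoFinaDFin records under one wrapper element.
SAMPLE = """<root>
  <InfoFinaDFin>
    <PlanoConta><NumeroConta>2.02.01</NumeroConta></PlanoConta>
    <ValorConta1>300</ValorConta1>
  </InfoFinaDFin>
  <InfoFinaDFin>
    <PlanoConta><NumeroConta>1.01</NumeroConta></PlanoConta>
    <ValorConta1>75</ValorConta1>
  </InfoFinaDFin>
</root>"""

def valor_for(root, numero_conta):
    """Return the ValorConta1 text of the record whose NumeroConta matches, else None."""
    for record in root.iter("InfoFinaDFin"):
        if record.findtext("PlanoConta/NumeroConta") == numero_conta:
            return record.findtext("ValorConta1")
    return None

root = ET.fromstring(SAMPLE)
found = valor_for(root, "2.02.01")
missing = valor_for(root, "2.01.01")   # the value used in the question
print(found, missing)
```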
So I created the folder C:\TempFiles to test the following code snippet.
Inside this folder I had two files, nd1.txt and nd2.txt, and a folder C:\TempFiles\Temp2 containing only one file, nd3.txt.
Now when I execute this code:
import os,file,storage
database = file.dictionary()
tools = storage.misc()
lui = -1 # last used file index
fileIndex = 1
def sendWord(wrd, findex): # where findex is the file index
    global lui
    if findex!=lui:
        tools.refreshRecentList()
        lui = findex
    if tools.mustIgnore(wrd)==0 and tools.toRecentList(wrd)==1:
        database.addWord(wrd,findex) # else there's no point adding the word to the database, because it's either trivial or has recently been added
def showPostingsList():
    print("\nPOSTING's LIST")
    database.display()
def parseFile(nfile, findex):
    for line in nfile:
        pl = line.split()
        for word in pl:
            sendWord(word.lower(),findex)
def parseDirectory(dirname):
    global fileIndex
    for root,dirs,files in os.walk(dirname):
        for name in dirs:
            parseDirectory(os.path.join(root,name))
        for filename in files:
            nf = open(os.path.join(root,filename),'r')
            parseFile(nf,fileIndex)
            print(" --> "+ nf.name)
            fileIndex+=1
            nf.close()
def main():
    dirname = input("Enter the base directory :-\n")
    print("\nParsing Files...")
    parseDirectory(dirname)
    print("\nPostings List has Been successfully created.\n",database.entries()," word(s) sent to database")
    choice = ""
    while choice!='y' and choice!='n':
        choice = str(input("View List?\n(Y)es\n(N)o\n -> ")).lower()
        if choice!='y' and choice!='n':
            print("Invalid Entry. Re-enter\n")
    if choice=='y':
        showPostingsList()
main()
Now I should traverse each of the three files only once, and I added a print of the file name to check that, but apparently I am traversing the inner folder twice:
Enter the base directory :-
C:\TempFiles
Parsing Files...
--> C:\TempFiles\Temp2\nd3.txt
--> C:\TempFiles\nd1.txt
--> C:\TempFiles\nd2.txt
--> C:\TempFiles\Temp2\nd3.txt
Postings List has Been successfully created.
34 word(s) sent to database
View List?
(Y)es
(N)o
-> n
Can anyone tell me how to change my use of os.walk() to avoid this?
It's not that my output is incorrect, but it traverses one entire folder twice, and that's not very efficient.
Your issue isn't specific to Python 3; it's how os.walk() works. Iterating over it already recurses into subfolders, so you can take out your recursive call:
def parseDirectory(dirname):
    global fileIndex
    for root,dirs,files in os.walk(dirname):
        for filename in files:
            nf = open(os.path.join(root,filename),'r')
            parseFile(nf,fileIndex)
            print(" --> "+ nf.name)
            fileIndex+=1
            nf.close()
By calling parseDirectory() for the dirs, you were starting another, independent walk of your only subfolder.
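To check this behaviour without touching real data, the folder layout from the question can be recreated in a temporary directory. This is only a sketch; `list_files` is a hypothetical helper that records names instead of parsing words.

```python
import os
import tempfile

def list_files(base):
    """Return every file under base once, relying on os.walk's own recursion."""
    found = []
    for root, dirs, files in os.walk(base):
        # No recursive call for `dirs`: os.walk yields each subfolder itself.
        for name in files:
            found.append(os.path.relpath(os.path.join(root, name), base))
    return sorted(found)

# Recreate the layout from the question in a throwaway directory.
with tempfile.TemporaryDirectory() as base:
    os.mkdir(os.path.join(base, "Temp2"))
    for rel in ("nd1.txt", "nd2.txt", os.path.join("Temp2", "nd3.txt")):
        with open(os.path.join(base, rel), "w") as fh:
            fh.write("sample words\n")
    visited = list_files(base)

print(visited)
```

Each of the three files shows up exactly once, including the one in the subfolder.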