Specific pathing to find XML elements using minidom in Python

2018-06-10 02:13:46

As per this thread, I am using xml.dom.minidom to do some very basic XML traversing, read-only.

What confuses me is why its getElementsByTagName is finding nodes several hierarchy levels deep without explicitly supplying it with their exact path.

XML:

<data>
    <items>
        <item name="item1"></item>
        <item name="item2"></item>
        <item name="item3"></item>
        <item name="item4"></item>
    </items>
    <secondSetOfItems>
        <item name="item5"></item>
        <item name="item6"></item>
        <item name="item7"></item>
        <item name="item8"></item>
    </secondSetOfItems>
</data>

Python code:

from xml.dom import minidom

xmldoc = minidom.parse('sampleXML.xml')
items = xmldoc.getElementsByTagName('item')

for item in items:
    print(item.attributes['name'].value)

Prints:

item1
item2
item3
item4
item5
item6
item7
item8

What bothers me is that it implicitly finds tags named item under both data->items and data->secondSetOfItems.

How do I make it follow an explicit path and extract only the items under one of the two categories? E.g., under data->secondSetOfItems:

item5
item6
item7
item8

If you want to get items from a specific category, you can do so by grabbing the parent element first.

For example:

Code:

from xml.dom import minidom

xmldoc = minidom.parse('sampleXML.xml')
# Grab the first occurrence of the "secondSetOfItems" element
second_items = xmldoc.getElementsByTagName("secondSetOfItems")[0]
# Search for "item" elements only within that element
item_list = second_items.getElementsByTagName("item")

for item in item_list:
    print(item.attributes['name'].value)

Output:

item5
item6
item7
item8

This is the documented behavior of getElementsByTagName:

Search for all descendants (direct children, children's children, etc.) with a particular element type name.

Someone has written a "filter" around it; see this answer.
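A minimal sketch of what such a filter could look like, assuming a shortened version of the XML from the question. The helper names child_elements and find_by_path are hypothetical and not part of minidom; they only use childNodes, so each step of the path has to match a direct child:

```python
from xml.dom import minidom

# Shortened copy of the question's XML, inlined so the sketch is self-contained
XML = """<data>
    <items>
        <item name="item1"></item>
        <item name="item2"></item>
    </items>
    <secondSetOfItems>
        <item name="item5"></item>
        <item name="item6"></item>
    </secondSetOfItems>
</data>"""


def child_elements(node, tag):
    """Return only the direct child elements of `node` named `tag` (hypothetical helper)."""
    return [child for child in node.childNodes
            if child.nodeType == child.ELEMENT_NODE and child.tagName == tag]


def find_by_path(node, path):
    """Follow an explicit list of tag names, one hierarchy level per step (hypothetical helper)."""
    current = [node]
    for tag in path:
        current = [match for parent in current
                   for match in child_elements(parent, tag)]
    return current


xmldoc = minidom.parseString(XML)
items = find_by_path(xmldoc, ['data', 'secondSetOfItems', 'item'])
names = [item.getAttribute('name') for item in items]
print(names)
```

Unlike getElementsByTagName, a path such as ['data', 'item'] returns nothing here, because no item element is a direct child of data.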

It seems to me that minidom is too simple for this; consider using lxml's xpath:

tree.xpath('//secondSetOfItems/item/@name')

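or BeautifulSoup's find_all (assuming soup is the document parsed with an XML parser, e.g. BeautifulSoup(xml, 'xml')):

[item['name'] for item in soup.find('secondSetOfItems').find_all('item')]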
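If installing a third-party library is not an option, the standard library's xml.etree.ElementTree also accepts a simple path expression. This is a different module from the ones suggested above, shown only as a sketch against a shortened copy of the question's XML:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's XML, inlined so the sketch is self-contained
XML = """<data>
    <items>
        <item name="item1"></item>
        <item name="item2"></item>
    </items>
    <secondSetOfItems>
        <item name="item5"></item>
        <item name="item6"></item>
    </secondSetOfItems>
</data>"""

root = ET.fromstring(XML)  # root is the <data> element

# The path is relative to <data>: only <item> children of <secondSetOfItems> match
names = [item.get('name') for item in root.findall('secondSetOfItems/item')]
print(names)
```

Because the path is resolved step by step from the root, item elements under data->items are not picked up.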
