Parse XML with Python Etree and Return Specified Tag Regardless of Namespace

2018-06-02 20:09:31

I am working with some XML data that, in some locations in each file, redefines the namespace. I'm trying to pull all tags of a specific type from the document regardless of the namespace that's active at the point where the tag resides in the XML.

I'm using findall('.//{namespace}Tag') to find the elements I'm looking for. But because I never know what the {namespace} will be at any given point in the file, it's hit or miss whether I get all the requested Tags back.

Is there a way to return all the Tag elements regardless of the {namespace} they fall under? Something along the lines of findall('.//{wildcard}Tag')?

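The xpath() method of lxml supports the XPath function local-name(), which gives an element's tag name without its namespace, so you can match on that instead of the fully qualified tag.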

Here is a Python 3 example:

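import io
from lxml import etree

# Two prefixes (m, n) bound to different namespaces, each with a <name> tag.
xmlstring = '''<root
xmlns:m="http://www.w3.org/html4/"
xmlns:n="http://www.w3.org/html5/">
<m:table>
  <m:tr>
    <m:name>Sometext</m:name>
  </m:tr>
</m:table>
<n:table>
  <n:name>Othertext</n:name>
</n:table>
</root>'''

# etree.parse() returns an ElementTree, not the root element.
tree = etree.parse(io.StringIO(xmlstring))

# Match every element whose tag, ignoring the namespace, is "name".
names = tree.xpath("//*[local-name() = 'name']")
for name in names:
    print(name.text)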
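
If installing lxml is not an option, something similar should be possible with the standard library alone. The sketch below is not part of the original answer: it assumes Python 3.8 or later, where xml.etree.ElementTree accepts the {*} wildcard in findall(), and it also shows a manual fallback for older versions. The helper names local_name and find_by_local_name are made up for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<root
xmlns:m="http://www.w3.org/html4/"
xmlns:n="http://www.w3.org/html5/">
<m:table>
  <m:tr>
    <m:name>Sometext</m:name>
  </m:tr>
</m:table>
<n:table>
  <n:name>Othertext</n:name>
</n:table>
</root>"""


def local_name(tag):
    """Hypothetical helper: strip a leading '{namespace}' from a tag."""
    return tag.rsplit("}", 1)[-1]


def find_by_local_name(root, name):
    """Hypothetical helper: all elements whose local name equals `name`."""
    return [
        el for el in root.iter()
        if isinstance(el.tag, str) and local_name(el.tag) == name
    ]


root = ET.fromstring(XML)

# Python 3.8+: "{*}name" matches "name" in any namespace (or in none).
wildcard_texts = [el.text for el in root.findall(".//{*}name")]

# Fallback for older versions: walk the tree and compare local names.
manual_texts = [el.text for el in find_by_local_name(root, "name")]

print(wildcard_texts)
print(manual_texts)
```

Both lists should contain the text of the two name elements in document order. The wildcard form is shorter; the manual walk is useful when the code has to run on an interpreter older than 3.8.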

Your question might have been answered before at: lxml etree xmlparser namespace problem
