How do I parse XML in Python?
I have many rows in a database that contain XML, and I'm trying to write a Python script that will go through those rows and count how many instances of a particular node attribute show up. For instance, my tree looks like:
<foo>
  <bar>
    <type foobar="1"/>
    <type foobar="2"/>
  </bar>
</foo>
How can I access the attribute values "1" and "2" in the XML using Python?
I suggest ElementTree. There are other compatible implementations of the same API, such as lxml, and cElementTree in the Python standard library itself (deprecated since Python 3.3 and removed in 3.9; xml.etree.ElementTree now uses the C accelerator automatically when it is available). In this context, what they chiefly add is even more speed; the ease of programming depends on the API, which ElementTree defines.
After building an Element instance e from the XML, e.g. with the XML function, or by parsing a file with something like
import xml.etree.ElementTree
e = xml.etree.ElementTree.parse('thefile.xml').getroot()
or any of the many other ways shown in the ElementTree documentation, you just do something like:
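# findall('type') only matches direct children of e; in the question's tree
# the <type> nodes sit under <bar>, so the path has to include it.
for atype in e.findall('bar/type'):
    print(atype.get('foobar'))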
and similar, usually pretty simple, code patterns.
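To tie this back to the original question (counting attribute values across many database rows), here is a minimal sketch. It assumes each row's XML has already been fetched as a string; the `rows` list and the `count_foobar` helper are hypothetical names used for illustration, not part of ElementTree.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in for the XML strings read from the database rows.
rows = [
    '<foo><bar><type foobar="1"/><type foobar="2"/></bar></foo>',
    '<foo><bar><type foobar="1"/></bar></foo>',
]

def count_foobar(xml_strings):
    """Count how often each foobar value appears on <type> elements."""
    counts = Counter()
    for text in xml_strings:
        root = ET.fromstring(text)
        # iter('type') walks the whole tree, so nesting depth does not matter.
        for node in root.iter('type'):
            value = node.get('foobar')
            if value is not None:
                counts[value] += 1
    return counts

counts = count_foobar(rows)
print(counts)
```

Using iter() rather than a fixed path is a design choice here: it keeps working if the <type> elements move to a different depth, at the cost of visiting every node.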
minidom is the quickest to get going with and pretty straightforward:
XML:
<data>
  <items>
    <item name="item1"></item>
    <item name="item2"></item>
    <item name="item3"></item>
    <item name="item4"></item>
  </items>
</data>
PYTHON:
from xml.dom import minidom
xmldoc = minidom.parse('items.xml')
itemlist = xmldoc.getElementsByTagName('item')
print(len(itemlist))
print(itemlist[0].attributes['name'].value)
for s in itemlist:
    print(s.attributes['name'].value)
OUTPUT
4
item1
item1
item2
item3
item4
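The same approach can be applied to the XML from the question. This is only a sketch: it assumes the XML is held in a string (the `xml_text` variable is a hypothetical name), so it uses minidom.parseString instead of parsing a file.

```python
from xml.dom import minidom

# Hypothetical variable holding the XML from the question as a string.
xml_text = '<foo><bar><type foobar="1"/><type foobar="2"/></bar></foo>'

doc = minidom.parseString(xml_text)

# getElementsByTagName searches the whole document, at any depth.
values = [node.getAttribute('foobar')
          for node in doc.getElementsByTagName('type')]

print(len(values))
print(values)
```

getAttribute returns an empty string when the attribute is missing, which avoids the KeyError that attributes['foobar'] would raise in that case.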
You can use BeautifulSoup:
from bs4 import BeautifulSoup
x="""<foo>
<bar>
<type foobar="1"/>
<type foobar="2"/>
</bar>
</foo>"""
# Name the parser explicitly; html.parser ships with Python.
y = BeautifulSoup(x, "html.parser")
>>> y.foo.bar.type["foobar"]
'1'
>>> y.foo.bar.find_all("type")
[<type foobar="1"></type>, <type foobar="2"></type>]
>>> y.foo.bar.find_all("type")[0]["foobar"]
'1'
>>> y.foo.bar.find_all("type")[1]["foobar"]
'2'