Parsing XML in Python using ElementTree example

2018-06-20 22:09:09

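I am having a hard time finding a good, basic example of how to parse XML in Python using ElementTree. From what I can find, this appears to be the easiest library to use for parsing XML. Here is a sample of the XML I'm working with: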

<timeSeriesResponse>
    <queryInfo>
        <locationParam>01474500</locationParam>
        <variableParam>99988</variableParam>
        <timeParam>
            <beginDateTime>2009-09-24T15:15:55.271</beginDateTime>
            <endDateTime>2009-11-23T15:15:55.271</endDateTime>
        </timeParam>
     </queryInfo>
     <timeSeries name="NWIS Time Series Instantaneous Values">
         <values count="2876">
            <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
            <value dateTime="2009-09-24T16:00:00.000-04:00" qualifiers="P">419</value>
            <value dateTime="2009-09-24T16:30:00.000-04:00" qualifiers="P">370</value>
            .....
         </values>
     </timeSeries>
</timeSeriesResponse>

I am able to do what I need using a hard-coded approach, but I need my code to be more dynamic. Here is what worked:

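import xml.etree.ElementTree as ET

tree = ET.parse("sample.xml")
doc = tree.getroot()

# hard-coded positions of the elements in the full file
timeseries = doc[1]
values = timeseries[2]

for child in values:
    print("%s %s" % (child.attrib['dateTime'], child.text))
# first line printed: 2009-09-24T15:30:00.000-04:00 550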

Here are a couple of things I've tried. None of them worked; each reported that it couldn't find timeSeries (or anything else I tried):

tree = ET.parse("sample.xml")
tree.find('timeSeries')

tree = ET.parse("sample.xml")
doc = tree.getroot()
doc.find('timeSeries')

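Basically, I want to load the XML file, search for the timeSeries tag, and iterate through the value tags, returning the dateTime and the value of the tag itself. That is everything I'm doing in the example above, but without hard-coding the sections of the XML I'm interested in. Can anyone point me to some examples, or give me some suggestions on how to work through this?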

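Thanks for all the help. Both of the suggestions below worked on the sample file I provided, but they did not work on the full file. Here is the error I get from the real file when I use Ed Carrel's method: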

 (<type 'exceptions.AttributeError'>, AttributeError("'NoneType' object has no attribute 'attrib'",), <traceback object at 0x011EFB70>)

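I figured there was something in the real file that it didn't like, so I removed things incrementally until it worked. Here are the lines I changed: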

originally: <timeSeriesResponse xsi:schemaLocation="a URL I removed" xmlns="a URL I removed" xmlns:xsi="a URL I removed">
 changed to: <timeSeriesResponse>

 originally:  <sourceInfo xsi:type="SiteInfoType">
 changed to: <sourceInfo>

 originally: <geogLocation xsi:type="LatLonPointType" srs="EPSG:4326">
 changed to: <geogLocation>

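Removing the attributes that have 'xsi:...' fixed the problem. Is 'xsi:...' not valid XML? It will be hard for me to remove these programmatically. Any suggested workarounds?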

Here is the full XML file: http://www.sendspace.com/file/lofcpt

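When I originally asked this question, I was unaware of namespaces in XML. Now that I know what's going on, I don't have to remove the "xsi" attributes, which are the namespace declarations. I just include them in my xpath searches. See this page for more info on namespaces in lxml.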
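To illustrate the namespace point, here is a minimal sketch using the standard-library xml.etree.ElementTree on Python 3 rather than lxml. The namespace URI http://example.com/ns is a placeholder (the real URL was removed from the post), and the prefix "ns" is an arbitrary name chosen only for the lookup.

```python
import xml.etree.ElementTree as ET

# Placeholder URI: the real namespace URL was removed from the question.
XML = """<timeSeriesResponse xmlns="http://example.com/ns">
  <timeSeries name="NWIS Time Series Instantaneous Values">
    <values count="2">
      <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
      <value dateTime="2009-09-24T16:00:00.000-04:00" qualifiers="P">419</value>
    </values>
  </timeSeries>
</timeSeriesResponse>"""

root = ET.fromstring(XML)

# A default xmlns puts every element in that namespace,
# so a search for the bare tag name finds nothing.
unqualified = root.find("timeSeries")

# Map an arbitrary prefix to the URI and qualify every step of the path.
ns = {"ns": "http://example.com/ns"}
readings = [
    (value.get("dateTime"), value.text)
    for value in root.findall("ns:timeSeries/ns:values/ns:value", ns)
]

print(unqualified)
for date_time, text in readings:
    print(date_time, text)
```

The unqualified search returns None, and accessing .attrib on that None is what produces the "'NoneType' object has no attribute 'attrib'" error shown above.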

So I have ElementTree 1.2.6 on my box now, and ran the following code against the XML chunk you posted:

import elementtree.ElementTree as ET

tree = ET.parse("test.xml")
doc = tree.getroot()
thingy = doc.find('timeSeries')

print(thingy.attrib)

and got the following back:

{'name': 'NWIS Time Series Instantaneous Values'}

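It appears to have found the timeSeries element without needing to use numerical indices.

What would be useful now is knowing what you mean when you say "it doesn't work." Since it works for me given the same input, it is unlikely that ElementTree is broken in some obvious way. Update your question with any error messages, backtraces, or anything else you can provide to help us help you.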

If I understand your question correctly:

for elem in doc.findall('timeSeries/values/value'):
    print("%s %s" % (elem.get('dateTime'), elem.text))

Or, if you prefer (and if there is only one occurrence of timeSeries/values):

values = doc.find('timeSeries/values')
for value in values:
    # use the loop variable here; elem is not defined in this snippet
    print("%s %s" % (value.get('dateTime'), value.text))

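The findall() method returns a list of all matching elements, whereas find() returns only the first matching element. The first example loops over all the elements found; the second loops over the child elements of the values element, which in this case gives the same result.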
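As a small self-contained check of that difference, the sketch below runs both calls against a trimmed copy of the sample XML from the question. It assumes Python 3 and the standard-library xml.etree.ElementTree; the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample from the question, without namespaces.
XML = """<timeSeriesResponse>
  <timeSeries name="NWIS Time Series Instantaneous Values">
    <values count="3">
      <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
      <value dateTime="2009-09-24T16:00:00.000-04:00" qualifiers="P">419</value>
      <value dateTime="2009-09-24T16:30:00.000-04:00" qualifiers="P">370</value>
    </values>
  </timeSeries>
</timeSeriesResponse>"""

doc = ET.fromstring(XML)

# findall() collects every element that matches the path.
all_values = doc.findall("timeSeries/values/value")

# find() stops at the first match.
first_value = doc.find("timeSeries/values/value")

# Iterating over an element yields its direct children.
children = list(doc.find("timeSeries/values"))

print(len(all_values), first_value.text)
print([child.text for child in children])
```

Because values contains only value elements here, iterating its children and calling findall() on the full path yield the same readings.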

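However, I don't see where the problem of not finding timeSeries comes from. Maybe you just forgot the getroot() call? (Note that you don't really need it, because you can also work from the elementtree itself if you change the path expression to, for example, /timeSeriesResponse/timeSeries/values or //timeSeries/values.)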
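A rough sketch of searching from the tree object without calling getroot() might look like this, assuming Python 3 and the standard-library xml.etree.ElementTree. The relative ".//" prefix is my substitution for the "//" form mentioned above, since I believe current standard-library versions warn about paths that begin with "/"; the in-memory io.StringIO source stands in for the real file.

```python
import io
import xml.etree.ElementTree as ET

XML = """<timeSeriesResponse>
  <queryInfo/>
  <timeSeries name="NWIS Time Series Instantaneous Values">
    <values count="1">
      <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
    </values>
  </timeSeries>
</timeSeriesResponse>"""

# parse() returns an ElementTree object; getroot() is never called below.
tree = ET.parse(io.StringIO(XML))

# ".//" searches the whole tree, starting from the root element.
values = tree.find(".//timeSeries/values")
texts = [value.text for value in values]

print(values.get("count"), texts)
```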
