Delphi, MSXML: how to retrieve node XML without the document namespace?

2018-07-02 16:21:40

I need to do some parsing and information retrieval from XML documents. The XML document is bound to an XML data binding, then parsed for specific elements. Once I have isolated the elements I need to dissect, I take each one in turn (let's call it E_parent) and try to identify the location of each non-text child element (E_child) within the overall XML text of E_parent, and then do some manipulation or other.

The problem I'm having is that the XML document's namespace is added to the child elements' XML when they are accessed individually.

To give an example, say the original document looks like:

<?xml version="1.0" encoding="windows-1252"?>
<RootNode xml:lang="en" xmlns="urn:blah:names:blahblah">
<E_parent>Some text <E_child>child text</E_child> more parent text</E_parent>
</RootNode>

When I try to access the XML from either the E_parent or E_child element by doing something like:

xmlParent := parentNode.XML;

I get:

<E_parent xmlns="urn:blah:names:blahblah">Some text <E_child>child text</E_child> more parent text</E_parent>

The same thing happens if I try to access the XML for E_child. I get:

<E_child xmlns="urn:blah:names:blahblah">child text</E_child>

That's a problem when I then try to do a text search on the parent element, since the "real" text does not contain that namespace declaration:

Some text <E_child>child text</E_child> more parent text

So far, I've dealt with this by finding/deleting unwanted namespace attributes in the strings, but it's highly inefficient, and kind of ugly ;o) So, my question is, how can I retrieve the various nodes' XML from a bound XML document, without the document namespace being added to the tags?

=========

Thanks Remy, it was so obvious, I just need to start from a blank string and build it up rather than start from the inner XML!

Note though, that this is a better workaround than the one I had for this specific situation, but not quite what I wanted - obtaining the XML of elements without the namespace would still be useful for other things, such as logging, where I would want the exact XML of the node as it appears in the original document.

Use the DOM for processing E_parent's contents. Rather than retrieving the XML of E_parent and then searching for an E_child tag inside of it, use the DOM to determine what plain text exists in front of the E_child node (the plain text will have its own child node). The length of that plain text tells you the exact text position of E_child without needing to retrieve E_parent's XML at all. E_parent will have multiple plain-text child nodes in the relevant positions for each section of untagged text.

In other words, given the XML you showed, the structure of the DOM will look something like this:

RootNode
|
-- E_parent
   |
   |- "Some text "
   |
   |- E_child
   |  |
   |  -- "child text"
   |
   -- " more parent text"
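As a rough sketch of that idea (not code from the original answer), the walk over E_parent's child nodes could look something like the program below. It assumes the standard XMLIntf/XMLDoc units with the MSXML vendor; TextLength, ListChildOffsets and Demo are made-up names, and the sample string is the question's XML without the declaration. Two caveats: the lengths are those of the parsed text, so an entity reference such as &amp; counts as one character, and white space may be normalised unless the document is loaded with poPreserveWhiteSpace in its ParseOptions.

```delphi
program ChildOffsets;

{$APPTYPE CONSOLE}

uses
  SysUtils, ActiveX, XMLIntf, XMLDoc,
  msxmldom; // registers the MSXML DOM vendor (assumed to be the one in use)

// Total number of characters held in text nodes at or below Node.
function TextLength(const Node: IXMLNode): Integer;
var
  I: Integer;
begin
  Result := 0;
  case Node.NodeType of
    ntText, ntCData:
      Result := Length(Node.Text);
    ntElement:
      for I := 0 to Node.ChildNodes.Count - 1 do
        Inc(Result, TextLength(Node.ChildNodes[I]));
  end;
end;

// For each element child of Parent, report how many characters of
// text come before it inside Parent (tags themselves are not counted).
procedure ListChildOffsets(const Parent: IXMLNode);
var
  I, Offset: Integer;
  Child: IXMLNode;
begin
  Offset := 0;
  for I := 0 to Parent.ChildNodes.Count - 1 do
  begin
    Child := Parent.ChildNodes[I];
    if Child.NodeType = ntElement then
      Writeln(Child.NodeName, ' follows ', Offset, ' characters of text');
    Inc(Offset, TextLength(Child));
  end;
end;

// Hypothetical driver: loads the sample and inspects E_parent.
procedure Demo;
var
  Doc: IXMLDocument;
begin
  Doc := LoadXMLData(
    '<RootNode xml:lang="en" xmlns="urn:blah:names:blahblah">' +
    '<E_parent>Some text <E_child>child text</E_child> more parent text</E_parent>' +
    '</RootNode>');
  // E_parent is the first (and only) child of the root in this sample
  ListChildOffsets(Doc.DocumentElement.ChildNodes[0]);
end;

begin
  CoInitialize(nil); // MSXML is COM-based
  try
    Demo;
  finally
    CoUninitialize;
  end;
end.
```

With a bound document you would pass your own parentNode to ListChildOffsets instead of loading a sample. Because the offsets count text only, they locate each child within E_parent's tag-free text, which is enough when the manipulation is driven by the DOM rather than by string searches.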

Another approach would be to use XPath to navigate your XML.

Given the sample XML

<?xml version="1.0" encoding="windows-1252"?>
<RootNode xml:lang="en" xmlns="urn:blah:names:blahblah">
<E_parent>Some text <E_child>child text</E_child> more parent text</E_parent>
</RootNode>

You could use the MSXML parser to navigate to your E_child element directly using a little bit of XPath. First you need to make your own copy of the MSXML2_TLB unit. Then you can use Delphi code that looks something like this to access the E_child nodes:

uses MSXMLDOM, MSXML2_TLB;

procedure Sample;
var
  doc: IXMLDOMDocument2;
  root: IXMLDOMElement;
  nodes: IXMLDOMNodeList;
  node: IXMLDOMNode;
begin
  doc := CoDOMDocument60.Create;
  doc.async := false;
  // Bind the prefix "t" to the document's default namespace
  doc.setProperty('SelectionNamespaces', 'xmlns:t="urn:blah:names:blahblah"');
  doc.setProperty('SelectionLanguage', 'XPath');
  // XmlSource is whatever holds your XML text (a memo, for instance)
  doc.loadXML(XmlSource.Text);

  root := doc.documentElement;
  nodes := root.selectNodes('//t:E_child');

  // nodes now contains all E_child nodes
  // Process them here
  // ...
end;

The key point is that you bind a specific prefix to the document's default namespace for the XPath query. //t:E_child is the actual XPath expression used to find the E_child elements.
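To make the "process them" step concrete, here is a minimal console sketch that loops over the selected nodes and prints each node's name and text. It assumes the same imported MSXML2_TLB unit as above; ListChildren and the inline sample string are made up for illustration.

```delphi
program SelectChildren;

{$APPTYPE CONSOLE}

uses
  SysUtils, ActiveX,
  MSXML2_TLB; // assumed: your imported MSXML 6.0 type library unit

// Hypothetical helper: prints every E_child found in XmlText.
procedure ListChildren(const XmlText: WideString);
var
  doc: IXMLDOMDocument2;
  nodes: IXMLDOMNodeList;
  node: IXMLDOMNode;
  i: Integer;
begin
  doc := CoDOMDocument60.Create;
  doc.async := False;
  // Bind the prefix "t" to the document's default namespace
  doc.setProperty('SelectionNamespaces', 'xmlns:t="urn:blah:names:blahblah"');
  if not doc.loadXML(XmlText) then
    raise Exception.Create(doc.parseError.reason);

  nodes := doc.documentElement.selectNodes('//t:E_child');
  for i := 0 to nodes.length - 1 do
  begin
    node := nodes.item[i];
    // text is the node's content with no markup at all
    Writeln(node.nodeName, ': ', node.text);
  end;
end;

begin
  CoInitialize(nil); // MSXML is COM-based
  try
    ListChildren(
      '<RootNode xml:lang="en" xmlns="urn:blah:names:blahblah">' +
      '<E_parent>Some text <E_child>child text</E_child> more parent text</E_parent>' +
      '</RootNode>');
  finally
    CoUninitialize;
  end;
end.
```

Reading node.text and node.nodeName avoids serialising the node, so the namespace declaration that appears in the xml property never enters the picture.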

Use your code, then use Pos / PosEx to find the start and end of the E_child element.

// PosEx lives in StrUtils, so add that unit to your uses clause.
var
  cStart, cEnd: Integer;
  ChildName, ChildText: string;
begin
  // ... other code (xmlParent and NewChildText are declared and filled there)
  xmlParent := parentNode.XML;
  // Pos/PosEx are case-sensitive, so spell the name as in the document
  ChildName := 'E_child';
  // Find starting position of child tag
  cStart := Pos('<' + ChildName, xmlParent);
  // You now have the opening <
  cEnd   := PosEx('</' + ChildName, xmlParent, cStart);
  // You now have the final < of the child.
  // Add the length of the child's name + the closing >
  Inc(cEnd, Length('</' + ChildName + '>'));
  // Grab the entire child XML
  ChildText := System.Copy(xmlParent, cStart, cEnd - cStart);
  // Do whatever you want with the child. For instance,
  // remove the original text.
  System.Delete(xmlParent, cStart, cEnd - cStart);
  // Replace it with new text
  System.Insert(NewChildText, xmlParent, cStart);
end;
