Ignoring DTD when parsing XML

How can I ignore the DTD declaration when parsing a file with the XOM XML library? My file starts with the following lines:

<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
//rest of stuff here 

When I try to build() my document I get a FileNotFoundException for the DTD file. I know I don't have this file and I don't care about it, so how can I make XOM skip it?

Here is a code snippet:

public BlastXMLParser(String filePath) {
    Builder b = new Builder(false);
    //not a good idea to have exception-throwing code in constructor
    try {
        _document = b.build(filePath);
    } catch (ParsingException ex) {
        Logger.getLogger(BlastXMLParser.class.getName()).log(Level.SEVERE,"err", ex);
    } catch (IOException ex) {
        //
    }
}

private Elements getBlastReads() {
    Element root = _document.getRootElement();
    Elements rootChildren = root.getChildElements();

    for (int i = 0; i < rootChildren.size(); i++) {
        Element child = rootChildren.get(i);
        if (child.getLocalName().equals("BlastOutput_iterations")) {

            return child.getChildElements();
        }
    }

    return null;
}
}

I get a NullPointerException at this line:

Element root = _document.getRootElement();

With the DTD line removed from the source XML file I can successfully parse it, but this is not an option in the final production system.


The preferred solution would be to implement an EntityResolver that intercepts requests for the DTD and redirects them to an embedded copy. However, if you

  • don't have access to the DTD,
  • are absolutely sure you won't need it (apart from validation, it might also declare character entities that are used in the document), and
  • are using the Xerces XML parser implementation,

then you can disable fetching of the DTD by setting the corresponding SAX feature. In XOM this should be possible by passing an XMLReader to the Builder constructor like this:

    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.XMLReaderFactory;
    
    ...
    
    XMLReader xmlreader = XMLReaderFactory.createXMLReader();
    xmlreader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
    Builder builder = new Builder(xmlreader);
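
The EntityResolver approach mentioned at the start of this answer could look roughly like the sketch below. It uses only the JDK's SAX classes, so it can be run without XOM. The class name DtdStubbingReader and the helper rootName are made up for illustration, and the resolver returns an empty DTD rather than an embedded copy, which is an assumption that only holds if the document uses no entities declared in the DTD. With XOM, the configured reader would be handed to new Builder(reader), as in the snippet above.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of XOM or the original question.
class DtdStubbingReader {

    // Answers every external-entity request (including the DOCTYPE's DTD)
    // with empty content, so the parser never looks for the file.
    // A real resolver would return the embedded DTD copy here instead.
    static final EntityResolver EMPTY_DTD =
            (publicId, systemId) -> new InputSource(new StringReader(""));

    static XMLReader newReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(EMPTY_DTD);
        return reader; // with XOM: new Builder(reader)
    }

    // Demo helper: parses the XML and returns the root element's name.
    static String rootName(String xml) {
        final String[] root = new String[1];
        try {
            XMLReader reader = newReader();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if (root[0] == null) {
                        root[0] = qName;
                    }
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException("parse failed", ex);
        }
        return root[0];
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE BlastOutput PUBLIC \"-//NCBI//NCBI BlastOutput/EN\""
                + " \"NCBI_BlastOutput.dtd\">\n"
                + "<BlastOutput/>";
        System.out.println(rootName(xml));
    }
}
```

Unlike the load-external-dtd feature, an EntityResolver is part of the standard SAX API, so this variant should not depend on the parser being Xerces.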
    

According to the XOM documentation, this is the way to parse a document without any validation:

    try {
      Builder parser = new Builder();
      Document doc = parser.build("http://www.cafeconleche.org/");
    }
    catch (ParsingException ex) {
      System.err.println("Cafe con Leche is malformed today. How embarrassing!");
    }
    catch (IOException ex) {
      System.err.println("Could not connect to Cafe con Leche. The site may be down.");
    }
    

If you do want to validate the document against its DTD, you have to call new Builder(true):

    try {
      Builder parser = new Builder(true);
      Document doc = parser.build("http://www.cafeconleche.org/");
    }
    catch (ValidityException ex) {
      System.err.println("Cafe con Leche is invalid today. (Somewhat embarrassing.)");
    }
    catch (ParsingException ex) {
      System.err.println("Cafe con Leche is malformed today. (How embarrassing!)");
    }
    catch (IOException ex) {
      System.err.println("Could not connect to Cafe con Leche. The site may be down.");
    }
    

Note that one more exception can now be thrown: ValidityException. It is a subclass of ParsingException, so it has to be caught first.
