Java Unescaping XML/HTML before JAXB parsing doesn't work

Can anyone help me? In HTML/XML, a numeric character reference refers to a character by its Universal Character Set/Unicode code point and uses the format &#nnnn; or &#xhhhh;. I have to unescape these references (convert them to Unicode) before I use the JAXB parser. When I use Apache StringEscapeUtils.unescapeXml(), &amp;, &gt; and &lt; are unescaped as well, which is not what I want because parsing will then fail. Is there a library that converts only the &#nnnn; references to Unicode and leaves the rest escaped?
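One way to approach the question above without a library is a small regex pass that touches only numeric references. The following is a minimal sketch, not a drop-in replacement for StringEscapeUtils: the class name `NumericRefUnescaper` and the method `unescapeNumericRefs` are hypothetical, and the choice to leave references to `<`, `&` and `>` escaped is an assumption made so the result stays parseable.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: converts only &#nnnn; and &#xhhhh; references,
// leaving named entities such as &amp; and &lt; untouched.
public class NumericRefUnescaper {
    // Group 1 captures hex digits, group 2 captures decimal digits.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    public static String unescapeNumericRefs(String input) {
        Matcher m = NUMERIC_REF.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(); // default: keep the reference as written
            try {
                int cp = m.group(1) != null
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2), 10);
                // Assumption: markup-significant characters stay escaped,
                // otherwise the later XML parse would break.
                if (Character.isValidCodePoint(cp) && cp != '<' && cp != '&' && cp != '>') {
                    replacement = new String(Character.toChars(cp));
                }
            } catch (NumberFormatException e) {
                // Number too large to be a code point: leave the text unchanged.
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The sketch does not check that the resulting code point is a legal XML character, so control characters referenced numerically would still need separate handling.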

Handling Empty Tags in XML using Sax Parser, Java

I'm using a SAX parser to handle a pre-written XML file. I have no way of changing the XML, as it is held by another application, but I need to parse data from it. The XML file contains a tag <ERROR_TEXT/> which is empty when no error has occurred. As a result, the parser takes the next character after the tag close, which is "\n". I have tried result.replaceAll("\n", ""); and a second replaceAll variant. How can I get SAX to recognise that this is an empty tag and return the value as ""? You don't. SAX's job is to parse the data, not to decide the data's

XML parsing with SAX: how to handle html as text within xml

I get an XML response from an external server. Using some tutorials, I got a SAX parser working. One small problem remains. The response contains, for example, a description tag containing HTML like this: <description><p><strong>Title</strong></p>Description</description> After parsing, the description field of my object contains only "<". Is it possible to tell my parser to treat the HTML as plain text? Or perhaps there is another way to solve this problem. Thanks. Since you do not include

SAX handling special characters

I'm trying to parse an XML file with Java and SAX on an Android device. I got the file from the internet, and while parsing it I get an ExpatException: not well-formed (invalid token) on the character "é". Is there a way to handle these characters without having to change all the special characters in the XML file? Edit: here is the part of my code that writes the file to my SD card. File SDCardRoot = Environment.getExternalStorageDirectory(); File f = new File(SDCardRoot,"edt.xml"); f.createNewFile();
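An "invalid token" error on an accented character is often a sign that the bytes are being decoded with the wrong charset, although the excerpt does not confirm that this is the cause here. Under that assumption, a minimal sketch is to decode the bytes yourself and hand the parser a character stream; the class name `CharsetAwareSaxParsing` is hypothetical, and the correct charset for the file in question is not known from the excerpt.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: parses XML bytes using an explicitly chosen charset.
public class CharsetAwareSaxParsing {
    public static void parse(InputStream bytes, Charset charset, DefaultHandler handler)
            throws Exception {
        // When an InputSource carries a character stream, the parser reads it
        // directly and ignores any encoding declaration in the document.
        Reader reader = new InputStreamReader(bytes, charset);
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(reader), handler);
    }
}
```

The caller would pass, for example, `StandardCharsets.ISO_8859_1` if the file turns out to be Latin-1; the same care is needed when the file is first written to storage, since a wrong charset at that step corrupts the bytes before parsing starts.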

Stripping Invalid XML characters in Java

I have an XML file that is the output from a database. I'm using the Java SAX parser to parse the XML and output it in a different format. The XML contains some invalid characters, and the parser throws errors like 'Invalid Unicode character (0x5)'. Is there a good way to strip all these characters out besides pre-processing the file line by line and replacing them? So far I have run into 3 different invalid characters (0x5, 0x6 and 0x7). It is a database dump of about 4 GB, and we will be processing it many times, so every time we get a new dump, running a pre
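For a dump of this size, one option is to filter the characters while streaming rather than in a separate pass. The sketch below wraps any `Reader` and drops characters outside the XML 1.0 character range; the class name `XmlCharFilterReader` is hypothetical, and surrogates are passed through on the assumption that they are correctly paired.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical streaming filter: silently drops characters that XML 1.0 forbids.
public class XmlCharFilterReader extends FilterReader {
    public XmlCharFilterReader(Reader in) {
        super(in);
    }

    // XML 1.0 allows #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and
    // supplementary characters (seen here as surrogate pairs).
    static boolean isAllowed(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || Character.isSurrogate(c)
                || (c >= 0xE000 && c <= 0xFFFD);
    }

    @Override
    public int read() throws IOException {
        int c;
        do {
            c = super.read();
        } while (c != -1 && !isAllowed((char) c));
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            int n = super.read(buf, off, len);
            if (n <= 0) {
                return n; // end of stream
            }
            int kept = 0;
            for (int i = 0; i < n; i++) {
                char c = buf[off + i];
                if (isAllowed(c)) {
                    buf[off + kept++] = c; // compact valid characters in place
                }
            }
            if (kept > 0) {
                return kept;
            }
            // Every character in this chunk was dropped: read the next chunk.
        }
    }
}
```

A parser can then consume the filtered stream through `new InputSource(new XmlCharFilterReader(reader))`, so no cleaned copy of the file has to be written.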

Make DocumentBuilder.parse ignore DTD references

When I parse my XML file (variable f) in this method, I get an error: C:\Documents and Settings\joe\Desktop\aicpcudev\OnlineModule\map.dtd (The system cannot find the path specified) I know I do not have the DTD, nor do I need it. How can I parse this File object into a Document object while ignoring DTD reference errors? private static Document getDoc(File f, String docId) throws Exception{ DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); D
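One portable way to skip a missing DTD is to install an EntityResolver that answers every external entity request with empty content. The sketch below illustrates that idea and is not the asker's `getDoc` method; the class name `DtdIgnoringParser` is hypothetical. It assumes the document does not rely on entities or default attributes defined in the DTD, since those would no longer be available.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: parses a document while ignoring its external DTD.
public class DtdIgnoringParser {
    public static Document parse(InputSource source) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false); // no DTD validation is wanted
        DocumentBuilder db = dbf.newDocumentBuilder();
        // Return an empty stream for every external entity, so the parser
        // never tries to open the file or URL named in the DOCTYPE.
        db.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        return db.parse(source);
    }
}
```

For a `File f`, the call could look like `DtdIgnoringParser.parse(new InputSource(f.toURI().toString()))`.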

DTD parsing with Stax

I want to parse XML files that declare an HTML 4.01 doctype. <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html> [...] </html> I am using StAX and an XMLResolver to load a local DTD: XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance(); xmlInputFactory.setXMLResolver(new LocalXmlResolver()); xmlOutputFactory = XMLOutputFactory.newInstance(); xmlOutputFactory

Conflict between Spring and XOM

In my Java program, I made a class that uses XOM to read XML files. I am also using Spring. When the line ApplicationContext ctx = new ClassPathXmlApplicationContext("dataIO-beans.xml"); is executed, I get an exception that includes: javax.xml.parsers.ParserConfigurationException: Unable to validate using XSD: Your JAXP provider [org.apache.xerces.jaxp.DocumentBuilderFactoryImpl@4d48f152] does not support XML Schema. A

DTD download error while parsing XHTML document in XOM

I am trying to parse an HTML document whose doctype declares the transitional DTD as follows: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> When I call Builder.build on the document, I get the following exception: java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd at sun.net.ww

SAX character buffer size

I'm trying to use SAX to parse very large XML files, hundreds of megabytes in size. The problem is that the parser reads in exactly 2048 characters at a time and terminates. I get a lot of tag values split into two parts in the callback "public void characters(...)". For example, the first part, "2013", is in the character array at position 2044 with length 4, and the second part, "-09-30", is at position 0 with length 6. It should be the date value "2013-09-30", received in one piece. How can I avoid this splitting? Can anyone help me? public void characters(char[] ch, int
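SAX is allowed to deliver an element's text in several characters() calls, so the usual pattern is to accumulate the chunks and only read the value in endElement. The sketch below shows that pattern; the class name `TextCollectingHandler` and the "name=value" output format are hypothetical, and it assumes simple elements without mixed content.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: joins the text chunks of each element before using them.
public class TextCollectingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // start a fresh buffer for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called more than once per element: append, never overwrite.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only here is the element's text known to be complete.
        values.add(qName + "=" + text.toString().trim());
        text.setLength(0);
    }

    public List<String> getValues() {
        return values;
    }
}
```

With this handler, the two chunks "2013" and "-09-30" from the example arrive in separate calls but are recorded as a single value.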