Charset in Jsoup

I use the Jsoup library.

After executing the following code:

Document doc = new Document(language);

File input = new File("filePath" + "filename.html");
PrintWriter writer = new PrintWriter(input, "UTF-8");

String contentType = "<%@ page contentType=\"text/html; charset=UTF-8\" %>";
doc.appendText(contentType);

writer.write(doc.toString());
writer.flush();
writer.close();

In the output HTML file I get the following line of text:

&lt;%@ page contentType=&quot;text/html; charset=UTF-8&quot; %&gt;

instead of

<%@ page contentType="text/html; charset=UTF-8" %>

What could be the problem?


Those are HTML escape sequences (character entities) that keep the browser from treating the text as HTML tags. It's not a problem: the page will display correctly when you open it in a browser.
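Because appendText adds the string as text content rather than markup, the characters that have a meaning in HTML are written out as entities, and a browser shows them as the original characters again. A minimal sketch of that kind of escaping is below; escapeText is a hypothetical helper that handles only &, <, > and the double quote, and it is not jsoup's own implementation (jsoup's exact rules depend on its escape mode and version).

```java
// Hypothetical helper: a simplified sketch of the escaping an HTML
// serializer applies to text content. Not jsoup's actual implementation.
final class HtmlEscapeSketch {

    static String escapeText(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Replace each markup-significant character with its entity.
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String directive = "<%@ page contentType=\"text/html; charset=UTF-8\" %>";
        // Prints: &lt;%@ page contentType=&quot;text/html; charset=UTF-8&quot; %&gt;
        System.out.println(escapeText(directive));
    }
}
```

Applied to the directive from the question, this produces the same line the question reports.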


Some problems here:

Document doc = new Document(language);

Don't do this. Use Jsoup.parse(...) instead.

<%@ page contentType="text/html; charset=UTF-8" %>

This is a JSP page directive, not HTML, so it will probably not be parsed correctly.
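Since the directive is meant for the JSP container rather than for an HTML parser, one option is to keep it out of the jsoup document entirely and prepend it as raw text when writing the file. The stdlib-only sketch below assumes the html argument already holds the serialized document (for example the result of doc.outerHtml()); buildJsp and writeJspFile are hypothetical helpers, not part of jsoup.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: write a JSP page directive as raw text, followed by serialized HTML.
// buildJsp and writeJspFile are hypothetical helpers for illustration.
final class JspWriterSketch {

    static final String PAGE_DIRECTIVE =
            "<%@ page contentType=\"text/html; charset=UTF-8\" %>";

    static String buildJsp(String html) {
        // The directive is concatenated as-is, so its < > and quotes are never escaped.
        return PAGE_DIRECTIVE + "\n" + html;
    }

    static void writeJspFile(Path target, String html) throws IOException {
        // Encode explicitly as UTF-8 to match the charset declared in the directive.
        Files.write(target, buildJsp(html).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("page", ".jsp");
        writeJspFile(out, "<html><body><p>Hello</p></body></html>");
        System.out.println(new String(Files.readAllBytes(out), StandardCharsets.UTF_8));
    }
}
```

The HTML part can still come from jsoup; only the non-HTML line bypasses it.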

Now, for your actual problem, you should use something like:

// Decode with the same charset the bytes were encoded with; this overload throws IOException.
Document document = Jsoup.parse(new ByteArrayInputStream(myHtmlString.getBytes(StandardCharsets.UTF_8)), "UTF-8", baseUrl);
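The charset name passed to Jsoup.parse has to match the charset that produced the bytes; otherwise non-ASCII characters are decoded into the wrong characters. The stdlib-only sketch below (roundTrip is a hypothetical helper, and no jsoup is involved) shows what happens when UTF-8 bytes are read back as ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: encode text with one charset and decode it with another.
// roundTrip is a hypothetical helper for illustration.
final class CharsetMismatchSketch {

    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // e with acute accent

        // Same charset on both sides: the text survives unchanged.
        String same = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(same.equals(text)); // true

        // UTF-8 encodes U+00E9 as bytes 0xC3 0xA9; ISO-8859-1 reads each byte
        // as its own character, giving U+00C3 U+00A9 instead of the original.
        String mixed = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(mixed.length()); // 2
    }
}
```

Plain ASCII text survives either way, which is why a mismatch often goes unnoticed until the first accented character appears.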

Check this, this, and this for the outputSettings() options you may need.
