Charset in Jsoup
I am using the Jsoup library.
After running the following code:
Document doc = new Document(language);
File input = new File("filePath" + "filename.html");
PrintWriter writer = new PrintWriter(input, "UTF-8");
String contentType = "<%@ page contentType=\"text/html; charset=UTF-8\" %>";
doc.appendText(contentType);
writer.write(doc.toString());
writer.flush();
writer.close();
In the output HTML file I get the following line of text:
&lt;%@ page contentType="text/html; charset=UTF-8" %&gt;
instead of
<%@ page contentType="text/html; charset=UTF-8" %>
What could be the problem?
Those are escape sequences (HTML entities) that keep the browser from treating the text as HTML tags. It's not a problem: the text will render correctly when you open the page in a browser.
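As a rough illustration of the escaping described above, here is a minimal sketch of what a text escaper does to the directive. The class `EscapeSketch` and the helper `escapeText` are hypothetical names made up for this example. This is not Jsoup's actual implementation, only the general idea behind it.

```java
// Minimal sketch of HTML text escaping; NOT Jsoup's real code.
class EscapeSketch {
    // Hypothetical helper: replaces the characters that are significant in HTML text.
    static String escapeText(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String directive = "<%@ page contentType=\"text/html; charset=UTF-8\" %>";
        // The angle brackets come out as entities, so a browser shows them as text.
        System.out.println(escapeText(directive));
    }
}
```

Text added to a document as a text node is treated as literal text, which is why the angle brackets end up as entities in the serialized output.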
Some problems here:
Document doc = new Document(language);
Don't do this. Use Jsoup.parse(...) instead.
<%@ page contentType="text/html; charset=UTF-8" %>
This is not HTML, and will probably not get parsed correctly.
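Since the directive is not HTML, one possible workaround is to keep it out of the parsed document and write it to the file verbatim, ahead of the serialized markup. The sketch below assumes this approach suits your setup. The class `DirectiveFirst` and the method `compose` are hypothetical names, and the `html` string stands in for whatever `doc.outerHtml()` would return.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class DirectiveFirst {
    // Hypothetical helper: puts the raw directive on its own line before the HTML.
    static String compose(String directive, String html) {
        return directive + "\n" + html;
    }

    public static void main(String[] args) throws IOException {
        String directive = "<%@ page contentType=\"text/html; charset=UTF-8\" %>";
        // Stand-in for the serialized document, e.g. doc.outerHtml() (assumption).
        String html = "<html><head></head><body></body></html>";

        // A temp file is used here only to keep the sketch self-contained.
        Path target = Files.createTempFile("page", ".jsp");
        Files.write(target, compose(directive, html).getBytes(StandardCharsets.UTF_8));

        // Read the file back to confirm the directive was written unescaped.
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```

Because the directive never passes through the HTML serializer, nothing escapes its angle brackets.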
Now, for your problem. You should use something like
Document document = Jsoup.parse(new ByteArrayInputStream(myHtmlString.getBytes(StandardCharsets.UTF_8)), "UTF-8", BaseUrl);
Check Jsoup's Document.OutputSettings (charset and escape mode) for the output settings you may need.
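When parsing from a byte stream, the charset name tells the parser how to decode the bytes. The following sketch uses only the JDK to show what happens when the decoding charset matches the encoding charset and when it does not. The class `CharsetSketch` and the method `roundTrip` are hypothetical names used for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetSketch {
    // Encodes the text with one charset and decodes the bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café", written with an escape to avoid source-encoding issues

        // Matching charsets: the text survives unchanged.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8));

        // Mismatched charsets: the two UTF-8 bytes of 'é' are decoded as two Latin-1 characters.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

Pure ASCII text, such as the directive in the question, is not affected by such a mismatch. Any non-ASCII character in the page would be.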