Parse HTML "style" attribute using Java

I have HTML parsed into an org.w3c.dom.Document. I need to check the style attribute of every tag, parse it, change some CSS properties, and write the modified style definition back to the attribute.

Is there a standard way to parse a style attribute? How can I use the classes and interfaces from the org.w3c.dom.css package?

I need a Java solution.


If you want a way to do this without any dependencies you can use the javax.swing.text.html package classes to get you most of the way there:

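import javax.swing.text.AttributeSet;
import javax.swing.text.html.CSS;
import javax.swing.text.html.StyleSheet;

StyleSheet styleSheet = new StyleSheet();
// Parse the declaration text; the margin shorthand is expanded per side
AttributeSet dec = styleSheet.getDeclaration("margin:2px;padding:3px");
Object marginLeft = dec.getAttribute(CSS.Attribute.MARGIN_LEFT);
String marginLeftString = marginLeft.toString(); // "2px"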

getAttribute returns an instance of an internal CssValue class, which is unfortunately not public, hence the conversion to a String. Also, it won't handle em units. It does understand some shorthand, though: the margin declaration above is expanded so that margin-left can be looked up on its own. Not ideal, but it avoids dependencies.
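
To see everything Swing extracted from a style string, you can walk the returned AttributeSet. This is only a rough sketch: the class name SwingStyleParser and its parse method are made up for illustration, and the values are plain strings because the value classes are not public.

```java
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import javax.swing.text.AttributeSet;
import javax.swing.text.html.StyleSheet;

// Hypothetical helper, not part of any library.
class SwingStyleParser {

    /** Returns the CSS properties Swing recognised, keyed by property name. */
    static Map<String, String> parse(String style) {
        AttributeSet declaration = new StyleSheet().getDeclaration(style);
        Map<String, String> props = new TreeMap<>();
        Enumeration<?> names = declaration.getAttributeNames();
        while (names.hasMoreElements()) {
            Object name = names.nextElement();
            // Both the key and the value are only usable through toString()
            props.put(name.toString(), declaration.getAttribute(name).toString());
        }
        return props;
    }

    public static void main(String[] args) {
        // Prints the recognised properties and their values
        System.out.println(parse("margin:2px;padding:3px"));
    }
}
```

Because shorthand properties are expanded, the map will generally not look like the text you put in, so this suits reading values better than round-tripping them.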


First, I would check out the classes in the javax.xml packages. The javax.xml.parsers package contains parsers for two styles of parsing: SAXParser and DocumentBuilder. It sounds like you want the DocumentBuilder to create a DOM. You can either traverse the DOM manually (slow and painful), or you can use the XPath standard to look up elements in the DOM. Java support for that is in javax.xml.xpath.

XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("//@style");
Object results = expr.evaluate(dom, XPathConstants.NODESET);

It's your responsibility to cast the result to a NodeList and iterate over it properly, but it's the most direct way to get at what you want. Check out Java's DOM API for more information about reading and changing values.
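
Put together, the lookup, cast and iteration might look something like the following. The class and method names (StyleAttributeWalker, parse, appendToAllStyles) are invented for this sketch, and it assumes the input is well-formed XML (XHTML), since DocumentBuilder is not an HTML parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class StyleAttributeWalker {

    /** Parses well-formed XHTML text into a DOM. */
    static Document parse(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }

    /** Appends a declaration to every style attribute; returns how many were changed. */
    static int appendToAllStyles(Document dom, String declaration) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList styles = (NodeList) xpath.evaluate("//@style", dom, XPathConstants.NODESET);
        for (int i = 0; i < styles.getLength(); i++) {
            Attr style = (Attr) styles.item(i);
            String value = style.getValue().trim();
            if (!value.isEmpty() && !value.endsWith(";")) {
                value += ";";
            }
            // Writing to the Attr updates the element in the document
            style.setValue(value + declaration);
        }
        return styles.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document dom = parse("<html><body><p style=\"color:red\">a</p></body></html>");
        System.out.println(appendToAllStyles(dom, "padding:3px")); // prints 1
    }
}
```

The nodes in the result are live Attr objects, so changing their value changes the document itself; nothing needs to be put back by hand.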

I don't believe there is any support for a CSS parser built into Java, but you can look at these projects:

  • http://www.w3.org/Style/CSS/SAC/Overview.en.html
  • http://cssparser.sourceforge.net/

These may help you with your goals. Note that the Batik CSS parser is part of the larger Apache Batik project (http://xmlgraphics.apache.org/batik/index.html), which may include more than you need, but it has a corporate-friendly license.


I'm not sure I completely understand your requirements, but basically, you'll have to:

  • Read the stylesheet(s) and extract the CSS rules.
  • Read the HTML page(s) and find the attributes.
  • Substitute the new CSS properties for the old CSS properties.
  • Write the HTML page(s).

It looks like you would use the CSSStyleSheet interface to extract the CSS rules from the stylesheet(s).
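
For the substitution step on inline style attributes, here is a rough sketch that does not use a CSS library or the org.w3c.dom.css interfaces at all. It is a hand-rolled split on ';' and ':', so it is an assumption-laden shortcut rather than a real CSS parser: it will break on values that contain semicolons, such as quoted strings or some url(...) values. The class name InlineStyle and its methods are made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper; a naive stand-in for a proper CSS parser.
class InlineStyle {

    /** Splits "name: value; name: value" into a map that keeps declaration order. */
    static Map<String, String> parse(String style) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String decl : style.split(";")) {
            int colon = decl.indexOf(':');
            if (colon < 0) {
                continue; // empty or malformed fragment
            }
            String name = decl.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = decl.substring(colon + 1).trim();
            if (!name.isEmpty() && !value.isEmpty()) {
                props.put(name, value);
            }
        }
        return props;
    }

    /** Turns the map back into text suitable for a style attribute. */
    static String serialize(Map<String, String> props) {
        StringJoiner out = new StringJoiner("; ");
        props.forEach((name, value) -> out.add(name + ": " + value));
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = parse("margin:2px; padding : 3px;");
        props.put("margin", "0");   // replace an existing property
        props.put("color", "red");  // add a new one
        System.out.println(serialize(props)); // margin: 0; padding: 3px; color: red
    }
}
```

The serialized string can then be written back with setAttribute("style", ...) on the element. For anything beyond simple declarations, one of the parser projects mentioned above is the safer choice.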
