Extracting data from a website with XSLT

2018-07-01 20:41:28

I'm trying to learn XSLT and have run into a problem. I would like to extract some data from a website, transform it with XSLT templates, and finally show it in my own XHTML page.

Let's say I have an XML file (this will be my XHTML page):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<?xml-stylesheet type="text/xsl" href="myXSLTFile.xsl"?>


<!--here I want to have markup produced by xslt file-->

The question is how to achieve this. I want my XSLT file to work on nodes from a particular website (for example http://www.example.com) and write the result into my own XML file.

If you find my explanation confusing, please ask and I'll try to explain the problem a little better.

EDIT. I'll give an example. Let's say we have this page: http://www.w3.org/TR/xhtml1/. I want to develop an XSLT document that extracts the titles of chapters and sections from the full table of contents and puts them into a table in my own XML file. The problem I have is how to reference the page http://www.w3.org/TR/xhtml1/ in my XSLT file so that it works on that page's nodes (the page is written in XHTML, so I don't have to worry about transforming HTML to XML).

EDIT2. After further research, it seems that Thomas W.'s answer is the solution to the problem, but you have to deal with the browser's cross-site restrictions, referred to here as XSS problems (tips in LarsH's answer).

In theory, you can do something like

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="test.xsl"?>
<page href="http://www.w3.org/TR/xslt/index.htm"/>

and have a stylesheet like

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:h="http://www.w3.org/1999/xhtml">

  <xsl:template match="/">
    <html>
      <head></head>
      <body>
        <xsl:for-each select="document(*/@href)//h:h2">
          <xsl:copy-of select="."/>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>

But this doesn't really work across browsers (it worked only in Chrome, as far as I can tell). One reason might be cross-site security features (the same-origin policy, loosely called XSS protection here) that block loading the foreign page.
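To make the selection in the stylesheet concrete outside the browser, here is a minimal Python sketch of what `//h:h2` picks out, using only the standard library. The names `SAMPLE_PAGE` and `extract_h2_titles` are made up for illustration, and the sample markup is a hypothetical stand-in for the fetched page, not the real W3C document. A real page with a DOCTYPE and named entities may need a more forgiving parser than `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# XHTML namespace, bound to the prefix "h" just like xmlns:h in the stylesheet.
NS = {"h": "http://www.w3.org/1999/xhtml"}

# Hypothetical stand-in for the downloaded page (assumption, not real content).
SAMPLE_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample</title></head>
  <body>
    <h1>Title</h1>
    <h2>1. Introduction</h2>
    <p>Text</p>
    <div><h2>2. Definitions</h2></div>
  </body>
</html>"""


def extract_h2_titles(xhtml_text):
    """Return the text of every namespaced h2 element, in document order."""
    root = ET.fromstring(xhtml_text)
    # ".//h:h2" is the ElementTree counterpart of the XPath "//h:h2".
    return ["".join(h2.itertext()).strip() for h2 in root.iterfind(".//h:h2", NS)]


if __name__ == "__main__":
    for title in extract_h2_titles(SAMPLE_PAGE):
        print(title)
```

The namespace prefix matters: the same query run against markup without the XHTML namespace declaration matches nothing, which is also why the stylesheet declares `xmlns:h` instead of selecting a bare `h2`.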

A couple of ways to get around the XSS restrictions (see "AJAX and Cross-Site Scripting to Read the Header"):

Add a local PHP or other server page to proxy to the other web site.

Use CORS.
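As a rough illustration of both suggestions, here is a sketch of a local proxy written in Python instead of PHP, standard library only. The `/page` path, port 8000, the `UPSTREAM_URL` constant, the response content type, and the helper names are all assumptions chosen for this example. Whether a given browser's XSLT `document()` call honors the CORS header is not confirmed by the answers above, so treat that part as something to test.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Assumption: the single remote page this proxy is allowed to fetch.
UPSTREAM_URL = "http://www.w3.org/TR/xhtml1/"


def build_headers(body, allow_origin="*"):
    """Build response headers for the proxied page, including the CORS opt-in."""
    return [
        # Assumption: serve the page as XML so an XSLT processor will parse it.
        ("Content-Type", "application/xhtml+xml"),
        ("Content-Length", str(len(body))),
        # Only relevant when the XML file and the proxy live on different origins.
        ("Access-Control-Allow-Origin", allow_origin),
    ]


class ProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve exactly one path so the proxy cannot be pointed at arbitrary sites.
        if self.path != "/page":
            self.send_error(404)
            return
        try:
            with urlopen(UPSTREAM_URL, timeout=10) as upstream:
                body = upstream.read()
        except OSError:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            self.send_error(502, "Upstream fetch failed")
            return
        self.send_response(200)
        for name, value in build_headers(body):
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8000):
    """Run the proxy until interrupted."""
    HTTPServer((host, port), ProxyHandler).serve_forever()
```

Calling `serve()` starts the proxy. If the XML file and stylesheet are served from the same host and port as the proxy, the `href` in the XML file can point at `/page` and the request becomes same-origin, in which case the CORS header is not needed at all.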
