Headless Browser and scraping

I'm trying to put together a list of possible solutions for automated browser test suites and headless browser platforms capable of scraping.


BROWSER TESTING / SCRAPING:

  • Selenium - the polyglot flagship of browser automation, with bindings for Python, Ruby, JavaScript, C#, Haskell and more, plus an IDE for Firefox (as an extension) for faster test deployment. It can act as a server and has tons of features.
JAVASCRIPT:

  • PhantomJS - JavaScript, headless testing with screen capture and automation, uses WebKit. As of version 1.8, Selenium's WebDriver API is implemented, so you can use any WebDriver binding and tests will be compatible with Selenium.
  • SlimerJS - similar to PhantomJS, uses Gecko (Firefox) instead of WebKit
  • CasperJS - JavaScript, built on both PhantomJS and SlimerJS, has extra features
  • Ghost Driver - JavaScript implementation of the WebDriver Wire Protocol for PhantomJS.
  • new PhantomCSS - CSS regression testing. A CasperJS module for automating visual regression testing with PhantomJS and Resemble.js.
  • new WebdriverCSS - plugin for Webdriver.io for automating visual regression testing
  • new PhantomFlow - Describe and visualize user flows through tests. An experimental approach to Web user interface testing.
  • new trifleJS - ports the PhantomJS API to use the Internet Explorer engine.
  • new CasperJS IDE (commercial)
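
The WebDriver compatibility mentioned for PhantomJS and Ghost Driver comes down to a shared HTTP and JSON protocol. As a rough sketch (not taken from the original post), the Python below only builds two JSON Wire Protocol requests and never sends them. The address `127.0.0.1:8910`, the helper names and the session id are placeholders, assuming some WebDriver server is listening there.

```python
import json
from urllib.request import Request

# Assumption: a WebDriver server (for example PhantomJS with Ghost Driver,
# or a Selenium server) listens here; host and port are placeholders.
BASE_URL = "http://127.0.0.1:8910"


def wire_request(method, path, payload=None):
    """Build (but do not send) a JSON Wire Protocol HTTP request."""
    body = None if payload is None else json.dumps(payload).encode("utf-8")
    return Request(
        BASE_URL + path,
        data=body,
        method=method,
        headers={"Content-Type": "application/json; charset=utf-8"},
    )


def new_session_request(browser_name):
    # POST /session asks the server to start a browser session.
    return wire_request(
        "POST", "/session",
        {"desiredCapabilities": {"browserName": browser_name}},
    )


def navigate_request(session_id, url):
    # POST /session/<id>/url tells that session to load a page.
    return wire_request("POST", "/session/%s/url" % session_id, {"url": url})


if __name__ == "__main__":
    # "hypothetical-session-id" stands in for the id a real server returns.
    req = navigate_request("hypothetical-session-id", "http://example.com/")
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Any client that can produce requests of this shape can drive any server that speaks the protocol, which is why the various WebDriver bindings listed here are interchangeable.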
NODE.JS:

  • Node-phantom - bridges the gap between PhantomJS and node.js
  • WebDriverJs - Selenium WebDriver bindings for node.js by Selenium Team
  • WD.js - node module for WebDriver/Selenium 2
  • yiewd - WD.js wrapper using latest Harmony generators! Get rid of the callback pyramid with yield
  • ZombieJs - Insanely fast, headless full-stack testing using node.js
  • NightwatchJs - Node.js-based testing solution using Selenium WebDriver
  • Chimera - can do everything PhantomJS does, but in a full JS environment
  • Dalek.js - Automated cross browser testing with JavaScript through Selenium Webdriver
  • Webdriver.io - better implementation of WebDriver bindings with 50+ predefined actions
  • Nightmare - Electron bridge with a high-level API.
  • jsdom - tailored towards web scraping. A very lightweight DOM implemented in Node.js; it supports pages with JavaScript.
WEB SCRAPING / MINING:

  • Scrapy - Python, mainly a scraper/miner. Fast and well documented; can be linked with Django Dynamic Scraper for nice mining deployments, or with Scrapy Cloud for PaaS (server-less) deployment. Works in a terminal or as a stand-alone server process, can be used with Celery, and is built on top of Twisted.
  • Snailer - node.js module, untested yet.
  • Node-Crawler - node.js module, untested yet.
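
To make concrete what the scrapers above automate, here is a minimal sketch using only Python's standard library. It is not how Scrapy or the Node.js modules are implemented. The class name `LinkExtractor`, the helper `extract_links` and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in an HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Hypothetical page content; a real scraper would download it first.
    sample = '<p><a href="/docs">Docs</a> <a href="http://example.org/">Home</a></p>'
    print(extract_links(sample, "http://example.com/start"))
```

A parser like this only sees the HTML as delivered and never runs the page's JavaScript. That is the gap the headless browsers in the earlier sections fill.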
ONLINE TOOLS:

  • new Online HTTP client - Dedicated SO answer
  • dead CasperBox - Run CasperJS scripts online

RELATED LINKS & RESOURCES:

  • Comparison of Webscraping software
  • new Resemble.js : Image analysis and comparison
QUESTIONS:

  • Any pure Node.js solution or Node.js to PhantomJS/CasperJS module that actually works and is documented?
  • Answer: Chimera seems to go in that direction, check out Chimera

  • Other solutions capable of easier JavaScript injection than Selenium?

  • Do you know any pure Ruby solutions?

  • Answer: Check out the list created by rjk with Ruby-based solutions

  • Do you know any related tech or solution?
  • Feel free to re-edit this question and add content as you wish! Thank you for your contributions!


    Updates

  • added SlimerJS to the list
  • added Snailer and Node-Crawler and Node-phantom
  • added Yiewd WebDriver wrapper
  • added WebDriverJs and WD.js
  • added Ghost Driver
  • added Comparison of Webscraping software on Screen Scraper Blog
  • added ZombieJs
  • added Resemble.js, PhantomCSS and PhantomFlow; categorised and re-edited content
  • 04.01.2014, added Chimera, answered 2 questions
  • added NightWatchJs
  • added DalekJS
  • added WebdriverCSS
  • added CasperBox
  • added trifleJS
  • added CasperJS IDE
  • added Nightmare
  • added jsdom
  • added Online HTTP client, updated CasperBox (dead)

  • If Ruby is your thing, you may also try:

  • https://github.com/chriskite/anemone (dev stopped)
  • https://github.com/sparklemotion/mechanize
  • https://github.com/postmodern/spidr
  • https://github.com/stewartmckee/cobweb
  • http://watirwebdriver.com/ (Selenium)
  • Also, the Nokogiri gem can be used for scraping:

  • http://nokogiri.org/
  • There is a dedicated book from Packt Publishing about how to use Nokogiri for scraping.


    http://triflejs.org/ is like PhantomJS, but based on IE.


    A kind of JS-based Selenium is Dalek.js. It is aimed not only at automated frontend tests; you can also take screenshots with it. It has webdrivers for all important browsers. Unfortunately those webdrivers seem to need improvement (not to say "buggy", in the case of Firefox).
