Cloud-based service for web scraping, data extraction, and other ETL tasks. Schedule and run scrapers in the cloud, or compile them and run them on your own PC.
Pros
Pro Supports nested data structures
Supports nested data structures as well as flat ones, so a parent object can contain a list of child objects, and so on.
Pro OCR support
It's possible to OCR images and extract their text while you are scraping other data from the web.
Pro More than an HTML parser
The service can scrape not only HTML pages but also XML, JSON, JS, iCal, and images. All of these types are converted to XML, so you can traverse them through the DOM just like standard HTML.
Pro Can be self-hosted
Useful for people who don't want to scrape from the cloud and prefer to run it on their own hardware. You can compile a digger into a very lightweight application that runs on Linux, Windows, or Mac.
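To make the points about nested data and XML conversion more concrete, here is a minimal Python sketch of the general idea: a nested JSON document (a parent object holding a list of child objects) is turned into an XML tree and then traversed with path queries. This is not the service's actual conversion. The helper `to_xml`, the `root` and `item` tag names, and the sample data are all hypothetical and chosen only for illustration.

```python
import json
import xml.etree.ElementTree as ET


def to_xml(tag, value):
    """Recursively convert a JSON-like value into an XML element.

    Hypothetical helper for illustration only; the real service may
    map JSON to XML differently.
    """
    element = ET.Element(tag)
    if isinstance(value, dict):
        # Each key becomes a child element named after the key.
        for key, child in value.items():
            element.append(to_xml(key, child))
    elif isinstance(value, list):
        # List entries have no name of their own, so use a generic tag.
        for child in value:
            element.append(to_xml("item", child))
    else:
        # Scalars are stored as the element's text.
        element.text = str(value)
    return element


# Made-up sample: a parent object containing a list of child objects.
raw = (
    '{"shop": "Example", "products": ['
    '{"name": "Lamp", "price": 20}, '
    '{"name": "Desk", "price": 150}]}'
)
root = to_xml("root", json.loads(raw))

# Once everything is XML, the same path-style traversal works
# regardless of the original format.
names = [node.text for node in root.findall("./products/item/name")]
print(names)  # ['Lamp', 'Desk']
```

The nested shape survives the conversion: each product stays a child of `products` instead of being flattened into separate rows.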
Cons
Con No PDF support
You cannot scrape data from PDF files.