Tuesday, July 16, 2013

Python Libraries For Scraping

The Python scraping libraries I use to develop crawlers, listed in order of preference:
  1. Scrapy - A fast, high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
  2. urllib2 + Beautiful Soup - If I had to build a framework from scratch, this would be my first choice.
  3. Mechanize + Beautiful Soup - Replaces urllib2 with Mechanize, which offers easy HTML form filling, opening of any URL (not just HTTP), automatic handling of HTTP-Equiv and Refresh, and easy link parsing and following.
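
As a rough illustration of the fetch-then-parse pattern behind option 2, here is a minimal sketch. It makes several assumptions that are not from the list above: it targets Python 3, where urllib2 became urllib.request, and it uses the standard library's html.parser as a stand-in for Beautiful Soup so that it runs without third-party packages. The names LinkExtractor, extract_links and fetch, the User-Agent string, and the example URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Parse an HTML string and return the links it contains, in order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url, timeout=10):
    """Download a page and decode it to text (performs a network request)."""
    # The User-Agent value is an illustrative assumption.
    request = Request(url, headers={"User-Agent": "example-crawler/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Parse an inline sample so the demo does not need network access.
    sample = '<a href="/about">About</a> <a href="http://example.org/">Ext</a>'
    print(extract_links(sample, "http://example.com/"))
```

A crawler built this way would call fetch on a start URL, pass the result to extract_links, and queue the returned links for the next round. With Beautiful Soup installed, the parsing step would normally be handled by that library instead of the hand-written parser class.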
Please leave a comment if you use another library that I should list here.

BeClasp Consulting provides Python and .NET based website scraping services and has written thousands of parsers so far, for uses ranging from data crawling for bank account reconciliation to e-commerce stores and other data mining services. Drop us an email at mail@beclaspconsulting.net to learn more about the services we offer.