Meet Scrapy
An open source and collaborative framework for extracting the data you need from websites in a fast, simple, yet extensible way.
Sample Scrapy Code
pip install scrapy
cat > myspider.py <<EOF
from scrapy import Spider, Item, Field

class Post(Item):
    title = Field()

class BlogSpider(Spider):
    name, start_urls = 'blogspider', ['http://blog.scrapinghub.com']

    def parse(self, response):
        return [Post(title=e.extract()) for e in response.css("h2 a::text")]
EOF
scrapy runspider myspider.py
Build your own web crawlers
Fast and powerful
Write the rules to extract the data and let Scrapy do the rest.
Easily extensible
Extensible by design: plug in new functionality without having to touch the core.
Portable, Python
Written in Python; runs on Linux, Windows, Mac and BSD.
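To illustrate the "Easily extensible" point, here is a minimal sketch of an item pipeline, one of the hooks Scrapy offers for adding behavior without touching the core. The class name StripTitlePipeline and its whitespace-tidying behavior are hypothetical. The sketch assumes the classic process_item(self, item, spider) hook, which Scrapy calls once for every item a spider produces.

```python
# Hypothetical item pipeline: tidies the "title" field of each scraped item.
# Scrapy calls process_item() once per item; whatever it returns is handed
# to the next enabled pipeline (or exported, if this is the last one).
class StripTitlePipeline:
    def process_item(self, item, spider):
        # Items support dict-style access, so this works for Item
        # subclasses such as Post as well as for plain dicts.
        title = item.get("title")
        if isinstance(title, str):
            # Collapse internal runs of whitespace and trim both ends.
            item["title"] = " ".join(title.split())
        return item
```

In a project, a pipeline like this would be enabled through the ITEM_PIPELINES setting, for example ITEM_PIPELINES = {"myproject.pipelines.StripTitlePipeline": 300}. Here myproject.pipelines is a placeholder module path, and the integer (0 to 1000) controls the order in which enabled pipelines run, with lower values first.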
Healthy community:
- 5.8k stars, 1.6k forks and 550 watchers on GitHub
- 1.5k followers on Twitter
- 2.7k questions on Stack Overflow
- 2k members on the mailing list
Want to know more?
Scrapy at a glance
Wondering what Scrapy can do?
Meet the companies using Scrapy