Meet Scrapy

An open source and collaborative framework for extracting the data you need from websites in a fast, simple, yet extensible way.

Install latest version:

Scrapy 0.24

pip install scrapy

Sample Scrapy Code

pip install scrapy
cat > myspider.py <<EOF

from scrapy import Spider, Item, Field

class Post(Item):
    title = Field()

class BlogSpider(Spider):
    name = 'blogspider'
    start_urls = ['http://blog.scrapinghub.com']

    def parse(self, response):
        # Return one Post per text node matched by the "h2 a" CSS rule
        return [Post(title=e.extract()) for e in response.css("h2 a::text")]

EOF
scrapy runspider myspider.py

Build your own web crawlers

Fast and powerful

Write the rules to extract the data and let Scrapy do the rest.

Easily extensible

Extensible by design: plug in new functionality without having to touch the core.

Portable, Python

Written in Python; runs on Linux, Windows, Mac and BSD.
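
One way to picture the "easily extensible" point is an item pipeline: in Scrapy, a pipeline is a plain Python class with a process_item(item, spider) method, so it can be added without changing the framework itself. The sketch below is only illustrative. The class name TitleCleanupPipeline and its whitespace-collapsing behaviour are assumptions, not something this page describes, and the stand-alone check at the bottom uses a plain dict in place of a Scrapy Item.

```python
class TitleCleanupPipeline(object):
    """Hypothetical item pipeline: collapses runs of whitespace in an item's title."""

    def process_item(self, item, spider):
        # Scrapy calls this once per scraped item; items support dict-style access.
        title = item.get('title')
        if title is not None:
            # split() with no arguments drops leading/trailing and repeated whitespace
            item['title'] = ' '.join(title.split())
        # Returning the item passes it on to the next pipeline stage
        return item


if __name__ == '__main__':
    # Stand-alone check: a plain dict stands in for a Scrapy Item (assumption)
    pipeline = TitleCleanupPipeline()
    print(pipeline.process_item({'title': '  Hello \n  world '}, spider=None))
```

In a Scrapy project, a class like this would typically be enabled through the ITEM_PIPELINES setting; the spider code itself stays unchanged.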

Healthy community: