Skip to content
master
Go to file
Code
This branch is 4 commits ahead, 19 commits behind gaojiuli:master.

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
img
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

README.md

Build Python Version License

Web crawling framework for everyone. Written with asyncio, uvloop and aiohttp.

Requirements

  • Python3.5+

Installation

pip install gain

pip install uvloop (Only linux)

Usage

  1. Write spider.py:
from gain import Css, Item, Parser, Spider
import aiofiles

class Post(Item):
    title = Css('.entry-title')
    content = Css('.entry-content')

    async def save(self):
        async with aiofiles.open('scrapinghub.txt', 'a+') as f:
            await f.write(self.results)


class MySpider(Spider):
    concurrency = 5
    headers = {'User-Agent': 'Google Spider'}
    start_url = 'https://blog.scrapinghub.com/'
    parsers = [Parser('https://blog.scrapinghub.com/page/\d+/'),
               Parser('https://blog.scrapinghub.com/\d{4}/\d{2}/\d{2}/[a-z0-9\-]+/', Post)]


MySpider.run()
  1. Run python spider.py

  2. Result:

Example

The examples are in the /example/ directory.

Contribution

  • Pull request.
  • Open issue.

About

Web crawling framework based on asyncio for everyone.

Resources

License

Releases

No releases published

Packages

No packages published

Languages

You can’t perform that action at this time.