Repositories
-
dateparser
Python parser for human-readable dates
-
article-extraction-benchmark
Article extraction benchmark: dataset and evaluation scripts
-
extruct
Extract embedded metadata from HTML markup
-
splash
Lightweight, scriptable browser as a service with an HTTP API
-
autoextract-spiders
Pre-built Scrapy spiders for AutoExtract
-
shub
Scrapinghub Command Line Client
-
scrapy-poet
Page Object pattern for Scrapy
-
baseimage-docker
Forked from phusion/baseimage-docker
A minimal Ubuntu base image modified for Docker-friendliness
-
scrapy-autounit
Automatic unit test generation for Scrapy.
-
spidermon
Scrapy extension for monitoring spider execution.
-
js2xml
Convert JavaScript code to an XML document
-
scrapinghub-autoextract
Python clients for Scrapinghub AutoExtract API
-
scrapinghub-stack-scrapy
Software stack with latest Scrapy and updated deps
-
marathon-apps-collectd-plugin
Forked from jsargiot/marathon-apps-collectd-plugin
marathon-apps-collectd-plugin
-
webstruct-demo
HTTP demo for https://github.com/scrapinghub/webstruct
-
mochiweb
Forked from shaneaevans/mochiweb
MochiWeb is an Erlang library for building lightweight HTTP servers.
-
portia
Visual scraping for Scrapy
-
kafka-docker
Forked from wurstmeister/kafka-docker
-
crawlera-headless-proxy
A complementary proxy that helps you use Crawlera with headless browsers
-
shublang
Pluggable DSL that uses pipes to perform a series of linear transformations to extract data
-
andi
Library for annotation-based dependency injection
-
adblockgoparser
Golang parser for Adblock Plus filters
-
price-parser
Extract price amount and currency symbol from a raw text string
-
sample-projects
Sample projects showcasing Scrapinghub tech