Scrapinghub

Repositories

dateparser

Python parser for human-readable dates

Python BSD-3-Clause 310 1,508 133 (1 issue needs help) 41 Updated Jun 18, 2020
shub-workflow

Python BSD-3-Clause 7 5 1 1 Updated Jun 18, 2020
article-extraction-benchmark

Article extraction benchmark: dataset and evaluation scripts

Python MIT 2 3 0 0 Updated Jun 17, 2020
extruct

Extract embedded metadata from HTML markup

microformats semantic-web rdfa json-ld microdata opengraph

Python 74 481 16 (2 issues need help) 13 Updated Jun 17, 2020
splash

Lightweight, scriptable browser as a service with an HTTP API

Python 405 2,986 282 22 Updated Jun 16, 2020
autoextract-spiders

Pre-built Scrapy spiders for AutoExtract

Python BSD-3-Clause 4 6 2 5 Updated Jun 16, 2020
shub

Scrapinghub Command Line Client

Python 57 94 34 (6 issues need help) 9 Updated Jun 11, 2020
scrapy-poet

Page Object pattern for Scrapy

Python BSD-3-Clause 9 23 3 3 Updated Jun 10, 2020
baseimage-docker
Forked from phusion/baseimage-docker
A minimal Ubuntu base image modified for Docker-friendliness

Python MIT 1,031 1 0 1 Updated Jun 9, 2020
docker-erlang-otp
Forked from erlang/docker-erlang-otp
The official Erlang OTP image on Docker Hub

Dockerfile Apache-2.0 46 0 0 2 Updated Jun 9, 2020
scrapy-autounit

Automatic unit test generation for Scrapy.

Python BSD-3-Clause 9 25 10 2 Updated Jun 9, 2020
spidermon

Scrapy extension for monitoring spider execution.

testing monitoring scraping crawling spiders monitoring-tool scrapinghub

Python BSD-3-Clause 51 256 33 (1 issue needs help) 11 Updated Jun 8, 2020
js2xml

Convert JavaScript code to an XML document

Python MIT 16 111 1 0 Updated Jun 4, 2020
scrapinghub-autoextract

Python clients for Scrapinghub AutoExtract API

Python BSD-3-Clause 2 11 0 0 Updated Jun 2, 2020
web-poet

Web scraping Page Objects core library

python web-scraping page-objects

Python BSD-3-Clause 3 11 0 0 Updated Jun 2, 2020
scrapinghub-stack-scrapy

Software stack with latest Scrapy and updated deps

Dockerfile BSD-3-Clause 8 35 1 0 Updated May 25, 2020
marathon-apps-collectd-plugin
Forked from jsargiot/marathon-apps-collectd-plugin
marathon-apps-collectd-plugin

Python GPL-2.0 83 2 0 0 Updated May 25, 2020
webstruct-demo

HTTP demo for https://github.com/scrapinghub/webstruct

Python MIT 2 3 0 2 Updated May 21, 2020
mochiweb
Forked from shaneaevans/mochiweb
MochiWeb is an Erlang library for building lightweight HTTP servers.

Erlang 470 0 0 1 Updated May 20, 2020
woodpecker
Forked from laszlocph/woodpecker
An opinionated fork of the Drone CI system

Go 17 0 0 0 Updated May 19, 2020
docker-images

Dockerfile 9 28 0 3 Updated May 9, 2020
portia

Visual scraping for Scrapy

Python 1,257 7,768 100 9 Updated May 9, 2020
kafka-docker
Forked from wurstmeister/kafka-docker

Shell Apache-2.0 2,034 0 0 1 Updated May 6, 2020
crawlera-headless-proxy

A complementary proxy that helps you use Crawlera with headless browsers

crawler proxy scraping

Go MIT 13 53 3 0 Updated May 6, 2020
varanus

A command line spider monitoring tool

spider monitoring python36

Python 4 6 3 2 Updated May 1, 2020
shublang

Pluggable DSL that uses pipes to perform a series of linear transformations to extract data

Python 2 4 29 (7 issues need help) 1 Updated Apr 29, 2020
andi

Library for annotation-based dependency injection

Python BSD-3-Clause 1 8 1 1 Updated Apr 27, 2020
adblockgoparser

Golang parser for Adblock Plus filters

Go MIT 1 1 0 0 Updated Apr 21, 2020
price-parser

Extract price amount and currency symbol from a raw text string

Python BSD-3-Clause 23 131 11 (4 issues need help) 8 Updated Apr 15, 2020
sample-projects

Sample projects showcasing Scrapinghub tech

Python 125 100 5 3 Updated Apr 5, 2020

Most used topics

data-science scraping python scrapy web-scraping
