web-crawler

As is already done for ES, we could route the documents in the StatusUpdaterBolt based on the host name or IP. The spouts would then check that the number of spout instances equals the number of shards, and each instance would filter its queries to its own shard.

At the moment, we can have only one instance of a spout.

https://lucene.apache.org/solr/guide/6_6/shards-and-indexing-data-in-solrcloud.html
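
A rough sketch of the idea described above, written in Python for brevity. It is not StormCrawler or Solr code: `shard_for`, `check_spout_parallelism`, and `urls_for_spout` are hypothetical names, and it assumes that shards are numbered from 0 and that a CRC32 hash of the host name is an acceptable routing key.

```python
import zlib
from urllib.parse import urlsplit


def shard_for(url: str, num_shards: int) -> int:
    """Map a URL to a shard index using a stable hash of its host name or IP."""
    host = urlsplit(url).hostname or ""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    return zlib.crc32(host.encode("utf-8")) % num_shards


def check_spout_parallelism(num_spouts: int, num_shards: int) -> None:
    """Fail fast unless there is exactly one spout instance per shard."""
    if num_spouts != num_shards:
        raise ValueError(
            f"expected {num_shards} spout instances (one per shard), got {num_spouts}"
        )


def urls_for_spout(urls, spout_index: int, num_shards: int):
    """Keep only the URLs whose shard matches this spout instance (illustrative filter)."""
    return [u for u in urls if shard_for(u, num_shards) == spout_index]
```

Because the routing key is the host, all URLs of one site land on the same shard, so a single spout instance sees everything it needs to enforce per-host politeness. In a real deployment the filtering would happen in the query sent to the shard rather than in memory as shown here.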

Here are 520 public repositories matching this topic...

crawlab-team / crawlab

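Is it not possible to use a MongoDB instance other than the one bundled with crawlab?

Configurable spider: after editing it in the UI, a "saved successfully" message appears, but the changes are not actually saved

Task execution has problems with the Docker installation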

BruceDone / awesome-crawler

apache / nutch

sjdirect / abot

xianhu / PSpider

DigitalPebble / storm-crawler

Add support for shards in SOLR

USCDataScience / sparkler

VIDA-NYU / ache

brendonboshell / supercrawler

infinitbyte / gopa

lucasxlu / LagouJob

rivermont / spidy

microfisher / Strong-Web-Crawler

ssssssss-team / spider-flow

antchfx / antch

elliotxx / zhihu-crawler-people

crawler-commons / crawler-commons

duyet / awesome-web-scraper

Norconex / collector-http

gusdnd852 / kochat

mazzzystar / Proxy

commoncrawl / news-crawl

abaykan / CrawlBox

brianmadden / krawler

platonai / pulsar

crawlab-team / crawlab-lite

DwarfThief / Raspagem-de-dados-para-iniciantes

monkey-soft / SchweizerMesser

jaxBCD / Ultimate-Dork

mattdeitke / CVPR2019
