scrapy
Here are 2,051 public repositories matching this topic...
I've also been learning Python web scraping recently — thanks for sharing.
After setting up the environment today, I hit a problem running the Spider_Python project: it could not connect to MongoDB, with an error saying pymongo has no Connection module. After looking up the current pymongo API, I made the following change, and the spider now runs and stores data in MongoDB correctly.
```python
def Connection(self):
    # Connect to MongoDB (localhost:27017). pymongo no longer ships a
    # Connection class, so use pymongo.MongoClient instead; mongodb and
    # posts are handles to the database and collection.
    mongoclient = pymongo.MongoClient()
    mongodb = mongoclient[self.database]
    posts = mongodb.posts
    return posts
```
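For completeness, the corrected connection pattern can be wrapped in a small helper. This is a sketch: the database name, collection name, and sample document are placeholders, and a local mongod on the default port 27017 is assumed.

```python
def save_post(post, database="spider_db"):
    """Insert one document using pymongo.MongoClient (the replacement for
    the removed pymongo.Connection). Returns the new document's _id."""
    import pymongo  # imported lazily so the sketch stays self-contained

    client = pymongo.MongoClient()  # connects to localhost:27017 by default
    posts = client[database].posts  # "posts" collection, as in the fix above
    return posts.insert_one(post).inserted_id
```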
The quick start section seems to omit the pipeline setting, and without it yielded items are not processed correctly — the same problem as #137. Please update the documentation; I'm happy to contribute the change if that helps.
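For reference, enabling a pipeline is a one-line addition to settings.py. The class path below is a hypothetical placeholder, not a name from the project:

```python
# settings.py — without this setting, yielded items bypass the pipeline
# entirely. The key is the pipeline's import path; the integer is its
# order in the chain (0-1000, lower numbers run first).
ITEM_PIPELINES = {
    "myproject.pipelines.MongoPipeline": 300,
}
```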
Python client retrieves nothing
I ran those four lines of code and IPs can be obtained, but when I call the service from the Python client nothing is returned.
It would be a much better user experience to use custom widgets for spider args. For example, if we could select a category from a list or enter a URL in a separate field, it would be much easier for the end user to work with.
Hi, according to the following links:
https://doc.scrapy.org/en/latest/topics/spiders.html#spiderargs
https://scrapyd.readthedocs.io/en/stable/api.html#schedule-json
parameters can be passed to the Spider class during initialization, but I can't see anywhere in the UI to enter them.
It would be great if this feature were added.
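Under the hood, the feature would just forward extra form fields to scrapyd's schedule.json endpoint, which passes any field beyond project/spider/settings to the spider's `__init__`. A minimal standard-library sketch — the project, spider, and `category` argument names are placeholders:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Extra form fields (here "category") reach the spider as keyword
# arguments, i.e. self.category inside the spider.
payload = urlencode({
    "project": "myproject",
    "spider": "myspider",
    "category": "books",
}).encode()

req = Request("http://localhost:6800/schedule.json", data=payload)
# urllib.request.urlopen(req)  # uncomment to actually schedule the job
```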
linux:HTTPConnectionPool(host='192.168.0.24', port=6801): Max retries exceeded with url: /listprojects.json (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f0a78b2d828>: Failed to establish a new connection: [Errno 111] Connection refused',))
windows:HTTPConnectionPool(host='localhost', port=6801): Max retries exceeded with url: /jobs (Caused by Ne
Documentation incorrectly states that any software accepting the CONNECT method can be used as a proxy
Hello,
I was trying to build my own image with a 3rd-party HTTP proxy.
Expected Behavior
According to the documentation: "you can use every software which accept the CONNECT method (Squid, Tinyproxy, etc.)."
Actual Behavior
This is not the case, because Scrapoxy expects to receive a 200 response on http://xx.xx.
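To make the requirement concrete: a CONNECT-capable proxy receives a request like the one built below and must answer with a 200 status line before tunnelling. This is a sketch of the wire format only, not Scrapoxy's actual check code; host and port are placeholders.

```python
def connect_request(host, port):
    """Raw HTTP CONNECT request a client sends to open a tunnel."""
    return (f"CONNECT {host}:{port} HTTP/1.1\r\n"
            f"Host: {host}:{port}\r\n\r\n").encode()

# A compliant proxy answers with a status line such as:
#   HTTP/1.1 200 Connection established
# Per the issue above, Scrapoxy treats anything other than a 200 here
# as a failed proxy.
```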
Describe the bug
from scrapy.conf import settings
ModuleNotFoundError: No module named 'scrapy.conf'
To Reproduce
On Windows 10, run: scrapy crawl lianjia
Desktop (please complete the following information)
- OS: Windows 10
- Python: 3.7
- Scrapy: 1.7.3
- Redis:
- Elasticsearch:
- Kibana:
Additional context
Adding this would be helpful.
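The error comes from the scrapy.conf module having been removed in modern Scrapy releases; settings are now read through scrapy.utils.project (or `self.settings` inside a spider). A minimal replacement sketch — the lazy import keeps the snippet self-contained:

```python
def load_settings():
    """Replacement for the removed `from scrapy.conf import settings`."""
    # Works anywhere a Scrapy project is on the path; inside a spider or
    # pipeline, prefer the injected `self.settings` object instead.
    from scrapy.utils.project import get_project_settings
    return get_project_settings()
```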
It took me hours to figure this out, so I want to help anyone else having trouble getting this running on Heroku.
Kimurai uses lsof, so an Aptfile containing the single line lsof needs to be included in the root folder, along with the Heroku apt buildpack. Can you add this to the docs? Thanks!
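A minimal setup sketch, assuming the community apt buildpack (double-check the buildpack name against Heroku's current docs):

```shell
# Aptfile in the repo root lists apt packages, one per line; Kimurai needs lsof.
echo "lsof" > Aptfile

# Add the apt buildpack ahead of the default one (run once per app):
# heroku buildpacks:add --index 1 heroku-community/apt
```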
Crawl a file
The documentation says we can download videos or other types of files, but I googled and haven't found any example of this. Can you give an example of crawling a file that is not an image?
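Scrapy's built-in FilesPipeline handles non-image downloads. The sketch below shows the two settings it needs plus the item field it reads; the storage path and URL are placeholders:

```python
# settings.py — FilesPipeline downloads any file type (PDFs, videos,
# archives), unlike ImagesPipeline which is image-specific.
ITEM_PIPELINES = {"scrapy.pipelines.files.FilesPipeline": 1}
FILES_STORE = "downloads"  # directory where downloaded files are stored

# In the spider, yield items with a "file_urls" list; after downloading,
# the pipeline adds a "files" field with checksums and local paths.
item = {"file_urls": ["http://example.com/archive/video.mp4"]}
```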
Is there an option to crawl events out of Facebook?
If not, would it be easy to implement? I could assist if there is interest in that.

Describe the bug
Following the tutorial, I installed and started the stack with docker-compose up -d, but running a task fails immediately. Where could the problem be? My Docker host is Windows 10.
```
2020-02-15 15:58:04 [scrapy.utils.log] INFO: Scrapy 1.8.0 started (bot: xueqiu)
2020-02-15 15:58:04 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.10.0, Python 3.6.9 (default, Nov 7 2019, 10:44:02) - [GCC 8.3.0], pyOpenSSL 19
```