crawler
Here are 4,089 public repositories matching this topic...
Task execution fails after Docker installation

Bug description
Following the tutorial documentation, I installed and started the app with `docker-compose up -d`, but running a task immediately throws an error.
I'm not sure where the problem is.
My Docker host environment is Windows 10.

```
2020-02-15 15:58:04 [scrapy.utils.log] INFO: Scrapy 1.8.0 started (bot: xueqiu)
2020-02-15 15:58:04 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.10.0, Python 3.6.9 (default, Nov 7 2019, 10:44:02) - [GCC 8.3.0], pyOpenSSL 19
```
I am developing a crawler and so far it has worked very well: thank you for this outstanding crawler.
The only issue is that, in the returned URLs, the & character gets converted into \u0026, e.g. "https://thedomain/alphabet=M\u0026borough=Bronx".
So I tried to replace it, either by using SUBSTITUTE:
`RETURN SUBSTITUTE(prfx + letter.attributes.href, "\u0026", "&")`
or `REG
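One likely explanation, if the crawler emits its results as JSON: \u0026 is simply the JSON string escape for &, so the URL itself is intact and any JSON decoder restores the original character. A minimal Python sketch of that idea (the query above appears to use a different query language; the example URL is taken from the report):

```python
import json

# The crawler's raw output contains the JSON-escaped form of "&".
raw = '"https://thedomain/alphabet=M\\u0026borough=Bronx"'

# Decoding the JSON string restores the original URL unchanged.
url = json.loads(raw)
print(url)  # https://thedomain/alphabet=M&borough=Bronx
```

If the output is consumed as JSON downstream, no SUBSTITUTE/replace step should be needed at all; the escape only appears in the serialized text.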
The Take screenshot of item example says "This example demonstrates how to return a Deferred from the process_item() method", but that is no longer the case (async/await syntax is now used). I think we should remove that sentence and add a reminder about [enabling the asyncio reactor](https://docs.scr
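For context, the shape of the updated example is roughly this: instead of returning a Deferred, the pipeline defines `process_item()` as a coroutine. A minimal, hypothetical sketch (the class name and the awaited call are placeholders, not the documentation's actual code):

```python
import asyncio

class ScreenshotPipeline:
    # Sketch of a modern Scrapy item pipeline: with the asyncio reactor
    # enabled, process_item() can be a coroutine rather than a method
    # that returns a Twisted Deferred.
    async def process_item(self, item, spider):
        # Placeholder for the real awaited work (e.g. rendering a
        # screenshot of item["url"] in a headless browser).
        await asyncio.sleep(0)
        return item

# Standalone demonstration that the coroutine passes the item through.
item = asyncio.run(ScreenshotPipeline().process_item({"url": "https://example.com"}, None))
print(item)  # {'url': 'https://example.com'}
```

This is why the old "return a Deferred" sentence is misleading, and why the docs page should point readers at enabling the asyncio reactor before they copy the example.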