crawler
Here are 4,089 public repositories matching this topic...
Task execution fails after Docker installation

Bug description
Following the tutorial documentation, I installed and started the app with `docker-compose up -d`, but running a task immediately throws an error.
I'm not sure where the problem is.
My Docker host environment is Windows 10.

```
2020-02-15 15:58:04 [scrapy.utils.log] INFO: Scrapy 1.8.0 started (bot: xueqiu)
2020-02-15 15:58:04 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.10.0, Python 3.6.9 (default, Nov 7 2019, 10:44:02) - [GCC 8.3.0], pyOpenSSL 19
```
I am developing a crawler and so far it has worked very well: thank you for this outstanding crawler.
The only issue is that, in the returned URLs, the & character gets converted into \u0026, e.g. "https://thedomain/alphabet=M\u0026borough=Bronx".
So I tried to replace it, either by using SUBSTITUTE:
`RETURN SUBSTITUTE(prfx + letter.attributes.href, "\u0026", "&")`
or `REG
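One likely explanation, if the crawler emits its results as JSON: \u0026 is simply the JSON string escape for &, so the URL itself is intact and any JSON decoder restores the original character. A minimal Python sketch of that idea (the query above appears to use a different query language; the example URL is taken from the report):

```python
import json

# The crawler's raw output contains the JSON-escaped form of "&".
raw = '"https://thedomain/alphabet=M\\u0026borough=Bronx"'

# Decoding the JSON string restores the original URL unchanged.
url = json.loads(raw)
print(url)  # https://thedomain/alphabet=M&borough=Bronx
```

If the output is consumed as JSON downstream, no SUBSTITUTE/replace step should be needed at all; the escape only appears in the serialized text.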
The Take screenshot of item example says "This example demonstrates how to return a Deferred from the process_item() method", but that is no longer the case (async/await syntax is now used). I think we should remove that sentence and add a reminder about [enabling the asyncio reactor](https://docs.scr
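For context, the shape of the updated example is roughly this: instead of returning a Deferred, the pipeline defines `process_item()` as a coroutine. A minimal, hypothetical sketch (the class name and the awaited call are placeholders, not the documentation's actual code):

```python
import asyncio

class ScreenshotPipeline:
    # Sketch of a modern Scrapy item pipeline: with the asyncio reactor
    # enabled, process_item() can be a coroutine rather than a method
    # that returns a Twisted Deferred.
    async def process_item(self, item, spider):
        # Placeholder for the real awaited work (e.g. rendering a
        # screenshot of item["url"] in a headless browser).
        await asyncio.sleep(0)
        return item

# Standalone demonstration that the coroutine passes the item through.
item = asyncio.run(ScreenshotPipeline().process_item({"url": "https://example.com"}, None))
print(item)  # {'url': 'https://example.com'}
```

This is why the old "return a Deferred" sentence is misleading, and why the docs page should point readers at enabling the asyncio reactor before they copy the example.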