This site, from list 1 (#239), may have an anti-scraping measure in place that stops us from scraping it; see PR #256.
2019-06-11 14:16:18 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: misinformation)
2019-06-11 14:16:18 [scrapy.utils.log] INFO: Versions: lxml 4.3.3.0, libxml2 2.9.9, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 19.2.0, Python 3.7.2 (default, Dec 29 2018, 00:00:04) - [Clang 4.0.1 (tags/RELEASE_401/final)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b 26 Feb 2019), cryptography 2.6.1, Platform Darwin-18.2.0-x86_64-i386-64bit
2019-06-11 14:16:18 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'misinformation', 'CONCURRENT_REQUESTS': 8, 'FEED_EXPORT_ENCODING': 'utf-8', 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'misinformation.spiders', 'SPIDER_MODULES': ['misinformation.spiders'], 'URLLENGTH_LIMIT': 850, 'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1464.0 Safari/537.36'}
2019-06-11 14:16:18 [scrapy.extensions.telnet] INFO: Telnet Password: 38df1f9601e13043
2019-06-11 14:16:18 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2019-06-11 14:16:19 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'misinformation.middlewares.JSLoadButtonMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'misinformation.middlewares.CloudFlareMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'misinformation.middlewares.DelayedRetryMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-06-11 14:16:19 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-06-11 14:16:19 [scrapy.middleware] INFO: Enabled item pipelines:
['misinformation.pipelines.ArticleJsonFileExporter']
2019-06-11 14:16:19 [scrapy.core.engine] INFO: Spider opened
2019-06-11 14:16:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-06-11 14:16:19 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-06-11 14:17:14 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.kansascity.com/news/politics-government/>
Traceback (most recent call last):
  File "/anaconda3/envs/misinfo-dev/lib/python3.7/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2019-06-11 14:17:14 [scrapy.core.engine] INFO: Closing spider (finished)
2019-06-11 14:17:14 [ScattergunSpider] INFO: Spider closed: kansascity.com (finished)
2019-06-11 14:17:14 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 3,
 'downloader/exception_type_count/twisted.web._newclient.ResponseNeverReceived': 3,
 'downloader/request_bytes': 924,
 'downloader/request_count': 3,
 'downloader/request_method_count/GET': 3,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2019, 6, 11, 13, 17, 14, 513341),
 'log_count/ERROR': 1,
 'log_count/INFO': 10,
 'memusage/max': 77508608,
 'memusage/startup': 77504512,
 'retry/count': 2,
 'retry/max_reached': 2,
 'retry/reason_count/twisted.web._newclient.ResponseNeverReceived': 2,
 'scheduler/dequeued': 3,
 'scheduler/dequeued/memory': 3,
 'scheduler/enqueued': 3,
 'scheduler/enqueued/memory': 3,
 'start_time': datetime.datetime(2019, 6, 11, 13, 16, 19, 834908)}
2019-06-11 14:17:14 [scrapy.core.engine] INFO: Spider closed (finished)
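In the stats dump, every request (3 of 3, including 2 retries) ended in `ResponseNeverReceived` and no response was ever recorded. A minimal sketch of how that pattern could be flagged from a stats dictionary follows. The helper name `all_requests_failed` is hypothetical and not part of this project or of Scrapy. The check only detects the pattern; it does not prove the site is deliberately blocking the scraper.

```python
# Hypothetical helper: flags a crawl in which every request failed with the
# same downloader exception and no response was received.
def all_requests_failed(stats, exception_suffix="ResponseNeverReceived"):
    requests = stats.get("downloader/request_count", 0)
    # Scrapy only records this key once at least one response arrives.
    responses = stats.get("downloader/response_count", 0)
    prefix = "downloader/exception_type_count/"
    # Sum the counts of all exception types whose name ends with the suffix.
    matching = sum(
        count
        for key, count in stats.items()
        if key.startswith(prefix) and key.endswith(exception_suffix)
    )
    return requests > 0 and responses == 0 and matching == requests


# Relevant values copied from the stats dump in this issue.
stats = {
    "downloader/exception_count": 3,
    "downloader/exception_type_count/twisted.web._newclient.ResponseNeverReceived": 3,
    "downloader/request_count": 3,
    "retry/count": 2,
    "retry/max_reached": 2,
}

print(all_requests_failed(stats))
```

If this returns `True` for a site that loads normally in a browser, that is consistent with (though not proof of) the server dropping connections from the scraper.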