-
Fetches a JSON description document and a plain-text article document for each news page.
-
The JSON file contains each article's node, title, path, author, publicated_time, and so on.
-
The example targets are Japanese news websites: The Mainichi (Japan's National Daily), Kyodo News, and The Asahi Shimbun.
-
Built with Scrapy 1.5 and Python 3.7.
-
Tested and working as of the end of 2018; the sites may have changed their page markup since then.
-
This repository contains only the crawler code; it does not include any crawled articles or other files.
-
Fetches index information in JSON format and article content in TXT format for the news articles on each website.
-
The JSON file contains each article's web page, title, path, author, publication time, and other information.
-
The current examples are The Asahi Shimbun, Kyodo News, and the Mainichi Shimbun.
-
Environment: Python 3.7, Scrapy 1.5
-
The sites' front-end markup may change; this code was tested and working at the end of 2018.
-
This repository contains only the technical crawling code and no articles or any other files.
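
The two output files per article (a JSON description and a plain-text body) could look roughly like the sketch below. It is not taken from the repository's spiders: the function name `save_article`, the file-naming scheme, and the example values are assumptions made for illustration, and it uses only the standard library rather than Scrapy. The field names follow the list given for the JSON file.

```python
import json
from pathlib import Path


def save_article(out_dir, record, text):
    """Write <stem>.json (description) and <stem>.txt (article body).

    Hypothetical helper: the real crawler may name and lay out files differently.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: use the last segment of the article path as the stem.
    stem = record["path"].rstrip("/").split("/")[-1] or "index"
    json_path = out / (stem + ".json")
    text_path = out / (stem + ".txt")
    # ensure_ascii=False keeps Japanese titles human-readable in the JSON file.
    json_path.write_text(
        json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    text_path.write_text(text, encoding="utf-8")
    return json_path, text_path


if __name__ == "__main__":
    # Placeholder values only; no real article data is reproduced here.
    example = {
        "node": "https://example.com/articles/abc123",
        "title": "Example headline",
        "path": "/articles/abc123",
        "author": "Example Author",
        "publicated_time": "2018-12-01T09:00:00+09:00",
    }
    print(save_article("output", example, "Example article body."))
```

Writing both files as UTF-8 with `ensure_ascii=False` matters for Japanese sources, since otherwise titles and author names end up as `\uXXXX` escapes in the JSON.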