pygooglenewsscraper

Scrape news content from Google News (https://news.google.com).

Given a search keyword, it retrieves the title, URL, publisher, and date of each matching news item. The full article text can then be retrieved from each item's URL.

Installation

pip3 install pygooglenewsscraper

Examples

Retrieve Google News items through a search keyword

from pygooglenewsscraper import GoogleNews, NewsArticle

# define the search keyword
keyword = 'artificial intelligence'

# create the Google News object
googlenews = GoogleNews(keyword=keyword)

# perform the Google News search and retrieve the raw response
raw_news = googlenews.get_raw_news()

# parse the news items out of the response HTML
news = googlenews.parse_news(html=raw_news.text)

# print the results
for k, v in news.items():
    print(v['title'])
    print(v['url'])
    print(v['publisher'])
    print(v['date'])
    print()
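The loop above suggests that `news` is a mapping whose values are dictionaries with `title`, `url`, `publisher`, and `date` keys. Assuming that shape, a minimal sketch for saving the results as CSV might look like the following. The `news_to_csv` helper and the sample data (including the date format) are hypothetical and are not part of pygooglenewsscraper.

```python
import csv
import io

# Field names taken from the keys printed in the example above
FIELDS = ["title", "url", "publisher", "date"]


def news_to_csv(news):
    """Serialize a news mapping to CSV text, one row per news item.

    Hypothetical helper: assumes each value of `news` is a dict that
    may contain the keys listed in FIELDS.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for item in news.values():
        # Missing keys become empty cells instead of raising KeyError
        writer.writerow({field: item.get(field, "") for field in FIELDS})
    return buffer.getvalue()


# Made-up sample data in the assumed shape (not real scraper output)
sample_news = {
    0: {
        "title": "Example headline",
        "url": "https://example.com/a",
        "publisher": "Example Times",
        "date": "2024-01-01",
    },
}

csv_text = news_to_csv(sample_news)
print(csv_text)
```

In practice the `news` object returned by `parse_news` would be passed in place of `sample_news`, and the returned text written to a file.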

Extract the news content for each URL

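# get the main content of each news item
# (`news` is the result of parse_news in the previous example)
for k, v in news.items():
    # create a news article object for the item's URL
    news_article = NewsArticle(url=v['url'])

    # parse out the main content of the article
    news_content = news_article.parse_main_content()

    print(news_content['content'])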
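When many URLs are processed, a single failing page would stop the loop above. One possible way to keep going is sketched below, under stated assumptions: `collect_contents` is a hypothetical helper that is not part of pygooglenewsscraper, and the fetch step is passed in as a function so the sketch runs without network access. The exceptions the library may raise are not documented here, so the sketch catches `Exception` broadly.

```python
import time


def collect_contents(news, fetch, delay_seconds=1.0):
    """Return {url: content or None}, continuing past URLs whose fetch fails.

    Hypothetical helper: `news` is assumed to be a mapping whose values
    have a 'url' key; `fetch` is any callable that takes a URL and
    returns the article text.
    """
    contents = {}
    for item in news.values():
        url = item["url"]
        try:
            contents[url] = fetch(url)
        except Exception as error:  # broad on purpose: failure modes are unknown
            print(f"skipping {url}: {error}")
            contents[url] = None
        time.sleep(delay_seconds)  # pause between requests
    return contents


def fake_fetch(url):
    """Stand-in fetcher for demonstration; fails for one made-up URL."""
    if "broken" in url:
        raise ValueError("could not parse page")
    return f"content of {url}"


# Made-up sample data in the assumed shape
sample_news = {
    0: {"url": "https://example.com/ok"},
    1: {"url": "https://example.com/broken"},
}

results = collect_contents(sample_news, fake_fetch, delay_seconds=0)
print(results)
```

With the library, the fetch function could be built from the calls shown above, for example `lambda url: NewsArticle(url=url).parse_main_content()['content']`.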