We used the following query to find websites in the database with no articles in our time period (Sept 1, 2018 to Dec 31, 2018).
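DECLARE @MinDate DATE = '20180901',
        @MaxDate DATE = '20181231';
-- Sites that have no articles published from @MinDate through @MaxDate.
SELECT DISTINCT [site_name]
FROM [dbo].[articles_v5]
WHERE [site_name] NOT IN (
    SELECT [site_name]
    FROM [dbo].[articles_v5]
    WHERE [publication_datetime] >= @MinDate
      -- Exclusive bound one day past @MaxDate, so Dec 31 itself is covered.
      AND [publication_datetime] < DATEADD(DAY, 1, @MaxDate)
);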
Of these sites, most were either dead or had no content in the period. However, there are 3 sites that should have articles:
notallowedto.com has hidden pagination. It could be scraped by requesting the numbered listing pages directly: https://notallowedto.com/news/page1, https://notallowedto.com/news/page2, https://notallowedto.com/news/page3, and so on.
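A minimal sketch of how those page URLs could be generated for the scraper, assuming the `/news/pageN` pattern keeps going past page 3. The names `BASE_URL`, `page_urls` and `first_pages` are made up for illustration and are not part of the existing scraper. The sketch only builds URLs; fetching them and detecting the last page are left to the caller.

```python
from itertools import islice
from typing import Iterator, List

# URL prefix taken from the examples in this issue; the page number is appended.
BASE_URL = "https://notallowedto.com/news/page"


def page_urls(start: int = 1) -> Iterator[str]:
    """Yield listing URLs .../page1, .../page2, ... without end.

    The caller decides when to stop, e.g. on a 404 or an empty listing
    (assumption: the site signals the last page in one of those ways).
    """
    page = start
    while True:
        yield f"{BASE_URL}{page}"
        page += 1


def first_pages(count: int) -> List[str]:
    """Return the first `count` listing URLs as a list."""
    return list(islice(page_urls(), count))
```

For example, `first_pages(3)` returns the three URLs listed above, in order.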