How to pull all URLs modified today? #124
-
This gives me a dataframe with all the URLs in the sitemap. How to filter this down to only get the entries where the
Answered by
eliasdabbas
Jan 19, 2021
Replies: 2 comments 5 replies
-
You can create a variable with the desired date, for example today, and then filter the sitemap df where the extracted date from lastmod is equal to today. For example:
import datetime
import advertools as adv
import pandas as pd
sitemap = adv.sitemap_to_df('https://www.nytimes.com/sitemaps/new/news.xml.gz')
today = datetime.datetime(2021, 1, 19)
sitemap[pd.to_datetime(sitemap['lastmod'].dt.date).eq(today)]
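If you would rather not hard-code the date, the same filter can be wrapped in a small helper that defaults to the current day. This is only a sketch under a few assumptions: `filter_modified_on` is a hypothetical name (not part of advertools), the DataFrame is assumed to have a `lastmod` column like the one above, and all timestamps are converted to UTC before comparing, so an entry with a non-UTC offset may land on a different calendar day than in its original timezone. A tiny hand-made DataFrame stands in for the real sitemap so the example runs without a network call.

```python
import pandas as pd


def filter_modified_on(sitemap_df, day=None):
    """Return the rows whose 'lastmod' falls on `day`, compared in UTC.

    `day` can be a date string, a datetime.date, or a datetime.
    If it is omitted, the current UTC date is used.
    """
    if day is None:
        day = pd.Timestamp.now(tz="UTC").date()
    else:
        day = pd.Timestamp(day).date()
    # Make sure 'lastmod' is a timezone-aware datetime column in UTC.
    lastmod = pd.to_datetime(sitemap_df["lastmod"], utc=True)
    # Keep only the rows whose calendar date matches `day`.
    return sitemap_df[lastmod.dt.date == day]


# Stand-in for the DataFrame returned by adv.sitemap_to_df(...)
sample = pd.DataFrame({
    "loc": [
        "https://example.com/a",
        "https://example.com/b",
        "https://example.com/c",
    ],
    "lastmod": pd.to_datetime(
        [
            "2021-01-19T08:30:00+00:00",
            "2021-01-18T23:59:00+00:00",
            "2021-01-19T21:05:00+00:00",
        ],
        utc=True,
    ),
})

modified = filter_modified_on(sample, "2021-01-19")
print(modified["loc"].tolist())
```

Calling `filter_modified_on(sitemap)` with no second argument would then keep only the entries modified on the current UTC date.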
Answer selected by
eliasdabbas
-
Hi Elias,
Is there a way to add a one-second pause/sleep between each sitemap URL retrieval?
Let me know
Thanks