0049 stl forest park advisory #99
base: main
I usually just use |
city_scrapers/settings/base.py
@@ -18,7 +18,7 @@
 USER_AGENT = "City Scrapers [development mode]. Learn more and say hello at https://www.citybureau.org/city-scrapers/"

 # Obey robots.txt rules
-ROBOTSTXT_OBEY = True
+ROBOTSTXT_OBEY = False
Since you added `custom_settings` in the spider itself, you can change this back to `True`. We would like to avoid changing the settings.
Yeah, this was from trying stuff out on scrapy shell, will revert
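For reference, a per-spider override along the lines the reviewer describes might look like the sketch below. `custom_settings` is Scrapy's class attribute for spider-level settings; the class name and `name` value are hypothetical, and a plain class stands in for the real base class so the snippet runs without Scrapy installed.

```python
class StlForestParkSpider:  # hypothetical name; the real spider would subclass CityScrapersSpider
    name = "stl_forest_park"  # hypothetical spider name

    # Spider-level settings take precedence over the project settings,
    # so city_scrapers/settings/base.py can keep ROBOTSTXT_OBEY = True.
    custom_settings = {"ROBOTSTXT_OBEY": False}
```

With the override living on the spider, the shared settings file stays unchanged for every other scraper in the project.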
    def _parse_title(self, response):
        title = response.xpath('//*/div[@class="EventTitle"]/text()').get()
        title = title.replace("\n", "")

        return title
This may not be necessary, since all of the titles are "Forest Park Advisory Board".
    def _parse_desc(self, response):
        descs = response.xpath(
            '//*/div[@id="EventDisplayBlock"]/div[@class="row"]/div/p/text()'
        ).getall()
        description = ""
        for desc in descs:
            if desc == "\n":
                desc = ""
            else:
                desc = desc.replace("\xa0", "")
                desc = desc.replace("\n", "")
            description = description + desc

        return description
The descriptions describe the board and not the meeting so we don't need it.
    def _parse_event(self, response):

        start = self._parse_start(response)
        links_key = datetime.strftime(start, "%Y-%m-%d")

        meeting = Meeting(
            title=self._parse_title(response),
            description=self._parse_desc(response),
            classification=BOARD,
            start=start,
            end=self._parse_end(response),
            all_day=False,
            location=self._parse_location(response),
            source=response.url,
        )

        meeting["links"] = []

        if links_key in self.agenda_map.keys():
            meeting["links"].append(self.agenda_map[links_key])

        if links_key in self.minute_map.keys():
            meeting["links"].append(self.minute_map[links_key])

        return meeting
Parse the `status` and `id` for each meeting like this.
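The snippet the reviewer attached did not survive in this capture. A rough sketch of the usual City Scrapers pattern follows, assuming the `_get_status` and `_get_id` helpers that `CityScrapersSpider` in `city_scrapers_core` provides. The `StubSpider` class, its simplified helper bodies, the spider name, and the sample date are all hypothetical stand-ins so the example runs on its own; only the two assignment lines reflect what the spider itself would need.

```python
from datetime import datetime


class StubSpider:
    """Stand-in for CityScrapersSpider so the sketch runs without city_scrapers_core.

    The two helpers below are simplified placeholders, not the library's logic.
    """

    name = "stl_forest_park"  # hypothetical spider name

    def _get_status(self, meeting):
        # Placeholder: flags a meeting as cancelled when the title says so.
        return "cancelled" if "cancel" in meeting["title"].lower() else "tentative"

    def _get_id(self, meeting):
        # Placeholder: builds an identifier from the spider name and start time.
        return f"{self.name}/{meeting['start']:%Y%m%d%H%M}"

    def _parse_event(self, meeting):
        # The two lines the review comment asks for, set after the meeting
        # has been built and its links attached.
        meeting["status"] = self._get_status(meeting)
        meeting["id"] = self._get_id(meeting)
        return meeting


# Usage with a made-up meeting date.
example = StubSpider()._parse_event(
    {"title": "Forest Park Advisory Board", "start": datetime(2021, 1, 21, 16, 0)}
)
```

In the real spider both values are set on the `Meeting` item just before `return meeting`, since the helpers read fields such as the title and start time that must already be populated.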
Summary
Issue: 0049
Checklist
All checks are run in GitHub Actions. You'll be able to see the results of the checks at the bottom of the pull request page after it's been opened, and you can click on any of the specific checks listed to see the output of each step and debug failures.
Questions
My spider seems to be working, but the tests return weird whitespace in the title, description, and links.
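One possible way to handle the stray whitespace is a small normalizing helper applied to each extracted string. This is a sketch only: `clean_text` is a hypothetical name, not part of the spider or of City Scrapers.

```python
def clean_text(text):
    """Collapse runs of whitespace, including non-breaking spaces, to single spaces."""
    # str.split() with no arguments splits on any Unicode whitespace,
    # which covers "\xa0" as well as "\n" and "\t", and drops empty pieces.
    return " ".join((text or "").split())


# Usage: a title as it might come back from the XPath query.
title = clean_text("\n  Forest Park\xa0Advisory Board \n")
```

Unlike chained `replace` calls that delete `"\xa0"` and `"\n"` outright, this keeps a single space where the whitespace was, so adjacent words do not run together, and it tolerates a `None` result from `.get()`.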