0049 stl forest park advisory #99

Open: wants to merge 5 commits into main

@mikejoo (Collaborator) commented Aug 13, 2020

Summary

Issue: 0049

Checklist

All checks are run in GitHub Actions. You'll be able to see the results of the checks at the bottom of the pull request page after it's been opened, and you can click on any of the specific checks listed to see the output of each step and debug failures.

  • Tests are implemented
  • All tests are passing
  • Style checks run (see documentation for more details)
  • Style checks are passing
  • Code comments from template removed

Questions

My spider seems to be working, but in the tests the title, description, and links contain stray whitespace.

@ledaliang (Member)

I usually just use replace(" ", "") and replace("\n", ""), but you could also use re.sub.
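A minimal sketch of the `re.sub` route. One caution: `"Forest Park Advisory Board".replace(" ", "")` returns `"ForestParkAdvisoryBoard"`, because it deletes every space, including the ones between words. Collapsing each run of whitespace to a single space avoids that. The helper name `clean_text` is hypothetical, not part of the project.

```python
import re


def clean_text(text):
    """Collapse every run of whitespace to one space and trim the ends.

    In Python 3, \\s in a str pattern matches Unicode whitespace, so this
    also catches newlines, tabs, and non-breaking spaces (\\xa0).
    """
    return re.sub(r"\s+", " ", text).strip()
```

In a spider it would wrap an extracted string, e.g. `clean_text(response.xpath(...).get() or "")`. `" ".join(text.split())` is an equivalent regex-free alternative.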

@@ -18,7 +18,7 @@
 USER_AGENT = "City Scrapers [development mode]. Learn more and say hello at https://www.citybureau.org/city-scrapers/"
 
 # Obey robots.txt rules
-ROBOTSTXT_OBEY = True
+ROBOTSTXT_OBEY = False
Member

Since you added custom_settings in the spider itself, you can change this back to True. We would like to avoid changing the project settings.

Collaborator Author

Yeah, this was from trying things out in the Scrapy shell; will revert.

Comment on lines +104 to +108
def _parse_title(self, response):
    title = response.xpath('//*/div[@class="EventTitle"]/text()').get()
    title = title.replace("\n", "")

    return title
Member

This may not be necessary, since all of the titles are "Forest Park Advisory Board".

Comment on lines +110 to +123
def _parse_desc(self, response):
    descs = response.xpath(
        '//*/div[@id="EventDisplayBlock"]/div[@class="row"]/div/p/text()'
    ).getall()
    description = ""
    for desc in descs:
        if desc == "\n":
            desc = ""
        else:
            desc = desc.replace("\xa0", "")
            desc = desc.replace("\n", "")
        description = description + desc

    return description
Member

The descriptions describe the board rather than the meeting, so we don't need them.

Comment on lines +62 to +86
def _parse_event(self, response):

    start = self._parse_start(response)
    links_key = datetime.strftime(start, "%Y-%m-%d")

    meeting = Meeting(
        title=self._parse_title(response),
        description=self._parse_desc(response),
        classification=BOARD,
        start=start,
        end=self._parse_end(response),
        all_day=False,
        location=self._parse_location(response),
        source=response.url,
    )

    meeting["links"] = []

    if links_key in self.agenda_map.keys():
        meeting["links"].append(self.agenda_map[links_key])

    if links_key in self.minute_map.keys():
        meeting["links"].append(self.minute_map[links_key])

    return meeting
Member

Parse the status and id for each meeting like this.

2 participants