0049 stl forest park advisory #99

mikejoo · 2020-08-13T13:17:54Z

Summary

Issue: 0049

Checklist

All checks are run in GitHub Actions. You'll be able to see the results of the checks at the bottom of the pull request page after it's been opened, and you can click on any of the specific checks listed to see the output of each step and debug failures.

Tests are implemented
All tests are passing
Style checks run (see documentation for more details)
Style checks are passing
Code comments from template removed

Questions

My spider seems to be working, but the tests return weird whitespace in the title, description, and links.

ledaliang · 2020-08-13T14:18:46Z

I usually just use replace(" ", "") and replace("\n", ""), but you could also use re.sub.
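For reference, a minimal sketch of the `re.sub` route mentioned above. The helper name `clean_text` and the sample string are made up for illustration and are not part of the spider. It collapses each run of whitespace (newlines, tabs, non-breaking spaces) into a single space instead of deleting spaces outright, so the words in a title stay separated.

```python
import re


def clean_text(raw):
    """Collapse runs of whitespace into one space and trim the ends.

    `clean_text` is a hypothetical helper name, not something defined in the PR.
    """
    if raw is None:
        # .get() on a selector returns None when nothing matched.
        return ""
    # For str patterns, \s also matches non-breaking spaces such as "\xa0".
    return re.sub(r"\s+", " ", raw).strip()


print(repr(clean_text("\n  Forest Park\xa0Advisory \n Board \n")))
```

The same helper could be applied to the title, the description fragments, and the link text before they are stored, which should remove the stray whitespace showing up in the test output.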

ledaliang · 2020-08-13T14:21:27Z

city_scrapers/settings/base.py

@@ -18,7 +18,7 @@
 USER_AGENT = "City Scrapers [development mode]. Learn more and say hello at https://www.citybureau.org/city-scrapers/"

 # Obey robots.txt rules
-ROBOTSTXT_OBEY = True
+ROBOTSTXT_OBEY = False


Since you added custom_settings in the spider itself, you can change this back to True. We would like to avoid changing the settings.

mikejoo · 2020-08-13

Yeah, this was left over from trying things out in scrapy shell. I'll revert it.
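To illustrate why the base setting can go back to `True`: in Scrapy, a spider's `custom_settings` class attribute takes precedence over project-level settings for that spider only. The snippet below is a plain-Python toy model of that precedence, not Scrapy's actual implementation. The class name, the shortened user agent string, and the `effective_settings` helper are assumptions for illustration, and it presumes the spider's `custom_settings` is what disables `ROBOTSTXT_OBEY`.

```python
# Toy model of settings precedence; this does not import or reproduce Scrapy.
BASE_SETTINGS = {
    "ROBOTSTXT_OBEY": True,  # project-wide default stays untouched
    "USER_AGENT": "City Scrapers [development mode]",  # shortened for the sketch
}


class StlForestParkAdvisorySpider:  # hypothetical stand-in for the real spider
    name = "stl_forest_park_advisory"
    # Per-spider override, assumed to match what the PR adds in the spider.
    custom_settings = {"ROBOTSTXT_OBEY": False}


def effective_settings(base, spider_cls):
    """Return base settings with the spider's custom_settings layered on top."""
    merged = dict(base)
    merged.update(getattr(spider_cls, "custom_settings", None) or {})
    return merged


settings = effective_settings(BASE_SETTINGS, StlForestParkAdvisorySpider)
print(settings["ROBOTSTXT_OBEY"])
```

Keeping the override in the spider means every other spider in the project still obeys robots.txt by default.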

ledaliang · 2020-08-13T16:47:19Z

city_scrapers/spiders/stl_forest_park_advisory.py

+    def _parse_title(self, response):
+        title = response.xpath('//*/div[@class="EventTitle"]/text()').get()
+        title = title.replace("\n", "")
+
+        return title


This may not be necessary, since all of the titles are "Forest Park Advisory Board".
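A minimal sketch of that simplification, assuming the title really is the same on every event page. The class here is a bare stand-in so the snippet runs on its own; in the PR the method would live on the spider.

```python
class TitleSketch:
    """Hypothetical stand-in for the spider class, for illustration only."""

    def _parse_title(self, response):
        # Every meeting on the site carries the same title, so no XPath
        # lookup (and no whitespace cleanup) is needed.
        return "Forest Park Advisory Board"


print(TitleSketch()._parse_title(response=None))
```

Hard-coding the value would also sidestep the whitespace problem for titles entirely.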

ledaliang · 2020-08-13T16:48:23Z

city_scrapers/spiders/stl_forest_park_advisory.py

+    def _parse_desc(self, response):
+        descs = response.xpath(
+            '//*/div[@id="EventDisplayBlock"]/div[@class="row"]/div/p/text()'
+        ).getall()
+        description = ""
+        for desc in descs:
+            if desc == "\n":
+                desc = ""
+            else:
+                desc = desc.replace("\xa0", "")
+                desc = desc.replace("\n", "")
+            description = description + desc
+
+        return description


The descriptions describe the board and not the meeting so we don't need it.

ledaliang · 2020-08-13T16:50:31Z

city_scrapers/spiders/stl_forest_park_advisory.py

+    def _parse_event(self, response):
+
+        start = self._parse_start(response)
+        links_key = datetime.strftime(start, "%Y-%m-%d")
+
+        meeting = Meeting(
+            title=self._parse_title(response),
+            description=self._parse_desc(response),
+            classification=BOARD,
+            start=start,
+            end=self._parse_end(response),
+            all_day=False,
+            location=self._parse_location(response),
+            source=response.url,
+        )
+
+        meeting["links"] = []
+
+        if links_key in self.agenda_map.keys():
+            meeting["links"].append(self.agenda_map[links_key])
+
+        if links_key in self.minute_map.keys():
+            meeting["links"].append(self.minute_map[links_key])
+
+        return meeting


Parse the status and id for each meeting like this.
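The reviewer's original snippet is not preserved here. In City Scrapers spiders this is usually done, as far as I know, by assigning `meeting["status"] = self._get_status(meeting)` and `meeting["id"] = self._get_id(meeting)` just before returning the meeting. The sketch below only mimics that pattern with simplified stand-in functions so it can run without the project installed. `get_status`, `get_id`, the status constants, and the sample start date are all assumptions and are not the library's code.

```python
import re
from datetime import datetime

# Stand-in status values, assumed for illustration.
CANCELLED, PASSED, TENTATIVE = "cancelled", "passed", "tentative"


def get_status(meeting, text="", now=None):
    """Simplified stand-in: cancelled if the text says so, else passed/tentative."""
    now = now or datetime.now()
    haystack = f"{meeting['title']} {meeting['description']} {text}".lower()
    if "cancel" in haystack or "postpone" in haystack:
        return CANCELLED
    if meeting["start"] < now:
        return PASSED
    return TENTATIVE


def get_id(meeting, spider_name):
    """Simplified stand-in: build an id from spider name, start time, and title."""
    slug = re.sub(r"[^a-z0-9]+", "_", meeting["title"].lower()).strip("_")
    return f"{spider_name}/{meeting['start']:%Y%m%d%H%M}/x/{slug}"


# A plain dict stands in for the Meeting item; the start date is invented.
meeting = {
    "title": "Forest Park Advisory Board",
    "description": "",
    "start": datetime(2020, 8, 20, 16, 0),
    "links": [],
}

# Status and id are derived from the other fields, so they are set last.
meeting["status"] = get_status(meeting, now=datetime(2020, 8, 13))
meeting["id"] = get_id(meeting, "stl_forest_park_advisory")
print(meeting["status"], meeting["id"])
```

In `_parse_event`, the two assignments would sit after the links are appended and before `return meeting`.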

mikejoo added 4 commits July 28, 2020 22:18

started on spider

f9af2a1

stl fpab almost done

e06c394

almost done, need to fix tests

e27db31

weird whitespaces in test scrape results

b3c4fc1

ledaliang reviewed Aug 13, 2020

Revert ROBOTSTXT option

17a6710

ledaliang reviewed Aug 13, 2020
