Add ability to scrape user pages #12
base: master
Conversation
This is a nice new addition, thank you! Left a detailed review, some are nitpicks here and there, please bear with me. 😅
@@ -1,3 +1,4 @@
umm... why?
:param short_circuit:
    Whether or not to short_circuit total_count loop

Yields url, captions, hashtags, and mentions for provided insta url
- caption*
- Move this to the top, in the docstring.
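Something like this is what I mean (a sketch only; the signature and exact parameter list here are my approximation from the diff fragments, not the real code):

```python
def _single_tag_processing(tag, total_count, existing_links, start, short_circuit=True):
    """
    Yields url, captions, hashtags, and mentions for the provided insta url.

    :param existing:
        URLs to skip
    :param short_circuit:
        Whether or not to short_circuit the total_count loop
    """
```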
:param existing:
    URLs to skip
:param short_circuit:
    Whether or not to short_circuit total_count loop
dedent lines 26-33 by 4 spaces
:param total_count:
    Total number of images to be scraped.
:param existing:
    URLs to skip
:param mode
add a colon after `mode`
    List of users to be scraped
:param total_count:
    total number of images to be scraped
:param should_continue
add a colon after `should_continue`
existing_links.add(row[1])
start = i + 1
_single_tag_processing(tag, total_count, existing_links, start)
print(f'[{target}] downloaded {url} as {file_index}.jpg in data/{target}')
This becomes incorrect, since we are downloading as `f'{count}.jpg'`, which is one less than `file_index`. Replace `count` with `file_index`, the better variable name.
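Roughly this shape (`links`, `start`, and `target` are my assumptions from the surrounding diff; the point is just that the file name and the log line use the same variable):

```python
import requests

for file_index, url in enumerate(links, start=start):
    req = requests.get(url)
    with open(f'data/{target}/{file_index}.jpg', 'wb') as img:
        img.write(req.content)
    # the log line now matches the actual filename
    print(f'[{target}] downloaded {url} as {file_index}.jpg in data/{target}')
```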
try:
    req = requests.get(url)
    with open(f'data/{tag}/{count}.jpg', 'wb') as img:
    with open(f'data/{target}/{count}.jpg', 'wb') as img:
We want the users to be able to distinguish between `user` photos and `tag` photos, since if I scrape `@instagram`, I might mistake it for images scraped from the `instagram` tag. So, mode-specific data directories. :)
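A minimal sketch of the layout I have in mind (the `os.makedirs` call is my addition; `mode`, `target`, `file_index`, and `req` come from the surrounding code):

```python
import os

# keep user scrapes and tag scrapes apart,
# e.g. data/users/instagram vs data/tags/instagram
out_dir = os.path.join('data', mode, target)
os.makedirs(out_dir, exist_ok=True)
with open(os.path.join(out_dir, f'{file_index}.jpg'), 'wb') as img:
    img.write(req.content)
```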
print(f'[{target}] downloaded {url} as {file_index}.jpg in data/{target}')

targets = {'tags': tags, 'users': users}
for mode,lists in targets.items():
space after ,
Scrapes user and hashtag images from Instagram
"""
def _single_input_processing(target: str, total_count: int, existing_links: set, start: int, mode: str='tag'):
Rename this; it is no longer single-input processing.
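e.g. something like this (the name is only a suggestion):

```python
def _single_target_processing(target: str, total_count: int, existing_links: set,
                              start: int, mode: str = 'tag'):
    ...
```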
for i, row in enumerate(reader):
    existing_links.add(row[1])
    start = i + 1
_single_input_processing(target, total_count, existing_links, start, mode=mode)
Account for the rename here too.
Refactored the code so that you can specify both tags and users to be scraped. Also fixed some off by one errors and added more function documentation.
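In short, the dispatch now looks roughly like this (a simplified sketch based on the diff fragments above, using the helper name suggested in review; the exact signature may differ):

```python
targets = {'tags': tags, 'users': users}
for mode, target_list in targets.items():
    for target in target_list:
        # existing_links and start are loaded per target from the CSV
        # of previously scraped URLs (see the diff above)
        _single_target_processing(target, total_count, existing_links, start, mode=mode)
```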