-
Notifications
You must be signed in to change notification settings - Fork 607
[refactor] fix Profile object #167
base: master
Are you sure you want to change the base?
Conversation
1. is_private 2. birthday 3. favorites
twitter_scraper/modules/profile.py
Outdated
self.location = None | ||
|
||
# TODO unfixed | ||
self.birthday = html.find(".ProfileHeaderCard-birthdateText")[0].text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not too familiar with the code (and appreciate it's not the issue you are fixing), but I would assume to make the code work it would be better to put this in a try except IndexError block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thank you for the response!
The error is an error due to the empty list, I have added an exception catch for all elements, please try again.
I did not use specific IndexError is because the original code did not specify Errors, so I think it will be better to use the same style.
Once again thank you for disclosing the problem!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
profile = Profile(f"{user_id}")
print(profile.to_dict())
You could try using the code by the fixed version above, I use "BarackObama" as user_id to gets the output below:
- output
{'name': 'Barack Obama', 'username': 'BarackObama', 'birthday': None, 'biography': 'Dad, husband, President, citizen.', 'location': 'Washington, DC', 'website': 'obama.org', 'profile_photo': 'https://pbs.twimg.com/profile_images/822547732376207360/5g0FC8XX_normal.jpg', 'banner_photo': None, 'likes_count': None, 'tweets_count': 15921, 'followers_count': 122418592, 'following_count': 601466, 'is_verified': True, 'is_private': False, 'user_id': None}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New version is working! Thanks a lot!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yeachan153 Thank you too ~ could you please confirm the review request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yeachan153 Thank you too ~ could you please confirm the review request?
Seems valid. Could you please merge it? It's been quite a while, everything is better than the current version that is not working with profiles at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last time I checked, the code is working! That's really nice!
I am still a bit concerned about the bare try except blocks; while keeping in the style of the code can give unexpected results in the future. I.e. if the HTML changes, then user attributes can be assigned None without raising an explicit error and could lead to confusing situations and raise cases where mistakes are caught much later because we assume that everything is working as intended.
I am not sure what a perfect solution would be. Maybe errors should be logged even if the scraper continues, so we are aware what went wrong even if the final result is outputted? Just a suggestion.
But nice work on getting everything to work :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix @yimingStar
Seems that soon enough we'd be needing to delegate these properties to a general function because it follows a pattern (find from selector, get first, get text, return value else default to None), and any deviation from that would have to become it's own function.
Generally it's simple and works as it is right now.
I left a suggestion. Let me know what you think.
twitter_scraper/modules/profile.py
Outdated
stats_table = None | ||
stats = None | ||
try: | ||
stats_table = html.find('table.profile-stats')[0] | ||
stats = stats_table.find('td div.statnum') | ||
if not stats: | ||
self.tweets_count = None | ||
self.following_count = None | ||
self.followers_count = None | ||
except: | ||
self.tweets_count = None | ||
self.following_count = None | ||
self.followers_count = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stats_table = None | |
stats = None | |
try: | |
stats_table = html.find('table.profile-stats')[0] | |
stats = stats_table.find('td div.statnum') | |
if not stats: | |
self.tweets_count = None | |
self.following_count = None | |
self.followers_count = None | |
except: | |
self.tweets_count = None | |
self.following_count = None | |
self.followers_count = None | |
try: | |
stats_table = html.find('table.profile-stats')[0] | |
stats = stats_table.find('td div.statnum') | |
except: | |
stats_table = None | |
stats = None |
How about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stats - tweets_count - following_count - followers_count
@yeachan153 Genuinely agree with your suggestion for the purpose of letting the user notice the missing web elements. The fixed is in this commit c7bbf55 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix works. Left a minor comment. Overall LGTM. Thanks @yimingStar 👍
@@ -39,7 +40,7 @@ def __init__(self, username): | |||
page = session.get(f"https://twitter.com/{username}", headers=headers) | |||
self.username = username | |||
self.__parse_profile(page) | |||
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpicking, extra white spaces.
@@ -49,77 +50,128 @@ def __parse_profile(self, page): | |||
) | |||
except ParserError: | |||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't ignore a fatal error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that was in the original package, so it's not related to this PR. Still, it should be ignored
Is this solution still working? So this:
leads to:
|
Uable to get the original page (400 error code), get mobile twitter page indead.
UnboundLocalError: local variable 'html' referenced before assignment #166