-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix images parsing and add artist album to ChartEntry #80
base: master
Are you sure you want to change the base?
Conversation
Thanks! Can you rebase your branch? I fixed the tests in master. I'll leave some inline comments. |
billboard.py
Outdated
weeks, | ||
rank, | ||
isNew, | ||
artistImage=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency, can you explicitly set artistImage
to None
in _parseOldStylePage
instead of having a default value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, will do
@@ -65,8 +66,17 @@ class ChartEntry: | |||
rank: The track's position on the chart, as an int. | |||
isNew: Whether the track is new to the chart, as a boolean. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add the new field to the doc comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, will do
billboard.py
Outdated
dateElement = soup.select_one("button.date-selector__button.button--link") | ||
artists = [] | ||
# extract artists info from page | ||
pageContent = json.loads(soup.select_one("div#charts")['data-charts']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this raise if this fails to load? Can we catch the exception instead?
billboard.py
Outdated
# TODO: Parse the image | ||
image = None | ||
# get image data from artists list | ||
artistData = list(filter(lambda x: x['name'] == artist, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like artists
should be a dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although, what happens if an artist appears multiple times on a chart with different albums? Does titleImage
work correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right. I'll fix that
billboard.py
Outdated
# searching for image in 3 different sizes | ||
if artistImages: | ||
_artistImage = artistImages['sizes'].get('original') | ||
if not _artistImage: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think
x = z1
if not x:
x = z2
if not x:
x = z3
is usually written x = z1 or z2 or z3
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
although it wouldn't be good for readability , don't you think ?
also, it is possible to add more later. Since sometimes the image is available only in small sizes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about this?
_artistImage = None
for variant in ['original', 'ye-landing-lg-2x', 'ye-landing-med-2x']:
_artistImage = _artistImage or artistImages['sizes'].get(variant)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will do it !
billboard.py
Outdated
@@ -314,7 +324,43 @@ def getMinistatsCellValue(fieldName, ifNoValue=None): | |||
self.entries.append(entry) | |||
|
|||
def _parseNewStylePage(self, soup): | |||
dateElement = soup.select_one("button.date-selector__button.button--link") | |||
artists = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this whole block into a helper function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
billboard.py
Outdated
'ye-landing-med-2x') | ||
artistImage = _artistImage['Name'] if _artistImage else None | ||
if titleImages: | ||
_titleImage = titleImages['sizes'].get('ye-landing-lg-2x') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you prefer ye-landing-lg-2x
here but original
for the artist image?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for some cases, ye-landing-lg-2x
was available when original
wasn't, but the approach you suggested above will fix that.
billboard.py
Outdated
# searching for image in 3 different sizes | ||
if artistImages: | ||
_artistImage = artistImages['sizes'].get('original') | ||
if not _artistImage: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about this?
_artistImage = None
for variant in ['original', 'ye-landing-lg-2x', 'ye-landing-med-2x']:
_artistImage = _artistImage or artistImages['sizes'].get(variant)
@guoguo12 I did some modifications added more sizes for the images |
Billboard website use Javascript to lazy load images. This caused the issue with parsing the images. I managed to trace the data to an object in the page and create the links.
I also added artist image next to the title/album image.