Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix images parsing and add artist album to ChartEntry #80

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

soubenz
Copy link

@soubenz soubenz commented Dec 6, 2020

Billboard website use Javascript to lazy load images. This caused the issue with parsing the images. I managed to trace the data to an object in the page and create the links.
I also added artist image next to the title/album image.

@soubenz soubenz mentioned this pull request Dec 6, 2020
@guoguo12
Copy link
Owner

guoguo12 commented Dec 6, 2020

Thanks! Can you rebase your branch? I fixed the tests in master.

I'll leave some inline comments.

billboard.py Outdated
weeks,
rank,
isNew,
artistImage=None):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, can you explicitly set artistImage to None in _parseOldStylePage instead of having a default value?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, will do

@@ -65,8 +66,17 @@ class ChartEntry:
rank: The track's position on the chart, as an int.
isNew: Whether the track is new to the chart, as a boolean.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the new field to the doc comment?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, will do

billboard.py Outdated
dateElement = soup.select_one("button.date-selector__button.button--link")
artists = []
# extract artists info from page
pageContent = json.loads(soup.select_one("div#charts")['data-charts'])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this raise if this fails to load? Can we catch the exception instead?

billboard.py Outdated
# TODO: Parse the image
image = None
# get image data from artists list
artistData = list(filter(lambda x: x['name'] == artist,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like artists should be a dict.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although, what happens if an artist appears multiple times on a chart with different albums? Does titleImage work correctly?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. I'll fix that

billboard.py Outdated
# searching for image in 3 different sizes
if artistImages:
_artistImage = artistImages['sizes'].get('original')
if not _artistImage:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think

x = z1
if not x:
  x = z2
if not x:
  x = z3

is usually written x = z1 or z2 or z3.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although it wouldn't be good for readability , don't you think ?
also, it is possible to add more later. Since sometimes the image is available only in small sizes

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this?

_artistImage = None
for variant in ['original', 'ye-landing-lg-2x', 'ye-landing-med-2x']:
  _artistImage = _artistImage or artistImages['sizes'].get(variant)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will do it !

billboard.py Outdated
@@ -314,7 +324,43 @@ def getMinistatsCellValue(fieldName, ifNoValue=None):
self.entries.append(entry)

def _parseNewStylePage(self, soup):
dateElement = soup.select_one("button.date-selector__button.button--link")
artists = []
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this whole block into a helper function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

billboard.py Outdated
'ye-landing-med-2x')
artistImage = _artistImage['Name'] if _artistImage else None
if titleImages:
_titleImage = titleImages['sizes'].get('ye-landing-lg-2x')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you prefer ye-landing-lg-2x here but original for the artist image?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for some cases, ye-landing-lg-2x was available when original wasn't, but the approach you suggested above will fix that.

billboard.py Outdated
# searching for image in 3 different sizes
if artistImages:
_artistImage = artistImages['sizes'].get('original')
if not _artistImage:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this?

_artistImage = None
for variant in ['original', 'ye-landing-lg-2x', 'ye-landing-med-2x']:
  _artistImage = _artistImage or artistImages['sizes'].get(variant)

@soubenz
Copy link
Author

soubenz commented Dec 9, 2020

@guoguo12 I did some modifications added more sizes for the images

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants