Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reddit #997

Open
wants to merge 49 commits into
base: master
Choose a base branch
from
Open

Reddit #997

wants to merge 49 commits into from

Conversation

jpontoire
Copy link

No description provided.

.gitignore Outdated Show resolved Hide resolved
""",
variadic_input={
"dummy_column": "subreddit",
"item_label": "subreddit url, subreddit shortcode or subreddit id",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subreddit url, shortcode or id ce sera probablement un peu moins long dans l'aide

},
arguments=[
{
"flags": ["-n", "--number"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D'habitude cet argument s'appelle -l, --limit dans minet. Et on ne le met en général que quand c'est pas trivial à faire avec un xan slice je crois.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Là c'est pas le maximum qu'il doit récupérer c'est le nombre de résultats à récupérer. Tu préfères que je change ça et que je mette une limite ?

""",
variadic_input={
"dummy_column": "post",
"item_label": "post url, post shortcode or post id",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem

)

REDDIT_USER_POSTS_SUBCOMMAND = command(
"user_posts",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Les noms de commandes sont en kebab case, pas snake case :)

Il faudra changer l'exemple aussi.



def extract_t1_ids(text):
pattern = r"t1_(\w+)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compile cette regex en dehors de la fonction. Sinon dans ce cas là est-ce qu'un split fait pas le taf?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ça marcherait pas un split, c'est une chaine qui contient plus d'infos mais où je récupère juste ceux de cette forme là

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

par contre je viens de remarquer que sur certaines pages les données sont pas stockées avec le même format...
je finis de faire les autres changements et je répare ça

error,
):
try_author = post.select_one("a[class*='author']")
author = try_author.get_text() if try_author else "Deleted"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Est-il possible qu'il existe un user qui s'appelle "Deleted"?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alors oui j'y avais pas pensé mais il y a un user qui s'appelle deleted

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

du coup soit je trouve un autre moyen d'afficher les users supprimés soit je peux faire précéder les username de "u/" comme c'est souvent fait sur Reddit

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ou pour les utilisateurs supprimés je peux mettre "[Deleted]", c'est indiqué comme ça sur le site

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Et si tu ne mets rien? Ca marche aussi non?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya qu'un seul cas de figure où on ne connait pas le user?


def get_childs_l500(self, url, list_comments, parent_id):
_, soup, _ = reddit_request(url, self.pool_manager)
comments = soup.select("div[class='commentarea']>div>div[class*='comment']")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[class='commentarea'] > .commentarea ou alors tu veux spécifiquement qu'il n'y ait qu'une seule classe?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

en fait je veux juste récupérer le premier élément dont la classe contient 'comment' qui va contenir le premier commentaire de la page (le principal vu qu'on est sur la page du commentaire) et tous ses fils imbriqués direct dans sa balise. si je fais pas comme ça je vais récupérer un élément qui contient le premier commentaire et tous ses fils, un autre qui contient son premier fils et tous ses fils, etc.

with loading_bar.step(url):
try:
if cli_args.all:
comments = scraper.get_comments(url, True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments = scraper.get_comments(url, cli_args.all) sinon?

with loading_bar.step(url):
try:
if cli_args.number:
if cli_args.text:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Même remarque

error,
):
try_author = post.select_one("a[class*='author']")
author = try_author.get_text() if try_author else "Deleted"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Et si tu ne mets rien? Ca marche aussi non?

error,
):
try_author = post.select_one("a[class*='author']")
author = try_author.get_text() if try_author else "Deleted"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya qu'un seul cas de figure où on ne connait pas le user?


ID_RE = re.compile(r"t1_(\w+)")


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment explaining why wrt redirects and rate limit

return urljoin("https://old.reddit.com", path)


def get_old_url(url):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplify this as a replace

return old_url


def get_new_url(url):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem

link,
error,
):
sub = post.scrape_one("a[class*='subreddit']", "href")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.subreddit

edited_date=edited_date,
external_link=link,
subreddit=get_new_url(sub),
error=error,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error should not conceptually belong to the data class, because an error is mutually exclusive with the data being present in this case.


def get_childs_l500(self, url, list_comments, parent_id):
_, soup, _ = reddit_request(url, self.pool_manager)
comments = soup.select("div.commentarea>div>div[class*='comment']")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.comment?

if parent_id is None:
for com in comments:
list_comments.append((None, com))
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer early return

comment="",
error=error,
)
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably idem

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants