Reddit #997
base: master
Conversation
minet/cli/reddit/__init__.py
Outdated
    """,
    variadic_input={
        "dummy_column": "subreddit",
        "item_label": "subreddit url, subreddit shortcode or subreddit id",
subreddit url, shortcode or id
This will probably be a bit shorter in the help text.
minet/cli/reddit/__init__.py
Outdated
    },
    arguments=[
        {
            "flags": ["-n", "--number"],
Usually this argument is called -l, --limit in minet. And we generally only add it when it isn't trivial to do with a xan slice, I think.
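As a sketch, the argument entry above could then become the following (only the "flags" key is taken from the diff; the "help" key and its wording are illustrative):

        {
            "flags": ["-l", "--limit"],
            "help": "Maximum number of items to retrieve.",
        },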
Here it's not the maximum it should fetch, it's the number of results to fetch. Would you prefer I change that and make it a limit?
minet/cli/reddit/__init__.py
Outdated
    """,
    variadic_input={
        "dummy_column": "post",
        "item_label": "post url, post shortcode or post id",
Same as above.
minet/cli/reddit/__init__.py
Outdated
)


REDDIT_USER_POSTS_SUBCOMMAND = command(
    "user_posts",
Command names use kebab case, not snake case :)
You'll also need to change the example.
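As a sketch, only the name string would change:

REDDIT_USER_POSTS_SUBCOMMAND = command(
    "user-posts",
    # ... rest of the declaration unchanged
)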
minet/reddit/scraper.py
Outdated
def extract_t1_ids(text):
    pattern = r"t1_(\w+)"
Compile this regex outside the function. Or, in this case, wouldn't a split do the job?
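A minimal sketch of the compiled version (a later hunk in this PR defines the same pattern as ID_RE at module level; the findall-based body is only an assumption about what extract_t1_ids returns):

import re

ID_RE = re.compile(r"t1_(\w+)")


def extract_t1_ids(text):
    # The pattern is compiled once at import time instead of on every call
    return ID_RE.findall(text)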
A split wouldn't work; it's a string that contains more information, and I only extract the parts of this form.
On the other hand, I just noticed that on some pages the data isn't stored in the same format...
I'll finish the other changes and then fix that.
minet/reddit/scraper.py
Outdated
    error,
):
    try_author = post.select_one("a[class*='author']")
    author = try_author.get_text() if try_author else "Deleted"
Is it possible for a user named "Deleted" to exist?
Well yes, I hadn't thought of it, but there is a user called deleted.
So either I find another way to display deleted users, or I can prefix usernames with "u/", as is often done on Reddit.
Or, for deleted users, I can put "[Deleted]"; that's how it's shown on the site.
And if you put nothing? That would work too, wouldn't it?
Is there only one scenario in which we don't know the user?
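For reference, a sketch of the "put nothing" option discussed above (the "u/" prefix and a "[Deleted]" sentinel are the alternatives also mentioned):

    try_author = post.select_one("a[class*='author']")
    # Leave the field empty when the author element is missing
    author = try_author.get_text() if try_author else ""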
minet/reddit/scraper.py
Outdated
    def get_childs_l500(self, url, list_comments, parent_id):
        _, soup, _ = reddit_request(url, self.pool_manager)
        comments = soup.select("div[class='commentarea']>div>div[class*='comment']")
[class='commentarea']
→ .commentarea
Or do you specifically want the element to have only that single class?
Actually, I just want to grab the first element whose class contains 'comment', which will contain the first comment of the page (the main one, since we're on the comment's own page) and all of its children nested directly inside its tag. If I don't do it that way, I'll get one element containing the first comment and all its children, another containing its first child and all of that child's children, and so on.
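For what it's worth, a small self-contained example of the difference between the two selectors (the markup is made up, purely to illustrate exact-attribute matching versus class matching):

from bs4 import BeautifulSoup

html = '<div class="commentarea"></div><div class="commentarea expanded"></div>'
soup = BeautifulSoup(html, "html.parser")

# [class='commentarea'] matches only when the class attribute is exactly "commentarea"
print(len(soup.select("div[class='commentarea']")))  # 1

# .commentarea matches any element carrying the class, whatever else it has
print(len(soup.select("div.commentarea")))  # 2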
minet/cli/reddit/comments.py
Outdated
with loading_bar.step(url):
    try:
        if cli_args.all:
            comments = scraper.get_comments(url, True)
comments = scraper.get_comments(url, cli_args.all)
instead?
minet/cli/reddit/posts.py
Outdated
with loading_bar.step(url):
    try:
        if cli_args.number:
            if cli_args.text:
Same remark.
minet/reddit/scraper.py
Outdated
    error,
):
    try_author = post.select_one("a[class*='author']")
    author = try_author.get_text() if try_author else "Deleted"
And if you put nothing? That would work too, wouldn't it?
minet/reddit/scraper.py
Outdated
    error,
):
    try_author = post.select_one("a[class*='author']")
    author = try_author.get_text() if try_author else "Deleted"
Is there only one scenario in which we don't know the user?
ID_RE = re.compile(r"t1_(\w+)")
Add comment explaining why wrt redirects and rate limit
    return urljoin("https://old.reddit.com", path)


def get_old_url(url):
Simplify this as a replace
    return old_url


def get_new_url(url):
Same here.
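A minimal sketch of the replace-based version suggested here, assuming the two helpers only swap the host between www.reddit.com and old.reddit.com:

def get_old_url(url):
    return url.replace("www.reddit.com", "old.reddit.com")


def get_new_url(url):
    return url.replace("old.reddit.com", "www.reddit.com")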
minet/reddit/scraper.py
Outdated
    link,
    error,
):
    sub = post.scrape_one("a[class*='subreddit']", "href")
.subreddit
        edited_date=edited_date,
        external_link=link,
        subreddit=get_new_url(sub),
        error=error,
Error should not conceptually belong to the data class, because an error is mutually exclusive with the data being present in this case.
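One way to express that separation, sketched with hypothetical names (none of these are the PR's actual classes or functions): return the error next to the data instead of storing it inside the data class, so a result is either an error or a scraped item, never both.

from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PostSketch:
    # Hypothetical stand-in for the scraped post: no error field here
    title: str
    subreddit: str


def scrape_post_sketch(url: str) -> Tuple[Optional[str], Optional[PostSketch]]:
    # Either an error label or a post is returned, never both
    if "reddit.com" not in url:
        return "invalid-url", None
    return None, PostSketch(title="", subreddit="")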
minet/reddit/scraper.py
Outdated
    def get_childs_l500(self, url, list_comments, parent_id):
        _, soup, _ = reddit_request(url, self.pool_manager)
        comments = soup.select("div.commentarea>div>div[class*='comment']")
.comment?
minet/reddit/scraper.py
Outdated
        if parent_id is None:
            for com in comments:
                list_comments.append((None, com))
        else:
Prefer early return
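A sketch of the early return applied to the hunk above, assuming nothing else follows this block inside the method (the else branch's body isn't shown in the hunk and is left elided):

        if parent_id is None:
            for com in comments:
                list_comments.append((None, com))
            return

        # ... handling of the parent_id case, previously in the else branch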
                comment="",
                error=error,
            )
        else:
Probably the same here.