Reddit #997
base: master
Conversation
minet/cli/reddit/__init__.py
Outdated
    """,
    variadic_input={
        "dummy_column": "subreddit",
        "item_label": "subreddit url, subreddit shortcode or subreddit id",
subreddit url, shortcode or id
This will probably be a bit shorter in the help text.
minet/cli/reddit/__init__.py
Outdated
    },
    arguments=[
        {
            "flags": ["-n", "--number"],
Usually this argument is called -l, --limit in minet. And we generally only add it when it isn't trivial to do with a xan slice, I think.
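As a sketch, the argument entry above could then become the following (only the "flags" key is taken from the diff; the "help" key and its wording are illustrative):

        {
            "flags": ["-l", "--limit"],
            "help": "Maximum number of items to retrieve.",
        },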
Here it's not the maximum it should fetch, it's the number of results to fetch. Would you prefer I change that and make it a limit?
minet/cli/reddit/__init__.py
Outdated
    """,
    variadic_input={
        "dummy_column": "post",
        "item_label": "post url, post shortcode or post id",
Same as above.
minet/cli/reddit/__init__.py
Outdated
)


REDDIT_USER_POSTS_SUBCOMMAND = command(
    "user_posts",
Command names use kebab case, not snake case :)
You'll also need to change the example.
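As a sketch, only the name string would change:

REDDIT_USER_POSTS_SUBCOMMAND = command(
    "user-posts",
    # ... rest of the declaration unchanged
)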
minet/reddit/scraper.py
Outdated
def extract_t1_ids(text):
    pattern = r"t1_(\w+)"
Compile this regex outside the function. Or, in this case, wouldn't a split do the job?
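A minimal sketch of the compiled version (a later hunk in this PR defines the same pattern as ID_RE at module level; the findall-based body is only an assumption about what extract_t1_ids returns):

import re

ID_RE = re.compile(r"t1_(\w+)")


def extract_t1_ids(text):
    # The pattern is compiled once at import time instead of on every call
    return ID_RE.findall(text)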
A split wouldn't work; it's a string that contains more information, and I only extract the parts of this form.
On the other hand, I just noticed that on some pages the data isn't stored in the same format...
I'll finish the other changes and then fix that.
minet/reddit/scraper.py
Outdated
    error,
):
    try_author = post.select_one("a[class*='author']")
    author = try_author.get_text() if try_author else "Deleted"
Is it possible for a user named "Deleted" to exist?
Well yes, I hadn't thought of it, but there is a user called deleted.
So either I find another way to display deleted users, or I can prefix usernames with "u/", as is often done on Reddit.
Or, for deleted users, I can put "[Deleted]"; that's how it's shown on the site.
And if you put nothing? That would work too, wouldn't it?
Is there only one scenario in which we don't know the user?
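For reference, a sketch of the "put nothing" option discussed above (the "u/" prefix and a "[Deleted]" sentinel are the alternatives also mentioned):

    try_author = post.select_one("a[class*='author']")
    # Leave the field empty when the author element is missing
    author = try_author.get_text() if try_author else ""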
minet/reddit/scraper.py
Outdated
    def get_childs_l500(self, url, list_comments, parent_id):
        _, soup, _ = reddit_request(url, self.pool_manager)
        comments = soup.select("div[class='commentarea']>div>div[class*='comment']")
[class='commentarea']
→ .commentarea
Or do you specifically want the element to have only that single class?
Actually, I just want to grab the first element whose class contains 'comment', which will contain the first comment of the page (the main one, since we're on the comment's own page) and all of its children nested directly inside its tag. If I don't do it that way, I'll get one element containing the first comment and all its children, another containing its first child and all of that child's children, and so on.
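For what it's worth, a small self-contained example of the difference between the two selectors (the markup is made up, purely to illustrate exact-attribute matching versus class matching):

from bs4 import BeautifulSoup

html = '<div class="commentarea"></div><div class="commentarea expanded"></div>'
soup = BeautifulSoup(html, "html.parser")

# [class='commentarea'] matches only when the class attribute is exactly "commentarea"
print(len(soup.select("div[class='commentarea']")))  # 1

# .commentarea matches any element carrying the class, whatever else it has
print(len(soup.select("div.commentarea")))  # 2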
minet/cli/reddit/comments.py
Outdated
with loading_bar.step(url):
    try:
        if cli_args.all:
            comments = scraper.get_comments(url, True)
comments = scraper.get_comments(url, cli_args.all)
instead?
minet/cli/reddit/posts.py
Outdated
with loading_bar.step(url):
    try:
        if cli_args.number:
            if cli_args.text:
Same remark.
minet/reddit/scraper.py
Outdated
    error,
):
    try_author = post.select_one("a[class*='author']")
    author = try_author.get_text() if try_author else "Deleted"
And if you put nothing? That would work too, wouldn't it?
minet/reddit/scraper.py
Outdated
    error,
):
    try_author = post.select_one("a[class*='author']")
    author = try_author.get_text() if try_author else "Deleted"
Is there only one scenario in which we don't know the user?
ID_RE = re.compile(r"t1_(\w+)")
Add comment explaining why wrt redirects and rate limit
    return urljoin("https://old.reddit.com", path)


def get_old_url(url):
Simplify this as a replace
    return old_url


def get_new_url(url):
Same here.
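A minimal sketch of the replace-based version suggested here, assuming the two helpers only swap the host between www.reddit.com and old.reddit.com:

def get_old_url(url):
    return url.replace("www.reddit.com", "old.reddit.com")


def get_new_url(url):
    return url.replace("old.reddit.com", "www.reddit.com")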
minet/reddit/scraper.py
Outdated
    link,
    error,
):
    sub = post.scrape_one("a[class*='subreddit']", "href")
.subreddit
        edited_date=edited_date,
        external_link=link,
        subreddit=get_new_url(sub),
        error=error,
Error should not conceptually belong to the data class, because an error is mutually exclusive with the data being present in this case.
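One way to express that separation, sketched with hypothetical names (none of these are the PR's actual classes or functions): return the error next to the data instead of storing it inside the data class, so a result is either an error or a scraped item, never both.

from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PostSketch:
    # Hypothetical stand-in for the scraped post: no error field here
    title: str
    subreddit: str


def scrape_post_sketch(url: str) -> Tuple[Optional[str], Optional[PostSketch]]:
    # Either an error label or a post is returned, never both
    if "reddit.com" not in url:
        return "invalid-url", None
    return None, PostSketch(title="", subreddit="")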
minet/reddit/scraper.py
Outdated
    def get_childs_l500(self, url, list_comments, parent_id):
        _, soup, _ = reddit_request(url, self.pool_manager)
        comments = soup.select("div.commentarea>div>div[class*='comment']")
.comment?
minet/reddit/scraper.py
Outdated
        if parent_id is None:
            for com in comments:
                list_comments.append((None, com))
        else:
Prefer early return
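A sketch of the early return applied to the hunk above, assuming nothing else follows this block inside the method (the else branch's body isn't shown in the hunk and is left elided):

        if parent_id is None:
            for com in comments:
                list_comments.append((None, com))
            return

        # ... handling of the parent_id case, previously in the else branch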
                comment="",
                error=error,
            )
        else:
Probably the same here.