This repository introduces a bot for Telegram that analyzes and classifies news using a pre-trained RuBERT model, simplifying news management in Telegram channels.

NKTKLN/telegram-news-classifier

📣 Telegram News Classifier

🗒 Description

At some point, I realized that most of the posts from the Telegram channels I subscribed to were not informative and lacked value for me. Therefore, I decided to create a bot that sorts out ads and categorizes all the news and posts from the channels I follow.

⚙️ Bot Configuration

A template configuration is provided in config/example_config.yaml. Here's an example of what it should look like:

telegram:
  api_id: 1821196
  api_hash: "your_api_hash_here"  # https://my.telegram.org/auth
  session_name: "news_classifier"

bot_settings:
  model_path: "model"
  db_path: "messages.db"
  message_lifetime: 2  # Time in hours

# Optional: Uncomment to exclude categories or channels

# exclude_categories:
#   - 1
#   - 2

# exclude_channels:
#   - 1
#   - 2
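
As a rough illustration of how these settings might be checked at startup, here is a minimal Python sketch. The `BotConfig` class and `parse_config` function are hypothetical names, not part of this repository, and the sketch assumes that `exclude_categories` and `exclude_channels` are top-level keys, as their indentation in the template suggests.

```python
from dataclasses import dataclass, field


@dataclass
class BotConfig:
    """Typed view of the settings shown in the template (hypothetical helper)."""
    api_id: int
    api_hash: str
    session_name: str
    model_path: str
    db_path: str
    message_lifetime: float  # hours
    exclude_categories: list = field(default_factory=list)
    exclude_channels: list = field(default_factory=list)


# (section, key) pairs that the template always defines
REQUIRED = [
    ("telegram", "api_id"),
    ("telegram", "api_hash"),
    ("telegram", "session_name"),
    ("bot_settings", "model_path"),
    ("bot_settings", "db_path"),
    ("bot_settings", "message_lifetime"),
]


def parse_config(raw: dict) -> BotConfig:
    """Validate an already-parsed config mapping and return a BotConfig."""
    for section, key in REQUIRED:
        if key not in (raw.get(section) or {}):
            raise ValueError(f"missing setting: {section}.{key}")

    telegram = raw["telegram"]
    settings = raw["bot_settings"]

    # Catch the most common mistake: the placeholder was never replaced.
    if telegram["api_hash"] == "your_api_hash_here":
        raise ValueError("telegram.api_hash still holds the placeholder value")
    if settings["message_lifetime"] <= 0:
        raise ValueError("bot_settings.message_lifetime must be positive")

    return BotConfig(
        api_id=int(telegram["api_id"]),
        api_hash=str(telegram["api_hash"]),
        session_name=str(telegram["session_name"]),
        model_path=str(settings["model_path"]),
        db_path=str(settings["db_path"]),
        message_lifetime=float(settings["message_lifetime"]),
        # Optional lists default to empty when left commented out.
        exclude_categories=list(raw.get("exclude_categories") or []),
        exclude_channels=list(raw.get("exclude_channels") or []),
    )
```

The `raw` mapping could come from PyYAML's `yaml.safe_load` applied to the config file; how the bot itself reads the file is not shown here.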

How to set it up:

  1. Obtain your API ID and API hash by logging in at https://my.telegram.org/auth.
  2. Replace the example api_id and your_api_hash_here with your own API ID and API hash.
  3. The model_path should point to the folder where the model is located (downloadable from this link).
  4. The db_path is the path to the database where the bot stores messages.
  5. The message_lifetime is how long, in hours, messages are kept in the database so that repeated messages can be recognized.
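
To illustrate what a message lifetime could mean in practice, here is a minimal Python sketch of time-limited duplicate detection using SQLite. It is an assumption about the general idea, not the bot's actual implementation: the `seen` table, the `open_store` and `is_repeat` functions, and the exact-match hashing of normalized text are all hypothetical.

```python
import hashlib
import sqlite3
import time


def open_store(db_path: str) -> sqlite3.Connection:
    """Open (or create) a small table of message digests and timestamps."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen ("
        "digest TEXT PRIMARY KEY, seen_at REAL NOT NULL)"
    )
    return conn


def is_repeat(conn: sqlite3.Connection, text: str,
              lifetime_hours: float, now: float = None) -> bool:
    """Return True if the same text was already seen within the lifetime."""
    now = time.time() if now is None else now
    cutoff = now - lifetime_hours * 3600

    # Forget messages that are older than the configured lifetime.
    conn.execute("DELETE FROM seen WHERE seen_at < ?", (cutoff,))

    # Normalize whitespace and case so trivial differences do not matter.
    normalized = " ".join(text.split()).lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    row = conn.execute(
        "SELECT 1 FROM seen WHERE digest = ?", (digest,)
    ).fetchone()
    if row is None:
        # First sighting: remember it for later comparisons.
        conn.execute(
            "INSERT INTO seen (digest, seen_at) VALUES (?, ?)", (digest, now)
        )
    conn.commit()
    return row is not None
```

With message_lifetime set to 2, a post repeated one hour after the original would be flagged, while the same post arriving three hours later would be treated as new.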

The example_config.yaml file is only a template. Once you have filled in your details, rename it to config.yaml.

🐳 Run in Docker

You can run the bot using Docker. First, build the image:

docker build -t telegram-news-classifier .

If No Session Exists

If the session file (news_classifier.session) is missing, the bot will require you to log in. To do this, run the following command:

docker run -i -t -v "$(pwd)":/app telegram-news-classifier --login

This command will initiate the login process, and you will be prompted to enter your phone number and the authentication code from Telegram. After the first login, the session file will be saved and used for future runs.

Running the Bot (After Session Exists)

If the session file is already present (created after the first login), you can run the bot without the --login flag:

docker run -d -v "$(pwd)":/app telegram-news-classifier

Alternatively, if you're using docker-compose, you can run the bot with:

docker-compose up --build -d
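
The choice between the two docker run commands above depends only on whether a session file exists, so it can be scripted. The following Python sketch is a hypothetical helper, not part of the repository. It assumes the session file is named after session_name and sits in the project folder that is mounted at /app.

```python
from pathlib import Path


def docker_run_args(project_dir: str,
                    session_name: str = "news_classifier",
                    image: str = "telegram-news-classifier") -> list:
    """Build the docker run command, adding --login only when needed."""
    project = Path(project_dir).resolve()
    # Assumption: the session file lives in the mounted project folder.
    session_file = project / f"{session_name}.session"
    needs_login = not session_file.exists()

    args = ["docker", "run"]
    if needs_login:
        # The login prompts for a phone number and code, so keep a terminal.
        args += ["-i", "-t"]
    args += ["-v", f"{project}:/app", image]
    if needs_login:
        args.append("--login")
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(docker_run_args(".")))
```

The list form can be passed directly to `subprocess.run` if you prefer to launch the container from the script. Docker's own detached flag (`-d`) can be added before the image name for background runs.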

🔧 Manual Run

Alternatively, you can set up and run it manually using Python and Poetry:

  1. Install Poetry if you haven't already: Poetry installation guide.

  2. Clone the repository and navigate to the project folder.

  3. Download the model from this link.

  4. Install the dependencies by running:

    poetry install

    Additionally, you will need to install the language model for spaCy:

    poetry run python -m spacy download ru_core_news_sm

  5. To run the bot, use:

    poetry run python -m bot.main

✅ ToDo

  • Add a "merge" news function (combine news from different sources into the most detailed version).
  • Add deletion of topics if changes have been made to the config.

📃 License

This project is licensed under the MIT License. See LICENSE.md for the full text.
