Merge pull request #37 from BenMMcLean/release/0.5.0/1
Release/0.5.0/1
emilymclean authored Mar 22, 2024
2 parents 4e91884 + 11df4ab commit a25ab49
Showing 21 changed files with 703 additions and 336 deletions.
3 changes: 2 additions & 1 deletion Pipfile
@@ -22,10 +22,11 @@ aiostream = "*"
python-magic = "*"
pykson = "*"
libmagic = "*"
plemmy = "*"
plemmy = "==0.4.1"
sqlalchemy = "*"
alembic = "*"
pytest = "*"
mistletoe = "*"

[dev-packages]

532 changes: 252 additions & 280 deletions Pipfile.lock

Large diffs are not rendered by default.

13 changes: 12 additions & 1 deletion README.md
@@ -10,7 +10,18 @@ A Docker image is provided for the purposes of easily running the moderation bot
environment. To run a bot with just the toxicity detection an example `docker-compose` file has been [provided](docker-compose.example.yml).
To set up further modules (as detailed further below), mount a replacement `main.py` file at `/app/main.py`.

The bot can also be run un-containerised, either by cloning the repo, or by (in the future) using the pip package.
The bot can also be run un-containerised, either by cloning the repo, or by using the pip package.

### ⚠️ Compatibility ⚠️
Lemmy 0.19.0 and later changed how clients interact with the API in a backwards-incompatible way. By default, this project
works with these newer versions. If your community is hosted on an older instance, take the step below that matches how you
run the bot:

If you are using Docker, ensure your container uses a version prefixed with `compat-0.18-`.

If you are using the package hosted on PyPI, *explicitly* declare your plemmy version as `0.3.11`.

If you are running from source, update the plemmy version in the Pipfile.

## Modules
Different aspects of moderation are divided into "Processors". These scan and report content for a single kind of
6 changes: 1 addition & 5 deletions docker-compose.example.yml
@@ -10,11 +10,7 @@ services:
- LEMMY_INSTANCE=
- LEMMY_OWNER_USERNAME=
- LEMMY_COMMUNITIES=
- MATRIX_INSTANCE=
- MATRIX_USERNAME=
- MATRIX_PASSWORD=
- MATRIX_ROOM=
volumes:
- lemmy-mod-data:/app/data
volumes:
lemmy-mod-data:
lemmy-mod-data:
124 changes: 115 additions & 9 deletions gitversion.yml
@@ -1,21 +1,127 @@
mode: ContinuousDeployment
continuous-delivery-fallback-tag: ''
assembly-versioning-scheme: MajorMinorPatch
assembly-file-versioning-scheme: MajorMinorPatch
mode: ContinuousDelivery
tag-prefix: '(pre/)?[vV]'
next-version: 0.2.0
continuous-delivery-fallback-tag: ci
major-version-bump-message: '\+semver:\s?(breaking|major)'
minor-version-bump-message: '\+semver:\s?(feature|minor)'
patch-version-bump-message: '\+semver:\s?(fix|patch)'
no-bump-message: '\+semver:\s?(none|skip)'
legacy-semver-padding: 4
build-metadata-padding: 4
commits-since-version-source-padding: 4
tag-pre-release-weight: 60000
commit-message-incrementing: Enabled
branches:
develop:
mode: ContinuousDeployment
increment: Patch
tag: ''
tag: alpha
increment: Minor
prevent-increment-of-merged-branch-version: false
track-merge-target: true
regex: ^develop$
source-branches: []
tracks-release-branches: true
is-release-branch: false
prevent-increment-of-merged-branch-version: false
master:
is-mainline: false
pre-release-weight: 0
main:
mode: ContinuousDelivery
tag: ''
mode: ContinuousDeployment
increment: Minor
increment: Patch
prevent-increment-of-merged-branch-version: true
track-merge-target: false
regex: ^master$|^main$
source-branches:
- develop
- release
tracks-release-branches: false
is-release-branch: false
is-mainline: true
pre-release-weight: 55000
release:
mode: ContinuousDelivery
tag: beta
increment: None
prevent-increment-of-merged-branch-version: true
track-merge-target: false
regex: ^releases?[/-]
source-branches:
- develop
- main
- support
- release
tracks-release-branches: false
is-release-branch: true
is-mainline: false
pre-release-weight: 30000
feature:
mode: ContinuousDelivery
tag: useBranchName
increment: Inherit
prevent-increment-of-merged-branch-version: false
track-merge-target: false
regex: ^.*[/-]features?[/-]
source-branches:
- develop
- main
- release
- feature
- support
- hotfix
tracks-release-branches: false
is-release-branch: false
is-mainline: false
pre-release-weight: 30000
pull-request:
mode: ContinuousDelivery
tag: PullRequest
increment: Inherit
prevent-increment-of-merged-branch-version: false
tag-number-pattern: '[/-](?<number>\d+)'
track-merge-target: false
regex: ^(pull|pull\-requests|pr)[/-]
source-branches:
- develop
- main
- release
- feature
- support
- hotfix
tracks-release-branches: false
is-release-branch: false
is-mainline: false
pre-release-weight: 30000
hotfix:
mode: ContinuousDelivery
tag: beta
increment: Patch
prevent-increment-of-merged-branch-version: false
track-merge-target: false
regex: ^.*[/-]hotfix(es)?[/-]
source-branches:
- develop
- main
- support
tracks-release-branches: false
is-release-branch: false
is-mainline: false
pre-release-weight: 30000
support:
mode: ContinuousDelivery
tag: ''
increment: Patch
prevent-increment-of-merged-branch-version: true
track-merge-target: false
regex: ^.*[/-]support[/-]
source-branches:
- main
tracks-release-branches: false
is-release-branch: false
is-mainline: true
pre-release-weight: 55000
ignore:
sha: []
commit-date-format: yyyy-MM-dd
merge-message-formats: {}
update-build-number: true
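
As a rough illustration of the patterns in this configuration, the sketch below applies a few of them with Python's `re` module. The regex strings are copied from the diff above; the helper names `classify_branch` and `bump_for` are hypothetical, and this only exercises the regular expressions themselves. It does not reproduce how GitVersion resolves branches or calculates versions.

```python
import re

# Branch regexes copied from the gitversion.yml diff above.
BRANCH_PATTERNS = {
    "develop": r"^develop$",
    "main": r"^master$|^main$",
    "release": r"^releases?[/-]",
    "hotfix": r"^.*[/-]hotfix(es)?[/-]",
}

# Commit-message bump regexes copied from the same file.
BUMP_PATTERNS = {
    "major": r"\+semver:\s?(breaking|major)",
    "minor": r"\+semver:\s?(feature|minor)",
    "patch": r"\+semver:\s?(fix|patch)",
    "none": r"\+semver:\s?(none|skip)",
}


def classify_branch(name):
    """Return the first branch key whose regex matches `name`, else None."""
    for key, pattern in BRANCH_PATTERNS.items():
        if re.search(pattern, name):
            return key
    return None


def bump_for(message):
    """Return the first bump key whose regex matches `message`, else None."""
    for key, pattern in BUMP_PATTERNS.items():
        if re.search(pattern, message):
            return key
    return None
```

Under these regexes, the branch merged in this commit, `release/0.5.0/1`, falls under the `release` entry, and a commit message containing `+semver: patch` matches the patch pattern.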
1 change: 1 addition & 0 deletions lemmymodbot/__init__.py
@@ -3,3 +3,4 @@
from .lemmybot import LemmyBot
from .config import Config, MatrixConfig, environment_config
from .ml import ToxicityClassifier
from .helpers import *
22 changes: 13 additions & 9 deletions lemmymodbot/database.py
@@ -42,6 +42,7 @@ def __init__(self, directory_name, filename):
PRIMARY KEY("url")
);'''
self.check_table_exists('phash', create_phash_table_sql)
self.update_table('''ALTER TABLE "phash" ADD COLUMN "spam" INTEGER''')

@contextmanager
def _session(self):
Expand Down Expand Up @@ -100,7 +101,7 @@ def add_to_comments_list(self, comment_id, results):
""" add a comment id to the list of previously checked comments in the database """
with self._session() as conn:
sql = '''INSERT INTO comments(id, toxicity, non_toxicity) VALUES(?,?,?);'''
conn.execute(sql, (comment_id, results['toxicity'], results['non_toxicity']))
conn.execute(sql, (comment_id, results['toxicity'] if 'toxicity' in results else 0, results['non_toxicity'] if 'non_toxicity' in results else 1))

def add_outcome_to_comment(self, comment_id, outcome):
""" add an outcome to a comment record in the database """
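
The guarded lookups introduced in this file (`results['toxicity'] if 'toxicity' in results else 0`) behave the same as `dict.get` with a default. The sketch below shows that equivalence; `toxicity_fields` is a hypothetical helper used only for illustration and is not part of the commit.

```python
def toxicity_fields(results):
    """Return (toxicity, non_toxicity), defaulting to (0, 1) for missing keys.

    Hypothetical helper: mirrors the conditional expressions in the diff.
    """
    return results.get("toxicity", 0), results.get("non_toxicity", 1)


# A processor result without scores falls back to "not toxic".
empty = toxicity_fields({})
scored = toxicity_fields({"toxicity": 0.9, "non_toxicity": 0.1})
```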
@@ -125,29 +126,32 @@ def add_to_posts_list(self, post_id, detox_name_results, detox_body_results, ext
body_toxicity, body_non_toxicity, link, phash) VALUES(?,?,?,?,?,?,?);"""

conn.execute(sql, (post_id,
detox_name_results['toxicity'],
detox_name_results['non_toxicity'],
detox_name_results['toxicity'] if 'toxicity' in detox_name_results else 0,
detox_name_results['non_toxicity'] if 'non_toxicity' in detox_name_results else 1,
detox_body_results['toxicity'] if 'toxicity' in detox_body_results else 0,
detox_body_results['non_toxicity'] if 'non_toxicity' in detox_body_results else 1,
link,
extras['phash'] if 'phash' in extras else None
))

def add_phash(self, url: str, phash: str):
def add_phash(self, url: str, phash: str, spam: bool = False):
with self._session() as conn:
sql = """INSERT INTO phash(url, phash) VALUES(?,?);"""
conn.execute(sql, (url, phash))
sql = """INSERT INTO phash(url, phash, spam) VALUES(?,?,?);"""
conn.execute(sql, (url, phash, spam))

def phash_exists(self, phash: str):
def phash_exists(self, phash: str, spam: bool = False):
with self._session() as conn:
sql = """SELECT COUNT(url) FROM phash where phash=?"""
result = conn.execute(sql, (phash,))
sql = """SELECT COUNT(url) FROM phash where phash=? AND spam=?"""
result = conn.execute(sql, (phash,spam))
for row in result.fetchone():
if row != 0:
return True
return False

def url_exists(self, url: str) -> Optional[str]:
if url == "":
return None

with self._session() as conn:
sql = """SELECT phash FROM phash where url=?"""
result = conn.execute(sql, (url,))
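
To see how the new `spam` column interacts with the updated `phash_exists` query, here is a minimal sketch against an in-memory SQLite database. The table schema is an assumption (only `PRIMARY KEY("url")` is visible in the diff), the URLs and hashes are placeholders, and the standalone `phash_exists` function only mirrors the SQL shown above. In SQLite, rows that existed before `ADD COLUMN` get `NULL` in the new column, and `NULL` matches neither `spam=0` nor `spam=1`. Whether such rows matter in practice depends on parts of the project the diff does not show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema: column types are a guess, only the primary key is in the diff.
conn.execute('CREATE TABLE phash (url TEXT, phash TEXT, PRIMARY KEY("url"))')

# A row written before the migration.
conn.execute(
    "INSERT INTO phash(url, phash) VALUES(?,?)",
    ("https://example.invalid/old.png", "aaaa"),
)

# The migration statement from the diff.
conn.execute('ALTER TABLE "phash" ADD COLUMN "spam" INTEGER')

# A row written after the migration, flagged as spam (True is stored as 1).
conn.execute(
    "INSERT INTO phash(url, phash, spam) VALUES(?,?,?)",
    ("https://example.invalid/new.png", "bbbb", True),
)


def phash_exists(phash, spam=False):
    """Mirror of the updated query: match on both hash and spam flag."""
    sql = "SELECT COUNT(url) FROM phash WHERE phash=? AND spam=?"
    return conn.execute(sql, (phash, spam)).fetchone()[0] != 0


legacy_spam_value = conn.execute(
    "SELECT spam FROM phash WHERE phash=?", ("aaaa",)
).fetchone()[0]
```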
3 changes: 3 additions & 0 deletions lemmymodbot/helpers/__init__.py
@@ -0,0 +1,3 @@
from .extract_links_from_markdown import extract_links_from_markdown
from .fetch_image import fetch_image
from .spam_image_bootstrapper import SpamImageBootstrapper
25 changes: 25 additions & 0 deletions lemmymodbot/helpers/extract_links_from_markdown.py
@@ -0,0 +1,25 @@
from typing import List

from mistletoe import Document
from mistletoe.span_token import Image, Link, AutoLink
from mistletoe.token import Token


def extract_links_from_markdown(markdown: str) -> List[str]:
doc = Document(markdown)
return _internal_extract_links_from_markdown(doc)


def _internal_extract_links_from_markdown(token: Token) -> List[str]:
if isinstance(token, Image):
return [token.src]
if isinstance(token, Link) or isinstance(token, AutoLink):
return [token.target]

if "children" in vars(token):
out = []
for child in token.children:
out += _internal_extract_links_from_markdown(child)
return out

return []
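
The helper above is a depth-first walk that stops descending as soon as it reaches a link-like token. The sketch below shows the same traversal on a hypothetical stand-in tree, so it runs without mistletoe installed; `Node` and `collect_targets` are illustrative names, not part of the commit or of mistletoe's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a parsed Markdown token."""
    target: Optional[str] = None  # set only on link-like nodes
    children: List["Node"] = field(default_factory=list)


def collect_targets(node: Node) -> List[str]:
    """Collect link targets depth-first, in document order."""
    # A link-like node contributes its own target and is not descended into.
    if node.target is not None:
        return [node.target]

    out: List[str] = []
    for child in node.children:
        out += collect_targets(child)
    return out


doc = Node(children=[
    Node(children=[Node(target="https://example.invalid/a")]),
    Node(target="https://example.invalid/b"),
    Node(),  # plain text: no target, no children
])
links = collect_targets(doc)
```

Because the walk returns at the first link-like node, anything nested inside that node (for example an image wrapped in a link) is not reported separately in this sketch.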
13 changes: 13 additions & 0 deletions lemmymodbot/helpers/fetch_image.py
@@ -0,0 +1,13 @@
from io import BytesIO

import requests
from PIL import Image, UnidentifiedImageError
import imagehash


def fetch_image(url: str) -> (Image, str):
try:
img = Image.open(BytesIO(requests.get(url).content))
return img, str(imagehash.phash(img))
except UnidentifiedImageError:
return None, None
32 changes: 32 additions & 0 deletions lemmymodbot/helpers/spam_image_bootstrapper.py
@@ -0,0 +1,32 @@
from typing import List, Callable

from PIL.Image import Image

from lemmymodbot.database import Database
from lemmymodbot.helpers import fetch_image


class SpamImageBootstrapper:
database: Database
fetch_image: Callable[[str], tuple[Image, str]]

def __init__(self, database: Database, fetch_image: Callable[[str], tuple[Image, str]] = fetch_image):
self.database = database
self.fetch_image = fetch_image

def setup(self, images: List[str]):
for image in images:
is_url = image.startswith("http")
if is_url:
# Check if url has already been added
phash = self.database.url_exists(image)
if phash is not None:
if self.database.phash_exists(phash, True):
continue

phash = self.fetch_image(image)[1] if is_url else image
if phash is not None:
if self.database.phash_exists(phash, True):
continue

self.database.add_phash(image if is_url else "", phash, True)
1 change: 1 addition & 0 deletions lemmymodbot/lemmybot.py
@@ -59,6 +59,7 @@ def __init__(
db_directory_name = 'data'
db_file_name = 'history.db'
self.history_db = Database(db_directory_name, db_file_name)

self.logger.info("Bot starting!")
self.mydelay = ReconnectionDelayManager(logger=self.logger)
self.matrix_facade = MatrixFacade(
3 changes: 2 additions & 1 deletion lemmymodbot/processors/__init__.py
@@ -5,4 +5,5 @@
from .phash_processor import PhashProcessor
from .title_conformity_processor import TitleConformityProcessor
from .mime_processor import MimeWhitelistProcessor, MimeBlacklistProcessor
from .account_age_processor import AccountAgeProcessor
from .account_age_processor import AccountAgeProcessor
from .spam_image_processor import SpamImageProcessor