✨ Add reporter_stats materialized view and endpoint to fetch reporter stats #3509

Open: wants to merge 3 commits into base: main

@foysalit foysalit commented Feb 8, 2025

No description provided.

Comment on lines +834 to +865
"accountReportCount": {
"type": "integer",
"description": "The total number of reports made by the user on accounts."
},
"recordReportCount": {
"type": "integer",
"description": "The total number of reports made by the user on records."
},
"reportedAccountCount": {
"type": "integer",
"description": "The total number of accounts reported by the user."
},
"reportedRecordCount": {
"type": "integer",
"description": "The total number of records reported by the user."
},
"takendownAccountCount": {
"type": "integer",
"description": "The total number of accounts taken down as a result of the user's reports."
},
"takendownRecordCount": {
"type": "integer",
"description": "The total number of records taken down as a result of the user's reports."
},
"labeledAccountCount": {
"type": "integer",
"description": "The total number of accounts labeled as a result of the user's reports."
},
"labeledRecordCount": {
"type": "integer",
"description": "The total number of records labeled as a result of the user's reports."
}

Would it make sense to split the "account reports stats" and "record reports stats" into two distinct objects, to be more consistent with the reportee stats?

@matthieusieben matthieusieben Feb 11, 2025

Doing this could actually rely on a shared sub-type. Something like:

defs:
  # ...

  reportsCountAggregate:
    description: Aggregated statistics
    type: object
    required:
      - totalCount
      - subjectCount
      - takendownCount
      - labeledCount
    properties:
      totalCount:
        type: integer
        description: The total count of reports in the aggregate (including duplicate reports)
      subjectCount:
        type: integer
        description: The count of distinct subjects
      takendownCount:
        type: integer
        description: The count of reported subjects that were taken down
      labeledCount:
        type: integer
        description: The count of reported subjects that were labeled

  reporterStats:
    type: object
    required: ['did']
    properties:
      did:
        type: string
        description: The DID of the user
      reportedRecordsStats:
        type: ref
        ref: '#reportsCountAggregate'
        description: Aggregated statistics on the user's reported records
      reportedAccountsStats:
        type: ref
        ref: '#reportsCountAggregate'
        description: Aggregated statistics on the user's reported accounts
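To illustrate the shape this would produce, here is a minimal sketch that regroups the eight flat counters from the PR into the two shared aggregates suggested above. The function name `group_reporter_stats` and the sample numbers are hypothetical; only the field names come from the thread.

```python
from typing import Dict, TypedDict


class ReportsCountAggregate(TypedDict):
    totalCount: int      # all reports, duplicates included
    subjectCount: int    # distinct subjects reported
    takendownCount: int  # reported subjects that were taken down
    labeledCount: int    # reported subjects that were labeled


def group_reporter_stats(did: str, flat: Dict[str, int]) -> dict:
    """Regroup the flat counters into one aggregate per subject kind."""

    def aggregate(kind: str) -> ReportsCountAggregate:
        # kind is "Account" or "Record", matching the flat field names.
        return {
            "totalCount": flat[f"{kind.lower()}ReportCount"],
            "subjectCount": flat[f"reported{kind}Count"],
            "takendownCount": flat[f"takendown{kind}Count"],
            "labeledCount": flat[f"labeled{kind}Count"],
        }

    return {
        "did": did,
        "reportedAccountsStats": aggregate("Account"),
        "reportedRecordsStats": aggregate("Record"),
    }


# Hypothetical sample values, for illustration only.
flat_stats = {
    "accountReportCount": 5, "reportedAccountCount": 3,
    "takendownAccountCount": 1, "labeledAccountCount": 2,
    "recordReportCount": 9, "reportedRecordCount": 7,
    "takendownRecordCount": 4, "labeledRecordCount": 0,
}
grouped = group_reporter_stats("did:plc:foo", flat_stats)
```

Both aggregates share one definition, so a client can render account and record statistics with the same code path.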


-- Count total number of reports for accounts (including duplicates)
COUNT(*) FILTER (
WHERE reports."action" = 'tools.ozone.moderation.defs#modEventReport'

This will have no effect because the same condition is already present in the global WHERE clause below.

matthieusieben commented Feb 11, 2025

After some experimentation with production-like data in a database, there might be a more efficient way of doing this. It involves creating four partial indexes instead of a materialized view and splitting the query into four separate queries:

Migration
CREATE INDEX "moderation_event_account_reports_idx" ON "public"."moderation_event"("createdBy","subjectDid") where "subjectUri" IS NULL AND "action" = 'tools.ozone.moderation.defs#modEventReport';
CREATE INDEX "moderation_event_record_reports_idx" ON "public"."moderation_event"("createdBy","subjectDid","subjectUri") where "subjectUri" IS NOT NULL AND "action" = 'tools.ozone.moderation.defs#modEventReport';

CREATE INDEX "moderation_event_account_actions_ids" ON "public"."moderation_event"("subjectDid","action") where "subjectUri" IS NULL AND "action" IN ( 'tools.ozone.moderation.defs#modEventTakedown', 'tools.ozone.moderation.defs#modEventLabel');
CREATE INDEX "moderation_event_record_actions_ids" ON "public"."moderation_event"("subjectDid","subjectUri", "action") where "subjectUri" IS NOT NULL AND "action" IN ( 'tools.ozone.moderation.defs#modEventTakedown', 'tools.ozone.moderation.defs#modEventLabel');
Queries
-- uses "moderation_event_account_reports_idx" index
SELECT
    -- Needed to attribute each result row to a reporter
    reports."createdBy",
    -- Count total number of reports for accounts (including duplicates)
    COUNT(*) AS "accountReportCount",
    -- Count unique accounts reported
    COUNT(DISTINCT reports."subjectDid") AS "reportedAccountCount"
FROM "moderation_event" AS reports
WHERE reports."action" = 'tools.ozone.moderation.defs#modEventReport'
AND reports."subjectUri" IS NULL
AND reports."createdBy" IN ('did:plc:foo','did:plc:bar')
GROUP BY reports."createdBy";

-- uses "moderation_event_record_reports_idx" index
SELECT
    -- Needed to attribute each result row to a reporter
    reports."createdBy",
    -- Count total number of reports for records (including duplicates)
    COUNT(*) AS "recordReportCount",
    -- Count unique records reported
    COUNT(DISTINCT reports."subjectUri") AS "reportedRecordCount"
FROM "moderation_event" AS reports
WHERE reports."action" = 'tools.ozone.moderation.defs#modEventReport'
AND reports."subjectUri" IS NOT NULL
AND reports."createdBy" IN ('did:plc:foo','did:plc:bar')
GROUP BY reports."createdBy";

-- uses "moderation_event_account_reports_idx" index
-- uses "moderation_event_account_actions_ids" index
SELECT
    -- Needed to attribute each result row to a reporter
    reports."createdBy",
    -- Count unique accounts taken down by moderators
    COUNT(DISTINCT actions."subjectDid") FILTER (
        WHERE actions."action" = 'tools.ozone.moderation.defs#modEventTakedown'
    ) AS "takendownAccountCount",
    -- Count unique accounts labeled by moderators
    COUNT(DISTINCT actions."subjectDid") FILTER (
        WHERE actions."action" = 'tools.ozone.moderation.defs#modEventLabel'
    ) AS "labeledAccountCount"
FROM "moderation_event" AS reports
LEFT JOIN "moderation_event" AS actions ON
      actions."subjectDid" = reports."subjectDid"
  AND actions."subjectUri" IS NULL -- stated explicitly to match the index definition
  AND actions."action" IN ( 'tools.ozone.moderation.defs#modEventTakedown', 'tools.ozone.moderation.defs#modEventLabel')
WHERE reports."action" = 'tools.ozone.moderation.defs#modEventReport'
AND reports."subjectUri" IS NULL
AND reports."createdBy" IN ('did:plc:foo','did:plc:bar')
GROUP BY reports."createdBy";

-- uses "moderation_event_record_reports_idx" index
-- uses "moderation_event_record_actions_ids" index
SELECT
    -- Needed to attribute each result row to a reporter
    reports."createdBy",
    -- Count unique records taken down by moderators
    COUNT(DISTINCT actions."subjectUri") FILTER (
        WHERE actions."action" = 'tools.ozone.moderation.defs#modEventTakedown'
    ) AS "takendownRecordCount",
    -- Count unique records labeled by moderators
    COUNT(DISTINCT actions."subjectUri") FILTER (
        WHERE actions."action" = 'tools.ozone.moderation.defs#modEventLabel'
    ) AS "labeledRecordCount"
FROM "moderation_event" AS reports
LEFT JOIN "moderation_event" AS actions ON
      actions."subjectDid" = reports."subjectDid"
  AND actions."subjectUri" IS NOT NULL -- stated explicitly to match the index definition
  AND actions."subjectUri" = reports."subjectUri"
  AND actions."action" IN ( 'tools.ozone.moderation.defs#modEventTakedown', 'tools.ozone.moderation.defs#modEventLabel')
WHERE reports."action" = 'tools.ozone.moderation.defs#modEventReport'
AND reports."subjectUri" IS NOT NULL
AND reports."createdBy" IN ('did:plc:foo','did:plc:bar')
GROUP BY reports."createdBy";

The new indexes do not seem to have any noticeable effect on insert latency. Note that any insert into the moderation_event table triggers an update of at most one of the newly added indexes, because their WHERE conditions are mutually exclusive.
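As a rough way to try the first index and query locally, here is a minimal sketch using SQLite through Python's `sqlite3` module rather than PostgreSQL. The table layout is reduced to the columns the query touches, and the sample rows are made up. It checks only the counts, not whether the planner picks the index.

```python
import sqlite3

REPORT = "tools.ozone.moderation.defs#modEventReport"

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumed, simplified table: only the columns used by the query.
    CREATE TABLE moderation_event (
        id INTEGER PRIMARY KEY,
        "action" TEXT NOT NULL,
        "subjectDid" TEXT NOT NULL,
        "subjectUri" TEXT,
        "createdBy" TEXT NOT NULL
    );
    -- Partial index covering account reports only.
    CREATE INDEX moderation_event_account_reports_idx
        ON moderation_event ("createdBy", "subjectDid")
        WHERE "subjectUri" IS NULL
          AND "action" = 'tools.ozone.moderation.defs#modEventReport';
    """
)
conn.executemany(
    'INSERT INTO moderation_event ("action", "subjectDid", "subjectUri", "createdBy") '
    "VALUES (?, ?, ?, ?)",
    [
        (REPORT, "did:plc:a", None, "did:plc:foo"),
        (REPORT, "did:plc:a", None, "did:plc:foo"),  # duplicate report
        (REPORT, "did:plc:b", None, "did:plc:foo"),
        (REPORT, "did:plc:a", None, "did:plc:bar"),
        # Record report: excluded by the "subjectUri" IS NULL condition.
        (REPORT, "did:plc:a", "at://did:plc:a/app.bsky.feed.post/1", "did:plc:foo"),
    ],
)

account_report_stats = conn.execute(
    """
    SELECT
        reports."createdBy",
        COUNT(*) AS "accountReportCount",
        COUNT(DISTINCT reports."subjectDid") AS "reportedAccountCount"
    FROM moderation_event AS reports
    WHERE reports."action" = 'tools.ozone.moderation.defs#modEventReport'
      AND reports."subjectUri" IS NULL
      AND reports."createdBy" IN ('did:plc:foo', 'did:plc:bar')
    GROUP BY reports."createdBy"
    ORDER BY reports."createdBy"
    """
).fetchall()
```

Index usage and latency would still need to be confirmed on PostgreSQL itself, for example with EXPLAIN ANALYZE against production-like data.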

@matthieusieben

That being said, I think the strategy applied here can be abused by a user to artificially inflate their statistics and, as a result, how credible they appear to moderators.

Indeed, it is very easy for a user to simply send reports on subjects that have already been labeled or taken down. This behavior should probably be penalized, which will not be the case with these changes as they stand.
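The concern can be reproduced with a small sketch, again using SQLite through Python's `sqlite3` in place of PostgreSQL. The integer `"createdAt"` column and the time guard are assumptions added for illustration: the guard is one possible mitigation, not something proposed in this PR, and it does not penalize late reports, it only stops rewarding them.

```python
import sqlite3

REPORT = "tools.ozone.moderation.defs#modEventReport"
TAKEDOWN = "tools.ozone.moderation.defs#modEventTakedown"

conn = sqlite3.connect(":memory:")
# Assumed, simplified table; "createdAt" is an integer timestamp here.
conn.execute(
    'CREATE TABLE moderation_event ("action" TEXT, "subjectDid" TEXT, '
    '"subjectUri" TEXT, "createdBy" TEXT, "createdAt" INTEGER)'
)
conn.executemany(
    "INSERT INTO moderation_event VALUES (?, ?, NULL, ?, ?)",
    [
        (TAKEDOWN, "did:plc:spam", "did:plc:mod", 1),
        (REPORT, "did:plc:spam", "did:plc:foo", 2),   # reported after the takedown
        (REPORT, "did:plc:other", "did:plc:bar", 3),  # reported before the takedown
        (TAKEDOWN, "did:plc:other", "did:plc:mod", 4),
    ],
)


def takendown_account_counts(only_later_actions: bool) -> dict:
    """Count taken-down accounts per reporter, optionally ignoring earlier actions."""
    # Hypothetical guard: only credit actions that happened after the report.
    time_guard = (
        'AND actions."createdAt" > reports."createdAt"' if only_later_actions else ""
    )
    sql = f"""
        SELECT reports."createdBy", COUNT(DISTINCT actions."subjectDid")
        FROM moderation_event AS reports
        LEFT JOIN moderation_event AS actions
          ON actions."subjectDid" = reports."subjectDid"
         AND actions."subjectUri" IS NULL
         AND actions."action" = '{TAKEDOWN}'
         {time_guard}
        WHERE reports."action" = '{REPORT}'
          AND reports."subjectUri" IS NULL
        GROUP BY reports."createdBy"
    """
    return dict(conn.execute(sql).fetchall())


naive = takendown_account_counts(only_later_actions=False)
guarded = takendown_account_counts(only_later_actions=True)
```

Without the guard, `did:plc:foo` is credited with a takedown that happened before their report. With it, only `did:plc:bar` is credited.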
