✨ Add reporter_stats materialized view and endpoint to fetch reporter stats #3509
base: main
"accountReportCount": {
  "type": "integer",
  "description": "The total number of reports made by the user on accounts."
},
"recordReportCount": {
  "type": "integer",
  "description": "The total number of reports made by the user on records."
},
"reportedAccountCount": {
  "type": "integer",
  "description": "The total number of accounts reported by the user."
},
"reportedRecordCount": {
  "type": "integer",
  "description": "The total number of records reported by the user."
},
"takendownAccountCount": {
  "type": "integer",
  "description": "The total number of accounts taken down as a result of the user's reports."
},
"takendownRecordCount": {
  "type": "integer",
  "description": "The total number of records taken down as a result of the user's reports."
},
"labeledAccountCount": {
  "type": "integer",
  "description": "The total number of accounts labeled as a result of the user's reports."
},
"labeledRecordCount": {
  "type": "integer",
  "description": "The total number of records labeled as a result of the user's reports."
}
Would it make sense to split the "account report stats" and "record report stats" into two distinct objects, to be more consistent with the reportee stats?
Both objects could then rely on a shared sub-type, something like:
defs:
  # ...
  reportsCountAggregate:
    description: Aggregated statistics
    type: object
    required:
      - totalCount
      - subjectCount
      - takendownCount
      - labeledCount
    properties:
      totalCount:
        type: integer
        description: The total count of reports in the aggregate (including duplicate reports)
      subjectCount:
        type: integer
        description: The count of distinct subjects
      takendownCount:
        type: integer
        description: The count of reported subjects that were taken down
      labeledCount:
        type: integer
        description: The count of reported subjects that were labeled
  reporterStats:
    type: object
    required: ['did']
    properties:
      did:
        type: string
        description: The DID of the user
      reportedRecordsStats:
        type: ref
        ref: '#reportsCountAggregate'
        description: Aggregated statistics on the user's reported records
      reportedAccountsStats:
        type: ref
        ref: '#reportsCountAggregate'
        description: Aggregated statistics on the user's reported accounts
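For illustration, a hypothetical TypeScript mapping from the flat fields currently in the PR to the suggested nested shape. `FlatReporterStats`, `ReportsCountAggregate`, `ReporterStats` and `toReporterStats` are illustrative names, not generated lexicon types. The two aggregates are optional because only `did` is listed as required.

```typescript
// Hypothetical TS mirror of the flat fields currently in the PR.
interface FlatReporterStats {
  did: string
  accountReportCount: number
  recordReportCount: number
  reportedAccountCount: number
  reportedRecordCount: number
  takendownAccountCount: number
  takendownRecordCount: number
  labeledAccountCount: number
  labeledRecordCount: number
}

// Mirrors the suggested reportsCountAggregate def.
interface ReportsCountAggregate {
  totalCount: number // includes duplicate reports
  subjectCount: number // distinct subjects
  takendownCount: number
  labeledCount: number
}

// Mirrors the suggested reporterStats def (only `did` is required).
interface ReporterStats {
  did: string
  reportedRecordsStats?: ReportsCountAggregate
  reportedAccountsStats?: ReportsCountAggregate
}

function toReporterStats(flat: FlatReporterStats): ReporterStats {
  return {
    did: flat.did,
    reportedAccountsStats: {
      totalCount: flat.accountReportCount,
      subjectCount: flat.reportedAccountCount,
      takendownCount: flat.takendownAccountCount,
      labeledCount: flat.labeledAccountCount,
    },
    reportedRecordsStats: {
      totalCount: flat.recordReportCount,
      subjectCount: flat.reportedRecordCount,
      takendownCount: flat.takendownRecordCount,
      labeledCount: flat.labeledRecordCount,
    },
  }
}
```

One upside of the shared sub-type is that the account and record branches are symmetric, so clients can render both with the same code.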
-- Count total number of reports for accounts (including duplicates)
COUNT(*) FILTER (
  WHERE reports."action" = 'tools.ozone.moderation.defs#modEventReport'
This filter has no effect, because the same condition is already present in the global WHERE clause below.
packages/ozone/src/db/migrations/20250206T003647759Z-reporter-stats-materialized-views.ts
After some experimentation with production-like data in a database, there might be a more efficient way of doing this. It involves creating 4 partial indexes instead of a materialized view, and splitting the query into 4 separate queries:
Migration:
CREATE INDEX "moderation_event_account_reports_idx" ON "public"."moderation_event"("createdBy","subjectDid") WHERE "subjectUri" IS NULL AND "action" = 'tools.ozone.moderation.defs#modEventReport';
CREATE INDEX "moderation_event_record_reports_idx" ON "public"."moderation_event"("createdBy","subjectDid","subjectUri") WHERE "subjectUri" IS NOT NULL AND "action" = 'tools.ozone.moderation.defs#modEventReport';
CREATE INDEX "moderation_event_account_actions_ids" ON "public"."moderation_event"("subjectDid","action") WHERE "subjectUri" IS NULL AND "action" IN ( 'tools.ozone.moderation.defs#modEventTakedown', 'tools.ozone.moderation.defs#modEventLabel');
CREATE INDEX "moderation_event_record_actions_ids" ON "public"."moderation_event"("subjectDid","subjectUri", "action") WHERE "subjectUri" IS NOT NULL AND "action" IN ( 'tools.ozone.moderation.defs#modEventTakedown', 'tools.ozone.moderation.defs#modEventLabel');
Queries:
-- uses "moderation_event_account_reports_idx" index
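SELECT
-- Reporter DID, so rows can be told apart when several DIDs are requested
reports."createdBy" AS "did",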
-- Count total number of reports for accounts (including duplicates)
COUNT(*) AS "accountReportCount",
-- Count unique accounts reported
COUNT(DISTINCT reports."subjectDid") AS "reportedAccountCount"
FROM "moderation_event" AS reports
WHERE reports."action" = 'tools.ozone.moderation.defs#modEventReport'
AND reports."subjectUri" IS NULL
AND reports."createdBy" IN ('did:plc:foo','did:plc:bar')
GROUP BY reports."createdBy";
-- uses "moderation_event_record_reports_idx" index
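SELECT
-- Reporter DID, so rows can be told apart when several DIDs are requested
reports."createdBy" AS "did",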
-- Count total number of reports for records (including duplicates)
COUNT(*) AS "recordReportCount",
-- Count unique records reported
COUNT(DISTINCT reports."subjectUri") AS "reportedRecordCount"
FROM "moderation_event" AS reports
WHERE reports."action" = 'tools.ozone.moderation.defs#modEventReport'
AND reports."subjectUri" IS NOT NULL
AND reports."createdBy" IN ('did:plc:foo','did:plc:bar')
GROUP BY reports."createdBy";
-- uses "moderation_event_account_reports_idx" index
-- uses "moderation_event_account_actions_ids" index
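SELECT
-- Reporter DID, so rows can be told apart when several DIDs are requested
reports."createdBy" AS "did",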
-- Count unique accounts taken down by moderators
COUNT(DISTINCT actions."subjectDid") FILTER (
WHERE actions."action" = 'tools.ozone.moderation.defs#modEventTakedown'
) AS "takendownAccountCount",
-- Count unique accounts labeled by moderators
COUNT(DISTINCT actions."subjectDid") FILTER (
WHERE actions."action" = 'tools.ozone.moderation.defs#modEventLabel'
) AS "labeledAccountCount"
FROM "moderation_event" AS reports
LEFT JOIN "moderation_event" AS actions ON
actions."subjectDid" = reports."subjectDid"
AND actions."subjectUri" IS NULL -- made explicit to match the index definition
AND actions."action" IN ( 'tools.ozone.moderation.defs#modEventTakedown', 'tools.ozone.moderation.defs#modEventLabel')
WHERE reports."action" = 'tools.ozone.moderation.defs#modEventReport'
AND reports."subjectUri" IS NULL
AND reports."createdBy" IN ('did:plc:foo','did:plc:bar')
GROUP BY reports."createdBy";
-- uses "moderation_event_record_reports_idx" index
-- uses "moderation_event_record_actions_ids" index
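SELECT
-- Reporter DID, so rows can be told apart when several DIDs are requested
reports."createdBy" AS "did",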
-- Count unique records taken down by moderators
COUNT(DISTINCT actions."subjectUri") FILTER (
WHERE actions."action" = 'tools.ozone.moderation.defs#modEventTakedown'
) AS "takendownRecordCount",
-- Count unique records labeled by moderators
COUNT(DISTINCT actions."subjectUri") FILTER (
WHERE actions."action" = 'tools.ozone.moderation.defs#modEventLabel'
) AS "labeledRecordCount"
FROM "moderation_event" AS reports
LEFT JOIN "moderation_event" AS actions ON
actions."subjectDid" = reports."subjectDid"
AND actions."subjectUri" IS NOT NULL -- made explicit to match the index definition
AND actions."subjectUri" = reports."subjectUri"
AND actions."action" IN ( 'tools.ozone.moderation.defs#modEventTakedown', 'tools.ozone.moderation.defs#modEventLabel')
WHERE reports."action" = 'tools.ozone.moderation.defs#modEventReport'
AND reports."subjectUri" IS NOT NULL
AND reports."createdBy" IN ('did:plc:foo','did:plc:bar')
GROUP BY reports."createdBy";
There does not seem to be any noticeable effect on insert latencies from the new indexes. Note that any insert in the
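Since each of the four queries only returns rows for a subset of the requested reporters, the application has to merge the results per DID. Below is a minimal sketch, assuming each query also selects `reports."createdBy" AS "did"` (without it, rows cannot be attributed when several DIDs are passed to `IN (...)`). `COUNT_KEYS`, `QueryRow` and `mergeReporterStats` are hypothetical names. Counts default to 0 because a reporter with no account (or record) reports produces no row from the corresponding queries, and values go through `Number()` because node-postgres returns bigint results such as `COUNT(*)` as strings by default.

```typescript
const COUNT_KEYS = [
  'accountReportCount',
  'reportedAccountCount',
  'recordReportCount',
  'reportedRecordCount',
  'takendownAccountCount',
  'labeledAccountCount',
  'takendownRecordCount',
  'labeledRecordCount',
] as const

type CountKey = (typeof COUNT_KEYS)[number]
type ReporterStats = { did: string } & Record<CountKey, number>
// A row from any of the four queries: the reporter DID plus a subset of the counts,
// possibly as strings since Postgres COUNT() returns bigint.
type QueryRow = { did: string } & Partial<Record<CountKey, number | string>>

function mergeReporterStats(dids: string[], resultSets: QueryRow[][]): ReporterStats[] {
  const byDid = new Map<string, ReporterStats>()
  for (const did of dids) {
    // Start every requested reporter at zero so missing rows read as "no reports".
    const empty = { did } as ReporterStats
    for (const key of COUNT_KEYS) empty[key] = 0
    byDid.set(did, empty)
  }
  for (const rows of resultSets) {
    for (const row of rows) {
      const stats = byDid.get(row.did)
      if (!stats) continue // ignore DIDs that were not requested
      for (const key of COUNT_KEYS) {
        const value = row[key]
        if (value !== undefined) stats[key] = Number(value)
      }
    }
  }
  // Preserve the order in which DIDs were requested.
  return dids.map((did) => byDid.get(did)!)
}
```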
That being said, I think the strategy applied here can be abused by a user to artificially inflate their statistics and, in turn, the credibility moderators attribute to them as a reporter. It is very easy for a user to simply report subjects that are already labeled or taken down. This behavior should probably be penalized instead, which is not the case with these changes as they stand.
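To illustrate the concern, a simplified TypeScript sketch: the joins in the queries above credit a reporter for any reported subject that was actioned at some point, so reporting an already taken-down subject still raises the count. The second function is one hypothetical alternative, not something proposed in this thread, that only credits reports filed before the subject's first takedown; it assumes each event carries a comparable `createdAt` timestamp. The `ModEvent` shape is hypothetical and collapses `subjectDid`/`subjectUri` into a single `subject` key.

```typescript
// Hypothetical, simplified event shape; createdAt is assumed comparable (e.g. epoch ms).
interface ModEvent {
  action: 'report' | 'takedown'
  createdBy: string
  subject: string // stands in for subjectDid or subjectUri
  createdAt: number
}

// Naive: distinct reported subjects that were taken down at any time (like the LEFT JOIN queries).
function takendownCountNaive(events: ModEvent[], reporter: string): number {
  const takendown = new Set(events.filter((e) => e.action === 'takedown').map((e) => e.subject))
  const reported = new Set(
    events.filter((e) => e.action === 'report' && e.createdBy === reporter).map((e) => e.subject),
  )
  return Array.from(reported).filter((s) => takendown.has(s)).length
}

// Hypothetical alternative: only credit a report filed before the subject's first takedown.
function takendownCountBeforeAction(events: ModEvent[], reporter: string): number {
  const firstTakedown = new Map<string, number>()
  for (const e of events) {
    if (e.action !== 'takedown') continue
    const prev = firstTakedown.get(e.subject)
    if (prev === undefined || e.createdAt < prev) firstTakedown.set(e.subject, e.createdAt)
  }
  const credited = new Set<string>()
  for (const e of events) {
    if (e.action !== 'report' || e.createdBy !== reporter) continue
    const takedownAt = firstTakedown.get(e.subject)
    if (takedownAt !== undefined && e.createdAt < takedownAt) credited.add(e.subject)
  }
  return credited.size
}
```

A reporter who files a report on a subject after it was already taken down gets credit in the naive count but not in the time-aware one.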
No description provided.