Text moderation analyzes written content to detect language that may be inappropriate, harmful, offensive, or unsafe. This includes identifying:
- Profanity or abusive language: Words or phrases that could offend or harm
- Hate speech: Content targeting groups based on attributes like race, gender, or religion
- Harassment: Language intended to intimidate, bully, or demean others
- Sensitive or dangerous content: Discussions of violence, explicit material, or risky behavior
The Google Cloud Natural Language API moderates text by assigning safety ratings with confidence scores (0 to 1). These scores help developers detect and manage harmful content while maintaining user trust.
The API provides powerful tools for text analysis:
- Entity Extraction: Identifies names, places, or dates in text
- Sentiment Analysis: Determines whether text is positive, neutral, or negative
- Text Classification: Categorizes text into topics like sports or technology
- Text Moderation: Detects harmful content and assigns safety ratings for categories like profanity and hate speech
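As a quick illustration of one of these capabilities, the sentiment-analysis call below is a minimal sketch assuming the google-cloud-language Python client library is installed and authenticated; the sample sentence is made up for the example.

```python
from google.cloud import language_v1

# Minimal sentiment-analysis sketch with the Natural Language API client.
client = language_v1.LanguageServiceClient()
document = {
    "content": "The support team was quick and helpful!",
    "type_": language_v1.Document.Type.PLAIN_TEXT,
}

response = client.analyze_sentiment(document=document)
sentiment = response.document_sentiment
# score ranges from -1.0 (negative) to 1.0 (positive);
# magnitude reflects the overall strength of emotion in the text.
print(f"score={sentiment.score:.2f}, magnitude={sentiment.magnitude:.2f}")
```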
The API supports four moderation thresholds to tailor safety settings:
- Block None: Shows all content regardless of safety ratings
- Block Only High: Blocks content with a high probability of being unsafe
- Block Medium and Above: Blocks content with a medium or high probability of being unsafe
- Block Low and Above: Blocks content with a low, medium, or high probability of being unsafe
These thresholds allow developers to customize moderation to their platform's requirements.
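To make the effect of each threshold concrete, here is a small, purely illustrative Python sketch; the confidence cutoffs are assumptions chosen for the example, not values published by the API.

```python
# Hypothetical confidence cutoffs for each threshold (illustrative values only).
THRESHOLD_CUTOFFS = {
    "BLOCK_LOW_AND_ABOVE": 0.2,     # strictest: block even low-probability matches
    "BLOCK_MEDIUM_AND_ABOVE": 0.5,
    "BLOCK_ONLY_HIGH": 0.8,
    "BLOCK_NONE": None,             # never block, regardless of rating
}

def should_block(confidence: float, threshold: str = "BLOCK_MEDIUM_AND_ABOVE") -> bool:
    """Return True if a category's confidence meets the selected threshold."""
    cutoff = THRESHOLD_CUTOFFS[threshold]
    return cutoff is not None and confidence >= cutoff

print(should_block(0.6, "BLOCK_ONLY_HIGH"))         # False
print(should_block(0.6, "BLOCK_MEDIUM_AND_ABOVE"))  # True
```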
Text moderation is essential for maintaining safe online spaces. Key applications include:
- Social Media Platforms: Filtering offensive comments or posts
- Customer Feedback Systems: Screening reviews for inappropriate content
- Community Forums: Automatically flagging harmful language
When the API detects unsafe content, it:
- Assigns confidence scores for categories like profanity and hate speech
- Applies user-defined safety thresholds
- Takes action (e.g., blocking, flagging, or providing feedback)
- Returns detailed feedback about flagged categories and their confidence scores
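The Python sketch below walks through this flow end to end. It assumes the google-cloud-language client library, whose moderate_text method returns a list of moderation categories with confidence scores.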
```python
from google.cloud import language_v1

def moderate_text(content, threshold=0.7):
    """Moderate a piece of text and block it if any category exceeds the threshold."""
    client = language_v1.LanguageServiceClient()
    document = {"content": content, "type_": language_v1.Document.Type.PLAIN_TEXT}

    # Request moderation ratings for the document.
    response = client.moderate_text(document=document)

    # Collect every category whose confidence meets or exceeds the threshold.
    flagged_categories = []
    for category in response.moderation_categories:
        if category.confidence >= threshold:
            flagged_categories.append(f"{category.name} (confidence: {category.confidence:.2f})")

    if flagged_categories:
        return f"Comment blocked. Detected: {', '.join(flagged_categories)}"
    return "Comment approved."

# Example usage
user_comment = "I hate you! You're so stupid."
result = moderate_text(user_comment, threshold=0.7)
print(result)
```
Example outputs:
- For unsafe content: `Comment blocked. Detected: profanity (confidence: 0.92), hate_speech (confidence: 0.87)`
- For safe content: `Comment approved.`
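You can also call the moderation endpoint directly over REST, for example with curl: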
```bash
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://language.googleapis.com/v1/documents:moderateText \
  -d '{
    "document": {
      "content": "shut up",
      "type": "PLAIN_TEXT"
    }
  }'
```
This returns safety ratings with confidence scores for harmful categories.
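For reference, a successful moderateText call returns a response shaped roughly like the snippet below; the category names and confidence values shown here are purely illustrative and will vary with the input and API version.

```json
{
  "moderationCategories": [
    { "name": "Toxic", "confidence": 0.74 },
    { "name": "Insult", "confidence": 0.65 },
    { "name": "Profanity", "confidence": 0.12 }
  ]
}
```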
Gemini, developed by Google DeepMind, offers built-in safeguards for:
- Harassment
- Hate speech
- Sexually explicit content
- Dangerous content
Its safety settings include:
- Default Settings: Blocks content with a medium or higher probability of being unsafe
- Adjustable Parameters: Lets you customize the threshold for each safety category
- Detailed Feedback: Reports which categories were triggered, along with their probabilities, when content is blocked
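For example, a safety setting pairing a category with a threshold might look like this: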
```python
safety_settings = {
    "category": "hate_speech",
    "threshold": "BLOCK_LOW_AND_ABOVE"
}
```
Example use case: A chatbot for kids may block even low-probability unsafe content for stricter safety.
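As a concrete sketch of how such a setting could be applied, here is an example using the google-generativeai Python SDK; the API key, model name, and prompt are placeholders, and the enum values come from that SDK's HarmCategory and HarmBlockThreshold types.

```python
import google.generativeai as genai
from google.generativeai.types import HarmCategory, HarmBlockThreshold

genai.configure(api_key="YOUR_API_KEY")  # placeholder credential

# Stricter-than-default safety: block even low-probability hate speech and harassment.
model = genai.GenerativeModel(
    "gemini-1.5-flash",  # placeholder model name
    safety_settings={
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    },
)

response = model.generate_content("Tell me a story for kids.")
print(response.text)
```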
In addition to Python, the API provides client libraries for several other programming languages, including:
- Java
- Go
- Node.js
The Google Cloud Natural Language API and Gemini empower developers to create safer digital environments by detecting and managing harmful content. With flexible thresholds, actionable feedback, and support for multiple languages, these tools help tailor moderation to your platform's needs.
Learn more in Responsible AI for Developers: Privacy & Safety from Google Cloud Skills Boost: https://www.cloudskillsboost.google/paths/183/course_templates/1036/video/513303
Learn safety skills hands-on in a lab: watch Safeguarding with Vertex AI in Gemini on the Software Girls channel and then solve the lab on your own: https://www.youtube.com/watch?v=SM2ShTcMWVs
I wrote the code for the `gif-source-code` folder in CodePen using Babel.
- `index.html`: The HTML structure for the project.
- `style.css`: The CSS for styling.
- `script.js`: The JavaScript code written in ES6+.
You can view or edit the live version of this code on CodePen: https://codepen.io/captainanonymous00/pen/WbeXJMG.
To use this code locally, download the files in the `gif-source-code` folder and open `index.html` in your browser.
This repository is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
- You are free to share and adapt the work for non-commercial purposes.
- Proper attribution must be given to the author.
- Commercial use is not permitted without explicit permission.
For more details, see the LICENSE file.