TrustAIRLab

TrustAIRLab (Trustworthy AI Research Lab) is a research lab dedicated to trustworthy machine learning, with a focus on safety, privacy, and security. It aims to:

  • offer high-quality libraries that make algorithms easier to reproduce

  • benchmark existing attacks and defenses on machine learning models

  • build a solid foundation for Trustworthy AI research and development

Popular repositories

  1. Comprehensive_Jailbreak_Assessment Public

    Python 73 10

  2. VoiceJailbreakAttack Public

    Code for Voice Jailbreak Attacks Against GPT-4o.

    Python 27

  3. JailbreakLLMs Public

    A dataset consisting of 6,387 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets (including 666 jailbreak prompts).

    9

  4. Conversation_Reconstruction_Attack Public

    This is the public code repository for the paper 'Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations with GPT Models'

    Python 9

  5. importance-in-mlattacks Public

    Python 8

  6. SecurityNet Public

    JavaScript 7

Repositories

Showing 10 of 21 repositories
  • synthetic_artifact_auditing Public

    [USENIX Security 2025] Synthetic Artifact Auditing: Tracing LLM-Generated Synthetic Data Usage in Downstream Applications

    Python 1 Apache-2.0 0 0 0 Updated Jan 29, 2025
  • proactive_unsafe_generation Public

    [USENIX Security 2025] On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts

    Python 1 Apache-2.0 0 0 0 Updated Jan 29, 2025
  • HateBench Public

    [USENIX'25] HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns

    4 Apache-2.0 0 0 0 Updated Jan 28, 2025
  • Hateful_Memes_in_VLM Public
    0 Apache-2.0 0 0 0 Updated Jan 28, 2025
  • Conversation_Reconstruction_Attack Public

    This is the public code repository for the paper 'Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations with GPT Models'

    Python 9 0 0 0 Updated Jan 12, 2025
  • ModSCAN Public

    An official public repository of the paper "ModSCAN: Measuring Stereotypical Bias in Large Vision-Language Models from Vision and Language Modalities" (https://arxiv.org/abs/2410.06967).

    Python 2 MIT 1 0 0 Updated Jan 8, 2025
  • ICL-MIA Public
    Python 3 0 1 0 Updated Dec 19, 2024
  • importance-in-mlattacks Public
    Python 8 0 0 0 Updated Dec 18, 2024
  • Comprehensive_Jailbreak_Assessment Public
    Python 73 10 0 1 Updated Oct 31, 2024
  • SecurityNet Public
    JavaScript 7 MIT 0 1 0 Updated Oct 30, 2024
