TrustAIRLab

21 repositories

synthetic_artifact_auditing
Public
[USENIX Security 2025] Synthetic Artifact Auditing: Tracing LLM-Generated Synthetic Data Usage in Downstream Applications
synthetic-data synthetic-dataset-generation llm synthetic-artifact-auditing
Python • Apache License 2.0
0 • 1 • 0 • 0 • Updated Jan 29, 2025
proactive_unsafe_generation
Public
[USENIX Security 2025] On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
poisoning-attacks text-to-image-generation unsafe-image
Python • Apache License 2.0
0 • 1 • 0 • 0 • Updated Jan 29, 2025
HateBench
Public
[USENIX'25] HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns
hatespeech hatespeech-detection llm
Apache License 2.0
0 • 4 • 0 • 0 • Updated Jan 28, 2025
Hateful_Memes_in_VLM
Public
Apache License 2.0
0 • 0 • 0 • 0 • Updated Jan 28, 2025
Conversation_Reconstruction_Attack
Public
This is the public code repository for the paper 'Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations with GPT Models'
Python
0 • 9 • 0 • 0 • Updated Jan 12, 2025
ModSCAN
Public
The official public repository for the paper "ModSCAN: Measuring Stereotypical Bias in Large Vision-Language Models from Vision and Language Modalities" (https://arxiv.org/abs/2410.06967).
Python • MIT License
1 • 2 • 0 • 0 • Updated Jan 8, 2025
ICL-MIA
Public
Python
0 • 3 • 1 • 0 • Updated Dec 19, 2024
importance-in-mlattacks
Public
Python
0 • 8 • 0 • 0 • Updated Dec 18, 2024
Comprehensive_Jailbreak_Assessment
Public
Python
10 • 73 • 0 • 1 • Updated Oct 31, 2024
SecurityNet
Public
JavaScript • MIT License
0 • 7 • 1 • 0 • Updated Oct 30, 2024
ZeroFake
Public
Python
0 • 7 • 1 • 0 • Updated Oct 30, 2024
homepage
Public
JavaScript • MIT License
0 • 0 • 0 • 0 • Updated Oct 14, 2024
T2I_Model_Evolution
Public
MIT License
0 • 0 • 0 • 0 • Updated Aug 28, 2024
ML-Doctor
Public
Code for ML Doctor
Python • MIT License
0 • 5 • 0 • 0 • Updated Aug 14, 2024
VoiceJailbreakAttack
Public
Code for Voice Jailbreak Attacks Against GPT-4o.
Python • MIT License
0 • 27 • 1 • 0 • Updated May 31, 2024
easy-bib
Public
TeX • MIT License
1 • 5 • 0 • 1 • Updated Mar 9, 2024
.github
Public
0 • 0 • 0 • 0 • Updated Feb 28, 2024
Label-Only-MIA
Public
Python • MIT License
0 • 5 • 0 • 0 • Updated Feb 23, 2024
JailbreakLLMs
Public
A dataset of 6,387 ChatGPT prompts collected from Reddit, Discord, websites, and open-source datasets (including 666 jailbreak prompts).
MIT License
0 • 9 • 0 • 0 • Updated Feb 21, 2024
Link-Stealing-Attack
Public
Python
0 • 2 • 0 • 0 • Updated Feb 21, 2024
MGTBench
Public
Python • MIT License
0 • 6 • 0 • 0 • Updated Feb 21, 2024