Fraud-R1 : A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements
Shu Yang*, Shenzhe Zhu*, Zeyu Wu, Keyu Wang, Junchi Yao, Junchao Wu, Lijie Hu, Mengdi Li, Derek F. Wong, Di Wang†
(*Contribute equally, †Corresponding author)
🤗 Dataset | 📜 Project Page | 📝 arXiv
❗️Content Warning: This repo contains examples of harmful language.
- 2025/02/16: ❗️We have released our evaluation code.
- 2025/02/16: ❗️We have released our dataset.
```bash
conda create -n fraud python=3.10
conda activate fraud
pip install -r requirements.txt
```
```python
# Please configure your model API keys in ./utils/config.py
OPENAI_KEYS = ["your tokens"]
ZHI_KEYS = ["your tokens"]
ZHI_URL = "your url"
OHMYGPT_KEYS = ["your tokens"]
OHMYGPT_URL = "your url"
```
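Once the keys are in place, here is a minimal sketch of how a key pool like this might be consumed, assuming the official `openai` client; the repo's actual wiring lives in `./utils/config.py` and may differ:

```python
# Hypothetical usage sketch, not the repo's actual code: pick one key
# from the pool configured in ./utils/config.py and build a client.
import random
from openai import OpenAI

from utils.config import OPENAI_KEYS

client = OpenAI(api_key=random.choice(OPENAI_KEYS))  # naive key rotation
```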
```bash
# Here, we take the Helpful Assistant task as an example
nohup bash script/multi-round-level_attack/assistant.sh > assistant.out 2>&1 &
```
```bash
# Here, we take the Helpful Assistant task as an example
nohup bash script/multi-round-dsr.sh > eval.out 2>&1 &
# Evaluation results are saved under ./results
cd ./results
```
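For a quick look at the outputs, the small sketch below tallies records under `./results`; the file naming and JSON structure are assumptions, so check the actual files the scripts produce:

```python
# Hypothetical inspection helper: counts records per result file.
# Assumes each file under ./results is a JSON list of evaluation records.
import json
from pathlib import Path

for path in sorted(Path("./results").glob("*.json")):
    with path.open(encoding="utf-8") as f:
        records = json.load(f)
    print(f"{path.name}: {len(records)} records")
```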
We introduce Fraud-R1, a benchmark designed to evaluate LLMs’ ability to defend against internet fraud and phishing in dynamic, real-world scenarios. Fraud-R1 comprises 8,564 fraud cases sourced from phishing scams, fake job postings, social media, and news, categorized into 5 major fraud types. Unlike previous benchmarks, Fraud-R1 introduces a multi-round evaluation pipeline to assess LLMs’ resistance to fraud at different stages, including credibility building, urgency creation, and emotional manipulation. Furthermore, we evaluate 15 LLMs under two settings: (i) Helpful-Assistant, where the LLM provides general decision-making assistance, and (ii) Role-play, where the model assumes a specific persona, as is common in real-world agent-based interactions. Our evaluation reveals significant challenges in defending against fraud and phishing inducements, especially in role-play settings and on fake job postings. We also observe a substantial performance gap between Chinese and English, underscoring the need for improved multilingual fraud detection capabilities.
An overview of the Fraud-R1 evaluation flow. We evaluate LLMs’ robustness in identifying and defending against fraud inducement under two settings: Helpful Assistant and Role-play.
Our process begins with real-world fraud cases sourced from multiple channels. We then extract key Fraudulent Strategies and Fraudulent Intentions from these cases. Next, we employ Deepseek-R1 to generate fraudulent messages, emails, and posts, which are subsequently filtered to form our base data (Base Dataset). Finally, through a multi-stage refinement process, we construct our level-up dataset (Level-up Dataset) to enable robust evaluation of LLMs against increasingly sophisticated fraudulent scenarios.
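For intuition, here is a minimal skeleton of that construction flow; the names (`FraudCase`, `quality_filter`) and the filter rule are illustrative assumptions, not the repo's actual API:

```python
# Illustrative pipeline skeleton: extract -> generate -> filter -> base data.
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class FraudCase:
    source: str      # e.g. "phishing scam", "fake job posting", "news"
    strategy: str    # extracted Fraudulent Strategy
    intention: str   # extracted Fraudulent Intention

def quality_filter(message: str) -> bool:
    # Placeholder heuristic; the real pipeline applies stricter filtering.
    return len(message.split()) > 20

def build_base_dataset(cases: Iterable[FraudCase],
                       generate: Callable[[FraudCase], str]) -> list[str]:
    # `generate` stands in for the Deepseek-R1 call that turns strategies
    # and intentions into fraudulent messages, emails, and posts.
    return [m for m in (generate(c) for c in cases) if quality_filter(m)]
```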
| Statistics | Information |
|---|---|
| Total dataset size | 8,564 |
| Data split | Base (25%) / Level-up (75%) |
| Languages | Chinese (50%) / English (50%) |
| Fraudulent Service | 28.04% |
| Impersonation | 28.04% |
| Phishing Scam | 22.06% |
| Fake Job Posting | 14.02% |
| Online Relationship | 7.84% |
| Average token length | 273.92 tokens |
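A hedged example of reproducing these counts with the `datasets` library; the dataset ID, split name, and column names below are assumptions, so check the 🤗 dataset card for the real ones:

```python
# Hypothetical loading snippet: the dataset ID and the "fraud_type"
# column are placeholders, not confirmed field names.
from collections import Counter
from datasets import load_dataset

ds = load_dataset("your-org/Fraud-R1", split="train")  # placeholder ID
print(len(ds))                       # expect 8,564 cases in total
print(Counter(ds["fraud_type"]))     # distribution over the 5 fraud types
```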
- FP-base: generated directly by a state-of-the-art reasoning LLM from our selected real-world fraud cases.
- FP-levelup: a rule-based augmentation of the base dataset, designed for the multi-round dialogue setting.
The following shows the step-by-step augmentation of a fraud case across four levels: FP-base plus the three FP-levelup stages (Building Credibility, Creating Urgency, Exploiting Emotional Appeal).
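A minimal sketch of how these four levels could be assembled, assuming each level builds on the previous round; the wrapper texts are illustrative placeholders, not the repo's actual rule-based templates:

```python
# Hypothetical level-up augmentation: level 1 is FP-base, levels 2-4
# apply the three FP-levelup stages in order. Wrapper texts are made up.
WRAPPERS = {
    "Building Credibility": "As a verified official representative: {m}",
    "Creating Urgency": "{m} Please respond within 24 hours or you lose access.",
    "Exploiting Emotional Appeal": "{m} Think of how much this would help your family.",
}

def four_levels(base_message: str) -> list[str]:
    levels = [base_message]  # level 1: FP-base
    msg = base_message
    for stage in ["Building Credibility", "Creating Urgency",
                  "Exploiting Emotional Appeal"]:
        msg = WRAPPERS[stage].format(m=msg)
        levels.append(msg)   # levels 2-4: FP-levelup
    return levels
```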
The following shows overall model performance on Fraud-R1. The DSR% column reports the Defense Success Rate, and the DFR% column reports the Defense Failure Rate. Note that for each model, DSR% = 100% - DFR%.
Detailed DSR (%) for the 15 models. Bold values indicate the highest score in each column within the API-based or open-weight models, and underlined values the second highest within the same category. “OD” stands for a model’s overall DSR; “AS” and “RP” denote performance on the Helpful Assistant and Role-play tasks, respectively. We use “R1-Llama-70B” as shorthand for “Deepseek-R1-Distill-Llama-70B”.
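Given the identity DSR% = 100% - DFR%, here is a tiny helper for computing both from per-trial outcomes, assuming each trial is labeled True when the model successfully resists the inducement (the labeling convention is an assumption):

```python
# Minimal metric helper; the True/False outcome semantics are an
# assumption about how defense success is labeled per trial.
def defense_rates(outcomes: list[bool]) -> tuple[float, float]:
    dsr = 100.0 * sum(outcomes) / len(outcomes)
    return dsr, 100.0 - dsr  # (DSR%, DFR%)

print(defense_rates([True, True, False, True]))  # -> (75.0, 25.0)
```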
This dataset includes offensive content that some may find disturbing. It is intended solely for educational and research use.
- Shu Yang: shu.yang@kaust.edu.sa
- Shenzhe Zhu: cho.zhu@mail.utoronto.ca
@misc{yang2025fraudr1,
title={Fraud-R1 : A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements},
author={Shu Yang and Shenzhe Zhu and Zeyu Wu and Keyu Wang and Junchi Yao and Junchao Wu and Lijie Hu and Mengdi Li and Derek F. Wong and Di Wang},
year={2025},
eprint={2502.12904},
archivePrefix={arXiv},
primaryClass={cs.CL}
}