Pairing Security Advisories with Vulnerable Functions using Open-Source LLMs

This repository contains the code and data of our research on pairing security advisories with vulnerable functions using open-source Large Language Models (LLMs).

Requirements

GPU Support: The research codebase currently requires GPUs and was tested on two NVIDIA RTX 3090 Ti cards. Future updates will include support for non-GPU environments.
Storage: Model cloning requires over 100 GB of storage space.

Setup Instructions

Environment Configuration

Workspace Setup: Update the .env file with the workspace directory where you want files saved. Optionally, add your GitHub credentials to enable parsing GitHub data and a WANDB key to log data to WANDB:
```
WORKSPACE_FOLDER={YOUR_WORKSPACE}
PYTHONPATH=${WORKSPACE_FOLDER}/
GITHUB_TOKEN={YOUR_TOKEN}
GITHUB_USERNAME={YOUR_USERNAME}
WANDB_KEY={YOUR_WANDB_KEY}
```
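
Before running anything, it can help to confirm that no template placeholder is left in the .env file. The following is a minimal sketch using only the standard library; `parse_env` and `unfilled_keys` are hypothetical helpers, not part of this repository, and the parser only handles the simple `KEY=VALUE` and `${NAME}` forms shown above.

```python
import re

_VAR = re.compile(r"\$\{(\w+)\}")         # matches ${NAME} references
_PLACEHOLDER = re.compile(r"\{[^{}]*\}")  # matches leftover {YOUR_...} text


def parse_env(text):
    """Parse KEY=VALUE lines, expanding ${NAME} from keys defined earlier."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        # Unknown ${NAME} references are left as-is so they get flagged later.
        value = _VAR.sub(lambda m: values.get(m.group(1), m.group(0)), value.strip())
        values[key.strip()] = value
    return values


def unfilled_keys(values):
    """Return keys whose values still contain a {PLACEHOLDER} or unresolved ${NAME}."""
    return [key for key, value in values.items() if _PLACEHOLDER.search(value)]
```

For example, `unfilled_keys(parse_env(open(".env").read()))` lists the variables that still need editing. The GitHub and WANDB entries are optional, so a placeholder there only matters if you use those features.
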
YAML Configuration: Modify paths in ./code/llm/cfgs/sample_config.yaml to point to your directories. This configuration file is essential for driving the LLM.
```
# YAML Configuration Parameters
paths:
  base: {UPDATE THESE PATHS}
```

Model Cloning

Clone the following models into a designated model directory, as specified in your YAML configuration. Note the significant storage requirement (over 100 GB in total).

git clone git@hf.co:codellama/CodeLlama-7b-Instruct-hf &&
git clone git@hf.co:codellama/CodeLlama-13b-Instruct-hf &&
git clone git@hf.co:codellama/CodeLlama-34b-Instruct-hf &&
git clone git@hf.co:deepseek-ai/deepseek-coder-33b-instruct &&
git clone git@hf.co:mistralai/Mixtral-8x7B-Instruct-v0.1 &&
git clone git@hf.co:WizardLM/WizardCoder-15B-V1.0
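
Because the clones need over 100 GB, a quick free-space check before starting can save a failed download. This is a minimal sketch, not part of the repository; `free_gb` and `has_room_for_models` are hypothetical names, and the 100 GB threshold is taken from the Requirements section as a lower bound rather than an exact figure.

```python
import shutil

REQUIRED_GB = 100  # lower bound stated in Requirements; actual usage may be higher


def free_gb(path="."):
    """Free space, in decimal gigabytes, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_room_for_models(path=".", required_gb=REQUIRED_GB):
    """True if the filesystem holding `path` has at least `required_gb` free."""
    return free_gb(path) >= required_gb
```

Call `has_room_for_models("/path/to/model/dir")` with your own model directory before running the clone commands.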

Configuration Updates

Ensure the model paths are correctly set in the configuration:

models:
  base: {UPDATE THESE PATHS}
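
To confirm that the configured model directory actually contains every clone, a check along these lines may be useful. It is a sketch under the assumption that each repository was cloned with its default directory name (the last component of the clone URL); `missing_models` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path

# Default directory names produced by the `git clone` commands above.
EXPECTED_MODELS = (
    "CodeLlama-7b-Instruct-hf",
    "CodeLlama-13b-Instruct-hf",
    "CodeLlama-34b-Instruct-hf",
    "deepseek-coder-33b-instruct",
    "Mixtral-8x7B-Instruct-v0.1",
    "WizardCoder-15B-V1.0",
)


def missing_models(model_dir):
    """Return the expected model directories that are absent under `model_dir`."""
    base = Path(model_dir)
    return [name for name in EXPECTED_MODELS if not (base / name).is_dir()]
```

An empty list means every model directory is present; it does not verify that the weight files inside finished downloading.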

Data

Extract the CSVs in ./data/patchparser-data.tar.gz:

mkdir -p ./data/patchparser-data
tar -xzf ./data/patchparser-data.tar.gz -C ./data/patchparser-data/

This will create:

$ ls ./data/patchparser-data
govulndb-cot-examples-fp-2023-10-31.csv  
govulndb-cot-examples-tp-2023-10-31.csv  
patchparser-data-2023-10-31.csv
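
After extraction, the files can be sanity-checked without knowing their schema. The sketch below uses only the standard library and reports each file's header and row count; `csv_summary` is a hypothetical helper, and the raised field-size limit is an assumption that some cells hold long text such as source code.

```python
import csv
from pathlib import Path

DATA_DIR = Path("data/patchparser-data")
EXPECTED_CSVS = (
    "govulndb-cot-examples-fp-2023-10-31.csv",
    "govulndb-cot-examples-tp-2023-10-31.csv",
    "patchparser-data-2023-10-31.csv",
)

# Assumption: cells may exceed the csv module's default field size limit.
csv.field_size_limit(2**31 - 1)


def csv_summary(path):
    """Return (header, number_of_data_rows) for the CSV file at `path`."""
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        header = next(reader, [])       # empty list if the file has no rows
        n_rows = sum(1 for _ in reader)  # counts records, not physical lines
    return header, n_rows
```

Looping over `EXPECTED_CSVS` and calling `csv_summary(DATA_DIR / name)` shows the column names and sizes of the three files.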

Environment Setup

Create and activate a Python virtual environment:

python3 -m venv venv
source venv/bin/activate

Install required Python packages:

pip3 install -r requirements.txt

Execution

To execute the CodeLlama 34b model in a few-shot setting and observe results:

python3 ./code/llm/llm_driver.py sample_config
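
The command above takes a configuration name rather than a file path. Assuming the driver resolves that name to a YAML file under ./code/llm/cfgs/ (an inference from the location of sample_config.yaml, not confirmed behavior), a pre-flight check could look like the following sketch. `config_path` and `driver_command` are hypothetical helpers.

```python
from pathlib import Path

CFG_DIR = Path("code/llm/cfgs")  # directory that holds sample_config.yaml


def config_path(name, cfg_dir=CFG_DIR):
    """Map a config name such as 'sample_config' to its YAML file (assumed convention)."""
    return Path(cfg_dir) / f"{name}.yaml"


def driver_command(name):
    """Build the argument list for the driver invocation shown above."""
    return ["python3", "./code/llm/llm_driver.py", name]
```

Checking `config_path("sample_config").is_file()` from the repository root before passing `driver_command("sample_config")` to `subprocess.run` catches a mistyped configuration name early.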

Target Models

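Our research uses the models cloned during setup:

codellama/CodeLlama-7b-Instruct-hf
codellama/CodeLlama-13b-Instruct-hf
codellama/CodeLlama-34b-Instruct-hf
deepseek-ai/deepseek-coder-33b-instruct
mistralai/Mixtral-8x7B-Instruct-v0.1
WizardLM/WizardCoder-15B-V1.0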

Contact

For questions, please open an issue on the s3c2/llm-vulnerable-functions GitHub Issues page.

Cite

@InProceedings{DunlapLLM2024,
  title = {Pairing Security Advisories with Vulnerable Functions Using Open-Source LLMs},
  ISBN = {9783031641718},
  ISSN = {1611-3349},
  url = {http://dx.doi.org/10.1007/978-3-031-64171-8_18},
  DOI = {10.1007/978-3-031-64171-8_18},
  booktitle = {Lecture Notes in Computer Science},
  publisher = {Springer Nature Switzerland},
  author = {Dunlap,  Trevor and Meyers,  John Speed and Reaves,  Bradley and Enck,  William},
  year = {2024},
  pages = {350--369}
}
