The official repository for the paper "VulScribeR: Exploring RAG-based Vulnerability Augmentation with LLMs".

shayandaneshvar/VulScribeR

VulScribeR

Official repository for our paper:

VulScribeR: Exploring RAG-based Vulnerability Augmentation with LLMs

If you find this project useful in your research, please consider citing:

@article{daneshvar2024exploringragbasedvulnerabilityaugmentation,
      title={Exploring RAG-based Vulnerability Augmentation with LLMs}, 
      author={Seyed Shayan Daneshvar and Yu Nong and Xu Yang and Shaowei Wang and Haipeng Cai},
      year={2024},
      eprint={2408.04125},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2408.04125}, 
}

Datasets

Primary Datasets

Bigvul_train,
Bigvul_test,
Bigvul_val

Reveal,
Devign,
PrimeVul (RQ4 only)

VGX and Vulgen (used as baselines)

VGX Full dataset,
Vulgen Full dataset from VGX paper

Retriever's output

All pair matchings (except for RQ4), including the mutation and random pairings used for RQ2
RQ4's pair matching/retriever output

Our Generated Vulnerable Samples

Filtered Datasets for RQs(1-3),
Unfiltered Datasets for RQs(1-3),
Unfiltered Datasets for RQ4

The unfiltered datasets contain samples produced by the Generator that have not gone through the Verification phase. They also include extra metadata showing which clean_vul pair was used for generation, plus the vulnerable lines.

How to use?

See here

How to train DLVD models

Go to the models directory; the README for each model explains how to use it.
