This project addresses SemEval 2024 Task 8, Subtask A: binary classification of text as either human-written or machine-generated. It applies parameter-efficient fine-tuning with LoRA to transformer-based models (RoBERTa, BART, and GPT-2) to improve classification accuracy while keeping training computationally efficient, and it is evaluated on the official SemEval dataset across both familiar and unfamiliar domains, demonstrating strong generalization across varying text sources and styles.
Notably, RoBERTa fine-tuned with LoRA achieved an F1-score of 0.96 on AI-generated content in the familiar domain, while BART with LoRA performed best in the unfamiliar domain with an F1-score of 0.88.
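A minimal sketch of the LoRA setup with Hugging Face PEFT is shown below; the rank, alpha, dropout, and target modules are illustrative assumptions rather than the exact values used in this project:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

# Load a pretrained RoBERTa with a two-way classification head (human vs. machine).
model_name = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Wrap the base model with LoRA adapters: only the low-rank adapter weights
# (plus the classification head) are trained; the backbone stays frozen.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                                # illustrative rank
    lora_alpha=16,                      # illustrative scaling factor
    lora_dropout=0.1,
    target_modules=["query", "value"],  # RoBERTa attention projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()      # prints the small trainable fraction
```

The same wrapper applies to the BART and GPT-2 checkpoints by changing `model_name` and adjusting `target_modules` to those models' attention projection names.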
- Binary classification of text (human-written vs. AI-generated)
- Zero-shot evaluation using RoBERTa, BART, GPT-2, and GPT-4o
- Parameter-efficient fine-tuning using LoRA
- Performance analysis across familiar and unfamiliar domains
- Balanced training using undersampling and structured preprocessing (see the sketch after this list)
- Domain generalization evaluation on the OUTFOX dataset and GPT-4-generated content
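A minimal sketch of the undersampling step referenced above, assuming the training split is a JSONL file with `text` and `label` fields (the file name and schema are assumptions based on the official data format):

```python
import pandas as pd

# Load the training split (assumed file name and schema: "text", "label").
df = pd.read_json("subtaskA_train_monolingual.jsonl", lines=True)

# Undersample the majority class so both labels are equally represented.
min_count = df["label"].value_counts().min()
balanced = (
    df.groupby("label", group_keys=False)
      .apply(lambda g: g.sample(n=min_count, random_state=42))
      .sample(frac=1, random_state=42)  # shuffle the balanced set
      .reset_index(drop=True)
)
print(balanced["label"].value_counts())
```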
- Results for the models using the Adam optimizer:
- Results for the models using LoRA:
The dataset used for this project is the official SemEval 2024 Task 8 Subtask A Monolingual dataset: https://drive.google.com/drive/folders/1CAbb3DjrOPBNm0ozVBfhvrEh9P9rAppc
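A minimal loading-and-tokenization sketch using the Hugging Face Datasets library; the local file name is an assumption based on the official release:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Load the monolingual Subtask A training split from a local JSONL file
# (the file name is an assumption based on the official release).
data = load_dataset("json", data_files={"train": "subtaskA_train_monolingual.jsonl"})
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = data.map(tokenize, batched=True)
print(tokenized["train"].column_names)  # expected to include "text" and "label"
```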
- Language: Python 3.9+
- Frameworks & Libraries:
  - PyTorch
  - Hugging Face Transformers
  - PEFT (Parameter-Efficient Fine-Tuning)
  - Datasets (Hugging Face)
  - scikit-learn
  - Accelerate
- Platform: Google Colab Pro (with T4, L4, and A100 GPUs)
- Hardware Requirements (a GPU check is sketched after this list):
  - Minimum 16 GB RAM
  - 5 GB disk space
  - At least 16 GB GPU memory for LoRA-based fine-tuning
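A quick sanity-check sketch for the GPU requirement, assuming a CUDA-capable runtime such as the Colab GPUs listed above:

```python
import torch

# Report the visible GPU and its memory; LoRA fine-tuning here assumes >= 16 GB.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    total_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name} ({total_gb:.1f} GB)")
    if total_gb < 16:
        print("Warning: less than 16 GB of GPU memory; fine-tuning may not fit.")
else:
    print("No CUDA GPU detected; fine-tuning on CPU is not recommended.")
```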
The fine-tuned models can also be evaluated on out-of-domain data, such as the OUTFOX dataset and GPT-4-generated content, to verify that the reported accuracy generalizes beyond the training distribution.
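A minimal evaluation sketch using scikit-learn's F1 score; the checkpoint path, test-file name, and field names below are placeholders, not the project's exact artifacts:

```python
import pandas as pd
import torch
from sklearn.metrics import f1_score
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load a fine-tuned checkpoint (placeholder path) and a held-out test set
# (assumed JSONL file with "text" and "label" fields).
checkpoint = "./checkpoints/roberta-lora-merged"   # hypothetical path
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint).eval()

test_df = pd.read_json("subtaskA_dev_monolingual.jsonl", lines=True)

preds = []
with torch.no_grad():
    for text in test_df["text"]:
        inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
        preds.append(int(model(**inputs).logits.argmax(dim=-1)))

# Per-class F1, matching the per-class scores reported above.
print(f1_score(test_df["label"], preds, average=None))
```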