This project implements real-time sign language recognition using MediaPipe Hands for hand landmark detection and an MLP (Multi-Layer Perceptron) model for character classification. We also experimented with MobileNetV2 for sign recognition but found that the MediaPipe-based MLP performs better in real-time scenarios. The system captures hand gestures from a webcam, extracts landmark features, and predicts sign language letters, dynamically forming words and sentences.
```
📂 Sign Language Recognition System/
│── 📂 SIGN_TO_SENTENCE_PROJECT/
│   │── 📂 Asl_Sign_Data/                        # Raw ASL dataset
│   │── 📄 asl_mediapipe_keypoints_dataset.csv   # Preprocessed dataset for MLP model
│   │── 📄 asl_mediapipe_mlp_model.h5            # Trained MLP model
│   │── 📄 sign_language_model_MobileNetV2.h5    # Trained MobileNetV2 model
│   │── 📄 Combined_Architecture.ipynb           # Hybrid model experiments
│   │── 📄 LLM.ipynb                             # Language model integration
│   │── 📄 Mediapipe_Training.ipynb              # Training script for MLP model
│   │── 📄 MobileNetV2_Training.ipynb            # Training script for MobileNetV2
│   │── 📄 concluion.txt                         # Summary of results
│   │── 📄 requirements.txt                      # Required dependencies
```
- The dataset used for training was obtained from the Kaggle ASL Sign Language Dataset.
- It contains hand gesture images labeled with ASL characters.
- For MobileNetV2, we used the raw images directly.
- For the MLP (MediaPipe), we extracted landmark keypoints from each image and stored them in CSV format (a sketch of this extraction step follows below).
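A minimal sketch of that extraction step, assuming images are read with OpenCV (the exact preprocessing lives in `Mediapipe_Training.ipynb`):

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def extract_keypoints(image_path):
    """Return the 21 hand landmarks of one image as a flat [x, y, z, ...] list, or None."""
    image = cv2.imread(image_path)
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        # MediaPipe expects RGB; OpenCV loads BGR
        results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None  # no hand detected in this image
    landmarks = results.multi_hand_landmarks[0].landmark
    return [coord for lm in landmarks for coord in (lm.x, lm.y, lm.z)]
```

Each CSV row then holds these 63 values (21 landmarks × x, y, z) plus the character label.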
- We trained MobileNetV2 on the raw images for sign classification.
- However, it struggled with real-time recognition, which led us to explore a MediaPipe-based MLP instead.
```python
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

num_classes = 29  # e.g. A-Z plus SPACE, DELETE, NOTHING; adjust to your dataset

# Load pre-trained MobileNetV2 as a feature extractor
base_model = MobileNetV2(input_shape=(224, 224, 3), include_top=False, weights='imagenet')
base_model.trainable = False  # freeze ImageNet weights (a typical transfer-learning choice)

# Add custom classification layers on top
model = Sequential([
    base_model,
    Flatten(),
    Dense(256, activation='relu'),
    Dense(num_classes, activation='softmax')
])

# Compile with categorical cross-entropy (labels must be one-hot encoded)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```
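A minimal training sketch for this model, assuming the raw images sit in `Asl_Sign_Data/` with one folder per class (epochs and batch size here are illustrative):

```python
import tensorflow as tf

# Load images from a one-folder-per-class layout; one-hot labels match categorical_crossentropy
train_ds = tf.keras.utils.image_dataset_from_directory(
    "Asl_Sign_Data",
    label_mode="categorical",
    image_size=(224, 224),
    batch_size=32,
)

# MobileNetV2 expects pixel values scaled to [-1, 1]
preprocess = tf.keras.applications.mobilenet_v2.preprocess_input
train_ds = train_ds.map(lambda images, labels: (preprocess(images), labels))

model.fit(train_ds, epochs=10)
```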
- Extracted hand landmark coordinates using MediaPipe Hands.
- Trained an MLP model on the extracted landmark features.
- This approach proved faster and more reliable for real-time recognition.
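For reference, a minimal sketch of such a landmark MLP (layer sizes are illustrative assumptions; the actual architecture is defined in `Mediapipe_Training.ipynb`):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

num_classes = 29  # assumed: A-Z plus SPACE, DELETE, NOTHING

# 21 landmarks x (x, y, z) = 63 input features per hand
mlp = Sequential([
    Dense(128, activation="relu", input_shape=(63,)),
    Dropout(0.2),
    Dense(64, activation="relu"),
    Dense(num_classes, activation="softmax"),
])
mlp.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```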
At inference time, the trained model is loaded and live landmarks are fed to it:

```python
import mediapipe as mp
import numpy as np
import tensorflow as tf

# Load the trained MLP model
mlp_model = tf.keras.models.load_model("asl_mediapipe_mlp_model.h5")

# Initialize MediaPipe Hands
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(min_detection_confidence=0.7, min_tracking_confidence=0.7)

# Predict a sign from one hand's MediaPipe landmarks
def predict_sign(landmarks):
    # Flatten the 21 (x, y, z) landmarks into a single 63-feature row
    input_data = np.array(landmarks).flatten().reshape(1, -1)
    prediction = mlp_model.predict(input_data)
    return np.argmax(prediction)
```
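A hypothetical real-time loop built on top of `predict_sign`, reading webcam frames with OpenCV:

```python
import cv2

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB; OpenCV delivers BGR
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        points = results.multi_hand_landmarks[0].landmark
        coords = [(p.x, p.y, p.z) for p in points]
        class_id = predict_sign(coords)  # map class_id back to a letter via your label list
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press 'q' to quit
        break
cap.release()
```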
```bash
pip install -r requirements.txt
```
To test the output of the individual models, run the last cell in:
- `Mediapipe_Training.ipynb` for MLP model evaluation.
- `MobileNetV2_Training.ipynb` for MobileNetV2 evaluation.
To see MobileNetV2 and MediaPipe working together, run:

```bash
jupyter notebook Combined_Architecture.ipynb
```
- Normal Signs → Letters are appended to the sentence.
- SPACE Sign → Adds a space.
- DELETE Sign → Removes the last character.
- NOTHING → No input detected.
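A minimal sketch of that sentence-building logic (label names assumed to match the dataset's SPACE/DELETE/NOTHING classes):

```python
def update_sentence(sentence, label):
    if label == "SPACE":
        return sentence + " "
    if label == "DELETE":
        return sentence[:-1]   # remove the last character
    if label == "NOTHING":
        return sentence        # no input detected
    return sentence + label    # normal sign: append the letter
```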
- MobileNetV2 did not perform well on real-time images, so we moved to the MediaPipe-based MLP.
- Next Phase → Building a FastAPI backend (SignConnect-Backend) for better integration and mobile app support.
As the next phase of development, we aim to implement Text-to-Sign Language Actions, allowing users to input text that gets translated into sign language animations. Possible technologies we will explore:
- AI-generated 3D avatars to perform sign language gestures.
- Computer Vision & Reinforcement Learning to map text to sign movements.
- Deep Learning models to generate smooth sign transitions.
We welcome contributions from the community for this phase! If you're interested in helping develop Text-to-Sign Language Generation, feel free to open an issue or submit a pull request on our GitHub repository.
- Uses MediaPipe Hands for landmark detection.
- Model trained using TensorFlow & Scikit-Learn.
- Inspired by existing research on gesture recognition & sign language AI.
This project is licensed under the MIT License - see the LICENSE file for details.