🤖 SmolVLM Real-Time Webcam Demo with MLX-VLM

A real-time webcam application powered by SmolVLM (a small vision language model) running on Apple Silicon via MLX-VLM. It provides a simple web interface for analyzing webcam footage in real time with the mlx-community/SmolVLM-Instruct-4bit model.

This repository is a simple demo of real-time object detection using MLX-VLM with mlx-community/SmolVLM-Instruct-4bit, optimized for an M1 MacBook Pro.

For improved output quality, you can switch to SmolVLM-Instruct-8bit, though it may need a faster Apple Silicon chip to keep inference responsive.

✨ Features

🎥 Real-time Webcam Analysis - Capture and analyze webcam frames instantly
🧠 SmolVLM Integration - Powered by efficient SmolVLM models via MLX-VLM
🌐 Web Interface - Simple, responsive web UI with modern design
⚡ Real-time Processing - Fast inference on Apple Silicon devices
🎛️ Customizable Settings - Adjust prompts, temperature, tokens, and auto-analysis
📱 Mobile Friendly - Responsive design works on various screen sizes
🔄 Auto Analysis - Optional automatic frame analysis at set intervals

🚀 Quick Start

Prerequisites

Apple Silicon Mac (M1, M2, M3, or newer)
Python 3.10+
Webcam or camera access

Installation

Clone or download the application

# If you have the file locally, navigate to the directory
cd /path/to/your/mlx-projects

Install dependencies

# Install MLX-VLM (the key dependency)
pip install mlx-vlm

# Install web server dependencies
pip install flask flask-socketio

# Install image processing
pip install pillow

Run the application

python mlx_smolvlm_webcam.py --model mlx-community/SmolVLM-Instruct-4bit --port 8080

Open your browser
- Navigate to: http://localhost:8080
- Click "Start Camera" to enable webcam
- Click "📸 Analyze Frame" to get AI descriptions

📋 Requirements

Essential Dependencies

pip install mlx-vlm flask flask-socketio pillow
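As a quick sanity check before launching, the sketch below reports which of the packages above are not importable. It is not part of the application; the function name `missing_packages` is hypothetical, and the mapping reflects the usual import names for these pip packages (pip names and import names differ for some of them).

```python
import importlib.util

# pip package name -> module name used in "import ..." statements.
REQUIRED_MODULES = {
    "mlx-vlm": "mlx_vlm",
    "flask": "flask",
    "flask-socketio": "flask_socketio",
    "pillow": "PIL",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, try: pip install " + " ".join(missing))
    else:
        print("All dependencies found.")
```

The check only confirms that each module can be located; it does not import the packages or verify their versions.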

Supported Models

mlx-community/SmolVLM-Instruct-4bit (default, recommended)
mlx-community/SmolVLM-Instruct
Other SmolVLM models from mlx-community

🎯 Usage

Basic Usage

python mlx_smolvlm_webcam.py --model mlx-community/SmolVLM-Instruct-4bit

Advanced Options

python mlx_smolvlm_webcam.py \
  --model mlx-community/SmolVLM-Instruct-4bit \
  --host 127.0.0.1 \
  --port 8080 \
  --debug

Command Line Arguments

--model: Hugging Face model ID (default: mlx-community/SmolVLM-Instruct-4bit)
--host: Server host (default: 127.0.0.1)
--port: Server port (default: 8080)
--debug: Enable debug mode
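For reference, the arguments above could be declared with `argparse` roughly as follows. This is a sketch that mirrors the documented flags and defaults; the actual parsing code in mlx_smolvlm_webcam.py may differ, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser matching the documented flags and defaults (illustrative)."""
    parser = argparse.ArgumentParser(description="SmolVLM real-time webcam demo")
    parser.add_argument(
        "--model",
        default="mlx-community/SmolVLM-Instruct-4bit",
        help="Hugging Face model ID",
    )
    parser.add_argument("--host", default="127.0.0.1", help="Server host")
    parser.add_argument("--port", type=int, default=8080, help="Server port")
    parser.add_argument("--debug", action="store_true", help="Enable debug mode")
    return parser
```

Calling `build_parser().parse_args(["--port", "8081"])` yields a namespace with `port` set to 8081 and every other option left at its default.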

🎛️ Web Interface Features

Camera Controls

Start Camera: Enable webcam access
📸 Analyze Frame: Capture and analyze current frame
⏸️ Pause/▶️ Resume: Toggle camera feed

Settings Panel

Custom Prompt: Customize what you want the AI to describe
Max Tokens: Control response length (5-50)
Temperature: Adjust creativity/randomness (0.1-1.0)
Auto Analyze: Automatic analysis every 0.5, 1, 1.5, 2, 2.5, 3, 5, or 10 seconds, or Manual
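The ranges above can be enforced on the server side before a request reaches the model. The sketch below is one way to do it and is not taken from the application: the class name `AnalysisSettings` and the default values for `max_tokens` and `temperature` are assumptions; only the ranges and interval choices come from the list above.

```python
from dataclasses import dataclass
from typing import Optional

# Intervals offered by Auto Analyze, in seconds. None stands for Manual.
AUTO_INTERVALS = (0.5, 1, 1.5, 2, 2.5, 3, 5, 10)


def clamp(value, low, high):
    """Limit value to the closed range [low, high]."""
    return max(low, min(high, value))


@dataclass
class AnalysisSettings:
    prompt: str = "Describe what you see in this image in detail"
    max_tokens: int = 50              # assumed default; documented range is 5-50
    temperature: float = 0.5          # assumed default; documented range is 0.1-1.0
    auto_interval: Optional[float] = None  # None means Manual

    def __post_init__(self):
        # Out-of-range numbers are clamped rather than rejected.
        self.max_tokens = int(clamp(self.max_tokens, 5, 50))
        self.temperature = float(clamp(self.temperature, 0.1, 1.0))
        if self.auto_interval is not None and self.auto_interval not in AUTO_INTERVALS:
            raise ValueError(f"unsupported auto-analyze interval: {self.auto_interval}")
```

Clamping keeps a slider that is slightly out of range usable, while an unknown interval is treated as a client bug and raises an error.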

Example Prompts

"Describe what you see in this image in detail"
"What objects are visible in this scene?"
"Analyze the emotions and expressions of people in this image"
"Describe the lighting and composition of this scene"
"What activities are taking place in this image?"

🔧 Troubleshooting

Common Issues

"Module not found: flask_socketio"

pip install flask-socketio

"Model type idefics3 not supported"

Make sure you're using mlx-vlm, not mlx-lm:

pip uninstall mlx-lm
pip install mlx-vlm

"Port already in use"

# Use a different port (8080 is the default)
python mlx_smolvlm_webcam.py --port 8081
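To find a port that is actually available before restarting, a small standard-library check such as the following can help. It is a sketch, not part of the application; `port_is_free` and `first_free_port` are hypothetical helper names.

```python
import socket


def port_is_free(host, port):
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(host="127.0.0.1", start=8080, attempts=20):
    """Return the first free port in [start, start + attempts), or raise."""
    for port in range(start, start + attempts):
        if port_is_free(host, port):
            return port
    raise RuntimeError(f"no free port found from {start} to {start + attempts - 1}")
```

The result can then be passed to `--port`. Another process could still claim the port between the check and the server start, so treat the answer as a hint rather than a guarantee.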

Camera permission denied

Allow camera access in your browser
Check System Settings > Privacy & Security > Camera (System Preferences > Security & Privacy > Camera on macOS 12 and earlier)

Model loading fails

# Clear the Hugging Face cache and retry (this removes everything cached there, including other downloaded models)
rm -rf ~/.cache/huggingface/
python mlx_smolvlm_webcam.py --model mlx-community/SmolVLM-Instruct-4bit

Performance Tips

Use 4-bit models for faster inference:

--model mlx-community/SmolVLM-Instruct-4bit

Image size - the app automatically resizes frames to a maximum dimension of 512px
Lower max tokens for faster responses
Use auto-analyze sparingly to avoid overwhelming the model
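The 512px limit mentioned above amounts to scaling a frame so its longer side is at most 512 pixels while keeping the aspect ratio. The arithmetic can be sketched as below; `fit_within` is a hypothetical helper, and the application's own resizing code may round differently.

```python
def fit_within(width, height, max_dim=512):
    """Scale (width, height) so the longer side is at most max_dim."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest <= max_dim:
        return width, height  # already small enough; never upscale
    scale = max_dim / longest
    # Keep each side at least 1 pixel for extreme aspect ratios.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(1280, 720))  # (512, 288)
print(fit_within(320, 240))   # (320, 240)
```

With Pillow, `Image.thumbnail((512, 512))` performs an equivalent aspect-preserving downscale in place.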

🏗️ Architecture

Backend: Flask + SocketIO for real-time communication
Frontend: Modern HTML5 + JavaScript with WebSocket
AI Model: SmolVLM via MLX-VLM for Apple Silicon optimization
Image Processing: PIL for image handling and resizing

🎨 Features in Detail

Real-time Analysis

The application captures webcam frames and sends them to SmolVLM for analysis. The AI provides detailed descriptions of what it sees, including objects, people, activities, and scenes.
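One common way for a browser to hand a captured frame to a Python backend is as a base64 data URL, for example the string returned by a canvas `toDataURL` call. Whether this application uses that transport is an assumption; the sketch below only illustrates how such a payload could be decoded with the standard library, and `decode_data_url` is a hypothetical name.

```python
import base64


def decode_data_url(data_url):
    """Split 'data:<mime>;base64,<payload>' into (mime_type, raw_bytes)."""
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    mime_type = header[len("data:"):-len(";base64")]
    # validate=True rejects characters outside the base64 alphabet.
    return mime_type, base64.b64decode(payload, validate=True)
```

The returned bytes could then be opened with Pillow via `Image.open(io.BytesIO(raw_bytes))` before resizing and inference.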

Modern Web Interface

Gradient backgrounds and modern CSS
Responsive design for different screen sizes
Real-time status indicators
Smooth animations and transitions

Flexible Configuration

Adjustable AI parameters (temperature, max tokens)
Custom prompts for specific use cases
Auto-analysis for continuous monitoring
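With auto-analysis enabled, frames can arrive faster than the model can describe them. One way to keep requests from piling up is to skip a frame when an analysis is still running or the chosen interval has not elapsed. The sketch below illustrates that idea and is not the application's actual scheduling logic; `AnalysisGate` is a hypothetical class.

```python
import time


class AnalysisGate:
    """Decide whether a newly captured frame should be analyzed."""

    def __init__(self, interval_seconds, clock=time.monotonic):
        self.interval = interval_seconds
        self.clock = clock        # injectable so the gate can be tested
        self.busy = False         # True while an analysis is in flight
        self.last_start = None    # clock value when the last analysis began

    def try_start(self):
        """Return True and mark the gate busy if analysis may begin now."""
        now = self.clock()
        if self.busy:
            return False
        if self.last_start is not None and now - self.last_start < self.interval:
            return False
        self.busy = True
        self.last_start = now
        return True

    def finish(self):
        """Mark the in-flight analysis as complete."""
        self.busy = False
```

A handler would call `try_start()` when a frame arrives, run inference only if it returns True, and call `finish()` afterwards, ideally in a `finally` block so an error does not leave the gate stuck.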

📝 Example Outputs

Scene Description:

"I can see a person sitting at a desk with a laptop computer. There are books and papers scattered on the desk, and a window with natural lighting in the background. The person appears to be working or studying."

Object Detection:

"In this image, I can identify: a laptop computer, several books, a coffee mug, a desk lamp, and a potted plant on the windowsill."

🤝 Contributing

Feel free to submit issues, feature requests, or pull requests to improve this application.

📄 License

This project is open source. Please check individual dependencies for their respective licenses.

🙏 Acknowledgments

SmolVLM: Hugging Face's efficient vision language model
MLX: Apple's machine learning framework for Apple Silicon
MLX-VLM: MLX integration for vision language models
Inspired by: https://github.com/ngxson/smolvlm-realtime-webcam

Enjoy analyzing the world through AI! 🎉

Repository: davepoon/mlx-vlm-smolvlm-realtime-webcam

Folders and files

Latest commit

History

Repository files navigation

🤖 SmolVLM Real-Time Webcam Demo with MLX-VLM

✨ Features

🚀 Quick Start

Prerequisites

Installation

📋 Requirements

Essential Dependencies

Supported Models

🎯 Usage

Basic Usage

Advanced Options

Command Line Arguments

🎛️ Web Interface Features

Camera Controls

Settings Panel

Example Prompts

🔧 Troubleshooting

Common Issues

Performance Tips

🏗️ Architecture

🎨 Features in Detail

Real-time Analysis

Modern Web Interface

Flexible Configuration

📝 Example Outputs

🤝 Contributing

📄 License

🙏 Acknowledgments

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages