This tool converts geospatial imagery (GeoTIFF files) and OpenStreetMap (OSM) building data into the COCO (Common Objects in Context) format, enabling the creation of computer vision datasets for building detection and classification tasks. The converter bridges the gap between geospatial data and machine learning workflows by transforming geographic coordinates into pixel-based annotations.
- Multi-format Image Support: Processes GeoTIFF, TIFF, JPG, JPEG, and PNG images
- Coordinate Transformation: Automatically converts geographic coordinates (latitude/longitude) to pixel coordinates using geospatial transformations
- Building Classification: Categorizes buildings based on OSM building types (residential, commercial, industrial, etc.)
- Polygon Annotation: Creates precise segmentation masks for each building footprint
- Bounding Box Generation: Automatically generates bounding boxes around building polygons
- Image Format Conversion: Converts TIFF images to JPG format with configurable quality settings
- Spatial Filtering: Only includes buildings that intersect with image boundaries
- Area Filtering: Removes small polygons below a minimum area threshold (default: 10 pixels)
- Coordinate Clipping: Ensures all annotations stay within image bounds
- Data Validation: Validates COCO structure using pycocotools
- COCO JSON Dataset: Standard COCO format compatible with popular ML frameworks
- Annotation Mapping: Detailed mapping file linking annotations to original data sources
- Metadata Preservation: Maintains image metadata and geospatial information
- Ready-to-Use Datasets: Eliminates the manual annotation process for building detection tasks
- Scalable Processing: Handles large datasets with hundreds of images and thousands of building polygons
- Framework Compatibility: Output can be loaded by PyTorch, TensorFlow, and other ML frameworks that support COCO format
- Bridge GIS and ML: Connects traditional GIS workflows with modern machine learning approaches
- Automated Workflow: Reduces manual processing time from days to hours
- Accuracy Preservation: Maintains spatial accuracy through proper coordinate transformations
- Building Analysis: Enables automated building detection and classification from satellite/aerial imagery
- Change Detection: Facilitates temporal analysis of urban development
- Large-Scale Mapping: Supports city-wide or regional building inventory projects
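The coordinate transformation listed above comes down to inverting the raster's affine geotransform. The following is a minimal sketch, not the converter's actual code: it assumes a north-up image with no rotation terms and coordinates already expressed in the image's CRS. The function name `geo_to_pixel` and the example origin and pixel size are hypothetical.

```python
def geo_to_pixel(x, y, origin_x, origin_y, pixel_width, pixel_height):
    """Map a point in the image's CRS to (col, row) for a north-up raster.

    origin_x, origin_y: coordinates of the image's top-left corner.
    pixel_width, pixel_height: ground size of one pixel (both positive).
    """
    col = (x - origin_x) / pixel_width
    # Rows grow downward while northing grows upward, hence the flipped sign.
    row = (origin_y - y) / pixel_height
    return col, row


# Hypothetical tile: top-left corner at (500000, 4100000), 0.5 m pixels.
col, row = geo_to_pixel(500100.0, 4099900.0, 500000.0, 4100000.0, 0.5, 0.5)
print(col, row)  # 200.0 200.0
```

Coordinates in WGS84 would first need reprojecting into the image's CRS (for example with pyproj, which is in the requirements list) before this step applies.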
rasterio>=1.3.0
pandas>=1.5.0
numpy>=1.21.0
shapely>=1.8.0
pyproj>=3.4.0
opencv-python>=4.6.0
Pillow>=9.0.0
pycocotools>=2.0.4
osmnx>=1.2.0
geopandas>=0.11.0
# Install required packages
pip install -r requirements.txt
# For QGIS integration (optional)
sudo apt-get update
sudo apt-get install qgis python3-qgis
pip install GDAL
# Basic usage
python geo_to_coco.py --images /path/to/images --csv buildings.csv --output dataset.json
# With custom quality settings
python geo_to_coco.py --images /path/to/images --csv buildings.csv --output dataset.json --quality 90
# Skip TIFF conversion
python geo_to_coco.py --images /path/to/images --csv buildings.csv --output dataset.json --no-convert
from geo_to_coco import GeoTiffToCoco

# Initialize converter
converter = GeoTiffToCoco(
    images_folder_path="/path/to/images",
    csv_path="/path/to/buildings.csv",
    min_area=10  # Minimum polygon area in pixels
)

# Create COCO dataset
coco_dataset = converter.create_coco_dataset(
    output_path="output.json",
    convert_tiff_to_jpg=True,
    jpg_quality=90
)
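Once the dataset has been written, a quick sanity check is to count annotations per building category. This sketch relies only on the standard COCO keys (`categories`, `annotations`, `category_id`). The helper name `summarize_coco` is hypothetical and not part of the converter.

```python
import json
from collections import Counter


def summarize_coco(path):
    """Return {category name: annotation count} for a COCO JSON file."""
    with open(path, encoding="utf-8") as f:
        coco = json.load(f)
    # Map category ids to their names so the summary is human-readable.
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(names[ann["category_id"]] for ann in coco["annotations"])
    return dict(counts)


# Example call, assuming the converter wrote output.json:
# summarize_coco("output.json")
```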
- Format: GeoTIFF, TIFF, JPG, JPEG, or PNG
- Geospatial Information: Must contain coordinate reference system (CRS) and transformation matrix
- Recommended Resolution: 640x640 pixels for optimal performance
- Coordinate System: Any projected coordinate system (automatically handled)
- Required Columns:
  - `building`: Building type classification (e.g., "residential", "commercial")
  - `geometry`: WKT (Well-Known Text) polygon geometries
- Coordinate System: WGS84 (EPSG:4326) recommended
- Data Source: Typically exported from OSM or other GIS databases
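Before running the converter, it can help to confirm that a CSV actually has the required columns described above. This is a minimal sketch using only the standard library. The function name `check_building_csv` is hypothetical, and accepting `MULTIPOLYGON` alongside `POLYGON` is an assumption rather than documented converter behavior.

```python
import csv
import io

REQUIRED_COLUMNS = {"building", "geometry"}


def check_building_csv(handle):
    """Return the rows of a buildings CSV, raising if it looks malformed."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = list(reader)
    for i, row in enumerate(rows):
        wkt = row["geometry"].strip().upper()
        # Assumption: both single and multi polygons are acceptable input.
        if not wkt.startswith(("POLYGON", "MULTIPOLYGON")):
            raise ValueError(f"row {i}: geometry is not a WKT polygon")
    return rows


sample = io.StringIO(
    "building,geometry\n"
    'residential,"POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"\n'
)
rows = check_building_csv(sample)
print(len(rows), rows[0]["building"])  # 1 residential
```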
- Images: Metadata for each image including dimensions and file paths
- Annotations: Building polygons with segmentation masks, bounding boxes, and categories
- Categories: Building type classifications with unique IDs
- Metadata: Dataset information, creation date, and versioning
- Traceability: Links each annotation back to original CSV row
- Quality Metrics: Area calculations and validation flags
- Debugging Support: Facilitates troubleshooting and data quality assessment
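To make the annotation fields above concrete, here is a sketch of how one building polygon, already in pixel coordinates, could become a COCO annotation with a segmentation, bounding box, and area. It is an illustration under stated assumptions, not the converter's implementation: `polygon_to_annotation` is a hypothetical name, and clipping is done by clamping vertices rather than true polygon intersection.

```python
def polygon_to_annotation(points, width, height, ann_id, image_id,
                          category_id, min_area=10):
    """Build a COCO-style annotation from pixel (x, y) vertices.

    Returns None when the clipped polygon is smaller than min_area.
    """
    # Clamp each vertex to the image bounds (a simplification of clipping).
    pts = [(min(max(x, 0), width), min(max(y, 0), height)) for x, y in points]
    # Shoelace formula: polygon area in square pixels.
    area = abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))) / 2
    if area < min_area:
        return None
    xs, ys = zip(*pts)
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": [[c for p in pts for c in p]],  # flat [x1, y1, x2, y2, ...]
        "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],  # x, y, w, h
        "area": area,
        "iscrowd": 0,
    }


ann = polygon_to_annotation([(10, 10), (50, 10), (50, 40), (10, 40)],
                            640, 640, ann_id=1, image_id=1, category_id=1)
print(ann["bbox"], ann["area"])  # [10, 10, 40, 30] 1200.0
```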
- Batch Processing: Handles multiple images simultaneously
- Memory Optimization: Processes images individually to manage memory usage
- Parallel Processing: Can be extended for multi-core processing
- Input: Varies based on image size and format
- Output: JSON files are typically 1-10MB for city-scale datasets
- Temporary: Additional space needed for TIFF to JPG conversion
- Building Detection: Train models to identify buildings in satellite imagery
- Urban Morphology: Analyze building patterns and urban structure
- Change Detection: Monitor urban development over time
- Property Assessment: Automated building inventory and valuation
- Insurance: Risk assessment based on building types and density
- Urban Planning: Support development planning and zoning decisions
- Census Support: Building counting and classification for demographic analysis
- Emergency Planning: Building inventory for disaster response planning
- Tax Assessment: Automated property identification and classification
The tool includes built-in validation to ensure data quality:
- Coordinate Validation: Verifies all coordinates fall within image bounds
- Polygon Validation: Checks for valid polygon geometries
- COCO Compliance: Validates output against COCO format specifications
- Orphaned Annotation Detection: Identifies and reports annotation issues
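As an illustration of the last check, the sketch below treats an annotation as orphaned when it refers to an image or category id that is not defined in the dataset. That definition is an assumption, since the tool's exact criteria are not stated here, and `find_orphaned_annotations` is a hypothetical helper.

```python
def find_orphaned_annotations(coco):
    """Return ids of annotations whose image_id or category_id is undefined."""
    image_ids = {img["id"] for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}
    return [
        ann["id"]
        for ann in coco["annotations"]
        if ann["image_id"] not in image_ids
        or ann["category_id"] not in category_ids
    ]


dataset = {
    "images": [{"id": 1}],
    "categories": [{"id": 1, "name": "residential"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},  # valid
        {"id": 11, "image_id": 2, "category_id": 1},  # unknown image
        {"id": 12, "image_id": 1, "category_id": 9},  # unknown category
    ],
}
print(find_orphaned_annotations(dataset))  # [11, 12]
```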
This comprehensive tool transforms complex geospatial data into machine learning-ready formats, enabling researchers and practitioners to leverage building footprint data for computer vision applications at scale.
This interactive script helps you tile large GeoTIFF images into smaller, uniform tiles suitable for machine learning datasets. It's particularly useful for preparing satellite or aerial imagery for computer vision tasks.
First, ensure GDAL tools are installed:
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install gdal-bin
# CentOS/RHEL
sudo yum install gdal
# macOS
brew install gdal
# Verify installation
gdal_retile.py --help
# Make the script executable
chmod +x gdal_retile.sh
Simply run the script and follow the prompts:
./gdal_retile.sh
The script will guide you through:
- Input GeoTIFF file selection
- Output directory setup
- Tile size configuration
- Overlap settings
- Optional CSV metadata generation
For automation or batch processing:
# Basic usage
./gdal_retile.sh -i input.tif -o tiles/
# With custom settings
./gdal_retile.sh -i input.tif -o tiles/ -s 512 -v 32 --csv metadata.csv
# Batch mode (non-interactive)
./gdal_retile.sh --batch -i input.tif -o tiles/ -s 640 -v 64
| Parameter | Description | Default |
|---|---|---|
| `-i, --input` | Input GeoTIFF file | Required |
| `-o, --output` | Output directory for tiles | Required |
| `-s, --size` | Tile size in pixels | 640 |
| `-v, --overlap` | Overlap size in pixels | 64 |
| `--csv` | CSV metadata file name | Optional |
| `--batch` | Run in non-interactive mode | false |
| `-h, --help` | Show help message | - |
./gdal_retile.sh -i satellite_image.tif -o tiles/
This creates 640x640 pixel tiles with 64 pixel overlap.
./gdal_retile.sh -i aerial_photo.tif -o custom_tiles/ -s 512 -v 32
Creates 512x512 pixel tiles with 32 pixel overlap.
./gdal_retile.sh -i large_image.tif -o tiles/ --csv tile_metadata.csv
Generates tiles and saves metadata information to CSV file.
# Process multiple files
for file in *.tif; do
    ./gdal_retile.sh --batch -i "$file" -o "tiles_${file%.*}/"
done
After running the script, you'll have:
output_directory/
├── tile_1_1.tif
├── tile_1_2.tif
├── tile_2_1.tif
├── tile_2_2.tif
└── ...
If CSV metadata is enabled:
metadata.csv # Contains tile information and coordinates
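When post-processing tiles, it is often useful to recover a tile's grid position from its file name. The sketch below assumes the `tile_<n>_<m>.tif` pattern shown in the listing above. Whether the first index is the row or the column is not stated here, so the helper (hypothetically named `tile_index`) returns the pair without interpreting it.

```python
import re

# Matches names like "tile_3_12.tif", as in the directory listing above.
TILE_NAME = re.compile(r"tile_(\d+)_(\d+)\.tif$")


def tile_index(filename):
    """Return the two grid indices encoded in a tile file name."""
    match = TILE_NAME.search(filename)
    if match is None:
        raise ValueError(f"not a tile file name: {filename!r}")
    return int(match.group(1)), int(match.group(2))


print(tile_index("output_directory/tile_2_1.tif"))  # (2, 1)
```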
- Input: Large satellite imagery (e.g., 10000x10000 pixels)
- Output: Manageable tiles for ML training
- Settings: 640x640 tiles, 64px overlap
- Input: High-resolution aerial photos
- Output: Tiles for object detection
- Settings: 512x512 tiles, 32px overlap
- Input: City-wide imagery
- Output: Tiles for building detection
- Settings: 640x640 tiles, 64px overlap + metadata
- 640x640: Good for most ML applications
- 512x512: Faster processing, less memory usage
- 1024x1024: Better for large object detection
- 64 pixels: Recommended for object detection
- 32 pixels: Sufficient for classification tasks
- 128 pixels: Better for large objects
- Larger tiles require more memory
- Start with smaller tiles if you encounter memory issues
- Monitor disk space for large datasets
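To judge how many files a given tile size and overlap will produce, the tile count can be estimated up front. This sketch assumes consecutive tiles are offset by `tile_size - overlap` pixels and that a final partial tile is still written. The script's actual behavior may differ, and `tiles_along_axis` is a hypothetical helper.

```python
import math


def tiles_along_axis(length, tile_size=640, overlap=64):
    """Estimate how many tiles cover `length` pixels along one axis."""
    if not 0 <= overlap < tile_size:
        raise ValueError("overlap must be non-negative and smaller than tile_size")
    if length <= tile_size:
        return 1
    stride = tile_size - overlap  # distance between consecutive tile origins
    return math.ceil((length - overlap) / stride)


# A 10000x10000 image with the default 640-pixel tiles and 64-pixel overlap.
cols = tiles_along_axis(10000)
rows = tiles_along_axis(10000)
print(cols, rows, cols * rows)  # 18 18 324
```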
The tiles this script produces can be passed directly to the GeoTIFF to COCO converter:
# 1. Tile your large image
./gdal_retile.sh -i large_image.tif -o tiles/
# 2. Convert to COCO format
python geo_to_coco.py --images tiles/ --csv buildings.csv --output dataset.json
- Permission Denied
  - Make the script executable: `chmod +x gdal_retile.sh`
- GDAL Not Found
  - Install GDAL tools (see Installation section)
  - Check PATH environment variable
- Insufficient Disk Space
  - Tiling can create many files
  - Ensure adequate disk space (typically 2-3x original file size)
- Memory Issues
  - Reduce tile size
  - Process smaller sections of the image
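The disk-space guideline above can be checked before tiling starts. This is a rough sketch using only the standard library. `has_room_for_tiles` is a hypothetical helper, and the default factor of 3 simply takes the upper end of the 2-3x estimate.

```python
import os
import shutil


def has_room_for_tiles(input_path, output_dir, factor=3.0):
    """Return True if output_dir's disk has at least factor x the input size free."""
    needed = os.path.getsize(input_path) * factor
    free = shutil.disk_usage(output_dir).free
    return free >= needed


# Example call, assuming input.tif and tiles/ exist:
# has_room_for_tiles("input.tif", "tiles/")
```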
- `File does not exist`: Check input file path
- `Directory not writable`: Check output directory permissions
- `Invalid tile size`: Ensure size is a positive integer
- `GDAL command failed`: Check GDAL installation and input file format
For issues or questions:
- Check the troubleshooting section
- Review GDAL documentation
- Open an issue on the GitHub repository
- GeoTIFF to COCO Converter: Convert tiled images to ML datasets
- QGIS: Visualize and prepare geospatial data
- OSMnx: Extract building data from OpenStreetMap