Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Updated Aug 14, 2025 - Python
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Turns Data and AI algorithms into production-ready web applications in no time.
The Data Engineering Cookbook
An orchestration platform for the development, production, and observation of data assets.
Always know what to expect from your data.
🐚 Python-powered shell. Full-featured and cross-platform.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
The Open Source Feature Store for AI/ML
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
Compare tables within or across databases
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
Implementing best practices for PySpark ETL jobs and applications.
Python Stream Processing
A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.