No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents
Updated Aug 23, 2025 - Python
Implementing best practices for PySpark ETL jobs and applications.
A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
An end-to-end GoodReads data pipeline for building a data lake, a data warehouse, and an analytics platform.
Enterprise-grade and API-first LLM workspace for unstructured documents, including data extraction, redaction, rights management, prompt playground, and more!
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
Integrate LLMs into any pipeline: fit/predict pattern, JSON-driven flows, and built-in concurrency support.
concurrent & fluent interface for (async) iterables
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
Data pipelines from re-usable components
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. At the end of the course, we also cover a few case studies.
Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications
Watchmen Platform is a low-code data platform for data pipelines, metadata management, analysis, and quality management.
Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.
Conductor OSS SDK for Python programming language
Near real time ETL to populate a dashboard.
Model Context Protocol (MCP) Server for the Keboola Platform
Data Engineering/Scraping Project. Creating a detailed Sports Relational Database for the Top European Soccer Leagues.
Flowfile is a visual ETL tool and Python library combining drag-and-drop workflows with Polars dataframes. Build data pipelines visually, define flows programmatically with a Polars-like API, and export to standalone Python code. Perfect for fast, intuitive data processing from development to production.