NYC Taxi Trip Analytics – End-to-End Data Pipeline & Dashboard

This project demonstrates an end-to-end data pipeline for analyzing NYC Taxi trip data. It involves data cleaning, cloud storage, data modeling using a star schema on BigQuery, and visualization through Looker Studio.

Project Objectives

Clean and preprocess raw trip-level data for consistency and accuracy
Store and manage data using cloud-native infrastructure (GCS + BigQuery)
Design a dimensional data model to support analytical querying
Build a scalable dashboard to surface operational, financial, and behavioral insights

Tech Stack

Component	Tool/Service
Data Cleaning	Python (Jupyter Notebook)
Cloud Storage	Google Cloud Storage
Data Warehouse	BigQuery
Dashboarding	Looker Studio
Data Modeling	Star Schema

Workflow Overview

1. Data Cleaning (`data_exploration.ipynb`)

Loaded raw NYC taxi CSV dataset using pandas.
Cleaned data by:
- Parsing and formatting datetime columns.
- Filtering out invalid entries (e.g., zero/negative distance, fare, or passengers).
- Dropping duplicates and nulls in critical fields.
Exported cleaned data as trips_cleaned.csv.
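
The cleaning steps above might look roughly like the following in pandas. This is a sketch, not the notebook's actual code: the column names `tpep_dropoff_datetime`, `trip_distance`, `fare_amount`, and `passenger_count`, the choice of which fields count as critical, and the `clean_trips` helper are all assumptions made for illustration.

```python
import pandas as pd

# Assumed column names; adjust to match the raw CSV.
DATETIME_COLS = ["tpep_pickup_datetime", "tpep_dropoff_datetime"]
POSITIVE_COLS = ["trip_distance", "fare_amount", "passenger_count"]


def clean_trips(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the raw trip data (hypothetical helper)."""
    df = raw.copy()
    # Parse timestamps; values that cannot be parsed become NaT.
    for col in DATETIME_COLS:
        df[col] = pd.to_datetime(df[col], errors="coerce")
    # Drop rows with nulls in the fields treated as critical here.
    df = df.dropna(subset=DATETIME_COLS + POSITIVE_COLS)
    # Keep only rows where distance, fare, and passenger count are all > 0.
    df = df[(df[POSITIVE_COLS] > 0).all(axis=1)]
    # Remove exact duplicate rows and renumber the index.
    return df.drop_duplicates().reset_index(drop=True)
```

Under these assumptions, `clean_trips(pd.read_csv(path)).to_csv("trips_cleaned.csv", index=False)` would reproduce the export step, where `path` points at the raw CSV.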

View Data Cleaning Notebook
Download Cleaned CSV

2. Cloud Storage

Uploaded trips_cleaned.csv to a GCS bucket: gs://nyc-taxi-data-cleaned/trips_cleaned.csv
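
The upload is described as manual, so the following is only a sketch of how it could be scripted. It assumes the `google-cloud-storage` client library is installed and Application Default Credentials are configured; the helper names `split_gcs_uri` and `upload_to_gcs` are hypothetical.

```python
def split_gcs_uri(uri: str) -> tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a GCS URI: {uri!r}")
    bucket, _, blob_name = uri[len("gs://"):].partition("/")
    if not bucket or not blob_name:
        raise ValueError(f"URI needs a bucket and an object name: {uri!r}")
    return bucket, blob_name


def upload_to_gcs(local_path: str, uri: str) -> None:
    """Upload a local file to the given gs:// URI (hypothetical helper)."""
    # Imported lazily so split_gcs_uri works without the client library.
    from google.cloud import storage

    bucket_name, blob_name = split_gcs_uri(uri)
    client = storage.Client()  # uses Application Default Credentials
    client.bucket(bucket_name).blob(blob_name).upload_from_filename(local_path)
```

With credentials in place, `upload_to_gcs("data/trips_cleaned.csv", "gs://nyc-taxi-data-cleaned/trips_cleaned.csv")` would copy the cleaned file to the bucket named above.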

3. BigQuery Data Warehouse

a. Fact Table

Table: trips_cleaned_1.fact_trip
Contains all numeric and transactional data (distance, fare, time, surcharges, tips, etc.).

b. Dimension Tables

dim_payment_type: Maps payment_type_id → payment_type_description
dim_rate_code: Maps RatecodeID → rate description
dim_location: Maps LocationID → Borough, Zone, Service Zone
dim_vendor: Maps VendorID → vendor name

c. Data Model

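Implemented a star schema: fact_trip holds the trip-level measures and joins to each dimension table on its ID column. This keeps descriptive attributes out of the fact table, so analytical queries need only simple joins and remain easy to read.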
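
To make the fact-to-dimension join concrete, here is a small self-contained sketch that uses SQLite in memory as a stand-in for BigQuery. The `trip_id` column and the sample rows are made up for illustration, and only one dimension is shown; the actual table definitions live in the `sql/` directory and may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_payment_type (
    payment_type_id INTEGER PRIMARY KEY,
    payment_type_description TEXT
);
CREATE TABLE fact_trip (
    trip_id INTEGER PRIMARY KEY,  -- hypothetical surrogate key
    payment_type INTEGER REFERENCES dim_payment_type (payment_type_id),
    tip_amount REAL,
    total_amount REAL
);
-- Made-up sample rows, for illustration only.
INSERT INTO dim_payment_type VALUES (1, 'Credit card'), (2, 'Cash');
INSERT INTO fact_trip VALUES
    (1, 1, 2.0, 12.0),
    (2, 1, 4.0, 24.0),
    (3, 2, 0.0, 8.0);
""")

# The fact table supplies the measure; the dimension supplies the label.
rows = conn.execute("""
SELECT dpt.payment_type_description, AVG(ft.tip_amount) AS avg_tip
FROM fact_trip AS ft
JOIN dim_payment_type AS dpt ON ft.payment_type = dpt.payment_type_id
GROUP BY dpt.payment_type_description
ORDER BY dpt.payment_type_description
""").fetchall()

print(rows)  # [('Cash', 0.0), ('Credit card', 3.0)]
```

The other dimensions (rate code, location, vendor) would attach to fact_trip in the same way, each through its own ID column.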

Sample Analytical Queries

Weekly Revenue Trend

SELECT
  EXTRACT(WEEK FROM tpep_pickup_datetime) AS week,
  SUM(total_amount) AS weekly_revenue
FROM trips_cleaned_1.fact_trip
GROUP BY week
ORDER BY week;
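
A note on week numbering, with a Python sketch of the same aggregation. As documented for BigQuery, `EXTRACT(WEEK FROM ...)` counts weeks from Sunday and puts days before the year's first Sunday in week 0, which matches Python's `%U` format code. Grouping by week number alone would merge same-numbered weeks from different years if the data spans more than one year, so the sketch keys on (year, week). The function names and sample trips are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def sunday_based_week(ts: datetime) -> int:
    # %U: Sunday-based week of the year; days before the first Sunday are week 0.
    return int(ts.strftime("%U"))


def weekly_revenue(trips):
    """Sum total_amount per (year, week) from (pickup_datetime, total_amount) pairs."""
    totals = defaultdict(float)
    for pickup, total_amount in trips:
        totals[(pickup.year, sunday_based_week(pickup))] += total_amount
    return dict(sorted(totals.items()))


# Made-up trips around a year boundary (2023-01-01 was a Sunday).
trips = [
    (datetime(2022, 12, 31, 23, 0), 10.0),
    (datetime(2023, 1, 1, 0, 30), 20.0),
    (datetime(2023, 1, 7, 12, 0), 5.0),
    (datetime(2023, 1, 8, 9, 0), 7.5),
]
print(weekly_revenue(trips))
# {(2022, 52): 10.0, (2023, 1): 25.0, (2023, 2): 7.5}
```

If the dataset covers a single year, grouping by week alone, as in the query above, gives the same buckets.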

Average Tip by Payment Type

SELECT
  dpt.payment_type_description,
  AVG(ft.tip_amount) AS avg_tip
FROM fact_trip ft
JOIN dim_payment_type dpt ON ft.payment_type = dpt.payment_type_id
GROUP BY dpt.payment_type_description;

Top Pickup Zones by Revenue

SELECT
  dl.zone AS pickup_zone,
  SUM(ft.total_amount) AS revenue
FROM fact_trip ft
JOIN dim_location dl ON ft.PULocationID = dl.location_id
GROUP BY pickup_zone
ORDER BY revenue DESC
LIMIT 5;

Open BigQuery Dataset

Looker Studio Dashboard

Connected Looker Studio to BigQuery to build an interactive dashboard with the following sections:

Pages:

Overview: Weekly revenue/trips, top pickup zones, trip volume by hour
Operations: Revenue/trips segmented by borough, ratecode, vendor
Revenue Analytics: Component-wise revenue breakdown (e.g., tips, tolls) with waterfall and funnel charts

View Looker Studio Dashboard

Directory Structure

├── data/
│   └── trips_cleaned.csv
├── data_exploration.ipynb
├── sql/
│   ├── create_fact_table.sql
│   ├── create_dim_tables.sql
│   └── insights.sql
├── dashboard/
│   └── looker_studio_link.txt
└── results/
    └── top_5_rows.csv

Deployment & Automation Notes

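Cleaned data is currently uploaded to GCS manually. To automate this step, trigger the load with Cloud Functions or schedule it with Cloud Composer.
BigQuery views are not materialized, so each query against a view reads the current contents of the underlying tables.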
Dashboard is auto-updated via live BigQuery connection.
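
As one possible shape for the Cloud Functions route, the sketch below reacts to a new object in the bucket and loads it into BigQuery. It assumes a Cloud Storage "finalize" trigger that passes an event dict with `bucket` and `name` keys, and that the `google-cloud-bigquery` library is available. The staging table name `trips_cleaned_1.trips_staging` and both function names are hypothetical, and nothing like this is part of the current pipeline.

```python
STAGING_TABLE = "trips_cleaned_1.trips_staging"  # hypothetical destination table
EXPECTED_OBJECT = "trips_cleaned.csv"


def gcs_uri_from_event(event: dict) -> str:
    """Build the gs:// URI of the object that triggered the function."""
    return f"gs://{event['bucket']}/{event['name']}"


def load_cleaned_trips(event: dict, context=None) -> None:
    """Load the uploaded CSV into BigQuery, replacing the staging table."""
    # Ignore any other object written to the bucket.
    if event["name"] != EXPECTED_OBJECT:
        return

    # Imported lazily so the helper above works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # the CSV is assumed to have a header row
        autodetect=True,      # let BigQuery infer the schema
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    # Block until the load job finishes so failures surface in the logs.
    client.load_table_from_uri(
        gcs_uri_from_event(event), STAGING_TABLE, job_config=job_config
    ).result()
```

Rebuilding fact_trip and the dimension tables from the staging table would be a separate step, for example by running the scripts under `sql/` as scheduled queries.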
