MediaMarkt.de Data Analytics Project (GCP) – Looker Dashboard 🔗

This repository showcases an end-to-end data pipeline project built specifically to demonstrate my skills as a Data Analyst candidate for MediaMarkt.

Note on Ethics:
I am fully aware that large-scale data scraping without permission is not ethical.
For this project, I intentionally collected only a very small sample of publicly available product information, purely for learning and demonstration purposes.

This project demonstrates proficiency in:

🐍 Python for data extraction and automation
🗄️ BigQuery SQL for data modeling and cleaning
☁️ Cloud Run / Cloud Scheduler for scalable pipelines
📊 Looker Studio for visualization and reporting

The dashboard and pipeline architecture are inspired by real business requirements at MediaMarkt, showing how I can turn raw data into actionable insights as a data analyst.

🏗 Architecture

The pipeline consists of the following components:

🕸 Web Scraping (Cloud Run + Python)
📂 File Storage (CSV in GCP Cloud Storage)
🗄 Data Transformation (BigQuery SQL + Views)
📊 Reporting (Looker Studio BI)

🔄 Pipeline Steps

1. 🕸 Web Scraping – Cloud Run

Scrapes MediaMarkt.de product data using Python.
Deployed on Cloud Run.
Scheduled with Cloud Scheduler.
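
As a rough illustration of the extraction step, here is a minimal sketch that parses product names and prices out of HTML using only the Python standard library. The markup (`div.product`, `span.name`, `span.price`), the `Product` class, and the sample data are all hypothetical; the real page structure and the scraper in this repository may look quite different.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Product:
    name: str
    price: str  # raw price text; cleaned later in the pipeline


class ProductParser(HTMLParser):
    """Collects products from hypothetical markup of the form
    <div class="product"><span class="name">..</span><span class="price">..</span></div>
    """

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None   # field currently being read ("name" or "price")
        self._current = {}   # fields gathered for the product in progress

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class", "")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current.keys() >= {"name", "price"}:
            # A product block is complete once both fields were seen.
            self.products.append(Product(**self._current))
            self._current = {}


def parse_products(html: str) -> list[Product]:
    parser = ProductParser()
    parser.feed(html)
    return parser.products


# Made-up sample input, not real MediaMarkt.de data.
SAMPLE_HTML = """
<div class="product"><span class="name">Example Laptop</span><span class="price">1.299,99 €</span></div>
<div class="product"><span class="name">Example Phone</span><span class="price">499,00 €</span></div>
"""

products = parse_products(SAMPLE_HTML)
```

In a Cloud Run deployment, a function like `parse_products` would typically sit behind a small HTTP handler that Cloud Scheduler calls on a fixed schedule.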

2. 📂 File Storage – Cloud Storage

Scraped data is exported as CSV.
CSV files are stored in a GCP Cloud Storage bucket.
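
The export step could be sketched as follows. The column names, sample rows, and the `rows_to_csv` and `upload_csv` helpers are assumptions made for illustration. The upload helper relies on the `google-cloud-storage` client library and valid GCP credentials, so it is defined but not called here.

```python
import csv
import io


def rows_to_csv(rows: list[dict], fieldnames: list[str]) -> str:
    """Serialize scraped rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def upload_csv(bucket_name: str, blob_name: str, csv_text: str) -> None:
    """Upload CSV text to a Cloud Storage bucket.

    Needs the google-cloud-storage package and GCP credentials.
    """
    from google.cloud import storage  # imported lazily so the rest runs without it

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_string(csv_text, content_type="text/csv")


# Made-up rows; the real column set may differ.
rows = [
    {"name": "Example Laptop", "price": "1.299,99 €"},
    {"name": "Example Phone", "price": "499,00 €"},
]
csv_text = rows_to_csv(rows, ["name", "price"])
```

Because German-formatted prices contain commas, the `csv` module quotes those fields automatically, which keeps the file loadable by BigQuery later on.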

3. 🗄 Data Transformation – BigQuery

BigQuery loads CSV data from Cloud Storage.
SQL scripts clean, transform, and normalize the data.
A clean BigQuery view is created for analytics.
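
A minimal sketch of this step, driven from Python, might look like the block below. The table and view names, the `name` and `price` columns, and the price-cleaning rule are all assumptions; the actual SQL scripts in this repository may clean the data differently. The two functions that talk to BigQuery need the `google-cloud-bigquery` package and credentials, so they are defined but not called.

```python
import re


def build_clean_view_sql(raw_table: str, view_name: str) -> str:
    """Render a CREATE OR REPLACE VIEW statement for a cleaned product view."""
    return f"""
CREATE OR REPLACE VIEW `{view_name}` AS
SELECT DISTINCT
  TRIM(name) AS name,
  -- "1.299,99 €" -> 1299.99: keep digits and separators, drop the
  -- thousands dot, then turn the decimal comma into a dot
  SAFE_CAST(
    REPLACE(REPLACE(REGEXP_REPLACE(price, r'[^0-9.,]', ''), '.', ''), ',', '.')
    AS NUMERIC
  ) AS price_eur
FROM `{raw_table}`
WHERE name IS NOT NULL
""".strip()


def normalize_price(text: str) -> float:
    """Local mirror of the SQL price expression, handy for quick checks."""
    digits = re.sub(r"[^0-9.,]", "", text)
    return float(digits.replace(".", "").replace(",", "."))


def load_csv_from_gcs(uri: str, table_id: str) -> None:
    """Load a CSV file from Cloud Storage into a raw BigQuery table."""
    from google.cloud import bigquery  # lazy import: needs google-cloud-bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # skip the header row
        autodetect=True,      # let BigQuery infer the schema
    )
    client.load_table_from_uri(uri, table_id, job_config=job_config).result()


def create_clean_view(raw_table: str, view_name: str) -> None:
    """Create or replace the clean view by running the rendered SQL."""
    from google.cloud import bigquery

    bigquery.Client().query(build_clean_view_sql(raw_table, view_name)).result()


# Hypothetical fully qualified names, used only to render the statement.
sql = build_clean_view_sql(
    "my-project.mediamarkt.products_raw",
    "my-project.mediamarkt.products_clean",
)
```

Keeping the cleaned data behind a view rather than a copied table means the dashboard always reads the latest loaded rows without a separate refresh step.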

4. 📊 Reporting – Looker Studio

Looker Studio connects to the clean BigQuery view.
Dashboards and reports are created for analytics.

🏆 Results – Looker Dashboard

The final output of this project is an interactive Looker Studio dashboard that provides insights on MediaMarkt products.

Live Dashboard: Click here to explore the dashboard

🛠 Technology Stack

🐍 Scraping: Python, Cloud Run
☁️ Data Storage: Cloud Storage (CSV)
🗄 Processing: BigQuery SQL
📊 Visualization: Looker Studio (BI)
⏱ Orchestration: Cloud Scheduler

