This repository is currently a work in progress.
This repository is part of a larger project.
This project was completed as part of a Data Engineering Bootcamp at Le Wagon Paris and presented at Demo Day on November 8, 2024 (View Project Demo Slides).
The objective of this project was to build a complete ETL and machine learning pipeline, from data ingestion to an end-user interface, using tools covered in the bootcamp. Given a four-day timeframe, we built on earlier bootcamp exercises, which let us focus on optimizing the pipeline and studying its performance.
Repositories that are part of the Taxifare Project:
- Taxifare:
A data engineering pipeline that ingests, processes, and stores NYC taxi ride data in cloud storage and a data warehouse.
  - Distributed processing with Spark on Dataproc
  - Job orchestration using Airflow
  - Cloud storage on Google Cloud Storage
  - Analytical warehouse with BigQuery
- Taxifare API:
A cloud-deployed API providing a prediction endpoint.
  - Built with FastAPI and Gunicorn
  - Deployed on Google Cloud Run, using a Docker image hosted in Artifact Registry
- Taxifare Front:
A Streamlit application that allows users to predict taxi fares with our model.
Work in progress
