An end-to-end GoodReads data pipeline for building a data lake, a data warehouse, and an analytics platform.
Updated Mar 9, 2020 · Python
A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
Build, run and manage your data pipelines with Python or SQL on any cloud
Project demonstrating how to automate Prefect 2.0 deployments to AWS ECS Fargate
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
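The Spark-to-Redshift pipeline above follows the classic extract → transform → load pattern with an orchestrator enforcing task order. A minimal stdlib sketch of that pattern is below; the in-memory `staging` and `warehouse` structures and all task bodies are hypothetical stand-ins (the actual repo runs Spark jobs against Redshift, scheduled by Airflow).

```python
from graphlib import TopologicalSorter  # stdlib DAG helper, Python 3.9+

# Hypothetical in-memory stand-ins for a staging area and the warehouse
# (assumptions for illustration, not the repo's actual code).
staging: dict = {}
warehouse: list = []

def extract():
    # Pull raw records from a source system into staging.
    staging["raw"] = [{"id": 1, "amount": "10"}, {"id": 2, "amount": "25"}]

def transform():
    # Cast types, as a Spark transformation step might.
    staging["clean"] = [
        {"id": r["id"], "amount": int(r["amount"])} for r in staging["raw"]
    ]

def load():
    # Append the cleaned batch to the warehouse table.
    warehouse.extend(staging["clean"])

# Task dependencies, Airflow-style: extract >> transform >> load.
dag = {"transform": {"extract"}, "load": {"transform"}}
tasks = {"extract": extract, "transform": transform, "load": load}

# Run tasks in dependency order, as an orchestrator would.
for name in TopologicalSorter(dag).static_order():
    tasks[name]()
```

In Airflow the same ordering would be declared with operators and the `>>` dependency syntax inside a DAG definition rather than executed inline.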
Code examples showing flow deployment to various types of infrastructure
Classwork projects and homework completed through the Udacity Data Engineering Nanodegree
Deploy a Prefect flow to a serverless AWS Lambda function
ETL pipeline combined with supervised learning and grid search to classify text messages sent during a disaster event
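Grid search, as used in the disaster-message project above, just means exhaustively evaluating each hyperparameter setting and keeping the best. Here is a minimal stdlib sketch over a toy keyword classifier; the sample messages, keyword set, and threshold parameter are all hypothetical (the actual project would use a real corpus and a library such as scikit-learn).

```python
# Toy labeled messages (1 = needs aid) standing in for a disaster dataset.
train = [
    ("need water and food", 1),
    ("trapped under rubble send help", 1),
    ("lovely weather today", 0),
    ("watching the game tonight", 0),
]

# Hypothetical aid-related vocabulary for the toy classifier.
AID_WORDS = {"water", "food", "help", "rescue", "trapped"}

def predict(text, threshold):
    # Classify as 'needs aid' if enough aid-related keywords appear.
    hits = sum(word in AID_WORDS for word in text.split())
    return 1 if hits >= threshold else 0

def accuracy(threshold):
    # Fraction of training messages classified correctly.
    correct = sum(predict(text, threshold) == label for text, label in train)
    return correct / len(train)

# Exhaustive grid search over the single hyperparameter.
grid = [1, 2, 3]
best = max(grid, key=accuracy)
```

A real pipeline would cross-validate instead of scoring on the training set and would search over several parameters at once (e.g. with `itertools.product` or scikit-learn's `GridSearchCV`).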
Apache Spark Guide
Data Engineering pipeline hosted entirely in the AWS ecosystem utilizing DocumentDB as the database
Marshmallow serializer integration with pyspark
A data engineering pipeline for digital marketers.
Hiring challenge for a Data Scientist position
Learnings from multiple Silicon Valley companies: Netflix, Facebook, Google, and startups
Solution for the Ultimate Student Hunt Challenge (1st place).
Using Great Expectations and Notion's API, this repo aims to provide data quality for our databases in Notion.
A scalable, flexibly deployable solution for analyzing social media content
An environment for analyzing Twitter