The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Kestra is an infinitely scalable orchestration and scheduling platform, creating, running, scheduling, and monitoring millions of complex pipelines.
A list of useful resources to learn Data Engineering from scratch
Next-Generation Real-Time Data Processing Platform
The open standard for data logging
Task management and automation tool
A lightweight stream processing library for Go
BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.
Source code accompanying the book Data Science on the Google Cloud Platform by Valliappa Lakshmanan (O'Reilly, 2017)
Open-source data observability for analytics engineers.
Smarter data pipelines for audio.
Example end-to-end data engineering project.
A list about Apache Kafka
Streaming reactive and dataflow graphs in Python
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Code review for data in dbt
Pythonic tool for running data-science/high-performance/quantum-computing workflows in heterogeneous environments.
Use SQL to build ELT pipelines on a data lakehouse.