# data-pipeline

Here are 437 public repositories matching this topic...

snowplow / snowplow

The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP

data analytics snowplow data-collection data-pipeline product-analytics marketing-analytics snowplow-pipeline snowplow-events

Updated Mar 8, 2023
Scala

apache / incubator-seatunnel

SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).

real-time offline high-performance apache data-integration sql-engine data-pipeline etl-framework seatunnel

Updated Mar 14, 2023
Java

kestra-io / kestra

Kestra is an infinitely scalable orchestration and scheduling platform, creating, running, scheduling, and monitoring millions of complex pipelines.

workflow data pipeline etl workflow-engine scheduler orchestration data-engineering data-integration elt data-pipeline data-quality low-code data-orchestration data-orchestrator reverse-etl

Updated Mar 14, 2023
Java

adilkhash / Data-Engineering-HowTo

A list of useful resources to learn Data Engineering from scratch

distributed-systems scala cloud-providers data-engineering data-pipeline

Updated Feb 1, 2023

memphisdev / memphis

Next-Generation Real-Time Data Processing Platform

kubernetes golang data enrichment microservices schema-registry message-bus message-queue data-engineering data-pipeline message-broker data-streaming data-stream-processing messaging-queue

Updated Mar 14, 2023
Go

whylabs / whylogs

The open standard for data logging

python data-science machine-learning analytics logging constraints dataset dataops data-pipeline data-quality calculate-statistics data-constraints mlops model-performance ml-pipelines ai-pipelines approximate-statistics statistical-properties

Updated Mar 14, 2023
Jupyter Notebook

pydoit / doit

task management & automation tool

python workflow data-science build-automation task-runner build-tool build-system workflow-management hacktoberfest data-pipeline workflow-automation

Updated Jan 16, 2023
Python

reugn / go-streams

A lightweight stream processing library for Go

Updated Mar 7, 2023
Go

bytedance / bitsail

BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.

real-time big-data high-performance data-lake data-integration flink data-synchronization data-pipeline

Updated Mar 14, 2023
Java

GoogleCloudPlatform / data-science-on-gcp

Source code accompanying the book Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

data-science machine-learning data-visualization data-engineering cloud-computing data-analysis data-processing data-pipeline

Updated Dec 20, 2022
Jupyter Notebook

elementary-data / elementary

Open-source data observability for analytics engineers.

bigquery snowflake data-warehouse dataops data-analysis redshift dbt data-pipelines data-pipeline lineage data-governance data-lineage analytics-engineer dbt-packages data-observability data-reliability dbt-artifacts

Updated Mar 14, 2023
HTML

spotify / klio

Smarter data pipelines for audio.

signal-processing data-pipeline audio-processing media-processing

Updated Mar 23, 2022
Python

damklis / DataEngineeringProject

Example end-to-end data engineering project.

python redis elasticsearch airflow kafka big-data mongodb scraping django-rest-framework s3 data-engineering minio kafka-connect hacktoberfest data-pipeline debezium

Updated Dec 8, 2022
Python

instill-ai / vdp

💧 Versatile Data Pipeline (VDP) is an open-source tool to seamlessly integrate AI for unstructured data into the modern data stack

Updated Mar 14, 2023
Smarty

infoslack / awesome-kafka

A list about Apache Kafka

infrastructure kafka apache-spark stream-processing apache-kafka kafka-streams data-processing data-pipeline streaming-data

Updated Nov 30, 2022

streamlet-dev / tributary

Streaming reactive and dataflow graphs in Python

python streaming kafka stream asynchronous websockets python3 lazy-evaluation data-pipeline reactive-data-streams python-data-streams

Updated Nov 22, 2022
Python

msamogh / nonechucks

Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!

machine-learning torch pytorch data-preprocessing preprocessing data-processing data-cleaning data-pipeline

Updated Sep 22, 2022
Python

InfuseAI / piperider

Code review for data in dbt

python data-science reporting exploratory-data-analysis eda data-visualization code-review pull-requests dbt data-exploration data-pipeline data-quality data-profiling data-testing data-observability data-profiler data-reliability continue-integration dbt-metrics

Updated Mar 14, 2023
Python

AgnostiqHQ / covalent

Pythonic tool for running data-science, high-performance, and quantum-computing workflows in heterogeneous environments.

Updated Mar 14, 2023
Python

cuebook / cuelake

Use SQL to build ELT pipelines on a data lakehouse.

sql apache-spark etl pipelines data-engineering data-lake data-transfer delta data-integration upsert elt data-pipeline datalake data-ingestion spark-sql zeppelin-notebook apache-iceberg lakehouse incremental-updates

Updated May 25, 2022
JavaScript
