data-lake
Here are 158 public repositories matching this topic...
Code of Conduct
- I agree to follow this project's Code of Conduct
Search before asking
- I have searched in the issues and found no similar issues.
Describe the subtask
TPCDSTable should implement SupportsMetadataColumns to expose metadata columns for tables.
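In Spark's DataSourceV2 API, a table opts into metadata columns by implementing the `SupportsMetadataColumns` interface, whose `metadataColumns()` method returns hidden columns a reader can request explicitly. Spark defines this in Java; the following is a self-contained Python sketch of the same shape, with illustrative class and column names that are not Spark's actual API:

```python
from dataclasses import dataclass

# Hypothetical stand-in for Spark's MetadataColumn; the field names mirror
# the Java interface's accessors but are otherwise illustrative.
@dataclass(frozen=True)
class MetadataColumn:
    name: str
    data_type: str
    comment: str = ""

class TPCDSTable:
    """Sketch of a TPC-DS table that also exposes metadata columns."""

    def __init__(self, name, columns):
        self.name = name
        self.columns = columns  # regular data columns

    def metadata_columns(self):
        # Analogous to SupportsMetadataColumns.metadataColumns(): hidden
        # columns a scan can surface on request, e.g. the source file.
        return [
            MetadataColumn("_file", "string", "input file of the row"),
            MetadataColumn("_pos", "long", "row position within the file"),
        ]

table = TPCDSTable("store_sales", ["ss_item_sk", "ss_quantity"])
print([c.name for c in table.metadata_columns()])
```

A query engine would merge these into the table's resolvable schema only when a query references them by name, so ordinary `SELECT *` output is unchanged.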
When an item is added to the queue with the wrong type for the corresponding Data Mapper, the job fails during planning without any information about which data mapper or queue item ID is involved.
Take, for instance, a Data Mapper with an identifier of type int. If we add "foo" to the deletion queue, the find will fail with a log like this:
{
"EventData": {
"Error": "ValueError
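One way to surface the offending item is to validate queue entries against the mapper's expected identifier type before planning starts, so the error can name the queue item ID. This is a hedged sketch of that idea; `validate_queue` and the queue item shape are hypothetical names, not the project's actual API:

```python
# Hypothetical pre-planning validation: check each deletion-queue item
# against the type the Data Mapper's identifier expects, so a bad item is
# reported with its queue item id instead of failing deep inside planning.

EXPECTED_TYPES = {"int": int, "string": str}

def validate_queue(mapper_type, queue):
    """Return (item_id, value) pairs whose value doesn't match mapper_type."""
    expected = EXPECTED_TYPES[mapper_type]
    return [(item["id"], item["value"])
            for item in queue
            if not isinstance(item["value"], expected)]

queue = [
    {"id": "q-1", "value": 42},
    {"id": "q-2", "value": "foo"},  # wrong type for an int identifier
]
print(validate_queue("int", queue))  # the offending item is now identifiable
```

Failing fast here turns an opaque planning-time ValueError into an actionable message that points at the specific queue item.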
Is your feature request related to a problem? Please describe.
Executing all tests already takes about 30 minutes. We should try to optimize that.
Describe the solution you'd like
Much of the time is spent preparing input data by writing test data to DataObjects (CSV or Hive). This could be significantly reduced by creating a custom DataObject where a DataFrame can be set directly as input data.
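The proposed DataObject could look roughly like the sketch below: tests inject their input data in memory instead of writing it to CSV/Hive and reading it back. The class and method names are illustrative, not the framework's actual API, and plain Python rows stand in for a DataFrame:

```python
# Hypothetical in-memory DataObject for tests: the "DataFrame" is set
# directly, skipping the filesystem round-trip entirely.

class InMemoryDataObject:
    def __init__(self, object_id):
        self.object_id = object_id
        self._df = None

    def set_dataframe(self, df):
        # Test code injects its prepared input data here.
        self._df = df

    def get_dataframe(self):
        # Pipeline code reads it back as if it came from storage.
        if self._df is None:
            raise ValueError(f"no data set for {self.object_id}")
        return self._df

src = InMemoryDataObject("src")
src.set_dataframe([{"id": 1, "name": "a"}])
print(src.get_dataframe())
```

Because nothing touches disk or a metastore, per-test setup drops from file-format serialization to a simple assignment.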
If ZeeQS imports records from a Zeebe cluster with multiple partitions, it can happen that variable updates, element instance transitions, and message correlations are not persisted.
The problem is caused by the importer: it uses the record position as the ID for the entities, but positions are not unique across multiple partitions.
Related issue: https://github.com/camunda-community-hub
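The fix idea implied above can be sketched as deriving the entity ID from both the partition ID and the record position, so records from different partitions can no longer collide. The record shape here is illustrative, not Zeebe's actual schema:

```python
# Composite entity ID: partition id + record position. Position alone is
# only unique within one partition, so two partitions can emit the same
# position and overwrite each other's entities.

def entity_id(record):
    return f"{record['partition_id']}-{record['position']}"

records = [
    {"partition_id": 1, "position": 100},
    {"partition_id": 2, "position": 100},  # same position, other partition
]

position_only = {r["position"] for r in records}
composite = {entity_id(r) for r in records}
print(len(position_only), len(composite))  # position-only collides; composite does not
```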
On more advanced versions of LakeFS (probably >= v1.0.0), we would like to remove the logic that tries to fill the generation field in the DB when loading old dumps. This means we will no longer support loading dumps made with a version lower than v0.61.0.
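Once the backfill logic is gone, the loader would simply reject dumps produced before the cutoff version instead of repairing them. A minimal sketch of such a version gate, assuming semantic-version dump metadata; `parse_version` and `MIN_DUMP_VERSION` are illustrative names, not LakeFS code:

```python
# Hypothetical version gate for dump loading: refuse dumps older than
# v0.61.0 rather than backfilling their missing generation field.

MIN_DUMP_VERSION = (0, 61, 0)

def parse_version(v):
    # "v0.61.0" -> (0, 61, 0); tuples compare component-wise.
    return tuple(int(p) for p in v.lstrip("v").split("."))

def can_load_dump(dump_version):
    return parse_version(dump_version) >= MIN_DUMP_VERSION

print(can_load_dump("v0.61.0"), can_load_dump("v0.60.2"))
```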