data-engineering

Time-series Bar Chart v2 does not update total values for stacked bar chart when toggling legends.

How to reproduce the bug

Create a "Time-series Bar Chart v2"
Go to "Customize" and select "Show value", "Stack series" and "Only total"
Toggle series in legends
The total value should update but it doesn't

The legacy Time-series Bar Chart does not have this issue.

Current behavior

Right now, the connection string to Azure can be passed as a string at initialization or read from the AZURE_STORAGE_CONNECTION_STRING environment variable.
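
The lookup order described above can be sketched as follows. This is an illustrative helper only: `resolve_connection_string` is a hypothetical name and not part of Prefect's API; only the environment variable name comes from the issue.

```python
import os
from typing import Optional

ENV_VAR = "AZURE_STORAGE_CONNECTION_STRING"


def resolve_connection_string(explicit: Optional[str] = None) -> str:
    """Hypothetical helper: prefer the explicit argument, else the environment."""
    value = explicit or os.environ.get(ENV_VAR)
    if not value:
        raise ValueError(
            f"No connection string was passed and {ENV_VAR} is not set"
        )
    return value
```

With this order, an object that was deserialized without its connection string can only work if the environment variable is present at that moment, which is the limitation the issue describes.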

The connection string property is not serialized with the storage object. The only way to get this to work is to have AZURE_STORAGE_CONNECTION_STRING available when the flow is retrieved from storage. For most agent types, t

Change all instances of Airbyte OSS to Airbyte Open Source across all Docs

Is your feature request related to a problem? Please describe.
Most GE operations are executed from the CLI, which is not friendly to non-programmers.

Describe the solution you'd like
A management system with a web UI to create expectations, query validation results, etc.

Right now, we silently convert an empty group_name to "default":

from dagster import asset


@asset(group_name="")
def asset():
    ...

Is your feature request related to a problem? Please describe.
When creating a SQLite online store, your only option is to create it on the filesystem. Because every access has to hit the filesystem, this slows down the online store.

Describe the solution you'd like
I'd like a :memory: option to use an in-memory SQLite store instead, e.g. in feature_store.yaml:

online
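
For context, SQLite itself already supports this through the special `:memory:` path. The following is a minimal sketch using Python's standard `sqlite3` module; the table and column names are hypothetical and are not Feast's actual schema.

```python
import sqlite3

# ":memory:" keeps the whole database in RAM; no file is created on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table, for illustration only.
conn.execute("CREATE TABLE features (entity_key TEXT PRIMARY KEY, value REAL)")
conn.execute("INSERT INTO features VALUES (?, ?)", ("driver_1", 0.5))

row = conn.execute(
    "SELECT value FROM features WHERE entity_key = ?", ("driver_1",)
).fetchone()
```

One caveat worth keeping in mind: an in-memory database belongs to the connection that created it, and its contents are gone once that connection is closed.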

When there are not enough results, we tell the user that the experiment has just started and to come back later. When the experiment dates are set in the future, this language doesn't fit well. We should take this future state into account when choosing the message.

Many lakeFS users integrate it with Spark.
To simplify the search experience of docs, Spark integrations should be a top-level category in our documentation.

Raising this:

https://github.com/ploomber/ploomber/blob/2fe474987ae6fca088e02399dbff99672a17fc95/src/ploomber/exceptions.py#L83

will exit the DAG gracefully, but it's undocumented.

Description

Currently, some plugins depend on a specific version of a dynamic library (for example, gitextractor depends on libgit2 v1.3.0), which can be hard to satisfy, and sometimes users don't need those plugins at all.
With support for "specifying what plugins to build", users can skip those plugins and compile only the ones they want.
Plus, this would be conve

You might have a column called money in one database and amount in another. Today there is no way for a column to have different names in the two databases. In the Python API, perhaps it's sufficient to match columns by their position in the column tuple.

For the CLI, maybe we could use: -c amount:money.
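
Both ideas can be sketched in a few lines. The helpers below are hypothetical and only illustrate the proposal; they are not part of data-diff, and the `left:right` syntax is the one suggested above, not an existing flag format.

```python
from typing import List, Sequence, Tuple


def parse_column_spec(spec: str) -> Tuple[str, str]:
    """Parse 'amount:money' into ('amount', 'money'); a bare name maps to itself."""
    left, sep, right = spec.partition(":")
    if not left or (sep and not right):
        raise ValueError(f"Invalid column spec: {spec!r}")
    return (left, right) if sep else (left, left)


def pair_by_position(
    cols1: Sequence[str], cols2: Sequence[str]
) -> List[Tuple[str, str]]:
    """Match columns of the two tables by their position in each tuple."""
    if len(cols1) != len(cols2):
        raise ValueError("Both tables must list the same number of columns")
    return list(zip(cols1, cols2))
```

For example, `parse_column_spec("amount:money")` yields the pair `("amount", "money")`, while `pair_by_position(("id", "amount"), ("id", "money"))` produces the same kind of pairing from two column tuples.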

Let's prepare a mixin for interacting with Roles and Policies with the Python client, in case users want to use the API directly.

It should offer not only list, get, etc., but also utility methods, such as updating a default role. It should wrap the following logic:

import requests
import json

# Get the ID
data_consumer = requests.get("http://localhost:8585/api/v1/roles/name/DataCo

(1) Add docstrings to methods
(2) Convert .format() calls to f-strings for readability
(3) Make sure we are using Python 3.8 throughout
(4) zip extractall() in ingest_flights.py can be simplified with a Path parameter
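
Items (1), (2) and (4) could look roughly like this. It is a sketch under assumptions: `unzip_to` is a hypothetical function and does not reproduce the actual code in ingest_flights.py. It relies only on `zipfile.ZipFile.extractall`, which accepts a path-like `path` argument.

```python
import zipfile
from pathlib import Path
from typing import List


def unzip_to(archive: Path, dest_dir: Path) -> List[str]:
    """Extract every member of `archive` into `dest_dir` and return their names."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        # extractall() takes a Path directly, so no os.path string handling is needed.
        zf.extractall(path=dest_dir)
        names = sorted(zf.namelist())
    # f-string instead of "...{}".format(...)
    print(f"Extracted {len(names)} file(s) to {dest_dir}")
    return names
```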

Hi,

I am using some basic pyjanitor functions, such as clean_names() and collapse_levels(), in code that I want to productionise.
There are limits on the size of the production code base.
Currently, if I look at the requirements.txt for just "pyjanitor", it's huge.
I don't think I need all of those dependencies in my code.
How can I remove the unnecessary ones?

If they are not class methods, the method would be invoked for every test, and a session would be created for each of those tests.

class PySparkTest(unittest.TestCase):
    @classmethod
    def suppress_py4j_logging(cls):
        logger = logging.getLogger('py4j')
        logger.setLevel(logging.WARN)

    @classmethod
    def create_testing_pyspark_session(cls):
        return Sp
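
The point about class methods can be illustrated without Spark. In the sketch below, `FakeSession` is a hypothetical stand-in for an expensive session object; `unittest` calls `setUpClass` once per test class, so the counter ends at 1 even though two tests run.

```python
import io
import unittest


class FakeSession:
    """Hypothetical stand-in for an expensive session; counts how often it is built."""

    created = 0

    def __init__(self):
        FakeSession.created += 1


class SharedSessionTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Runs once for the whole class, not once per test method.
        cls.session = FakeSession()

    def test_one(self):
        self.assertIsNotNone(self.session)

    def test_two(self):
        self.assertIsNotNone(self.session)


def run_suite() -> unittest.TestResult:
    """Run SharedSessionTest quietly and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SharedSessionTest)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```

If the session were created in `setUp` instead, the counter would increase once per test method.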

Here are 1,449 public repositories matching this topic...

apache / superset

eugeneyan / applied-ml

andkret / Cookbook

datastacktv / data-engineer-roadmap

PrefectHQ / prefect

airbytehq / airbyte

great-expectations / great_expectations

DataTalksClub / data-engineering-zoomcamp

dagster-io / dagster

benthosdev / benthos

feast-dev / feast

growthbook / growthbook

awslabs / aws-data-wrangler

treeverse / lakeFS

ploomber / ploomber

kestra-io / kestra

adilkhash / Data-Engineering-HowTo

apache / incubator-devlake

kantord / just-dashboard

datafold / data-diff

metarank / metarank

open-metadata / OpenMetadata

quiltdata / quilt

GoogleCloudPlatform / data-science-on-gcp

benthecoder / yt-channels-DS-AI-ML-CS

san089 / goodreads_etl_pipeline

pyjanitor-devs / pyjanitor

sodadata / soda-core

AlexIoannides / pyspark-example-project

abhishek-ch / around-dataengineering
