data-engineering

The Superset chart table lets users pick the number of rows per page from a setting. Please add an option to select all rows.

Like this:
![image](https://user-images.githubusercontent.com/52438024/141405677-f9e25aef-e0d3-4e99-986a
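Here is a minimal sketch of the requested behavior. It assumes the page-size choices are a fixed list and that the proposed "All" option is represented by `None`. Both choices are illustrative and do not describe Superset's actual implementation:

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical page-size choices; None stands for the proposed "All" option
PAGE_SIZE_OPTIONS: List[Optional[int]] = [10, 20, 50, 100, None]


def paginate(rows: Sequence[T], page_size: Optional[int], page: int = 0) -> List[T]:
    """Return one page of rows, or every row when page_size is None ("All")."""
    if page_size is None:
        return list(rows)
    if page_size <= 0:
        raise ValueError("page_size must be positive or None")
    start = page * page_size
    return list(rows[start:start + page_size])
```

In this sketch, selecting "All" skips slicing entirely, so the table renders as a single page.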

Description

This is not an actual bug, but in the documentation here -> https://docs.prefect.io/orchestration/concepts/api.html#queries
`flow_run` actually needs to be `flow_runs`.
Otherwise the query does not work for me.

Expected Behavior

Documentation should be updated.

Reproduction

Thank you for this great tool!


Broken link in the automatically generated Edit Your Expectation Suite starter notebook: https://docs.greatexpectations.io/en/latest/autoapi/great_expectations/data_asset/index.html?highlight=remove_expectation&utm_source=notebook&utm_medium=edit_expectations#great_expectations.data_

Tell us about the problem you're trying to solve

From a Slack conversation:

Request for the Trello source connector: if everything is rooted at boards, i.e. "return all cards for a board" (and likewise for users, lists, etc.), then allow a list of board IDs in the source connector configuration.
Also I'm not sure the actions stream pull
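One way the requested option could work is sketched below under stated assumptions. The config key `board_ids`, the board records, and the helper names are all hypothetical and are not the Airbyte Trello connector's actual spec. An empty or missing list falls back to syncing every board. Child streams (cards, lists, users) then iterate only over the selected boards:

```python
from typing import Any, Dict, Iterable, List


def select_boards(config: Dict[str, Any], boards: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the boards listed in the hypothetical `board_ids` config option.

    When `board_ids` is absent or empty, all boards are returned, so the
    connector keeps its current "sync everything" behavior by default.
    """
    wanted = set(config.get("board_ids") or [])
    return [b for b in boards if not wanted or b["id"] in wanted]


def card_requests(config: Dict[str, Any], boards: Iterable[Dict[str, Any]]) -> List[str]:
    """Build one (hypothetical) "cards for a board" request path per selected board."""
    return [f"boards/{b['id']}/cards" for b in select_boards(config, boards)]
```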

When on-demand feature views are specified at retrieval time (e.g. `get_X_features`), the output feature vectors also include, for example, request data or dependent feature values, even if users did not request those features.

Expected Behavior

Dependent feature values that were not requested should not be returned in the output.

Current Behavior

Dependent feature values that were not requested are included in the output.
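The expected behavior amounts to projecting the response onto the requested feature references. Below is a minimal sketch on plain dictionaries. It assumes three things: the output maps feature names to lists of values, entity join keys should always be kept, and output columns are named by the part of a reference after the colon. `project_to_requested` is a hypothetical helper and is not part of Feast's API:

```python
from typing import Dict, Iterable, List


def project_to_requested(
    response: Dict[str, List[object]],
    requested: Iterable[str],
    join_keys: Iterable[str] = (),
) -> Dict[str, List[object]]:
    """Drop request data and dependent features that the caller did not ask for.

    Feature references like "view:feature" are matched on the part after the
    colon (an assumption about how output columns are named).
    """
    keep = {ref.split(":", 1)[-1] for ref in requested} | set(join_keys)
    return {name: values for name, values in response.items() if name in keep}
```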

Steps to reprodu

What

Being able to take a data object (or a prefix, such as a partition) and get back the commit that added or modified it.

Why

This is valuable lineage information that is currently available in lakeFS but not easily exposed, and the feature would mimic the behavior of `git blame`.

How

Given the lakeFS API already supports listing the log of commits for an object or prefix (🎉), this could be a `
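The idea can be sketched on top of a commit log for a path. The sketch assumes the log is ordered newest-first and that each commit records the paths it changed. The `Commit` type and `blame` helper below are hypothetical illustrations, not lakeFS API calls:

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Commit:
    commit_id: str
    message: str
    changed_paths: List[str] = field(default_factory=list)


def blame(log: Iterable[Commit], path_or_prefix: str) -> Optional[Commit]:
    """Return the most recent commit that added or modified the object or prefix.

    `log` is assumed to be ordered newest-first, like `git log`. A path ending
    in "/" is treated as a prefix (e.g. a partition directory).
    """
    is_prefix = path_or_prefix.endswith("/")
    for commit in log:
        for changed in commit.changed_paths:
            if changed == path_or_prefix or (is_prefix and changed.startswith(path_or_prefix)):
                return commit
    return None
```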

If these are not class methods, the method is invoked for every test, and a new session is created for each of those tests.

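```python
import logging
import unittest

from pyspark.sql import SparkSession

class PySparkTest(unittest.TestCase):
    @classmethod
    def suppress_py4j_logging(cls):
        # Quiet the verbose py4j gateway logs during tests
        logger = logging.getLogger('py4j')
        logger.setLevel(logging.WARN)

    @classmethod
    def create_testing_pyspark_session(cls):
        # The original snippet is cut off at "return Sp"; a local SparkSession is assumed here
        return SparkSession.builder.master('local[2]').getOrCreate()
```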
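To see why class-level setup matters, here is a stdlib-only sketch. A hypothetical `FakeSession` stands in for a `SparkSession` and counts how often one is built. Creating it in `setUpClass` builds one session per test class, while creating it in `setUp` builds one per test:

```python
import io
import unittest


class FakeSession:
    """Stand-in for an expensive SparkSession; counts how many are created."""
    created = 0

    def __init__(self) -> None:
        FakeSession.created += 1

    def stop(self) -> None:
        pass


class SharedSessionTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Runs once before all tests in this class
        cls.session = FakeSession()

    @classmethod
    def tearDownClass(cls):
        cls.session.stop()

    def test_one(self):
        self.assertIsInstance(self.session, FakeSession)

    def test_two(self):
        self.assertIsInstance(self.session, FakeSession)


class PerTestSessionTest(unittest.TestCase):
    def setUp(self):
        # Runs before every single test method
        self.session = FakeSession()

    def tearDown(self):
        self.session.stop()

    def test_one(self):
        self.assertIsInstance(self.session, FakeSession)

    def test_two(self):
        self.assertIsInstance(self.session, FakeSession)


def sessions_created(case: type) -> int:
    """Run one TestCase class quietly and report how many sessions it built."""
    FakeSession.created = 0
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    assert result.wasSuccessful()
    return FakeSession.created
```

With two tests per class, the shared variant builds one session and the per-test variant builds two. That gap grows with every test added.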

Hi,

I am using some basic functions from pyjanitor, such as `clean_names()` and `collapse_levels()`, in code that I want to productionise.
There are limitations on the size of the production code base.
Currently, if I look at the requirements.txt for "pyjanitor" alone, it is huge.
I don't think my code needs all of those dependencies.
How can I remove the unnecessary ones?

`load_dotted_path` raises the following error if it is unable to load the module:

Traceback (most recent call last):
  File "/Users/Edu/Desktop/import-error/script.py", line 4, in <module>
    load_dotted_path('tests.quality.fn')
  File "/Users/Edu/dev/ploomber/src/ploomber/util/dotted_path.py", line 128, in load_dotted_path
    module = importlib.import_module(mod)
  File "/Users/
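For context, a dotted-path loader typically does three things: it splits the string at its last dot, imports the module part with `importlib.import_module`, and looks up the attribute. The sketch below is a simplified, hypothetical stand-in and not ploomber's implementation. It re-raises import failures with the full dotted path in the message, which is one way to make the error easier to act on:

```python
import importlib
from typing import Any


def load_dotted_path_sketch(dotted_path: str) -> Any:
    """Load an attribute such as "package.module.fn" (hypothetical helper)."""
    mod_name, sep, attr_name = dotted_path.rpartition(".")
    if not sep:
        raise ValueError(f"Invalid dotted path {dotted_path!r}: expected 'module.attribute'")
    try:
        module = importlib.import_module(mod_name)
    except ModuleNotFoundError as e:
        # Keep the original exception chained; add which dotted path failed
        raise ModuleNotFoundError(
            f"Could not import module {mod_name!r} while loading {dotted_path!r}: {e}"
        ) from e
    try:
        return getattr(module, attr_name)
    except AttributeError as e:
        raise AttributeError(f"Module {mod_name!r} has no attribute {attr_name!r}") from e
```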

Is there an existing issue for this?

I have searched the existing issues

Current Behavior

The description in the REST resource class `GlossaryAuthorGraphRESTResource` comes from the Subject Area OMAS:

@Tag(name="Subject Area OMAS", description="The Subject Area OMAS supports subject matter experts who are documenting their knowledge about a particular subject. This includes g

Here are 998 public repositories matching this topic...

apache / superset

eugeneyan / applied-ml

andkret / Cookbook

datastacktv / data-engineer-roadmap

PrefectHQ / prefect

great-expectations / great_expectations

airbytehq / airbyte

Jeffail / benthos

feast-dev / feast

awslabs / aws-data-wrangler

adilkhash / Data-Engineering-HowTo

treeverse / lakeFS

kantord / just-dashboard

quiltdata / quilt

benthecoder / yt-channels-DS-AI-ML-CS

GoogleCloudPlatform / data-science-on-gcp

san089 / goodreads_etl_pipeline

AlexIoannides / pyspark-example-project

pyjanitor-devs / pyjanitor

abhishek-ch / around-dataengineering

ploomber / ploomber

oleg-agapov / data-engineering-book

san089 / Udacity-Data-Engineering-Projects

gunnarmorling / awesome-opensource-data-engineering

automaticmode / active_workflow

sodadata / soda-sql

odpi / egeria

dataform-co / dataform

kevintpeng / Learn-Something-Every-Day

Cascading / cascading