data-engineering

Screenshot

Description

Unnecessary scrollbar

Design input

Please remove them.

cc @yousoph

Current behavior

Right now, the Azure connection string can either be passed as a string at initialization or read from the AZURE_STORAGE_CONNECTION_STRING environment variable.
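The two sources described above can be sketched as follows. This is only an illustration: `ExampleStorage` is a hypothetical stand-in, not the library's actual storage class, and reading the environment lazily at access time is an assumption made here.

```python
import os

ENV_VAR = "AZURE_STORAGE_CONNECTION_STRING"


class ExampleStorage:
    """Hypothetical stand-in for a storage class; not the real API."""

    def __init__(self, connection_string=None):
        # Remember only what the caller passed explicitly.
        self._explicit = connection_string

    @property
    def connection_string(self):
        # An explicit value wins; otherwise read the environment at access time.
        value = self._explicit or os.environ.get(ENV_VAR)
        if value is None:
            raise ValueError(f"pass connection_string or set {ENV_VAR}")
        return value
```

With this shape, an object created without an explicit value only works where the environment variable is set at the moment the property is read.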

The connection string property is not serialized with the storage object. The only way to get this to work is to have AZURE_STORAGE_CONNECTION_STRING available when the flow is retrieved from storage. For most agent types, t

Describe the bug
When trying to run the scaffolding (profiling) command, it fails because of commas in columns.

To Reproduce
Steps to reproduce the behavior:

Run great_expectations suite scaffold scaffold-name on a datasource where columns contain commas
Bug: pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 5323 saw 2
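The "Expected 1 fields ... saw 2" message indicates a row that splits into more fields than the parser expected. A minimal sketch of that effect, using Python's standard csv module rather than pandas or Great Expectations, with made-up sample data:

```python
import csv
import io

# Hypothetical data: the last row contains a comma inside the value.
unquoted = "name\nalpha\nbeta,gamma\n"
quoted = 'name\nalpha\n"beta,gamma"\n'


def field_counts(text):
    """Return the number of fields the csv reader sees on each line."""
    return [len(row) for row in csv.reader(io.StringIO(text))]


unquoted_counts = field_counts(unquoted)  # the comma splits the last row
quoted_counts = field_counts(quoted)      # quoting keeps it as one field
```

Whether unquoted commas are the actual cause in the reported datasource is an assumption; the sketch only shows how such a row yields an extra field.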

Expected behavior
D

The current system tests use the default admin user for all requests.

The auth test should create users with fewer privileges and check that actions pass or are blocked according to their permissions.
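A test of this kind might look roughly like the sketch below. The roles, actions, and the `is_allowed` helper are all hypothetical, since the text does not describe the real system's permission model.

```python
import unittest

# Hypothetical role-to-action mapping; the real roles are not given in the text.
PERMISSIONS = {
    "admin": {"read", "write", "delete"},
    "viewer": {"read"},
}


def is_allowed(role, action):
    """Return True if the (hypothetical) role may perform the action."""
    return action in PERMISSIONS.get(role, set())


class AuthTest(unittest.TestCase):
    def test_viewer_can_read(self):
        self.assertTrue(is_allowed("viewer", "read"))

    def test_viewer_cannot_delete(self):
        # A less privileged user must be blocked, not just the admin allowed.
        self.assertFalse(is_allowed("viewer", "delete"))

    def test_admin_can_delete(self):
        self.assertTrue(is_allowed("admin", "delete"))
```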

Hi,

I am using some basic functions from pyjanitor, such as clean_names() and collapse_levels(), in code that I want to productionise.
There are limits on the size of the production code base.
Currently, if I look at the requirements.txt for "pyjanitor" alone, it is huge.
I don't think I need all of those dependencies in my code.
How can I remove the unnecessary ones?
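One way to start is to inspect what an installed package declares as its dependencies. The sketch below uses the standard library's importlib.metadata; the helper name `declared_requirements` is made up for this example, and it only lists requirements, it does not decide which ones are safe to drop.

```python
from importlib.metadata import PackageNotFoundError, requires


def declared_requirements(distribution):
    """List the requirement strings an installed distribution declares."""
    try:
        # requires() returns None when nothing is declared.
        return list(requires(distribution) or [])
    except PackageNotFoundError:
        # Not installed in this environment, so nothing to report.
        return []


for requirement in declared_requirements("pyjanitor"):
    print(requirement)
```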

If they are not class methods, the method is invoked for every test and a session is created for each of those tests.

class PySparkTest(unittest.TestCase):
    @classmethod
    def suppress_py4j_logging(cls):
        logger = logging.getLogger('py4j')
        logger.setLevel(logging.WARN)

    @classmethod
    def create_testing_pyspark_session(cls):
        return Sp
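The difference between per-class and per-test setup can be seen with a small counter, sketched here with plain unittest and no Spark. The class name and the `resource` attribute are hypothetical; `object()` stands in for an expensive session.

```python
import unittest


class SharedResourceTest(unittest.TestCase):
    class_setups = 0  # incremented once per class
    test_setups = 0   # incremented once per test

    @classmethod
    def setUpClass(cls):
        # Stand-in for creating an expensive shared resource such as a session.
        SharedResourceTest.class_setups += 1
        cls.resource = object()

    def setUp(self):
        SharedResourceTest.test_setups += 1

    def test_one(self):
        self.assertIsNotNone(self.resource)

    def test_two(self):
        self.assertIsNotNone(self.resource)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SharedResourceTest)
result = unittest.TextTestRunner().run(suite)
```

After one run of the suite, the class-level counter has moved once while the per-test counter has moved once for each test method.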

To a lesser extent this occurs with Governance Program OMAS, Org

Same as #281 but for SQLAlchemy client:

https://github.com/ploomber/ploomber/blob/088c7f2b3605b4f624f804e206078d7d8d35baf5/src/ploomber/clients/db.py#L107

We can use the flavor property in the constructor to determine whether we are dealing with a SQLite database or not.
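A rough sketch of that idea follows. `ExampleDBClient` is hypothetical and not the project's actual client, and deriving the flavor by splitting the URI string is an assumption made to keep the example dependency-free.

```python
class ExampleDBClient:
    """Hypothetical client; not the project's actual SQLAlchemy client."""

    def __init__(self, uri):
        self._uri = uri
        # Decide once, at construction time, whether this is SQLite.
        self.is_sqlite = self.flavor == "sqlite"

    @property
    def flavor(self):
        # "postgresql+psycopg2://..." -> "postgresql"; "sqlite:///x.db" -> "sqlite"
        scheme = self._uri.split("://", 1)[0]
        return scheme.split("+", 1)[0]
```

For example, `ExampleDBClient("sqlite:///my.db").is_sqlite` is true, while a `postgresql+psycopg2://` URI gives the flavor `postgresql`.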

Here are 801 public repositories matching this topic...

apache / superset

eugeneyan / applied-ml

andkret / Cookbook

datastacktv / data-engineer-roadmap

PrefectHQ / prefect

great-expectations / great_expectations

Jeffail / benthos

adilkhash / Data-Engineering-HowTo

awslabs / aws-data-wrangler

kantord / just-dashboard

treeverse / lakeFS

quiltdata / quilt

GoogleCloudPlatform / data-science-on-gcp

san089 / goodreads_etl_pipeline

pyjanitor-devs / pyjanitor

AlexIoannides / pyspark-example-project

oleg-agapov / data-engineering-book

san089 / Udacity-Data-Engineering-Projects

rich-iannone / pointblank

automaticmode / active_workflow

abhishek-ch / around-dataengineering

gunnarmorling / awesome-opensource-data-engineering

odpi / egeria

dataform-co / dataform

kevintpeng / Learn-Something-Every-Day

Cascading / cascading

sodadata / soda-sql

ploomber / ploomber

sderosiaux / every-single-day-i-tldr

alexklibisz / elastik-nearest-neighbors
