data-engineering
Here are 1,449 public repositories matching this topic...
Current behavior
Right now, the Azure connection string can be passed as a string at initialization or read from the AZURE_STORAGE_CONNECTION_STRING environment variable.
The connection string property is not serialized with the storage object. The only way to get this to work is to have AZURE_STORAGE_CONNECTION_STRING available when the flow is retrieved from storage. For most agent types, t
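The lookup order described above (explicit argument first, then the environment) can be sketched as follows. This is a minimal illustration, not the library's actual code: the function name `resolve_connection_string` and the `environ` parameter are hypothetical, and only the environment variable name comes from the issue.

```python
import os
from typing import Mapping, Optional

# Environment variable name taken from the issue text.
ENV_VAR = "AZURE_STORAGE_CONNECTION_STRING"


def resolve_connection_string(
    explicit: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the explicit value if given, else fall back to the environment.

    Hypothetical helper: `environ` exists only so the lookup can be tested
    without touching the real process environment.
    """
    env = os.environ if environ is None else environ
    if explicit:
        return explicit
    value = env.get(ENV_VAR)
    if not value:
        raise ValueError(
            f"No connection string passed and {ENV_VAR} is not set"
        )
    return value
```

Because the fallback happens at call time, the variable has to be present wherever the storage object is used, which is the limitation the issue describes.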
Change all instances of Airbyte OSS to Airbyte Open Source across all Docs
Management UI for GE
Is your feature request related to a problem? Please describe.
Most GE operations are executed from the CLI, which is not friendly to non-programmers.
Describe the solution you'd like
A management system with a web UI for creating expectations, querying validation results, and so on.
right now, we silently convert to "default"
from dagster import asset

@asset(group_name="")
def asset():
    ...
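One possible alternative to the silent conversion is to reject an empty group name outright. The sketch below is an assumption about the desired behavior, not Dagster's implementation; `normalize_group_name` and `DEFAULT_GROUP` are hypothetical names used only for illustration.

```python
from typing import Optional

# Assumed fallback value, matching the "default" mentioned in the issue.
DEFAULT_GROUP = "default"


def normalize_group_name(group_name: Optional[str]) -> str:
    """Use the default only when no name is given; reject an empty string."""
    if group_name is None:
        return DEFAULT_GROUP
    if group_name == "":
        # Raising here surfaces the mistake instead of hiding it.
        raise ValueError("group_name must not be an empty string")
    return group_name
```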
-
Updated
Aug 10, 2022 - Go
Is your feature request related to a problem? Please describe.
When creating a SQLite online store your only option is to create it on the filesystem. As every access needs to hit the filesystem then this slows down the online store.
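For context, SQLite itself supports a database that lives entirely in memory when the special path `:memory:` is used. The sketch below shows that with Python's standard `sqlite3` module; the `open_store` helper and the `features` table are hypothetical and are not part of the feature store's API.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a SQLite database; ':memory:' keeps it off the filesystem."""
    conn = sqlite3.connect(path)
    # Hypothetical table layout, used only for this illustration.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features "
        "(entity_key TEXT PRIMARY KEY, value REAL)"
    )
    return conn


conn = open_store()
conn.execute("INSERT INTO features VALUES (?, ?)", ("driver_1", 0.5))
row = conn.execute(
    "SELECT value FROM features WHERE entity_key = ?", ("driver_1",)
).fetchone()
```

An in-memory database disappears when the connection closes, so it only suits cases where the online store does not need to outlive the process.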
Describe the solution you'd like
I'd like an option :memory: to use an in memory SQLite store instead. Eg in feature_store.yaml:
online

When there are not enough results, we tell the user that the experiment just started, so come back later. When the experiment dates are set to a future time, this language doesn't fit very well. We should adjust the language to take this future state into account when figuring out the message.
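The message selection described here could look roughly like the sketch below, which branches on whether the start date lies in the future. The function name and the exact wording of both messages are assumptions for illustration, not the project's actual strings.

```python
from datetime import datetime, timezone
from typing import Optional


def not_enough_data_message(
    start: datetime, now: Optional[datetime] = None
) -> str:
    """Pick a message based on whether the experiment has started yet.

    Both datetimes are expected to be timezone-aware.
    """
    now = now or datetime.now(timezone.utc)
    if start > now:
        # Future start date: "just started" would be misleading.
        return (
            f"This experiment is scheduled to start on {start:%Y-%m-%d}. "
            "Results will appear after it begins."
        )
    return "This experiment just started. Come back later for results."
```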
-
Updated
Aug 10, 2022 - Python
Many lakeFS users integrate it with Spark.
To simplify the search experience of docs, Spark integrations should be a top-level category in our documentation.
document early stop
raising this:
will exit the DAG gracefully, but it's undocumented
-
Updated
Aug 10, 2022 - Java
-
Updated
Jun 28, 2022
Description
Currently, some plugins depend on a specific version of a dynamic library (for example, gitextractor depends on libgit2 v1.3.0), which can be hard to satisfy, and sometimes users don't need those plugins at all.
With support for specifying which plugins to build, users can skip those plugins and compile only the ones they want.
Plus, this would be conve
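The selection logic behind "specifying what plugins to build" can be sketched independently of the build system. The example below is in Python purely for illustration; the comma-separated request format and the `select_plugins` helper are assumptions, not the project's actual build interface.

```python
from typing import Iterable, List, Optional


def select_plugins(available: Iterable[str], requested: Optional[str]) -> List[str]:
    """Return the plugins to build.

    `requested` is a hypothetical comma-separated list; an empty or missing
    value means "build everything".
    """
    available = list(available)
    if not requested:
        return available
    wanted = [name.strip() for name in requested.split(",") if name.strip()]
    unknown = [name for name in wanted if name not in available]
    if unknown:
        raise ValueError(f"Unknown plugins: {', '.join(unknown)}")
    # Preserve the order of `available` so builds stay deterministic.
    return [name for name in available if name in wanted]
```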
-
Updated
Mar 29, 2022 - JavaScript
You might have a column called money in one database and amount in another. Today there is no way to compare columns that have different names across the two databases. In the Python API, perhaps it's sufficient to match columns by their position in the column tuple.
For the CLI, maybe we could use: -c amount:money.
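A parser for the proposed `name1:name2` form might look like the sketch below. It is only an illustration of the suggestion; `parse_column_spec` is a hypothetical helper, and the rule that a bare name maps to itself is an assumption.

```python
from typing import Tuple


def parse_column_spec(spec: str) -> Tuple[str, str]:
    """Split 'amount:money' into one column name per database.

    A spec without a colon is assumed to use the same name on both sides.
    """
    left, sep, right = spec.partition(":")
    if not left or (sep and not right):
        raise ValueError(f"Invalid column spec: {spec!r}")
    return (left, right if sep else left)
```

With this, `-c amount:money` would yield `("amount", "money")`, while `-c id` would yield `("id", "id")`.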
-
Updated
Aug 2, 2022 - Scala
Let's prepare a mixin for interacting with Roles and Policies with the Python client, in case users want to use the API directly.
Include not only list, get, etc., but also utility methods, such as updating a default role. It should wrap the following logic:
import requests
import json
# Get the ID
data_consumer = requests.get("http://localhost:8585/api/v1/roles/name/DataCo
-
Updated
Aug 10, 2022 - Jupyter Notebook
(1) Add docstrings to methods
(2) Convert .format() calls to f-strings for readability
(3) Make sure we are using Python 3.8 throughout
(4) zip extract_all() in ingest_flights.py can be simplified with a Path parameter
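Items (2) and (4) can be illustrated with the standard library alone. The sketch below assumes nothing about the actual contents of ingest_flights.py: `extract_zip` and the file-name pattern are hypothetical, while `zipfile.ZipFile.extractall` accepting a `Path` is standard behavior.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List


def extract_zip(archive: Path, dest: Path) -> List[str]:
    """Extract every member of `archive` into `dest` and return their names."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        # extractall accepts a Path directly; no str() conversion is needed.
        zf.extractall(path=dest)
        return sorted(zf.namelist())


# Item (2): the same string built with .format() and with an f-string.
year, month = 2015, 1
old_style = "{}{:02d}.zip".format(year, month)
new_style = f"{year}{month:02d}.zip"

# Small demo in a temporary directory, using a made-up archive.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / new_style
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("flights.csv", "carrier,delay\n")
    names = extract_zip(archive, Path(tmp) / "out")
    extracted_ok = (Path(tmp) / "out" / "flights.csv").is_file()
```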
-
Updated
Dec 31, 2021
-
Updated
Mar 9, 2020 - Python
Hi,
I am using some basic functions from pyjanitor, such as clean_names() and collapse_levels(), in code that I want to productionise.
There are limits on the size of the production code base.
Currently, if I look at the requirements.txt for just "pyjanitor", it's huge.
I don't think I need all of those dependencies in my code.
How can I remove the unnecessary ones?
-
Updated
Aug 10, 2022 - Python
If they are not class methods, the method is invoked for every test and a session is created for each of those tests.
class PySparkTest(unittest.TestCase):
    @classmethod
    def suppress_py4j_logging(cls):
        logger = logging.getLogger('py4j')
        logger.setLevel(logging.WARN)

    @classmethod
    def create_testing_pyspark_session(cls):
        return Sp
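The point about class-level setup can be demonstrated without Spark. In the sketch below a plain object stands in for the expensive session, and a counter shows that `setUpClass` runs once for the whole class rather than once per test. The class and helper names are hypothetical.

```python
import unittest


class SharedResourceTest(unittest.TestCase):
    created = 0  # counts how many times the shared resource is built

    @classmethod
    def setUpClass(cls):
        # Runs once per class, so the stand-in "session" is built only once.
        SharedResourceTest.created += 1
        cls.resource = object()

    def test_one(self):
        self.assertIsNotNone(self.resource)

    def test_two(self):
        self.assertIsNotNone(self.resource)


def run_suite() -> unittest.TestResult:
    """Run both tests and return the result; resets the counter first."""
    SharedResourceTest.created = 0
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SharedResourceTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Moving the same setup into an instance-level `setUp` would instead build the resource once per test method.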
-
Updated
Jul 20, 2022 - Python
Time-series Bar Chart v2 does not update total values for stacked bar chart when toggling legends.
How to reproduce the bug
The legacy Time-series Bar Chart does not have this issue.