data-engineering
Here are 1,048 public repositories matching this topic...
Current behavior
Uploading a file with the same name as an existing blob raises an error:
azure.core.exceptions.ResourceExistsError: The specified blob already exists.
RequestId:5bef0cf1-b01e-002e-6
Proposed behavior
The task should accept an `overwrite` argument and pass it through to [this line](https://github.com/PrefectHQ/prefect/blob/6cd24b023411980842fa77e6c0ca2ced47eeb83e/src/prefect/
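A minimal sketch of the proposed behavior, assuming the task forwards the flag to azure-storage-blob's `BlobClient.upload_blob`, which accepts an `overwrite` keyword and raises `ResourceExistsError` when the blob exists and `overwrite` is false. The function `blob_upload_task` and the `InMemoryBlobClient` stand-in are hypothetical, used so the example runs without Azure credentials; this is not Prefect's actual task code.

```python
from typing import Any, Dict


class InMemoryBlobClient:
    """Hypothetical stand-in for azure.storage.blob.BlobClient.

    It mimics one documented behavior: upload_blob() refuses to replace an
    existing blob unless overwrite=True is passed.
    """

    def __init__(self, store: Dict[str, bytes], blob_name: str) -> None:
        self.store = store
        self.blob_name = blob_name

    def upload_blob(self, data: bytes, overwrite: bool = False) -> None:
        if self.blob_name in self.store and not overwrite:
            # The real SDK raises azure.core.exceptions.ResourceExistsError here.
            raise FileExistsError("The specified blob already exists.")
        self.store[self.blob_name] = data


def blob_upload_task(blob_client: Any, data: bytes, overwrite: bool = False) -> None:
    """Hypothetical task body: forward `overwrite` instead of dropping it."""
    blob_client.upload_blob(data, overwrite=overwrite)


store: Dict[str, bytes] = {}
client = InMemoryBlobClient(store, "report.csv")
blob_upload_task(client, b"v1")                  # first upload succeeds
blob_upload_task(client, b"v2", overwrite=True)  # replaces the existing blob
```

Keeping `False` as the default preserves the current behavior for existing flows, so only callers that opt in can replace a blob.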
Describe the bug
Data Docs columns shrink to a width of one character when the batch is built from a long query.
To Reproduce
Steps to reproduce the behavior:
- Make a batch from a long query string
- Run validation
- Render the result to Data Docs
- See the screenshot
[Screenshot: Data documentation compiled by Great Expectations]
Is your feature request related to a problem? Please describe.
I have a framework that handles the offline store. It creates the tables and indexes, reads data from different data sources, applies some transformations, and then inserts the results into the offline store. As part of this, I can construct the entities, feature views, feature services, etc., that is, an instance of Feast's ParsedRepo class. What I n
The stack trace returned from failed API operations in the HadoopFS implementation loses important context (the error message, the HTTP status code, the error body in the response, and the request ID), which makes configuration issues very hard to debug without tailing the logs of the lakeFS server(s) themselves.
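The missing context can be kept by building the exception message from every field of the failed response. The sketch below illustrates that pattern in Python only; the real HadoopFS client is JVM code, and the `ApiOperationError` class and its fields are hypothetical names chosen for illustration.

```python
from typing import Optional


class ApiOperationError(Exception):
    """Hypothetical error that keeps server-side context in its message."""

    def __init__(self, operation: str, status: int, message: str,
                 body: str = "", request_id: Optional[str] = None) -> None:
        self.operation = operation
        self.status = status
        self.message = message
        self.body = body
        self.request_id = request_id
        # The formatted text is what ends up in the stack trace.
        super().__init__(self._describe())

    def _describe(self) -> str:
        parts = [f"{self.operation} failed: HTTP {self.status}: {self.message}"]
        if self.request_id:
            parts.append(f"request_id={self.request_id}")
        if self.body:
            parts.append(f"body={self.body}")
        return " | ".join(parts)
```

When wrapping a lower-level exception, `raise ApiOperationError(...) from exc` also keeps the original traceback chained, so neither the client-side nor the server-side details are lost.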
If they are not class methods, the method would be invoked for every test, and a session would be created for each of those tests.
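```python
import logging
import unittest
from pyspark.sql import SparkSession


class PySparkTest(unittest.TestCase):
    @classmethod
    def suppress_py4j_logging(cls):
        # Quiet the py4j gateway logger, which is very chatty at INFO level.
        logger = logging.getLogger('py4j')
        logger.setLevel(logging.WARN)

    @classmethod
    def create_testing_pyspark_session(cls):
        # Minimal builder; add .master()/.appName() options as needed.
        return SparkSession.builder.getOrCreate()
```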
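The reason for class methods is that `unittest` runs `setUpClass` once per test class, while `setUp` runs before every test. The sketch below shows that behavior with a hypothetical `CountingResource` in place of a SparkSession, so it runs without PySpark installed.

```python
import io
import unittest


class CountingResource:
    """Hypothetical stand-in for an expensive SparkSession."""

    created = 0

    def __init__(self) -> None:
        CountingResource.created += 1

    def stop(self) -> None:
        pass


class SharedResourceTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls) -> None:
        # Runs once for the whole class; every test below reuses cls.resource.
        cls.resource = CountingResource()

    @classmethod
    def tearDownClass(cls) -> None:
        cls.resource.stop()

    def test_first(self) -> None:
        self.assertIsInstance(self.resource, CountingResource)

    def test_second(self) -> None:
        self.assertIsInstance(self.resource, CountingResource)


def run_suite() -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SharedResourceTest)
    # Send runner output to a buffer so only the result object matters here.
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```

Applied to `PySparkTest`, `setUpClass` could call `cls.suppress_py4j_logging()` and store `cls.create_testing_pyspark_session()` on the class, with `tearDownClass` calling `stop()` on that session, so all tests in the class share one session.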
Adjusting docs links
Since we moved the docs to the docs.ploomber.io domain,
we need to search all of our repos (ploomber, soorgeon, soopervisor, and projects),
look for the ploomber.readthedocs.io address, and replace it with docs.ploomber.io.
Please follow the contribution guidelines for the docs.
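A minimal sketch of the search-and-replace step, assuming a plain string substitution is enough (that is, the same page paths exist under the new domain) and that only common text file types need scanning. The function name `replace_docs_links` and the suffix set are hypothetical; the resulting diff in each repository still needs a review before committing.

```python
from pathlib import Path
from typing import List

OLD_HOST = "ploomber.readthedocs.io"
NEW_HOST = "docs.ploomber.io"
# Hypothetical set of text file types to scan; adjust per repository.
SUFFIXES = {".md", ".rst", ".txt", ".py", ".ipynb", ".yaml", ".yml", ".toml", ".cfg"}


def replace_docs_links(root: str) -> List[str]:
    """Rewrite OLD_HOST to NEW_HOST under root; return the changed file paths."""
    base = Path(root)
    changed = []
    for path in sorted(base.rglob("*")):
        # Skip git internals, directories, and binary/unknown file types.
        if ".git" in path.relative_to(base).parts or not path.is_file():
            continue
        if path.suffix not in SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8")
        if OLD_HOST in text:
            path.write_text(text.replace(OLD_HOST, NEW_HOST), encoding="utf-8")
            changed.append(str(path))
    return changed
```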
Hi,
I am using some basic functions from pyjanitor, such as clean_names() and collapse_levels(), in code that I want to put into production.
There are limits on the size of the production code base.
Currently, the requirements.txt for "pyjanitor" alone is huge.
I don't think my code requires all of those dependencies.
How can I remove the unnecessary ones?
Currently, EntityResourceTest#get_entityListWithPagination_200 creates 40 entities and then runs the pagination listing tests against them. This is done for every entity type covered by the base class, such as tables, databases, and dashboards.
int maxEntities = 40;
Change maxEntities to a random number between 5 and 40 to reduce the time spent on these tests during builds.
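A sketch of the proposed change, written in Python for consistency with the other examples here (the actual test is Java). It draws the entity count from 5 to 40 inclusive and prints the seed, so a failing build can be replayed with the same count; `pick_max_entities` and the `PAGINATION_TEST_SEED` variable are hypothetical names.

```python
import os
import random
from typing import Optional

# Hypothetical environment variable for replaying a failing run.
SEED_ENV = "PAGINATION_TEST_SEED"


def pick_max_entities(low: int = 5, high: int = 40, seed: Optional[int] = None) -> int:
    """Pick a random entity count in [low, high] and report the seed used."""
    if seed is None:
        env_seed = os.environ.get(SEED_ENV)
        seed = int(env_seed) if env_seed is not None else random.randrange(2**32)
    rng = random.Random(seed)
    max_entities = rng.randint(low, high)  # randint includes both endpoints
    print(f"maxEntities={max_entities} (seed={seed})")
    return max_entities
```

Reporting the seed matters because a random count can make a pagination edge case appear on only some builds; with the seed, that exact run can be reproduced.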
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
A large amount of output goes to the log; this should not happen by default.
Expected Behavior
Much less output from the FVT and the build by default.
Switching on debug in the logging configuration should then show all the output.
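The requested default (quiet unless debug is switched on) can be sketched with Python's standard `logging` module. This only illustrates the pattern: the project's build and FVT have their own logging configuration, and the `BUILD_DEBUG` variable and `fvt` logger name are hypothetical.

```python
import logging
import os
from typing import Mapping, Optional

# Hypothetical environment variable used to opt in to verbose output.
DEBUG_ENV = "BUILD_DEBUG"


def configure_logging(env: Optional[Mapping[str, str]] = None) -> logging.Logger:
    """Log warnings and above by default; log everything when debug is on."""
    env = os.environ if env is None else env
    verbose = env.get(DEBUG_ENV, "").lower() in {"1", "true", "yes"}
    logger = logging.getLogger("fvt")
    logger.setLevel(logging.DEBUG if verbose else logging.WARNING)
    if not logger.handlers:
        # Attach a single handler so repeated calls do not duplicate output.
        logger.addHandler(logging.StreamHandler())
    return logger
```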
Steps To Reproduce
run the build
Env
Screenshot
I've added a red vertical ruler so that you can see the issue.
Description
As already explained in numerous issues, the 'Inter' font is problematic: it does not allow dates to be aligned, for instance,
and it does not play nicely with numbers either.
In my supe