data-engineering
Here are 925 public repositories matching this topic...
It seems we have a user-contributed task for SodaSQL that we don't expose in our API documentation, making it very hard to discover; task definition and docstrings here: https://github.com/PrefectHQ/prefect/blob/5de58efaba956b431335d99acab07eaf6a362e1b/src/prefect/tasks/sodasql/sodasql_tasks.py
We should add this to our reference docs using [our development guidelines](https://docs.prefect.io/c
Describe the bug
Data Docs columns shrink to a width of 1 character when the batch comes from a long query.
To Reproduce
Steps to reproduce the behavior:
- make a batch from a long query string
- run validation
- render result to data docs
- See screenshot
<img width="1525" alt="Data_documentation_compiled_by_Great_Expectations" src="https://user-images.githubusercontent.com/928247/103230647-30eca500-4
See https://blog.min.io/minio-optimizes-small-objects/.
MinIO adds PutObjectExtract, a way to load many small(-ish) objects into a bucket by uploading a single tar file. Support it on lakeFS repos.
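The "many small objects in one tar file" idea can be sketched with the standard library alone. This is only an illustration of building the archive in memory; `pack_objects` is a hypothetical helper name, and the upload step itself (MinIO or lakeFS) is deliberately left out because the excerpt does not describe it.

```python
import io
import tarfile


def pack_objects(objects):
    """Pack a mapping of object name -> bytes into an uncompressed tar archive.

    Hypothetical helper for illustration; returns the archive as bytes.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in objects.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # the size must be set before adding the member
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def list_objects(archive):
    """Return the member names stored in a tar archive given as bytes."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return tar.getnames()
```

A single upload of the resulting bytes would then stand in for one request per small object.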
If they are not class methods, the method is invoked for every test, and a new session is created for each of those tests.
`class PySparkTest(unittest.TestCase):
    @classmethod
    def suppress_py4j_logging(cls):
        logger = logging.getLogger('py4j')
        logger.setLevel(logging.WARN)

    @classmethod
    def create_testing_pyspark_session(cls):
        return Sp
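The difference between class-level and per-test setup can be shown without Spark. The sketch below is an assumption-laden illustration: `FakeSession` is a hypothetical stand-in for a Spark session, and `count_sessions` is a helper invented here to count how many sessions each style creates.

```python
import io
import unittest


class FakeSession:
    """Hypothetical stand-in for an expensive session object."""
    created = 0

    def __init__(self):
        FakeSession.created += 1


class SharedSessionTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Runs once for the whole class, so one session is shared.
        cls.session = FakeSession()

    def test_first(self):
        self.assertIsNotNone(self.session)

    def test_second(self):
        self.assertIsNotNone(self.session)


class PerTestSessionTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test, so each test gets its own session.
        self.session = FakeSession()

    def test_first(self):
        self.assertIsNotNone(self.session)

    def test_second(self):
        self.assertIsNotNone(self.session)


def count_sessions(test_case):
    """Run one TestCase class quietly and return how many sessions it created."""
    FakeSession.created = 0
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return FakeSession.created
```

With two tests per class, the class-level setup creates one session while the per-test setup creates two.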
Hi,
I am using some basic pyjanitor functions, such as clean_names() and collapse_levels(), in code that I want to productionise.
There are limits on the size of the production code base.
Currently, if I look at the requirements.txt for just "pyjanitor", it is huge.
I don't think I need all of those dependencies in my code.
How can I remove the unnecessary ones?
In the repository handler:
- removeEntity tries to delete; if delete is not supported, it issues a purge, and the purge method writes an audit log
- There are 2 callers of purgeRelationship, only one of which writes an audit log
This is inconsistent.
I suggest we move the relationship audit log into the purge method, so that both callers write an audit log.
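The suggestion can be sketched as follows. This is not the project's code: the class, method, and caller names below are hypothetical, and the sketch only shows the shape of the change, with the audit entry written inside the purge method so that every caller is covered.

```python
class RepositoryHandler:
    """Hypothetical handler; names are illustrative only."""

    def __init__(self):
        self.relationships = {}
        self.audit_log = []

    def purge_relationship(self, guid):
        # The audit entry lives here, so no caller can forget it.
        self.relationships.pop(guid, None)
        self.audit_log.append(f"purged relationship {guid}")

    def remove_relationship(self, guid, delete_supported):
        # First caller: falls back to purge when delete is not supported.
        if delete_supported:
            self.relationships.pop(guid, None)
        else:
            self.purge_relationship(guid)

    def clean_up_relationship(self, guid):
        # Second caller: always purges; it no longer needs its own logging.
        self.purge_relationship(guid)
```

Both `remove_relationship` (with delete unsupported) and `clean_up_relationship` now produce an audit entry through the same code path.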
We have a new cookbook section: https://github.com/ploomber/projects/tree/master/cookbook
where we add minimal examples for certain use cases. We should link to each example from the relevant sections of the docs.
Currently, we use native filters on Superset version 1.2, but it looks like the actual time range is not displayed correctly under SIP-15 (in SIP-15 the time range must be [inclusive, exclusive)). This means the actual time range and the tooltip should show the label as: from_date <= col < to_date.
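The half-open interval described here can be made concrete with a small sketch. The function names are hypothetical and this is not Superset code; it only illustrates a [inclusive, exclusive) check and the label format quoted in the report.

```python
from datetime import datetime


def in_time_range(value, from_date, to_date):
    """Half-open check: the start is included, the end is excluded."""
    return from_date <= value < to_date


def time_range_label(from_date, to_date, col="col"):
    """Build a label in the form requested in the report."""
    return f"{from_date} <= {col} < {to_date}"
```

For a range from 2021-08-01 to 2021-08-02, a timestamp at exactly 2021-08-02 00:00:00 falls outside the range, while 2021-08-01 00:00:00 falls inside it.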
Expected results
![image](https://user-images.githubusercontent.com/37523968/130939207-7ff847a