data-engineering
Here are 445 public repositories matching this topic...
Situation
When creating a package:

import quilt3
quilt3.config(default_remote_registry='s3://your-bucket')
p = quilt3.Package()
p.push("username/packagename")

The package name can be any string; in particular, it may be e.g. fashion-mnist.
Why is it wrong?
I would like
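If the intent is to reject names that are not of the form namespace/name, a client-side check could look like the sketch below. The regex and function name are illustrative assumptions, not quilt3's actual API:

```python
import re

# Hypothetical validator: accept only "namespace/name", where each part
# consists of word characters and dashes. Not quilt3's real validation rule.
PACKAGE_NAME_RE = re.compile(r"^[\w-]+/[\w-]+$")

def is_valid_package_name(name: str) -> bool:
    return bool(PACKAGE_NAME_RE.fullmatch(name))
```

Under this rule, "username/fashion-mnist" would be accepted while a bare "fashion-mnist" would be rejected.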
janitor.biology could do with a to_fasta function, I think. The intent here would be to conveniently export a dataframe of sequences as a FASTA file, using one column as the FASTA header.
strawman implementation below:
import pandas_flavor as pf
from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq
from Bio import SeqIO

@pf.register_dataframe_method
def to_fasta(df, id_col, seq_col, filename):
    # One record per row; id_col supplies the FASTA header, seq_col the sequence
    records = (SeqRecord(Seq(row[seq_col]), id=str(row[id_col]), description="")
               for _, row in df.iterrows())
    SeqIO.write(records, filename, "fasta")
    return df
In this example the generated table of contents doesn't link to the sections on the page, because the headers have anchor tags in them. These should be sanitized out.
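A minimal sanitization step could strip the anchor tags from header text before the table of contents is generated. This is a sketch, not the project's actual code, and the function name is an assumption:

```python
import re

def strip_anchor_tags(header_html: str) -> str:
    # Remove opening <a ...> and closing </a> tags,
    # keeping only the visible header text for the TOC entry
    return re.sub(r"</?a\b[^>]*>", "", header_html)
```

For example, a header like `<a name="intro"></a>Introduction` would reduce to plain `Introduction`, which the TOC generator can then slugify normally.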
Follow-on from #2324, but this will target the 1.4 release: remove the unnecessary exclusions.
Ubuntu 16.04, Ansible 2.3.0
As per the readme, a directory should be created at /etc/ansible/hosts. However, that path is also Ansible's default inventory location, so the default inventory location specified in /etc/ansible/ansible.cfg must be changed to somewhere else. It would be good to note this in the readme to avoid confusing Ansible newcomers. Thanks!
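For instance, ansible.cfg could repoint the inventory at a different path; the path below is purely illustrative:

```ini
# /etc/ansible/ansible.cfg — move the inventory off the default
# /etc/ansible/hosts, which the readme's directory now occupies
[defaults]
inventory = /etc/ansible/inventory.d/hosts
```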
Update elements of GettingStarted.md based on snags and issues during development and testing
Expected Behavior
Consistent use of logging levels in Waimak. Minimal use of INFO level.
Actual Behavior
Many messages (especially in storage) are logged at INFO when DEBUG should be used.
Specifications
- Spark Version: 2.2
- Operating System: Linux
- Waimak Module: waimak-core, waimak-storage...
Use Case
Please provide a use case to help us understand your request in context
The Kubernetes Job tasks in our task library mimic the Kubernetes API, but an expected 'normal' use case for them is composed of several steps: creating a namespaced job, polling for it to complete, and deleting the job at the end. Right now no task in the task library knows how to poll for job status, an
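The polling step in the create → poll → delete flow described above could be sketched as follows. The function name, the `get_status` callable, and the status-dict shape are illustrative assumptions modelled on Kubernetes JobStatus fields, not the actual task library API:

```python
import time

def poll_until_complete(get_status, interval=5.0, timeout=600.0):
    """Poll a callable that returns a Kubernetes-style JobStatus dict
    (e.g. {"succeeded": 1} or {"failed": 1}) until the job finishes.

    Returns True on success, False on failure; raises on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status.get("succeeded"):
            return True
        if status.get("failed"):
            return False
        time.sleep(interval)
    raise TimeoutError("job did not complete before the timeout")
```

A caller would pass a closure that reads the job's status from the cluster, then delete the job once this returns, mirroring the three steps in the use case.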