data-engineering
Here are 445 public repositories matching this topic...
Situation
When creating a package:

import quilt3
quilt3.config(default_remote_registry='s3://your-bucket')
p = quilt3.Package()
p.push("username/packagename")

The package name can be any string; in particular, it may be e.g. fashion-mnist.
Why is it wrong?
I would like
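If the intent is to reject names that are not of the form namespace/name, a client-side check could look like the sketch below. The regex and function name are illustrative assumptions, not quilt3's actual API:

```python
import re

# Hypothetical validator: accept only "namespace/name", where each part
# consists of word characters and dashes. Not quilt3's real validation rule.
PACKAGE_NAME_RE = re.compile(r"^[\w-]+/[\w-]+$")

def is_valid_package_name(name: str) -> bool:
    return bool(PACKAGE_NAME_RE.fullmatch(name))
```

Under this rule, "username/fashion-mnist" would be accepted while a bare "fashion-mnist" would be rejected.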
janitor.biology could do with a to_fasta function, I think. The intent here would be to conveniently export a dataframe of sequences as a FASTA file, using one column as the FASTA header.
strawman implementation below:
import pandas_flavor as pf
from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq
from Bio import SeqIO

@pf.register_dataframe_method
def to_fasta(df, id_col, seq_col, filename):
    # One record per row; id_col supplies the FASTA header, seq_col the sequence
    records = (SeqRecord(Seq(row[seq_col]), id=str(row[id_col]), description="")
               for _, row in df.iterrows())
    SeqIO.write(records, filename, "fasta")
    return df
In this example the generated table of contents doesn't link to the sections on the page, because the headers have anchor tags in them. These should be sanitized out.
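A minimal sanitization step could strip the anchor tags from header text before the table of contents is generated. This is a sketch, not the project's actual code, and the function name is an assumption:

```python
import re

def strip_anchor_tags(header_html: str) -> str:
    # Remove opening <a ...> and closing </a> tags,
    # keeping only the visible header text for the TOC entry
    return re.sub(r"</?a\b[^>]*>", "", header_html)
```

For example, a header like `<a name="intro"></a>Introduction` would reduce to plain `Introduction`, which the TOC generator can then slugify normally.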
Follow-on from #2324, but this will target the 1.4 release: remove the unnecessary exclusions.
Ubuntu 16.04, Ansible 2.3.0
As per the readme, a directory should be created at /etc/ansible/hosts. However, that path is also Ansible's default inventory location, so the default inventory location specified in /etc/ansible/ansible.cfg must be changed to somewhere else. It would be good to note this in the readme to avoid confusing Ansible newcomers. Thanks!
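For instance, ansible.cfg could repoint the inventory at a different path; the path below is purely illustrative:

```ini
# /etc/ansible/ansible.cfg — move the inventory off the default
# /etc/ansible/hosts, which the readme's directory now occupies
[defaults]
inventory = /etc/ansible/inventory.d/hosts
```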
Update elements of GettingStarted.md based on snags and issues during development and testing
Expected Behavior
Consistent use of logging levels in Waimak. Minimal use of INFO level.
Actual Behavior
Many messages (especially in storage) are logged at INFO when DEBUG should be used.
Specifications
- Spark Version: 2.2
- Operating System: Linux
- Waimak Module: waimak-core, waimak-storage...
Use Case
Please provide a use case to help us understand your request in context
The Kubernetes Job tasks in our task library mimic the Kubernetes API, but an expected 'normal' use case for them is composed of several steps: creating a namespaced job, polling for it to complete, and deleting the job at the end. Right now no task in the task library knows how to poll for job status, an
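The polling step in the create → poll → delete flow described above could be sketched as follows. The function name, the `get_status` callable, and the status-dict shape are illustrative assumptions modelled on Kubernetes JobStatus fields, not the actual task library API:

```python
import time

def poll_until_complete(get_status, interval=5.0, timeout=600.0):
    """Poll a callable that returns a Kubernetes-style JobStatus dict
    (e.g. {"succeeded": 1} or {"failed": 1}) until the job finishes.

    Returns True on success, False on failure; raises on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status.get("succeeded"):
            return True
        if status.get("failed"):
            return False
        time.sleep(interval)
    raise TimeoutError("job did not complete before the timeout")
```

A caller would pass a closure that reads the job's status from the cluster, then delete the job once this returns, mirroring the three steps in the use case.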