DataOps
DataOps is an automated, process-oriented methodology used by analytic and data teams to improve the quality and reduce the cycle time of data analytics. While DataOps began as a set of best practices, it has since matured into an independent approach to data analytics. DataOps applies to the entire data lifecycle, from data preparation to reporting, and recognizes the interconnected nature of the data analytics team and information technology operations.
TL;DR
With all the work behind us in pyflyte package, run, and register, along with the legacy serialize, the pyflyte code could probably use some cleanup.
Details
This ticket includes, but is not limited to, the following:
- Use the same module loading code between the run and register commands. Since run has to first register, and since it also
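One way the shared module loading mentioned above could be factored out is a single helper that both a run-style and a register-style command call, instead of each duplicating the logic. This is a hedged sketch using only the standard library; the helper name and behavior are illustrative assumptions, not the actual pyflyte implementation.

```python
# Hypothetical sketch: one shared module-loading helper that both commands
# could reuse (names are illustrative, not the real pyflyte code).
import importlib.util
import sys
from pathlib import Path


def load_module_from_path(path: str):
    """Load a Python source file as a module and return the module object."""
    module_path = Path(path)
    spec = importlib.util.spec_from_file_location(module_path.stem, module_path)
    module = importlib.util.module_from_spec(spec)
    # Register before exec so the module can reference itself during import.
    sys.modules[module_path.stem] = module
    spec.loader.exec_module(module)
    return module
```

With a helper like this, both commands would resolve user workflow files through the same code path, so fixes to loading behavior land in one place.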
Summary
There is a typo in the output here: https://github.com/whylabs/whylogs/blob/mainline/src/whylogs/cli/demo_cli.py#L113
We can also update this to align better with our examples and suggested use: for instance, remove the prompting for 'project' and 'pipeline', since these aren't required, and streamline the script to simplify the demo experience.
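Removing the prompts could look roughly like the sketch below, where the two values become optional flags with defaults rather than interactive questions. This uses only argparse for illustration; the flag names mirror the issue text, not the real whylogs CLI.

```python
# Illustrative sketch only: 'project' and 'pipeline' as optional flags with
# defaults, instead of interactive prompts (not the actual whylogs demo code).
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="demo CLI (sketch)")
    # No prompting: the demo runs immediately with sensible defaults.
    parser.add_argument("--project", default=None, help="optional project name")
    parser.add_argument("--pipeline", default=None, help="optional pipeline name")
    return parser
```

A user who wants the values can still pass them explicitly, but the default path runs with zero questions asked.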
The default RubrixLogHTTPMiddleware record mapper for token classification expects a structured input including a text field. This can make preparing model inputs a bit cumbersome. The default mapper could also accept flat strings as inputs:
def token_classification_mapper(inputs, outputs):
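A minimal sketch of what such a mapper could look like, normalizing either a flat string or a dict with a "text" field. This mirrors the proposed behavior only; it is not the actual Rubrix implementation.

```python
# Hypothetical sketch: accept either a flat string or a dict containing a
# "text" field (not the real Rubrix mapper, just the proposed behavior).
def token_classification_mapper(inputs, outputs):
    if isinstance(inputs, str):
        text = inputs  # flat string input
    elif isinstance(inputs, dict):
        text = inputs.get("text", "")  # structured input with a "text" field
    else:
        raise TypeError(f"unsupported input type: {type(inputs)!r}")
    # Downstream record construction would use `text` together with `outputs`;
    # here we just return the normalized pair for illustration.
    return text, outputs
```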
Sending a REST call to delete a job specification throws a 404, whereas the gRPC call works fine.
Steps to reproduce
curl -X DELETE "http://localhost:9100/v1/project/my-project/namespace/kush/helloworld" -H "accept: application/json"
Task Overview
- Currently, timestamp_column is the only configuration that needs to be configured globally in the model config section (usually it is configured in properties.yml under elementary in the config tag).
- Passing the timestamp_column as a test param would enable running multiple tests with different timestamp columns, for example running a test with an updated_at column
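The proposed per-test parameter might look something like the YAML sketch below. The test name and exact key layout here are assumptions for illustration; only the idea of passing timestamp_column per test, rather than once in the model config, comes from the issue.

```yaml
# Hypothetical properties.yml sketch (test and key names are assumptions):
models:
  - name: my_model
    tests:
      - elementary.volume_anomalies:
          timestamp_column: updated_at   # this test uses updated_at
      - elementary.volume_anomalies:
          timestamp_column: created_at   # same test, different column
```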
Currently, both the Kafka and Influx sinks log only the data (Row) that is being sent.
Add support for logging column names along with the data points, similar to the implementation in the log sink.
This will enable users to correlate data points with column names.
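The gist of the request can be sketched as pairing each row's values with its column names before emitting them, so log lines are self-describing. The function names below are hypothetical; the real sinks are not written in Python.

```python
# Illustrative sketch: pair row values with column names before logging,
# so each log line is self-describing (names here are hypothetical).
def describe_row(columns, row):
    """Return a dict mapping column name -> value for one row."""
    if len(columns) != len(row):
        raise ValueError("column/value count mismatch")
    return dict(zip(columns, row))


def log_row(columns, row, log=print):
    # A sink could emit this mapping instead of the bare tuple of values.
    log(describe_row(columns, row))
```

Logging `{"user_id": 7, "status": "active"}` instead of `(7, "active")` lets readers correlate each value with its column without consulting the schema.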
Zap configurations should be pushed to the gRPC middleware here: cmd/setup.go#L47
In the golang client, consumers get a dynamic message instance after parsing. Add an example to the docs showing how to use the dynamic message instance to read values of different types in consumer code.
List of protobuf types to cover:
- timestamp
- duration
- bytes
- message type
- struct
- map
Siren creates an alertmanager config and syncs it with alertmanager. The alertmanager config can change for the same subscriptions if their order changes. We should adopt a sorting convention and stick to it when creating the alertmanager config.
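The fix amounts to sorting subscriptions by a stable key before rendering, so the generated config is byte-identical regardless of input order. This is a sketch only; the field names and output shape are assumptions, and Siren itself is written in Go.

```python
# Sketch of the proposed convention: sort subscriptions on a stable key before
# rendering, so output is deterministic (field names are assumptions).
def render_receivers(subscriptions):
    ordered = sorted(subscriptions, key=lambda s: (s["urn"], s["receiver"]))
    # Render each subscription into a config entry in the sorted order.
    return [f'{s["urn"]}:{s["receiver"]}' for s in ordered]
```

Because the sort key is total over the fields that matter, two runs over the same set of subscriptions always produce the same config, even if the input order differs.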
We are using the protobuf-git configuration as described at https://cloudhut.dev/docs/features/protobuf#git-repository
In our repository the proto files live within a proto directory, which seems to be very common, and contain 5 levels of nested folders. Currently KOWL searches only the first 5 levels of the checkout for .proto files, so our last level is not considered. Please
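The contrast being requested is between a depth-limited scan and an unbounded recursive one, which finds .proto files at any nesting depth. A minimal sketch of the unbounded version (in Python for illustration; KOWL itself is written in Go):

```python
# Illustrative sketch: an unbounded recursive search finds .proto files at any
# nesting depth, unlike a scan limited to a fixed number of levels.
from pathlib import Path


def find_proto_files(root):
    """Recursively collect all .proto files under root, at any depth."""
    return sorted(str(p) for p in Path(root).rglob("*.proto"))
```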