data-catalog

Currently we only support database-store publishers (e.g. Neo4j, MySQL, Neptune). It would be fairly easy to support a message queue publisher through the same interface (e.g. SQS, Kinesis, Event Hubs, Kafka), which would enable a push-based ETL model.

There is a PR (amundsen-io/amundsendatabuilder#431) which unfortunately was never merged. The PR could be used as an example of how to support t
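A minimal sketch of what a queue-backed publisher could look like. Everything here is an assumption made for illustration: the `Publisher` base class is a hypothetical stand-in, not the actual databuilder interface, and the standard-library `queue.Queue` stands in for a real broker client such as SQS or Kafka.

```python
import json
import queue
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable


class Publisher(ABC):
    """Hypothetical publisher interface (not the real databuilder class)."""

    @abstractmethod
    def publish(self, records: Iterable[Dict[str, Any]]) -> int:
        """Publish records and return how many were sent."""


class QueuePublisher(Publisher):
    """Pushes each record as one JSON message onto a queue-like client."""

    def __init__(self, client: "queue.Queue[str]") -> None:
        # In a real setup this would be an SQS/Kinesis/Kafka client wrapper.
        self._client = client

    def publish(self, records: Iterable[Dict[str, Any]]) -> int:
        sent = 0
        for record in records:
            # Serialize deterministically so consumers see stable payloads.
            self._client.put(json.dumps(record, sort_keys=True))
            sent += 1
        return sent


# In-memory queue used as a stand-in broker for the example.
broker: "queue.Queue[str]" = queue.Queue()
publisher = QueuePublisher(broker)
count = publisher.publish([{"table": "users"}, {"table": "orders"}])
```

Keeping the broker behind a small `put`-style client means the same publisher logic could be reused for different queue backends; only the client wrapper would change.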

The Renovate configuration intentionally prevents automated major version upgrades, as seen in #1639. A contribution is therefore requested to manually upgrade the code to the latest version of Gradle.

The original PR will show the comprehensive change log for Gradle itself, which can be referenced to look for applicable breaking changes. Outside of those, the existing a

@cantzakas

@cantzakas created the SQL query necessary to pull metadata in (hyperqueryhq/whale#140) -- we just have to build the Greenplum extractor scaffolding. It should follow the same shape as the Postgres extractor.

It is not surprising that deep and shallow scans show different results. A shallow scan only looks at column names, while a deep scan looks at a sample of the data. I've even noticed that two different runs of a deep scan show different results because the sampled rows differ. This is the challenge with not scanning all of the data. It's a trade-off between performance/cost and accuracy. There is no right answer.
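The difference can be illustrated with a small sketch. This is not piicatcher's implementation: the regular expressions, function names, and sample data are hypothetical and chosen only to show why the two scan types, and two deep-scan runs, can disagree.

```python
import random
import re
from typing import Dict, List

# Hypothetical patterns, chosen only for this illustration.
NAME_PATTERN = re.compile(r"email|phone|ssn", re.IGNORECASE)
EMAIL_VALUE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def shallow_scan(columns: List[str]) -> List[str]:
    """Flag columns whose *names* look sensitive."""
    return [c for c in columns if NAME_PATTERN.search(c)]


def deep_scan(rows: List[Dict[str, str]], sample_size: int, seed: int) -> List[str]:
    """Flag columns whose *values* look sensitive in a random sample of rows."""
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    flagged = set()
    for row in sample:
        for column, value in row.items():
            if EMAIL_VALUE.fullmatch(value):
                flagged.add(column)
    return sorted(flagged)


# 100 rows, only one of which holds an actual email address.
rows = [{"contact": "n/a", "email_opt_in": "yes"} for _ in range(99)]
rows.append({"contact": "jane@example.com", "email_opt_in": "no"})

by_name = shallow_scan(["contact", "email_opt_in"])
full_deep = deep_scan(rows, sample_size=len(rows), seed=0)
# A small sample finds the email only if that one row happens to be drawn.
sampled_results = {tuple(deep_scan(rows, sample_size=10, seed=s)) for s in range(200)}
```

Here the shallow scan flags `email_opt_in` by name even though it holds no email addresses, while a deep scan over every row flags `contact` instead. With a 10-row sample, whether `contact` is flagged depends on which rows are drawn.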

Add more logging in all modules so that they emit debug signals.

Motivation

Since odd-platform already supports sources such as Redshift, it would be awesome to support BigQuery integration as well.

Intake-esm adds the attribute intake_esm_varname to datasets, and I have encountered cases where that ends up being None (still looking for the exact model).

Zarr does not like that type of metadata:

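import xarray as xr

# Minimal dataset with a single variable named 'test'
ds_test = xr.DataArray(5).to_dataset(name='test')
# An attribute set to None is what triggers the problem
ds_test.attrs['test'] = None

ds_test.to_zarr('test.zarr')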

gives

------------------------
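One possible workaround, sketched under the assumption that simply dropping the offending attribute is acceptable, is to filter `None` values out of the attributes before writing. The helper name `drop_none_attrs` is hypothetical, and the example works on a plain dictionary so that it runs without xarray or Zarr installed.

```python
from typing import Any, Dict


def drop_none_attrs(attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of attrs without entries whose value is None."""
    return {key: value for key, value in attrs.items() if value is not None}


# Plain-dict stand-in for dataset attributes.
attrs = {"intake_esm_varname": None, "title": "test"}
clean = drop_none_attrs(attrs)
```

With an xarray dataset, the same helper could presumably be applied as `ds.attrs = drop_none_attrs(ds.attrs)` before calling `to_zarr`. Attributes on individual variables might need the same treatment.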

Is your feature request related to a problem? Please describe.
I want to be able to check and monitor how much metadata my recipe is processing/extracting.

Describe the solution you'd like

Gather an additional metric for the total data processed per recipe run (e.g. runDataCount)
Print out a run report in tabular format after all recipes have finished running
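The two points above could be sketched roughly as follows. This is only an illustration in Python, not the project's code: the `status` column, the recipe names, and the `format_run_report` helper are hypothetical. Only `runDataCount` comes from the request itself.

```python
from typing import Dict, List


def format_run_report(rows: List[Dict[str, object]]) -> str:
    """Render per-recipe run metrics as a plain-text table."""
    headers = ["recipe", "runDataCount", "status"]
    table = [headers] + [[str(row[h]) for h in headers] for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(line[i]) for line in table) for i in range(len(headers))]
    lines = [
        " | ".join(cell.ljust(width) for cell, width in zip(line, widths)).rstrip()
        for line in table
    ]
    # Separator between the header row and the data rows.
    lines.insert(1, "-+-".join("-" * width for width in widths))
    return "\n".join(lines)


# Hypothetical results collected after all recipes have finished.
report = format_run_report([
    {"recipe": "bigquery-prod", "runDataCount": 120, "status": "ok"},
    {"recipe": "kafka-dev", "runDataCount": 0, "status": "failed"},
])
print(report)
```

Collecting the counts first and rendering the table once at the end keeps the report readable even when several recipes run in sequence.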

**Describe alternatives you've con

What would you like to be added:
It would be great to add support for a datacatalog-connectors-bi for Sisense.

Why is this needed:
Sisense is a popular BI solution, named as a visionary in the Gartner quadrant

pattern= catalog : dataset name : url : comment
ocean: World Ocean Atlas: https://www.nodc.noaa.gov/OC5/woa18/ : different versions and variables via parameter #15
global carbon budget with https://github.com/edjdavid/intake-excel #22
land: precipitation: https://psl.noaa.gov/data/gridded/tables/precipitation.html:
Mauna Loa CO2 netcdf ftp://aftp.cmdl.noaa.go

Here are 65 public repositories matching this topic...

linkedin / datahub

amundsen-io / amundsen

MarquezProject / marquez

hyperqueryhq / whale

intake / intake

tokern / piicatcher

aws-samples / aws-dbs-refarch-datalake

GoogleCloudPlatform / bigquery-data-lineage

opendatadiscovery / odd-platform

intake / intake-esm

getmetamapper / metamapper

odpf / meteor

GoogleCloudPlatform / datacatalog-connectors-rdbms

Bayer-Group / COLID-Documentation

ihsn / nada

GoogleCloudPlatform / datacatalog-connectors-bi

datopian / portal.js.bak

SciCatProject / catanie

FINRAOS / herd-mdl

opendatadiscovery / awesome-data-catalogs

GoogleCloudPlatform / datacatalog-tag-history

slaclab / datacat

aaronspring / remote_climate_data

Bayer-Group / COLID-Setup

Bayer-Group / COLID-Data-Marketplace-Frontend

NCAR / esm-collection-spec

dbt-content / google-datacatalog-dbt-tag

Bayer-Group / COLID-Search-Service

Bayer-Group / COLID-Indexing-Crawler-Service

darenasc / aeda
