data-catalog

Currently we only support database store publishers (e.g., Neo4j, MySQL, Neptune). It would be fairly easy to support message queue publishers through the same interface (e.g., SQS, Kinesis, Event Hubs, Kafka), which would enable a push ETL model.
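As a rough illustration of the idea, here is a minimal sketch of a queue-backed publisher. The names `RecordPublisher`, `QueuePublisher`, and `InMemoryQueue` are hypothetical and are not the databuilder API; the in-memory queue only stands in for a real client such as SQS or Kafka.

```python
import json
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class RecordPublisher(ABC):
    """Hypothetical publisher interface (an assumption, not the databuilder API)."""

    @abstractmethod
    def publish(self, records: Iterable[Dict[str, str]]) -> int:
        """Publish records and return how many were sent."""


class InMemoryQueue:
    """Stand-in for a real queue client (SQS, Kinesis, Kafka, ...)."""

    def __init__(self) -> None:
        self.messages: List[str] = []

    def send(self, body: str) -> None:
        self.messages.append(body)


class QueuePublisher(RecordPublisher):
    """Pushes each metadata record to the queue as one JSON message."""

    def __init__(self, queue: InMemoryQueue) -> None:
        self._queue = queue

    def publish(self, records: Iterable[Dict[str, str]]) -> int:
        count = 0
        for record in records:
            # sort_keys keeps the serialized message stable across runs
            self._queue.send(json.dumps(record, sort_keys=True))
            count += 1
        return count


queue = InMemoryQueue()
publisher = QueuePublisher(queue)
sent = publisher.publish([{"table": "users", "schema": "public"}])
```

Because the publisher only depends on a `send` method, swapping the in-memory queue for a real client would not change the publishing logic in this sketch.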

There is a PR (amundsen-io/amundsendatabuilder#431) which unfortunately was never merged. The PR could be used as an example of how to support t

The Renovate configuration intentionally prevents automated major version upgrades, as seen in #1639. Therefore, a contribution is requested to manually upgrade the code to use the latest version of Gradle.

The original PR will show the comprehensive change log for Gradle itself, which can be referenced to look for applicable breaking changes. Outside of those, the existing a

@cantzakas created the SQL query needed to pull in metadata (hyperqueryhq/whale#140); we just have to build the Greenplum extractor scaffolding. It should follow the same shape as the Postgres extractor.

It is not surprising that deep and shallow scans show different results. A shallow scan only looks at column names. A deep scan looks at a sample of the data. I've even noticed that two different runs of a deep scan show different results, because the sampled rows differ. This is the challenge with not scanning all of the data. It's a trade-off between performance/cost and accuracy. There is no right answer.
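The difference can be sketched in a few lines. This is a simplified illustration, not the project's actual scanner: the regular expressions, function names, and sample table are all assumptions made for the example.

```python
import random
import re
from typing import Dict, List

# Hypothetical patterns, chosen only for illustration.
NAME_PATTERN = re.compile(r"email|e_mail", re.IGNORECASE)
VALUE_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def shallow_scan(columns: Dict[str, List[str]]) -> List[str]:
    """Flag columns whose name looks like an email field."""
    return sorted(name for name in columns if NAME_PATTERN.search(name))


def deep_scan(
    columns: Dict[str, List[str]], sample_size: int, rng: random.Random
) -> List[str]:
    """Flag columns where any sampled value looks like an email address."""
    flagged = []
    for name, values in columns.items():
        sample = rng.sample(values, min(sample_size, len(values)))
        if any(VALUE_PATTERN.match(value) for value in sample):
            flagged.append(name)
    return sorted(flagged)


table = {
    "email": ["n/a", "n/a", "n/a", "n/a"],                # suggestive name, no real data
    "contact": ["n/a", "n/a", "n/a", "ada@example.com"],  # neutral name, one real value
}

shallow = shallow_scan(table)                                   # name-based only
full = deep_scan(table, sample_size=4, rng=random.Random(0))    # samples every row
# With a sample of one row, the outcome depends on which row is drawn.
partial_runs = {
    tuple(deep_scan(table, sample_size=1, rng=random.Random(seed)))
    for seed in range(50)
}
```

Here the shallow scan flags `email` on the name alone, the full-size deep scan flags `contact` from its values, and the one-row samples may or may not catch `contact` depending on the draw.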

Motivation

Since odd-platform already supports Redshift and other sources, it would be great to add a BigQuery integration.

Add more logging in all modules to emit debug signals for easier debugging.

Intake-esm adds the attribute intake_esm_varname to datasets, and I have encountered cases where that ends up being None (still looking for the exact model).

Zarr does not like that type of metadata:

import xarray as xr
ds_test = xr.DataArray(5).to_dataset(name='test')
ds_test.attrs['test'] = None

ds_test.to_zarr('test.zarr')

gives

------------------------
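A possible workaround, sketched under the assumption that dropping the `None` entries is acceptable, is to filter the attributes before writing. The helper name `drop_none_attrs` is hypothetical, and the example works on a plain dict so it runs without xarray installed.

```python
from typing import Any, Dict


def drop_none_attrs(attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of attrs without the entries whose value is None."""
    return {key: value for key, value in attrs.items() if value is not None}


attrs = {"intake_esm_varname": None, "title": "test"}
cleaned = drop_none_attrs(attrs)

# Assumed usage with xarray (not executed here):
#   ds_test.attrs = drop_none_attrs(ds_test.attrs)
#   ds_test.to_zarr('test.zarr')
```

The helper returns a new dict rather than mutating in place, so the original attributes stay available for inspection.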

Deliverables

add unit tests
add extractor
add README.md in plugins/extractors/mariadb, defining output
register your extractor in plugins/extractors/populate.go
add the extractor to the extractor list in docs/reference/extractor.md

Output must contain a Table

Table

Field	Sample Value
`urn`	`my_database.my_t

What would you like to be added:
It would be great to add a Sisense connector to datacatalog-connectors-bi.

Why is this needed:
Sisense is a popular BI solution, named a Visionary in the Gartner Magic Quadrant

pattern= catalog : dataset name : url : comment
ocean: World Ocean Atlas: https://www.nodc.noaa.gov/OC5/woa18/ : different versions and variables via parameter #15
global carbon budget with https://github.com/edjdavid/intake-excel #22
land: precipitation: https://psl.noaa.gov/data/gridded/tables/precipitation.html:
Mauna Loa CO2 netcdf ftp://aftp.cmdl.noaa.go

Here are 65 public repositories matching this topic...

linkedin / datahub

amundsen-io / amundsen

MarquezProject / marquez

hyperqueryhq / whale

intake / intake

tokern / piicatcher

opendatadiscovery / odd-platform

GoogleCloudPlatform / bigquery-data-lineage

aws-samples / aws-dbs-refarch-datalake

intake / intake-esm

odpf / meteor

getmetamapper / metamapper

GoogleCloudPlatform / datacatalog-connectors-rdbms

Bayer-Group / COLID-Documentation

ihsn / nada

GoogleCloudPlatform / datacatalog-connectors-bi

datopian / portal.js.bak

opendatadiscovery / awesome-data-catalogs

FINRAOS / herd-mdl

SciCatProject / catanie

GoogleCloudPlatform / datacatalog-tag-history

slaclab / datacat

aaronspring / remote_climate_data

Bayer-Group / COLID-Setup

Bayer-Group / COLID-Data-Marketplace-Frontend

dbt-content / google-datacatalog-dbt-tag

NCAR / esm-collection-spec

carte-data / carte

Bayer-Group / COLID-Search-Service

Bayer-Group / COLID-Indexing-Crawler-Service
