data-catalog

Here are 65 public repositories matching this topic...

feng-tao commented May 14, 2021

Currently we only support database store publishers (e.g. neo4j, mysql, neptune). But it would be pretty easy to support a message queue publisher using the same interface (e.g. SQS, Kinesis, Event Hubs, Kafka), which would enable a push ETL model.

There is a PR (amundsen-io/amundsendatabuilder#431) which unfortunately has not been merged. The PR could be used as an example of how to support t
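To illustrate the push model described in this comment, here is a minimal sketch. It does not use the databuilder API: the `Publisher` base class, the `QueuePublisher` class, and the record fields are all hypothetical, and the standard-library `queue.Queue` stands in for a real message queue such as SQS or Kafka.

```python
import json
import queue
from abc import ABC, abstractmethod
from typing import Dict, Iterable


class Publisher(ABC):
    """Hypothetical minimal publisher interface (not the databuilder API)."""

    @abstractmethod
    def publish(self, records: Iterable[Dict[str, str]]) -> int:
        """Send records to a destination and return how many were sent."""


class QueuePublisher(Publisher):
    """Pushes each record onto a queue as one JSON message.

    queue.Queue is a stand-in for SQS, Kinesis, Event Hubs, or Kafka.
    """

    def __init__(self, target: "queue.Queue[str]") -> None:
        self._target = target

    def publish(self, records: Iterable[Dict[str, str]]) -> int:
        sent = 0
        for record in records:
            # Serialize so that any consumer can decode the message.
            self._target.put(json.dumps(record, sort_keys=True))
            sent += 1
        return sent


# Hypothetical metadata record, for illustration only.
messages: "queue.Queue[str]" = queue.Queue()
publisher = QueuePublisher(messages)
count = publisher.publish([{"table": "orders", "owner": "data-eng"}])
```

Because the queue client is passed in through the constructor, swapping in a real message queue client would only change the body of `publish`, which is the appeal of reusing one interface for both database and queue publishers.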

KevinMellott91 commented Sep 14, 2021

The renovate configurations are intentionally configured to prevent automated major version upgrades, as seen in #1639. Therefore, a contribution is requested to manually upgrade the code to use the latest version of Gradle.

The original PR will show the comprehensive change log for Gradle itself, which can be referenced to look for applicable breaking changes. Outside of those, the existing a

vrajat commented Feb 14, 2020

It is not surprising that deep and shallow scans show different results. A shallow scan only looks at column names. A deep scan looks at a sample of the data. I've even noticed that two different runs of a deep scan show different results because the sampled rows are different. This is the challenge with not scanning all of the data. It's a trade-off between performance/cost and accuracy. There is no right answer.
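The difference between the two scan modes can be sketched as follows. This is an illustration under simplified assumptions, not any particular tool's implementation: the function names, the email-only detection, and the sample rows are all made up for the example.

```python
import random
import re

EMAIL_NAME = re.compile(r"e-?mail", re.IGNORECASE)
EMAIL_VALUE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def shallow_scan(columns):
    """Flag columns whose *name* suggests an email address."""
    return {c for c in columns if EMAIL_NAME.search(c)}


def deep_scan(rows, sample_size, rng):
    """Flag columns where any *sampled value* looks like an email."""
    sample = rng.sample(rows, min(sample_size, len(rows)))
    return {
        col
        for row in sample
        for col, val in row.items()
        if isinstance(val, str) and EMAIL_VALUE.match(val)
    }


# Hypothetical table: the "email" column is empty, while "contact"
# holds an address in only one of three rows.
rows = [
    {"contact": "n/a", "email": ""},
    {"contact": "ann@example.com", "email": ""},
    {"contact": "n/a", "email": ""},
]

by_name = shallow_scan(rows[0].keys())                # judged by column name only
full = deep_scan(rows, len(rows), random.Random(0))   # every row is sampled
partial = deep_scan(rows, 1, random.Random())         # one random row
```

Here the shallow scan flags `email` (by name) and the full deep scan flags `contact` (by value), so the two modes disagree. The one-row sample finds `contact` only when it happens to draw the second row, which is why repeated deep scans can differ.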

jbusecke commented Feb 18, 2021

Intake-esm adds the attribute intake_esm_varname to datasets, and I have encountered cases where that ends up being None (still looking for the exact model).

Zarr does not like that type of metadata:

import xarray as xr
ds_test = xr.DataArray(5).to_dataset(name='test')
ds_test.attrs['test'] = None

ds_test.to_zarr('test.zarr')

gives

------------------------
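One possible workaround, sketched here as an assumption rather than a fix endorsed by intake-esm or Zarr, is to drop `None`-valued attributes before writing. The helper `drop_none_attrs` is hypothetical and works on a plain dictionary so that it runs without xarray installed.

```python
def drop_none_attrs(attrs):
    """Return a copy of an attribute mapping without None-valued entries."""
    return {key: value for key, value in attrs.items() if value is not None}


# Example attributes mirroring the case described above.
attrs = {"intake_esm_varname": None, "units": "K"}
cleaned = drop_none_attrs(attrs)

# With xarray, this could be applied before writing, for example:
#   ds_test.attrs = drop_none_attrs(ds_test.attrs)
```

The helper returns a new dictionary and leaves the original untouched. Individual variables carry their own `attrs` as well, so they may need the same treatment.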
StewartJingga commented Sep 4, 2021

Is your feature request related to a problem? Please describe.
I want to be able to check and monitor how much metadata my recipe is processing/extracting.

Describe the solution you'd like

  1. Gather an additional metric for the total data processed in a recipe run (e.g. runDataCount)
  2. Print a run report in tabular format after all recipes have finished running
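The two requested steps could look roughly like the sketch below. Only `runDataCount` comes from the request. The function name, the `recipe` and `status` fields, and the recipe names are hypothetical.

```python
def format_run_report(runs):
    """Render one row per recipe as a fixed-width text table."""
    headers = ("recipe", "runDataCount", "status")
    table = [headers] + [
        (r["recipe"], str(r["runDataCount"]), r["status"]) for r in runs
    ]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]
    lines = [
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in table
    ]
    # Separator line between the header and the data rows.
    lines.insert(1, "  ".join("-" * w for w in widths))
    return "\n".join(lines)


# Hypothetical per-recipe metrics collected during a run.
runs = [
    {"recipe": "bigquery-daily", "runDataCount": 128, "status": "success"},
    {"recipe": "kafka-topics", "runDataCount": 7, "status": "failed"},
]
report = format_run_report(runs)
print(report)
```

Building the report as a string and printing it once, after all recipes have finished, keeps the table from being interleaved with per-recipe log output.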

**Describe alternatives you've con

National Data Archive (NADA) is an open source data cataloging system that serves as a portal for researchers to browse, search, compare, apply for access, and download relevant census or survey information. It was originally developed to support the establishment of national survey data archives.

  • Updated Sep 8, 2021
  • PHP

The Data Marketplace frontend repository is part of the Corporate Linked Data Catalog (COLID for short) application. Users can search for registered resources in COLID. It provides a search bar, aggregation filters, and a search results display with term highlighting.

  • Updated Jul 9, 2021
  • TypeScript

The Indexing Crawler Service (ICS) repository is part of the Corporate Linked Data Catalog (COLID for short) application. It is responsible for extracting data from an RDF storage system, transforming and enriching the data, and finally sending it via a message queue to the DMP Webservice for indexing.

  • Updated Sep 25, 2020
  • C#
