
deduplication

Here are 246 public repositories matching this topic...

jkowalski commented Nov 29, 2021

Currently, directory contents are uploaded without compression, which makes `q` blobs larger than they need to be.

Since directory JSON data is trivially 3x compressible at high throughput using pgzip, we should enable compression by default and/or make it selectable by policy.

Unfortunately, this can't be the default in the v1 index format, because compression is done in obj

zouzias commented Apr 21, 2019

Is your feature request related to a problem? Please describe.
Currently, MapType columns are not supported for Spark DataFrames.

Describe the solution you'd like
Add support for MapType Spark DataFrame columns

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other co
