parquet
Here are 167 public repositories matching this topic...
-
Hello everyone,
Recently I tried to set up petastorm on my company's hadoop cluster.
However, because the cluster uses Kerberos for authentication, using petastorm failed.
I figured out that petastorm relies on pyarrow, which actually supports Kerberos authentication.
I hacked "petastorm/petastorm/hdfs/namenode.py" line 250
and replaced it with
driver = 'libhdfs'
return pyarrow.hdfs.c
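As a rough illustration of the pyarrow side of this, the sketch below connects to HDFS using a Kerberos ticket cache. It assumes the legacy `pyarrow.hdfs.connect` API (deprecated in later pyarrow releases in favour of `pyarrow.fs.HadoopFileSystem`). The helper names, host, port and ticket cache path are hypothetical and do not come from petastorm.

```python
def kerberos_connect_kwargs(host, port, ticket_cache):
    """Collect the arguments for a Kerberos-authenticated HDFS connection.

    All three parameters are placeholders supplied by the caller;
    ticket_cache is the path to a ticket cache created by `kinit`.
    """
    return {
        "host": host,
        "port": port,
        "kerb_ticket": ticket_cache,  # path to the Kerberos ticket cache
        "driver": "libhdfs",          # JNI-based driver, as in the issue
    }


def connect_with_kerberos(host, port, ticket_cache):
    """Open the connection; needs pyarrow (legacy API) and a Hadoop client."""
    # Imported lazily so the helper above can be used without pyarrow.
    from pyarrow import hdfs

    # Assumption: the legacy pyarrow.hdfs.connect function is available.
    return hdfs.connect(**kerberos_connect_kwargs(host, port, ticket_cache))
```

Calling `connect_with_kerberos("namenode.example.com", 8020, "/tmp/krb5cc_1000")` should then return a filesystem handle, provided a valid ticket exists and the Hadoop client libraries are installed on the machine.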
-
Currently, there isn't a way to get the table properties in the SparkOrcWriter via the WriterFactory.
-
Over time we've had some things leak into the diff methods that make it more cumbersome to use BigDiffy via code instead of CLI.
For example, diffAvro here: https://github.com/spotify/ratatool/blob/master/ratatool-diffy/src/main/scala/com/spotify/ratatool/diffy/BigDiffy.scala#L284
The user has to pass in the schema manually; otherwise they receive an uninformative error about a null schema, add
-
Something like:
from kartothek.core.dataset import DatasetMetadata
from kartothek.core.factory import DatasetFactory
from kartothek.io_components.metapartition import SINGLE_TABLE

def get_pyarrow_schema(factory: DatasetFactory, table: str = SINGLE_TABLE):
    dm = DatasetMetadata(uuid=factory.dataset_uuid).load_from_store(
        uuid=factory.dataset_uuid, store=factory.store
When opening a parquet file, ParquetViewer first launches a popup "Select fields to load", where you can either confirm loading all fields or select the fields you want.
In all use cases relevant to me, I want to display all fields, so I'm wondering whether it would be possible to skip this popup altogether. It's inconvenient to always confirm "All fields..." before seeing any data.
-
This will be a breaking change, so it needs to be done in a major release.
Currently, OperationDetail and OperationField are inner classes within OperationServiceV2. They should be moved into their own outer classes for ease of serialisation and consistency.