parquet
Here are 228 public repositories matching this topic...
Hello everyone,
Recently I tried to set up petastorm on my company's Hadoop cluster.
However, because the cluster uses Kerberos for authentication, petastorm failed.
I figured out that petastorm relies on pyarrow, which actually supports Kerberos authentication.
I hacked "petastorm/petastorm/hdfs/namenode.py" line 250
and replaced it with
driver = 'libhdfs'
return pyarrow.hdfs.c-
Updated Oct 25, 2021 - Jupyter Notebook
Not sure if this would be of interest, but:
When registering a table:
addr: 0.0.0.0:8084
tables:
  - name: "example"
    uri: "/data/"
    option:
      format: "parquet"
      use_memory_table: false
add to the options either a glob:
pattern: "file_typev1*.parquet"
or a regexp:
pattern: "\wfile_type\wv1\w*.parquet"
It would allow selecting in uri's with different exte
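The difference between the two proposed pattern styles can be sketched in plain Python. This is only an illustration of glob versus regexp selection over file names, not the tool's actual API: the helper names, the sample file list, and the regexp used here are all hypothetical.

```python
import fnmatch
import re

# Hypothetical directory listing under the table's uri.
files = [
    "file_typev1_a.parquet",
    "file_typev1_b.parquet",
    "file_typev2_a.parquet",
    "file_typev1_a.csv",
]


def select_by_glob(names, pattern):
    """Keep the names that match a shell-style glob (case-sensitive)."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


def select_by_regex(names, pattern):
    """Keep the names that match a regular expression in full."""
    rx = re.compile(pattern)
    return [n for n in names if rx.fullmatch(n)]


# Glob form, as written in the issue.
glob_hits = select_by_glob(files, "file_typev1*.parquet")

# Regexp form: an assumed equivalent with the dot escaped,
# not the exact expression quoted in the issue.
regex_hits = select_by_regex(files, r"file_typev1\w*\.parquet")

print(glob_hits)
print(regex_hits)
```

In this sketch both forms select the two v1 parquet files and skip the v2 file and the csv file. A glob is simpler to write, while a regexp allows stricter matching where it is needed.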
Updated Oct 1, 2021 - Python
Currently, there isn't a way to get the table properties in the SparkOrcWriter via the WriterFactory.
Updated May 29, 2021 - JavaScript
Updated Oct 19, 2021 - Python
Updated Jun 11, 2021 - C#
Over time we've had some things leak into the diff methods that make it more cumbersome to use BigDiffy via code instead of CLI.
For example, diffAvro here: https://github.com/spotify/ratatool/blob/master/ratatool-diffy/src/main/scala/com/spotify/ratatool/diffy/BigDiffy.scala#L284
The user has to manually pass in the schema; otherwise they receive an uninformative error regarding a null schema, add
Let's show some examples of integration with kgextension
https://kgextension.readthedocs.io/en/latest/
Could be another notebook added to the tutorial.
Where it fits, we might also integrate as a dependency?
Updated May 30, 2021 - JavaScript
Updated Sep 29, 2021 - C#
Updated Feb 8, 2021 - Python
Updated Feb 1, 2019 - TypeScript
Updated May 18, 2021 - C++
Problem description
Reading a dataset with eager's read functionality raises a ValueError when providing columns.
Example code (ideally copy-pastable)
import pandas as pd
from tempfile import TemporaryDirectory
from functools import partial
from storefact import get_store_from_url
from kartothek.io.eager import store_dataframes_as_dataset, read_dataset_as_data-
Updated Oct 11, 2021 - Go
Updated Oct 25, 2021 - Scala
Append class to all HashCodeBuilders in Gaffer for the below issue to minimise hash collisions.