parquet
Here are 250 public repositories matching this topic...
Not sure if it's of interest, but when registering a table:

    addr: 0.0.0.0:8084
    tables:
      - name: "example"
        uri: "/data/"
        option:
          format: "parquet"
          use_memory_table: false

it would be useful to support an additional matching option, either a glob:

          pattern: "file_typev1*.parquet"

or a regexp:

          pattern: "\wfile_type\wv1\w*.parquet"

It would allow selecting files under a URI with different extensions.
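As a rough illustration of how the two proposed options might behave, here is a minimal sketch in Python (the file names are made up, and the regexp is a simplified equivalent of the pattern above, not the exact one):

```python
import fnmatch
import re

# Made-up file names for illustration only.
files = [
    "file_typev1_part0.parquet",
    "file_typev1_part1.parquet",
    "file_typev2_part0.parquet",
    "other.parquet",
]

# Glob-style selection, as the proposed `glob` option might behave.
glob_matches = [f for f in files if fnmatch.fnmatch(f, "file_typev1*.parquet")]

# Regexp-style selection, as the proposed `regexp` option might behave
# (simplified pattern; fullmatch requires the whole name to match).
rx = re.compile(r"file_typev1.*\.parquet")
regexp_matches = [f for f in files if rx.fullmatch(f)]

print(glob_matches)    # only the two file_typev1 files
print(regexp_matches)  # same two files
```

Either form would let the server register only the matching files instead of everything under the URI.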
Currently, there isn't a way to get the table properties in the SparkOrcWriter via the WriterFactory.
RDF tests failing
I'm submitting a
- [x] bug report.

Current Behaviour:
After #249, trying to run the tests with pytest tests/rdf_tests/test_rdf_basic.py -k test_rdf_runner -s, you get a report file covering all the tests run. Some tests return errors, for example:

    {
      "Basic - Term 7": {
        "input": "basic/data-4.ttl",
        "query": "basic/term-7.rq",
        "error": "Expected {Sele
Over time, some things have leaked into the diff methods that make BigDiffy more cumbersome to use via code instead of the CLI.
For example diffAvro here https://github.com/spotify/ratatool/blob/master/ratatool-diffy/src/main/scala/com/spotify/ratatool/diffy/BigDiffy.scala#L284
The user has to manually pass in the schema; otherwise they receive a non-informative error regarding a null schema, add
When an item in the queue is added with an incorrect type for the corresponding Data Mapper, the job fails during planning without any information about which data mapper or queue item id is involved.
Take, for instance, a Data Mapper with an identifier of type int. If we add foo to the deletion queue, the find will fail with a log like this:

    {
      "EventData": {
        "Error": "ValueError
Hello everyone,
Recently I tried to set up petastorm on my company's Hadoop cluster. However, since the cluster uses Kerberos for authentication, using petastorm failed. I figured out that petastorm relies on pyarrow, which actually supports Kerberos authentication.
I hacked "petastorm/petastorm/hdfs/namenode.py" line 250
and replaced it with