arrow

Feature Request

Many locales have the bare minimum when it comes to test cases. While I understand it can be tedious and repetitive to write out test case

We now have native ODBC support upstream. This has to be exposed in polars similarly to existing IO readers and writers.

@bdice

We have two similar methods with different names:

structs_column_view::get_sliced_child
structs_column_device_view::sliced_child

We should rename structs_column_view::get_sliced_child to structs_column_view::sliced_child to align with the other method and to avoid an unnecessary get_ prefix, as is the normal practice in libcudf.

_Originally posted by @bdice in https://github.com/r

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently, DataFusion creates column names by https://github.com/apache/arrow-datafusion/blob/7be0e268a69ffecbf06823c98ca572733dddb29e/datafusion/src/physical_plan/planner.rs#L91.
The approach has two existing problems:

result in potential bugs, such as https://github.com/apache/arrow-

We no longer need to control the number of concurrent kernels, since now we control the number of concurrent tasks

Not sure if this would be of interest, but:

When registering a table:

addr: 0.0.0.0:8084
tables:
  - name: "example"
    uri: "/data/"
    option:
      format: "parquet"
      use_memory_table: false

add a pattern to the options, either as a glob:

pattern: "file_typev1*.parquet"

or as a regexp:

pattern: "\wfile_type\wv1\w*.parquet"

It would allow selecting in URIs with different exte
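To make the difference between the two proposed pattern styles concrete, here is a minimal Python sketch. It is not roapi code: the `select_files` helper, its `mode` argument, and the sample file names are hypothetical. It only shows how the glob and regexp strings quoted above would filter a list of file names, assuming the whole name must match.

```python
import fnmatch
import re


def select_files(names, pattern, mode="glob"):
    """Return the names that match `pattern`, read as a glob or as a regexp.

    Hypothetical helper for illustration only; it is not part of roapi.
    """
    if mode == "glob":
        # fnmatchcase is case-sensitive on every platform.
        return [n for n in names if fnmatch.fnmatchcase(n, pattern)]
    if mode == "regexp":
        compiled = re.compile(pattern)
        # fullmatch requires the entire file name to match the expression.
        return [n for n in names if compiled.fullmatch(n)]
    raise ValueError(f"unknown mode: {mode!r}")


# Made-up file names, standing in for the contents of the "/data/" uri.
names = [
    "file_typev1_a.parquet",
    "file_typev2_a.parquet",
    "file_typev1.csv",
    "xfile_type_v1_b.parquet",
]

glob_hits = select_files(names, "file_typev1*.parquet")
regexp_hits = select_files(names, r"\wfile_type\wv1\w*.parquet", mode="regexp")
```

Note that the two patterns from the request are not equivalent: the regexp requires one word character before `file_type` and another before `v1`, so under these assumptions it selects different files than the glob does.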

It would be helpful to have Fletchgen output warnings for unused metadata fields that start with fletcher_. For example, this would catch the case (which happened to me) where someone adds fletchgen_epc to the Schema metadata instead of the Field metadata.
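The kind of check being requested could look roughly like the Python sketch below. It is an illustration only and does not reflect how Fletchgen is implemented: metadata is modelled as plain dictionaries, and the sets of recognized schema-level and field-level keys are assumptions chosen for the example.

```python
PREFIX = "fletcher_"

# Assumed key sets for illustration; the real lists belong to Fletchgen.
SCHEMA_KEYS = {"fletcher_name", "fletcher_mode"}
FIELD_KEYS = {"fletcher_epc", "fletcher_ignore"}


def unused_keys(metadata, recognized):
    """Return prefixed keys in `metadata` that are not in `recognized`."""
    return sorted(
        key for key in metadata
        if key.startswith(PREFIX) and key not in recognized
    )


def check_schema(schema_metadata, fields_metadata):
    """Collect one warning message per unused fletcher_ metadata key."""
    messages = []
    for key in unused_keys(schema_metadata, SCHEMA_KEYS):
        # Point out keys that would have been used at the field level.
        hint = " (recognized on fields only)" if key in FIELD_KEYS else ""
        messages.append(f"schema: unused metadata key '{key}'{hint}")
    for field_name, metadata in fields_metadata.items():
        for key in unused_keys(metadata, FIELD_KEYS):
            messages.append(f"field '{field_name}': unused metadata key '{key}'")
    return messages


# A field-level key placed on the schema, plus a misspelled key on a field.
messages = check_schema(
    {"fletcher_name": "example", "fletcher_epc": "4"},
    {"values": {"fletcher_epc": "4", "fletcher_typo": "1"}},
)
for message in messages:
    print(message)
```

Keys without the fletcher_ prefix are ignored on purpose, since other tools may attach their own metadata to the same schema.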

Problem description

Reading a dataset with eager's read functionality raises a ValueError when providing columns.

Example code (ideally copy-pastable)

import pandas as pd

from tempfile import TemporaryDirectory
from functools import partial
from storefact import get_store_from_url

from kartothek.io.eager import store_dataframes_as_dataset, read_dataset_as_data

Motivation:

Improved compile times (at least by 2x compared to arrow-rs).
Faster Parquet impl
Projects are migrating to arrow2 (including Datafusion and Polars)

Here are 258 public repositories matching this topic...

apache / arrow

arrow-py / arrow

pola-rs / polars

arrow-kt / arrow

rapidsai / cudf

ballista-compute / ballista

anseki / leader-line

apache / arrow-datafusion

ibis-project / ibis

BlazingDB / blazingsql

roapi / roapi

zagum / Android-ExpandIcon

pierpo / react-archer

andygrove / datafusion

freshOS / Arrow

RandomFractals / vscode-data-preview

antoniocasero / Arrows

faridsabitov / Sketch-Connection-Flow-Arrows

milosmns / actual-number-picker

scikit-hep / awkward-0.x

Chivorns / SmartMaterialSpinner

abs-tudelft / fletcher

JDASoftwareGroup / kartothek

oap-project / gazelle_plugin

cda-group / arcon

calm / tooltip

JuliaData / Feather.jl

yeun / open-arrow

arrow-kt / arrow-core

AceFire6 / ordered-arrowverse
