arrow

Feature Request

Many locales have the bare minimum when it comes to test cases. While I understand it can be tedious and repetitive to write out test case

Is your feature request related to a problem? Please describe.
Hi,

While porting some code from pandas to cuDF, I noticed that cuDF Series do not support the unstack method.
As an additional request, it would be great if fill_value could be supported in both cudf.DataFrame.unstack and cudf.Series.unstack. Thanks!
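
For reference, the pandas behaviour being requested can be sketched as follows. This is a minimal illustration using pandas only (not cuDF); the index labels and values are made up for the example, and the pandas parameter is spelled fill_value.

```python
import pandas as pd

# Hypothetical two-level index; the ("b", "y") combination is deliberately missing.
index = pd.MultiIndex.from_tuples(
    [("a", "x"), ("a", "y"), ("b", "x")],
    names=["outer", "inner"],
)
series = pd.Series([1, 2, 3], index=index)

# Pivot the "inner" level into columns and fill the missing cell with 0
# instead of the default NaN.
wide = series.unstack(level="inner", fill_value=0)
```

Here wide is a DataFrame indexed by the outer level, with one column per inner label; the missing ("b", "y") cell holds the fill value rather than NaN.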

Describe the solution you'd like
To have that meth

Edited request

Can Polars allow mixing string and type selection using col(["x", pl.Int64])?

See suggestion here.

Initial request

Any thoughts on an .include() method that is the opposite of .exclude()?

That way these would both work:

df.select(col('x').include(pl.Int64))

df.se

We no longer need to control the number of concurrent kernels, since now we control the number of concurrent tasks

Describe the bug

I have a data set created by Apache Spark and I tried to query it from the DataFusion CLI. It failed, saying that a parquet file was corrupt.

❯ CREATE EXTERNAL TABLE store_sales STORED AS PARQUET LOCATION 'store_sales.dat';
0 rows in set. Query took 0.002 seconds.
❯ select count(*) from store_sales;
Parquet reader thread terminated due to error: ParquetError(Gener

Not sure if this would be of interest, but:

When registering a table:

addr: 0.0.0.0:8084
tables:
  - name: "example"
    uri: "/data/"
    option:
      format: "parquet"
      use_memory_table: false

add a pattern to the options, either as a glob:

pattern: "file_typev1*.parquet"

or as a regexp:

pattern: "\wfile_type\wv1\w*.parquet"

It would allow selecting in uri's with different exte
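
The difference between the two pattern styles can be sketched with Python's standard library. This is only an illustration of glob versus regular-expression matching on file names, not roapi's implementation; the file names, the helper functions, and the simplified regular expression are assumptions made for the example.

```python
import fnmatch
import re

# Hypothetical directory listing; these names are not from the original issue.
files = [
    "file_typev1_a.parquet",
    "file_typev1_b.parquet",
    "file_typev2_a.parquet",
    "notes.txt",
]


def select_by_glob(names, pattern):
    """Keep names that match a shell-style glob (case-sensitive)."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


def select_by_regex(names, pattern):
    """Keep names that match a regular expression in full."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


glob_hits = select_by_glob(files, "file_typev1*.parquet")
# Simplified regex written for this sketch; note the escaped dot.
regex_hits = select_by_regex(files, r"file_typev1\w*\.parquet")
```

Both calls select the two v1 files here. The glob is shorter to write, while the regular expression can be stricter about which characters are allowed between the prefix and the extension.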

It would be helpful for Fletchgen to output warnings for unused metadata fields that start with fletcher_. This happened to me, for example, when fletchgen_epc was added to the Schema metadata instead of the Field metadata.

Problem description

Reading a dataset with eager's read functionality raises a ValueError when providing columns.

Example code (ideally copy-pastable)

import pandas as pd

from tempfile import TemporaryDirectory
from functools import partial
from storefact import get_store_from_url

from kartothek.io.eager import store_dataframes_as_dataset, read_dataset_as_data

An Operator that both filters and maps.

Akin to Rust's own FilterMap but on a Stream rather than Iterator.

let strings = ["1", "two", "NaN", "four", "5"];
let mut app = Application::default()
  .iterator(strings, |conf| {
     conf.set_arcon_time(ArconTime::Process);
  })
  .filter_map(|s| s.parse().ok())
  .b
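
The filter-and-map semantics described above can be sketched in Python. This is not Arcon's API: filter_map and parse_int are hypothetical helpers written for this illustration, with None playing the role of Rust's None and integer parsing assumed as the target type.

```python
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def filter_map(items: Iterable[T], f: Callable[[T], Optional[U]]) -> Iterator[U]:
    """Apply f to each item and yield only the results that are not None."""
    for item in items:
        result = f(item)
        if result is not None:
            yield result


def parse_int(text: str) -> Optional[int]:
    """Return the parsed integer, or None when the text is not an integer."""
    try:
        return int(text)
    except ValueError:
        return None


strings = ["1", "two", "NaN", "four", "5"]
parsed = list(filter_map(strings, parse_int))
```

With integer parsing, only "1" and "5" survive, so a single operator replaces a map followed by a filter.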

Here are 251 public repositories matching this topic...

apache / arrow

arrow-py / arrow

arrow-kt / arrow

rapidsai / cudf

pola-rs / polars

ballista-compute / ballista

anseki / leader-line

BlazingDB / blazingsql

apache / arrow-datafusion

roapi / roapi

zagum / Android-ExpandIcon

pierpo / react-archer

andygrove / datafusion

freshOS / Arrow

RandomFractals / vscode-data-preview

antoniocasero / Arrows

faridsabitov / Sketch-Connection-Flow-Arrows

milosmns / actual-number-picker

scikit-hep / awkward-0.x

Chivorns / SmartMaterialSpinner

abs-tudelft / fletcher

JDASoftwareGroup / kartothek

calm / tooltip

oap-project / gazelle_plugin

cda-group / arcon

JuliaData / Feather.jl

jorgecarleitao / parquet2

PacificBiosciences / GenomicConsensus

yeun / open-arrow

arrow-kt / arrow-core
