#arrow

Here are 265 public repositories matching this topic...

ttnghia commented Jun 15, 2022

The API lists::drop_list_duplicates operates on a pair of key-value input lists columns with a duplicate_keep_option. It was added for a Spark-specific feature request. We now have lists::distinct, which simply extracts the distinct elements of each list in the input lists column. That API is more standard and is used by both Python and Spark.

Therefore, we should remove lists::drop_list_duplicates complet
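
To make the "distinct list elements" behaviour concrete, here is a minimal pure-Python sketch of the idea. It is not libcudf code: the function name distinct_per_list is hypothetical, and keeping the first occurrence in input order and passing null rows through unchanged are assumptions made for illustration. The real lists::distinct may order elements and treat nulls differently.

```python
from typing import Hashable, List, Optional, Sequence


def distinct_per_list(
    rows: Sequence[Optional[Sequence[Hashable]]],
) -> List[Optional[List[Hashable]]]:
    """For each input list, return its elements with duplicates removed.

    Assumption (illustration only): the first occurrence of each element
    is kept in input order, and a null row (None) stays null.
    """
    result: List[Optional[List[Hashable]]] = []
    for row in rows:
        if row is None:
            result.append(None)
            continue
        # dict.fromkeys keeps insertion order, so the first occurrence wins.
        result.append(list(dict.fromkeys(row)))
    return result


if __name__ == "__main__":
    print(distinct_per_list([[1, 2, 2, 3], [], None, [5, 5, 5]]))
    # prints [[1, 2, 3], [], None, [5]]
```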

feature request good first issue libcudf helps: Spark
andygrove commented Jul 11, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
In the SO post https://stackoverflow.com/questions/72888852/extract-year-month-day-from-unix-timestamp-column-in-rust-datafusion-dataframe/72941102#72941102 the user needed help converting a Unix time to a timestamp. The current solution is to use a cast, which is verbose and not obvious.
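
For readers unfamiliar with the conversion being requested, the following is a small standard-library Python sketch of turning a Unix time (seconds since the epoch) into a timestamp and pulling out year, month and day. It does not use DataFusion and says nothing about its API. The helper name unix_to_ymd is hypothetical, and interpreting the value as seconds in UTC is an assumption.

```python
from datetime import datetime, timezone
from typing import Tuple


def unix_to_ymd(seconds: int) -> Tuple[int, int, int]:
    """Convert a Unix time in seconds to (year, month, day).

    Assumption (illustration only): the input counts seconds, not
    milliseconds, and is interpreted in UTC.
    """
    ts = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return ts.year, ts.month, ts.day


if __name__ == "__main__":
    print(unix_to_ymd(0))           # prints (1970, 1, 1)
    print(unix_to_ymd(1640995200))  # prints (2022, 1, 1)
```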

enhancement good first issue
atomotic commented May 31, 2022

Does the pre-built binary not support databases?

roapi -t "vocabs=sqlite:///data/vocabulary.sqlite"
[2022-05-31T06:48:11Z INFO  roapi::context] loading `uri(sqlite:///data/vocabulary.sqlite)` as table `vocabs`
Error: Database error: Enable 'database' feature flag to support this

Would you explain in the README how to enable it?

I'm new to Rust; after some searching I got it workin

bug good first issue help wanted
fletcher
NeroCorleone commented Aug 11, 2020

Problem description

Reading a dataset with eager's read functionality raises a ValueError when providing columns.

Example code (ideally copy-pastable)

import pandas as pd

from tempfile import TemporaryDirectory
from functools import partial
from storefact import get_store_from_url

from kartothek.io.eager import store_dataframes_as_dataset, read_dataset_as_data
good first issue usability
andygrove commented May 10, 2022

Describe the bug
We have a hard-coded distinct = false parameter in ballista/rust/core/src/serde/physical_plan/mod.rs.

Ok(create_aggregate_expr(
    &aggr_function.into(),
    false, // <-- hard-coded "distinct"
    input_phy_expr.as_slice(),
    &physical_schema,
    name.to_string(),
)?)

To Reproduce
Try running a COUNT(DISTINCT expr) in Ballista
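
To illustrate why a hard-coded distinct = false matters, here is a small sketch that compares COUNT(x) with COUNT(DISTINCT x) on the same data. It uses Python's built-in sqlite3 module purely as a stand-in SQL engine, not Ballista. The table name t, the column name x, and the helper count_plain_and_distinct are hypothetical. An engine that ignores the distinct flag would be expected to return the first number where the second is wanted.

```python
import sqlite3
from typing import Iterable, Tuple


def count_plain_and_distinct(values: Iterable[int]) -> Tuple[int, int]:
    """Return (COUNT(x), COUNT(DISTINCT x)) for the given values.

    SQLite is used only as a reference engine for the expected results;
    this is not Ballista code.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.executemany("INSERT INTO t (x) VALUES (?)", [(v,) for v in values])
        row = conn.execute(
            "SELECT COUNT(x), COUNT(DISTINCT x) FROM t"
        ).fetchone()
        return row[0], row[1]
    finally:
        conn.close()


if __name__ == "__main__":
    plain, distinct = count_plain_and_distinct([1, 1, 2, 3, 3])
    print(plain, distinct)  # prints 5 3
```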

**E

bug good first issue
