dataframe

Thank you for reaching out and helping us improve Vaex!

Before you submit a new Issue, please read through the documentation. Also, make sure you search through the Open and Closed Issues - your problem may already be discussed or addressed.

Description
Please provide a clear and concise description of the problem. This should contain all the steps nee

We can reduce friction by figuring out how to load data into Polars memory most efficiently.

Is your feature request related to a problem? Please describe.
The current Parquet CheckPageRows test relies on POSIX functions to handle the test file and uses a flat char array for the buffer.

Describe the solution you'd like
We should get rid of such C-style expressions in the test code and refactor it to use STL streams (or cudf::io::datasource).

I would like to convert a DataFrame to a JSON object the same way that Pandas does with to_dict().

toJSON() treats rows as elements in an array, and ignores the index labels. But to_dict() uses the index as keys.
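The difference between the two shapes can be sketched in plain Python. This is only an illustration under stated assumptions: `rows_as_list` and `rows_by_index` are hypothetical helper names, not part of any library, and the index-keyed form is roughly what pandas returns from `to_dict(orient="index")`.

```python
def rows_as_list(index, rows):
    """Array-of-rows export: the index labels are simply dropped."""
    return list(rows)


def rows_by_index(index, rows):
    """Index-keyed export: each index label maps to its row."""
    if len(index) != len(rows):
        raise ValueError("index and rows must have the same length")
    return dict(zip(index, rows))


# Hypothetical sample data: two rows labelled "a" and "b".
index = ["a", "b"]
rows = [{"x": 1, "y": 2}, {"x": 3, "y": 4}]

as_list = rows_as_list(index, rows)   # labels are lost
as_dict = rows_by_index(index, rows)  # labels become keys
```

Note that duplicate index labels would overwrite each other in the index-keyed form, so an implementation would need to decide how to handle them.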

Here is an example of what I have in mind:

function to_dict(df) {
    const rows = df.toJSON();
    const entries = df.index.map((e, i) => ({ [e]: rows[i] }));

For example, given the data (3.8, 4.5, 4.6, 4.7, 4.9), the tech.tablesaw.aggregate.AggregateFunctions.percentile function returns 4.9 for the 90th percentile. If the percentile function supported linear interpolation, the 90th percentile would be 4.82, which is the method adopted by most other programming languages.
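The 4.82 figure can be reproduced with a short Python sketch of linear interpolation between the two nearest ranks. It assumes the rank is computed as p/100 * (n - 1) on the sorted data, which is the convention NumPy uses by default; `percentile_linear` is a hypothetical name, not a Tablesaw function.

```python
import math


def percentile_linear(values, p):
    """Percentile with linear interpolation between the closest ranks."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    # Fractional position in the sorted data (0-based).
    rank = (p / 100) * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    # Interpolate between the two neighbouring values.
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


data = [3.8, 4.5, 4.6, 4.7, 4.9]
p90 = percentile_linear(data, 90)  # rank 3.6: 4.7 + 0.6 * (4.9 - 4.7)
```

Here the rank is 0.9 * 4 = 3.6, so the result lies 60% of the way from 4.7 to 4.9, which is approximately 4.82.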

Which version are you running?
The latest version: (from python-dateutil>=2.8.1->pandas->pandas-ta==0.3.14b0)

Is your feature request related to a problem? Please describe.
The result of the vidya function, ta.vidya(df["Close"], length=7).iloc[-1], does not match TradingView.

Is your feature request related to a problem? Please describe.
Implement classification_report for classification metrics (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html).
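As a rough idea of what such a report contains, the following Python sketch computes per-class precision, recall, F1, and support from two label lists. It is a minimal illustration, not the scikit-learn implementation; `per_class_report` is a hypothetical name, and returning 0.0 for undefined ratios is an assumption.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Per-class precision, recall, F1, and support."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed an instance of the true class
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        # Assumption: undefined ratios are reported as 0.0.
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": actual,
        }
    return report


# Hypothetical labels for illustration.
y_true = ["cat", "cat", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog"]
report = per_class_report(y_true, y_pred)
```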

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
From discussion in apache/arrow-datafusion#2690 (comment)

What about showing the projection only when there is one and omitting it when there is none?
This could remove the None/Some too:

TableScan a projection=[col1,col2]

vs

Ta
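The proposed display rule can be sketched in Python as follows. This is only an illustration of the idea, not DataFusion code: `format_table_scan` is a hypothetical name, and the exact text printed when there is no projection is an assumption, since the second example above is cut off.

```python
from typing import Optional, Sequence


def format_table_scan(table: str, projection: Optional[Sequence[str]] = None) -> str:
    """Render a scan line, appending the projection only when one exists."""
    line = f"TableScan {table}"
    # Assumption: None and an empty projection are both shown as "no projection".
    if projection:
        line += f" projection=[{','.join(projection)}]"
    return line


with_projection = format_table_scan("a", ["col1", "col2"])
without_projection = format_table_scan("a")
```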

Hi,

I am using some basic pyjanitor functions, such as clean_names() and collapse_levels(), in code that I want to productionise.
There are limits on the size of the production code base.
Currently, the requirements.txt for pyjanitor alone is huge.
I don't think my code needs all of those dependencies.
How can I remove the unnecessary ones?

For pipeline stages provided by pdpipe.basic_stages, supplying conditions to the prec and post keyword arguments may not produce the correct error messages.

Example Code

import pandas as pd; import pdpipe as pdp;
df = pd.DataFrame([[1,4],[4,5],[1,11]], [1,2,3], ['a','b'])
pline = pdp.PdPipeline([
  pdp.FreqDrop(2, 'a', prec=pdp.cond.HasAllColumns(['x']))
])
pline.apply(

Is your feature request related to a problem? Please describe.
The main friction in getting the examples up and running is installing the dependencies. A Docker container with them already provided would make it easier for people to get started with Hamilton.

Describe the solution you'd like

A docker container, that has different python virtual environments, that has the dependencies t

Here are 595 public repositories matching this topic...

modin-project / modin

vaexio / vaex

pola-rs / polars

haifengl / smile

rapidsai / cudf

javascriptdata / danfojs

databricks / koalas

jtablesaw / tablesaw

adamerose / PandasGUI

twopirllc / pandas-ta

mars-project / mars

ballista-compute / ballista

apache / arrow-datafusion

hosseinmoein / DataFrame

alexhallam / tv

sngyai / Sequoia

microsoft / Mobius

pyjanitor-devs / pyjanitor

RedisLabs / spark-redis

uwdata / arquero

rocketlaunchr / dataframe-go

MrPowers / spark-daria

pdpipe / pdpipe

sfu-db / connector-x

shramos / Awesome-Cybersecurity-Datasets

andygrove / datafusion

Squarespace / datasheets

michaelchu / optopsy

dmnfarrell / pandastable

stitchfix / hamilton
