dataframe

Hello,
Considering your amazing efficiency on pandas, numpy, and more, it would seem to make sense for your module to work with even bigger data, such as audio (for example .mp3 and .wav). This would help a lot considering the nature of audio (i.e. one of the lowest and most common sampling rates is still 44,100 samples/sec). For a use case, I would consider vaex.open('Hu

Describe your feature request

Support projection (reading only specific columns) when reading an IPC file in lazy mode:

https://github.com/pola-rs/polars/blob/1a6db2919ef312e3da959c5f74da09e6cd99ebb0/polars/polars-io/src/ipc.rs

Support for projection in lazy mode can be adapted from:
https://github.com/pola-rs/polars/blob/master/polars/polars-io/src/parquet.rs

Specifying colum

Describe the bug

Failed to execute Series.drop_duplicates.

In [75]: a = md.DataFrame(np.random.rand(10, 2), columns=['a', 'b'], chunk_size=2)

In [76]: a['a'].drop_duplicates().execute()

Version: 0.3.14b0

Please add overlap: Hilbert Transform - Instantaneous Trendline

Indexes in a DataFrame are not saved when exporting to CSV using to_csv.

const df = new DataFrame(input, { index: ['A', 'B', 'C'] } );
df.to_csv(filename);

Example:
In the image below the word starships should begin on a new line to avoid being split.

Terminal width is provided to determine how many columns to print. The terminal width or the total width of the column headers may be used to wrap the text in the footer.
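The wrapping behaviour described above can be sketched in Python with the standard library. This is only an illustration of the idea, not the tool's actual implementation: the function name `wrap_footer` and its parameters `table_width` and `terminal_width` are hypothetical. The footer is wrapped to the narrower of the terminal and the printed table, and words are never split.

```python
import shutil
import textwrap
from typing import List, Optional


def wrap_footer(
    footer: str,
    table_width: Optional[int] = None,
    terminal_width: Optional[int] = None,
) -> List[str]:
    """Wrap footer text on word boundaries (hypothetical helper)."""
    if terminal_width is None:
        # Ask the OS for the terminal size; fall back to 80 columns.
        terminal_width = shutil.get_terminal_size(fallback=(80, 24)).columns
    # Use the narrower of the terminal and the table, if a table width is given.
    width = terminal_width if table_width is None else min(terminal_width, table_width)
    # break_long_words=False keeps a word such as "starships" intact.
    return textwrap.wrap(footer, width=max(1, width), break_long_words=False)


if __name__ == "__main__":
    for line in wrap_footer("a long time ago starships", terminal_width=20):
        print(line)
```

With a width of 20, "starships" no longer fits on the first line, so it moves to a new line as a whole word.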

Hi,

I am using some basic functions from pyjanitor, such as clean_names() and collapse_levels(), in code that I want to productionise.
There are limits on the size of the production code base.
Currently, if I look at the requirements.txt for just "pyjanitor", it's huge.
I don't think I require all the dependencies in my code.
How can I remove the unnecessary ones?

Thanks for this library, it made my pandas workflow clean & easy to maintain.

Feature Request:
pd.merge is a common operation when dealing with two or more data frames, and a pipeline stage for it is missing.
Currently, I am implementing a custom pipeline stage, pdp.Merge, derived from PdPipelineStage, and would like to contribute it.
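To make the request concrete, here is a minimal sketch of what a merge stage could look like. It is an assumption for illustration only: the class `MergeStage` and its `apply` method are hypothetical and do not follow the pdpipe stage interface; only `pd.merge` is real pandas API.

```python
import pandas as pd


class MergeStage:
    """Hypothetical stage that merges a fixed right-hand frame into its input."""

    def __init__(self, right: pd.DataFrame, on: str, how: str = "inner") -> None:
        self.right = right
        self.on = on
        self.how = how

    def apply(self, df: pd.DataFrame) -> pd.DataFrame:
        # Delegate to pandas.merge; the input frame is left unmodified.
        return pd.merge(df, self.right, on=self.on, how=self.how)


orders = pd.DataFrame({"user_id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]})
users = pd.DataFrame({"user_id": [1, 2], "name": ["ann", "bob"]})

# A left merge keeps every order, including the one with no matching user.
merged = MergeStage(users, on="user_id", how="left").apply(orders)
```

Fixing the right-hand frame at construction time keeps the stage a one-frame-in, one-frame-out step, which is the shape a pipeline usually expects.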

Here are 535 public repositories matching this topic...

vaexio / vaex

modin-project / modin

haifengl / smile

databricks / koalas

pola-rs / polars

jtablesaw / tablesaw

adamerose / PandasGUI

mars-project / mars

ballista-compute / ballista

twopirllc / pandas-ta

opensource9ja / danfojs

alexhallam / tv

hosseinmoein / DataFrame

microsoft / Mobius

sngyai / Sequoia

RedisLabs / spark-redis

pyjanitor-devs / pyjanitor

rocketlaunchr / dataframe-go

MrPowers / spark-daria

pdpipe / pdpipe

uwdata / arquero

andygrove / datafusion

Squarespace / datasheets

shramos / Awesome-Cybersecurity-Datasets

michaelchu / optopsy

sfu-db / connector-x

dmnfarrell / pandastable

Gmousse / dataframe-js

techascent / tech.ml.dataset

ranaroussi / pystore
