dataframe

Description
Trying to convert an "object" column to string fails.

Example code:

import vaex
import numpy as np

if __name__ == "__main__":
    arr = np.array([123, "test", None], dtype=object)
    df = vaex.from_arrays(test=arr)
    df['test'] = df['test'].astype('str')
    print(df.head())

Exception:

TypeError: to_string(): incompatible function argu

Are you using Python or Rust?

Python 3.8

What version of polars are you using?

0.10.23

What operating system are you using polars on?

Linux, Ubuntu 20.04

Describe your bug.

Unpickling a list-type series results in an error, works fine for non-list series.

What are the steps to reproduce the behavior?

import pickle
import polars as pl

Describe the bug

Failed to execute Series.drop_duplicates.

In [75]: a = md.DataFrame(np.random.rand(10, 2), columns=['a', 'b'], chunk_size=2)                  

In [76]: a['a'].drop_duplicates().execute()

Can the random walk index indicator be added to the pandas-ta library? Thanks.

References:
Technical Indicators
trading sim
linnsoft
fmlabs

It would be really useful if there was a method that could insert a column into an existing Dataframe between two existing columns. I know about .addColumn, but that seems to place the new column at the end of the Dataframe.

For example:

df.print()

A | B 
======
7 | 5
3 | 6

df.insert({ "afterColumn": "A", "newColumnName": "C", "data": [4,1], inplace: true })
df.print()
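The requested behaviour can be sketched in plain Python. This is only an illustration of the idea, not the library's API: the helper name insert_column_after and the dict-of-lists table are assumptions made for the example.

```python
def insert_column_after(columns, after_column, new_name, data):
    """Return a new dict of columns with `new_name` placed right after `after_column`.

    `columns` maps column name -> list of values; Python dicts keep insertion
    order, so the key order stands in for the column order.
    """
    if after_column not in columns:
        raise KeyError(f"column {after_column!r} not found")
    if new_name in columns:
        raise ValueError(f"column {new_name!r} already exists")
    n_rows = len(columns[after_column])
    if len(data) != n_rows:
        raise ValueError(f"expected {n_rows} values, got {len(data)}")

    result = {}
    for name, values in columns.items():
        result[name] = values
        if name == after_column:
            # Place the new column immediately after the anchor column.
            result[new_name] = list(data)
    return result


# Hypothetical table mirroring the A | B example above.
table = {"A": [7, 3], "B": [5, 6]}
table = insert_column_after(table, "A", "C", [4, 1])
print(list(table))  # ['A', 'C', 'B']
```

Returning a new mapping keeps the sketch simple; an in-place variant would have to rebuild the column order anyway.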

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
We have had performance regressions in the TPCH plans due to plan changes (e.g. apache/arrow-datafusion#1367)

In order to avoid this situation in the future, we would like to have the explain plans committed as tests (so we can evaluate any changes to those plans

Example:
In the image below the word starships should begin on a new line to avoid being split.

Terminal width is provided to determine how many columns to print. The terminal width or the total width of the column headers may be used to wrap the text in the footer.
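A minimal sketch of width-aware wrapping, written in Python for illustration only. The footer text and the function name wrap_footer are made up for the example, and the project itself may handle this differently.

```python
import shutil
import textwrap


def wrap_footer(text, width=None):
    """Wrap `text` at word boundaries so that words are never split."""
    if width is None:
        # Fall back to 80 columns when the terminal size cannot be detected.
        width = shutil.get_terminal_size(fallback=(80, 24)).columns
    # break_long_words=False keeps a word whole even if it is wider than `width`.
    return textwrap.wrap(text, width=width, break_long_words=False)


# Hypothetical footer text; with 25 columns the last word moves to a new line.
footer = "films species vehicles starships"
for line in wrap_footer(footer, width=25):
    print(line)
```

The same function could take the total width of the column headers instead of the terminal width, as the description suggests.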

Hi,

I am using some basic functions from pyjanitor, such as clean_names() and collapse_levels(), in code that I want to productionise.
There are limits on the size of the production code base.
Currently, the requirements.txt for "pyjanitor" alone is huge.
I don't think I need all of those dependencies in my code.
How can I remove the unnecessary ones?

If I'm using a pdp.cond.Condition object, for example pdp.cond.HasAllColumns(['a', 'b']), as a precondition (or a post-condition) in my pipeline stage and it fails, I expect an informative exception message, such as "Not all required columns ['a', 'b'] found in input dataframe" (or better yet, "Required columns ['a'] not found in input dataframe", in the case only "a" is missing).
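The kind of message asked for here could be produced along these lines. This is a hedged sketch: check_has_all_columns is a hypothetical stand-alone helper, not part of pdpipe, and it works on plain lists of column names.

```python
def check_has_all_columns(required, available):
    """Raise a ValueError naming exactly the required columns that are missing."""
    missing = [col for col in required if col not in available]
    if missing:
        raise ValueError(
            f"Required columns {missing} not found in input dataframe"
        )


try:
    # Only "a" is missing, so only "a" is reported.
    check_has_all_columns(["a", "b"], available=["b", "c"])
except ValueError as err:
    print(err)  # Required columns ['a'] not found in input dataframe
```

Computing the missing subset first is what allows the message to name only the absent columns rather than the full required list.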

Here are 541 public repositories matching this topic...

vaexio / vaex

modin-project / modin

haifengl / smile

pola-rs / polars

databricks / koalas

jtablesaw / tablesaw

adamerose / PandasGUI

mars-project / mars

ballista-compute / ballista

twopirllc / pandas-ta

opensource9ja / danfojs

apache / arrow-datafusion

alexhallam / tv

hosseinmoein / DataFrame

microsoft / Mobius

sngyai / Sequoia

RedisLabs / spark-redis

pyjanitor-devs / pyjanitor

rocketlaunchr / dataframe-go

uwdata / arquero

MrPowers / spark-daria

pdpipe / pdpipe

andygrove / datafusion

Squarespace / datasheets

shramos / Awesome-Cybersecurity-Datasets

sfu-db / connector-x

michaelchu / optopsy

dmnfarrell / pandastable

techascent / tech.ml.dataset

Gmousse / dataframe-js
