dataframe

import vaex
vaex.from_arrays(s=['a,b']).s.str.replace(r'(\w+)', r'--\g<1>==', regex=True)

When a capture group is referenced in the replacement string, str.replace() fails, while str_pandas.replace() returns the correct result.
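For reference, the "correct" result follows Python's own re.sub semantics, which pandas' Series.str.replace(..., regex=True) also uses: \g<1> in the replacement expands to the text matched by group 1. A minimal sketch (assuming pandas is installed) of the output the vaex call would be expected to match:

```python
import re

import pandas as pd

# Python's re.sub expands \g<1> to the text captured by group 1,
# so every word-character run gets wrapped as --<run>==
expected = re.sub(r"(\w+)", r"--\g<1>==", "a,b")
print(expected)  # --a==,--b==

# pandas applies the same semantics for regex replacements
via_pandas = pd.Series(["a,b"]).str.replace(r"(\w+)", r"--\g<1>==", regex=True)
print(via_pandas.tolist())  # ['--a==,--b==']
```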

Name: vaex
Version: 4.6.0
Summary: Out-of-Core DataFrames to visualize and explore big tabular datasets
Home-page:

Versions

Python 3.9 / Polars 0.10.27 / Windows 10

Describe your bug / reproduce behaviour

>>> # create trivial float series and observe the resulting repr
>>> import polars as pl
>>> pl.from_records(data=[1.0, 0.0, -1.0], columns=['test'])

shape: (3, 1)
┌──────┐
│ test │
│ ---  │
│ f64  │
╞══════╡
│ 1    │   # <- integer repr
├╌╌╌╌╌╌┤
│ 0.0  │   # <- fl
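The repr above renders the integral float 1.0 as "1", which reads like an integer, while 0.0 keeps its decimal point. As an illustration only (this is Python, not the Polars formatter, and it does not claim to be the cause), Python's compact 'g' format drops the trailing ".0" the same way; a hypothetical helper, format_float_cell, shows one way to keep float cells visibly floats:

```python
def format_float_cell(x: float, precision: int = 6) -> str:
    # Hypothetical helper: compact 'g' formatting (which turns 1.0 into "1"),
    # then re-append ".0" when the result looks like a bare integer
    text = format(x, f".{precision}g")
    if text.lstrip("-").isdigit():
        text += ".0"
    return text


print(format(1.0, "g"))  # 1  (decimal point lost)
print([format_float_cell(v) for v in [1.0, 0.0, -1.0]])  # ['1.0', '0.0', '-1.0']
```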

Describe the bug

Failed to execute Series.drop_duplicates.

In [75]: a = md.DataFrame(np.random.rand(10, 2), columns=['a', 'b'], chunk_size=2)

In [76]: a['a'].drop_duplicates().execute()
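Because the frame above is created with chunk_size=2, a distributed drop_duplicates has to remove duplicates across chunk boundaries, not just within each chunk. As a rough sketch only (plain pandas, not Mars's actual implementation; the helper name chunked_drop_duplicates is hypothetical), a map/reduce version deduplicates each chunk, concatenates the partial results in order, and deduplicates once more. With the default keep='first' this matches a single-pass drop_duplicates, index included:

```python
import pandas as pd


def chunked_drop_duplicates(s: pd.Series, chunk_size: int) -> pd.Series:
    # Split the series into consecutive positional chunks (like chunk_size=2)
    chunks = [s.iloc[i:i + chunk_size] for i in range(0, len(s), chunk_size)]
    # Map step: deduplicate inside each chunk independently
    partial = [chunk.drop_duplicates() for chunk in chunks]
    # Reduce step: concatenation preserves order, so a final keep='first'
    # pass keeps the globally first occurrence of every value
    return pd.concat(partial).drop_duplicates()


s = pd.Series([3, 1, 3, 2, 1, 2, 4])
result = chunked_drop_duplicates(s, chunk_size=2)
print(result.tolist())  # [3, 1, 2, 4]
```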

pandas-ta: 0.3.14b0

Running df.ta.strategy(), or more specifically df.ta.jma(), on a simple DataFrame fails with:

Error
Traceback (most recent call last):
  File "/Users/andrei/Projects/BE/breakingequity/breakingequity-backtest-single-day/tests/test_ohlcdata.py", line 60, in test_jma
    df.ta.jma()
  File "/Users/andrei/.local/share/virtualenvs/breakingequity-backtest-single-day

It would be useful to have a method that inserts a column into an existing DataFrame between two existing columns. I know about .addColumn, but it appears to place the new column at the end of the DataFrame.

For example:

df.print()

A | B 
======
7 | 5
3 | 6

df.insert({ "afterColumn": "A", "newColumnName": "C", "data": [4,1], inplace: true })
df.print()
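For comparison, pandas already offers this operation as DataFrame.insert(loc, column, value), which inserts in place at an integer position. The sketch below reproduces the A | B example with pandas; it is not danfojs code, and the danfojs method name and options in the request above remain a proposal:

```python
import pandas as pd

df = pd.DataFrame({"A": [7, 3], "B": [5, 6]})

# get_loc("A") + 1 is the position directly after column "A";
# insert() modifies df in place and returns None
df.insert(df.columns.get_loc("A") + 1, "C", [4, 1])
print(df)
#    A  C  B
# 0  7  4  5
# 1  3  1  6
```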

@hntd187

Background

@hntd187 fixed apache/arrow-datafusion#1361 via apache/arrow-datafusion#1378, but while reviewing the code I found several other places that project RecordBatches and Schemas and may have the same subtle issue of losing metadata. I am not aware of any bugs caused by this yet, but I suspect they are lurking.
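The class of bug described here can be sketched in a few lines: a projection that rebuilds a schema from the selected fields alone silently drops schema-level metadata unless it is copied over explicitly. The Field and Schema classes and both project functions below are hypothetical, simplified stand-ins written in Python, not the Arrow or DataFusion types:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Field:
    name: str
    dtype: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Schema:
    fields: List[Field]
    metadata: Dict[str, str] = field(default_factory=dict)


def project_naive(schema: Schema, indices: Sequence[int]) -> Schema:
    # Builds a new schema from the selected fields only:
    # schema-level metadata is silently lost
    return Schema([schema.fields[i] for i in indices])


def project_preserving(schema: Schema, indices: Sequence[int]) -> Schema:
    # Same projection, but carries the schema-level metadata across
    return Schema([schema.fields[i] for i in indices], dict(schema.metadata))


schema = Schema(
    [Field("a", "Int64"), Field("b", "Utf8", {"origin": "csv"})],
    {"source": "events.parquet"},
)
print(project_naive(schema, [1]).metadata)       # {}
print(project_preserving(schema, [1]).metadata)  # {'source': 'events.parquet'}
```

Field-level metadata survives in both versions here only because the Field objects are reused; code that rebuilds fields from name and type would lose that too.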

Th

Example:
In the image below, the word "starships" should begin on a new line rather than being split.

Terminal width is provided to determine how many columns to print. The terminal width or the total width of the column headers may be used to wrap the text in the footer.
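The footer behaviour described above can be sketched with Python's standard textwrap module, which wraps on whitespace so that a word like "starships" moves to the next line instead of being split. The function name wrap_footer and the sample footer text are assumptions for illustration; tv itself is not written in Python:

```python
import shutil
import textwrap
from typing import List, Optional


def wrap_footer(footer: str, header_width: Optional[int] = None) -> List[str]:
    # Use the total header width when given, otherwise the terminal width
    width = header_width or shutil.get_terminal_size(fallback=(80, 24)).columns
    # Break only at whitespace: long words are never split across lines
    return textwrap.wrap(footer, width=width,
                         break_long_words=False, break_on_hyphens=False)


footer = "Star Wars characters and the starships they piloted"
for line in wrap_footer(footer, header_width=20):
    print(line)
```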

Hi,

I am using a few basic functions from pyjanitor, such as clean_names() and collapse_levels(), in code that I want to productionise.
There are limits on the size of the production code base.
Currently, if I look only at the requirements.txt for "pyjanitor", it is huge.
I don't think my code requires all of those dependencies.
How can I remove the unnecessary ones?
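If only these two functions are needed, one option is to replace them with small pandas-only helpers. The sketch below is a rough, hypothetical stand-in (clean_names_minimal and collapse_levels_minimal are not pyjanitor functions) and does not reproduce every option of the originals; it lowercases names and turns runs of non-alphanumeric characters into underscores, and joins MultiIndex column levels with "_":

```python
import re

import pandas as pd


def clean_names_minimal(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical stand-in for clean_names(): collapse non-alphanumeric runs
    # into "_", trim leading/trailing underscores, then lowercase
    out = df.copy()
    out.columns = [re.sub(r"[^0-9a-zA-Z]+", "_", str(c)).strip("_").lower()
                   for c in df.columns]
    return out


def collapse_levels_minimal(df: pd.DataFrame, sep: str = "_") -> pd.DataFrame:
    # Hypothetical stand-in for collapse_levels(): flatten MultiIndex columns
    if not isinstance(df.columns, pd.MultiIndex):
        return df
    out = df.copy()
    out.columns = [sep.join(str(level) for level in col) for col in df.columns]
    return out


raw = pd.DataFrame({"First Name": ["x"], "Total (USD)": [1]})
print(list(clean_names_minimal(raw).columns))  # ['first_name', 'total_usd']

agg = pd.DataFrame({"g": [1, 1, 2], "v": [1, 2, 3]}).groupby("g").agg({"v": ["mean", "max"]})
print(list(collapse_levels_minimal(agg).columns))  # ['v_mean', 'v_max']
```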

For pipeline stages provided by pdpipe.basic_stages, supplying conditions through the prec and post keyword arguments may not produce the correct error messages.

Example Code

import pandas as pd
import pdpipe as pdp
df = pd.DataFrame([[1,4],[4,5],[1,11]], [1,2,3], ['a','b'])
pline = pdp.PdPipeline([
  pdp.FreqDrop(2, 'a', prec=pdp.cond.HasAllColumns(['x']))
])
pline.apply(
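The behaviour the report expects can be sketched without pdpipe: a stage evaluates its precondition first and, if it fails, raises an error that names the missing columns. Everything below (PreconditionError, has_all_columns, freq_drop) is a hypothetical stand-in rather than pdpipe's API, and freq_drop assumes FreqDrop keeps only rows whose value in the given column occurs at least threshold times:

```python
from typing import Callable, Iterable, Optional

import pandas as pd


class PreconditionError(ValueError):
    """Hypothetical error type; pdpipe's own exception classes may differ."""


def has_all_columns(columns: Iterable[str]) -> Callable[[pd.DataFrame], Optional[str]]:
    required = list(columns)

    def check(df: pd.DataFrame) -> Optional[str]:
        # Return a description of the problem, or None if the check passes
        missing = [c for c in required if c not in df.columns]
        return f"missing required columns: {missing}" if missing else None

    return check


def freq_drop(df: pd.DataFrame, threshold: int, column: str,
              prec: Optional[Callable[[pd.DataFrame], Optional[str]]] = None) -> pd.DataFrame:
    # Check the precondition before touching the data
    if prec is not None:
        problem = prec(df)
        if problem:
            raise PreconditionError(f"FreqDrop precondition failed: {problem}")
    # Map each value to how often it occurs, then keep the frequent ones
    counts = df[column].map(df[column].value_counts())
    return df[counts >= threshold]


df = pd.DataFrame([[1, 4], [4, 5], [1, 11]], [1, 2, 3], ["a", "b"])
print(freq_drop(df, 2, "a"))  # keeps the rows where a == 1
try:
    freq_drop(df, 2, "a", prec=has_all_columns(["x"]))
except PreconditionError as exc:
    print(exc)  # FreqDrop precondition failed: missing required columns: ['x']
```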

Here are 550 public repositories matching this topic...

vaexio / vaex

modin-project / modin

haifengl / smile

pola-rs / polars

databricks / koalas

jtablesaw / tablesaw

adamerose / PandasGUI

mars-project / mars

ballista-compute / ballista

twopirllc / pandas-ta

javascriptdata / danfojs

apache / arrow-datafusion

alexhallam / tv

hosseinmoein / DataFrame

microsoft / Mobius

sngyai / Sequoia

RedisLabs / spark-redis

pyjanitor-devs / pyjanitor

rocketlaunchr / dataframe-go

uwdata / arquero

MrPowers / spark-daria

pdpipe / pdpipe

andygrove / datafusion

Squarespace / datasheets

shramos / Awesome-Cybersecurity-Datasets

sfu-db / connector-x

michaelchu / optopsy

dmnfarrell / pandastable

techascent / tech.ml.dataset

Gmousse / dataframe-js
