dataframe

vaex.from_arrays(s=['a,b']).s.str.replace(r'(\w+)',r'--\g<1>==',regex=True)

When a capture group is used in str.replace, the call fails, while str_pandas.replace() gives the correct result.
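
For reference, here is a minimal sketch of the output one would expect from this replacement, using Python's standard-library re module with the same pattern and replacement string as the report. It only illustrates the expected result and is not vaex's implementation.

```python
import re

# Pattern and replacement copied from the report above.
pattern = r'(\w+)'
replacement = r'--\g<1>=='

# \g<1> refers to the first capture group, so each word is wrapped.
result = re.sub(pattern, replacement, 'a,b')
print(result)  # --a==,--b==
```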

Name: vaex
Version: 4.6.0
Summary: Out-of-Core DataFrames to visualize and explore big tabular datasets
Home-page:

We now have native ODBC support upstream. This has to be exposed in polars similarly to existing IO readers and writers.

Is your feature request related to a problem? Please describe.

Based on discussion in rapidsai/cudf#10200 (comment) there are a number of improvements we should make to the exceptions libcudf throws when a CUDA error occurs.

Describe the solution you'd like

Add a cudaError_t member to [cudf::cuda_error](https://github.com/rapidsai/cud

I would like to convert a DataFrame to a JSON object the same way that Pandas does with to_dict().

toJSON() treats rows as elements in an array, and ignores the index labels. But to_dict() uses the index as keys.

Here is an example of what I have in mind:

function to_dict(df) {
    const rows = df.toJSON();
    const entries = df.index.map((e, i) => ({ [e]: rows[i] }));

For example, the data is (3.8, 4.5, 4.6, 4.7, 4.9).
When I use the tech.tablesaw.aggregate.AggregateFunctions.percentile function, the 90th percentile is 4.9. However, if the percentile function supported linear interpolation, the 90th percentile would be 4.82, which is the result most other programming languages give.
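
The 4.82 figure can be reproduced with a short Python sketch. It assumes the common convention of interpolating linearly between the two closest ranks at position q/100 * (n - 1) in the sorted data (the default behaviour of numpy.percentile). The function name percentile_linear is hypothetical and is not part of tablesaw.

```python
import math


def percentile_linear(values, q):
    """Return the q-th percentile (0-100) by linear interpolation between closest ranks."""
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    # Fractional position of the requested percentile in the sorted data.
    pos = (q / 100.0) * (len(data) - 1)
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    # Interpolate between the two neighbouring observations.
    return data[lower] + frac * (data[upper] - data[lower])


p90 = percentile_linear([3.8, 4.5, 4.6, 4.7, 4.9], 90)
print(round(p90, 2))  # 4.82
```

Here the position is 0.9 * 4 = 3.6, so the result lies 60% of the way from 4.7 to 4.9.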

Is your feature request related to a problem? Please describe.
Implement classification_report for classification metrics (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html).
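
As a rough illustration of what such a report contains, the sketch below computes per-class precision, recall, F1 score and support in plain Python. The function name classification_metrics is hypothetical; the dictionary keys mirror those of scikit-learn's output_dict=True form, and the averaged rows (accuracy, macro avg, weighted avg) are left out.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Per-class precision, recall, F1 and support (hypothetical helper)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    support = Counter(y_true)
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1-score": f1,
            "support": support[label],
        }
    return report


report = classification_metrics([0, 1, 1, 0], [0, 1, 0, 0])
for label, row in report.items():
    print(label, row)
```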

When I ran Center of Gravity: cg over 3 months of Bitcoin prices ("20200801" to "20201101"), I got

               Close             cg
count  132481.000000  132472.000000
mean    11378.306788      -5.499988
std       844.355621       0.001991
min      9881.820000      -5.616297
25%     10710.500000      -5.500833
50%     11368.680000      -5.499987
75%     11742.540000      -5.499146

TPC-DS has many queries with IN predicates where all elements are constants. Implementing an InSet function for this all-constants case would be low-hanging fruit.

While implementing this, we could either use a hashtable or a chain of if-elif-else, depending on the length and the type of the constants array.
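
The choice between the two strategies can be sketched in Python as follows. This is only an illustration of the idea, not the project's implementation: the names make_in_set and SMALL_LIST_THRESHOLD are hypothetical, the threshold value is an arbitrary assumption that would need benchmarking, and the sketch switches on the length of the constants only, not on their type.

```python
from typing import Callable, Hashable, Sequence

# Hypothetical cutoff; a real value would have to be measured.
SMALL_LIST_THRESHOLD = 4


def make_in_set(constants: Sequence[Hashable]) -> Callable[[Hashable], bool]:
    """Build a membership test specialised for a fixed list of constants."""
    if len(constants) <= SMALL_LIST_THRESHOLD:
        items = tuple(constants)

        def contains_small(value: Hashable) -> bool:
            # Short list: sequential comparisons, like an if-elif-else chain.
            for constant in items:
                if value == constant:
                    return True
            return False

        return contains_small

    # Longer list: build a hash set once and reuse it for every row.
    lookup = frozenset(constants)
    return lambda value: value in lookup


# Placeholder values, not taken from any TPC-DS query.
in_small = make_in_set(["aa", "bb"])
in_large = make_in_set([str(n) for n in range(100)])
print(in_small("aa"), in_small("zz"), in_large("42"), in_large("x"))
```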

Q8:

 WHERE substr(ca_zip, 1, 5) IN (
               '2412

Background

This thread is born out of the discussion in #968, in an effort to make the documentation more beginner-friendly and easier to understand.
One of the subtasks mentioned in that thread was to go through the function docstrings and include a minimal working example in each of the public functions in pyjanitor.

Criteria reiterated here for the benefit of discussion:

It sh

For pipeline stages provided by pdpipe.basic_stages, supplying conditions to the prec and post keyword arguments may not produce the correct error messages.

Example Code

import pandas as pd
import pdpipe as pdp
df = pd.DataFrame([[1,4],[4,5],[1,11]], [1,2,3], ['a','b'])
pline = pdp.PdPipeline([
  pdp.FreqDrop(2, 'a', prec=pdp.cond.HasAllColumns(['x']))
])
pline.apply(

Here are 578 public repositories matching this topic...

modin-project / modin

vaexio / vaex

haifengl / smile

pola-rs / polars

rapidsai / cudf

javascriptdata / danfojs

databricks / koalas

jtablesaw / tablesaw

adamerose / PandasGUI

mars-project / mars

twopirllc / pandas-ta

ballista-compute / ballista

apache / arrow-datafusion

alexhallam / tv

hosseinmoein / DataFrame

sngyai / Sequoia

microsoft / Mobius

pyjanitor-devs / pyjanitor

RedisLabs / spark-redis

rocketlaunchr / dataframe-go

uwdata / arquero

pdpipe / pdpipe

MrPowers / spark-daria

shramos / Awesome-Cybersecurity-Datasets

andygrove / datafusion

Squarespace / datasheets

sfu-db / connector-x

michaelchu / optopsy

dmnfarrell / pandastable

techascent / tech.ml.dataset
