dataframes

We now have native ODBC support upstream. This has to be exposed in polars similarly to existing IO readers and writers.

Describe the bug
pa.errors.SchemaErrors.failure_cases returns only the first 10 failure cases.

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandera (0.6.5).
(optional) I have confirmed this bug exists on the master branch of pandera.

Note: Please read [this guide](https://matthewrocklin.c

For pipeline stages provided by pdpipe.basic_stages, supplying conditions to the prec and post keyword arguments may not produce the correct error messages.

Example Code

import pandas as pd
import pdpipe as pdp
df = pd.DataFrame([[1,4],[4,5],[1,11]], [1,2,3], ['a','b'])
pline = pdp.PdPipeline([
  pdp.FreqDrop(2, 'a', prec=pdp.cond.HasAllColumns(['x']))
])
pline.apply(

Currently we don't test (or document) that Eland works with data streams. We should probably add tests to confirm that everything works properly.

riptable currently only supports changing settings (e.g. number of threads to use for calculations and I/O) by calling functions of the library or setting class-level attributes.

It'd be helpful if the default values for these settings -- at least the most important ones -- could be overridden using environment variables, similar to how numba supports changing the cache path or number of threads to b
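
One way such an override could work is sketched below, using only the Python standard library. The MYLIB_ prefix, the NUM_THREADS setting name, and both helper functions are hypothetical and chosen for illustration; they are not part of riptable's or numba's actual API.

```python
import os

# Hypothetical prefix and setting names, chosen only for illustration.
ENV_PREFIX = "MYLIB_"


def env_int(name, default, environ=None):
    """Read a positive integer setting from the environment, else return default."""
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_PREFIX + name)
    if raw is None or not raw.strip():
        # Unset or empty: keep the library's built-in default.
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(
            f"{ENV_PREFIX}{name} must be an integer, got {raw!r}"
        ) from None
    if value < 1:
        raise ValueError(f"{ENV_PREFIX}{name} must be >= 1, got {value}")
    return value


def default_thread_count(environ=None):
    """Thread count: MYLIB_NUM_THREADS if set, otherwise the CPU count."""
    return env_int("NUM_THREADS", os.cpu_count() or 1, environ)
```

Reading the variable once when defaults are computed leaves the existing function calls and class-level attributes as the way to change a setting afterwards.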

Some unit tests asserting e.g. the length or some other property of the datasets would be nice to have.

As a user, I wish I could access a table's column schema with a column_schemas attribute that is a dictionary of column schemas.

df.ww.column_schemas

This could help users understand, better than the columns attribute does, that they can use df.ww.column_schemas[col] instead of df.ww[col].schema.

We should not remove the columns attribute so we don't
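
A rough sketch of what the requested attribute might look like is shown here. TableAccessorSketch is a hypothetical stand-in written for illustration, not Woodwork's real accessor class, and the plain dictionaries stand in for real column schema objects. Keeping columns as an alias is an assumption based on the note above that it should not be removed.

```python
class TableAccessorSketch:
    """Hypothetical stand-in for a table accessor; not Woodwork's real class."""

    def __init__(self, column_schemas):
        # Maps column name -> schema object (plain dicts in this sketch).
        self._column_schemas = dict(column_schemas)

    @property
    def column_schemas(self):
        # Return a shallow copy so callers cannot alter the internal mapping.
        return dict(self._column_schemas)

    @property
    def columns(self):
        # Assumed alias: the existing attribute keeps working unchanged.
        return self.column_schemas


# Placeholder schema contents, used only to show the lookup.
ww = TableAccessorSketch({"age": {"logical_type": "Integer"}})
age_schema = ww.column_schemas["age"]
```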

Add a few useful date/time types from time (https://hackage.haskell.org/package/time), e.g.

POSIXTime
Date
etc.

A checklist for where to add things:

prim constructors go in here: https://github.com/ocramz/heidi/blob/master/src/Data/Generics/Encode/Internal/Prim.hs#L25
Heidi instances go here : https://github.com/ocramz/heidi/blob/master/src/Data/Generics

Here are 182 public repositories matching this topic...

pola-rs / polars

JuliaData / DataFrames.jl

TileDB-Inc / TileDB

pandera-dev / pandera

rocketlaunchr / dataframe-go

pdpipe / pdpipe

polyaxon / datatile

JuliaData / DataFramesMeta.jl

elixir-nx / explorer

elastic / eland

rtosholdings / riptable

aiguofer / gspread-pandas

RumbleDB / rumble

stefmolin / pandas-workshop

DataHaskell / dh-core

zbrookle / dataframe_sql

JuliaAcademy / DataFrames

alteryx / woodwork

hablapps / sparkOptics

Thomas-George-T / Movies-Analytics-in-Spark-and-Scala

JuliaData / DataTables.jl

isarn / isarn-sketches-spark

hackersandslackers / pandas-sqlalchemy-tutorial

zbrookle / sql_to_ibis

JuliaGraphs / GraphDataFrameBridge.jl

dkaslovsky / ElasticBatch

zgbjgg / jun

ocramz / heidi

kmatarese / glide

dlab-berkeley / R-Data-Wrangling
