Pinned
Repositories
- arrow Public
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication…
- remote-shuffle Public
Spark* shuffle plugin that supports shuffling data through a remote Hadoop-compatible file system, as opposed to vanilla Spark's local disks.
- sql-ds-cache Public archive
Spark* plug-in for accelerating Spark* SQL performance by using caching and indexing at the SQL data source layer.
- arrow-data-source Public archive
Spark DataSource plugin for reading files in various formats, such as Parquet, into Arrow-compatible columnar vectors.