Optimized Analytics Package for Spark Platform (OAP)

gluten Public

Scala 379 Apache-2.0 112 44 27 Updated Jan 13, 2023
velox Public
A new C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.

C++ 5 Apache-2.0 456 0 8 Updated Jan 13, 2023
cloudtik Public
Cloud Scale Platform for Distributed Analytics and AI

Python 19 Apache-2.0 8 1 2 Updated Jan 13, 2023
gazelle_plugin Public
Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.

Scala 235 Apache-2.0 73 190 24 Updated Jan 13, 2023
arrow Public
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication…

C++ 4 Apache-2.0 2,688 0 21 Updated Jan 13, 2023
raydp Public
RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.

Python 196 Apache-2.0 46 43 2 Updated Jan 9, 2023
remote-shuffle Public
Spark* shuffle plugin that supports shuffling data through a remote Hadoop-compatible file system, as opposed to vanilla Spark's local disks.

Scala 15 Apache-2.0 7 5 0 Updated Jan 5, 2023
sql-ds-cache Public archive
Spark* plug-in for accelerating Spark* SQL performance by using a cache and index at the SQL data source layer.

Scala 37 Apache-2.0 25 15 4 Updated Jan 3, 2023
libhdfs3-downstream Public archive
A native C/C++ HDFS client (downstream fork from apache-hawq).

C++ 0 Apache-2.0 55 0 0 Updated Jan 3, 2023
arrow-data-source Public archive
Spark DataSource plugin for reading files in various formats, such as Parquet, into Arrow-compatible columnar vectors.

Scala 5 Apache-2.0 10 3 0 Updated Jan 4, 2023
