Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication…
C++ Java Python Rust TypeScript Ruby Other
Branch: master

Latest commit

jorisvandenbossche and bkietz ARROW-8690: [Python] Clean-up dataset+parquet tests now order is deterministic

Closes #7097 from jorisvandenbossche/ARROW-8690

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
Latest commit d13e8f3 May 4, 2020

Files

Type Name Latest commit message Commit time
.github ARROW-8573: [Rust] Upgrade Rust to 1.44 nightly Apr 29, 2020
c_glib ARROW-8509: [GLib] Add low level record batch read/write functions May 3, 2020
ci ARROW-8648: [Rust] Optimize Rust CI Workflows Apr 30, 2020
cpp ARROW-8689: [C++] Fix linking S3FS benchmarks May 4, 2020
csharp ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values… May 2, 2020
dev ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments May 3, 2020
docs ARROW-8663: [Documentation] Small correction to building.rst May 1, 2020
format ARROW-300: [Format] Proposal for "trivial" IPC body buffer compressio… May 1, 2020
go ARROW-8098: [Go] Avoid unsafe unsafe.Pointer usage Apr 3, 2020
java ARROW-8687: [Java] Remove references to io.netty.buffer.ArrowBuf May 4, 2020
js [Release] Update versions for 0.18.0-SNAPSHOT Apr 16, 2020
matlab [Release] Update versions for 0.18.0-SNAPSHOT Apr 16, 2020
python ARROW-8690: [Python] Clean-up dataset+parquet tests now order is dete… May 4, 2020
r ARROW-8619: [C++] Use distinct enum values for MonthInterval, DayTime… May 1, 2020
ruby ARROW-8682: [Ruby][Parquet] Add support for column level compression May 4, 2020
rust ARROW-8659: [Rust] ListBuilder allocate with_capacity May 2, 2020
testing @ 3772a1b ARROW-8441: [C++] Check invalid input in ipc::MessageDecoder Apr 14, 2020
.asf.yaml ARROW-8520: [Developer] Use .asf.yaml to direct GitHub notifications … Apr 20, 2020
.clang-format ARROW-3313: [R] Move .clang-format to top level. Add r/lint.sh script… Sep 26, 2018
.clang-tidy ARROW-7523: [Developer] Relax clang-tidy check Jan 10, 2020
.clang-tidy-ignore ARROW-3313: [R] Move .clang-format to top level. Add r/lint.sh script… Sep 26, 2018
.dir-locals.el ARROW-7994: [CI][C++][GLib][Ruby] Move MinGW CI to GitHub Actions fro… Mar 12, 2020
.dockerignore ARROW-8064: [Dev] Implement Comment bot via Github actions Mar 12, 2020
.env ARROW-8573: [Rust] Upgrade Rust to 1.44 nightly Apr 29, 2020
.gitattributes ARROW-5488: [R] Workaround when C++ lib not available Jun 12, 2019
.gitignore ARROW-7949: [Git] Ignore macOS specific file: 'Brewfile.lock.json' Feb 27, 2020
.gitmodules ARROW-4459: [Testing] Add arrow-testing repo as submodule Feb 8, 2019
.hadolint.yaml ARROW-6214: [R] Add R sanitizer docker image Sep 19, 2019
.pre-commit-config.yaml ARROW-4909: [CI] Use hadolint to lint Dockerfiles Mar 18, 2019
.readthedocs.yml ARROW-1142: [C++] Port over compression toolchain and interfaces from… Jun 23, 2017
CHANGELOG.md [Release] Update CHANGELOG.md for 0.17.0 Apr 16, 2020
CODE_OF_CONDUCT.md ARROW-4006: Add CODE_OF_CONDUCT.md Dec 15, 2018
CONTRIBUTING.md ARROW-7489: [CI] Fix typos Jan 3, 2020
LICENSE.txt ARROW-8064: [Dev] Implement Comment bot via Github actions Mar 12, 2020
Makefile.docker ARROW-5265: [Python][CI] Add integration test with kartothek Mar 12, 2020
NOTICE.txt ARROW-5934: [Python] Bundle arrow's LICENSE with the wheels Jul 15, 2019
README.md ARROW-7712: [CI] [Crossbow] Delete fuzzit jobs Feb 7, 2020
appveyor.yml ARROW-8571: [C++] Switch AppVeyor image to VS 2017 Apr 23, 2020
cmake-format.py ARROW-4363: [CI] [C++] Add CMake format checks Feb 11, 2019
docker-compose.yml ARROW-8549: [R] Assorted post-0.17 release cleanups Apr 22, 2020
header ARROW-259: Use Flatbuffer Field type instead of MaterializedField Aug 18, 2016
run-cmake-format.py ARROW-6841: [C++] Migrate to LLVM 8 Mar 18, 2020

README.md

Apache Arrow


Powering In-Memory Analytics

Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data fast.

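Major components of the project include the columnar format specification (the format directory) and libraries for C++, C (GLib), C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust.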

Arrow is an Apache Software Foundation project. Learn more at arrow.apache.org.

What's in the Arrow libraries?

The reference Arrow libraries contain a number of distinct software components:

  • Columnar vector and table-like containers (similar to data frames) supporting flat or nested types
  • Fast, language agnostic metadata messaging layer (using Google's Flatbuffers library)
  • Reference-counted off-heap buffer memory management, for zero-copy memory sharing and handling memory-mapped files
  • IO interfaces to local and remote filesystems
  • Self-describing binary wire formats (streaming and batch/file-like) for remote procedure calls (RPC) and interprocess communication (IPC)
  • Integration tests for verifying binary compatibility between the implementations (e.g. sending data from Java to C++)
  • Conversions to and from other in-memory data structures
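
To make the first and third items above more concrete, here is a minimal sketch of a columnar vector that uses only the Python standard library. It is an illustration under stated assumptions, not the Arrow implementation or the pyarrow API: the class `IntColumn` and its methods are hypothetical names invented for this example. It keeps values in one contiguous buffer, tracks nulls in a validity bitmap (one bit per slot, least-significant bit first, 1 meaning "not null", as in the Arrow columnar format), and hands out a window over the buffer without copying it.

```python
from array import array


class IntColumn:
    """Toy columnar vector (hypothetical, for illustration only)."""

    def __init__(self, items):
        # Null slots still occupy a value slot (filled with 0),
        # so every element sits at a fixed offset in the buffer.
        self.values = array("i", [0 if x is None else x for x in items])
        self.length = len(items)
        # Validity bitmap: one bit per slot, LSB first, 1 = not null.
        self.validity = bytearray((self.length + 7) // 8)
        for i, x in enumerate(items):
            if x is not None:
                self.validity[i // 8] |= 1 << (i % 8)

    def is_valid(self, i):
        return bool(self.validity[i // 8] & (1 << (i % 8)))

    def __getitem__(self, i):
        return self.values[i] if self.is_valid(i) else None

    def null_count(self):
        return sum(not self.is_valid(i) for i in range(self.length))

    def view(self, start, stop):
        # memoryview slicing shares the underlying buffer: no bytes are copied.
        return memoryview(self.values)[start:stop]


col = IntColumn([1, None, 3, 4])
window = col.view(2, 4)
col.values[2] = 30  # the window sees this write because memory is shared
print(col[1], col.null_count(), window.tolist())
```

Because `window` shares memory with `col.values`, the write to slot 2 is visible through it. That is the property the "zero-copy memory sharing" item refers to, shown here at the scale of a single Python process rather than across processes or languages.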

How to Contribute

Please read our latest project contribution guide.

Getting involved

Even if you do not plan to contribute to Apache Arrow itself or Arrow integrations in other projects, we'd be happy to have you involved.
