Introduction

ES-Fastloader

ES-Fastloader uses the fault tolerance and parallelism of Hadoop to build individual ElasticSearch shards on multiple reducer nodes, then transfers the shards to an ElasticSearch cluster for serving. The loader creates a Hadoop job that reads data files from HDFS, repartitions the data on a per-node basis, and finally writes the generated indices to ES shards. At DiDi, we use ES-Fastloader to create large-scale ElasticSearch indices from TB/PB-scale sequence files in Hive.
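
The repartitioning step can be pictured with a small sketch. It is not ES-Fastloader's actual code: the `ShardRouter` class, its method names, and the hash rule are hypothetical, chosen only to show how records could be grouped so that each reducer receives the documents for one shard. ElasticSearch itself routes documents with a Murmur3 hash of the routing key; `String.hashCode()` stands in here to keep the example free of dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of ES-Fastloader.
class ShardRouter {

    // Assumed routing rule: hash of the document id modulo the shard count.
    // floorMod keeps the result in [0, numShards) even for negative hash codes.
    static int shardFor(String docId, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        return Math.floorMod(docId.hashCode(), numShards);
    }

    // Groups document ids by target shard, mimicking what a Hadoop
    // partitioner achieves when it sends each record to one reducer.
    static Map<Integer, List<String>> groupByShard(List<String> docIds, int numShards) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (String id : docIds) {
            groups.computeIfAbsent(shardFor(id, numShards), k -> new ArrayList<>()).add(id);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("order-1", "order-2", "order-3", "order-4");
        // Each map entry corresponds to the input of one reducer (one shard).
        groupByShard(ids, 2).forEach((shard, docs) ->
                System.out.println("shard " + shard + " <- " + docs));
    }
}
```

In a real job, the same idea would live in a Hadoop partitioner so that every reducer builds exactly one shard from its own slice of the input.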

Features

  • Supports batch construction of ES indices: dozens of terabytes of data can be processed in 1-2 hours, which addresses the low efficiency of building massive ES index files.
  • Supports horizontal scaling of computing power: adding machine resources further increases the index construction speed and the amount of data that can be processed.

Requirements

  • JDK: 8 or greater
  • ElasticSearch: 6.6.X or greater

Developer guide

Contributing

Contributions are welcome: create an issue or send a pull request. See the Contributing Guide for guidelines.

Who is using ES-Fastloader?

DiDi Chuxing (滴滴出行)

License

ES-Fastloader is licensed under the Apache License 2.0. See the LICENSE file.

Contact us

WeChat discussion group
