GCS Tools

Raison d'être:

Lightweight wrapper that adds Google Cloud Storage (GCS) support to common Hadoop tools, including avro-tools, parquet-tools, and proto-tools (for Scio's Protobuf-in-Avro files), so that they can be used from regular workstations or laptops, outside of a Google Compute Engine (GCE) instance.

It uses your existing OAuth2 credentials and allows authentication via a browser.

Usage:

You can install the tools via our Homebrew tap on Mac.

brew tap spotify/public
brew install gcs-avro-tools gcs-parquet-tools gcs-proto-tools
avro-tools tojson <GCS_PATH>
parquet-tools cat <GCS_PATH>
proto-tools tojson <GCS_PATH>

Or build them yourself.

sbt assembly
java -jar avro-tools/target/scala-2.12/avro-tools-1.8.2.jar tojson <GCS_PATH>
java -jar parquet-tools/target/scala-2.12/parquet-tools-1.11.0.jar cat <GCS_PATH>
java -jar proto-tools/target/scala-2.12/proto-tools-3.12.2.jar tojson <GCS_PATH>

How it works:

To make avro-tools and parquet-tools work with GCS, we need the GCS connector and a configuration for it.

The GCS connector won't pick up your local gcloud configuration; it expects its settings in core-site.xml instead.
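
As a rough illustration, a minimal core-site.xml that registers the connector for gs:// paths might look like the sketch below. The two implementation class names are the ones documented by the GCS connector; the project ID is a placeholder, and authentication properties are left out because their names differ between connector versions. This is an assumption-based sketch, not necessarily the file bundled with these tools.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Map the gs:// scheme to the GCS connector's FileSystem implementation -->
  <property>
    <name>fs.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
  </property>
  <!-- Same mapping for the newer AbstractFileSystem API -->
  <property>
    <name>fs.AbstractFileSystem.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
  </property>
  <!-- Placeholder: replace with your own GCP project ID -->
  <property>
    <name>fs.gs.project.id</name>
    <value>YOUR_PROJECT_ID</value>
  </property>
</configuration>
```

Hadoop loads core-site.xml from the classpath, so a file like this needs to be visible to the tool's JVM for gs:// paths to resolve.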
