A generic interface for building custom supervised machine learning algorithms with Spark
Spark-Opt

Spark-Opt provides flexible abstractions for building machine learning algorithms on top of Apache Spark ML. Specifically, the following components are easily customizable:

  • prediction function
  • loss function
  • optimization routine

Being able to plug in these custom components allows users to scale their algorithms, express a richer set of algorithms than Spark ML currently provides, and even improve upon existing Spark ML implementations.
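To make the decomposition above concrete, here is a minimal sketch of how a loss function, prediction function, and minimizer can be kept as independent, swappable pieces. The trait and object names below (`LossFunction`, `LogisticLoss`, `Minimizer`, `GradientDescent`) are illustrative only, not this library's actual API, and plain batch gradient descent stands in for a distributed minimizer such as ADMM:

```scala
// Hypothetical decomposition: loss, prediction, and minimizer are separate
// pluggable pieces. These names are illustrative, not Spark-Opt's real API.

trait LossFunction {
  // Return (loss value, gradient) for a single labeled example.
  def lossAndGradient(w: Array[Double], x: Array[Double], y: Double): (Double, Array[Double])
}

object LogisticLoss extends LossFunction {
  def lossAndGradient(w: Array[Double], x: Array[Double], y: Double): (Double, Array[Double]) = {
    val margin = w.zip(x).map { case (wi, xi) => wi * xi }.sum
    val p = 1.0 / (1.0 + math.exp(-margin))          // prediction function
    val loss = -y * math.log(p) - (1 - y) * math.log(1 - p)
    val grad = x.map(_ * (p - y))                     // d(loss)/dw
    (loss, grad)
  }
}

trait Minimizer {
  def minimize(loss: LossFunction, data: Seq[(Array[Double], Double)], dim: Int): Array[Double]
}

// Simple batch gradient descent as a local stand-in for a distributed
// optimization routine (e.g. ADMM, L-BFGS).
object GradientDescent extends Minimizer {
  def minimize(loss: LossFunction, data: Seq[(Array[Double], Double)], dim: Int): Array[Double] = {
    var w = Array.fill(dim)(0.0)
    val lr = 0.5
    for (_ <- 0 until 200) {
      // Sum per-example gradients, then take an averaged step.
      val grad = data.map { case (x, y) => loss.lossAndGradient(w, x, y)._2 }
        .reduce((a, b) => a.zip(b).map { case (ai, bi) => ai + bi })
      w = w.zip(grad).map { case (wi, gi) => wi - lr * gi / data.size }
    }
    w
  }
}
```

Because the minimizer only sees the `LossFunction` interface, swapping logistic loss for hinge loss, or gradient descent for ADMM, requires no changes to the other components; the same idea is what makes the Spark version customizable.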

Build

Build Spark 2.3.0-SNAPSHOT

This project requires Spark 2.3+, which must currently be built from source:

git clone https://github.com/apache/spark
cd spark
build/mvn clean install -DskipTests -Dmaven.javadoc.skip=true
cd [this repo]
mvn package

Run example

$SPARK_HOME/bin/spark-submit \
--class com.sethah.spark.sparkopt.examples.LogisticRegressionExample \
target/sparkopt-1.0-SNAPSHOT-jar-with-dependencies.jar \
--trainPath src/main/resources/binary \
--minimizer admm \
--l1Reg 0.05 \
--l2Reg 0.05