apache-spark
Here are 900 public repositories matching this topic...
According to the generated build, the commands to launch are the following:
docker pull andypetrella/spark-notebook:0.7.0-scala-2.11.8-spark-2.1.1-hadoop-2.7.2-with-hive
docker run -p 9001:9001 andypetrella/spark-notebook:0.7.0-scala-2.11.8-spark-2.1.1-hadoop-2.7.2-with-hive
Using that image (and I think it i
1. add docs to describe the model, explain the arguments (as well as how to configure them in a recipe) and best practices. A good reference: https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/coxph.html
2. use the same parameter name in MTNetForecaster and AutoTS recipes.
3. in MTNetGridRandomRecipe, past_seq_len actually is conditioned on time_step and long_num,
https://gi
We create multiple jars during our builds to accommodate multiple versions of Apache Spark. In the current approach, the implementation is copied from one version to another and then the necessary changes are made.
An ideal approach would be to create a common directory and extract shared classes from the duplicated code. Note that even if a class/code is exactly the same, you cannot pull it out to a common clas
Currently the documentation is in the form of a bunch of markdown files under the docs folder of the repo. It would be great to have a dedicated website for the project to host the documentation and announcements such as releases.
Since we already consider #140, I guess we should look at MLflow as well. Definitely not now, but maybe when/if it gets its first stable release.
CC @eliasah
The documentation file appears to have been generated with no space between the hashes and the header text. This causes the headers to render incorrectly and makes them difficult to read. See below for an example with and without the space:
## Mobius API Documentation
###Microsoft.Spark.CSharp.Core.Accumulator
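If regenerating the docs is not practical, a one-off post-processing pass could repair the output. This is a hypothetical fix, not part of the Mobius doc generator; it inserts the missing space while leaving already-correct headers alone:

```python
import re

def fix_markdown_headers(text: str) -> str:
    """Insert a missing space between leading '#' characters and header text.

    Hypothetical post-processing helper, not part of Mobius: it matches
    start-of-line hashes followed immediately by a non-space character and
    leaves correctly spaced headers untouched.
    """
    return re.sub(r'^(#{1,6})(?=[^#\s])', r'\1 ', text, flags=re.MULTILINE)

broken = "###Microsoft.Spark.CSharp.Core.Accumulator"
print(fix_markdown_headers(broken))  # -> "### Microsoft.Spark.CSharp.Core.Accumulator"
```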
Because some users have had problems configuring these services, it could be helpful to make some examples or videos about how to properly set up Optimus on these services.
The "components" returned from ml_pca() are NULL
# example
library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")
iris_tbl <- sdf_copy_to(sc, iris, name = "iris_tbl", overwrite = TRUE)
pca <- iris_tbl %>%
  select(-Species) %>%
  ml_pca()
pca$components
#> NULL

R session information:
devtools::session_info()
Session info ------------------------------------------------------------
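For reference, the principal components that a PCA "components" slot should hold (rather than NULL) are the right singular vectors of the centered data matrix. A minimal numpy sketch, not sparklyr code, using a stand-in for the numeric iris columns:

```python
import numpy as np

# Small stand-in dataset: 4 numeric columns, like iris without Species.
X = np.array([[5.1, 3.5, 1.4, 0.2],
              [4.9, 3.0, 1.4, 0.2],
              [6.2, 3.4, 5.4, 2.3],
              [5.9, 3.0, 5.1, 1.8]])

# Center each column, then take the SVD; the rows of Vt are the
# principal axes (what a PCA "components" result should contain).
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt
print(components.shape)  # (4, 4): one component per input column
```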
Currently, geospark is based on JTS's STRtree. prestodb/presto#13079 could be a great enhancement regarding memory pressure, i.e. implementing it using a Hilbert-packed R-tree (Flatbush).
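For context, a Hilbert-packed R-tree bulk-loads by sorting entries along a Hilbert curve before grouping them into nodes, so spatially close entries land in the same node. A Python sketch of the classic xy2d Hilbert-index computation such a sort would use (not GeoSpark or Flatbush code; Flatbush does this on a fixed 16-bit grid):

```python
def hilbert_index(n, x, y):
    """Map grid cell (x, y) to its position along an order-n Hilbert curve.

    n is the grid side length and must be a power of two. Sketch of the
    classic xy2d algorithm: sorting rectangle centers by this index before
    packing gives a Hilbert-packed R-tree layout.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate the quadrant so the curve stays continuous.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d

# Sorting 2D points by their Hilbert index groups nearby points together:
points = [(3, 0), (0, 0), (1, 1), (0, 3)]
points.sort(key=lambda p: hilbert_index(4, p[0], p[1]))
```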
If a cluster launch is interrupted before AWS can even return a list of instances, we hit a part of the code where cluster_instances is not defined. We should protect against that.
Additionally, when Flintrock comes across a broken cluster (e.g. missing tags) left behind by an interrup
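One way to protect against the undefined-variable case is to bind the name before anything that can raise, so cleanup code can always reference it. A sketch under stated assumptions, not Flintrock's actual code: `request_instances` and `terminate` stand in for the real AWS calls.

```python
def launch_cluster(request_instances, terminate):
    """Sketch: guard cleanup when a launch is interrupted early.

    request_instances and terminate are hypothetical stand-ins for the
    real AWS calls. The key point: cluster_instances is bound before any
    call that can fail, so the except block can never hit a NameError.
    """
    cluster_instances = []  # defined up front, even if AWS never returns
    try:
        cluster_instances = request_instances()
        # ... tag instances, wait for them to come up, etc. ...
        return cluster_instances
    except (Exception, KeyboardInterrupt):
        # Safe: cluster_instances always exists here, possibly empty.
        for instance in cluster_instances:
            terminate(instance)
        raise
```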
While trying to write some tests of sparkle using Tasty, I found that it doesn't seem to work when bound threads other than the main one are used. The following program fails with:
$ stack --nix exec -- spark-submit --master 'local[1]' sparkle-example-osthreads.jar
16/12/19 10:30:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes
return exit codes
The curl command in the book is curl -XGET 'localhost:9200/agile_data_science/airplanes/_search?q=*', which is incorrect: "airplanes" should be singular, not plural. The correct line is: curl -XGET 'localhost:9200/agile_data_science/airplane/_search?q=*'
2.12 support - docs
Thank you for submitting an issue. Please refer to our issue policy
for information on what types of issues we address. For help with debugging your code, please refer to Stack Overflow.
Please fill in this template and do not delete it unless you are sure your issue is outs