big-data
Here are 2,408 public repositories matching this topic...
Problem: the approximate method can still be slow for many trees
catboost version: master
Operating System: Ubuntu 18.04
CPU: i9
GPU: RTX 2080
It would be good to be able to specify how many trees to use for SHAP values. The model.predict and prediction_type versions already allow this, and LightGBM/XGBoost allow it as well.
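A minimal sketch of the request, reusing the truncation parameters that model.predict already accepts; the ntree_end argument to get_feature_importance at the end is the proposed addition, not an existing CatBoost parameter:

```python
import numpy as np
from catboost import CatBoostRegressor, Pool

X = np.random.rand(100, 5)
y = np.random.rand(100)
pool = Pool(X, y)

model = CatBoostRegressor(iterations=500)
model.fit(pool, verbose=False)

# predict already supports truncating the ensemble:
preds = model.predict(pool, ntree_start=0, ntree_end=100)

# SHAP values currently always run over all 500 trees:
shap_values = model.get_feature_importance(pool, type="ShapValues")

# Proposed (hypothetical) analogue for SHAP -- not an existing parameter:
# shap_values = model.get_feature_importance(pool, type="ShapValues",
#                                            ntree_end=100)
```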
x-arkime-cookies
Change all x-moloch-cookies to x-arkime-cookies in tests and middleware.
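A throwaway migration sketch for the rename; the tests/ and middleware/ directory names are assumptions about the repo layout:

```python
import pathlib

OLD, NEW = "x-moloch-cookies", "x-arkime-cookies"

# Directories assumed to hold the affected tests and middleware code.
for root in ("tests", "middleware"):
    for path in pathlib.Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary/unreadable files
        if OLD in text:
            path.write_text(text.replace(OLD, NEW))
            print(f"updated {path}")
```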
There is no technical difficulty in supporting the includeValue option; it looks like we are just missing it at the API level.
See the SO question.
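The snippet does not say which listener API is affected, so the sketch below only illustrates the general pattern being asked for: an include_value flag that controls whether an event carries the full value or just the key. Every name here (ObservableMap, EntryEvent, add_entry_listener) is hypothetical:

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class EntryEvent:
    key: Any
    value: Optional[Any]  # populated only when include_value=True


class ObservableMap:
    """Hypothetical map that notifies listeners on put()."""

    def __init__(self):
        self._data = {}
        self._listeners = []

    def add_entry_listener(self, listener: Callable[[EntryEvent], None],
                           include_value: bool = False) -> None:
        # include_value is the kind of option the issue asks to expose.
        self._listeners.append((listener, include_value))

    def put(self, key, value):
        self._data[key] = value
        for listener, include_value in self._listeners:
            listener(EntryEvent(key, value if include_value else None))


m = ObservableMap()
m.add_entry_listener(print, include_value=True)
m.put("a", 1)
```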
... to make it easier to read Vespa documentation on an e-reader / offline
Vespa documentation is generated with Jekyll from .md and .html files; look into options for generating the artifact as part of site generation (there might be plugins we can use here).
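A Jekyll plugin is the idiomatic route, but as a hedged illustration, a post-build step could stitch the generated output into a single offline HTML file. The _site directory is Jekyll's default output location; the offline.html name and the naive page ordering are assumptions:

```python
import pathlib

site = pathlib.Path("_site")          # default Jekyll output directory
pages = sorted(site.rglob("*.html"))  # crude ordering; a real plugin would
                                      # follow the site's navigation order

with open("offline.html", "w") as out:
    out.write("<html><body>\n")
    for page in pages:
        out.write(f"<!-- {page.relative_to(site)} -->\n")
        out.write(page.read_text())
        out.write("\n<hr>\n")
    out.write("</body></html>\n")
```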
Hi, if my Spark app is using two storage types, both S3 and Azure Data Lake Storage Gen2, could I put spark.delta.logStore.class=org.apache.spark.sql.delta.storage.AzureLogStore,org.apache.spark.sql.delta.storage.S3SingleDriverLogStore?
Thanks in advance
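For reference, the documented form of this setting takes a single LogStore class per session, so a comma-separated list as written above would likely not be parsed as two stores; whether the installed Delta version supports per-scheme configuration should be checked against its docs. A minimal sketch of the single-class form:

```python
from pyspark.sql import SparkSession

# Documented form: one LogStore implementation for the whole session.
spark = (SparkSession.builder
         .appName("delta-logstore-example")
         .config("spark.delta.logStore.class",
                 "org.apache.spark.sql.delta.storage.AzureLogStore")
         .getOrCreate())

# A comma-separated list of classes, as in the question, is not a
# documented value for this key; mixing S3 and ADLS Gen2 in one session
# would need per-scheme configuration if the Delta version supports it.
```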
and ensure it's linked in the list
Use case:
Right now one can only use date_trunc() to easily define time buckets, and date_trunc() only supports predefined time intervals like 1 minute, 1 hour, etc. In time-series use cases it is often necessary to define other bucket sizes, e.g. '5 minutes' or '20 minutes'.
A workaround for this is the error-prone integer division on the timestamp, e.g. the sketch below.
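The integer-division workaround, expressed in Python for illustration (in SQL it is the same arithmetic on the epoch value); the 300-second width is the '5 minutes' example from above:

```python
from datetime import datetime, timezone

def time_bucket(ts: datetime, width_seconds: int) -> datetime:
    """Floor ts to the start of its bucket via integer division on the epoch."""
    epoch = int(ts.timestamp())
    floored = (epoch // width_seconds) * width_seconds
    return datetime.fromtimestamp(floored, tz=timezone.utc)

ts = datetime(2021, 4, 26, 10, 17, 42, tzinfo=timezone.utc)
print(time_bucket(ts, 300))  # 2021-04-26 10:15:00+00:00 (5-minute bucket)
```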
Currently, insert and query share the same resource (the max process count limit). When query TPS is high, inserts fail with "error: too many process". I think separating the resources for insert and query makes sense, to ensure enough resources for inserts; like YARN, insert and query would use different resource quotas.
Or, the simple way: can we set a ratio between insert and query? A sketch of such a split is below.
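A hedged sketch of what a ratio-based split could look like, carving two admission semaphores out of one max-process budget; all names here are hypothetical, not part of any real server:

```python
import threading

MAX_PROCESSES = 100
INSERT_RATIO = 0.3  # hypothetical knob: 30% of slots reserved for inserts

insert_budget = int(MAX_PROCESSES * INSERT_RATIO)
insert_slots = threading.BoundedSemaphore(insert_budget)
query_slots = threading.BoundedSemaphore(MAX_PROCESSES - insert_budget)

def run(kind: str, work):
    """Admit the request only if its own pool has a free slot."""
    slots = insert_slots if kind == "insert" else query_slots
    if not slots.acquire(blocking=False):
        raise RuntimeError("error: too many process")  # mirrors the reported error
    try:
        return work()
    finally:
        slots.release()
```

With this split, a flood of queries can exhaust only the query pool, so inserts keep their reserved share of processes.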