big-data
Here are 1,888 public repositories matching this topic...
I was going through the existing enhancement issues again and thought it'd be nice to collect ideas for spaCy plugins and related projects. There are always people in the community who are looking for new things to build, so here's some inspiration
If you have questions about the projects I suggested,
There's no published benchmark for IOPS on S3 storage.
Would it be possible to post this alongside the other benchmarks?
S3 storage would be a super cheap way to get started because it's serverless (thus more folks would potentially use gun.js).
Thank you for the useful service. I would like to see more Auth/ABAC for startup usage, right now I'm using a centralized database because it's uncle
FileSystemContext in presto-raptor can now be replaced by HdfsContext given presto-hive-metastore has been separated into a standalone module.
AFAICT they are equivalent. Found a usage of PyObject_str here and it looks like the optimization isn't made in other places where we just do str(x).
I was happy to see that the usage of PyUnicode_Join was unnecessa
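As a side note on the shortcut being discussed (a minimal illustration, not the project's own code): in CPython, str() on an object that is already an exact str returns the very same object, which is exactly the fast path PyObject_Str takes, so plain str(x) already benefits from it.

```python
# Minimal illustration (assumes CPython): calling str() on an exact
# str instance returns the same object -- no copy is made -- which is
# the fast path PyObject_Str provides.
s = "hello world"
print(str(s) is s)   # True: identity is preserved for exact str

class Tagged(str):
    """A str subclass; str() must normalize it to an exact str."""
    pass

t = Tagged("hello")
print(str(t) is t)   # False: subclasses take the slow path and get copied
print(str(t))        # "hello"
```

This is why replacing str(x) with a direct PyObject_Str call is usually a wash for exact strings.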
Description
I installed a new 6-node cluster, enabled authentication, and added 5 nodes through Fauxton.
When I run Verify CouchDB Installation from Fauxton, I see an error in the Replication check:
Error: unauthorized to access or create database http://0.0.0.0:5984/verifytestdb_replicate/
And on one of the nodes I see an error:
[error] 2019-12-22T16:05:37.312700Z couchdb@s2dfw.domain.net <0.26254.18
The pipeline spec docs say that the input field is required, but it isn't for spouts.
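For illustration, a minimal spout pipeline spec of the kind being described might look like this (the pipeline name, image, and command are hypothetical placeholders, not taken from the docs in question):

```json
{
  "pipeline": { "name": "my-spout" },
  "transform": {
    "image": "my-org/my-spout-image:latest",
    "cmd": ["/bin/my-spout"]
  },
  "spout": {}
}
```

Note the absence of any input field, which is what the docs currently mark as required.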
The docs have a great intro that explains the technology build-up that led to inventing Stream, but then it stops without explaining how Stream uses Cassandra + Redis (plus a Celery message queue?) to solve this problem. (For all I know it doesn't.)
As a developer, a quick explanation of how this framework solves the
Fields that can appear in both the request and the response can't be searched reliably. For example, adding content-type to both request and response headers creates a single http.content-type expression, and it's unknown which one it actually searches. We should probably create http.request.content-type and http.response.content-type, or something similar.
The workaround for now is:
[custom-fields]
http.request.content-type=db:http.reque
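A sketch of what the proposed split might look like in the same [custom-fields] style (the db field names below are hypothetical placeholders, not taken from the source):

```ini
[custom-fields]
# Hypothetical split: one expression per direction instead of a
# single ambiguous http.content-type (db names are placeholders)
http.request.content-type=db:<request-header-db-field>
http.response.content-type=db:<response-header-db-field>
```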
Hazelcast currently ships with java.util.logging as the default. Besides badly formatted default output, it also makes synchronized calls for each log message, which incurs some performance cost and might lead to unexpected behaviour (e.g. hiding data races).
Spark 2.3 officially supports running on Kubernetes, while our "Run on Kubernetes" guide is still based on a special version of Spark 2.2, which is out of date. We need to:
- update that document to Spark 2.3
- release the corresponding Docker images.
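For reference, a submission against Spark 2.3's native Kubernetes scheduler looks roughly like this (the API server URL, container image, and jar path are placeholders to adapt):

```shell
# Sketch of a Spark 2.3 native-Kubernetes submission; the master URL,
# container image, and jar path below are placeholders.
bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
```

The main change from the Spark 2.2 fork is the k8s:// master URL and the spark.kubernetes.container.image property.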
Today, the Hadoop integration tools for Vespa support Hadoop and Pig for feeding and querying Vespa. The Pig feeder is a thin wrapper around the Vespa HTTP client.
We should support feeding directly from Spark as well, to avoid Spark pipelines having to write
- Reference documentation
1) Requirements for custom reports:
Simplify analysts' work and free up front-end development capacity: "Type SQL, Get Chart".
CBoard is currently positioned, like Tableau, as a professional reporting engine:
drag-and-drop interactive analysis.
- When evaluating open-source options, we looked at many of the community's
I have noticed a small error in the documentation around S3 configurations:
https://docs.delta.io/latest/delta-storage.html#amazon-s3
On the read side, it should be load, not save:
spark.read.format("delta").load("s3a://<your-s3-bucket>/<path>/<to>/<delta-table>")
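For contrast, the corresponding write call is the one that uses save (same placeholder path; this is the standard DataFrameWriter pattern, sketched here rather than quoted from the docs page):

```python
# Write counterpart (uses save, unlike the read above which uses load);
# df is an existing DataFrame, placeholders mirror the line above.
df.write.format("delta").save("s3a://<your-s3-bucket>/<path>/<to>/<delta-table>")
```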
Also, I have successfully tested Delta 0.5.0 with on-premise S3 (https://min.io).
There were some quirks around the


"Bokeh is a Python interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, but also deliver this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easi