streaming-data

Hello.
I've come across what (to me) seems to be a problem with the FILENAME and FILENUM variables.

# mlr --version
Miller v5.6.2

# cat /tmp/csv1
A,B,C
_2GB,255,2
_4GB,120,4
_6GB,50,6
_10GB,10,10

# cat /tmp/csv2
FIRST,SECOND,THIRD,FOURTH
1,2,3,4
5,6,7,8
9,10,11,12
13,14,15,16

# mlr --icsv cat then put 'print FILENAME'   /tmp/csv1 /tmp/csv2
/tmp/csv1
A=_2GB,B=255,C=2
/

We use http_server as the input and http_client as one of the outputs (for part of each message batch). When http_client returns an error, benthos retries the failed message indefinitely (#415). More significantly, it also stops accepting other, normal messages.

Here is the log when I first try to send message which causes http_client to get 500 error, and

@mpenkov

Write a Windows .BAT equivalent of travis_ci_helpers.sh

We need this .BAT file for running integration tests under Appveyor, which is unable to run our existing bash script.

Alternatively, rewrite that script in Python so we can use it under both Travis and Appveyor.

Originally posted by @mpenkov in RaRe-Technologies/smart_open#479 (comment)

Problem description
Documentation for ScalingPolicy.byDataRate does not clearly indicate whether targetKBps uses units of 1000 bytes or 1024 bytes.

Problem location
Pravega client, ScalingPolicy.java

Suggestions for an improvement
Determine the units used and update the documentation.
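
To illustrate why the unit matters, here is a minimal sketch in Python. It is not Pravega code: the helper name and the sample target value are hypothetical, chosen only to show how far apart the two readings of targetKBps are.

```python
def bytes_per_second(target_kbps: int, bytes_per_kb: int) -> int:
    """Convert a KB/s target to bytes/s for a given definition of 'KB'."""
    return target_kbps * bytes_per_kb


target_kbps = 1000  # hypothetical target, for illustration only

decimal = bytes_per_second(target_kbps, 1000)  # KB read as 1000 bytes
binary = bytes_per_second(target_kbps, 1024)   # KB read as 1024 bytes

# Relative difference between the two interpretations
relative_gap = (binary - decimal) / decimal

print(decimal, binary, f"{relative_gap:.1%}")
```

The two interpretations always differ by 2.4%, whatever the target, so the documentation needs to say which one applies.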

There are already data policies at ingress for data that arrives "late". We can drop, adjust, or throw when data arrives late, and we can hold data in reserve for a certain period of time to allow some reordering.

However, if a data point arrives "too early" we do not have a way to deal with it currently. For instance, if the current data time is X, and the next data point arrives with a timest
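
The late-arrival policies described above can be sketched roughly as follows. This is a simplified Python illustration, not the library's actual API: the names LatePolicy and admit are hypothetical, and the reorder buffer is left out. It shows that an event ahead of the current time passes through with no policy applied, which is the gap this issue describes.

```python
from enum import Enum
from typing import Optional


class LatePolicy(Enum):
    """Hypothetical names for the three late-arrival behaviours."""
    DROP = "drop"
    ADJUST = "adjust"
    THROW = "throw"


def admit(timestamp: int, current_time: int, policy: LatePolicy) -> Optional[int]:
    """Return the timestamp to ingest, or None if the event is dropped."""
    if timestamp >= current_time:
        # On time or early: no policy exists for the "too early" case.
        return timestamp
    if policy is LatePolicy.DROP:
        return None
    if policy is LatePolicy.ADJUST:
        # Move the late event forward to the current time.
        return current_time
    raise ValueError(f"event at {timestamp} is behind current time {current_time}")
```

A policy for early data would add a second check on how far timestamp is ahead of current_time, with a similar choice of behaviours.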

https://streamz.readthedocs.io/en/latest/dask.html

My environment:
WSL / Debian on Windows 10 18362.778
Python 3.7.3
streamz 0.5.3
dask 2.15.0

Running the "Sequential Execution" example:

$ python stream_test.py
distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available
distributed.comm.tcp - WARNING - Could not set timeout on TCP strea

Descriptions for these sub-projects are missing from both the root README.md and their respective readme files.

Hello, I have a CSV file that has 9 features and 9 expected targets, and I want to test 2 regression models on this data (that should be generated as a stream).

When I test the MultiTargetRegressionHoeffdingTree and RegressorChain on this data I get a bad R2-score, but when I normalize my data with scikit-learn I get a pretty good R2-score. The problem is that I should not use sci

Describe the bug
Some utility projects are duplicated between singular (utility) and plural (utilities) versions. Let's align on the plural versions.

To Reproduce
Steps to reproduce the behavior:

Go to Services\DataX.Utilities folder
Notice the duplicate folders, e.g. two cosmosdb utils folders

Expected behavior
One dll per area

As a follow-up to #235, we would like to extend the getting started templates with an example of a project that uses multiple runtimes (e.g. Akka-streams + Spark or Akka-streams + Flink).

It should show the multi-project setup in the build and the suggested directory organization per runtime.

Learning methods should detect that the provided DAG contains variables with no associated attributes (because it is randomly generated) and does not match the attributes of the provided data.

Potential users are confused about how Euphoria compares to Apache Beam and what its feature set is. Please create a page in the wiki describing the set of supported features (maybe along the lines of https://beam.apache.org/documentation/runners/capability-matrix/) and the set of features not supported compared to Beam.

Contributes to #21.

#757 completed the filter infra for all APIs and added enough tests for most APIs. But, as the title says, some trivial APIs are ignored.

It would be good to mention why Lithium was chosen as the akka split brain resolver and maybe explain its configuration and behavior in the cluster or architecture documentation.
How well tested is it? Can it be swapped out with other SBRs? What is its configuration and how can it be overridden (if that makes sense for the NSDb use case)?

As per #98, in #102 we added proper testing of (almost) all examples in the docs (more land in #99 too).

One of the issues is that some of these examples are also re-implemented in the test suite.
The duplication between examples in examples/ and what's in the tests is hard to avoid currently, since the tests take some of the documented examples and parameterize them with things li

Here are 215 public repositories matching this topic...

onurakpolat / awesome-bigdata

johnkerl / miller

Jeffail / benthos

RaRe-Technologies / smart_open

pravega / pravega

microsoft / Trill

python-streamz / streamz

Stratio / sparta

joshday / OnlineStats.jl

reugn / go-streams

swimos / swim

infoslack / awesome-kafka

scikit-multiflow / scikit-multiflow

microsoft / data-accelerator

lightbend / cloudflow

kLabUM / rrcf

bbejeck / kafka-streams-in-action

ast-al / rangeless

Sinotrade / Shioaji

amidst / toolbox

GridProtectionAlliance / gsf

axway-amplify-streams / axway-amplify-streams-js

GridProtectionAlliance / openPDC

Chulong-Li / Real-time-Sentiment-Tracking-on-Twitter-for-Brand-Improvement-and-Trend-Recognition

seznam / euphoria

oharastream / ohara

evadne / packmatic

keithknott26 / datadash

radicalbit / NSDb

goodboy / tractor
