
Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists analyze and prepare data, and their findings inform high-level decisions in many organizations.

Here are 19,486 public repositories matching this topic...

superset
GregOnEvo commented May 11, 2021

Keyboard navigation in the control panel of the Explore view is difficult.

Expected results

You should be able to move focus between adjacent controls in the control panel with a single Tab key press
and visually distinguish which element has focus. You should be able to interact with controls using the keyboard
(Enter or space bar for button-like things).

Actual results

Several tab

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

  • Updated May 13, 2021
  • Python
rlan commented Jun 11, 2021

What is the problem?

After running tune.run, the experiment results are missing from progress.csv but are in result.json.
mannyv describes a possible solution: https://discuss.ray.io/t/saving-checkpoints-with-good-custom-metric-using-tune-run/2109/12
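
Until progress.csv is fixed, the metrics can be read straight from result.json. The following is a minimal sketch, assuming result.json holds one JSON object per line (one per reported result); the helper names `load_results` and `metric_history` are hypothetical and not part of Ray.

```python
import json


def load_results(path):
    """Read a newline-delimited JSON file into a list of dicts.

    Assumption: each non-empty line is one complete JSON object.
    """
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                rows.append(json.loads(line))
    return rows


def metric_history(rows, key):
    """Return the values of `key` in reporting order, skipping rows that lack it."""
    return [row[key] for row in rows if key in row]
```

Calling `metric_history(load_results(path), "my_metric")` would then return the recorded values of a custom metric, where `path` and `"my_metric"` stand in for the trial's result file and the metric name.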

Ray version and other system information (Python version, TensorFlow version, OS):

Ray version 1.2.0.
TensorFlow 1.15.4.
Python

dash
pytorch-lightning
Queuecumber commented Jun 10, 2021

🐛 Bug

This is a fairly important bug report that I've been meaning to make for a while.

In general, it is incorrect to run testing with a distributed sampler. The distributed sampler will either mix in already-processed samples or drop samples so that the number of batches divides evenly across the GPUs.

This is fine when you're doing tra
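
The padding and dropping described above can be illustrated without any GPU. This is a minimal pure-Python sketch of the index arithmetic, not PyTorch's or Lightning's actual implementation; `shard_indices` is a hypothetical helper, and the interleaved assignment of indices to ranks is an assumption made for illustration.

```python
import math


def shard_indices(dataset_len, num_replicas, drop_last=False):
    """Split range(dataset_len) into equal-sized per-replica index lists.

    drop_last=False pads by repeating indices from the start of the dataset;
    drop_last=True truncates the tail so the split is even.
    """
    indices = list(range(dataset_len))
    if drop_last:
        per_replica = dataset_len // num_replicas
        indices = indices[: per_replica * num_replicas]
    else:
        per_replica = math.ceil(dataset_len / num_replicas)
        padding = per_replica * num_replicas - dataset_len
        # Repeat early samples until every replica gets the same count.
        indices += [indices[i % dataset_len] for i in range(padding)]
    # Interleave: replica `rank` takes every num_replicas-th index.
    return [indices[rank::num_replicas] for rank in range(num_replicas)]


# 10 samples on 4 replicas: 2 samples are evaluated twice.
padded = shard_indices(10, 4)
seen = [i for shard in padded for i in shard]
duplicated = sorted({i for i in seen if seen.count(i) > 1})  # [0, 1]

# With drop_last=True the last 2 samples are never evaluated.
truncated = shard_indices(10, 4, drop_last=True)
covered = {i for shard in truncated for i in shard}
dropped = sorted(set(range(10)) - covered)  # [8, 9]
```

Either way, a metric aggregated over all replicas is computed on a set that differs from the real test set: two samples counted twice in the first case, two samples missing in the second.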

juergspaak commented Jun 16, 2021

Calling plt.plot(np.ones(10), np.ones((10,0))) raises a ZeroDivisionError, which I found confusing.

Code for reproduction

import matplotlib.pyplot as plt
import numpy as np

plt.plot(np.ones(10), np.ones((10,0)))

This raises the error:

ZeroDivisionError: integer division or modulo by zero

Expected outcome

I think however, it should either r

gensim
gojomo commented Jun 12, 2021

(triggered by SO question: https://stackoverflow.com/questions/67944732/using-my-own-stopword-list-with-gensim-corpora-textcorpus-textcorpus/67951592#67951592)

Gensim has two remove_stopwords() functions with similar but slightly different behavior, which risks confusing users.

gensim.parsing.preprocessing.remove_stopwords takes a space-delimited string, and always consults the current
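
To make the risk concrete, here is a minimal sketch of how two same-named helpers with different input contracts can silently misbehave. It is not Gensim's code: the function names, the tiny stopword set, and the assumption that the second variant operates on a list of tokens are all illustrative, since the excerpt above is truncated before it describes the second function.

```python
# Illustrative stopword set only; not Gensim's STOPWORDS.
STOPWORDS = frozenset({"the", "a", "of"})


def remove_stopwords_from_string(text, stopwords=STOPWORDS):
    """String in, string out: split on whitespace, filter, rejoin."""
    return " ".join(word for word in text.split() if word not in stopwords)


def remove_stopwords_from_tokens(tokens, stopwords=STOPWORDS):
    """Token list in, token list out."""
    return [token for token in tokens if token not in stopwords]


# Used as intended, both agree on the surviving words.
as_string = remove_stopwords_from_string("the cat of a king")      # "cat king"
as_tokens = remove_stopwords_from_tokens("the cat of a king".split())  # ["cat", "king"]

# Passing a string to the token variant raises no error: it iterates
# over characters and only removes the single-letter stopword "a".
misuse = remove_stopwords_from_tokens("a cat")  # [" ", "c", "t"]
```

The misuse case fails quietly, which is why a shared name for the two contracts is easy to trip over.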

danieldeutsch commented Jun 2, 2021

Is your feature request related to a problem? Please describe.
I typically use compressed datasets (e.g. gzipped) to save disk space. This works fine with AllenNLP during training because I can write my dataset reader to load the compressed data. However, the predict command opens the file itself and reads lines for the Predictor, which fails when it tries to load data from my compressed files.
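
One way the requested behavior could look is a file opener that detects gzip input and otherwise falls back to plain text. This is a minimal sketch using only the standard library, not AllenNLP's code; `open_text` and `read_lines` are hypothetical helper names.

```python
import gzip
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_text(path, encoding="utf-8"):
    """Open `path` for text reading, decompressing transparently if it is gzipped."""
    path = Path(path)
    with path.open("rb") as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, mode="rt", encoding=encoding)
    return path.open("r", encoding=encoding)


def read_lines(path):
    """Return the non-blank lines of a plain or gzipped text file."""
    with open_text(path) as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]
```

Sniffing the magic bytes rather than trusting the file extension keeps the helper working for compressed files that lack a `.gz` suffix.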

nni