Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists analyze and prepare data, and their findings inform high-level decisions in many organizations.

@fchollet

(e.g. for links and images), because some of these examples are now being rendered in the docs.

Added by @fchollet in requests for contributions.

Something is wrong with the documentation for sklearn.feature_selection.RFE.
Some of the method descriptions are cut off at the end.

https://github.com/scikit-learn/scikit-learn/blob/15a949460dbf19e5e196b8ef48f9712b72a3b3c3/sklearn/feature_selection/_rfe.py#L172-L184


Keyboard navigation in the control panel of the Explore view is difficult.

Expected results

You should be able to move focus between adjacent controls in the control panel with a single Tab key press
and visually distinguish which element has focus. You should be able to interact with controls using the keyboard
(Enter or space bar for button-like things).

Actual results

Several tab

What is the problem?

After running tune.run, the experiment results are missing from progress.csv but are in result.json.
mannyv describes a possible solution: https://discuss.ray.io/t/saving-checkpoints-with-good-custom-metric-using-tune-run/2109/12

Ray version and other system information (Python version, TensorFlow version, OS):

Ray version 1.2.0.
TensorFlow 1.15.4.
Python

Summary

The grayish background oval indicating a selected st.radio label has a few pixels too much padding on the right-hand side. Here's an example:

(Notice how the background rounded rectangle extends further to the right past "Notion" than it does to the left of the sel

Steps to reproduce

Run %autocall random

Expected result

ERROR:root:Valid modes: (0->Off, 1->Smart, 2->Full

Observed result

ValueError was raised due to parsing the argument "random" as an integer.

System info

Manjaro Linux, Python 3.9.1, IPython 7.22.0.
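
The behavior this report asks for (an error message listing the valid modes instead of an uncaught ValueError) can be sketched roughly as follows. This is only an illustration of the validation pattern, not IPython's actual code: parse_autocall_mode and VALID_MODES are hypothetical names introduced here.

```python
# Hypothetical sketch: validate a mode argument without letting
# int() raise ValueError on non-numeric input such as "random".
VALID_MODES = {0: "Off", 1: "Smart", 2: "Full"}


def parse_autocall_mode(arg):
    """Return the integer mode for `arg`, or None if it is not valid."""
    try:
        mode = int(arg)
    except ValueError:
        # Non-numeric input: report it as invalid instead of crashing.
        return None
    return mode if mode in VALID_MODES else None


for raw in ["1", "random", "7"]:
    mode = parse_autocall_mode(raw)
    if mode is None:
        print(f"Invalid mode {raw!r}. Valid modes: 0->Off, 1->Smart, 2->Full")
    else:
        print(f"autocall mode set to {VALID_MODES[mode]}")
```

Returning None for both non-numeric and out-of-range input lets the caller print one consistent message for every invalid argument.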

In recent versions (I can't say exactly since when), there seems to be an off-by-one error in dcc.DatePickerRange. I set max_date_allowed = datetime.today().date(), but in the calendar, yesterday is the maximum date allowed. I see it in my apps, and it is also present in the first example on the DatePickerRange documentation page.

E

🐛 Bug

This is a fairly important bug report that I've been meaning to make for a while.

In general, it is incorrect to try to do testing with a distributed sampler. This is because the distributed sampler is either going to mix in already processed samples or drop samples in order to make the number of batches divide evenly across the number of GPUs.

This is fine when you're doing tra
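
The pad-or-drop effect described in this report can be illustrated with a small pure-Python sketch. It mimics the general idea of splitting sample indices evenly across replicas and is not PyTorch's actual sampler (which, among other things, can also shuffle); shard_indices and its parameters are hypothetical names chosen for this example.

```python
import math


def shard_indices(num_samples, num_replicas, rank, drop_last=False):
    """Return the sample indices one replica would see.

    Assumes num_samples >= num_replicas. Hypothetical helper that only
    illustrates the pad-or-drop idea, with no shuffling.
    """
    indices = list(range(num_samples))
    if drop_last:
        # Drop the tail so that every replica gets the same count.
        per_replica = num_samples // num_replicas
        indices = indices[: per_replica * num_replicas]
    else:
        # Pad by repeating the first indices until the total divides evenly.
        per_replica = math.ceil(num_samples / num_replicas)
        padding = per_replica * num_replicas - num_samples
        indices += indices[:padding]
    # Each replica takes every num_replicas-th index, starting at its rank.
    return indices[rank::num_replicas]


padded = [i for rank in range(4) for i in shard_indices(10, 4, rank)]
dropped = [i for rank in range(4) for i in shard_indices(10, 4, rank, drop_last=True)]
print(sorted(padded))   # samples 0 and 1 appear twice
print(sorted(dropped))  # samples 8 and 9 are missing
```

With 10 samples on 4 replicas, padding evaluates 12 samples in total, so any metric averaged over them counts two samples twice, while dropping evaluates only 8.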

Calling plt.plot(np.ones(10), np.ones((10,0))) raises a ZeroDivisionError, which I found confusing.

Code for reproduction

import matplotlib.pyplot as plt
import numpy as np

plt.plot(np.ones(10), np.ones((10,0)))

This raises the error:

ZeroDivisionError: integer division or modulo by zero

Expected outcome

I think however, it should either r

(triggered by SO question: https://stackoverflow.com/questions/67944732/using-my-own-stopword-list-with-gensim-corpora-textcorpus-textcorpus/67951592#67951592)

Gensim has two remove_stopwords() functions with similar but slightly different behavior, which risks confusing users.

gensim.parsing.preprocessing.remove_stopwords takes a space-delimited string, and always consults the current
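
To make the "space-delimited string" calling convention concrete, here is a minimal stand-in written in plain Python. It is not Gensim's implementation: strip_stopwords and the example stopword set are hypothetical, and the stopword list is passed explicitly as an assumption of this sketch.

```python
def strip_stopwords(text, stopwords):
    """Remove stopwords from a space-delimited string.

    Hypothetical stand-in for illustration only, not Gensim's function.
    The stopword collection is an explicit argument here.
    """
    return " ".join(word for word in text.split() if word not in stopwords)


my_stopwords = frozenset({"the", "a", "of"})
print(strip_stopwords("the history of a language", my_stopwords))
```

A function that takes a string and returns a string cannot be swapped for one that takes and returns a list of tokens, which is the kind of mismatch that can confuse users when two functions share a name.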

Is your feature request related to a problem? Please describe.
I typically use compressed datasets (e.g. gzipped) to save disk space. This works fine with AllenNLP during training because I can write my dataset reader to load the compressed data. However, the predict command opens the file and reads lines for the Predictor. This fails when it tries to load data from my compressed files.
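
One way a line reader could handle both plain and gzipped input is sketched below using only the Python standard library. This is an assumption about how such a feature might look, not AllenNLP code: open_maybe_compressed and read_lines are hypothetical helpers, and compression is detected from the ".gz" file extension alone.

```python
import gzip


def open_maybe_compressed(path, encoding="utf-8"):
    """Open `path` for text reading, decompressing if it ends in .gz.

    Hypothetical helper; detection relies only on the file extension.
    """
    if str(path).endswith(".gz"):
        # "rt" makes gzip.open return decoded text instead of bytes.
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)


def read_lines(path):
    """Yield non-empty lines from a plain or gzipped text file."""
    with open_maybe_compressed(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                yield line
```

Because read_lines is a generator, the file is read lazily, so a caller can iterate over a large compressed input without loading it all into memory.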

Here are 19,486 public repositories matching this topic...

keras-team / keras

scikit-learn / scikit-learn

apache / superset

GokuMohandas / MadeWithML

CamDavidsonPilon / Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

donnemartin / data-science-ipython-notebooks

explosion / spaCy

eriklindernoren / ML-From-Scratch

ray-project / ray

academic / awesome-datascience

streamlit / streamlit

ipython / ipython

plotly / dash

PyTorchLightning / pytorch-lightning

matplotlib / matplotlib

virgili0 / Virgilio

AMAI-GmbH / AI-Expert-Roadmap

fastai / fastbook

afshinea / stanford-cs-229-machine-learning

RaRe-Technologies / gensim

bharathgs / Awesome-pytorch-list

rasbt / python-machine-learning-book

hangtwenty / dive-into-machine-learning

eugeneyan / applied-ml

microsoft / recommenders

allenai / allennlp

d2l-ai / d2l-en

0xnr / awesome-bigdata

microsoft / nni

tflearn / tflearn
