Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

Here are 16,906 public repositories matching this topic...

ogrisel commented Nov 13, 2020

Most functions in scipy.linalg (e.g. svd, qr, eig, eigh, pinv, pinv2 ...) have a default kwarg check_finite=True that we typically leave at its default value in scikit-learn.

As we already validate the input data for most estimators in scikit-learn, this check is redundant and can cause significant overhead, especially at predict / transform time. We should probably a
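A minimal sketch of the idea in the comment above, assuming the input has already been checked once for NaN and infinity: the scipy.linalg call can then be made with check_finite=False to skip the second scan. The helper name `validated_svd` and the random test matrix are hypothetical, chosen only for illustration; this is not scikit-learn's actual validation code.

```python
import numpy as np
from scipy import linalg


def validated_svd(X):
    """Validate X once, then run the SVD without SciPy's own finiteness check.

    Hypothetical helper for illustration; not part of scikit-learn or SciPy.
    """
    X = np.asarray(X, dtype=float)
    # Single up-front validation, standing in for an estimator's input check.
    if not np.isfinite(X).all():
        raise ValueError("Input contains NaN or infinity.")
    # check_finite=False skips the redundant scan inside scipy.linalg.svd.
    return linalg.svd(X, full_matrices=False, check_finite=False)


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
U, s, Vt = validated_svd(X)

# The thin SVD should reconstruct X up to floating-point error.
reconstruction_ok = np.allclose(U @ np.diag(s) @ Vt, X)
```

Skipping the check is only safe when the earlier validation is guaranteed to have run; passing non-finite values with check_finite=False can lead to crashes or non-termination, as the SciPy documentation warns.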

superset

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

  • Updated Dec 21, 2020
  • Python
barakmich commented Jan 14, 2021

Describe your feature request

I've been bitten by this at least twice now and it delays PRs.

CI runs an extra lint step (or at least runs it with different arguments) compared with scripts/format.sh (or scripts/format.sh --all).
It also runs ./scripts/check-git-clang-format-output.sh -- and why this is different than the clang-format run in --all is unclear to me.

This is most notable in linti

dash
gensim
pytorch-lightning
nni