
feature-engineering

Here are 1,493 public repositories matching this topic...

nni
featuretools
thehomebrewnerd
thehomebrewnerd commented Jun 28, 2022

In the check_schema_version utility function, there is custom code to determine whether saved schema versions are older or newer than the current schema version. This comparison could likely be simplified significantly by using the packaging library to perform the version comparison instead of the custom code.

Current code:

        current = SCHEMA_VERSION.split(".")
        saved = ve
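
A minimal sketch of the kind of comparison the issue proposes, assuming the `packaging` library is installed. The helper name `compare_schema_versions` and its return values are hypothetical and are not taken from the project's code; only `packaging.version.Version` is an existing API.

```python
from packaging.version import Version


def compare_schema_versions(saved: str, current: str) -> str:
    """Hypothetical helper: classify a saved schema version against the current one."""
    saved_v, current_v = Version(saved), Version(current)
    if saved_v > current_v:
        return "newer"
    if saved_v < current_v:
        return "older"
    return "same"


# Version objects compare release segments numerically rather than as strings.
print(compare_schema_versions("10.0.0", "9.1.0"))
```

Comparing parsed versions avoids splitting on "." and comparing the pieces by hand, which is the part the issue describes as custom code.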
good first issue
chhabrakadabra
chhabrakadabra commented Jun 30, 2022

Is your feature request related to a problem? Please describe.

Feast is often hard to install alongside other Python packages that use google-cloud-core. Specifically, Feast sets an upper bound on this library (2.0.0), but the latest version is 2.3.1, and many Python packages set a lower bound of 2.0.0 or higher.

Describe the solution you'd like

Remove google-cloud-core fr

kind/feature good first issue Community Contribution Needed
mljar-supervised
ViacheslavDanilov
ViacheslavDanilov commented May 19, 2022

I trained models on Windows and then tried to use them on Linux, but I could not load them because the path was joined incorrectly. During model loading, I got learner_path in the following format: experiments_dir/model_1/100_LightGBM\\learner_fold_0.lightgbm. The last two backslashes were incorrectly concatenated with the rest of the path. In this regard, I would suggest adding something like `l
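
One possible way to make such a saved path portable is sketched below with the standard-library `pathlib`. This is not necessarily the fix the issue goes on to suggest (the suggestion is cut off above), and the helper name `to_portable_path` is hypothetical.

```python
from pathlib import PureWindowsPath


def to_portable_path(raw: str) -> str:
    """Hypothetical helper: rewrite a path saved on Windows using forward slashes."""
    # PureWindowsPath treats both "/" and "\" as separators and works on any OS.
    return PureWindowsPath(raw).as_posix()


saved = "experiments_dir/model_1/100_LightGBM\\learner_fold_0.lightgbm"
print(to_portable_path(saved))
```

Because `PureWindowsPath` is a pure path class, it never touches the file system, so the same normalization can run on Linux when models trained on Windows are loaded.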

bug help wanted good first issue
feature_engine
solegalli
solegalli commented May 7, 2022

When a variable is on a logarithmic scale, it might make sense to create the intervals based on a log scale instead of a linear scale.

Quote:
"
When the numbers span multiple magnitudes, it may be better to group by powers of
10 (or powers of any constant): 0–9, 10–99, 100–999, 1000–9999, etc. The bin widths
grow exponentially
"

the idea is taken from: Feature Engineering for Machine Lear
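
The power-of-10 grouping in the quote can be sketched in plain Python as follows. This is only an illustration of the idea, not the proposed transformer; the name `log_bin` and the number of edges are assumptions made for the example.

```python
from bisect import bisect_right

# Bin edges at powers of 10; stopping at 10**4 is an assumption for this example.
EDGES = [10**k for k in range(1, 5)]  # 10, 100, 1000, 10000


def log_bin(value: float) -> int:
    """Return the index of the power-of-10 bin that contains a non-negative value."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    # Bin 0 covers [0, 10), bin 1 covers [10, 100), and so on.
    return bisect_right(EDGES, value)


print([log_bin(v) for v in (0, 9, 10, 99, 100, 999, 1000, 9999)])
```

Looking up precomputed edges avoids floating-point rounding issues that can appear when taking the logarithm of values that sit exactly on a boundary.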

new transformer good first issue easy
EvenOldridge
EvenOldridge commented Jun 8, 2021

The current version of bucketize uses fixed boundaries. If the user doesn't know these boundaries, they need to calculate them using cudf.

We should support splitting continuous variables into buckets based on quantile and uniform splits of the data.

For uniform splits the statistics gathering phase needs to compute the min and max of the column and figure out the boundaries to create N buckets.
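
The two kinds of split described above can be sketched with the Python standard library. This is a plain-Python illustration, not the library's API and not a cudf implementation; the function names `uniform_boundaries` and `quantile_boundaries` are hypothetical.

```python
from statistics import quantiles
from typing import List, Sequence


def uniform_boundaries(values: Sequence[float], n_buckets: int) -> List[float]:
    """Interior boundaries that split [min, max] into n_buckets equal-width buckets."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets
    return [lo + i * width for i in range(1, n_buckets)]


def quantile_boundaries(values: Sequence[float], n_buckets: int) -> List[float]:
    """Interior boundaries that put roughly the same number of values in each bucket."""
    return quantiles(values, n=n_buckets)


data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(uniform_boundaries(data, 5))
print(quantile_boundaries(data, 4))
```

Both helpers return the N - 1 interior cut points, which could then be passed to a transform that takes fixed boundaries.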

enhancement good first issue

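FIBO Rule - a real-time AI decision engine, rule engine, risk-control engine, and data-flow engine. Rules are configured through a visual interface with no tedious development required, which saves labor, improves efficiency, enables real-time monitoring, reduces the error rate, and allows adjustments at any time. It supports rule sets, scorecards, decision trees, list-library management, machine learning models, third-party data integration, custom development, and more.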

  • Updated Jun 27, 2022
  • Java
skrawcz
skrawcz commented May 11, 2022

Is your feature request related to a problem? Please describe.
The main friction in getting the examples up and running is installing the dependencies. A Docker container with them already provided would make it easier for people to get started with Hamilton.

Describe the solution you'd like

  1. A docker container, that has different python virtual environments, that has the dependencies t
documentation good first issue help wanted
evalml
chukarsten
chukarsten commented Jul 3, 2022

In PR #3133, we marked tests to skip if the environment was a Python 3.9 environment. I don't think all the tests that are being skipped need to be skipped anymore. In working through the PolynomialDetrender tests, it was noted that Python 3.9 environments were skipping these tests, probably due to sktime not being compatible with that version of Py

good first issue
