feature-engineering

Bug with GPU Model

Currently, when using pruning methods such as TaylorFOWeight Pruner, if I use a model on the GPU to compute the metrics (as calculated for getting masks), it fails on the line that creates the masks. The reason why it fails i

In the check_schema_version utility function, there is custom code to determine whether saved schema versions are older or newer than the current schema version. This comparison could likely be simplified significantly by using the packaging library to perform the version comparison instead of the custom code.

Current code:

        current = SCHEMA_VERSION.split(".")
        saved = ve
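
A minimal sketch of what the suggested simplification might look like, assuming the third-party packaging library is installed. The SCHEMA_VERSION value and the compare_schema_versions helper are hypothetical and are not taken from the project's code.

```python
from packaging.version import Version

# Hypothetical value for illustration; the real SCHEMA_VERSION lives in the project.
SCHEMA_VERSION = "5.2.0"


def compare_schema_versions(saved: str, current: str = SCHEMA_VERSION) -> int:
    """Return -1 if saved is older than current, 0 if equal, 1 if newer."""
    saved_version = Version(saved)
    current_version = Version(current)
    if saved_version < current_version:
        return -1
    if saved_version > current_version:
        return 1
    return 0
```

Version objects compare release segments numerically, so "5.10.0" is treated as newer than "5.2.0" without any manual splitting on ".".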

Is your feature request related to a problem? Please describe.

Feast is often hard to install alongside other Python packages that use google-cloud-core. Specifically, Feast sets an upper bound on this library (2.0.0), but the latest version is 2.3.1 and many Python packages have a lower bound of 2.0.0 or above.

Describe the solution you'd like

Remove google-cloud-core fr

I trained models on Windows and then tried to use them on Linux, but I could not load them because of incorrect path joining. During model loading, I got learner_path in the following format: experiments_dir/model_1/100_LightGBM\\learner_fold_0.lightgbm. The last two slashes were incorrectly concatenated with the rest of the path. In this regard, I would suggest adding something like `l
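
One possible way to handle a path saved on Windows when loading on Linux is sketched below. It is an assumption, not the project's fix: to_posix_path is a hypothetical helper that relies only on the standard-library pathlib module.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_posix_path(stored_path: str) -> PurePosixPath:
    """Normalize a stored path that may mix "\\" and "/" separators."""
    # PureWindowsPath treats both "\" and "/" as separators on any OS,
    # so as_posix() rewrites every separator as "/".
    return PurePosixPath(PureWindowsPath(stored_path).as_posix())


# Path in the style reported above (a single backslash before the file name).
learner_path = to_posix_path(
    "experiments_dir/model_1/100_LightGBM\\learner_fold_0.lightgbm"
)
print(learner_path)
```

This approach would also rewrite a backslash that is a legitimate part of a POSIX file name, so it only fits when stored paths are known to come from Windows.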

We have an English version of this demo: https://github.com/4paradigm/OpenMLDB/tree/main/demo/talkingdata-adtracking-fraud-detection

Please translate this doc into Chinese and save it as docs/zh/use_case/talkingdata.md. Please don't forget to update the file docs/zh/use_case/index.rst.

When a variable is on a logarithmic scale, it might make sense to create the intervals based on a log scale instead of a linear scale.

Quote:
"
When the numbers span multiple magnitudes, it may be better to group by powers of
10 (or powers of any constant): 0–9, 10–99, 100–999, 1000–9999, etc. The bin widths
grow exponentially
"

The idea is taken from: Feature Engineering for Machine Lear
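
The power-of-10 grouping in the quote can be sketched as follows. This is an illustration only: power_of_ten_bin is a hypothetical helper, it handles non-negative integers, and it uses integer division rather than a floating-point logarithm to avoid rounding at the bin edges.

```python
def power_of_ten_bin(value: int) -> int:
    """Return the bin index: 0 for 0-9, 1 for 10-99, 2 for 100-999, and so on."""
    if value < 0:
        raise ValueError("value must be non-negative")
    bin_index = 0
    # Each division by 10 moves the value down one order of magnitude.
    while value >= 10:
        value //= 10
        bin_index += 1
    return bin_index


counts = [3, 42, 999, 1000, 25000]
print([power_of_ten_bin(c) for c in counts])
```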

Currently, the get_result_df function has no way to specify a temporary folder name. It would be useful if this function supported a parameter such as local_folder so that the end user can control where those files are downloaded.

The current version of bucketize uses fixed boundaries. If users don't know these boundaries, they need to calculate them using cudf.

We should support splitting continuous variables into buckets based on quantile and uniform splits of the data.

For uniform splits, the statistics-gathering phase needs to compute the min and max of the column and derive the boundaries that create N buckets.
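
The boundary computation described above can be sketched in plain Python. This is not the library's implementation and does not use cudf: uniform_boundaries, quantile_boundaries, and bucket_index are hypothetical helpers built on the standard library, shown only to make the two splitting strategies concrete.

```python
from bisect import bisect_right
from statistics import quantiles
from typing import List, Sequence


def uniform_boundaries(values: Sequence[float], n_buckets: int) -> List[float]:
    """Return the n_buckets - 1 interior boundaries of equal-width buckets."""
    if n_buckets < 1:
        raise ValueError("n_buckets must be at least 1")
    # Statistics-gathering step: only the min and max of the column are needed.
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets
    return [lo + width * i for i in range(1, n_buckets)]


def quantile_boundaries(values: Sequence[float], n_buckets: int) -> List[float]:
    """Return the n_buckets - 1 cut points that split the data by quantile."""
    return quantiles(values, n=n_buckets)


def bucket_index(value: float, boundaries: Sequence[float]) -> int:
    """Map a value to a bucket index in the range 0 to len(boundaries)."""
    return bisect_right(boundaries, value)


column = [0, 1, 2, 3, 5, 8, 10]
edges = uniform_boundaries(column, n_buckets=5)
print(edges, [bucket_index(v, edges) for v in column])
```

Uniform splits need only two statistics per column, while quantile splits need the distribution of the data, which is why the two are computed by separate helpers here.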

Is your feature request related to a problem? Please describe.
The main friction in getting the examples up and running is installing the dependencies. A Docker container with them already provided would reduce friction for people getting started with Hamilton.

Describe the solution you'd like

A Docker container, that has different Python virtual environments, that has the dependencies t

In PR #3133, we marked tests to skip if the environment was a Python 3.9 environment. I don't think all the tests that are being skipped need to be skipped anymore. In working through the PolynomialDetrender tests, it was noted that Python 3.9 environments were skipping these tests, probably due to sktime not being compatible with that version of Py

Here are 1,493 public repositories matching this topic...

microsoft / nni

EpistasisLab / tpot

alteryx / featuretools

feast-dev / feast

alibaba / Alink

apachecn / fe4ml-zh

mljar / mljar-supervised

ClimbsRocks / auto_ml

4paradigm / OpenMLDB

metarank / metarank

rorysroes / SGX-Full-OrderBook-Tick-Data-Trading-Strategy

DeepWisdom / AutoDL

feature-engine / feature_engine

linkedin / feathr

HouJP / kaggle-quora-question-pairs

Yimeng-Zhang / feature-engineering-and-feature-selection

NVIDIA-Merlin / NVTabular

jeongyoonlee / Kaggler

HunterMcGushion / hyperparameter_hunter

LastAncientOne / Deep-Learning-Machine-Learning-Stock

duxuhao / Feature-Selection

fraunhoferportugal / tsfel

FiboAI / FIBO-Rule

stitchfix / hamilton

winedarksea / AutoTS

alteryx / evalml

aikho / awesome-feature-engineering

alteryx / open_source_demos

firmai / deltapy

minerva-ml / open-solution-home-credit
