
Feature Engine

  • License: https://github.com/feature-engine/feature_engine/blob/master/LICENSE.md
  • Conda package: https://anaconda.org/conda-forge/feature_engine
  • CircleCI builds: https://app.circleci.com/pipelines/github/feature-engine/feature_engine?branch=1.1.X
  • Documentation: https://feature-engine.readthedocs.io/en/latest/index.html
  • Chat on Gitter: https://gitter.im/feature_engine/community
  • Sponsorship: https://www.trainindata.com/

Feature-engine is a Python library with multiple transformers to engineer and select features for use in machine learning models. Feature-engine's transformers follow Scikit-learn's API: fit() learns the transformation parameters from the data, and transform() applies them to transform the data.
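To make the fit()/transform() contract concrete, here is a minimal sketch in plain pandas. `MedianImputerSketch` is a hypothetical class written only for illustration. It is not part of Feature-engine, and it just mirrors the pattern: fit() learns one parameter per variable and stores it in an attribute ending in an underscore (a Scikit-learn convention), and transform() reuses those learned values on new data.

```python
import pandas as pd


class MedianImputerSketch:
    """Hypothetical transformer illustrating the fit()/transform() pattern."""

    def __init__(self, variables):
        self.variables = variables

    def fit(self, X, y=None):
        # Learn one parameter (the median) per variable from the training data.
        self.imputer_dict_ = {var: float(X[var].median()) for var in self.variables}
        return self

    def transform(self, X):
        # Apply the learned medians; fillna returns a new DataFrame, X is not modified.
        return X.fillna(value=self.imputer_dict_)


train = pd.DataFrame({"age": [20.0, None, 40.0, 30.0]})
test = pd.DataFrame({"age": [None, 50.0]})

imputer = MedianImputerSketch(variables=["age"]).fit(train)
print(imputer.imputer_dict_)                    # {'age': 30.0}
print(imputer.transform(test)["age"].tolist())  # [30.0, 50.0]
```

The median is learned once from `train` and then reused on `test`, which is what keeps information from the test data out of the learned parameters.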

Feature-engine is featured in the following resources

Blogs about Feature-engine

En Español

Documentation

Feature-engine's current transformers include functionality for:

  • Missing Data Imputation
  • Categorical Encoding
  • Discretisation
  • Outlier Capping or Removal
  • Variable Transformation
  • Variable Creation
  • Variable Selection
  • Datetime Feature Extraction
  • Preprocessing
  • Scikit-learn Wrappers

Imputation Methods

  • MeanMedianImputer
  • RandomSampleImputer
  • EndTailImputer
  • AddMissingIndicator
  • CategoricalImputer
  • ArbitraryNumberImputer
  • DropMissingData
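As a rough illustration of what two of these imputers do, the plain-pandas sketch below adds a binary flag marking missing rows (the idea behind AddMissingIndicator) and replaces missing values with the median (the idea behind MeanMedianImputer). It is not Feature-engine's implementation, and the `_na` suffix is an assumption chosen for the example.

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({"income": [100.0, np.nan, 300.0, 200.0]})

# Add a binary flag marking which rows were missing (computed before imputing).
train["income_na"] = train["income"].isna().astype(int)

# Learn the median from the observed values and use it to fill the gaps.
median = train["income"].median()
train["income"] = train["income"].fillna(median)

print(train)
```

In a real pipeline, the median would be learned on the training set only and then reused on new data, as in fit() and transform().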

Encoding Methods

  • OneHotEncoder
  • OrdinalEncoder
  • CountFrequencyEncoder
  • MeanEncoder
  • WoEEncoder
  • PRatioEncoder
  • RareLabelEncoder
  • DecisionTreeEncoder
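The core idea behind CountFrequencyEncoder and MeanEncoder can be sketched with pandas. First, learn a mapping from each category to its count, or to the mean of the target. Then, replace the categories with the mapped values. This is a simplified illustration under those assumptions, not the library's code. The column names are made up, and the handling of categories unseen during training is left out.

```python
import pandas as pd

train = pd.DataFrame({
    "city": ["London", "Paris", "London", "Rome", "London", "Paris"],
    "target": [1, 0, 1, 0, 0, 1],
})

# Count encoding: map each category to how often it appears in the training data.
count_map = train["city"].value_counts().to_dict()

# Mean (target) encoding: map each category to the mean of the target.
mean_map = train.groupby("city")["target"].mean().to_dict()

encoded = pd.DataFrame({
    "city_count": train["city"].map(count_map),
    "city_mean": train["city"].map(mean_map),
})
print(encoded)
```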

Discretisation methods

  • EqualFrequencyDiscretiser
  • EqualWidthDiscretiser
  • DecisionTreeDiscretiser
  • ArbitraryDiscretiser
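The difference between equal-width and equal-frequency discretisation can be seen with pandas' own `pd.cut` and `pd.qcut`. The first splits the value range into intervals of the same width. The second splits it into intervals that each hold roughly the same number of observations. This only illustrates the concept. EqualWidthDiscretiser and EqualFrequencyDiscretiser learn their bin edges in fit(), and they may differ in details such as edge handling.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# Equal width: 2 intervals of the same width over [min, max].
equal_width = pd.cut(values, bins=2, labels=False)

# Equal frequency: 2 intervals with (roughly) the same number of values.
equal_freq = pd.qcut(values, q=2, labels=False)

width_counts = pd.Series(equal_width).value_counts().sort_index().tolist()
freq_counts = pd.Series(equal_freq).value_counts().sort_index().tolist()
print(width_counts)  # [9, 1]
print(freq_counts)   # [5, 5]
```

The single extreme value (100) stretches the equal-width bins, so almost every observation falls into the first interval. The equal-frequency bins, by contrast, stay balanced.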

Outlier Handling methods

  • Winsorizer
  • ArbitraryOutlierCapper
  • OutlierTrimmer
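Capping, as done by Winsorizer and ArbitraryOutlierCapper, replaces extreme values with a boundary value, whereas OutlierTrimmer removes those observations instead. Below is a minimal sketch of capping, using the inter-quartile range (IQR) rule as one common way to derive the boundaries. The 1.5 multiplier is an assumption for the example, and the library's own options for choosing boundaries may differ.

```python
import pandas as pd

s = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 95.0])

# Learn the capping boundaries from the data using the IQR rule.
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Cap (rather than drop) the values that fall outside the boundaries.
capped = s.clip(lower=lower, upper=upper)
print(capped.tolist())  # [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 14.75]
```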

Variable Transformation methods

  • LogTransformer
  • LogCpTransformer
  • ReciprocalTransformer
  • PowerTransformer
  • BoxCoxTransformer
  • YeoJohnsonTransformer
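These transformers apply a mathematical function to each selected variable. A plain NumPy/pandas sketch of three of them follows:

  • the natural logarithm (LogTransformer);
  • the logarithm of x plus a constant C (the idea behind LogCpTransformer, which helps when a variable contains zeros or negative values);
  • the reciprocal 1/x (ReciprocalTransformer).

Here, C is picked so that the smallest value becomes 1, which is an assumption of this sketch. The sketch also does not show how the library chooses C or handles invalid values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [1.0, 10.0, 100.0], "balance": [-5.0, 0.0, 5.0]})

# Log transform: only valid for strictly positive values.
df["price_log"] = np.log(df["price"])

# Log of x + C: shift the variable so its minimum becomes 1, then take the log.
C = 1 - df["balance"].min()
df["balance_logcp"] = np.log(df["balance"] + C)

# Reciprocal transform: 1 / x, undefined at zero.
df["price_reciprocal"] = 1 / df["price"]

print(df.round(3))
```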

Variable Creation

  • MathematicalCombination
  • CombineWithReferenceFeature
  • CyclicalTransformer
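CyclicalTransformer is meant for variables that wrap around, such as the hour of the day or the month. The usual approach maps each value to a point on a circle using sine and cosine, so that the end of the cycle lands next to its start. The sketch below shows that approach. The cycle length is fixed by hand here (24 for hours), which is an assumption of the sketch, and the column names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"hour": [0, 6, 12, 18, 23]})
period = 24  # length of the cycle, assumed known for hours of the day

# Map each value to a point on the unit circle.
angle = 2 * np.pi * df["hour"] / period
df["hour_sin"] = np.sin(angle)
df["hour_cos"] = np.cos(angle)

# Hours 23 and 0 are now neighbours, although they are 23 apart on the raw scale.
p0 = df.loc[0, ["hour_sin", "hour_cos"]].to_numpy(dtype=float)
p23 = df.loc[4, ["hour_sin", "hour_cos"]].to_numpy(dtype=float)
print(round(float(np.linalg.norm(p0 - p23)), 3))  # 0.261
```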

Feature Selection

  • DropFeatures
  • DropConstantFeatures
  • DropDuplicateFeatures
  • DropCorrelatedFeatures
  • SmartCorrelationSelection
  • ShuffleFeaturesSelector
  • SelectBySingleFeaturePerformance
  • SelectByTargetMeanPerformance
  • RecursiveFeatureElimination
  • RecursiveFeatureAddition
  • DropHighPSIFeatures
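Several of these selectors apply a simple rule whose outcome is learned in fit(). The pandas sketch below illustrates two of them:

  • it finds constant columns (what DropConstantFeatures targets);
  • for each pair of columns whose absolute Pearson correlation exceeds a threshold, it keeps the first column and drops the second (the idea behind DropCorrelatedFeatures).

The 0.8 threshold and the keep-the-first rule are assumptions of this sketch, not a description of the library's internals.

```python
import pandas as pd

X = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 11],  # almost a linear copy of "a"
    "c": [3, 5, 1, 4, 2],
    "const": [7, 7, 7, 7, 7],
})

# Constant features: a single unique value.
constant = [col for col in X.columns if X[col].nunique() == 1]
X = X.drop(columns=constant)

# Correlated features: scan the pairs and drop the second column of each correlated pair.
corr = X.corr().abs()
threshold = 0.8
to_drop = set()
cols = list(corr.columns)
for i, col_i in enumerate(cols):
    if col_i in to_drop:
        continue
    for col_j in cols[i + 1:]:
        if col_j not in to_drop and corr.loc[col_i, col_j] > threshold:
            to_drop.add(col_j)

X = X.drop(columns=sorted(to_drop))
print(constant, sorted(to_drop), list(X.columns))
```

The constant columns are dropped before computing correlations, because the correlation of a constant column is undefined (NaN).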

Datetime

  • DatetimeFeatures
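DatetimeFeatures derives numerical features, such as the month or the day of the week, from datetime variables. The pandas `.dt` accessor exposes the same kind of information. The sketch below shows the idea. The output column names are assumptions, not the names the transformer produces.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2021-01-31", "2021-06-15", "2021-12-25"])})

# Extract calendar components from the datetime column.
df["date_year"] = df["date"].dt.year
df["date_month"] = df["date"].dt.month
df["date_day_of_week"] = df["date"].dt.dayofweek  # Monday=0, Sunday=6
df["date_weekend"] = (df["date_day_of_week"] >= 5).astype(int)

print(df)
```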

Preprocessing

  • MatchVariables
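MatchVariables makes a dataset contain the same variables, in the same order, as the dataset seen during fit(). This helps when new data arrives with columns missing or added. A minimal pandas sketch of that idea uses `DataFrame.reindex`, which drops extra columns and adds missing ones. Filling the missing columns with NaN is an assumption of the sketch.

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({"age": [25, 32], "income": [30_000, 45_000], "city": ["A", "B"]})
new_data = pd.DataFrame({"city": ["C"], "age": [41], "phone": ["x"]})

# "Fit": remember the training columns and their order.
columns_ = list(train.columns)

# "Transform": drop unseen columns, add missing ones, and reorder.
matched = new_data.reindex(columns=columns_, fill_value=np.nan)
print(matched)
```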

Wrappers

  • SklearnTransformerWrapper

Installation

From PyPI using pip:

pip install feature_engine

From Anaconda:

conda install -c conda-forge feature_engine

Or simply clone it:

git clone https://github.com/feature-engine/feature_engine.git

Example Usage

>>> import pandas as pd
>>> from feature_engine.encoding import RareLabelEncoder

>>> data = {'var_A': ['A'] * 10 + ['B'] * 10 + ['C'] * 2 + ['D'] * 1}
>>> data = pd.DataFrame(data)
>>> data['var_A'].value_counts()
A    10
B    10
C     2
D     1
Name: var_A, dtype: int64
>>> rare_encoder = RareLabelEncoder(tol=0.10, n_categories=3)
>>> data_encoded = rare_encoder.fit_transform(data)
>>> data_encoded['var_A'].value_counts()
A       10
B       10
Rare     3
Name: var_A, dtype: int64
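To see what the encoder did, the grouping rule can be reproduced with plain pandas. This simplified sketch assumes that, when the variable has more than `n_categories` distinct values, categories whose relative frequency is below `tol` are replaced by "Rare". It is not RareLabelEncoder's source code, and it skips the encoder's other options.

```python
import pandas as pd

data = pd.DataFrame({"var_A": ["A"] * 10 + ["B"] * 10 + ["C"] * 2 + ["D"] * 1})
tol, n_categories = 0.10, 3

# "Fit": find the frequent categories from the relative frequencies.
freqs = data["var_A"].value_counts(normalize=True)
if data["var_A"].nunique() > n_categories:
    frequent = set(freqs[freqs >= tol].index)
else:
    frequent = set(freqs.index)

# "Transform": keep the frequent categories and group the rest under "Rare".
encoded = data["var_A"].where(data["var_A"].isin(frequent), "Rare")
print(encoded.value_counts())
```

C (2 of 23 rows, about 8.7%) and D (1 of 23 rows, about 4.3%) fall below the 10% tolerance, so their 3 rows become "Rare", while A and B keep 10 rows each.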

Find more examples in our Jupyter Notebook Gallery or in the documentation.

Contribute

Details about how to contribute can be found on the Contribute page.

Briefly:

  • Fork the repo
  • Clone your fork onto your local computer: git clone https://github.com/<YOURUSERNAME>/feature_engine.git
  • Navigate into the repo folder: cd feature_engine
  • Optional: create and activate a virtual environment with any tool of your choice
  • Install Feature-engine as a developer: pip install -e .
  • Install Feature-engine's dependencies: pip install -r requirements.txt and pip install -r test_requirements.txt
  • Create a feature branch with a meaningful name for your feature: git checkout -b myfeaturebranch
  • Develop your feature, tests and documentation
  • Make sure the tests pass
  • Make a PR

Thank you!!

Documentation

Feature-engine documentation is built using Sphinx and is hosted on Read the Docs.

To build the documentation, first install its dependencies by running the following from the root directory: pip install -r docs/requirements.txt

Now you can build the docs using: sphinx-build -b html docs build

License

BSD 3-Clause

Donate

Sponsor us to help us continue expanding Feature-engine.