
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/ensemble/plot_feature_transformation.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_ensemble_plot_feature_transformation.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_ensemble_plot_feature_transformation.py:


===============================================
Feature transformations with ensembles of trees
===============================================

Transform your features into a higher-dimensional, sparse space. Then train a
linear model on these features.

First fit an ensemble of trees (totally random trees, a random forest, or
gradient boosted trees) on the training set. Then each leaf of each tree in the
ensemble is assigned a fixed arbitrary feature index in a new feature space.
These leaf indices are then encoded in a one-hot fashion.

Each sample goes through the decisions of each tree of the ensemble and ends up
in one leaf per tree. The sample is encoded by setting feature values for these
leaves to 1 and the other feature values to 0.

The resulting transformer has then learned a supervised, sparse,
high-dimensional categorical embedding of the data.
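
The leaf encoding described above can be sketched on a toy problem. This is a
minimal illustration, not the code used later in this example: the names
``X_demo``, ``y_demo``, ``forest`` and ``encoder``, as well as the dataset and
forest sizes, are assumptions chosen only to keep the output small.

.. code-block:: python

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.preprocessing import OneHotEncoder

    # Small illustrative dataset and forest (sizes chosen arbitrarily).
    X_demo, y_demo = make_classification(n_samples=200, random_state=0)
    forest = RandomForestClassifier(n_estimators=3, max_depth=2, random_state=0)
    forest.fit(X_demo, y_demo)

    # For each sample, `apply` gives the index of the leaf reached in each tree.
    leaves = forest.apply(X_demo)  # shape (n_samples, n_estimators)

    # One-hot encode the leaf indices: one column per (tree, leaf) pair seen in fit.
    encoder = OneHotEncoder()
    encoded = encoder.fit_transform(leaves)  # sparse matrix

    # Each sample ends up in exactly one leaf per tree, so every row contains
    # exactly `n_estimators` ones.
    row_sums = np.asarray(encoded.sum(axis=1)).ravel()
    print(leaves.shape, encoded.shape, row_sums[:5])

Every row of ``encoded`` sums to the number of trees, which reflects the "one
leaf per tree" property of the encoding.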

.. GENERATED FROM PYTHON SOURCE LINES 22-32

.. code-block:: default



    # Author: Tim Head <betatim@gmail.com>
    #
    # License: BSD 3 clause

    from sklearn import set_config

    set_config(display="diagram")








.. GENERATED FROM PYTHON SOURCE LINES 33-42

First, we will create a large dataset and split it into three sets:

- a set to train the ensemble methods, which are later used as feature
  engineering transformers;
- a set to train the linear model;
- a set to test the linear model.

It is important to split the data in this way to avoid overfitting caused by
data leakage.

.. GENERATED FROM PYTHON SOURCE LINES 42-55

.. code-block:: default


    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=80000, random_state=10)

    X_full_train, X_test, y_full_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=10
    )
    X_train_ensemble, X_train_linear, y_train_ensemble, y_train_linear = train_test_split(
        X_full_train, y_full_train, test_size=0.5, random_state=10
    )








.. GENERATED FROM PYTHON SOURCE LINES 56-58

For each of the ensemble methods, we will use 10 estimators and a maximum
depth of 3 levels.
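
These two settings bound the size of the one-hot encoded feature space. The
short calculation below is a sketch of that bound; the variable names
``max_leaves_per_tree`` and ``max_encoded_features`` are introduced here for
illustration only. The actual number of columns can be smaller, since a tree
does not necessarily grow every possible leaf.

.. code-block:: python

    n_estimators = 10
    max_depth = 3

    # A binary tree of depth `max_depth` has at most 2 ** max_depth leaves.
    max_leaves_per_tree = 2**max_depth

    # Upper bound on the number of one-hot columns over the whole ensemble.
    max_encoded_features = n_estimators * max_leaves_per_tree
    print(max_encoded_features)  # 80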

.. GENERATED FROM PYTHON SOURCE LINES 58-62

.. code-block:: default


    n_estimators = 10
    max_depth = 3








.. GENERATED FROM PYTHON SOURCE LINES 63-65

We start by training the random forest and gradient boosting models on the
dedicated ensemble training set.

.. GENERATED FROM PYTHON SOURCE LINES 65-78

.. code-block:: default


    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

    random_forest = RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth, random_state=10
    )
    random_forest.fit(X_train_ensemble, y_train_ensemble)

    gradient_boosting = GradientBoostingClassifier(
        n_estimators=n_estimators, max_depth=max_depth, random_state=10
    )
    _ = gradient_boosting.fit(X_train_ensemble, y_train_ensemble)








.. GENERATED FROM PYTHON SOURCE LINES 79-81

The :class:`~sklearn.ensemble.RandomTreesEmbedding` is an unsupervised method
and thus does not need to be trained independently.
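
As a minimal sketch of what "unsupervised" means here, the embedding can be
fitted and applied without passing any target. The names ``X_demo`` and
``embedding`` and the sizes used below are illustrative assumptions, not part
of this example's pipeline.

.. code-block:: python

    import numpy as np
    from scipy import sparse
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomTreesEmbedding

    X_demo, _ = make_classification(n_samples=100, random_state=0)

    embedding = RandomTreesEmbedding(n_estimators=5, max_depth=2, random_state=0)
    # No target is passed: the trees are grown from random splits of X only.
    X_embedded = embedding.fit_transform(X_demo)

    # The output is a sparse one-hot encoding with one active column per tree.
    nonzeros_per_row = np.asarray(X_embedded.sum(axis=1)).ravel()
    print(sparse.issparse(X_embedded), X_embedded.shape, nonzeros_per_row[:5])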

.. GENERATED FROM PYTHON SOURCE LINES 81-88

.. code-block:: default


    from sklearn.ensemble import RandomTreesEmbedding

    random_tree_embedding = RandomTreesEmbedding(
        n_estimators=n_estimators, max_depth=max_depth, random_state=0
    )








.. GENERATED FROM PYTHON SOURCE LINES 89-94

Now, we will create three pipelines that will use the above embeddings as
a preprocessing stage.

The random trees embedding can be directly pipelined with the logistic
regression because it is a standard scikit-learn transformer.

.. GENERATED FROM PYTHON SOURCE LINES 94-101

.. code-block:: default


    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    rt_model = make_pipeline(random_tree_embedding, LogisticRegression(max_iter=1000))
    rt_model.fit(X_train_linear, y_train_linear)






.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <style>#sk-6a47235c-3c86-4cb4-b192-74317bb04308 {color: black;background-color: white;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 pre{padding: 0;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 div.sk-toggleable {background-color: white;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 div.sk-estimator:hover {background-color: #d4ebff;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 div.sk-parallel-item::after {content: "";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 
div.sk-serial::before {content: "";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 2em;bottom: 0;left: 50%;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 div.sk-item {z-index: 1;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 div.sk-parallel::before {content: "";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 2em;bottom: 0;left: 50%;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 div.sk-parallel-item {display: flex;flex-direction: column;position: relative;background-color: white;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 div.sk-parallel-item:only-child::after {width: 0;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;position: relative;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 div.sk-label label {font-family: monospace;font-weight: bold;background-color: white;display: inline-block;line-height: 1.2em;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 div.sk-label-container {position: relative;z-index: 2;text-align: center;}#sk-6a47235c-3c86-4cb4-b192-74317bb04308 div.sk-container {display: inline-block;position: relative;}</style><div id="sk-6a47235c-3c86-4cb4-b192-74317bb04308" class"sk-top-container"><div class="sk-container"><div class="sk-item sk-dashed-wrapped"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input 
class="sk-toggleable__control sk-hidden--visually" id="f167c740-7ad9-4029-8245-2f3ab40dcdd6" type="checkbox" ><label class="sk-toggleable__label" for="f167c740-7ad9-4029-8245-2f3ab40dcdd6">Pipeline</label><div class="sk-toggleable__content"><pre>Pipeline(steps=[('randomtreesembedding',
                     RandomTreesEmbedding(max_depth=3, n_estimators=10,
                                          random_state=0)),
                    ('logisticregression', LogisticRegression(max_iter=1000))])</pre></div></div></div><div class="sk-serial"><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="eff453d0-e44f-4c7f-84a9-a64ee64b7511" type="checkbox" ><label class="sk-toggleable__label" for="eff453d0-e44f-4c7f-84a9-a64ee64b7511">RandomTreesEmbedding</label><div class="sk-toggleable__content"><pre>RandomTreesEmbedding(max_depth=3, n_estimators=10, random_state=0)</pre></div></div></div><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="3f16208a-eba0-4903-abcc-7b51077b58e1" type="checkbox" ><label class="sk-toggleable__label" for="3f16208a-eba0-4903-abcc-7b51077b58e1">LogisticRegression</label><div class="sk-toggleable__content"><pre>LogisticRegression(max_iter=1000)</pre></div></div></div></div></div></div></div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 102-106

Then, we can pipeline the random forest or gradient boosting model with a
logistic regression. However, these models expose the feature transformation
through the method `apply`, whereas a scikit-learn pipeline expects a call to
`transform`. Therefore, we wrap the call to `apply` within a
`FunctionTransformer`.
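
The output of `apply` differs slightly between the two ensembles, which is
worth checking before wrapping it. The sketch below uses small, hypothetical
models (``rf``, ``gb``) on a toy dataset: the random forest returns a 2D array
of leaf indices, while gradient boosting returns a 3D array whose last axis has
size 1 for binary classification, so it has to be sliced with ``[:, :, 0]``
before one-hot encoding.

.. code-block:: python

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

    X_demo, y_demo = make_classification(n_samples=100, random_state=0)

    rf = RandomForestClassifier(n_estimators=4, max_depth=2, random_state=0)
    rf.fit(X_demo, y_demo)
    gb = GradientBoostingClassifier(n_estimators=4, max_depth=2, random_state=0)
    gb.fit(X_demo, y_demo)

    # Neither classifier is a transformer, hence the need for a wrapper.
    print(hasattr(rf, "transform"), hasattr(gb, "transform"))

    rf_leaves = rf.apply(X_demo)  # shape (n_samples, n_estimators)
    gb_leaves = gb.apply(X_demo)  # shape (n_samples, n_estimators, 1) if binary

    # Drop the trailing axis to get one leaf index per sample and per tree.
    gb_leaves_2d = gb_leaves[:, :, 0]
    print(rf_leaves.shape, gb_leaves.shape, gb_leaves_2d.shape)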

.. GENERATED FROM PYTHON SOURCE LINES 106-125

.. code-block:: default


    from sklearn.preprocessing import FunctionTransformer
    from sklearn.preprocessing import OneHotEncoder


    def rf_apply(X, model):
        return model.apply(X)


    rf_leaves_yielder = FunctionTransformer(rf_apply, kw_args={"model": random_forest})

    rf_model = make_pipeline(
        rf_leaves_yielder,
        OneHotEncoder(handle_unknown="ignore"),
        LogisticRegression(max_iter=1000),
    )
    rf_model.fit(X_train_linear, y_train_linear)







.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <style>#sk-226fb7b0-8b52-4107-8001-9b97626caf35 {color: black;background-color: white;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 pre{padding: 0;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 div.sk-toggleable {background-color: white;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 div.sk-estimator:hover {background-color: #d4ebff;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 div.sk-parallel-item::after {content: "";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 
div.sk-serial::before {content: "";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 2em;bottom: 0;left: 50%;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 div.sk-item {z-index: 1;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 div.sk-parallel::before {content: "";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 2em;bottom: 0;left: 50%;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 div.sk-parallel-item {display: flex;flex-direction: column;position: relative;background-color: white;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 div.sk-parallel-item:only-child::after {width: 0;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;position: relative;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 div.sk-label label {font-family: monospace;font-weight: bold;background-color: white;display: inline-block;line-height: 1.2em;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 div.sk-label-container {position: relative;z-index: 2;text-align: center;}#sk-226fb7b0-8b52-4107-8001-9b97626caf35 div.sk-container {display: inline-block;position: relative;}</style><div id="sk-226fb7b0-8b52-4107-8001-9b97626caf35" class"sk-top-container"><div class="sk-container"><div class="sk-item sk-dashed-wrapped"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input 
class="sk-toggleable__control sk-hidden--visually" id="43d4ff8b-0387-43d3-8d11-0a74548f29d3" type="checkbox" ><label class="sk-toggleable__label" for="43d4ff8b-0387-43d3-8d11-0a74548f29d3">Pipeline</label><div class="sk-toggleable__content"><pre>Pipeline(steps=[('functiontransformer',
                     FunctionTransformer(func=<function rf_apply at 0x7f0c971fc790>,
                                         kw_args={'model': RandomForestClassifier(max_depth=3,
                                                                                  n_estimators=10,
                                                                                  random_state=10)})),
                    ('onehotencoder', OneHotEncoder(handle_unknown='ignore')),
                    ('logisticregression', LogisticRegression(max_iter=1000))])</pre></div></div></div><div class="sk-serial"><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="3ba7ff37-579b-4adb-852e-9ebafb71ae73" type="checkbox" ><label class="sk-toggleable__label" for="3ba7ff37-579b-4adb-852e-9ebafb71ae73">FunctionTransformer</label><div class="sk-toggleable__content"><pre>FunctionTransformer(func=<function rf_apply at 0x7f0c971fc790>,
                        kw_args={'model': RandomForestClassifier(max_depth=3,
                                                                 n_estimators=10,
                                                                 random_state=10)})</pre></div></div></div><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="3812da0f-9afc-47fb-ac40-45e3a338f707" type="checkbox" ><label class="sk-toggleable__label" for="3812da0f-9afc-47fb-ac40-45e3a338f707">OneHotEncoder</label><div class="sk-toggleable__content"><pre>OneHotEncoder(handle_unknown='ignore')</pre></div></div></div><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="a8b18ab1-e543-4990-9c85-24d8771b493e" type="checkbox" ><label class="sk-toggleable__label" for="a8b18ab1-e543-4990-9c85-24d8771b493e">LogisticRegression</label><div class="sk-toggleable__content"><pre>LogisticRegression(max_iter=1000)</pre></div></div></div></div></div></div></div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 126-141

.. code-block:: default

    def gbdt_apply(X, model):
        return model.apply(X)[:, :, 0]


    gbdt_leaves_yielder = FunctionTransformer(
        gbdt_apply, kw_args={"model": gradient_boosting}
    )

    gbdt_model = make_pipeline(
        gbdt_leaves_yielder,
        OneHotEncoder(handle_unknown="ignore"),
        LogisticRegression(max_iter=1000),
    )
    gbdt_model.fit(X_train_linear, y_train_linear)






.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <style>#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d {color: black;background-color: white;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d pre{padding: 0;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d div.sk-toggleable {background-color: white;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d div.sk-estimator:hover {background-color: #d4ebff;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d div.sk-parallel-item::after {content: "";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d 
div.sk-serial::before {content: "";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 2em;bottom: 0;left: 50%;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d div.sk-item {z-index: 1;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d div.sk-parallel::before {content: "";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 2em;bottom: 0;left: 50%;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d div.sk-parallel-item {display: flex;flex-direction: column;position: relative;background-color: white;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d div.sk-parallel-item:only-child::after {width: 0;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;position: relative;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d div.sk-label label {font-family: monospace;font-weight: bold;background-color: white;display: inline-block;line-height: 1.2em;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d div.sk-label-container {position: relative;z-index: 2;text-align: center;}#sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d div.sk-container {display: inline-block;position: relative;}</style><div id="sk-707aa21a-0e93-4b0d-bde3-54ebc2f7934d" class"sk-top-container"><div class="sk-container"><div class="sk-item sk-dashed-wrapped"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input 
class="sk-toggleable__control sk-hidden--visually" id="c84f4cd0-b310-4633-b8d7-f07c304c95b4" type="checkbox" ><label class="sk-toggleable__label" for="c84f4cd0-b310-4633-b8d7-f07c304c95b4">Pipeline</label><div class="sk-toggleable__content"><pre>Pipeline(steps=[('functiontransformer',
                     FunctionTransformer(func=<function gbdt_apply at 0x7f0c97ca6d30>,
                                         kw_args={'model': GradientBoostingClassifier(n_estimators=10,
                                                                                      random_state=10)})),
                    ('onehotencoder', OneHotEncoder(handle_unknown='ignore')),
                    ('logisticregression', LogisticRegression(max_iter=1000))])</pre></div></div></div><div class="sk-serial"><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="d2063826-8e4a-42c0-b89c-03b2222f2968" type="checkbox" ><label class="sk-toggleable__label" for="d2063826-8e4a-42c0-b89c-03b2222f2968">FunctionTransformer</label><div class="sk-toggleable__content"><pre>FunctionTransformer(func=<function gbdt_apply at 0x7f0c97ca6d30>,
                        kw_args={'model': GradientBoostingClassifier(n_estimators=10,
                                                                     random_state=10)})</pre></div></div></div><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="87f34630-c192-45ac-927d-588bdfd5882d" type="checkbox" ><label class="sk-toggleable__label" for="87f34630-c192-45ac-927d-588bdfd5882d">OneHotEncoder</label><div class="sk-toggleable__content"><pre>OneHotEncoder(handle_unknown='ignore')</pre></div></div></div><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="9d1f3877-7ca2-45dd-9ded-1831b14d4ddc" type="checkbox" ><label class="sk-toggleable__label" for="9d1f3877-7ca2-45dd-9ded-1831b14d4ddc">LogisticRegression</label><div class="sk-toggleable__content"><pre>LogisticRegression(max_iter=1000)</pre></div></div></div></div></div></div></div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 142-143

We can finally show the different ROC curves for all the models.

.. GENERATED FROM PYTHON SOURCE LINES 143-164

.. code-block:: default


    import matplotlib.pyplot as plt
    from sklearn.metrics import RocCurveDisplay

    fig, ax = plt.subplots()

    models = [
        ("RT embedding -> LR", rt_model),
        ("RF", random_forest),
        ("RF embedding -> LR", rf_model),
        ("GBDT", gradient_boosting),
        ("GBDT embedding -> LR", gbdt_model),
    ]

    model_displays = {}
    for name, pipeline in models:
        model_displays[name] = RocCurveDisplay.from_estimator(
            pipeline, X_test, y_test, ax=ax, name=name
        )
    _ = ax.set_title("ROC curve")




.. image-sg:: /auto_examples/ensemble/images/sphx_glr_plot_feature_transformation_001.png
   :alt: ROC curve
   :srcset: /auto_examples/ensemble/images/sphx_glr_plot_feature_transformation_001.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 165-172

.. code-block:: default

    fig, ax = plt.subplots()
    for name, pipeline in models:
        model_displays[name].plot(ax=ax)

    ax.set_xlim(0, 0.2)
    ax.set_ylim(0.8, 1)
    _ = ax.set_title("ROC curve (zoomed in at top left)")



.. image-sg:: /auto_examples/ensemble/images/sphx_glr_plot_feature_transformation_002.png
   :alt: ROC curve (zoomed in at top left)
   :srcset: /auto_examples/ensemble/images/sphx_glr_plot_feature_transformation_002.png
   :class: sphx-glr-single-img






.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  2.703 seconds)


.. _sphx_glr_download_auto_examples_ensemble_plot_feature_transformation.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example



  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_feature_transformation.py <plot_feature_transformation.py>`



  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_feature_transformation.ipynb <plot_feature_transformation.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
